
Technology

Many AI model safety assessments have significant limitations


Despite the growing demand for AI safety and accountability, today’s tests and benchmarks may fall short, a new report finds.

Generative AI models, which can analyze and generate text, images, music, video, and more, are coming under increasing scrutiny for their tendency to make mistakes and generally behave unpredictably. Now, organizations from public sector agencies to big tech firms are proposing new benchmarks to test the safety of these models.

Toward the end of last year, the startup Scale AI created a lab dedicated to assessing how well models adhere to safety guidelines. This month, NIST and the U.K. AI Safety Institute released tools designed to evaluate model risk.


However, these tests and methods may be insufficient.

The Ada Lovelace Institute (ALI), a British non-profit organization dedicated to artificial intelligence research, conducted a study in which it interviewed experts from academic labs, civil society, and model vendors, and examined recent research on AI safety evaluations. The co-authors found that while current evaluations can be useful, they are not comprehensive, can be easily gamed, and don’t necessarily indicate how models will behave in real-world scenarios.

“Whether it’s a smartphone, a prescription drug, or a car, we expect the products we use to be safe and reliable; in these sectors, products are rigorously tested to ensure they’re safe before being deployed,” Elliot Jones, a senior researcher at ALI and co-author of the report, told TechCrunch. “Our research aimed to examine the limitations of current approaches to assessing AI safety, assess how assessments are currently being used, and explore their use as a tool for policymakers and regulators.”

Benchmarks and red teaming

The study’s co-authors first surveyed the academic literature to establish an overview of the harms and risks that today’s models pose and the state of existing AI model evaluations. They then interviewed 16 experts, including four employees at unnamed technology companies developing generative AI systems.


The study found that there is wide disagreement within the AI industry about the best set of methods and taxonomies for evaluating models.

Some evaluations only tested how well the models matched benchmarks in the lab, not how the models might impact real-world users. Others were based on tests designed for research purposes, not for evaluating production models, yet vendors insisted on using them in production.

We’ve written before about the problems with AI benchmarking. This study highlights all of those issues and more.

Experts cited in the study noted that it’s hard to extrapolate a model’s performance from benchmark results, and it’s unclear whether benchmarks can even show that a model possesses a specific capability. For example, while a model may perform well on a state bar exam, that doesn’t mean it will be able to solve more open-ended legal problems.


Experts also pointed to the issue of data contamination, where benchmark results can overstate a model’s performance if the model was trained on the same data it’s being tested on. Benchmarks, in many cases, are chosen by organizations not because they’re the best assessment tools but because of their convenience and ease of use, experts said.

“Benchmarks run the risk of being manipulated by developers who may train models on the same dataset that will be used to evaluate the model, which is equivalent to looking at an exam paper before an exam or strategically choosing which assessments to use,” Mahi Hardalupas, a researcher at ALI and co-author of the study, told TechCrunch. “Which version of the model is being evaluated also matters. Small changes can cause unpredictable changes in behavior and can override built-in safety features.”
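The contamination risk the experts describe can be made concrete. Below is a minimal sketch, not taken from the ALI report, of the kind of n-gram overlap check an evaluator might run to flag benchmark questions that also appear in a model’s training corpus; the function names and the n-gram length are illustrative assumptions.

```python
from typing import Iterable, Set, Tuple

def ngrams(text: str, n: int = 8) -> Set[Tuple[str, ...]]:
    # Lowercase, split on whitespace, and collect sliding windows of n tokens.
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(benchmark_items: Iterable[str],
                       training_docs: Iterable[str],
                       n: int = 8) -> float:
    """Fraction of benchmark items sharing at least one n-gram with the training data."""
    train_grams: Set[Tuple[str, ...]] = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)

    items = list(benchmark_items)
    if not items:
        return 0.0
    flagged = sum(1 for item in items if ngrams(item, n) & train_grams)
    return flagged / len(items)

# Toy usage: one benchmark question that leaks verbatim into the training corpus.
bench = ["What is the capital of France? The capital of France is Paris."]
train = ["Trivia dump: the capital of France is Paris, as everyone knows."]
print(f"contamination rate: {contamination_rate(bench, train, n=5):.0%}")
```

A high overlap rate doesn’t prove a model was deliberately trained on the test set, but it does suggest that its benchmark score may reflect memorization rather than capability.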

The ALI study also found problems with “red-teaming,” the practice of tasking individuals or groups with “attacking” a model to identify gaps and flaws. Many companies use red-teaming to evaluate models, including AI startups OpenAI and Anthropic, but there are few agreed-upon standards for red-teaming, making it difficult to assess the effectiveness of a given effort.

Experts told the study’s co-authors that finding people with the right skills and experience to lead red-teaming efforts can be difficult, and that the manual nature of the process makes it expensive and labor-intensive, a barrier for smaller organizations without the necessary resources.


Possible solutions

The main reasons AI evaluations haven’t improved are the pressure to release models quickly and a reluctance to run tests that could raise issues before launch.

“A person we spoke to who works for a company developing foundation models felt that there is more pressure within companies to release models quickly, which makes it harder to push back and take evaluations seriously,” Jones said. “The major AI labs are releasing models at a speed that outpaces their ability or society’s ability to ensure they are safe and reliable.”

One ALI survey respondent called evaluating models for safety an “intractable” problem. So what hope does the industry, and those who regulate it, have for solutions?

Hardalupas believes there is a way forward, but it will require greater commitment from public-sector bodies.


“Regulators and policymakers need to be clear about what they expect from evaluations,” he said. “At the same time, the evaluation community needs to be transparent about the current limitations and potential of evaluations.”

Hardalupas suggests that governments mandate greater public participation in the development of evaluations and implement measures to support an “ecosystem” of third-party testing, including programs providing regular access to required models and datasets.

Jones believes it may be necessary to develop “context-specific” evaluations that go beyond simply testing how a model responds to a prompt, and instead consider the kinds of users a model might affect (such as people of a particular background, gender, or ethnicity), as well as the ways in which attacks on models could bypass safeguards.

“This will require investment in fundamental evaluation science to develop more robust and repeatable evaluations based on an understanding of how the AI model works,” she added.


However, there is never a guarantee that a model is safe.

“As others have noted, ‘safety’ is not a property of models,” Hardalupas said. “Determining whether a model is ‘safe’ requires understanding the contexts in which it is used, to whom it is sold or shared, and whether the safeguards that are implemented are appropriate and robust to mitigate those risks. Baseline model assessments can serve exploratory purposes to identify potential risks, but they cannot guarantee that the model is safe, much less ‘completely safe.’ Many of our interviewees agreed that assessments cannot prove that a model is safe and can only indicate that the model is unsafe.”

This article was originally published on: techcrunch.com

Technology

Anysphere, maker of Cursor, is reportedly raising $900 million at a $9 billion valuation


AI robot face and programming code on a black background.

Anysphere, the maker of the AI-powered coding tool Cursor, has attracted $900 million in a new funding round led by Thrive Capital, the Financial Times reported, citing anonymous sources familiar with the deal.

The report said that Andreessen Horowitz (a16z) and Accel are also participating in the round, which values the company at about $9 billion.

Cursor raised $105 million from Thrive and a16z at a $2.5 billion valuation, as TechCrunch reported in December. Thrive also led that round, with a16z participating as well. According to Crunchbase data, the startup has raised over $173 million so far.


Investors including Index Ventures and Benchmark have reportedly tried to back the company, but it seems existing investors don’t want to miss the opportunity to back it themselves.

Other AI-powered coding startups are also attracting investor interest. TechCrunch reported in February that Windsurf, a rival to Anysphere, was in talks to raise funding at a $3 billion valuation. OpenAI, an investor in Anysphere, was reportedly trying to acquire Windsurf for about the same amount.


This article was originally published on: techcrunch.com


Technology

Temu halts direct shipping of products from China to the U.S.


Shein and Temu icons are seen displayed on a phone screen in this illustration photo

Chinese retailer Temu has changed its strategy in the face of American tariffs.

Through an executive order, President Donald Trump ended the so-called de minimis rule, which allowed goods worth $800 or less to enter the country without tariffs. He also raised tariffs on Chinese goods by more than 100%, forcing Chinese companies such as Temu and Shein, as well as American giants such as Amazon, to adjust their plans and raise prices.

CNBC reports that Temu has also been affected, with American shoppers seeing “import charges” of 130% to 150% added to their orders. Now, however, the company is no longer shipping goods directly from China to the United States. Instead, it only displays listings for products available in American warehouses, while goods shipped from China are listed as out of stock.


“The company has been actively recruiting U.S. sellers to join the platform,” a spokesperson said. “The move is meant to help local sellers reach more customers and grow their businesses.”


This article was originally published on: techcrunch.com

Technology

One of Google’s latest AI models scores worse on safety


The Google Gemini generative AI logo on a smartphone.

A recently released Google AI model performs worse on certain safety tests than its predecessor, according to the company’s internal benchmarking.

In a technical report published this week, Google reveals that its Gemini 2.5 Flash model is more likely to generate text that violates its safety guidelines than Gemini 2.0 Flash. On two metrics, “text-to-text safety” and “image-to-text safety,” Gemini 2.5 Flash regresses by 4.1% and 9.6%, respectively.

Text-to-text safety measures how often a model violates Google’s guidelines given a prompt, while image-to-text safety assesses how closely the model adheres to those boundaries when prompted with an image. Both tests are automated, not human-supervised.
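Google doesn’t publish the implementation of these metrics, but an automated safety score of this kind typically comes down to running a policy judge over model responses and reporting the share that gets flagged. The sketch below is a hypothetical illustration of that setup; `policy_classifier`, `toy_judge`, and the numbers are invented for the example, not Google’s actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EvalCase:
    prompt: str    # the text prompt (or a description of the image prompt)
    response: str  # the model output being scored

def violation_rate(cases: List[EvalCase],
                   policy_classifier: Callable[[str, str], bool]) -> float:
    """Share of responses the automated classifier flags as policy-violating."""
    if not cases:
        return 0.0
    flagged = sum(1 for c in cases if policy_classifier(c.prompt, c.response))
    return flagged / len(cases)

# Purely illustrative: a trivial keyword "judge" and two toy cases.
def toy_judge(prompt: str, response: str) -> bool:
    return "disallowed" in response.lower()

cases = [
    EvalCase("write a story", "Here is a harmless story."),
    EvalCase("write a story", "Here is disallowed content."),
]
old_rate = 0.02                                  # made-up rate for the older model
new_rate = violation_rate(cases, toy_judge)      # 0.5 for this toy set
print(f"violation rate: {new_rate:.0%}; change vs. old model: "
      f"{(new_rate - old_rate) * 100:.1f} percentage points")
```

If Google’s metrics work roughly this way, a “regression” simply means the flagged share rose between model versions.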


In an email, a Google spokesperson confirmed that Gemini 2.5 Flash “performs worse on text-to-text and image-to-text safety.”

These surprising benchmark results come as AI companies move to make their models more permissive, in other words, less likely to refuse to respond to controversial or sensitive prompts. For its latest crop of Llama models, Meta said it tuned them not to endorse “some views over others” and to answer more “debated” political prompts. OpenAI said earlier this year that it would tweak future models to avoid taking an editorial stance and to offer multiple perspectives on controversial topics.

Sometimes those efforts have backfired. TechCrunch reported on Monday that the default model powering OpenAI’s ChatGPT allowed minors to generate erotic conversations. OpenAI blamed the behavior on a “bug.”

According to Google’s technical report, Gemini 2.5 Flash, which is still in preview, follows instructions more faithfully than Gemini 2.0 Flash, including instructions that cross problematic lines. The company claims the regression can be partially attributed to false positives, but it also admits that Gemini 2.5 Flash sometimes generates “violative content” when explicitly asked.


“Of course, there is tension between (instruction following) on sensitive topics and safety policy violations, which is reflected in our evaluations,” the report reads.

Scores from SpeechMap, a benchmark that probes how models respond to sensitive and controversial prompts, also suggest that Gemini 2.5 Flash is far less likely to refuse to answer contentious questions than Gemini 2.0 Flash. TechCrunch’s testing of the model via the AI platform OpenRouter found that it will readily write essays in support of replacing human judges with AI, weakening due process protections in the U.S., and implementing widespread government surveillance programs.

Thomas Woodside, co-founder of the Secure AI Project, said the limited details Google provided in its technical report demonstrate the need for greater transparency in model testing.

“There is a trade-off between instruction following and policy following, because some users may ask for content that would violate policies,” Woodside told TechCrunch. “In this case, Google’s latest Flash model complies with instructions more while also violating policies more. Google does not provide much detail about the specific cases in which policies were violated, although it says they are not severe. Without knowing more, it is hard for independent analysts to know whether there is a problem.”


Google has already come under fire for its model safety reporting practices.

The company took weeks to publish a technical report for its most capable model, Gemini 2.5 Pro. When the report was finally published, it initially omitted key details of the safety tests.

On Monday, Google published a more detailed report with additional safety information.

This article was originally published on: techcrunch.com
