Technology

Many AI model safety assessments have significant limitations

Published

10 months ago

August 5, 2024

IAM

Many safety evaluations for AI models have significant limitations

Despite the growing demand for AI security and accountability, today’s tests and benchmarks will not be enough, a brand new report finds.

Generative AI models—models that may analyze and generate text, images, music, video, and more—are coming under increasing scrutiny for his or her tendency to make mistakes and usually behave unpredictably. Now, organizations from public sector agencies to big tech firms are proposing recent benchmarks to check the safety of those models.

At the tip of last yr, the startup Scale AI created lab dedicated to assessing how well models adhere to security guidelines. This month, NIST and the U.K. AI Safety Institute released tools designed to evaluate model risk.

However, these tests and model testing methods could also be insufficient.

The Ada Lovelace Institute (ALI), a British non-profit organization dedicated to artificial intelligence research, conducted test who interviewed experts from academic, civil society, and vendor modeling labs and examined recent research on AI security assessments. The co-authors found that while current assessments will be useful, they should not comprehensive, will be easily fooled, and don’t necessarily provide guidance on how models will perform in real-world scenarios.

“Whether it’s a smartphone, a prescription drug, or a car, we expect the products we use to be safe and reliable; in these sectors, products are rigorously tested to ensure they’re safe before being deployed,” Elliot Jones, a senior researcher at ALI and co-author of the report, told TechCrunch. “Our research aimed to examine the limitations of current approaches to assessing AI safety, assess how assessments are currently being used, and explore their use as a tool for policymakers and regulators.”

Benchmarks and red teaming

The study’s co-authors first surveyed the tutorial literature to determine an summary of the harms and risks that current models pose and the state of existing assessments of AI models. They then interviewed 16 experts, including 4 employees of unnamed technology firms developing generative AI systems.

The study revealed that there’s wide disagreement across the AI industry on the perfect set of methods and taxonomies for evaluating models.

Some evaluations only tested how well the models matched benchmarks within the lab, not how the models might impact real-world users. Others were based on tests designed for research purposes, not on evaluating production models—yet vendors insisted on using them in production.

We’ve written before concerning the problems with AI benchmarking. This study highlights all of those issues and more.

Experts cited within the study noted that it’s hard to extrapolate a model’s performance from benchmark results, and it’s unclear whether benchmarks may even show that a model has a certain capability. For example, while a model may perform well on a state exam, that doesn’t mean it can have the ability to resolve more open legal challenges.

Experts also pointed to the issue of knowledge contamination, where benchmark results can overstate a model’s performance if it was trained on the identical data it’s being tested on. Benchmarks, in lots of cases, are chosen by organizations not because they’re the perfect assessment tools, but due to their convenience and ease of use, experts said.

“Benchmarks run the risk of being manipulated by developers who may train models on the same dataset that will be used to evaluate the model, which is equivalent to looking at an exam paper before an exam or strategically choosing which assessments to use,” Mahi Hardalupas, a researcher at ALI and co-author of the study, told TechCrunch. “Which version of the model is being evaluated also matters. Small changes can cause unpredictable changes in behavior and can override built-in safety features.”

The ALI study also found problems with “red-teaming,” the practice of getting individuals or groups “attack” a model to discover gaps and flaws. Many firms use red-teaming to judge models, including AI startups OpenAI and Anthropic, but there are few agreed-upon standards for red-teaming, making it difficult to evaluate the effectiveness of a given effort.

Experts told the study’s co-authors that finding individuals with the correct skills and experience to steer red teaming efforts will be difficult, and the manual nature of the method makes it expensive and labor-intensive, a barrier for smaller organizations that don’t have the mandatory resources.

Possible solutions

The foremost the reason why AI rankings have not improved are the pressure to release models faster and the reluctance to run tests that might cause issues before launch.

“The person we spoke to who works for a foundation modeling company felt that there is more pressure within companies to release models quickly, which makes it harder to push back and take assessments seriously,” Jones said. “The major AI labs are releasing models at a speed that outpaces their ability or society’s ability to ensure they are safe and reliable.”

One ALI survey respondent called evaluating models for safety an “intractable” problem. So what hopes does the industry—and those that regulate it—have for solutions?

Mahi Hardalupas, a researcher at ALI, believes there’s a way forward, but it can require greater commitment from public sector entities.

“Regulators and policymakers need to be clear about what they expect from ratings,” he said. “At the same time, the ratings community needs to be transparent about the current limitations and potential of ratings.”

Hardalupas suggests that governments mandate greater public participation in the event of assessments and implement measures to support an “ecosystem” of third-party testing, including programs to offer regular access to any required models and datasets.

Jones believes it could be mandatory to develop “context-aware” assessments that transcend simply testing a model’s response to a command, and as an alternative consider the sorts of users a model might affect (akin to people of a certain background, gender, or ethnicity), in addition to the ways wherein attacks on models could bypass security measures.

“This will require investment in fundamental evaluation science to develop more robust and repeatable evaluations based on an understanding of how the AI model works,” she added.

However, there’s never a guarantee that a model is protected.

“As others have noted, ‘safety’ is not a property of models,” Hardalupas said. “Determining whether a model is ‘safe’ requires understanding the contexts in which it is used, to whom it is sold or shared, and whether the safeguards that are implemented are appropriate and robust to mitigate those risks. Baseline model assessments can serve exploratory purposes to identify potential risks, but they cannot guarantee that the model is safe, much less ‘completely safe.’ Many of our interviewees agreed that assessments cannot prove that a model is safe and can only indicate that the model is unsafe.”

This article was originally published on : techcrunch.com

Related Topics:Ada Lovelace Institute AI Generative AI research studies

Up Next

TikTok Lite: EU closes addictive design case after TikTok pledges not to reinstate rewards mechanism

Don't Miss

From golf to hunting, a new group of startups wants to make the experience even better

Click to comment

Technology

The next large Openai plant will not be worn: Report

Published

19 hours ago

May 22, 2025

IAM

Sam Altman speaks onstage during The New York Times Dealbook Summit 2024.

Opeli pushed generative artificial intelligence into public consciousness. Now it might probably develop a very different variety of AI device.

According to WSJ reportThe general director of Opeli, Altman himself, told employees on Wednesday that one other large product of the corporate would not be worn. Instead, it will be compact, without the screen of the device, fully aware of the user’s environment. Small enough to sit down on the desk or slot in your pocket, Altman described it each as a “third device” next to MacBook Pro and iPhone, in addition to “Comrade AI” integrated with on a regular basis life.

The preview took place after the OpenAI announced that he was purchased by IO, a startup founded last 12 months by the previous Apple Joni Ive designer, in a capital agreement value $ 6.5 billion. I will take a key creative and design role at Openai.

Altman reportedly told employees that the acquisition can ultimately add 1 trillion USD to the corporate conveyorsWearing devices or glasses that got other outfits.

Altman reportedly also emphasized to the staff that the key would be crucial to stop the copying of competitors before starting. As it seems, the recording of his comments leaked to the journal, asking questions on how much he can trust his team and the way rather more he will be able to reveal.

(Tagstotransate) devices

This article was originally published on : techcrunch.com

Technology

The latest model AI Google Gemma can work on phones

Published

2 days ago

May 20, 2025

IAM

It grows “open” AI Google, Gemma, grows.

While Google I/O 2025 On Tuesday, Google removed Gemma 3N compresses, a model designed for “liquid” on phones, laptops and tablets. According to Google, available in a preview starting on Tuesday, Gemma 3N can support sound, text, paintings and flicks.

Models efficient enough to operate in offline mode and without the necessity to calculate within the cloud have gained popularity within the AI community lately. They will not be only cheaper to make use of than large models, but they keep privacy, eliminating the necessity to send data to a distant data center.

During the speech to I/O product manager, Gemma Gus Martins said that GEMMA 3N can work on devices with lower than 2 GB of RAM. “Gemma 3N shares the same architecture as Gemini Nano, and is also designed for incredible performance,” he added.

In addition to Gemma 3N, Google releases Medgemma through the AI developer foundation program. According to Medgemma, it’s essentially the most talented model to research text and health -related images.

“Medgemma (IS) OUR (…) A collection of open models to understand the text and multimodal image (health),” said Martins. “Medgemma works great in various imaging and text applications, thanks to which developers (…) could adapt the models to their own health applications.”

Also on the horizon there may be SignGEMMA, an open model for signaling sign language right into a spoken language. Google claims that Signgemma will allow programmers to create recent applications and integration for users of deaf and hard.

“SIGNGEMMA is a new family of models trained to translate sign language into a spoken text, but preferably in the American sign and English,” said Martins. “This is the most talented model of understanding sign language in history and we are looking forward to you-programmers, deaf and hard communities-to take this base and build with it.”

It is value noting that Gemma has been criticized for non -standard, non -standard license conditions, which in accordance with some developers adopted models with a dangerous proposal. However, this didn’t discourage programmers from downloading Gemma models tens of tens of millions of times.

(Tagstransate) gemma

This article was originally published on : techcrunch.com

Technology

Trump to sign a criminalizing account of porn revenge and clear deep cabinets

Published

3 days ago

May 19, 2025

IAM

President Donald Trump is predicted to sign the act on Take It Down, a bilateral law that introduces more severe punishments for distributing clear images, including deep wardrobes and pornography of revenge.

The Act criminalizes the publication of such photos, regardless of whether or not they are authentic or generated AI. Whoever publishes photos or videos can face penalty, including a advantageous, deprivation of liberty and restitution.

According to the brand new law, media firms and web platforms must remove such materials inside 48 hours of termination of the victim. Platforms must also take steps to remove the duplicate content.

Many states have already banned clear sexual desems and pornography of revenge, but for the primary time federal regulatory authorities will enter to impose restrictions on web firms.

The first lady Melania Trump lobbyed for the law, which was sponsored by the senators Ted Cruz (R-TEXAS) and Amy Klobuchar (d-minn.). Cruz said he inspired him to act after hearing that Snapchat for nearly a 12 months refused to remove a deep displacement of a 14-year-old girl.

Proponents of freedom of speech and a group of digital rights aroused concerns, saying that the law is Too wide And it will probably lead to censorship of legal photos, similar to legal pornography, in addition to government critics.

(Tagstransate) AI

This article was originally published on : techcrunch.com