Technology

Many AI model safety assessments have significant limitations


Despite growing demand for AI safety and accountability, today’s tests and benchmarks may fall short, a new report finds.

Generative AI models (models that can analyze and generate text, images, music, video, and more) are coming under increasing scrutiny for their tendency to make mistakes and generally behave unpredictably. Now, organizations from public sector agencies to big tech firms are proposing new benchmarks to test the safety of these models.

At the end of last year, the startup Scale AI created a lab dedicated to assessing how well models adhere to safety guidelines. This month, NIST and the U.K. AI Safety Institute released tools designed to evaluate model risk.

However, these tests and methods for probing models may be insufficient.

The Ada Lovelace Institute (ALI), a British non-profit organization dedicated to artificial intelligence research, conducted a study that interviewed experts from academic labs, civil society, and model vendors, and examined recent research on AI safety evaluations. The co-authors found that while current evaluations can be useful, they are not comprehensive, can be easily gamed, and do not necessarily indicate how models will behave in real-world scenarios.

“Whether it’s a smartphone, a prescription drug, or a car, we expect the products we use to be safe and reliable; in these sectors, products are rigorously tested to ensure they’re safe before being deployed,” Elliot Jones, a senior researcher at ALI and co-author of the report, told TechCrunch. “Our research aimed to examine the limitations of current approaches to assessing AI safety, assess how assessments are currently being used, and explore their use as a tool for policymakers and regulators.”

Benchmarks and red teaming

The study’s co-authors first surveyed the academic literature to establish an overview of the harms and risks that current models pose and the state of existing evaluations of AI models. They then interviewed 16 experts, including four employees of unnamed technology firms developing generative AI systems.

The study revealed wide disagreement across the AI industry on the best set of methods and taxonomies for evaluating models.

Some evaluations tested only how well the models matched benchmarks in the lab, not how the models might affect real-world users. Others were based on tests designed for research purposes, not for evaluating production models, yet vendors insisted on using them in production.

We’ve written before about the problems with AI benchmarking. This study highlights all of those issues and more.

Experts cited in the study noted that it’s hard to extrapolate a model’s performance from benchmark results, and it’s unclear whether benchmarks can even show that a model has a certain capability. For example, while a model may perform well on a bar exam, that doesn’t mean it will be able to solve more open-ended legal challenges.

Experts also pointed to the issue of data contamination, where benchmark results can overstate a model’s performance if it was trained on the same data it’s being tested on. Benchmarks, in many cases, are chosen by organizations not because they’re the best evaluation tools, but because of their convenience and ease of use, experts said.

“Benchmarks run the risk of being manipulated by developers who may train models on the same dataset that will be used to evaluate the model, which is equivalent to looking at an exam paper before an exam or strategically choosing which assessments to use,” Mahi Hardalupas, a researcher at ALI and co-author of the study, told TechCrunch. “Which version of the model is being evaluated also matters. Small changes can cause unpredictable changes in behavior and can override built-in safety features.”

The ALI study also found problems with “red-teaming,” the practice of tasking individuals or groups with “attacking” a model to discover gaps and flaws. Many firms use red-teaming to evaluate models, including AI startups OpenAI and Anthropic, but there are few agreed-upon standards for red-teaming, making it difficult to assess the effectiveness of a given effort.

Experts told the study’s co-authors that finding people with the right skills and experience to lead red-teaming efforts can be difficult, and the manual nature of the process makes it expensive and labor-intensive, a barrier for smaller organizations that don’t have the necessary resources.

Possible solutions

The main reasons AI evaluations have not improved are the pressure to release models faster and a reluctance to run tests that could raise issues before launch.

“The person we spoke to who works for a foundation modeling company felt that there is more pressure within companies to release models quickly, which makes it harder to push back and take assessments seriously,” Jones said. “The major AI labs are releasing models at a speed that outpaces their ability or society’s ability to ensure they are safe and reliable.”

One ALI survey respondent called evaluating models for safety an “intractable” problem. So what hope do the industry, and those who regulate it, have for solutions?

Mahi Hardalupas, a researcher at ALI, believes there’s a way forward, but it will require greater commitment from public sector bodies.

“Regulators and policymakers need to be clear about what they expect from ratings,” he said. “At the same time, the ratings community needs to be transparent about the current limitations and potential of ratings.”

Hardalupas suggests that governments mandate greater public participation in the development of evaluations and implement measures to support an “ecosystem” of third-party testing, including programs to provide regular access to any required models and datasets.

Jones believes it may be necessary to develop “context-aware” evaluations that go beyond simply testing a model’s response to a prompt, and instead consider the types of users a model might affect (such as people of a certain background, gender, or ethnicity), as well as the ways in which attacks on models could bypass safeguards.

“This will require investment in fundamental evaluation science to develop more robust and repeatable evaluations based on an understanding of how the AI model works,” she added.

However, there may never be a guarantee that a model is safe.

“As others have noted, ‘safety’ is not a property of models,” Hardalupas said. “Determining whether a model is ‘safe’ requires understanding the contexts in which it is used, to whom it is sold or shared, and whether the safeguards that are implemented are appropriate and robust to mitigate those risks. Baseline model assessments can serve exploratory purposes to identify potential risks, but they cannot guarantee that the model is safe, much less ‘completely safe.’ Many of our interviewees agreed that assessments cannot prove that a model is safe and can only indicate that the model is unsafe.”

This article was originally published on : techcrunch.com

Technology

Bluesky addresses trust and safety issues related to abuse, spam and more


Social media startup Bluesky, which is building a decentralized alternative to X (formerly Twitter), provided an update Wednesday on how it’s approaching various trust and safety issues on its platform. The company is in various stages of developing and piloting a range of initiatives focused on dealing with bad actors, harassment, spam, fake accounts, video safety and more.

To address malicious users or those who harass others, Bluesky says it’s developing new tools that will be able to detect when multiple new accounts are created and managed by the same person. This could help curb harassment in cases where a bad actor creates several different personas to attack their victims.

Another new experiment will help detect “rude” replies and forward them to server moderators. Like Mastodon, Bluesky will support a network where self-hosters and other developers can run their own servers that connect to Bluesky’s server and others on the network. This federation capability is still in early access. But in the long term, server moderators will be able to decide how they want to deal with people who post rude replies. In the meantime, Bluesky will eventually reduce the visibility of those replies in its app. Repeated rude labels on content will also lead to account-level labels and suspensions, it says.

To curb the use of lists to harass others, Bluesky will remove individual users from a list if they block the list’s creator. Similar functionality was recently introduced to Starter Packs, a type of shareable list that can help new users find people to follow on the platform (check out the TechCrunch Starter Pack).

Bluesky will also scan for lists with offensive names or descriptions to limit people’s ability to harass others by adding them to a public list with a toxic or offensive name or description. Those who violate Bluesky’s Community Guidelines will be hidden in the app until the list owner makes changes that align with Bluesky’s policies. Users who continue to create offensive lists will also face further action, though the company didn’t provide details, adding that lists are still an area of active discussion and development.

In the coming months, Bluesky also intends to move to handling moderation reports through its app, using notifications rather than relying on email reports.

To combat spam and other fake accounts, Bluesky is launching a pilot that will attempt to automatically detect when an account is fake, scamming or sending spam to users. Combined with moderation, the goal is to be able to take action on accounts within “seconds of receiving a report,” the company said.

One of the more interesting developments is how Bluesky will comply with local laws while still allowing free speech. It will use geotags that allow it to hide some content from users in a particular area to comply with the law.

“This allows Bluesky’s moderation service to maintain flexibility in creating spaces for free expression while also ensuring legal compliance so that Bluesky can continue to operate as a service in these geographic regions,” the company shared in a blog post. “This feature will be rolled out on a country-by-country basis, and we will endeavor to inform users of the source of legal requests when legally possible.”

To address potential trust and safety issues with video, which was recently added, the team is adding features like the ability to disable autoplay, ensuring videos are labeled, and providing the ability to report videos. They are still evaluating what else might need to be added, which will be prioritized based on user feedback.

When it comes to abuse, the company says its general framework is “a question of how often something happens versus how harmful it is.” The company focuses on addressing high-impact, high-frequency issues, in addition to “tracking edge cases that could result in significant harm to a few users.” The latter, while affecting only a small number of people, cause enough “ongoing harm” that Bluesky will take action to prevent the abuse, it says.

Users can raise concerns via reports, emails, and mentions of the @safety.bsky.app account.

This article was originally published on : techcrunch.com

Technology

Apple AirPods Now With FDA-Approved Hearing Aid Feature


The newest AirPods are a part of a growing group of hearing aids available over-the-counter.


Apple’s latest AirPods could help those with hearing impairments. The tech company’s software update has been approved by the FDA for use as a hearing aid.

The FDA approved Apple’s hearing aid feature on September 12. The free update, available on AirPods Pro 2, will amplify sounds for the hearing impaired. However, the feature is available only to adults 18 and older with an iPhone or iPad compatible with iOS 18.

“Today’s approval of over-the-counter hearing aid software for a commonly used consumer audio product is another step that will increase the availability, affordability, and acceptability of hearing support for adults with mild to moderate hearing loss,” said Dr. Michelle Tarver, acting director of the FDA’s Center for Devices and Radiological Health, in a press release.

The agency confirmed the feature’s use after a clinical trial with 118 participants. The results showed that users “achieved similar perceived benefits to those who received a professional fit on the same device.” Apple had announced the new development just days before the agency’s approval.

“Hearing health is an essential part of our overall well-being, yet it is often overlooked — in fact, according to Apple’s Hearing Study, as many as 75 percent of people diagnosed with hearing loss go untreated,” said Sumbul Desai, MD, vice president of Health at Apple, in a press release. “We’re excited to deliver breakthrough software features in AirPods Pro that put users’ hearing health first, offering new ways to test and get help for hearing loss.”

What’s more, Apple intends its new AirPods to offer a “world-first” hearing health experience. Noting that 1.5 billion people suffer from hearing loss, the company says the device also aims to prevent and detect hearing problems.

“Your AirPods Pro will transform into your own personalized hearing aid, amplifying the specific sounds you need in real time, such as parts of speech or elements of your environment,” Desai added in a video announcing the event.

The latest AirPods are part of a growing number of over-the-counter (OTC) hearing aids. They are not only more accessible, but also significantly cheaper than prescription medical devices. While they’re designed for people with mild to moderate hearing loss, they can initially treat those with limited abilities.

AirPods Pro 2 is available now for $249.


This article was originally published on : www.blackenterprise.com

Technology

LinkedIn collected user data for training purposes before updating its terms of service


LinkedIn may have trained AI models on user data without updating its terms.

LinkedIn users in the United States (but not in the EU, EEA, or Switzerland, likely due to data privacy laws in those regions) have an opt-out toggle on their settings screen, revealing that LinkedIn collects personal data to train “AI models to create content.” The toggle isn’t new. But, as first reported by 404 Media, LinkedIn didn’t initially update its privacy policy to address the data use.

The terms of service have now been updated, but that ordinarily happens well before a big change, such as using user data for a new purpose like this. The idea is that this gives users the option to make changes to their account or leave the platform if they don’t like the changes. It looks like that’s not the case this time.

So what models does LinkedIn train? Its own, the company says in a Q&A, including models for writing suggestions and post recommendations. But LinkedIn also says that generative AI models on its platform may be trained by a “third-party vendor,” such as its corporate parent Microsoft.

“As with most features on LinkedIn, when you use our platform, we collect and use (or process) data about your use of the platform, including personal data,” the Q&A reads. “This may include your use of generative AI (AI models used to create content) or other AI features, your posts and articles, how often you use LinkedIn, your language preferences, and any feedback you may have provided to our teams. We use this data, in accordance with our privacy policy, to improve or develop the LinkedIn Services.”

LinkedIn previously told TechCrunch that it uses “privacy-enhancing techniques, including redaction and removal of information, to limit personally identifiable information contained in datasets used to train generative AI.”

To opt out of LinkedIn’s data collection, go to the “Data Privacy” section of the LinkedIn settings menu on your computer, click “Data to improve Generative AI,” and then turn off “Use my data to train AI models to create content.” You can also try a more comprehensive opt-out through this form, but LinkedIn notes that opting out will not affect training that has already taken place.

The nonprofit Open Rights Group (ORG) has asked the Information Commissioner’s Office (ICO), the UK’s independent regulator for data protection law, to investigate LinkedIn and other social networks that train on user data by default. Earlier this week, Meta announced it was resuming plans to collect user data for AI training after working with the ICO to simplify the opt-out process.

“LinkedIn is the latest social media company to process our data without asking for our consent,” Mariano delli Santi, a lawyer and policy officer at ORG, said in a press release. “The opt-out model once again proves to be completely inadequate to protect our rights: society cannot be expected to monitor and prosecute every internet company that decides to use our data to train AI. Opt-in consent is not only legally required, but also common sense.”

The Irish Data Protection Commission (DPC), the supervisory authority responsible for monitoring compliance with the GDPR, the EU’s general privacy rules, told TechCrunch that LinkedIn had said last week that clarifications to its global privacy policy would be published today.

“LinkedIn has informed us that the policy will include an opt-out setting for members who do not want their data used to train AI models that generate content,” a DPC spokesperson said. “This opt-out is not available to EU/EEA members, as LinkedIn does not currently use EU/EEA member data to train or tune these models.”

TechCrunch has reached out to LinkedIn for comment. We will update this article if we hear back.

The need for more data to train generative AI models has led more platforms to repurpose or otherwise reuse their vast troves of user-generated content. Some have even taken steps to monetize that content: Tumblr owner Automattic, Photobucket, Reddit, and Stack Overflow are among the networks licensing data to AI model developers.

Not all of them made opting out easy. When Stack Overflow announced it would begin licensing content, several users deleted their posts in protest, only to see those posts restored and their accounts suspended.

This article was originally published on : techcrunch.com