Technology

Apple says it took a ‘responsible’ approach to training its Apple Intelligence models

Published

10 months ago

July 30, 2024

IAM

Apple has published a technical paper detailing the models it developed to power Apple Intelligence, the range of generative AI features coming to iOS, macOS, and iPadOS over the next few months.

In the paper, Apple pushes back against accusations that it took an ethically questionable approach to training some of its models, reiterating that it didn’t use private user data and instead drew on a combination of publicly available and licensed data for Apple Intelligence.

“[The] pre-training dataset consists of… data we have licensed from publishers, curated publicly available or open datasets, and publicly available information crawled by our web crawler, Applebot,” Apple writes in the paper. “Given our focus on protecting user privacy, we note that no private Apple user data is included in the data mix.”

In July, Proof News reported that Apple used a dataset called The Pile, which contains captions from hundreds of thousands of YouTube videos, to train a family of models designed for on-device processing. Many YouTube creators whose captions were swept up in The Pile were unaware of this and didn’t consent to it; Apple later issued a statement saying it had no intention of using those models to power any AI features in its products.

The technical paper, which offers a closer look at the models Apple first unveiled at WWDC 2024 in June, called Apple Foundation Models (AFM), emphasizes that the training data for the AFM models was sourced in a “responsible” way, or at least responsibly by Apple’s definition.

The training data for the AFM models includes publicly available web data as well as licensed data from undisclosed publishers. According to The New York Times, Apple approached several publishers in late 2023, including NBC, Condé Nast, and IAC, about multi-year deals worth at least $50 million to train models on the publishers’ news archives. Apple’s AFM models were also trained on open-source code hosted on GitHub, specifically Swift, Python, C, Objective-C, C++, JavaScript, Java, and Go code.

Training models on code without permission, even open-source code, is a point of contention among developers. Some developers have argued that certain open-source codebases are unlicensed or don’t allow AI training in their terms of use. However, Apple says it “license-filtered” the code to try to include only repositories with minimal usage restrictions, such as those under the MIT, ISC, or Apache licenses.
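
As a rough illustration of what license filtering can look like, the sketch below keeps only repositories whose declared license is on a permissive allowlist. It is not Apple’s pipeline: the repository records, the field names, and the `filter_permissive` helper are hypothetical, and the license identifiers are SPDX-style names chosen for the example.

```python
# Hypothetical sketch of license filtering; not Apple's actual pipeline.
# The allowlist mirrors the licenses named in the article (MIT, ISC, Apache).
PERMISSIVE_LICENSES = {"MIT", "ISC", "Apache-2.0"}


def filter_permissive(repos):
    """Keep only repositories whose declared license is on the allowlist.

    Repositories with no declared license (None) are dropped, since
    unlicensed code grants no usage rights by default.
    """
    return [r for r in repos if r.get("license") in PERMISSIVE_LICENSES]


# Made-up repository metadata, for illustration only.
repos = [
    {"name": "repo-a", "license": "MIT"},
    {"name": "repo-b", "license": "GPL-3.0"},
    {"name": "repo-c", "license": None},
    {"name": "repo-d", "license": "Apache-2.0"},
]

kept = filter_permissive(repos)
print([r["name"] for r in kept])
```

A filter like this only sees the license a repository declares; it cannot tell whether individual files inside the repository carry different terms.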

To boost the AFM models’ mathematical skills, Apple specifically included math questions and answers from websites, math forums, blogs, tutorials, and seminars in the training set, according to the paper. The company also used “high-quality, publicly available” datasets (which the paper doesn’t name) with “licenses that allow use to train… models,” filtered to remove sensitive information.

In total, the training dataset for the AFM models weighs in at about 6.3 trillion tokens. (Tokens are small pieces of data that are typically easier for generative AI models to ingest.) By comparison, that’s less than half the number of tokens, 15 trillion, that Meta used to train its flagship text-generating model, Llama 3.1 405B.

Apple sourced additional data, including human feedback data and synthetic data, to fine-tune the AFM models and attempt to mitigate undesirable behaviors, such as producing toxic output.

“Our models are designed to help users perform everyday tasks on Apple products in a way that’s grounded in Apple’s core values and rooted in our principles of responsible AI at every stage,” the company said.

There is no smoking gun or shocking insight in the paper, and that’s by careful design. Papers like this are rarely very revealing, owing to competitive pressures but also because disclosing too much could land companies in legal trouble.

Some companies that train models by scraping public web data claim that the practice is protected by the fair use doctrine. But the question is highly contested and the subject of a growing number of lawsuits.

Apple notes in the paper that it allows webmasters to block its crawler from scraping their data. But that puts individual creators in a difficult position. What’s an artist to do if, for instance, their portfolio is hosted on a site that refuses to block Apple from scraping their data?
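
Crawler blocking of this kind is conventionally expressed in a site’s robots.txt file. The sketch below uses Python’s standard `urllib.robotparser` to show how a rule aimed at the Applebot user agent would be evaluated. The robots.txt content, the `can_crawl` helper, the `OtherBot` agent, and the example.com URL are assumptions made for illustration, and the sketch says nothing about how Apple’s crawler itself interprets such rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: it blocks Applebot
# everywhere and leaves every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Applebot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/portfolio/"  # placeholder URL
print("Applebot allowed:", can_crawl(ROBOTS_TXT, "Applebot", url))
print("OtherBot allowed:", can_crawl(ROBOTS_TXT, "OtherBot", url))
```

The file is controlled by whoever runs the site, which is the crux of the problem for creators: an artist whose work sits on someone else’s platform cannot add this rule themselves.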

Court battles will decide the fate of generative AI models and the way they’re trained. For now, though, Apple is trying to position itself as an ethical player while avoiding unwanted legal scrutiny.

This article was originally published on : techcrunch.com

Technology

Google’s latest Gemma AI model can run on phones

Published

1 day ago

May 20, 2025

IAM

Google’s family of “open” AI models, Gemma, is growing.

During Google I/O 2025 on Tuesday, Google unveiled Gemma 3n, a model designed to run “smoothly” on phones, laptops, and tablets. Available in preview starting Tuesday, Gemma 3n can handle audio, text, images, and video, according to Google.

Models efficient enough to run offline, without relying on cloud computing, have gained popularity in the AI community in recent years. They are not only cheaper to use than large models, but they also preserve privacy by eliminating the need to send data to a remote data center.

Speaking at I/O, Gemma product manager Gus Martins said that Gemma 3n can run on devices with less than 2 GB of RAM. “Gemma 3n shares the same architecture as Gemini Nano, and is also designed for incredible performance,” he added.

In addition to Gemma 3n, Google is releasing MedGemma through its Health AI Developer Foundations program. According to Google, MedGemma is its most capable open model for analyzing health-related text and images.

“MedGemma [is] our […] collection of open models for multimodal [health] text and image understanding,” said Martins. “MedGemma works great across a range of imaging and text applications, so that developers […] can adapt the models to their own health applications.”

Also on the horizon is SignGemma, an open model for translating sign language into spoken-language text. Google says SignGemma will enable developers to create new apps and integrations for deaf and hard-of-hearing users.

“SignGemma is a new family of models trained to translate sign language into spoken-language text, but it’s best at American Sign Language and English,” said Martins. “It’s the most capable sign language understanding model ever, and we’re looking forward to you, developers and deaf and hard-of-hearing communities, taking this foundation and building with it.”

It is worth noting that Gemma has been criticized for its custom, non-standard license terms, which some developers say make using the models commercially a risky proposition. However, that hasn’t discouraged developers from downloading Gemma models tens of millions of times.

This article was originally published on : techcrunch.com

Technology

Trump to sign a bill criminalizing revenge porn and explicit deepfakes

Published

2 days ago

May 19, 2025

IAM

President Donald Trump is expected to sign the Take It Down Act, a bipartisan law that introduces stricter penalties for distributing explicit images, including deepfakes and revenge porn.

The Act criminalizes the publication of such images, regardless of whether they are authentic or AI-generated. Anyone who publishes the photos or videos can face penalties, including a fine, imprisonment, and restitution.

Under the new law, social media companies and online platforms must remove such material within 48 hours of notice from the victim. Platforms must also take steps to remove duplicate content.

Many states have already banned sexually explicit deepfakes and revenge porn, but this is the first time federal regulators will step in to impose restrictions on internet companies.

First lady Melania Trump lobbied for the law, which was sponsored by Sens. Ted Cruz (R-Texas) and Amy Klobuchar (D-Minn.). Cruz said he was inspired to act after hearing that Snapchat had refused for nearly a year to remove a deepfake of a 14-year-old girl.

Free speech advocates and digital rights groups have raised concerns, saying that the law is too broad and could lead to the censorship of legitimate images, such as legal pornography, as well as of government critics.

This article was originally published on : techcrunch.com

Technology

Microsoft’s Satya Nadella is choosing chatbots over podcasts

Published

4 days ago

May 18, 2025

IAM

While Microsoft CEO Satya Nadella says he likes podcasts, he may not actually be listening to them anymore.

That tidbit comes at the end of a longer Bloomberg profile of Nadella, which focuses on Microsoft’s artificial intelligence strategy and its complicated relationship with OpenAI. To illustrate how much he uses the Copilot AI assistant in his daily life, Nadella said that instead of listening to podcasts, he now uploads transcripts to Copilot and then talks with Copilot about the content while driving to the office.

In addition, Nadella, who jokingly described his job as being an “email typist,” said that he relies on at least 10 custom agents developed in Copilot Studio to summarize emails and messages, prepare for meetings, and perform other office tasks.

It seems that AI is already transforming Microsoft in more significant ways, with programmers reportedly hit hardest in the company’s latest layoffs, which came shortly after Nadella said that 30% of the company’s code was written by AI.

This article was originally published on : techcrunch.com