Technology

Gemini’s data analysis capabilities aren’t as good as Google claims

In this photo illustration a Gemini logo and a welcome message on Gemini website are displayed on two screens.

One of the selling points of Google’s flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, is the amount of data they can supposedly process and analyze. In press briefings and demos, Google has repeatedly claimed that the models can accomplish previously impossible tasks thanks to their “long context,” such as summarizing multiple hundred-page documents or searching across scenes in video footage.

But recent research suggests that the models aren’t actually very good at those things.

Two separate studies examined how well Google’s Gemini models and others make sense of enormous amounts of data (think works the length of “War and Peace”). Both studies find that Gemini 1.5 Pro and 1.5 Flash struggle to answer questions about large data sets correctly; in one series of document-based tests, the models gave the right answer only 40% to 50% of the time.

“While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don’t actually ‘understand’ the content,” Marzena Karpińska, a postdoc at UMass Amherst and a co-author of one of the studies, told TechCrunch.

Gemini’s context window is lacking

A model’s context, or context window, refers to the input data (e.g., text) that the model considers before generating output (e.g., additional text). A simple question such as “Who won the 2020 U.S. presidential election?” can serve as context, as can a movie script, a show, or an audio clip. As context windows grow, so does the size of the documents that fit into them.

The latest versions of Gemini can take in upward of 2 million tokens as context. (“Tokens” are subdivided bits of raw data, like the syllables “fan,” “tas,” and “tic” in the word “fantastic.”) That is roughly equivalent to 1.4 million words, two hours of video, or 22 hours of audio, the largest context of any commercially available model.
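
As a rough illustration of those figures, the following Python sketch converts a word count into an approximate token count using only the ratio quoted in this article (2 million tokens for roughly 1.4 million words). The names `estimate_tokens` and `fits_in_context` and the constants are assumptions made for illustration; real tokenizers vary by model and by text, so this is a back-of-the-envelope estimate, not a description of how Gemini counts tokens.

```python
# Ratio taken from the article's figures: ~2 million tokens ≈ ~1.4 million words.
# This is an approximation for illustration, not Gemini's actual tokenizer.
ARTICLE_TOKENS = 2_000_000
ARTICLE_WORDS = 1_400_000


def estimate_tokens(word_count: int) -> int:
    """Approximate token count for a text of `word_count` words."""
    return round(word_count * ARTICLE_TOKENS / ARTICLE_WORDS)


def fits_in_context(word_count: int, context_tokens: int) -> bool:
    """Check whether the estimated token count fits in a context window."""
    return estimate_tokens(word_count) <= context_tokens


if __name__ == "__main__":
    # A hypothetical 260,000-word book against a 1-million-token window.
    print(estimate_tokens(260_000))
    print(fits_in_context(260_000, 1_000_000))
```

By this estimate a single long novel fits comfortably in the window, which is why the question of interest is not whether the text fits but whether the model can reason over it.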

In a briefing earlier this year, Google showed off several pre-recorded demos meant to illustrate the potential of Gemini’s long-context capabilities. One had Gemini 1.5 Pro comb through the transcript of the Apollo 11 moon landing broadcast (some 402 pages) looking for quotes containing jokes, and then find a scene in the broadcast that resembled a pencil sketch.

Google DeepMind’s VP of research Oriol Vinyals, who led the briefing, called the model “magical.”

“[1.5 Pro] does these kinds of reasoning tasks on every page, on every word,” he said.

That might have been an exaggeration.

In one of the aforementioned studies benchmarking these capabilities, Karpińska and researchers from the Allen Institute for AI and Princeton asked the models to evaluate true/false statements about fiction books written in English. The researchers chose recent works so that the models couldn’t “cheat” by relying on prior knowledge, and they peppered the statements with references to specific details and plot points that would be impossible to grasp without reading the books in their entirety.

Given a statement such as “With her Apoth abilities, Nusis is able to reverse engineer a type of portal opened using the reagent key found in Rona’s wooden chest,” Gemini 1.5 Pro and 1.5 Flash, having ingested the relevant book, had to say whether the statement was true or false and explain their reasoning.

Image Credits: University of Massachusetts at Amherst

Tested on one book of about 260,000 words (~520 pages), the researchers found that 1.5 Pro answered the true/false statements correctly 46.7% of the time, while Flash answered correctly only 20% of the time. That means a coin flip is significantly better at answering questions about the book than Google’s latest machine learning model. Averaged across all benchmark results, neither model managed better than random chance in question-answering accuracy.
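
To make the coin-flip comparison concrete, the sketch below computes how far each reported accuracy sits from the 50% that random guessing would be expected to score on true/false statements. The 50% baseline assumes an even split of true and false statements, which the article does not state, and the function name `margin_over_chance` is hypothetical.

```python
# Accuracies as reported in the article for the ~260,000-word book.
REPORTED_ACCURACY = {"Gemini 1.5 Pro": 46.7, "Gemini 1.5 Flash": 20.0}

# Assumption: true and false statements are evenly balanced, so a coin
# flip would be right about half the time.
CHANCE_BASELINE = 50.0


def margin_over_chance(accuracy_pct: float, baseline_pct: float = CHANCE_BASELINE) -> float:
    """Percentage-point gap between a model's accuracy and random guessing."""
    return round(accuracy_pct - baseline_pct, 1)


if __name__ == "__main__":
    for model, accuracy in REPORTED_ACCURACY.items():
        print(f"{model}: {margin_over_chance(accuracy):+.1f} points vs. chance")
```

A negative margin means the model scored below the assumed guessing baseline on that book.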

“We have noticed that models have greater difficulty verifying claims that require considering larger sections of a book, or even the entire book, compared to claims that can be solved by taking evidence at the sentence level,” Karpińska said. “Qualitatively, we also observed that models have difficulty validating claims for implicit information that are clear to a human reader but not explicitly stated in the text.”

The second of the two studies, co-authored by researchers at UC Santa Barbara, tested the ability of Gemini 1.5 Flash (but not 1.5 Pro) to “reason” about videos, that is, to search through them and answer questions about their content.

The co-authors created a data set of images (e.g., a photo of a birthday cake) paired with questions for the model to answer about the objects depicted in the images (e.g., “What cartoon character is on this cake?”). To evaluate the models, they picked one of the images at random and inserted “distractor” images before and after it to create a slideshow-like video.

Flash didn’t perform all that well. In a test that had the model transcribe six handwritten digits from a “slideshow” of 25 images, Flash got about 50% of the transcriptions right. Accuracy dropped to about 30% with eight digits.

“For real question-answering tasks on images, this seems particularly difficult for all the models we tested,” Michael Saxon, a doctoral student at UC Santa Barbara and one of the study’s co-authors, told TechCrunch. “That little bit of reasoning, recognizing that a number is in a box and reading it, might be what breaks the model.”

Google is overpromising with Gemini

Neither study has been peer-reviewed, and neither examined the releases of Gemini 1.5 Pro and 1.5 Flash with 2-million-token contexts. (Both tested the 1-million-token versions.) Flash also isn’t meant to be as capable as Pro in terms of performance; Google advertises it as a low-cost alternative.

Still, both add fuel to the fire that Google has been overpromising, and underdelivering, with Gemini from the beginning. None of the models the researchers tested, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, performed well. But Google is the only model provider that gives the context window top billing in its advertising.

“There is nothing wrong with simply saying, ‘Our model can accept X tokens,’ based on objective technical details,” Saxon said. “But the question is: What useful thing can be done with it?”

Overall, generative AI is coming under increasing scrutiny as businesses (and investors) grow frustrated with the technology’s limitations.

In two recent Boston Consulting Group surveys, about half of the respondents, all CEOs, said they didn’t expect generative AI to deliver significant productivity gains and that they were concerned about potential errors and data breaches resulting from generative AI tools. PitchBook recently reported that early-stage generative AI deal activity has declined for two consecutive quarters, down 76% from its peak in Q3 2023.

With meeting-summarizing chatbots that conjure up fictitious details about people and AI search platforms that are essentially plagiarism generators, customers are on the hunt for promising differentiators. Google, which has raced, at times clumsily, to catch up with its generative AI rivals, desperately wanted Gemini’s context to be one of those differentiators.

But the bet, it seems, was premature.

“We haven’t figured out how to really show that ‘reasoning’ or ‘understanding’ is happening across long documents, and basically every group publishing these models is just pulling together their own ad hoc assessments to make these claims,” Karpińska said. “Without knowing how long-context processing is implemented (and the companies don’t share that detail), it’s hard to say how realistic these claims are.”

Google didn’t respond to a request for comment.

Both Saxon and Karpińska believe that the antidote to grandiose claims about generative AI is better benchmarks and, in the same vein, a greater emphasis on third-party critique. Saxon notes that one of the more common long-context tests (heavily cited by Google in its marketing materials), the “needle in a haystack,” measures only a model’s ability to retrieve specific pieces of information, such as names and numbers, from data sets, not how well it answers complex questions about that information.

“All scientists and most engineers using these models generally agree that our current benchmarking culture is broken,” Saxon said, “so it’s important that the public understands that these giant reports, with numbers like ‘general intelligence’ in ‘comparative tests,’ should be taken with an enormous pinch of salt.”

This article was originally published on: techcrunch.com

Technology

Signal is the number one app in the Netherlands. But why?

Privacy-focused messaging app Signal has been flying high in the Dutch app stores over the last month, often sitting at the top as the most downloaded free app on iOS and Android across all categories, according to data from several app-tracking platforms such as Sensor Tower.

The app has enjoyed surges in popularity over the years, often in response to policy changes at rivals such as WhatsApp or to geopolitical events. That is because Signal has built a reputation as the more privacy-friendly option: it is run by a nonprofit foundation (albeit one based in the U.S.) rather than a private company focused on monetizing data. In addition, Signal tracks minimal metadata.

In 2025, with a new U.S. president in office and Big Tech warmly embracing him, it is no surprise that digital privacy tools are having a moment, especially in Europe, which has drawn the ire of President Trump.

But this time, Signal’s prominence in one very specific place, the Netherlands, is particularly eye-catching.

Signal data from Sensor Tower. Image Credits: Sensor Tower / screenshot

In an interview with Dutch newspaper De Telegraaf last week, Signal President Meredith Whittaker noted that the number of “new registrations” in the Netherlands was up by a factor of 25 this year, though it is not clear what the exact comparative period for that figure is.

Asked why the Netherlands has seen such growth, Whittaker pointed to a combination of factors: “growing awareness of privacy, distrust of large technology and political reality in which people realize how sensitive digital communication can be.”

Data provided to TechCrunch by app intelligence company Appfigures reflects Signal’s rise in the Netherlands. According to its data, Signal ranked 365th among non-game iPhone apps in the Netherlands on January 1 and didn’t appear on the list of top overall apps. Then, from around January 5, it began to climb the rankings, reaching the top position by February 2.

Signal has dipped in and out of the lead in the weeks since, spending much of mid-February at the top, including every single day since February 22. Digging deeper into the data, Appfigures estimates that Signal’s total downloads across the Apple and Google app stores rose from their December 2024 level to 99,000 in January and to 233,000 in February, an increase of 958% over December.
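
The December figure is not given above, but the quoted percentage lets a reader back one out. The sketch below applies the standard percent-increase formula in reverse; the function names are hypothetical, and the resulting baseline is an inference from the article’s rounded numbers, not a figure reported by Appfigures.

```python
def percent_increase(old: float, new: float) -> float:
    """Percent increase from `old` to `new`."""
    return (new - old) / old * 100


def implied_baseline(new: float, increase_pct: float) -> float:
    """Starting value implied by a final value and a percent increase."""
    return new / (1 + increase_pct / 100)


if __name__ == "__main__":
    # 233,000 February downloads after a 958% rise, per the article.
    december = implied_baseline(233_000, 958)
    print(round(december))
    # Sanity check: the round trip recovers the quoted percentage.
    print(round(percent_increase(december, 233_000)))
```

Under these assumptions the implied December baseline comes out at roughly 22,000 downloads.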

While part of this rise could be attributed to Signal having lower saturation in the Netherlands than in other markets, the app’s consistent position at the top still stands out in comparison with neighboring markets of similar size.

“No other markets are approaching the Netherlands in terms of growth between December and February,” Appfigures told TechCrunch.

For comparison, downloads since December increased by more than 250% in Belgium, 153% in Sweden, and 95% in Denmark.

So why might Signal be experiencing what one Redditor called “the moment of mass adoption” in the Netherlands?

Clear signal

Rejo Zenger, senior policy advisor at Dutch digital rights foundation Bits of Freedom, said that while it is difficult to pinpoint one specific reason, he is not surprised.

Recent developments in the U.S. have seen large platform providers align themselves with the new Trump administration, and this has sparked a major public and media debate. Europe’s reliance on technology from big private American corporations has become the focal point of that debate.

“The Dutch are, like many others, very dependent on the infrastructure provided by extremely dominant technology companies, mainly from the USA,” said Zenger. “What this means, and the risk that results from it, has been nicely demonstrated in the last few weeks. As a result, the public debate in the Netherlands has been relatively sharp. Where in the past this problem was discussed only at the level [...], I feel that we are now conducting the debate at higher levels.”

In this context, the public may conflate dominance with data protection abuses. With corporations such as Meta frequently investigated and fined over their data privacy practices, Signal may seem like the lesser evil: it is based in the U.S., but it is backed by a nonprofit organization and encrypts both the content of messages and the metadata around them.

Vincent Böhre, director of Dutch privacy organization Privacy First, also pointed to increased media coverage and a broader shift in public opinion.

“Since Trump was re-elected in the United States a few months ago, there has been a lot of coverage in Dutch and European media of those who seem to support Trump, including (Elon) Musk. Articles criticizing X (previously Twitter) and Meta appear everywhere in the Dutch media, which is leading to a change in Dutch public opinion: even people who never really knew or cared about privacy and security in social media have suddenly become interested in ‘privacy-friendly’ alternatives, in particular Signal.”

Signal of intent

Signal President Meredith Whittaker at Web Summit in Lisbon on November 4, 2022. Image Credits: Patricia de Melo Moreira / AFP / Getty Images

While the Netherlands is only one market of 18 million people in a European population of more than 700 million, its surge in adoption could signal a broader trend across the continent, especially as governments try to lower privacy barriers.

For example, Apple recently pulled end-to-end encryption from iCloud in the U.K. to counter government efforts to install a backdoor.

Speaking at RightsCon 25 in Taiwan this week, Whittaker reaffirmed Signal’s unwavering stance on privacy.

“Signal’s position on this is very clear: we will not walk back, adulterate, or otherwise undermine the robust privacy and security guarantees that people rely on,” said Whittaker. “Regardless of whether that disturbance or backdoor is called client-side scanning or the removal of encryption protections from one feature or another, similar to what Apple has been forced to do in the U.K.”

Separately, in an interview with a Swedish public broadcaster, Whittaker said that Signal would not comply with a proposed Swedish law requiring messaging apps to store messages.

“In practice, this means asking us to break encryption, which is the basis of our entire business,” said Whittaker. “Asking us to store data would undermine our entire architecture, and we would never do it. We would rather leave the Swedish market entirely.”

TechCrunch has reached out to Signal for comment but had not heard back at the time of publication.

This article was originally published on: techcrunch.com

Technology

Gayle King announces participation in all-female space mission

Gayle King will join Blue Origin’s 31st civilian flight into space.


Gayle King has announced that she is going to space. The daytime talk show host shared the news on CBS Mornings.

King revealed her participation in Blue Origin’s 31st flight, NS-31. Before discussing the details of the mission, she and her co-hosts presented a video montage that described her long-standing fascination with space travel.

In one clip, King said: “I am excited to watch the launch at home in my pajamas.”

Her enthusiasm led to an invitation from Blue Origin. The television personality will take off with an all-female crew that includes award-winning journalist Lauren Sánchez, Grammy-winning singer Katy Perry, and astronaut Aisha Bowe.

The soon-to-be space explorer admitted that she was hesitant at first.

“I don’t know how to explain being terrified and excited at the same time,” said King.

To make her decision, King turned to a group of loved ones, including her children and her close friend Oprah Winfrey. She said that once her most trusted confidants approved, she was ready.

“When Kirby, Will and Oprah were fine, I was fine,” said King. “I thought Oprah would say no. She said: ‘I feel that if you don’t do it, when you all come back and you had the opportunity to do it, you’ll kick yourself.’ She is right.”

King will not be the first television host to venture into space with Blue Origin. In 2021, then-Good Morning America co-host Michael Strahan took part in Blue Origin’s third civilian flight. The former NFL star and broadcaster was elated after returning, describing how the experience gave him a new “perspective” on the world.

“I want to come back,” said Strahan.

Blue Origin, founded by Amazon billionaire Jeff Bezos in 2000, is a private aerospace company that focuses on making space travel available to civilians and developing technology for long-term space exploration.

King’s upcoming flight on New Shepard will be part of Blue Origin’s ongoing efforts to normalize civilian space travel.


This article was originally published on: www.blackenterprise.com

Technology

Instagram may spin off Reels into a separate app

Meta is considering a standalone app for short-form videos, The Information reported, citing an anonymous source who heard Instagram chief Adam Mosseri talking about the project to staff.

The project is reportedly code-named Ray and aims to improve recommendations for new and existing users in the U.S. and to show more three-minute videos, the report said, citing the source.

Meta did not immediately respond to a request for comment.

Last month, the company announced a video editing app called Edits to compete with CapCut (owned by TikTok parent company ByteDance), as it looked to capitalize on the uncertain future of TikTok and ByteDance in the U.S.

Currently, Instagram’s feed is a mix of photos, videos (Reels), and Stories. However, many users believe that the app has become cluttered because it incorporates videos instead of sticking to its roots as a photo-sharing app. If the company spins off a standalone app for short-form videos, it could create an opportunity for Instagram to emphasize other features.

At the start of this year, Instagram began paying creators to promote Instagram on other platforms, such as TikTok, Snapchat, and YouTube. It has also reportedly started offering big money to creators to post exclusively on Reels.

This article was originally published on: techcrunch.com