Technology

Gemini’s data analysis capabilities aren’t as good as Google claims

Published

June 29, 2024

IAM

One of the strengths of Google’s flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, is the amount of data they can supposedly process and analyze. In press briefings and demos, Google has repeatedly claimed that the models can accomplish previously impossible tasks thanks to their “long context,” such as summarizing multiple hundred-page documents or searching across scenes in video footage.

But recent research suggests that these models actually aren’t very good at this stuff.

Two separate studies examined how well Google’s Gemini models and others make sense of large amounts of data (think the length of “War and Peace”). Both studies find that Gemini 1.5 Pro and 1.5 Flash struggle to answer questions about large data sets correctly; in one series of document-based tests, the models gave the right answer only 40% to 50% of the time.

“While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don’t actually ‘understand’ the content,” Marzena Karpińska, a postdoc at UMass Amherst and a co-author on one of the studies, told TechCrunch.

Gemini’s context window falls short

A model’s context, or context window, refers to the input data (e.g., text) that the model considers before generating output (e.g., additional text). A simple question such as “Who won the 2020 U.S. presidential election?” can serve as context, as can a movie script, a show, or an audio clip. As context windows grow, so does the size of the documents that fit into them.

The latest versions of Gemini can take in upward of 2 million tokens as context. (“Tokens” are subdivided bits of raw data, like the syllables “fan,” “tas,” and “tic” in the word “fantastic.”) That’s roughly equivalent to 1.4 million words, two hours of video, or 22 hours of audio, the largest context of any commercially available model.
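
The figures above imply a rough words-per-token ratio. The sketch below only restates that arithmetic; it is not a tokenizer, real token counts vary with language and content, and the helper name `estimate_tokens` is made up for illustration.

```python
# Figures quoted in the article: about 2 million tokens ~ 1.4 million words.
CONTEXT_TOKENS = 2_000_000
CONTEXT_WORDS = 1_400_000

# Implied average ratio (an approximation, not a property of any tokenizer).
words_per_token = CONTEXT_WORDS / CONTEXT_TOKENS


def estimate_tokens(word_count: int) -> int:
    """Estimate the token count of a text with `word_count` words (hypothetical helper)."""
    return round(word_count / words_per_token)


def fits_in_context(word_count: int, context_tokens: int = CONTEXT_TOKENS) -> bool:
    """Check whether a text of that length would fit in the given context window."""
    return estimate_tokens(word_count) <= context_tokens


if __name__ == "__main__":
    print(words_per_token)
    print(estimate_tokens(100_000), fits_in_context(100_000))
```

Under this ratio, a 100,000-word document would take up roughly 143,000 tokens, a small fraction of the advertised window.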

In a briefing earlier this year, Google showed off several pre-recorded demos meant to illustrate the potential of Gemini’s long-context capabilities. One had Gemini 1.5 Pro search the transcript of the Apollo 11 moon landing telecast (some 402 pages) for quotes containing jokes, and then find a scene in the telecast that looked similar to a pencil sketch.

Google DeepMind’s VP of research Oriol Vinyals, who led the briefing, called the model “magical.”

“(1.5 Pro) does these kinds of reasoning tasks on every page, on every word,” he said.

That may have been an exaggeration.

In one of the aforementioned studies benchmarking these capabilities, Karpińska and researchers from the Allen Institute for AI and Princeton asked the models to evaluate true/false statements about fiction books written in English. The researchers chose recent works so that the models couldn’t “cheat” by relying on prior knowledge, and they laced the statements with references to specific details and plot points that would be impossible to understand without reading the books in their entirety.

Given a statement such as “With her Apoth abilities, Nusis is able to reverse engineer a type of portal opened using the reagent key found in Rona’s wooden chest,” Gemini 1.5 Pro and 1.5 Flash, having ingested the relevant book, had to determine whether the statement was true or false and explain their reasoning.

Image Credits: University of Massachusetts at Amherst

Tested on one book of about 260,000 words (~520 pages), the researchers found that 1.5 Pro answered the true/false statements correctly 46.7% of the time, while Flash answered correctly only 20% of the time. That means a coin is significantly better at answering questions about the book than Google’s latest machine learning model. Averaging all the benchmark results, neither model managed to do better than random chance in terms of question-answering accuracy.
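
The coin-flip comparison can be made explicit. The sketch below takes the two accuracy figures reported above and subtracts a 50% chance baseline; that baseline assumes an even mix of true and false statements, which is an assumption of this illustration rather than a detail given in the article.

```python
# Expected accuracy of a coin flip on true/false statements,
# assuming true and false statements are equally common (an assumption).
CHANCE_ACCURACY = 0.5

# Accuracy figures reported in the article for the ~260,000-word book.
reported_accuracy = {
    "Gemini 1.5 Pro": 0.467,
    "Gemini 1.5 Flash": 0.20,
}


def margin_over_chance(accuracy: float, chance: float = CHANCE_ACCURACY) -> float:
    """Return how far an accuracy sits above (+) or below (-) the chance baseline."""
    return round(accuracy - chance, 3)


margins = {model: margin_over_chance(acc) for model, acc in reported_accuracy.items()}

if __name__ == "__main__":
    for model, margin in margins.items():
        verdict = "above" if margin > 0 else "at or below"
        print(f"{model}: {margin:+.3f} ({verdict} chance)")
```

Both margins come out negative, which is what the coin comparison is pointing at.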

“We have noticed that models have greater difficulty verifying claims that require considering larger sections of a book, or even the entire book, compared to claims that can be solved by taking evidence at the sentence level,” Karpińska said. “Qualitatively, we also observed that models have difficulty validating claims for implicit information that are clear to a human reader but not explicitly stated in the text.”

The second of the two studies, co-authored by researchers at UC Santa Barbara, tested the ability of Gemini 1.5 Flash (but not 1.5 Pro) to “reason” over videos, that is, to search through them and answer questions about their content.

The co-authors created a data set of images (e.g., a photo of a birthday cake) paired with questions for the model to answer about the objects depicted in the images (e.g., “What cartoon character is on this cake?”). To evaluate the models, they picked one of the images at random and inserted “distractor” images before and after it to create slideshow-like footage.
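
The slideshow construction described above can be sketched in a few lines. This is an illustration of the general idea, not the researchers’ code: the function name `build_slideshow`, the file names, and the use of plain strings in place of images are all hypothetical.

```python
import random


def build_slideshow(target, distractors, length, rng):
    """Hide `target` among distractor frames (hypothetical helper).

    Returns (frames, target_index). The target is never placed first or last,
    so distractors always appear both before and after it.
    """
    if length < 3:
        raise ValueError("length must be at least 3")
    if len(distractors) < length - 1:
        raise ValueError("not enough distractors for the requested length")
    frames = rng.sample(distractors, length - 1)  # distinct distractor frames
    position = rng.randrange(1, length - 1)       # strictly inside the sequence
    frames.insert(position, target)
    return frames, position


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is repeatable
    pool = [f"distractor_{i}.png" for i in range(40)]
    frames, index = build_slideshow("birthday_cake.png", pool, 25, rng)
    print(len(frames), index, frames[index])
```

A model under test would then be shown the frames in order and asked the question paired with the target image.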

Flash didn’t perform all that well. In a test in which the model transcribed six handwritten digits from a “slideshow” of 25 images, Flash got about 50% of the transcriptions right. Accuracy dropped to about 30% with eight digits.

“For real question-and-answer tasks in images, this seems particularly difficult for all the models we tested,” Michael Saxon, a doctoral student at UC Santa Barbara and one of the study’s co-authors, told TechCrunch. “That little bit of reasoning, recognizing that a number is in a box and reading it, can be what breaks the model.”

Google is overpromising with Gemini

Neither study has been peer-reviewed, nor did either examine the releases of Gemini 1.5 Pro and 1.5 Flash with 2-million-token contexts. (Both tested the 1-million-token context versions.) And Flash isn’t meant to be as capable as Pro in terms of performance; Google advertises it as a low-cost alternative.

Still, both add fuel to the fire that Google has been overpromising, and underdelivering, with Gemini from the beginning. None of the models the researchers tested, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, performed well. But Google is the only model provider that has given the context window top billing in its advertisements.

“There is nothing wrong with simply saying, ‘Our model can accept X tokens,’ based on objective technical details,” Saxon said. “But the question is: What useful thing can be done with it?”

Generative AI, broadly speaking, is coming under increased scrutiny as businesses (and investors) grow frustrated with the technology’s limitations.

In two recent Boston Consulting Group surveys, about half of respondents, all CEOs, said they didn’t expect generative AI to deliver significant productivity gains and that they were concerned about potential errors and data breaches arising from generative AI tools. PitchBook recently reported that early-stage generative AI dealmaking has declined for two consecutive quarters, down 76% from its peak in Q3 2023.

With meeting-summarizing chatbots conjuring up fictitious details about people and AI search platforms that are essentially plagiarism generators, customers are on the hunt for promising differentiators. Google, which has raced, at times clumsily, to catch up with its rivals in generative AI, was desperate to make Gemini’s context one of those differentiators.

But the bet, it seems, was premature.

“We haven’t figured out how to really show that ‘reasoning’ or ‘understanding’ is happening across long documents, and basically every group publishing these models is just pulling together their own ad hoc assessments to make these claims,” Karpińska said. “Without knowing how long the context processing is happening—and the companies don’t share that detail—it’s hard to say how realistic these claims are.”

Google didn’t respond to a request for comment.

Both Saxon and Karpińska believe that the antidote to hyped-up claims about generative AI is better benchmarks and, in the same vein, greater emphasis on third-party critique. Saxon notes that one of the more common long-context tests (heavily cited by Google in its marketing materials), “needle in the haystack,” measures only a model’s ability to retrieve specific pieces of information, such as names and numbers, from datasets, not how well it answers complex questions about that information.
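
To make the limitation concrete, here is a minimal sketch of how a needle-in-the-haystack item is typically put together: one fact is planted in a long run of filler text, and the score only checks whether the planted string comes back. The helper names and the sample sentences are hypothetical, and real benchmarks differ in their details.

```python
def build_haystack(filler, needle, depth):
    """Insert `needle` among filler sentences at a relative depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = round(depth * len(filler))
    return " ".join(filler[:index] + [needle] + filler[index:])


def retrieval_score(answer, expected):
    """Score 1.0 if the expected string appears in the answer, else 0.0.

    Note what this does not measure: whether the model can reason about
    the retrieved fact, only whether it can repeat it.
    """
    return 1.0 if expected.lower() in answer.lower() else 0.0


if __name__ == "__main__":
    filler = ["The sky was grey that morning."] * 1000
    needle = "The access code is 7431."
    haystack = build_haystack(filler, needle, depth=0.5)
    question = "What is the access code?"
    # A model would receive `haystack` plus `question`; here we score two made-up answers.
    print(retrieval_score("The access code is 7431.", "7431"))
    print(retrieval_score("I could not find it.", "7431"))
```

A model can score perfectly on items like this while still failing at questions that require combining information spread across the whole text.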

“All scientists and most engineers using these models generally agree that our current benchmarking culture is broken,” Saxon said, “so it’s important that the public understands to take these giant reports containing numbers like ‘general intelligence across benchmarks’ with an enormous pinch of salt.”

This article was originally published on : techcrunch.com

Technology

Former President Barack Obama weighs human touch vs. AI for coding

Published

April 27, 2025

IAM

Former President Barack Obama spoke about the future of human jobs as he sees artificial intelligence (AI) surpassing people’s coding efforts, according to reports.

Speaking in the Sacerdote Great Names series at Hamilton College in Clinton, New York, the former U.S. president discussed how many roles could be eliminated, and no longer be necessary, because of the effectiveness of AI, claiming that the software codes better than 60% to 70% of human programmers.

“Already the current models of artificial intelligence, not necessarily the ones you buy or just get through retail ChatGPT, but the more advanced models that are now available to companies, can code better than, let’s call it, 60%, 70% of programmers now,” the former president told Hamilton President Steven Tepper.

“We are talking about highly skilled jobs that pay really good salaries and that until recently were entirely a seller’s market in Silicon Valley. Many of those jobs will disappear. The best programmers will be able to use these tools to expand what they’re already doing, but for many routine things, you’ll simply not need a coder, because the computer or machine will do it itself.”

Obama isn’t the only celebrity who is slowly but surely emphasizing the importance of AI. Through the Coramino Fund, an investment partnership between comedian Kevin Hart and Juan Domingo Beckmann’s Gran Coramino Tequila, entrepreneurs and small businesses from underserved communities were encouraged to apply for a $10,000 grant program. While applications for the first round closed on April 23, 50 businesses will receive not only capital to scale but also “the latest AI technological training and practical learning of responsible and effective inclusion in their operations.”

Hart says business owners must seize both the opportunity and the education.

“The train is coming and fast,” he said. “Either you are on it or if not, get off the road.”

Data and research also support Hart’s and Obama’s points of view, and people of color may be the most affected as AI becomes more prevalent in the workplace. After reviewing U.S. Census data, researchers from the Julian Samora Institute at Michigan State University found that Latino-owned firms reported almost 9% AI adoption, and Asian-owned firms about 11%. Almost 78% of white-owned firms reported high technology use.

Black-owned firms came in last, with the lowest use of artificial intelligence across the board in 2023; fewer than 2% of those firms reported “high use.”

A report by researchers at the University of California, Los Angeles (UCLA) found that Latinx workers are at risk of losing their jobs because of automation and the growing use of technology that performs repetitive tasks without human involvement.

Data from the McKinsey Institute for Economic Mobility indicate that the AI divide could widen the racial wealth gap by $43 million a year.

This article was originally published on : www.blackenterprise.com

Technology

Musk’s xAI Holdings is reportedly raising the second-largest private funding round

Published

April 26, 2025

IAM

Elon Musk’s xAI Holdings is in talks to raise $20 billion in fresh funding, potentially valuing the AI and social media combination at over $120 billion, according to a new Bloomberg report that says the talks are in the “early stages.” If successful, the deal would be the second-largest startup funding round in history, behind only OpenAI’s $40 billion raise last month.

The funding could help ease X’s significant debt burden, which costs the company $200 million per month, according to Bloomberg’s sources, with annual interest costs exceeding $1.3 billion as of the end of last year.

A raise of this size would also demonstrate investors’ continued appetite for AI, and it likewise reflects Musk’s surprising emergence as a political power player in President Trump’s White House.

Musk will likely draw on some of the same backers who have consistently funded his ventures, from Tesla to SpaceX, including Antonio Gracias of Valor Equity Partners and Luke Nosek of Gigafund. Gracias has even taken on the role of a lieutenant in Musk’s Department of Government Efficiency.

xAI did not immediately respond to a request for comment.

This article was originally published on : techcrunch.com

Technology

Leap HEA launches a new mobile application, giving homeowners better access to equity

Published

April 25, 2025

IAM

Fintech real estate investment company Leap Analytics, also known as Leap HEA, has announced the launch of a new and innovative mobile application designed to revolutionize how homeowners access and manage their home equity.

The application allows users to apply for three different types of home equity agreements (HEAs) directly on their phone and provides a wealth of comprehensive housing resources. Leap’s CEO and founder, Ashley Bete, says the new application helps homeowners make smarter financial decisions.

“Our new mobile application revolutionizes how homeowners access and use their home equity,” said Bete. “By offering three types of HEA at your fingertips, together with a suite of home-related tools, we empower homeowners to make well-informed financial decisions while unlocking the equity potential of their most valuable asset.”

In addition to putting HEAs, both 10-year and 30-year agreements, at users’ fingertips, the application’s features supporting the homeowner journey include access to a financial library, financial analyses, and tools such as an improvement simulator.

While the application aims to address significant problems in the housing market, such as the effects of redlining and gentrification, Bete said it also aligns with the company’s mission of educating homeowners about the changing real estate industry while providing tools for tapping home equity, reducing debt, and restoring financial confidence. “The Leap application is a significant step forward in Leap’s mission to close the wealth and housing gaps while promoting financial health through innovative household solutions,” Bete said.

The mission is also consistent with findings on how American homeowners have been locked out of billions of dollars in their own equity without even knowing it. A recent study by home equity investment company Point showed that homeowners risk being locked out of $731 billion in their own equity, which many rely on, because of drops in credit scores resulting from job loss.

In 2024, total U.S. home equity reached $34.7 trillion, an increase of 80% since 2020. However, a large part of this housing wealth remains “locked.”

Leap applicant Juune Lucero of California said he would “recommend Leap,” describing the company’s home equity agreements as an excellent alternative to expensive options.

“They helped me and my family to improve our personal finances,” said Lucero. Chief Technology Officer Munashe Shumba shared similar sentiments, adding that the application “helps property owners intelligently manage homes and increase their value” with data-driven recommendations for “necessary services.”

Download the LEAP mobile application on iOS and Android platforms.

This article was originally published on : www.blackenterprise.com
