
Technology

Gemini’s data analysis capabilities aren’t as good as Google claims


In this photo illustration a Gemini logo and a welcome message on Gemini website are displayed on two screens.

One of the strengths of Google's flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, is the quantity of data they can supposedly process and analyze. In press briefings and demos, Google has repeatedly claimed that, thanks to their "long context," these models can accomplish previously impossible tasks, such as summarizing multiple 100-page documents or searching across scenes in video footage.

But recent research suggests that these models actually aren’t very good at this stuff.

Two separate studies examined how well Google's Gemini models and others make sense of enormous amounts of data — think the length of "War and Peace." Both studies found that Gemini 1.5 Pro and 1.5 Flash struggle to answer questions about large data sets correctly; in one set of document-based tests, the models gave the right answer only 40% and 50% of the time.


"While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don't actually 'understand' the content," Marzena Karpińska, a postdoc at UMass Amherst and co-author on one of the studies, told TechCrunch.

Gemini's context window comes up short

A model's context, or context window, refers to the input data (e.g., text) that the model considers before generating output (e.g., additional text). A simple question — "Who won the 2020 US presidential election?" — can serve as context, as can a movie script, a show, or an audio clip. As context windows grow, so does the size of the documents that fit into them.

The newest versions of Gemini can accept more than 2 million tokens as context. ("Tokens" are broken-down chunks of raw data, like the syllables "fan," "tas," and "tic" in "fantastic.") That's roughly equivalent to 1.4 million words, two hours of video, or 22 hours of audio — the largest context of any commercially available model.
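The arithmetic behind those figures can be sanity-checked with a quick sketch. This is a rough, back-of-the-envelope illustration: the ~0.7 words-per-token ratio is a common heuristic for English text, not Gemini's actual (unpublished) tokenizer behavior, and the 500-words-per-page figure is likewise an assumption.

```python
# Back-of-the-envelope check of the context-window figures cited above.
# WORDS_PER_TOKEN is a heuristic for English text, not Gemini's real ratio.

WORDS_PER_TOKEN = 0.7

def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words fit in a given token budget."""
    return int(tokens * WORDS_PER_TOKEN)

def pages_from_words(words: int, words_per_page: int = 500) -> int:
    """Estimate a page count, assuming roughly 500 words per printed page."""
    return words // words_per_page

if __name__ == "__main__":
    context = 2_000_000  # Gemini's advertised context window, in tokens
    words = tokens_to_words(context)
    print(f"{context:,} tokens ≈ {words:,} words ≈ {pages_from_words(words):,} pages")
```

At that ratio, a 2-million-token window works out to about 1.4 million words — matching the figure Google cites.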

In a briefing earlier this year, Google showed off several pre-recorded demos meant to illustrate the potential of Gemini's long-context capabilities. One had Gemini 1.5 Pro comb through the transcript of the Apollo 11 moon landing broadcast — around 402 pages — looking for quotes containing jokes, and then find a scene in the broadcast that resembled a pencil sketch.


Oriol Vinyals, VP of research at Google DeepMind, who led the briefing, called the model "magical."

“(1.5 Pro) does these kinds of reasoning tasks on every page, on every word,” he said.

That might have been an exaggeration.

In one of the aforementioned studies, Karpińska, together with researchers from the Allen Institute for AI and Princeton, asked the models to evaluate true/false statements about fiction books written in English. The researchers chose recent works so that the models couldn't "cheat" by relying on prior knowledge, and they peppered the statements with references to specific details and plot points that would be impossible to grasp without reading the books in their entirety.


Given a statement like "With her Apoth abilities, Nusis is able to reverse engineer a type of portal opened using the reagent key found in Rona's wooden chest," Gemini 1.5 Pro and 1.5 Flash — having ingested the relevant book — had to determine whether the statement was true or false and explain their reasoning.

Image Credits: University of Massachusetts at Amherst

Tested on a single book of about 260,000 words (~520 pages), the researchers found that 1.5 Pro answered the true/false statements correctly 46.7% of the time, while Flash answered correctly only 20% of the time. That means a coin is significantly better at answering questions about the book than Google's latest machine learning model. Averaging across all the benchmark results, neither model managed better-than-chance accuracy in answering questions.

"We have noticed that the models have more difficulty verifying claims that require considering larger portions of the book, or even the entire book, compared to claims that can be resolved with sentence-level evidence," Karpińska said. "Qualitatively, we also observed that the models struggle with verifying claims about implicit information that is clear to a human reader but not explicitly stated in the text."

The second of the two studies, co-authored by researchers at UC Santa Barbara, tested the ability of Gemini 1.5 Flash (but not 1.5 Pro) to "reason" about videos — that is, to search through them and answer questions about their content.


The co-authors created a dataset of images (e.g., a photo of a birthday cake) paired with questions for the model to answer about the objects depicted in them (e.g., "What cartoon character is on this cake?"). To evaluate the models, they picked one of the images at random and inserted "distractor" images before and after it to create slideshow-like footage.

Flash didn't perform well. In a test in which the model had to transcribe six handwritten digits from a "slideshow" of 25 images, Flash got around 50% of the transcriptions right. Accuracy dropped to about 30% with eight digits.

"On real question-answering tasks over images, this seems to be particularly hard for all the models we tested," Michael Saxon, a doctoral student at UC Santa Barbara and one of the study's co-authors, told TechCrunch. "That small amount of reasoning — recognizing that a number is in a box and reading it — might be what's breaking the model."

Google is promising too much with Gemini

Neither study has been peer-reviewed, and neither probed the releases of Gemini 1.5 Pro and 1.5 Flash with 2-million-token contexts. (Both tested the 1-million-token context releases.) Flash also isn't meant to match Pro in terms of performance; Google advertises it as a low-cost alternative.


Still, both add fuel to the fire that Google has been overpromising — and underdelivering — with Gemini from the beginning. None of the models the researchers tested, including OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet, performed well. But Google is the only model provider to give the context window top billing in its advertisements.

“There is nothing wrong with simply saying, ‘Our model can accept X tokens,’ based on objective technical details,” Saxon said. “But the question is: What useful thing can be done with it?”

Overall, generative AI is coming under increasing scrutiny as businesses (and investors) become increasingly frustrated with the technology's limitations.

In a pair of recent Boston Consulting Group surveys, about half of the respondents — all CEOs — said they didn't expect generative AI to deliver significant productivity gains and that they were concerned about potential errors and data breaches arising from generative AI tools. PitchBook recently reported that early-stage generative AI deal activity has declined for two consecutive quarters, down 76% from its Q3 2023 peak.


With meeting-recap chatbots conjuring up fictitious details about people and AI search platforms that are essentially plagiarism generators, customers are on the hunt for promising differentiators. Google — which has raced, sometimes clumsily, to catch up with its generative AI rivals — desperately wanted Gemini's context window to be one of those differentiators.

However, it seems the bet was premature.

"We haven't figured out how to really show that 'reasoning' or 'understanding' is happening across long documents, and basically every group releasing these models is cobbling together their own ad hoc evaluations to make these claims," Karpińska said. "Without knowing how long-context processing is implemented — and companies don't share these details — it's hard to say how realistic these claims are."

Google didn’t reply to a request for comment.


Both Saxon and Karpińska believe the antidote to the grandiose claims about generative AI is better benchmarks and, in the same vein, a greater emphasis on third-party critique. Saxon notes that one of the more common long-context tests (heavily cited by Google in its marketing materials), the "needle in a haystack," only measures a model's ability to retrieve specific pieces of information, such as names and numbers, from datasets — not how well it answers complex questions about that information.

"All scientists and most engineers using these models generally agree that our existing benchmarking culture is broken," Saxon said, "so it's important that the public understands to take these giant reports containing numbers like 'general intelligence across benchmarks' with a massive grain of salt."

This article was originally published on : techcrunch.com

Technology

The Legal Defense Fund withdraws from Meta's civil rights advisory group over DEI rollback



On April 11, the Legal Defense Fund announced that it was leaving Meta's external civil rights advisory council over concerns about the changes the technology company made to its diversity, equity, inclusion, and accessibility policies in January.

Those changes, which some perceived as Meta's capitulation to the incoming Trump administration, contributed to the organization's decision to leave the technology company's advisory council.

In January, LDF, along with several other civil rights organizations on the council, sent a letter to Meta CEO Mark Zuckerberg outlining their fears about how the changes would negatively affect users.

Advertisement

"We are shocked and disappointed that Meta did not consult with this group or its members before making these significant changes to its content policies. Meta's failure to consult even its own external civil rights advisory group of experts shows a cynical disregard for its diverse user base and undermines the commitment to free expression that Meta claims to be 'returning' to."

They closed the letter by expressing hope that Meta would recommit to the ideals of free expression: "If Meta truly wants to champion freedom of expression, it must commit to freedom of expression across all of its services. As its external civil rights advisory group, we offer our advice and expertise in charting a better path forward."

Those fears only grew in the following months, culminating in another letter, this one from LDF's Todd A. Cox, indicating that the organization was withdrawing its membership from Meta's civil rights advisory council.

"I am deeply disturbed and disappointed by Meta's January 7, 2025, announcement of irresponsible changes to its content moderation policies across its platforms, which pose serious risks to the health and safety of Black communities and risk destabilizing our republic," Cox wrote.


He continued: "For nearly a decade, the NAACP Legal Defense and Educational Fund, Inc. (LDF) has invested significant time and resources in working with Meta as part of an informal committee advising the company on civil rights matters. However, Meta introduced these content moderation policy changes without consulting this group, and many of the changes directly conflict with guidance from LDF and its partners. LDF can no longer participate in Meta's civil rights advisory committee."

In a separate but related letter, LDF reminded Meta of its ongoing obligations under the Civil Rights Act of 1964 and other laws governing workplace discrimination, in contrast to the Trump administration's false claims that diversity, equity, and inclusion initiatives discriminate against white Americans.

"While Meta has changed its policies, its obligations under federal civil rights laws remain unchanged. Title VII of the Civil Rights Act of 1964 and other civil rights laws prohibit discrimination in the workplace, including disparate treatment, workplace policies that have an unjustified disparate impact, and hostile work environments, including with respect to diversity, equity, inclusion, and accessibility programs."

In an LDF press release announcing both letters, Cox called attention to Meta's contribution to the growing violence and division in the country's social climate.


"LDF worked hard and in good faith with Meta's leadership and its civil rights consulting group to ensure that the company's workforce reflects the values and racial makeup of the United States, and to prioritize the safety of the many diverse communities that use Meta's platforms," said Cox. "We cannot in good conscience continue to support a company that knowingly takes steps to introduce policy changes that fuel further division and violence in the United States. We call on Meta to reverse course on these dangerous changes."


This article was originally published on : www.blackenterprise.com

Technology

Young, talented, and Black Yale students raise $3 million for a new app



Nathaneo Johnson and Sean Hargrow, juniors at Yale University, raised $3 million in just 14 days to fund their startup, Series, an AI-powered social app designed to foster meaningful connections and challenge platforms like LinkedIn and Instagram.

The duo, who co-host the podcast Founders Series, created the app after recognizing a gap in how digital platforms help people connect. Series focuses on facilitating authentic introductions rather than accumulating likes, follows, or engagement metrics.

"Social media is great for broadcasting, but it does not necessarily help you meet the right people at the right time," Johnson said in an interview with Entrepreneur magazine.


Series connects users through AI "friends" that communicate via iMessage and broker introductions. Users state specific needs — whether they're looking for co-founders, mentors, collaborators, or investors — and the AI facilitates introductions based on mutual value. The concept draws comparisons to LinkedIn, but with a more personal experience.

"You post photos on Instagram, post videos on TikTok, and post work updates on LinkedIn … and that's where you have this micro-influencer band," Johnson added.

The app aims to avoid the superficial character of typical social platforms. Hargrow emphasized that while aesthetics often dominate Instagram and viral content drives TikTok, Series is about intentional, deliberate connections.

"We are not trying to replace real-world relationships — we are trying to make it easier for people to find the right ones," said Hargrow.


Parable led the pre-seed funding round, which included participation from Pear VC, DGB VC, 47th Street, Radicle Impact, Uncommon Projects, and several notable angel investors, including Reddit CEO Steve Huffman and GPTZero founder Edward Tian. Johnson called one investor meeting "the million-dollar dinner," reflecting how well their pitch resonated with early supporters.

Though not business majors, Johnson and Hargrow built an entrepreneurial foundation through their podcast, on which they interview founders and C-suite leaders about the lesser-known elements of building a company, such as accounting, business law, and team formation.

Since launch, more than 32,000 messages between "friends" have been exchanged during the app's test phases. The app initially targets the entrepreneur market, but the founders hope to expand into finance, dating, education, and health — ultimately aiming to build the most accessible warm network in the world.



This article was originally published on : www.blackenterprise.com

Technology

Listings of used Teslas surged in March


Tesla cars sit in a dealership lot

A growing number of Tesla owners are putting their used vehicles up for sale as consumers react to Elon Musk's political activities and the global protests they have driven.

In March, the number of used Teslas listed for sale on Autotrader.com spiked, Sherwood News reported, citing data from Autotrader's parent company, Cox Automotive. The numbers were particularly high in the last week of March, when more than 13,000 used Teslas were listed on average. That wasn't just a record — it was a 67% increase over the same week a year earlier.

At the same time, sales of new Tesla vehicles have slowed even as EV sales from other brands rise. In the first quarter of 2025, nearly 300,000 new EVs were sold in the U.S., according to the latest Kelley Blue Book report, a 10.6% increase year over year. Meanwhile, Tesla's first-quarter sales fell nearly 9% compared to the same period in 2024.


Automakers such as GM and Hyundai still trail Tesla, but they are seeing strong growth. GM brands, for example, sold over 30,000 EVs in the first quarter, nearly double the volume of a year ago, according to Kelley Blue Book.


This article was originally published on : techcrunch.com
