Technology

Gemini’s data analysis capabilities aren’t as good as Google claims

In this photo illustration a Gemini logo and a welcome message on Gemini website are displayed on two screens.

One of the strengths of Google’s flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, is the amount of data they can supposedly process and analyze. In press briefings and demos, Google has repeatedly claimed that these models can accomplish previously impossible tasks thanks to “long context,” such as summarizing multiple 100-page documents or searching across scenes in video footage.

But recent research suggests that these models actually aren’t very good at this stuff.

Two separate studies examined how well Google’s Gemini models and others make sense of enormous amounts of data, think works the length of “War and Peace.” Both studies found that Gemini 1.5 Pro and 1.5 Flash struggle to answer questions about large data sets correctly; in one series of document-based tests, the models gave the right answer only 40% and 50% of the time.

“While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don’t actually ‘understand’ the content,” Marzena Karpińska, a postdoc at UMass Amherst and co-author on one of the studies, told TechCrunch.

Gemini’s context window falls short

Model context, or context window, refers to the input data (e.g., text) that a model considers before generating output (e.g., additional text). A simple question, “Who won the 2020 US presidential election?”, can serve as context, and so can a movie script, a show, or an audio clip. As context windows grow, so does the size of the documents that fit into them.

The latest versions of Gemini can accept more than 2 million tokens as context. (“Tokens” are subdivided chunks of raw data, such as the syllables “fan,” “tas,” and “tic” in “fantastic.”) That’s roughly equivalent to 1.4 million words, two hours of video, or 22 hours of audio, the most context of any commercially available model.
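To give a feel for those magnitudes, a back-of-the-envelope token estimate can be sketched in a few lines of Python. The ~4 characters per token ratio below is a common rule of thumb, not Gemini’s actual tokenizer, and the function names are illustrative:

```python
def approx_tokens(text: str) -> int:
    """Estimate token count using the rough ~4 characters/token heuristic.
    Real tokenizers (BPE, SentencePiece) vary by model; this is only an
    approximation, not Gemini's actual tokenizer."""
    return max(1, len(text) // 4)

def fits_in_context(text: str, window: int = 2_000_000) -> bool:
    """Check whether a document would fit in a 2-million-token window."""
    return approx_tokens(text) <= window

# A stand-in for a ~260,000-word book (~5 characters per word with spaces):
novel = "word " * 260_000
print(approx_tokens(novel))      # 325000 under this heuristic
print(fits_in_context(novel))    # True
```

By this estimate, even a very long novel occupies only a fraction of a 2-million-token window, which is why Google pitches the feature around whole document collections and hours of media.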

In a briefing earlier this year, Google showed several pre-recorded demos intended to illustrate the potential of Gemini’s long-context capabilities. One had Gemini 1.5 Pro comb through the transcript of the Apollo 11 moon landing broadcast, some 402 pages, looking for quotes containing jokes, then finding a scene in the broadcast that resembled a pencil sketch.

Google DeepMind’s VP of research Oriol Vinyals, who led the briefing, called the model “magical.”

“(1.5 Pro) does these kinds of reasoning tasks on every page, on every word,” he said.

That might have been an exaggeration.

In one of the aforementioned studies probing these capabilities, Karpińska and researchers from the Allen Institute for AI and Princeton asked models to evaluate true/false statements about fiction books written in English. The researchers chose recent works so that the models couldn’t “cheat” by relying on prior knowledge, and they supplemented the statements with references to specific details and plot points that would be impossible to understand without reading the books in their entirety.

Given a statement such as “With her Apoth abilities, Nusis is able to reverse engineer a type of portal opened using the reagent key found in Rona’s wooden chest,” Gemini 1.5 Pro and 1.5 Flash — after ingesting the relevant book — had to determine whether the statement was true or false and explain their reasoning.

Image Credits: University of Massachusetts at Amherst

Tested on a single book of about 260,000 words (~520 pages), the researchers found that 1.5 Pro answered the true/false statements correctly 46.7% of the time, while Flash answered correctly only 20% of the time. That means a coin flip would answer questions about the book significantly more accurately than Google’s latest machine learning model. Averaging across all benchmark results, neither model achieved better-than-chance accuracy in question answering.

“We have noticed that models have greater difficulty verifying claims that require considering larger sections of a book, or even the entire book, compared to claims that can be solved by taking evidence at the sentence level,” Karpińska said. “Qualitatively, we also observed that models have difficulty validating claims for implicit information that are clear to a human reader but not explicitly stated in the text.”

The second of the two studies, co-authored by researchers at UC Santa Barbara, tested the ability of Gemini 1.5 Flash (but not 1.5 Pro) to “reason” about videos — that is, to search through and answer questions about their content.

The co-authors created a data set of images (e.g., a photo of a birthday cake) paired with questions for the model to answer about the objects depicted in the images (e.g., “What cartoon character is on this cake?”). To evaluate the models, they randomly selected one of the images and inserted “distraction” images before and after it to create a slideshow-like video.
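The construction they describe can be sketched roughly as follows. The function and variable names are illustrative, not taken from the study’s code:

```python
import random

def make_slideshow(target_image, distractors, total_frames=25, seed=0):
    """Hide one 'target' image among randomly chosen distraction frames,
    producing a slideshow-like sequence for a video QA test.
    Returns the frame list and the index of the target frame."""
    rng = random.Random(seed)
    frames = rng.sample(distractors, total_frames - 1)  # pick the distractors
    pos = rng.randrange(total_frames)                   # random target position
    frames.insert(pos, target_image)
    return frames, pos

frames, pos = make_slideshow(
    "birthday_cake.jpg", [f"distractor_{i}.jpg" for i in range(100)]
)
# The model is then asked, e.g., "What cartoon character is on this cake?"
# and must locate and read the single relevant frame among the distractors.
```

The point of the design is that answering correctly requires finding the one informative frame, so accuracy directly measures retrieval over a long visual context.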

Flash didn’t perform well. In a test in which the model transcribed six handwritten digits from a “slideshow” of 25 images, Flash got about 50% of the transcriptions correct. Accuracy dropped to about 30% at eight digits.

“For real question-and-answer tasks over images, this seems particularly difficult for all the models we tested,” Michael Saxon, a doctoral student at UC Santa Barbara and one of the study’s co-authors, told TechCrunch. “That little bit of reasoning — recognizing that a number is in a box and reading it — can be what breaks the model.”

Google is promising too much with Gemini

Neither study has been peer-reviewed, nor did they examine the releases of Gemini 1.5 Pro and 1.5 Flash with 2-million-token contexts. (Both tested the 1-million-token context versions.) And Flash isn’t meant to match Pro in performance; Google markets it as a low-cost alternative.

Still, both add fuel to arguments that Google has been overpromising — and underdelivering — with Gemini from the start. None of the models the researchers tested, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, performed well. But Google is the only model provider to give the context window top billing in its advertisements.

“There is nothing wrong with simply saying, ‘Our model can accept X tokens,’ based on objective technical details,” Saxon said. “But the question is: What useful thing can be done with it?”

Overall, generative AI is coming under increasing scrutiny as businesses (and investors) become increasingly frustrated with the technology’s limitations.

In two recent Boston Consulting Group surveys, about half of respondents — all CEOs — said they didn’t expect generative AI to deliver significant productivity gains and that they were concerned about potential errors and data breaches resulting from generative AI tools. PitchBook recently reported that early-stage generative AI deal activity has declined for two consecutive quarters, down 76% from its peak in Q3 2023.

With meeting recap chatbots conjuring fictitious details about people and AI search platforms that are essentially plagiarism generators, customers are looking for promising differentiators. Google, which has at times raced clumsily to catch up with its generative AI rivals, desperately wanted Gemini’s context window to be one of those differentiators.

However, it seems that bet was premature.

“We haven’t figured out how to really show that ‘reasoning’ or ‘understanding’ is happening across long documents, and basically every group publishing these models is just pulling together their own ad hoc assessments to make these claims,” Karpińska said. “Without knowing how long-context processing is implemented — and companies don’t share these details — it’s hard to say how realistic these claims are.”

Google didn’t reply to a request for comment.

Both Saxon and Karpińska believe the antidote to grandiose claims about generative AI is better benchmarks and, in the same vein, a greater emphasis on third-party criticism. Saxon notes that one of the more common long-context tests (heavily cited by Google in its marketing materials), the “needle in a haystack,” measures only a model’s ability to retrieve specific pieces of information, such as names and numbers, from data sets — not how well it answers complex questions about that information.
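The needle-in-a-haystack setup Saxon refers to is easy to reproduce in miniature, which is part of why it is a weak proxy for understanding. A minimal sketch, with illustrative names not drawn from any published harness:

```python
def build_haystack(filler_sentences, needle, depth):
    """Insert a 'needle' fact at a relative depth (0.0 = start, 1.0 = end)
    of a long filler document, as in common long-context retrieval tests."""
    idx = int(depth * len(filler_sentences))
    return " ".join(filler_sentences[:idx] + [needle] + filler_sentences[idx:])

filler = ["The sky was a pale shade of grey that morning."] * 1000
prompt = build_haystack(filler, "The secret code is 4812.", depth=0.5)
# A model "passes" if its answer to "What is the secret code?" contains 4812.
# Nothing here requires reasoning over the document as a whole, which is
# exactly the criticism: retrieval is not comprehension.
```

Sweeping `depth` across values and context lengths produces the familiar heatmaps vendors publish, but a perfect score still only demonstrates lookup, not the multi-step question answering the studies above tested.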

“All scientists and most engineers using these models generally agree that our current benchmarking culture is broken,” Saxon said, “so it’s important that the public knows to take these giant reports containing numbers like ‘general intelligence across benchmarks’ with an enormous grain of salt.”

This article was originally published on : techcrunch.com

Technology

Meta introduces community notes in the United States next week

Next week, Meta will begin one of the company’s most significant overhauls to how it surfaces fact-checking information on its platforms.

On March 18, Meta will begin rolling out a version of Community Notes across Facebook, Instagram, and Threads in the United States. The program copies the crowdsourced fact-checking system that Twitter introduced in 2021, which became the platform’s only means of correcting misleading information after Elon Musk turned the platform into X.

Meta executives say they’re focused on getting Community Notes right in the US before introducing the feature in other countries. The US is a high-stakes testing ground for a crucial new feature, given that it is Meta’s most lucrative market, but Meta may hesitate before bringing Community Notes to other regions, such as the European Union, where the European Commission is currently investigating X over the effectiveness of Community Notes.

The move may also signal Mark Zuckerberg’s desire to appease critics who have accused Meta of censoring conservative viewpoints.

Facebook users in the US will soon see Community Notes (Credit: Meta)
Clicking a note will show more information (Credit: Meta)

Zuckerberg first announced these changes in January as part of a broader effort to allow a wider range of perspectives on Meta’s platforms. Since 2016, Meta has relied on third-party fact-checking companies to verify information on its platforms, but Neil Potts, Meta’s vice president of public policy, told reporters on Wednesday that those systems were too biased, weren’t scalable enough, and made too many mistakes.

For example, Potts said Meta wrongly applied fact-check labels to an opinion article on climate change that appeared in Fox News and The Wall Street Journal. Separately, Zuckerberg recently told Joe Rogan on his podcast that Meta shouldn’t have dismissed concerns about Covid-19 vaccines as disinformation.

Meta hopes Community Notes will address public perceptions that its fact-checking is biased, make fewer mistakes, and offer a more scalable system that ultimately tackles more misinformation. However, Meta notes that the system doesn’t replace its Community Standards — the company’s rules that determine whether posts count as hate speech, fraud, or other forbidden content.

Meta’s content moderation overhaul comes at a time when many technology companies are trying to address perceived historical biases against conservatives. X led the industry’s efforts, with Elon Musk centering his social platform around “free speech.” OpenAI recently announced it was changing how it trains AI models to embrace “intellectual freedom” and said it would act so as not to censor certain viewpoints.

Rachel Lambert, Meta’s director of product management, said in Wednesday’s briefing that Meta based its new fact-checking system on X’s open source Community Notes algorithms.

Meta opened applications for contributors to its Community Notes network in February. Contributors will be able to propose fact-checks directly on posts on Facebook, Instagram, or Threads. Other contributors will then rate the note as helpful or unhelpful, which partly determines whether the note is shown to other users.

Contributors can rate the helpfulness of notes (Credit: Meta)
(Credit: Meta)

Like X’s system, Meta’s Community Notes assesses which contributors typically disagree with one another. Using that information, Meta will display a note only when contributors who often hold opposing views agree that the note is helpful.
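X’s published algorithm infers viewpoint positions via matrix factorization over contributors’ rating histories; the core “bridging” idea can be illustrated with a much simpler toy rule. This sketch is not the actual algorithm, and all names in it are illustrative:

```python
def show_note(ratings, cluster_of, threshold=0.5):
    """Toy bridging rule: display a note only if contributors from BOTH
    opposing viewpoint clusters rate it helpful at or above `threshold`.

    ratings:    contributor -> True (helpful) / False (not helpful)
    cluster_of: contributor -> "A" or "B" (opposing viewpoint groups)
    """
    by_cluster = {"A": [], "B": []}
    for user, helpful in ratings.items():
        by_cluster[cluster_of[user]].append(helpful)
    if not by_cluster["A"] or not by_cluster["B"]:
        return False  # no cross-cluster agreement is possible yet
    return all(sum(votes) / len(votes) >= threshold
               for votes in by_cluster.values())

# Helpful to both sides -> shown; helpful to one side only -> hidden.
print(show_note({"u1": True, "u2": True}, {"u1": "A", "u2": "B"}))   # True
print(show_note({"u1": True, "u2": False}, {"u1": "A", "u2": "B"}))  # False
```

The design choice is deliberate: requiring agreement across clusters filters out notes that merely reflect one side’s view, at the cost of leaving many proposed notes unpublished, a trade-off the studies cited below quantify.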

Even if most Meta contributors think a Community Note is warranted, that doesn’t mean it will be shown. In addition, Meta says it won’t demote a post or account in its algorithms even when a Community Note is displayed on the post.

Over the years, crowdsourced systems such as Community Notes have been seen as promising ways to address misinformation on social media, but they have flaws.

On the upside, researchers have found that people consider Community Notes more trustworthy than flags from third-party fact-checkers, according to a study published in the journal Science.

In another large-scale study of X’s fact-checking system, researchers at the University of Luxembourg found that Community Notes attached to posts reduced the spread of misleading posts by an average of 61%.

But many posts never get notes attached, or notes take too long to appear. Because X, and soon Meta, requires Community Notes to reach consensus among contributors with opposing viewpoints, fact-checks are often added only after a post has already reached thousands or millions of people.

The same University of Luxembourg study also showed that Community Notes may be too slow to intervene in the early, most viral stage of a post’s spread.

A recent study from the Center for Countering Digital Hate highlights the problem. Researchers examined posts containing election misinformation on X and found that contributors proposed accurate, relevant notes on those posts in 81% of cases.

However, of the posts that received proposed notes, only 9% reached consensus among contributors, meaning the vast majority of those posts never displayed any fact-check.

This article was originally published on : techcrunch.com

Technology

Tesla dealership in France set on fire by group targeting Musk

A dozen Tesla vehicles were set on fire, with eight completely destroyed.


A Tesla dealership in France was set on fire in what authorities are calling a suspected arson attack.

The fire broke out on March 2 in Plaisance-du-Touch, a suburb of Toulouse. Eight Tesla vehicles were destroyed, causing an estimated loss of 700,000 euros ($756,280), and four other Teslas were seriously damaged.

The city prosecutor’s office stated that the fire was “no accident at all.”

Investigators believe the arsonists cut a hole in the dealership’s perimeter fence before setting the fire. On March 4, an anarchist group from southern France claimed responsibility, calling the attack part of its mission to fight fascism.


“Today there is an acceleration of fascist, patriarchal, genocidal, and colonial designs. While the elites reproduce Nazi salutes, we decided to salute in our own way: the Tesla dealership, on the night of March 2-3, 2025, in Plaisance-du-Touch. We set fire to the vehicles inside the complex with two cans of gasoline,” the group said in a statement.

The group’s message encouraged others to take similar action against Tesla CEO Elon Musk and his billionaire allies. Its slogan, “Hello spring, burn Tesla!”, suggests there may be more attacks.

While the French government hasn’t condoned the attacks on Musk, officials have been critical of him recently.

“Ten years ago, if we were told that the owner of one of the largest social networks in the world would support a new international reaction and intervene directly in elections, including in Germany, who would imagine it?” French President Emmanuel Macron said.

Macron’s comments refer to Musk’s appearance at a far-right German political event, where he expressed support for a party widely described as extremist. Musk told members they shouldn’t be ashamed of the past, which includes the Holocaust, and he also appeared to endorse ethnic nationalism.

“It is good to be proud of German culture and German values, and not lose it in some multiculturalism that dilutes everything,” said Musk.

Since joining the Trump administration, Musk has drawn controversy over inflammatory statements and actions.

During the presidential inauguration, Musk was seen making a chest gesture on stage that many interpreted as a Nazi salute. The X owner denied the gesture was deliberate, but many disagreed.

Musk has also promoted the idea of the United States taking over Canada and Denmark. Citizens of both countries have spoken out loudly against what they see as hostile rhetoric.

As Musk’s influence becomes increasingly controversial, more groups have turned to direct action against him, both in the US and abroad.

This article was originally published on : www.blackenterprise.com

Technology

Uber terminates acquisition of Foodpanda Taiwan, citing regulatory obstacles

Uber Eats bike courier

Uber Technologies has terminated its acquisition of Foodpanda’s Taiwan business from Delivery Hero, the German technology company said on Tuesday.

The announcement comes about three months after Taiwan’s antitrust regulator blocked the deal, citing competition concerns. The Fair Trade Commission (FTC) said that if Uber acquired Foodpanda, its market share in Taiwan would rise to as much as 90%, potentially leading to Uber raising prices.

Uber Eats and Foodpanda are the top players in Taiwan’s food delivery market. A recent report found that Foodpanda held a 52% market share from January 2022 to August 2023, while Uber Eats had 48%. Other food delivery companies, such as foodomo and many smaller fast-food delivery apps, make up a small percentage of the Taiwanese market.

Under the agreement signed on May 14, 2024, Uber is obligated to pay a termination fee estimated at around $250 million.

Uber and Delivery Hero didn’t immediately respond to TechCrunch’s request for comment.

When Uber announced it would buy Foodpanda’s Taiwan division from Delivery Hero, it expected to close the deal in the first half of 2025. The move fit Uber Eats’ plans in Asia, particularly by strengthening its presence in Taiwan. The two companies also entered a separate agreement in which Uber agreed to purchase $300 million of newly issued ordinary shares from Delivery Hero.

The deal also underscored Delivery Hero’s continued retreat from the region. At the time, Delivery Hero was trying to sell a package of its other Southeast Asian operations, including those in Singapore, Cambodia, Laos, Malaysia, Myanmar, the Philippines, and Thailand, to an undisclosed third party. In September 2023, it ended those discussions, announcing in a statement that the decision to end negotiations was made after months of discussion and thorough consideration.

Delivery Hero’s food delivery division competes with Grab in Southeast Asia.

In September, its Foodpanda unit announced layoffs aimed at streamlining operations ahead of a potential sale. The cuts followed earlier rounds of staff dismissals in 2022 and 2023.

This article was originally published on : techcrunch.com