Technology

Gemini’s data analysis capabilities aren’t as good as Google claims

In this photo illustration, a Gemini logo and a welcome message on the Gemini website are displayed on two screens.

One of the selling points of Google’s flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, is the amount of data they can supposedly process and analyze. In press briefings and demos, Google has repeatedly claimed that the models can accomplish previously impossible tasks thanks to their “long context,” such as summarizing multiple hundred-page documents or searching across scenes in video footage.

But recent research suggests that these models actually aren’t very good at this stuff.

Two separate studies examined how well Google’s Gemini models and others make sense of enormous amounts of data (think works as long as “War and Peace”). Both studies find that Gemini 1.5 Pro and 1.5 Flash struggle to answer questions about large datasets correctly; in one series of document-based tests, the models gave the right answer only 40% to 50% of the time.

“While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don’t actually ‘understand’ the content,” Marzena Karpińska, a postdoc at UMass Amherst and a co-author on one of the studies, told TechCrunch.

Gemini’s context window is lacking

A model’s context, or context window, refers to the input data (e.g., text) that the model considers before generating output (e.g., additional text). A simple question such as “Who won the 2020 US presidential election?” can serve as context, as can a movie script, a show, or an audio clip. And as context windows grow, so does the size of the documents that can fit into them.

The latest versions of Gemini can take in more than 2 million tokens as context. (“Tokens” are subdivided bits of raw data, such as the syllables “fan,” “tas,” and “tic” in the word “fantastic.”) That’s roughly equivalent to 1.4 million words, two hours of video, or 22 hours of audio: the largest context of any commercially available model.

In a briefing earlier this year, Google showed off several pre-recorded demos meant to illustrate the potential of Gemini’s long-context capabilities. In one, Gemini 1.5 Pro searched the transcript of the Apollo 11 moon landing telecast (around 402 pages) for quotes containing jokes, and then found a scene in the telecast that looked similar to a pencil sketch.

Google DeepMind’s VP of research, Oriol Vinyals, who led the briefing, described the model as “magical.”

“[1.5 Pro] does these kinds of reasoning tasks on every page, on every word,” he said.

That might have been an exaggeration.

In one of the aforementioned studies benchmarking these capabilities, Karpińska, along with researchers from the Allen Institute for AI and Princeton, asked the models to evaluate true/false statements about fiction books written in English. The researchers chose recent works so that the models couldn’t “cheat” by relying on prior knowledge, and they peppered the statements with references to specific details and plot points that would be impossible to comprehend without reading the books in their entirety.

Given a statement such as “With her Apoth abilities, Nusis is able to reverse engineer a type of portal opened using the reagent key found in Rona’s wooden chest,” Gemini 1.5 Pro and 1.5 Flash, having ingested the relevant book, had to say whether the statement was true or false and explain their reasoning.

Image Credits: University of Massachusetts at Amherst

Tested on a single book of about 260,000 words (~520 pages), 1.5 Pro answered the true/false statements correctly 46.7% of the time, while Flash answered correctly only 20% of the time. That means a coin flip is significantly better at answering questions about the book than Google’s latest machine learning model. Averaging across all the benchmark results, neither model managed to achieve higher than random chance in terms of question-answering accuracy.

“We have noticed that models have greater difficulty verifying claims that require considering larger sections of a book, or even the entire book, compared to claims that can be solved by taking evidence at the sentence level,” Karpińska said. “Qualitatively, we also observed that models have difficulty validating claims about implicit information that is clear to a human reader but not explicitly stated in the text.”

The second of the two studies, co-authored by researchers at UC Santa Barbara, tested the ability of Gemini 1.5 Flash (but not 1.5 Pro) to “reason over” videos, that is, to search through and answer questions about their content.

The co-authors created a dataset of images (e.g., a photo of a birthday cake) paired with questions for the model to answer about the objects depicted in the images (e.g., “What cartoon character is on this cake?”). To evaluate the models, they picked one of the images at random and inserted “distractor” images before and after it to create slideshow-like footage.

Flash didn’t perform all that well. In a test in which the model had to transcribe six handwritten digits from a “slideshow” of 25 images, Flash got around 50% of the transcriptions right. Accuracy dropped to around 30% with eight digits.

“On real question-answering tasks over images, it appears to be particularly hard for all the models we tested,” Michael Saxon, a PhD student at UC Santa Barbara and one of the study’s co-authors, told TechCrunch. “That small amount of reasoning, recognizing that a number is in a frame and reading it, might be what’s breaking the model.”

Google is overpromising with Gemini

Neither study has been peer-reviewed, and neither tested the releases of Gemini 1.5 Pro and 1.5 Flash with 2-million-token contexts. (Both tested the 1-million-token context releases.) And Flash isn’t meant to be as capable as Pro in terms of performance; Google advertises it as a low-cost alternative.

Still, both studies add fuel to the fire that Google has been overpromising (and underdelivering) with Gemini from the beginning. None of the models the researchers tested, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, performed well. But Google is the only model provider that has given the context window top billing in its advertisements.

“There is nothing wrong with simply saying, ‘Our model can accept X tokens,’ based on objective technical details,” Saxon said. “But the question is: What useful thing can be done with it?”

More broadly, generative AI is coming under increasing scrutiny as businesses (and investors) grow frustrated with the technology’s limitations.

In two recent Boston Consulting Group surveys, about half of the respondents (all CEOs) said they don’t expect generative AI to bring substantial productivity gains and that they’re worried about the potential for mistakes and data compromises arising from generative AI tools. PitchBook recently reported that early-stage generative AI dealmaking has declined for two consecutive quarters, falling 76% from its Q3 2023 peak.

With meeting-recap chatbots conjuring up fictional details about people and AI search platforms that essentially amount to plagiarism generators, customers are hunting for promising differentiators. Google, which has raced, at times clumsily, to catch up with its generative AI rivals, was desperate to make Gemini’s context one of those differentiators.

But the bet, it seems, was premature.

“We haven’t figured out how to really show that ‘reasoning’ or ‘understanding’ is happening across long documents, and basically every group publishing these models is just pulling together their own ad hoc assessments to make these claims,” Karpińska said. “Without knowing how long-context processing is implemented (and companies don’t share these details), it’s hard to say how realistic these claims are.”

Google didn’t respond to a request for comment.

Both Saxon and Karpińska believe that the antidotes to hyped-up claims about generative AI are better benchmarks and, in the same vein, greater emphasis on third-party critique. Saxon notes that one of the more common tests for long context (liberally cited by Google in its marketing materials), the “needle in a haystack,” measures only a model’s ability to retrieve specific pieces of information, such as names and numbers, from datasets, not how well it answers complex questions about that information.

“All scientists and most engineers using these models generally agree that our current benchmarking culture is broken,” Saxon said, “so it’s important that the public understands to take these giant reports containing numbers like ‘general intelligence across benchmarks’ with a massive grain of salt.”

This article was originally published on: techcrunch.com
Technology

US medical device giant Artivion says hackers stole files during a cybersecurity incident

Artivion, a medical device company that produces implantable tissue for heart and vascular transplants, says its services have been “disrupted” due to a cybersecurity incident.

In an 8-K filing with the SEC on Monday, Georgia-based Artivion, formerly CryoLife, said it became aware of a “cybersecurity incident” that involved the “compromise and encryption” of information on November 21. This suggests that the company was hit by ransomware, but Artivion has not yet confirmed the nature of the incident and didn’t immediately respond to TechCrunch’s questions. No major ransomware group has yet claimed responsibility for the attack.

Artivion said it took some systems offline in response to the cyberattack, which the company said caused “disruptions to certain ordering and shipping processes.”

Artivion, which reported third-quarter revenue of $95.8 million, said it doesn’t expect the incident to have a material impact on the company’s finances.

This article was originally published on: techcrunch.com

Technology

It’s a Raspberry Pi 5 in a keyboard and it’s called Raspberry Pi 500

Single-board computer maker Raspberry Pi is updating its cute little keyboard computer with better specs. Named the Raspberry Pi 500, this successor to the Raspberry Pi 400 is just as powerful as the current Raspberry Pi flagship, the Raspberry Pi 5. It is available for purchase now from Raspberry Pi resellers.

The Raspberry Pi 500 is the easiest way to get started with the Raspberry Pi because it’s not as intimidating as the Raspberry Pi 5. When you look at the Raspberry Pi 500, you don’t see any chipsets or PCBs (printed circuit boards). The Raspberry Pi is completely hidden inside a familiar form factor: the keyboard.

The idea with the Raspberry Pi 500 is that you can connect a mouse and a display and you’re ready to go. If, for instance, you have a relative who uses a very old computer with an outdated version of Windows, the Raspberry Pi 500 can easily replace the old PC tower for most computing tasks.

More importantly, this device brings us back to the roots of the Raspberry Pi. Raspberry Pi computers were originally intended for educational applications. Over time, tech enthusiasts and industrial customers started using the single-board computers everywhere. (For example, if you’ve ever been to London Heathrow Airport, all of the departures and arrivals boards there are powered by Raspberry Pi.)

The Raspberry Pi 500 draws inspiration from the roots of the Raspberry Pi Foundation, a non-profit organization. It’s the perfect first computer for school. In some ways, it’s much better than a Chromebook or iPad because it’s cheap and highly customizable, which encourages creative thinking.

The Raspberry Pi 500 comes with a 32GB SD card preinstalled with Raspberry Pi OS, a Debian-based Linux distribution. It costs $90, a slight ($20) price increase over the Raspberry Pi 400.

Only UK and US keyboard variants will be available at launch, but versions with French, German, Italian, Japanese, Nordic, and Spanish keyboard layouts will be available soon. And if you’re looking for a bundle that includes everything you need, Raspberry Pi also offers a $120 desktop kit that includes the Raspberry Pi 500, a mouse, a 27W USB-C power supply, and a micro-HDMI to HDMI cable.

In other news, Raspberry Pi has announced another new product: the Raspberry Pi Monitor. It’s a 15.6-inch 1080p monitor priced at $100. Since there are quite a few 1080p portable monitors on the market, this launch isn’t as noteworthy as the Pi 500. However, for die-hard Pi fans, there’s now a Raspberry Pi-branded monitor option available.

Image Credits: Raspberry Pi

This article was originally published on: techcrunch.com

Technology

Apple Vision Pro may add support for PlayStation VR controllers

Vision Pro headset

Apple wants to make its Vision Pro mixed reality device more attractive to gamers and game developers, according to a new report from Bloomberg’s Mark Gurman.

The Vision Pro was pitched more as a productivity and media consumption device than as a gaming device, due in part to its reliance on eye and hand controls rather than a separate controller.

However, Apple may need gamers if it wants to expand the Vision Pro’s audience, especially since Gurman reports that fewer than half a million units have been sold so far. As such, the company has reportedly been in talks with Sony about adding support for PlayStation VR2 hand controllers, and has also asked developers whether they would support the controllers in their games.

By offering more precise control, Apple could also make other types of software available on the Vision Pro, such as Final Cut Pro or Adobe Photoshop.

This article was originally published on: techcrunch.com