Technology

Gemini’s data analysis capabilities aren’t as good as Google claims

In this photo illustration a Gemini logo and a welcome message on Gemini website are displayed on two screens.

One of the selling points of Google’s flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, is the amount of data they can supposedly process and analyze. In press briefings and demos, Google has repeatedly claimed that the models can accomplish previously impossible tasks thanks to their “long context,” such as summarizing multiple hundred-page documents or searching across scenes in video footage.

But recent research suggests that these models actually aren’t very good at this stuff.

Two separate studies examined how well Google’s Gemini models and others make sense of enormous amounts of data (think works the length of “War and Peace”). Both studies find that Gemini 1.5 Pro and 1.5 Flash struggle to answer questions about large data sets correctly; in one series of document-based tests, the models gave the right answer only 40% and 50% of the time.

“While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don’t actually ‘understand’ the content,” Marzena Karpińska, a postdoc at UMass Amherst and a co-author of one of the studies, told TechCrunch.

Gemini’s context window is lacking

A model’s context, or context window, refers to the input data (e.g., text) that the model considers before generating output (e.g., additional text). A simple question (“Who won the 2020 U.S. presidential election?”) can serve as context, as can a movie script, a show, or an audio clip. As context windows grow, so does the size of the documents that fit into them.

The latest versions of Gemini can take in upward of 2 million tokens as context. (“Tokens” are subdivided bits of raw data, such as the syllables “fan,” “tas,” and “tic” in the word “fantastic.”) That’s roughly equivalent to 1.4 million words, two hours of video, or 22 hours of audio, the largest context of any commercially available model.

In a briefing earlier this year, Google showed off several pre-recorded demos meant to illustrate the potential of Gemini’s long-context capabilities. One had Gemini 1.5 Pro comb through the transcript of the Apollo 11 moon landing telecast (some 402 pages) for quotes containing jokes, and then find a scene in the telecast that looked similar to a pencil sketch.

Google DeepMind’s VP of research Oriol Vinyals, who led the briefing, described the model as “magical.”

“(1.5 Pro) does these kinds of reasoning tasks on every page, on every word,” he said.

That might have been an exaggeration.

In one of the aforementioned studies benchmarking these capabilities, Karpińska and researchers from the Allen Institute for AI and Princeton asked the models to evaluate true/false statements about fiction books written in English. The researchers chose recent works so that the models couldn’t “cheat” by relying on prior knowledge, and they peppered the statements with references to specific details and plot points that would be impossible to know without reading the books in their entirety.

Given a statement such as “With her Apoth abilities, Nusis is able to reverse engineer a type of portal opened using the reagent key found in Rona’s wooden chest,” Gemini 1.5 Pro and 1.5 Flash, having ingested the relevant book, had to determine whether the statement was true or false and explain their reasoning.

Image Credits: University of Massachusetts at Amherst

In a test on one book of about 260,000 words (~520 pages), the researchers found that 1.5 Pro answered the true/false statements correctly 46.7% of the time, while Flash answered correctly only 20% of the time. That means a coin flip is significantly better at answering questions about the book than Google’s latest machine learning model. Averaged across all the benchmark results, neither model managed to do better than random chance in question-answering accuracy.
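To make the coin-flip comparison concrete, here is a minimal sketch of the arithmetic. It assumes that a random guesser on a true/false task is right 50% of the time, which is what the coin-flip framing implies; the function name and dictionary are illustrative and are not taken from the study.

```python
# Assumption: a random guesser on a true/false statement is right half the time.
CHANCE_ACCURACY = 0.5

# Accuracies reported in the article for the ~260,000-word book.
reported = {"Gemini 1.5 Pro": 0.467, "Gemini 1.5 Flash": 0.20}


def margin_over_chance(accuracy, chance=CHANCE_ACCURACY):
    """Return accuracy minus the chance baseline, in percentage points."""
    return round((accuracy - chance) * 100, 1)


margins = {model: margin_over_chance(acc) for model, acc in reported.items()}

for model, margin in margins.items():
    # A negative margin means the model did worse than guessing.
    print(f"{model}: {margin:+.1f} points versus a coin flip")
```

Both margins come out negative, which is the article's point: on this book, neither model beat guessing.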

“We have noticed that models have greater difficulty verifying claims that require considering larger sections of a book, or even the entire book, compared to claims that can be solved by taking evidence at the sentence level,” Karpińska said. “Qualitatively, we also observed that models have difficulty validating claims for implicit information that are clear to a human reader but not explicitly stated in the text.”

The second of the two studies, co-authored by researchers at UC Santa Barbara, tested the ability of Gemini 1.5 Flash (but not 1.5 Pro) to “reason” about videos, that is, to search through them and answer questions about their content.

The co-authors created a data set of images (e.g., a photo of a birthday cake) paired with questions for the model to answer about the objects depicted in the images (e.g., “What cartoon character is on this cake?”). To evaluate the models, they picked one of the images at random and inserted “distractor” images before and after it to create slideshow-like footage.
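The construction described above can be sketched in a few lines. This is an illustration under stated assumptions, not the researchers' code: the file names, the `build_slideshow` helper, and the pool of 40 distractors are all hypothetical.

```python
import random


def build_slideshow(target, distractors, total_frames, rng):
    """Place one target image at a random position among distractor images."""
    if total_frames < 1 or len(distractors) < total_frames - 1:
        raise ValueError("need at least total_frames - 1 distractors")
    # Draw distinct distractors, then insert the target at a random index
    # so that distractors can fall both before and after it.
    frames = rng.sample(distractors, total_frames - 1)
    position = rng.randrange(total_frames)
    frames.insert(position, target)
    return frames, position


rng = random.Random(0)  # fixed seed so the sketch is repeatable
distractors = [f"distractor_{i:02d}.png" for i in range(40)]  # hypothetical files
frames, position = build_slideshow("birthday_cake.png", distractors, 25, rng)

print(f"target is frame {position + 1} of {len(frames)}")
```

The model under test would then be shown all the frames in order and asked the question paired with the target image.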

Flash didn’t perform all that well. In a test that had the model transcribe six handwritten digits from a “slideshow” of 25 images, Flash got about 50% of the transcriptions right. Accuracy dropped to about 30% with eight digits.

“For real question-and-answer tasks in images, this seems particularly difficult for all the models we tested,” Michael Saxon, a doctoral student at UC Santa Barbara and one of the study’s co-authors, told TechCrunch. “That little bit of reasoning — recognizing that a number is in a box and reading it — can be what breaks the model.”

Google is overpromising with Gemini

Neither study has been peer-reviewed, nor do they examine the releases of Gemini 1.5 Pro and 1.5 Flash with 2-million-token contexts. (Both tested the versions with 1-million-token contexts.) And Flash isn’t meant to be as capable as Pro in terms of performance; Google advertises it as a low-cost alternative.

Still, both add fuel to the fire that Google has been overpromising, and underdelivering, with Gemini from the beginning. None of the models the researchers tested, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, performed well. But Google is the only model provider that gives the context window top billing in its advertising.

“There is nothing wrong with simply saying, ‘Our model can accept X tokens,’ based on objective technical details,” Saxon said. “But the question is: What useful thing can be done with it?”

More broadly, generative AI is coming under increasing scrutiny as businesses (and investors) grow frustrated with the technology’s limitations.

In two recent Boston Consulting Group surveys, about half of the respondents (all CEOs) said they didn’t expect generative AI to deliver significant productivity gains and that they were concerned about potential errors and data breaches resulting from generative AI tools. PitchBook recently reported that early-stage generative AI deal activity has declined for two consecutive quarters, down 76% from its peak in Q3 2023.

With meeting-summary chatbots conjuring up fictitious details about people and AI search platforms that are essentially plagiarism generators, customers are on the lookout for promising differentiators. Google, which had been racing, sometimes clumsily, to catch up with its rivals in generative AI, desperately wanted Gemini’s context window to be one of those differentiators.

However, it seems that the idea was premature.

“We haven’t figured out how to really show that ‘reasoning’ or ‘understanding’ is happening across long documents, and basically every group publishing these models is just pulling together their own ad hoc assessments to make these claims,” Karpińska said. “Without knowing how long-context processing is implemented (and the companies don’t share that detail), it’s hard to say how realistic these claims are.”

Google didn’t reply to a request for comment.

Both Saxon and Karpińska believe that the antidote to grandiose claims about generative AI is better benchmarks and, in the same vein, a greater emphasis on third-party critique. Saxon notes that one of the more common long-context tests (heavily cited by Google in its marketing materials), the “needle in a haystack,” measures only a model’s ability to retrieve specific pieces of information, such as names and numbers, from data sets, not how well it answers complex questions about that information.
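For readers unfamiliar with the test Saxon describes, the sketch below shows the general shape of a needle-in-a-haystack check. It is an illustration only: `build_haystack`, `retrieval_score`, and the filler text are hypothetical, and `toy_answer_fn` is a plain regular-expression search standing in for a real model call.

```python
import re


def build_haystack(filler_sentences, needle, depth):
    """Insert the needle sentence at a relative depth (0.0 = start, 1.0 = end)."""
    index = round(depth * len(filler_sentences))
    sentences = filler_sentences[:index] + [needle] + filler_sentences[index:]
    return " ".join(sentences)


def retrieval_score(answer_fn, haystack, question, expected):
    """Return 1 if the expected string appears in the answer, else 0."""
    return int(expected in answer_fn(haystack, question))


def toy_answer_fn(context, question):
    """Hypothetical stand-in for a model: a regex lookup, not a real LLM."""
    match = re.search(r"The access code is (\d+)\.", context)
    return match.group(1) if match else ""


filler = [f"Filler sentence number {i}." for i in range(1000)]
haystack = build_haystack(filler, "The access code is 7391.", depth=0.5)
score = retrieval_score(toy_answer_fn, haystack, "What is the access code?", "7391")
```

Note that a simple pattern match passes this check. That is the limitation Saxon points to: a high retrieval score shows that a fact can be found in a long input, not that the model can reason about it.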

“All scientists and most engineers using these models generally agree that our current benchmarking culture is broken,” Saxon said, “so it’s important that the public understands to take these giant reports containing numbers like ‘general intelligence across benchmarks’ with an enormous pinch of salt.”

This article was originally published on : techcrunch.com

Workday announces a new system for AI agents days after layoffs

The layoffs came almost a week before the announcement of the company’s new AI system.


After laying off about 1,750 employees last week, HR software maker Workday announced a new system for deploying artificial intelligence (AI) agents.

Workday unveiled its Agent System of Record on February 11, as confirmed in a press release. The system will manage all AI agents, both those on Workday’s platform and those from other companies. Workday provides many companies with software that handles their finances and HR management.

The new system aims to help companies optimize their digital workforce, improving performance and increasing AI adoption. Given the growing use of artificial intelligence in company operations, Workday hopes to manage these new technologies as they advance.

The news comes after Workday reduced its workforce by about 8.5%. The layoffs come amid greater AI integration, raising concerns about job availability and security in an advancing society.

However, Aneel Bhusri, Workday’s executive chair and co-founder, also emphasized that tomorrow’s workforce still depends on the participation of people.

“The workforce of the future will include both people and AI agents, and companies that do not learn to manage this extremely complex reality will quickly fall behind,” said Bhusri. “We believe that no company in the world is better positioned than Workday to usher in this new era of workforce management in a trusted, ethical way.”

He added: “Our deep understanding of human skills and roles naturally extends to the management of the digital workforce. The future is here and, as with the transition to the cloud, we are ready to help our clients reach it.”

The AI system aims to “unlock the full potential” of these non-human agents. This includes support for centralized management, improved agents, and secure, compliant deployment. What’s more, the rollout of AI agents brings a new set of autonomous skills for performing new tasks. The agents can also analyze contracts, flag incorrect payroll data, and assist with financial control and data insights.

Workday also received support from executives at other companies, who spoke of the potential of, and excitement around, the AI software. Julie Sweet, CEO of Accenture, emphasized that the system would transform companies in this “new landscape.”

“We believe that reinventing the enterprise in the AI era will create a seamless work experience between people and agents,” Sweet explained. “The agent’s life cycle should be fully managed. We need to train them. They must follow our compliance rules. They must understand our values and must be monitored for performance. That is why it is exciting to see what Workday is doing to help companies manage this new landscape.”

This article was originally published on : www.blackenterprise.com

These Google Photos alternatives offer plenty of storage options at a reasonable price

Google Photos is a great service for storing images across all your devices. But Google offers only 15 GB of free storage across its services, including Gmail. Google Photos used to offer free unlimited image storage, but that is no longer the case.

If you are looking for a better photo storage plan or different features, or you just want to leave the Google ecosystem, here are some alternatives.

Flickr

Free storage: 1,000 photos

Storage services usually cap free storage by size. Flickr takes a different approach: it lets people store 1,000 photos and videos for free. One advantage of Flickr is that you can upload an image of up to 200 MB, compared with the 75 MB limit in Google Photos on the free plan. Flickr’s paid plans start at $10.44 per month for unlimited storage.

If you want features beyond personal use, Flickr lets you make your photos public so that others can find them. You can also join groups organized around various topics.

Dropbox

Free storage: 5 GB

Dropbox isn’t a photo-focused storage service, but that can be an advantage if you want to store things other than photos in the cloud. The company’s paid plans start at $9.99 per month for 2 TB of storage, which is similar to the Google One Premium plan.

Ente

Free storage: 5 GB

Ente was created by a former Google engineer as a more private alternative to Google Photos. The service has end-to-end encryption, which means that the company doesn’t collect any data. The app is available on various platforms and includes features for identifying and tagging people, showing photos from different locations, and creating categories such as sunsets, memes, and documents. All of this is processed on your device.

Ente’s monthly plans start at $2.49 per month for 50 GB of storage, which can be shared with five other people. Ente’s underlying code is open source, so you can modify it and even run your own independent version.

Image Credits: Ente

Cryptee

Free storage: 100 MB

Cryptee is another photo service that focuses on privacy; it is also open source. You can create an account with a username and password (there is also the option to use an email address and password). Although its free tier doesn’t offer much space, the paid tier starts at $3.30 per month for 10 GB of storage. The service works on iOS, Android, Windows, Mac, and Linux via a progressive web app and uses AES-256 encryption to protect your media.

Besides being a photo storage service, it has a built-in document editor that supports Markdown, code, and KaTeX math. You can also view documents side by side, store and edit files such as PDF and DOCX, and use elements such as tables and checkboxes.

Amazon Photos

Free storage: 5 GB with Prime membership

This is a welcome perk for Amazon Prime members. You can back up your photos to the free tier of Amazon Photos, and if you want more, storage plans start at $1.99 per month for 100 GB.

Image Credits: Amazon Photos

500px

Free storage: 21 high-resolution photos per week

500px is aimed more at hobbyists and professional photographers. It has community features to showcase your work and a way to present your shots in an uncompressed format. Paid plans cost less than $50 a year with a discount and let you store unlimited high-resolution photos. Premium plans remove ads from the platform and also offer insight into how your photos perform on the platform. Its higher-tier Pro plan, at almost $100 a year, provides tools to build a portfolio with a custom domain.

Image Credits: 500px

Photobucket

Free storage: None

Not offering a free tier sounds like a bummer, but Photobucket has one of the lowest storage rates, at $5 per month for 1 TB of storage. If you pay for the plan yearly, the price is lower. Photobucket also offers a handy way to share photos with various groups: its group plans cost $8 per month for 1 TB of storage and also provide access to editing tools.

This article was originally published on : techcrunch.com

Georgia drivers may be able to leave physical licenses behind

Georgia lawmakers have introduced legislation allowing drivers to present e-licenses during interactions with law enforcement.


Georgia drivers may soon be able to leave their wallets at home and still have access to their licenses when dealing with law enforcement. According to WSBTV, the Georgia House of Representatives has proposed a new bill, HB 296, to expand the use of e-driver’s licenses.

With an electronic driver’s license, drivers could present identification during traffic stops or other law enforcement interactions.

E-identification for Georgia residents is currently available in the Samsung, Google, and Apple wallets. Residents can also download the Apple ID. E-driver’s licenses were introduced in 2023 under Gov. Brian Kemp.

The Georgia Department of Driver Services began offering the option shortly after the governor’s announcement. The initial change allowed the license to be used at TSA security checkpoints.

“As the No. 1 state for business, Georgia recognizes the value of finding new and innovative ways to remain at the forefront of emerging trends,” said Kemp. “I want to thank our wonderful team at DDS for working with partners in the private sector, as well as the TSA, to enable this exciting new service. I look forward to this option becoming widely available to hardworking Georgians and visitors.”

Digital identification is slowly becoming a trend. Fourteen states have adopted rules that let residents use this method to get through transportation security screening, according to a TSA press release.

States offering mobile driver’s licenses include Arizona, California, Colorado, Georgia, Hawaii, Iowa, Louisiana, Maryland, New Mexico, New York, Ohio, and Utah. Puerto Rico also allows them.


This article was originally published on : www.blackenterprise.com