Technology
Gemini’s data analysis capabilities aren’t as good as Google claims
![In this photo illustration a Gemini logo and a welcome message on Gemini website are displayed on two screens.](https://360wisemedia.com/wp-content/uploads/2024/06/Geminis-data-analysis-capabilities-arent-as-good-as-Google-claims.jpg)
One of the selling points of Google’s flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, is the amount of data they can supposedly process and analyze. In press briefings and demos, Google has repeatedly claimed that the models can accomplish previously impossible tasks thanks to their “long context,” such as summarizing multiple hundred-page documents or searching across scenes in video footage.
But recent research suggests that these models actually aren’t very good at this stuff.
Two separate studies examined how well Google’s Gemini models and others make sense of enormous amounts of data (think works the length of “War and Peace”). Both studies find that Gemini 1.5 Pro and 1.5 Flash struggle to answer questions about large data sets correctly; in one series of document-based tests, the models gave the right answer only 40% to 50% of the time.
“While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don’t actually ‘understand’ the content,” Marzena Karpińska, a postdoc at UMass Amherst and co-author of one of the studies, told TechCrunch.
Gemini’s context window is lacking
A model’s context, or context window, refers to the input data (e.g. text) that the model considers before generating output data (e.g. additional text). A simple question (“Who won the 2020 US presidential election?”) can serve as context, as can a movie script, a show, or an audio clip. As context windows grow, so does the size of the documents that fit into them.
The latest versions of Gemini can accept more than 2 million tokens as context. (“Tokens” are subdivided chunks of raw data, such as the syllables “fan,” “tas,” and “tic” in the word “fantastic.”) That’s roughly equivalent to 1.4 million words, two hours of video, or 22 hours of audio, the largest context of any commercially available model.
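As a back-of-the-envelope check on those figures, the sketch below derives the words-per-token ratio implied by the two numbers quoted above (2 million tokens, about 1.4 million words) and uses it to estimate how many tokens a text of a given length would occupy. The ratio is only what these figures imply, not an official tokenizer constant, and the helper names `estimate_tokens` and `fits_in_context` are hypothetical.

```python
# Approximate figures quoted in the article.
CONTEXT_TOKENS = 2_000_000
CONTEXT_WORDS = 1_400_000

# Implied average ratio (assumption: real tokenizers vary by language and content).
WORDS_PER_TOKEN = CONTEXT_WORDS / CONTEXT_TOKENS


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for a text of `word_count` words."""
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, context_tokens: int = CONTEXT_TOKENS) -> bool:
    """True if the estimated token count fits in the given context window."""
    return estimate_tokens(word_count) <= context_tokens


if __name__ == "__main__":
    # A 260,000-word book, the size used in one of the studies discussed here.
    print(estimate_tokens(260_000), fits_in_context(260_000))
```

By this estimate, a single long novel uses only a fraction of the advertised window, which is why the studies focus on whether the models reason over that text rather than whether they can accept it.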
In a briefing earlier this year, Google showed off several pre-recorded demos meant to illustrate the potential of Gemini’s long-context capabilities. One had Gemini 1.5 Pro comb through the transcript of the Apollo 11 moon landing broadcast (some 402 pages) looking for quotes containing jokes, then find a scene in the broadcast that resembled a pencil sketch.
Google DeepMind’s VP of research Oriol Vinyals, who led the briefing, called the model “magical.”
“[1.5 Pro] does these kinds of reasoning tasks on every page, on every word,” he said.
That might have been an exaggeration.
In one of the aforementioned studies benchmarking these capabilities, Karpińska and researchers from the Allen Institute for AI and Princeton asked models to evaluate true/false statements about fiction books written in English. The researchers chose recent works so that the models couldn’t “cheat” by relying on prior knowledge, and they peppered the statements with references to specific details and plot points that would be impossible to understand without reading the books in their entirety.
Given a statement such as “With her Apoth abilities, Nusis is able to reverse engineer a type of portal opened using the reagent key found in Rona’s wooden chest,” Gemini 1.5 Pro and 1.5 Flash, having ingested the relevant book, had to say whether the statement was true or false and explain their reasoning.
Tested on a single book of about 260,000 words (~520 pages), the researchers found that 1.5 Pro answered the true/false statements correctly 46.7% of the time, while Flash answered correctly only 20% of the time. That means a coin flip is significantly better at answering questions about the book than Google’s latest machine learning model. Averaged across all the benchmark results, neither model managed to do better than random chance in question-answering accuracy.
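To make the coin-flip comparison concrete, here is a minimal sketch that simulates a random guesser on balanced true/false statements and compares the two reported accuracy figures against the 50% baseline. The function names and the simulated statement count are hypothetical; only the 46.7% and 20% figures come from the study as reported above.

```python
import random

# Accuracy figures reported for the single-book test.
REPORTED_ACCURACY = {"Gemini 1.5 Pro": 0.467, "Gemini 1.5 Flash": 0.20}

# Expected accuracy of random guessing on true/false statements.
CHANCE = 0.5


def coin_flip_accuracy(num_statements: int, seed: int = 0) -> float:
    """Simulate guessing true/false at random (illustrative only)."""
    rng = random.Random(seed)
    labels = [rng.random() < 0.5 for _ in range(num_statements)]
    guesses = [rng.random() < 0.5 for _ in range(num_statements)]
    correct = sum(guess == label for guess, label in zip(guesses, labels))
    return correct / num_statements


def margin_vs_chance(accuracy: float) -> float:
    """Positive means better than a coin flip, negative means worse."""
    return accuracy - CHANCE


if __name__ == "__main__":
    print(f"coin flip: {coin_flip_accuracy(100_000):.3f}")
    for model, accuracy in REPORTED_ACCURACY.items():
        print(f"{model}: {margin_vs_chance(accuracy):+.3f} vs. chance")
```

Both reported figures come out below the baseline, which is the comparison the paragraph above is making.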
“We have noticed that models have greater difficulty verifying claims that require considering larger sections of a book, or even the entire book, compared to claims that can be solved by retrieving sentence-level evidence,” Karpińska said. “Qualitatively, we also observed that models have difficulty verifying claims about implicit information that is clear to a human reader but not explicitly stated in the text.”
The second of the two studies, co-authored by researchers at UC Santa Barbara, tested the ability of Gemini 1.5 Flash (but not 1.5 Pro) to “reason” about videos, that is, to search through them and answer questions about their content.
The co-authors created a data set of images (e.g., a photo of a birthday cake) paired with questions for the model to answer about the objects depicted in the images (e.g., “What cartoon character is on this cake?”). To evaluate the models, they picked one of the images at random and inserted “distractor” images before and after it to create slideshow-like footage.
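The construction described above can be sketched roughly as follows: a target image is placed at a random position among distractor images to form a fixed-length “slideshow.” This is an illustrative sketch only, not the researchers’ actual code; the function name `build_slideshow` and the file names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def build_slideshow(
    target: str,
    distractors: Sequence[str],
    length: int,
    rng: random.Random,
) -> Tuple[List[str], int]:
    """Return (frames, target_index) for a slideshow of `length` frames.

    The target frame lands at a random position, with distractor frames
    filling every other slot before and after it.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if len(distractors) < length - 1:
        raise ValueError("not enough distractor images")
    frames = rng.sample(list(distractors), length - 1)
    position = rng.randrange(length)
    frames.insert(position, target)
    return frames, position


if __name__ == "__main__":
    rng = random.Random(42)
    pool = [f"distractor_{i}.jpg" for i in range(100)]  # placeholder names
    frames, position = build_slideshow("birthday_cake.jpg", pool, 25, rng)
    print(len(frames), position, frames[position])
```

The model under test would then be asked the paired question about the target image, so it has to locate the relevant frame before it can answer.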
Flash didn’t perform all that well. In a test in which the model transcribed six handwritten digits from a “slideshow” of 25 images, Flash got about 50% of the transcriptions right. Accuracy dropped to about 30% with eight digits.
“On real question-answering tasks over images, it appears to be particularly hard for all the models we tested,” Michael Saxon, a doctoral student at UC Santa Barbara and one of the study’s co-authors, told TechCrunch. “That small amount of reasoning, recognizing that a number is in a frame and reading it, might be what is breaking the model.”
Google is overpromising with Gemini
Neither study has been peer-reviewed, nor does either examine the releases of Gemini 1.5 Pro and 1.5 Flash with 2-million-token contexts. (Both tested the 1-million-token context versions.) And Flash isn’t meant to be as capable as Pro in terms of performance; Google advertises it as a low-cost alternative.
Still, both add fuel to the fire that Google has been overpromising, and underdelivering, with Gemini from the beginning. None of the models the researchers tested, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, performed well. But Google is the only model provider that has given the context window top billing in its advertisements.
“There is nothing wrong with simply saying, ‘Our model can accept X tokens,’ based on objective technical details,” Saxon said. “But the question is: What useful thing can be done with it?”
More broadly, generative AI is coming under increasing scrutiny as businesses (and investors) grow frustrated with the technology’s limitations.
In two recent Boston Consulting Group surveys, about half of respondents, all of them CEOs, said they didn’t expect generative AI to deliver significant productivity gains and that they were concerned about potential errors and data breaches resulting from generative AI tools. PitchBook recently reported that early-stage generative AI deal activity has declined for two consecutive quarters, down 76% from its peak in Q3 2023.
With meeting-summary chatbots conjuring up fictitious details about people and AI search platforms that are essentially plagiarism generators, customers are on the hunt for promising differentiators. Google, which has raced, sometimes clumsily, to catch up with its generative AI rivals, desperately wanted Gemini’s context to be one of those differentiators.
However, it seems that the bet was premature.
“We haven’t figured out how to really show that ‘reasoning’ or ‘understanding’ is happening across long documents, and basically every group publishing these models is just pulling together their own ad hoc assessments to make these claims,” Karpińska said. “Without knowing how long-context processing is implemented (and the companies don’t share that detail), it’s hard to say how realistic these claims are.”
Google didn’t respond to a request for comment.
Both Saxon and Karpińska believe that the antidote to grandiose claims about generative AI is better benchmarks and, in the same vein, a greater emphasis on third-party critique. Saxon notes that one of the more common long-context tests (heavily cited by Google in its marketing materials), the “needle in a haystack,” measures only a model’s ability to retrieve specific pieces of information, such as names and numbers, from data sets, not how well it answers complex questions about that information.
“All scientists and most engineers using these models generally agree that our current benchmarking culture is broken,” Saxon said, “so it’s important that the public understands to take these giant reports containing numbers like ‘general intelligence across benchmarks’ with a massive grain of salt.”
Technology
Workday announces a new system for AI agents a few days after layoffs
![agents, Workday, AI](https://360wisemedia.com/wp-content/uploads/2025/02/Workday-announces-a-new-system-for-AI-agents-a-few.jpg)
The layoffs took place almost a week before the announcement of the company’s new AI system.
After laying off about 1,750 employees last week, HR software maker Workday announced a new system for deploying artificial intelligence (AI) agents.
Workday introduced its Agent System of Record on February 11, as confirmed in a press release. The system will manage AI agents across Workday’s platform and those of other companies. Workday provides many companies with software that oversees their finances and HR management.
The new system aims to help companies optimize their digital workforce, improving performance and increasing AI adoption. Given the growing use of artificial intelligence in company operations, Workday hopes to manage these new technologies as they evolve.
The announcement comes after Workday cut its workforce by about 8.5%. The layoffs come amid greater AI integration, raising concerns about job availability and security in a changing society.
However, Aneel Bhusri, executive chair and co-founder of Workday, emphasized that tomorrow’s workforce still depends on the participation of people.
“The workforce of the future will include both people and AI agents, and companies that do not learn to manage this extremely complex reality will quickly fall behind,” said Bhusri. “We believe that no company in the world is better positioned than Workday to usher in this new era of workforce management in a trusted, ethical way.”
He added: “Our deep understanding of human skills and roles naturally extends to the management of the digital workforce. The future is here and, like the transition to the cloud, we are ready to help our clients reach it.”
The AI system aims to “unlock the full potential” of these non-human agents. This includes help with centralized management, improved agents, and secure and compliant deployment. What’s more, the rollout brings AI agents with a new set of autonomous skills for performing new tasks. These agents can also analyze contracts, flag incorrect payroll data, and assist with financial auditing and data insights.
Workday also received support from leaders at other companies, who spoke about the potential of, and excitement around, AI software. Julie Sweet, CEO of Accenture, emphasized that the system would transform companies in this “new landscape.”
“We believe that the reinvention of the enterprise in the AI era will create a seamless work experience between people and agents,” Sweet explained. “The agent’s life cycle should be fully managed. We need to train them. They must follow our compliance rules. They must understand our values and must be monitored for performance. That is why it is exciting to see what Workday is doing to help companies manage this new landscape.”
Technology
These Google Photos alternatives offer a lot of storage options at a reasonable price
![woman taking selfie](https://360wisemedia.com/wp-content/uploads/2025/02/These-Google-Alternates-photos-offer-a-lot-of-storage-options.jpg)
Google Photos is a great service for storing images across all your devices. But Google offers only 15 GB of free storage, which is shared with Gmail. Google Photos used to offer free unlimited image storage, but that is no longer the case.
If you are looking for a better plan to store photos, want different features, or just want to leave the Google ecosystem, here are some alternatives.
Free storage: 1000 photos
Storage services usually offer storage with a size limit. But Flickr takes a different approach: it lets people store 1,000 photos and videos for free. One advantage of Flickr: you can upload an image of up to 200 MB, compared with the 75 MB limit on Google Photos’ free plan. Flickr’s paid plans start at USD 10.44 per month for unlimited storage.
If you want features beyond personal use, Flickr lets you make your photos public so that others can find them. You can also join groups organized around various topics.
Free storage: 5 GB
Dropbox is not a photo-centric storage service, but that can be an advantage if you want to store things other than photos in the cloud. The company’s paid plans start at USD 9.99 per month for 2 TB of storage, which is similar to the Google One Premium plan.
Free storage: 5 GB
Ente was created by a former Google engineer as a more private alternative to Google Photos. The service has end-to-end encryption, which means that the company doesn’t collect any data. The app is available on various platforms and includes features for identifying and adding people, showing photos from different locations, and creating categories such as sunsets, memes, and documents. All of this is processed on your device.
Ente’s monthly plans start at USD 2.49 per month for 50 GB of storage, which can be shared with five other people. Ente’s core code is open source, so you can modify it or even run your own independent version.
Free storage: 100 MB
Cryptee is another photo service that focuses on privacy; it is also open source. You can create an account with a username and password (there is also the option to use an email address and password). Although its free tier doesn’t offer much space, the paid tier starts at USD 3.30 per month for 10 GB of storage. The service works on iOS, Android, Windows, Mac, and Linux via a progressive web app and uses AES-256 encryption to protect your media.
In addition to being a photo storage service, it has a built-in document editor that supports Markdown, code, and KaTeX math. You can also view documents side by side, store and edit files such as PDF and DOCX, and use elements such as tables and checkboxes.
Free storage: 5 GB with Prime membership
This is a welcome perk for Amazon Prime members. You can back up extra photos on the free tier of Amazon Photos, and if you want more, storage plans start at USD 1.99 per month for 100 GB.
![](https://techcrunch.com/wp-content/uploads/2025/02/Amazon-Photos.jpeg?w=680)
Free storage: 21 high -resolution photos per week
500px is more focused on hobbyists and professional photographers. It has community features for showcasing your work and a way to present your snaps in an uncompressed format. Plans cost less than USD 50 a year with a discount and let you store unlimited high-resolution photos. Premium plans remove ads from the platform and also offer insight into how your photos perform on the platform. Its higher Pro plan, at almost USD 100 a year, provides tools for building a portfolio with a custom domain.
Free storage: No free storage
Not offering a free tier seems like a bummer, but Photobucket offers one of the lowest storage rates, at USD 5 per month for 1 TB of storage. If you pay for the plan yearly, the price is lower. Photobucket also offers a good way to share photos with various groups, with Group Bucket plans at USD 8 per month for 1 TB of storage, which also provide access to editing tools.
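For a rough side-by-side view, the sketch below computes the monthly price per gigabyte for the entry-level paid plans quoted in this article. It assumes 1 TB = 1,000 GB and uses only the prices and capacities stated above; Flickr and 500px are left out because their paid plans are described as unlimited. The function names are hypothetical, and actual prices may differ by region or billing period.

```python
from typing import Dict, List, Tuple

# (monthly price in USD, storage in GB) as quoted in the article.
# Assumption: 1 TB is treated as 1,000 GB.
PLANS: Dict[str, Tuple[float, int]] = {
    "Dropbox": (9.99, 2000),
    "Ente": (2.49, 50),
    "Cryptee": (3.30, 10),
    "Amazon Photos": (1.99, 100),
    "Photobucket": (5.00, 1000),
}


def price_per_gb(monthly_usd: float, storage_gb: int) -> float:
    """Monthly cost of one gigabyte on a given plan."""
    return monthly_usd / storage_gb


def rank_by_price_per_gb(plans: Dict[str, Tuple[float, int]]) -> List[Tuple[str, float]]:
    """Return (service, USD per GB per month), cheapest first."""
    rates = [(name, price_per_gb(*plan)) for name, plan in plans.items()]
    return sorted(rates, key=lambda item: item[1])


if __name__ == "__main__":
    for name, rate in rank_by_price_per_gb(PLANS):
        print(f"{name}: ${rate:.4f} per GB per month")
```

Price per gigabyte is only one axis of comparison; the privacy-focused services trade capacity for encryption and on-device processing.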
Technology
Georgia drivers may soon be able to leave physical licenses behind
![Car, stolen car, electronics, Mother](https://360wisemedia.com/wp-content/uploads/2025/02/Georgia-drivers-can-leave-physical-licenses-behind-them.jpg)
Georgia legislators have introduced a bill that would let drivers present an electronic license during interactions with law enforcement.
Georgia drivers may soon be able to leave the wallet at home and still have access to their license when dealing with law enforcement. According to WSBTV, the Georgia House of Representatives has proposed a new bill, HB 296, to expand the use of electronic driver’s licenses.
With an electronic driver’s license, drivers could present identification during traffic stops or other law enforcement interactions.
Electronic identification for Georgia residents is currently available in the Samsung, Google, and Apple wallets. Residents can also download the Apple ID. The use of electronic driver’s licenses was introduced in 2023 by Gov. Brian Kemp.
The Georgia Department of Driver Services (DDS) began offering the option shortly after the governor’s announcement. The initial change allowed the license to be used at TSA security checkpoints.
“As the No. 1 state for business, Georgia recognizes the value of finding new and innovative ways to remain at the forefront of emerging trends,” said Kemp. “I want to thank our wonderful team at DDS for working with partners in the private sector, as well as TSA, to enable this exciting new service. I look forward to this option being widely available for hardworking Georgians and visitors.”