
Technology

Gemini’s data analysis capabilities aren’t as good as Google claims

In this photo illustration a Gemini logo and a welcome message on Gemini website are displayed on two screens.

One of the strengths of Google’s flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, is the amount of data they can supposedly process and analyze. In press briefings and demos, Google has repeatedly claimed that, thanks to their “long context,” these models can accomplish previously impossible tasks, such as summarizing multiple 100-page documents or searching through scenes in video footage.

But recent research suggests that these models actually aren’t very good at this stuff.

Two separate studies examined how well Google’s Gemini models, among others, make sense of enormous amounts of data — think the length of “War and Peace.” Both studies find that Gemini 1.5 Pro and 1.5 Flash struggle to answer questions about large data sets accurately; in one set of document-based tests, the models gave the right answer only 40% and 50% of the time, respectively.

“While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don’t actually ‘understand’ the content,” Marzena Karpińska, a postdoc at UMass Amherst and co-author on one of the studies, told TechCrunch.

Gemini’s context window falls short

A model’s context, or context window, refers to the input data (e.g., text) that the model considers before generating output (e.g., additional text). A simple question — “Who won the 2020 US presidential election?” — can serve as context, as can a movie script, a show, or an audio clip. As context windows grow, so does the size of the documents that fit into them.

The newest versions of Gemini can accept more than 2 million tokens as context. (“Tokens” are broken-down chunks of raw data, such as the syllables “fan,” “tas,” and “tic” in “fantastic.”) That’s roughly equivalent to 1.4 million words, two hours of video, or 22 hours of audio — the most context of any commercially available model.
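For back-of-the-envelope purposes, the figures above (2 million tokens to roughly 1.4 million words, i.e. about 0.7 words per token) can be turned into a quick capacity check. The sketch below is illustrative only: the ratio is an assumption derived from this article’s numbers, and real tokenizers vary by model and language.

```python
# Rough sketch: estimate whether a document fits in a long-context window.
# Assumes ~0.7 words per token (2M tokens ≈ 1.4M words, per the figures
# cited above); actual tokenizers (BPE, SentencePiece) differ by model.

WORDS_PER_TOKEN = 0.7

def estimated_tokens(text: str) -> int:
    """Approximate token count from a whitespace word count."""
    word_count = len(text.split())
    return round(word_count / WORDS_PER_TOKEN)

def fits_in_context(text: str, context_window: int = 2_000_000) -> bool:
    """Check whether the rough token estimate fits the context window."""
    return estimated_tokens(text) <= context_window

# A ~520-page novel of about 260,000 words (as in the UMass study)
# lands well under a 2M-token window by this estimate.
novel = "word " * 260_000
print(estimated_tokens(novel))  # prints 371429
```

By this estimate, even the longest book in the study consumes well under a fifth of the advertised window, which is what makes the models’ poor accuracy on such books notable.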

In a briefing earlier this year, Google showed off several pre-recorded demos intended to illustrate the potential of Gemini’s long-context capabilities. One involved Gemini 1.5 Pro combing through the transcript of the Apollo 11 moon landing broadcast — some 402 pages — looking for quotes containing jokes, then finding a scene in the broadcast that looked like a pencil sketch.

Google DeepMind’s VP of research Oriol Vinyals, who led the briefing, called the model “magical.”

“(1.5 Pro) does these kinds of reasoning tasks on every page, on every word,” he said.

That may have been an exaggeration.

In one of the aforementioned studies, Karpińska, together with researchers from the Allen Institute for AI and Princeton, asked the models to evaluate true/false statements about fiction books written in English. The researchers chose recent works so that the models couldn’t “cheat” by relying on prior knowledge, and they peppered the statements with references to specific details and plot points that would be impossible to know without reading the books in their entirety.

Given a statement such as “With her Apoth abilities, Nusis is able to reverse engineer a type of portal opened using the reagent key found in Rona’s wooden chest,” Gemini 1.5 Pro and 1.5 Flash — after ingesting the relevant book — had to determine whether the statement was true or false and explain their reasoning.

Image Credits: University of Massachusetts at Amherst

Tested on a single book of about 260,000 words (~520 pages), the researchers found that 1.5 Pro answered the true/false statements correctly 46.7% of the time, while Flash answered correctly only 20% of the time. That means a coin is significantly better at answering questions about the book than Google’s latest machine learning model. Averaging across all benchmark results, neither model achieved better-than-chance accuracy in answering the questions.

“We have noticed that models have greater difficulty verifying claims that require considering larger sections of a book, or even the entire book, compared to claims that can be solved by taking evidence at the sentence level,” Karpińska said. “Qualitatively, we also observed that models have difficulty validating claims about implicit information that is clear to a human reader but not explicitly stated in the text.”

The second of the two studies, co-authored by researchers at UC Santa Barbara, tested the ability of Gemini 1.5 Flash (but not 1.5 Pro) to “reason” about videos — that is, to search through them and answer questions about their content.

The co-authors created a data set of images (e.g., a photo of a birthday cake) paired with questions for the model to answer about the objects depicted in the images (e.g., “What cartoon character is on this cake?”). To evaluate the models, they randomly chose one of the images and inserted “distractor” images before and after it to create slideshow-like footage.

Flash didn’t do so well. In a test in which the model had to transcribe six handwritten digits from a “slideshow” of 25 images, Flash got about 50% of the transcriptions right. Accuracy dropped to about 30% with eight digits.

“On real question-answering tasks over images, this seems to be particularly hard for all the models we tested,” Michael Saxon, a doctoral student at UC Santa Barbara and one of the study’s co-authors, told TechCrunch. “That little bit of reasoning — recognizing that a number is in a box and reading it — can be what breaks the model.”

Google is promising too much with Gemini

Neither study has been peer-reviewed, nor does either probe the releases of Gemini 1.5 Pro and 1.5 Flash with their 2-million-token contexts. (Both tested the 1-million-token context releases.) Flash isn’t meant to match Pro in performance; Google advertises it as a low-cost alternative.

Still, both studies add fuel to the claims that Google has been overpromising — and underdelivering — with Gemini from the beginning. None of the models the researchers tested, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, performed well. But Google is the only model provider to put the context window at the top of the list in its ads.

“There is nothing wrong with simply saying, ‘Our model can accept X tokens,’ based on objective technical details,” Saxon said. “But the question is: What useful thing can be done with it?”

Overall, generative AI is coming under increasing scrutiny as businesses (and investors) become increasingly frustrated with the technology’s limitations.

In two recent Boston Consulting Group surveys, about half of respondents—all CEOs—said they didn’t expect generative AI to deliver significant productivity gains and that they were concerned about the potential for errors and data breaches arising from generative AI tools. PitchBook recently reported that early-stage generative AI deal activity has declined for two consecutive quarters, down 76% from its Q3 2023 peak.

With meeting-recap chatbots conjuring fictitious details about people and AI search platforms that are essentially plagiarism generators, customers are looking for promising differentiators. Google — which had been racing, sometimes clumsily, to catch up with its rivals in the field of generative AI — desperately wanted Gemini’s context window to be one of those differentiators.

However, it seems the bet was premature.

“We haven’t figured out how to really show that ‘reasoning’ or ‘understanding’ is happening across long documents, and basically every group publishing these models is just pulling together their own ad hoc assessments to make these claims,” Karpińska said. “Without knowing how long-context processing is implemented—and the companies don’t share these details—it’s hard to say how realistic these claims are.”

Google didn’t reply to a request for comment.

Both Saxon and Karpińska believe that the antidote to grandiose claims about generative AI is better benchmarks and, in the same vein, a greater emphasis on third-party criticism. Saxon notes that one of the more common long-context tests (heavily cited by Google in its marketing materials), the “needle in a haystack,” measures only a model’s ability to retrieve specific pieces of information, such as names and numbers, from datasets—not how well it answers complex questions about that information.
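A minimal version of the “needle in a haystack” setup Saxon describes can be sketched as follows. The filler sentence, needle, and question here are hypothetical placeholders invented for illustration; real evaluations use far longer contexts and query a live model.

```python
import random

# Sketch of a "needle in a haystack" test case: bury one retrievable fact
# (the needle) at a random depth inside long filler text (the haystack),
# then ask the model to retrieve it. Note this only measures retrieval of
# a specific value, not complex reasoning over the whole context.

def build_haystack_case(needle: str, filler_sentence: str,
                        n_filler: int, seed: int = 0) -> str:
    """Return filler text with the needle inserted at a random position."""
    rng = random.Random(seed)  # seeded for reproducible test cases
    sentences = [filler_sentence] * n_filler
    insert_at = rng.randrange(len(sentences) + 1)
    sentences.insert(insert_at, needle)
    return " ".join(sentences)

needle = "The magic number is 42."  # hypothetical fact to retrieve
context = build_haystack_case(
    needle,
    filler_sentence="The sky was a flat, uninteresting gray.",
    n_filler=1000,
)
question = "What is the magic number mentioned in the text?"
# The model's answer to (context, question) would then be checked against
# "42". A model can score perfectly on this kind of retrieval while still
# failing the book-length reasoning questions in the UMass study.
assert needle in context
```

This is exactly why Saxon considers the test weak as a marketing headline: passing it demonstrates lookup, not understanding.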

“All scientists and most engineers using these models generally agree that our current benchmarking culture is broken,” Saxon said, “so it’s important that the public understands to take these giant reports containing numbers like ‘general intelligence across benchmarks’ with an enormous grain of salt.”

This article was originally published on : techcrunch.com


Wiz acquires Dazz for $450 million to expand cybersecurity platform


Wiz, one of the most talked-about names in the cybersecurity world, is making a major acquisition to expand the reach of its cloud security products, especially among developers. It is buying Dazz, a specialist in security remediation and risk management. Sources say the deal is valued at $450 million, including cash and stock.

That’s a leap over the valuation in the startup’s latest funding round. In July, we reported that Dazz had raised $50 million at a post-money valuation of just under $400 million.

Remediation and posture management — Dazz’s two areas of focus — are key services in the cybersecurity market that Wiz hasn’t covered as well as it wanted to.

“Dazz is a leader in this market, with the best talent and the best customers, which fits perfectly into the company culture,” Assaf Rappaport, CEO of Wiz, said in an interview.

Remediation, which refers to helping customers understand and resolve vulnerabilities, shapes how an enterprise actually handles the many vulnerability alerts it may receive about its network. Posture management is a more preventive product: it lets a company better understand the size, shape, and performance of its network from a security standpoint, allowing it to build better security services around it.

Dazz will continue to operate as a separate entity while it’s integrated into the larger Wiz stack. Wiz has made a name for itself as a “one-stop shop,” and Rappaport said the integrated offering will continue to be a core part of that.

He believes this contrasts with how many other SaaS companies are built. In the security industry, Rappaport said, there are “a lot of Frankenstein mashups where companies prioritize revenue over building a single technology stack that actually works as a platform.” Arguably, integration matters even more in cybersecurity than in other areas of enterprise IT.

Wiz and Dazz already had a close relationship before this deal. Merav Bahat — the CEO who co-founded Dazz with Tomer Schwartz and Yuval Ofir (CTO and VP of R&D, respectively) — worked closely with Assaf Rappaport at Microsoft, which had acquired his previous startup, Adallom.

After Rappaport left to found Wiz with his former Adallom co-founders (CTO Ami Luttwak, VP of Product Yinon Costica, and VP of R&D Roy Reznik), Bahat was one of the first investors. Similarly, when Bahat founded Dazz, Rappaport was a small investor in it.

The connection goes deeper than being work colleagues. Bahat and Rappaport are also close friends, and she was a second family to Mika, Rappaport’s beloved dog, known as Wiz’s Chief Dog Officer (complete with a LinkedIn profile). As the deal came together, the two faced two very sad events: both Bahat’s mother and Mika’s mother died.

“We hope for a new chapter of positivity,” Bahat said. The cycle of life does indeed continue.

Rumors of the takeover began to surface earlier this month; Rappaport confirmed that’s when the two companies began talking seriously.

But that isn’t the only M&A conversation Wiz has been involved in. Earlier this year, Google tried to buy Wiz itself for $23 billion to build out a major cybersecurity business. Wiz walked away from the deal, which would have been the largest in Google’s history, partly because Rappaport believed Wiz could become an even bigger company on its own terms. And that’s what this agreement aims to do.

This acquisition is a test for Wiz, which earlier this year filled its coffers with $1 billion earmarked solely for M&A (it has raised almost $2 billion in total, and we hear the next round will close in a few weeks). Other deals included the purchase of Gem Security for $350 million, but Dazz is its largest acquisition yet.

More mergers and acquisitions could also be coming. “We believe next year will be an acquisition year for us,” Rappaport said.

In an interview with TC, Luttwak said that one of Wiz’s priorities now is to create more tools for developers that take into account what they need to do their jobs.

Enterprises have invested heavily in cloud services to speed up operations and make their IT more agile, but the shift has come with a significantly changed security profile: network and data architectures are more complex and attack surfaces are larger, creating opportunities for malicious hackers to find ways into these systems. Artificial intelligence makes all of this even more challenging where attackers are concerned. (It’s also an opportunity: the new generation of defensive tools is built on artificial intelligence.)

Wiz’s unique selling point is its all-in-one approach. Drawing data from AWS, Azure, Google Cloud, and other cloud environments, Wiz scans applications, data, and network processes for security risk factors and gives its users a series of detailed views for understanding where those threats arise. It offers more than a dozen products covering areas such as code security, container environment security, and supply chain security, as well as numerous partner integrations for those working with other vendors (or to enable features Wiz doesn’t offer directly).

Indeed, Wiz already offered a degree of remediation to help prioritize and fix problems, but as Luttwak said, the Dazz product is simply better.

“We now have a platform that actually provides a 360-degree view of risk across infrastructure and applications,” he said. “Dazz is a leader in attack surface management, the ability to collect vulnerability signals from the application layer across the entire stack and build the most incredible context that allows you to trace the situation back to engineers to help with remediation.”

For Dazz’s part, when I interviewed Bahat in July 2024, when Dazz raised $50 million at a $350 million valuation, she extolled the virtues of building strong solutions, and this week she said the third quarter was “amazing.”

“But market dynamics are what trigger these kinds of transactions,” she said. She confirmed that Dazz had also received takeover offers from other companies. “If you think about the customers and joint customers that we have with Wiz, it makes sense for them to have it on one platform.”

And some of Dazz’s competitors are still going it alone: Cyera, like Dazz an expert in attack surface management, just yesterday announced a $300 million raise at a $5 billion valuation (which confirms our earlier reporting). But what will it do with that money? Make acquisitions, of course.

Wiz says it currently has annual recurring revenue of $500 million (with a goal of $1 billion in ARR next year) and counts more than 45% of the Fortune 100 as customers. Dazz said its ARR is in the tens of millions of dollars and currently growing 500% on a customer base of roughly 100 organizations.


Department of Justice: Google must sell Chrome to end its monopoly

Published

on

By

Google corporate logo hangs outside the Google Germany offices

The U.S. Department of Justice argued Wednesday that Google should sell its Chrome browser as part of the remedies to break the company’s illegal monopoly on online search, according to a filing with the United States District Court for the District of Columbia. If the remedy proposed by the Department of Justice is approved, Google won’t be able to re-enter the search market for five years.

Ultimately, it will be District Court Judge Amit Mehta who determines Google’s final punishment. The decision could fundamentally change one of the most important companies on the planet and alter the structure of the Internet as we know it. This phase of the process is expected to begin sometime in 2025.

In August, Judge Mehta ruled that Google is an illegal monopoly because it abused its power in the search industry. The judge also questioned Google’s control over various web gateways and the company’s payments to third parties to maintain its status as the default search engine.

The Department of Justice’s latest filing says Google’s ownership of Android and Chrome, which are key distribution channels for its search business, poses a “significant challenge” to remedies aimed at ensuring a competitive search market.

The Justice Department proposed other remedies to address the search giant’s monopoly, including having Google spin off its Android mobile operating system. The filing acknowledged that Google and other partners may oppose a spin-off, and suggested stringent remedies instead, including barring Google from using Android to the detriment of search competitors. The Department of Justice suggested that if Google doesn’t abide by those restrictions, it should be forced to sell Android.

Prosecutors also argued that the company should be barred from entering into exclusionary third-party agreements with browser or phone companies, such as Google’s agreement with Apple to be the default search engine on all Apple products.

The Justice Department also argued that Google should license its search data, including ad click data, to competitors.

Additionally, the Department of Justice set conditions prohibiting Google from re-entering the browser market for five years after a Chrome spin-off. It also proposed that after the sale of Chrome, Google should not acquire or own any competing ad text search engine, query-based AI product, or ad technology. The document further identifies provisions that allow publishers to opt out of Google using their data to train artificial intelligence models.

If the court accepts these measures, Google would face a serious setback in competing with OpenAI, Microsoft, and Anthropic on AI technology.

Google’s response

In response, Google said the Department of Justice’s latest filing amounts to a “radical interventionist program” that would harm U.S. residents and the country’s technological standing in the world.

“The Department of Justice’s wildly overblown proposal goes far beyond the Court’s decision. It would destroy the entire range of Google products – even beyond search – that people love and find useful in their everyday lives,” Google’s president of global affairs and chief legal officer Kent Walker said in a blog post.

Walker further argued that the proposal would threaten user security and privacy, degrade the quality of products like Chrome and Android, and harm services such as Mozilla Firefox, which depends on Google’s search engine.

He added that if the proposal is adopted, it would make it harder for people to access Google Search. Moreover, it would hurt the company’s prospects in the AI race.

“The Justice Department’s approach would lead to unprecedented government overreach that would harm American consumers, developers and small businesses and threaten America’s global economic and technological leadership at precisely the moment when it is needed most,” he said.

The company is due to submit its response to the filing next month.

Wednesday’s filing confirms earlier reports that prosecutors were considering pushing Google to spin off Chrome, which controls about 61% of the U.S. browser market, according to web traffic service StatCounter.


Snowflake acquires data management company Datavolo


The Snowflake Inc logo, which represents the American cloud computing-based data company that offers cloud-based storage and analytics services, is being displayed on their pavilion at the Mobile World Congress 2024 in Barcelona, Spain, on February 28, 2024.

Cloud giant Snowflake has agreed to acquire Datavolo, a data pipeline management company, for an undisclosed amount.

Snowflake announced the deal on Wednesday after the closing bell, alongside its third-quarter 2025 earnings. The purchase hasn’t yet closed and is subject to customary closing conditions, Snowflake noted in a release.

Joseph Witt and Luke Roquet, who met while working together at Hortonworks, founded Datavolo in 2023. Witt was previously a VP at Cloudera, and Roquet was Cloudera’s chief marketing officer and, before that, a director of business development at AWS.

Datavolo uses Apache NiFi, an open source data processing project originally developed by the NSA, to power a platform that automates data flows between disparate enterprise data sources. Data “processors” extract, cleanse, transform, and enrich data, including for generative AI use cases.

Datavolo had raised $21 million in venture capital from investors including Citi Ventures and General Catalyst prior to the acquisition, and Snowflake CEO Sridhar Ramaswamy envisions using it to create more comprehensive data pipelines for Snowflake customers. For example, he says Datavolo can let users replace single-use data connectors with flexible pipelines that allow them to move data from cloud and on-premises sources into Snowflake’s data cloud.

“By bringing Datavolo to Snowflake, we are increasing the amount of data captured by Snowflake over the lifecycle, providing our customers with both simplicity and cost savings, without sacrificing data extensibility,” Ramaswamy said in a press release. “We are thrilled to have the Datavolo team join Snowflake as we accelerate the best platform for enterprise data – unstructured and structured, batch and streaming – and committed to the success of the open source community.”

Witt says Snowflake will support and help manage the Apache NiFi project after the acquisition closes. “Data engineering at scale can be extremely expensive and complex, and our goal has always been to simplify our customers’ experiences so they can realize value faster,” he added in the press release. “By joining forces with Snowflake, we can deliver the massive scale and radical simplicity of the Snowflake platform to our customers, ultimately unlocking data engineering for more users.”

Thanks in part to artificial intelligence, demand for data management technologies has increased. Fortune Business Insights estimates that the global enterprise data management market could be worth $224.87 billion by 2032.

However, data management was a challenge for enterprises long before the artificial intelligence boom. According to a 2022 survey by Great Expectations, a data quality platform, 91% of organizations said data quality issues impact their performance.

Against this backdrop, it isn’t surprising that companies like Datavolo are gaining prominence.

It was a big day for Snowflake overall: better-than-expected earnings sent the company’s shares up 19%. In addition to the Datavolo acquisition, the company announced a multi-year partnership with Anthropic to integrate the startup’s AI models into Snowflake’s Cortex AI, Snowflake Intelligence, and Cortex Analyst products.
