Why RAG won’t solve generative AI’s hallucination problem
Hallucinations – essentially the lies that generative AI models tell – pose a big problem for companies looking to integrate the technology into their operations.
Because models have no real intelligence and are simply predicting words, images, speech, music, and other data according to a private schema, they sometimes get it wrong. Very wrong. In a recent piece in The Wall Street Journal, a source recounts an instance where Microsoft’s generative AI invented meeting attendees and implied that conference calls covered topics that weren’t actually discussed on the call.
As I wrote a while ago, hallucinations may be an unsolvable problem with today’s transformer-based model architectures. But some generative AI vendors suggest that they can be eliminated, more or less, through a technical approach called retrieval augmented generation (RAG).
Here’s how one vendor, Squirro, pitches it:
At the core of the offering is the concept of Retrieval Augmented LLMs or Retrieval Augmented Generation (RAG) built into the solution… (our generative AI) is unique in its promise of zero hallucinations. Every piece of information it generates is traceable to a source, ensuring credibility.
Here’s a similar pitch from SiftHub:
Using RAG technology combined with fine-tuned large language models and industry-specific knowledge training, SiftHub enables companies to generate personalized responses with zero hallucinations. This guarantees greater transparency and reduced risk, and inspires absolute confidence in using AI for all their needs.
RAG was pioneered by data scientist Patrick Lewis, a researcher at Meta and University College London and lead author of the 2020 paper that coined the term. Applied to a model, RAG retrieves documents that might be relevant to a given question – for example, the Wikipedia page for the Super Bowl – using what is essentially a keyword search, and then asks the model to generate an answer given this additional context.
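To make the mechanics concrete, here is a minimal sketch of that retrieve-then-generate loop in Python. The corpus, the keyword scorer, and the prompt template are all hypothetical stand-ins for illustration, not any vendor’s actual pipeline; the final prompt would then be sent to whatever LLM API you use.

```python
# Minimal RAG sketch (hypothetical corpus and helpers, not a specific product's pipeline):
# 1) retrieve documents relevant to the query, 2) prepend them to the prompt,
# 3) ask the model to answer using only that context.

def keyword_retrieve(query: str, corpus: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank documents by how many distinct query words they contain."""
    keywords = set(query.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda kv: len(keywords & set(kv[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:top_k]]

def build_prompt(query: str, documents: list[str]) -> str:
    """Pack the retrieved documents into the prompt as extra context."""
    context = "\n\n".join(documents)
    return (
        "Answer the question using only the context below, "
        "and cite the passage you relied on.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

corpus = {
    "super_bowl": "The Kansas City Chiefs won Super Bowl LVIII in February 2024.",
    "unrelated": "Croissants are a French pastry made from laminated dough.",
}
question = "Who won the Super Bowl last year?"
prompt = build_prompt(question, keyword_retrieve(question, corpus))
print(prompt)  # this augmented prompt is what actually gets sent to the model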
“When you interact with a generative AI model like ChatGPT or Llama and ask a question, by default the model answers from its ‘parametric memory’ – i.e. the knowledge stored in its parameters as a result of training on massive data from the web,” explained David Wadden, a research scientist at AI2, the AI-focused research arm of the nonprofit Allen Institute. “But, just as you’re likely to give more accurate answers if you have a reference (e.g. a book or a file) in front of you, the same can be true for models in some cases.”
RAG is undeniably useful – it allows you to attribute the things a model generates to retrieved documents to verify their factuality (with the added benefit of avoiding potentially copyright-infringing regurgitation). RAG also lets companies that don’t want their documents used to train a model – say, companies in highly regulated industries such as healthcare and law – allow their models to draw on those documents in a more secure and temporary way.
But RAG certainly can’t stop a model from hallucinating. And it has limitations that many vendors gloss over.
Wadden says RAG works best in “knowledge-intensive” scenarios where a user wants to use the model to address an “information need” – for example, to find out who won the Super Bowl last year. In these scenarios, the document that answers the question is likely to contain many of the same keywords as the question (e.g., “Super Bowl,” “last year”), making it relatively easy to find via keyword search.
Things get trickier with reasoning-intensive tasks such as coding and math, where it’s harder to specify in a keyword-based search query the concepts needed to answer a request, much less identify which documents might be relevant.
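A toy illustration of that gap, using the same naive keyword-overlap idea as above (the documents and scoring function are invented for illustration): the factual question shares obvious words with the document that answers it, while the math question shares none with the document that would actually help.

```python
def overlap(query: str, doc: str) -> int:
    """Count how many distinct query words also appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

factual_q = "who won the super bowl last year"
factual_doc = "The Super Bowl last year was won by the Kansas City Chiefs."

math_q = "show that the sum of the first n odd numbers is n squared"
math_doc = "Induction proofs establish a base case and an inductive step."

print(overlap(factual_q, factual_doc))  # high overlap: easy to retrieve by keywords
print(overlap(math_q, math_doc))        # zero, even though the doc describes the technique needed
```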
Even for basic questions, models can get “distracted” by irrelevant content in the documents, particularly long documents where the answer isn’t obvious. Or they can, for reasons as yet unknown, simply ignore the contents of retrieved documents and rely on their parametric memory instead.
RAG is also expensive in terms of the hardware needed to apply it at scale.
That’s because retrieved documents, whether from the web, an internal database, or somewhere else, have to be stored in memory – at least temporarily – so the model can refer back to them. Another expense is the compute for the increased context a model has to process before generating its response. For a technology already notorious for the amount of compute and electricity it requires even for basic operations, this is a serious consideration.
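As a rough back-of-the-envelope illustration of that memory cost (the figures are illustrative, roughly matching a 7B-parameter transformer with full multi-head attention served in 16-bit precision; grouped-query attention would shrink them): every extra context token from retrieved documents enlarges the key-value cache the model keeps in accelerator memory while it generates.

```python
# Rough KV-cache estimate for extra retrieved context (illustrative sizes, not a specific model or product).
num_layers = 32        # transformer layers, roughly a 7B-parameter model
hidden_size = 4096     # model width; keys and values each store one vector of this size per layer
bytes_per_value = 2    # fp16 / bf16

bytes_per_token = 2 * num_layers * hidden_size * bytes_per_value  # K and V for every layer
extra_context_tokens = 4000                                        # e.g. a few retrieved documents

extra_gb = bytes_per_token * extra_context_tokens / 1e9
print(f"{bytes_per_token / 1024:.0f} KiB per token, ~{extra_gb:.1f} GB per request for the added context")
```

And that is only the cache: the model also has to spend attention compute over all of those extra tokens before it produces a single word of its answer.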
That doesn’t mean RAG can’t be improved. Wadden noted many ongoing efforts to train models to make better use of RAG-retrieved documents.
Some of these efforts involve models that can “decide” when to use the documents, or models that can choose not to perform retrieval in the first place if they deem it unnecessary. Others are focusing on ways to index massive document datasets more efficiently, and on improving search through better representations of documents – representations that go beyond keywords.
“We’re pretty good at retrieving documents based on keywords, but not so good at retrieving documents based on more abstract concepts, like the proof technique needed to solve a math problem,” Wadden said. “Research is needed to build document representations and search techniques that can identify relevant documents for more abstract generation tasks. I think this is mostly an open question at this point.”
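One of the directions alluded to above – representing documents by embeddings rather than keywords – can be sketched with an off-the-shelf sentence-embedding model. The example below assumes the `sentence-transformers` library and its `all-MiniLM-L6-v2` model purely for illustration; it is not the approach of any particular vendor or of the research Wadden describes.

```python
# Dense (embedding-based) retrieval sketch: rank documents by semantic similarity
# to the query instead of keyword overlap. Assumes `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedding model

docs = [
    "Induction proofs establish a base case and an inductive step.",
    "The Super Bowl last year was won by the Kansas City Chiefs.",
]
query = "show that the sum of the first n odd numbers is n squared"

doc_vecs = model.encode(docs, convert_to_tensor=True)
query_vec = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_vec, doc_vecs)[0]  # cosine similarity per document
best = int(scores.argmax())
print(docs[best], float(scores[best]))  # ideally the induction document ranks first despite sharing no keywords
```

Even with embeddings, though, matching a question to the abstract concept it depends on – the kind of case Wadden describes – remains the open research problem.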
So RAG can help reduce a model’s hallucinations – but it’s not the answer to all of AI’s hallucinatory problems. Beware of any vendor that tries to claim otherwise.