Researchers have developed a new framework that reduces generative AI hallucinations by up to 32%.
Today’s generative artificial intelligence (GenAI) platforms are powerful yet imperfect. While the large language models (LLMs) that power GenAI applications produce plausible-sounding answers, those answers often contain errors, which AI researchers refer to as “hallucinations.”
Researchers at SRI have developed a new framework called Pelican that improves the accuracy of GenAI responses, particularly in the context of image interpretation.
The research
“Even these big models can make mistakes,” said Pritish Sahu, a computer scientist at SRI’s Center for Vision Technologies. “That’s where Pelican comes in. It breaks claims down into smaller components and tries to verify individual pieces.”
In a recent paper accepted to the 2024 Conference on Empirical Methods in Natural Language Processing, Sahu and his colleagues demonstrated how, by decomposing responses into simpler sub-claims and validating those sub-claims individually, Pelican improved the accuracy of five different Large Visual Language Models (LVLMs). In the team’s experiments, Pelican reduced hallucination rates by 8-32% across the five LVLMs in question. Critically, Pelican provides users with the flexibility to plug in numerous pre-built analytical tools and programs, which can often verify sub-claims more efficiently and effectively than the core LLMs/LVLMs.
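The general pattern the paper describes, decomposing an answer into sub-claims, routing each sub-claim to a verifier, and keeping the evidence, can be sketched roughly as follows. Everything in this sketch (the SubClaim and Trace classes, the sentence-splitting decomposition, the toy count_tool verifier) is an illustrative assumption, not Pelican’s actual implementation:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: the names below are assumptions for exposition,
# not Pelican's real interfaces.

@dataclass
class SubClaim:
    text: str                  # one simple, independently checkable statement
    verdict: bool | None = None
    evidence: str = ""

@dataclass
class Trace:
    answer: str
    sub_claims: list[SubClaim] = field(default_factory=list)

    def failures(self) -> list[SubClaim]:
        """Sub-claims that did not check out -- the likely source of a hallucination."""
        return [c for c in self.sub_claims if c.verdict is False]

def decompose(answer: str) -> list[SubClaim]:
    """Toy decomposition: one sentence = one sub-claim.
    A real system would prompt an LLM to produce atomic claims."""
    return [SubClaim(s.strip()) for s in answer.split(".") if s.strip()]

def count_tool(claim: str, facts: dict) -> tuple[bool | None, str]:
    """Toy verifier for counting claims, standing in for an external program
    (e.g. an object detector plus a small counting script)."""
    for word, number in {"two": 2, "three": 3, "four": 4}.items():
        if word in claim and "car" in claim:
            detected = facts.get("num_cars", 0)
            return detected == number, f"detector found {detected} cars"
    return None, ""            # this tool does not apply to the claim

def verify_answer(answer: str, facts: dict, tools) -> Trace:
    """Check every sub-claim with the first tool that applies, recording
    verdicts and evidence so the whole trace stays inspectable."""
    trace = Trace(answer=answer)
    for claim in decompose(answer):
        for tool in tools:
            verdict, evidence = tool(claim.text, facts)
            if verdict is not None:
                claim.verdict, claim.evidence = verdict, evidence
                break
        trace.sub_claims.append(claim)
    return trace
```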
“Pelican provides, for the first time, a seamless combination of code generation and execution with natural language reasoning,” said Ajay Divakaran, technical director of the Vision and Learning Laboratory in SRI’s Center for Vision Technologies. “And all this task distribution and outsourcing is completely transparent.”
That means that Pelican doesn’t just improve the accuracy of the final answers; the researchers also have access to all the smaller questions and answers that the framework used. So if Pelican provides an answer that is incorrect, it’s easy for the researchers to look at the sub-tasks and see where the error originated.
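Continuing the sketch above, inspecting that trace to find where an error originated might look something like this; the simulated detector output and the example answer are, again, assumptions for illustration:

```python
# Simulated scenario: the model claims three cars, but the detector found two.
facts = {"num_cars": 2}
trace = verify_answer("There are three cars. The road is wet.", facts, [count_tool])

for claim in trace.failures():
    print(f"Failed sub-claim: '{claim.text}' ({claim.evidence})")
# -> Failed sub-claim: 'There are three cars' (detector found 2 cars)
```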
“Pelican provides better accuracy and better explainability,” Divakaran said. “If you know where the breakdown happened, then you know concretely what you need to do to fix it.”
Why it matters
Around the world, AI researchers are intently focused on minimizing GenAI hallucinations. While GenAI shows tremendous potential to improve productivity and access to knowledge, persistent errors will limit the scalability of GenAI tools, particularly in fields and use cases where complete accuracy is essential.
In the near term, there is not likely to be a single solution to GenAI hallucinations. Instead, frameworks like Pelican will be layered into GenAI tools, steadily improving their accuracy.
“We used a narrow, concrete problem to showcase Pelican, but the framework is applicable to any kind of task that can be decomposed into individual sub-tasks,” Divakaran said. “We are working to widen the scope of Pelican and to make it more flexible and powerful in the future.”