The long-time AI researcher discusses the need to allay public concerns about AI through responsible AI initiatives.
The booming field of AI has deep roots at SRI, which first launched its AI Center back in 1966. One of the field’s leaders is SRI’s Ray Perrault, a Distinguished Computer Scientist at the Artificial Intelligence Center, where he served as Director from 1988 to 2017. Perrault entered the field of computer science when it was still being developed; he recalls that he enrolled in the first-ever programming class offered at McGill University back in the late 1960s.
Much of Perrault’s research at SRI has focused on natural language processing, which involves designing computer programs to parse the complex nuances of normal human speech and writing. This research has garnered enormous attention in the past couple of years thanks to the widespread adoption of large language models (LLMs), such as ChatGPT. Users can pose plain-language questions to these chatbots, which in turn offer intelligible and often highly accurate responses.
Despite the impressive capabilities of natural language-capable virtual assistants and chatbots, Perrault notes they have flaws that are becoming more obvious. Arguably the most notable is so-called hallucination, when an LLM produces false, nonsensical, or inconsistent answers. Hallucination arises from a mix of factors, including algorithm-driven over-extrapolation (at best; outright guessing at worst) as well as contradictions and gaps in the training data. Nearly everyone who has spent time working with LLMs has encountered it, and Perrault shares his own anecdote of asking ChatGPT “to write a summary of my career, and it got two or three things very clearly wrong.”
Other limitations of LLMs are their weak planning and symbolic reasoning, shortcomings that surface whenever a task demands more than recombining patterns from training text. “One of the things that I find surprising is in spite of all the chess games they have read in the literature that they are undoubtedly trained on, LLMs are still way worse at chess than the simple programs that run on your phone,” said Perrault.
Solving these issues will not be easy, Perrault said. As the “large” in their name suggests, LLMs are trained on massive data sets encompassing billions of examples of written language; some models ingest essentially the entire public internet, which suggests the data sets cannot grow much bigger. Training has also grown remarkably expensive, Perrault noted, with the latest version of Google’s Gemini chatbot, Gemini Ultra, estimated to have cost $191 million to train.
“We’re looking at astronomical amounts of money, and if training is just being done using the same fundamental algorithms we’ve got now, it won’t solve fundamental problems,” said Perrault.
Instead, one approach being spearheaded at SRI is small language models (SLMs) that work with closed-ended, relevant expert content. SLMs can be programmed with human expert-derived guidelines from the get-go, constraining the models’ errors and hallucinations, as well as their cost.
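One way to picture this guideline-driven constraint is a small model wrapped in a validation layer that checks each candidate answer against expert-authored rules and refuses rather than guesses when a rule fails. The Python below is a minimal sketch of that idea, not SRI’s actual implementation; the model call, rule set, and every name in it are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Guideline:
    """An expert-authored rule: a predicate the answer must satisfy."""
    name: str
    check: Callable[[str], bool]  # returns True if the answer complies
    message: str                  # shown when the rule is violated

def small_model_answer(question: str) -> str:
    """Stand-in for a call to a domain-specific small language model."""
    return "Typical curing time is 24 to 48 hours."  # placeholder output

# Hypothetical guidelines an expert might encode up front.
GUIDELINES = [
    Guideline(
        name="no_speculation",
        check=lambda ans: "probably" not in ans.lower(),
        message="answers must not speculate",
    ),
    Guideline(
        name="stays_in_domain",
        check=lambda ans: any(t in ans.lower() for t in ("curing", "coating")),
        message="answers must stay within the expert domain",
    ),
]

def constrained_answer(question: str) -> str:
    """Return the model's answer only if every guideline passes."""
    answer = small_model_answer(question)
    violations = [g.message for g in GUIDELINES if not g.check(answer)]
    if violations:
        # Refuse rather than hallucinate, and say which rules failed.
        return "Unable to answer confidently (" + "; ".join(violations) + ")."
    return answer

if __name__ == "__main__":
    print(constrained_answer("How long should the coating cure?"))
```

Because the rule set is small and fixed by experts rather than learned from web-scale data, both the failure modes and the compute bill stay bounded, which is the trade-off the SLM approach makes.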
Taking stock of AI
Perrault’s comprehensive view of AI’s past, present, and future is informed by his role as Co-Chair of the AI Index Steering Committee. Part of the Stanford Institute for Human-Centered Artificial Intelligence (HAI), the committee has for the last seven years prepared the AI Index, an annual report that surveys the field and captures key trends. The latest report, the 2024 AI Index, was published on April 15.
“The field evolves quickly and has much bigger and imminent social consequences, so keeping track of it is not easy, and that is the mission of the AI Index,” said Perrault.
As an indication of the growing impact and promise of AI, Perrault noted that the first report, in 2017, ran some 60 pages, while the latest is about 500. Among the important topics in the latest report are public opinion and policy developments. The report includes survey results from several firms that collectively show a public increasingly convinced that AI will dramatically affect their lives. The report also conveys an accompanying surge in nervousness among respondents, with feelings of concern now outweighing those of excitement.
Addressing these concerns, in policy and in broader society, will be integral to AI’s continued safe and beneficial development and deployment, Perrault said. To that end, the latest AI Index addresses for the first time how to define and measure what researchers in the field are calling Responsible AI. Among the fundamental questions Responsible AI asks of AI systems: Are they fair? Can they keep data private? Are they secure?
“Being able to measure these aspects of new AI systems is essential to policymakers faced with deciding how to regulate them,” said Perrault. “We’re going to get better at asking these kinds of questions and figuring out the policy issues to be addressed for the good of humankind.”