Reliable Reasoning with Large Language Models

Today’s LLMs evoke a mixture of admiration and dissatisfaction. Advances in language modeling have given us much to admire: impressive text generation, seemingly complex reasoning, and the ability to write code, use tools, and much more. Automation of various kinds looks imminent. At the same time, these models can be stunningly bad: they contradict themselves and make up things that just don’t make sense. Automation looks risky at best!

Reliability remains one of the fundamental hurdles in translating the potential of LLMs into adoption in critical application areas. In this three-part talk, I will present three investigations into improving the reliability of language models on complex reasoning problems. I will show how we can formalize a form of unreliable shortcut-based reasoning, and how synthetic data generation techniques can improve the reliability of reasoning in multi-step question answering. Then, I will present our recent work on developing a reliable execution environment for programmatically testing LLM-based automation that solves complex tasks using multiple everyday applications.

Niranjan is an Assistant Professor in the Computer Science department at Stony Brook University, where he heads the Language Understanding and Reasoning lab (LUNR). Prior to joining Stony Brook, he was a post-doctoral researcher at the University of Washington and one of the early members of the Allen Institute for Artificial Intelligence. Niranjan completed his PhD in Computer Science at the University of Massachusetts Amherst.