Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that provide utility for robot navigation. However, when humans communicate with robots in the real world, spoken instructions may carry ambiguity and uncertainty. While LLMs are proficient at processing text in human conversations, they often struggle with the nuances of verbal instructions and thus remain prone to hallucination and misplaced trust in human commands. In this work, we present TrustNavGPT, an LLM-based audio-guided navigation agent that uses affective cues in spoken communication -- elements such as tone and inflection that convey meaning beyond words -- to assess the trustworthiness of human commands and make effective, safe decisions.
Current LLM-based navigation methods struggle to make accurate decisions when faced with ambiguous audio instructions. Our approach incorporates affective cues from spoken communication into the LLM, enabling it to evaluate the reliability of human instructions based on both semantic and vocal uncertainty, thus allowing for safe and successful navigation.
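To make the idea of combining the two uncertainty signals concrete, here is a minimal sketch. It assumes the semantic and vocal uncertainty estimates are already available as scalars in [0, 1]; the equal weighting and the threshold value are illustrative assumptions, not values from the paper.

```python
def trust_command(semantic_uncertainty: float,
                  vocal_uncertainty: float,
                  threshold: float = 0.5) -> bool:
    """Treat a spoken command as trustworthy only if the combined
    semantic + vocal uncertainty stays below a threshold."""
    combined = 0.5 * semantic_uncertainty + 0.5 * vocal_uncertainty
    return combined < threshold

# A confident, unambiguous command is executed; a hesitant, ambiguous one
# triggers a safer fallback such as asking the human for clarification.
print(trust_command(0.2, 0.1))  # True  -> execute the command
print(trust_command(0.4, 0.8))  # False -> fall back to a clarification request
```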
Human audio passes through an audio-processing module that transcribes it, while a vocal-cue model identifies three essential affective cues. We then prompt a language model to generate five possible next-step actions and select one based on the next-token logit probability. Notably, the semantic transcription alone leads to the red choice, whereas incorporating the vocal cues results in the green choice being selected. Finally, a tool library translates the chosen language instruction into agent actions for navigation.
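The sketch below illustrates the selection step only, under stated assumptions: a HuggingFace causal LM stands in for the LLM used in the system, and the model name, cue labels, prompt wording, and candidate actions are all hypothetical. It shows how five candidate actions can be ranked by the next-token logit of their option letters, conditioned on both the transcript and the vocal cues.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any causal LM with a compatible tokenizer works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def choose_action(transcript: str, vocal_cues: dict, options: list[str]) -> str:
    """Pick one of the candidate next-step actions by comparing the logit
    (next-token log-probability) of each option letter A, B, C, ..."""
    cue_text = ", ".join(f"{k}: {v}" for k, v in vocal_cues.items())
    labeled = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    prompt = (
        f"Spoken instruction (transcribed): \"{transcript}\"\n"
        f"Vocal cues: {cue_text}\n"
        f"Candidate next-step actions:\n{labeled}\n"
        "Answer with a single letter. Answer:"
    )
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_token_logits = model(input_ids).logits[0, -1]  # logits over the vocabulary
    # Compare the logit of each option letter (" A", " B", ...) and keep the best.
    letter_ids = [tokenizer.encode(f" {chr(65 + i)}")[-1] for i in range(len(options))]
    best = int(torch.argmax(next_token_logits[letter_ids]))
    return options[best]

# Example: the same transcript can yield a different choice once hesitant
# vocal cues are included in the prompt.
options = [
    "Proceed to the door on the left.",
    "Stop and ask the human to confirm the target.",
    "Proceed to the door on the right.",
    "Return to the starting position.",
    "Explore the corridor ahead.",
]
print(choose_action("go to the, uh, left door I think",
                    {"tone": "hesitant", "pitch": "rising", "energy": "low"},
                    options))
```

In this sketch the trust assessment is implicit: the vocal cues are injected into the prompt, so an option that defers to the human (asking for confirmation) can receive a higher logit than directly executing an ambiguous command.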
coming soon