
Autonomous systems represent a major frontier in artificial intelligence, but understanding why they fail remains a significant challenge. SMU Assistant Professor of Computer Science Huo Yintong and fellow researchers investigated the causes of failure in these increasingly complex systems. Their research introduced Cibench, a comprehensive benchmark designed to rigorously evaluate Large Language Model (LLM)-based agents, with a particular focus on their ability to collaborate and to perform complex tasks involving tool use and interaction with real-world data. It also offered a detailed taxonomy of failure causes, intended to help build more reliable and effective autonomous agents.