AI agents fail half the time: new benchmark reveals weaknesses

20 Aug 2025

Autonomous agents represent a major frontier in artificial intelligence, but understanding why these systems fail remains a significant challenge. SMU Assistant Professor of Computer Science Huo Yintong and fellow researchers investigated the causes of failure in these increasingly complex systems. Their research introduced Cibench, a comprehensive benchmark designed to rigorously evaluate Large Language Model (LLM)-based agents, with a particular focus on how well they collaborate and carry out complex tasks involving tool use and interaction with real-world data. The work also offered a detailed taxonomy of failure causes, intended to help researchers build more reliable and effective autonomous agents in the future.

https://quantumzeitgeist.com/ai-agents-fail-half-the-time-new-benchmark-reveals…
