
API-Bank: Benchmarking Language Models’ Tool Use

Blog post from Deepgram

Post Details
Company: Deepgram
Date Published:
Author: Brad Nikkel
Word Count: 2,334
Language: English
Summary

Researchers have developed API-Bank, a new benchmark for testing how well large language models (LLMs) use external tools, such as APIs, to accomplish tasks. It evaluates three main abilities: deciding when to call an API, finding the right API for the job, and combining multiple API calls to complete a task. GPT-4 outperforms GPT-3.5 Turbo on most of the tests, but both models struggle with tasks that require multiple rounds of interdependent API calls. The results highlight both the potential for external tools to make LLMs more efficient and useful, and the areas where their tool use still needs improvement.
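The kind of scoring such a benchmark performs can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not API-Bank's actual evaluation harness: the `ApiCall` record, the `make_call`, `call_accuracy`, and `chain_solved` helpers, and the `GetWeather` and `AddAlarm` APIs are all hypothetical. It assumes each dialogue turn has a reference answer that is either a specific API call or `None` (meaning no API should be called), and it requires an exact match on API name and arguments. Counting a multi-step task as solved only when every call in the chain is correct shows why interdependent calls are harder: a single wrong step fails the whole task.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ApiCall:
    """Hypothetical record of one tool invocation: API name plus arguments."""
    name: str
    args: tuple = ()


def make_call(name: str, **kwargs: Any) -> ApiCall:
    # Sort arguments by key so that argument order does not affect equality.
    return ApiCall(name, tuple(sorted(kwargs.items())))


def call_accuracy(predicted: list[Optional[ApiCall]],
                  expected: list[Optional[ApiCall]]) -> float:
    """Fraction of turns where the model's choice matches the reference.

    A reference of None means "no API should be called", so this checks both
    the decision to call at all and the choice of API and arguments.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("need at least one turn to score")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def chain_solved(predicted: list[Optional[ApiCall]],
                 expected: list[Optional[ApiCall]]) -> bool:
    """A multi-step task counts as solved only if every call in the chain matches."""
    return len(predicted) == len(expected) and all(
        p == e for p, e in zip(predicted, expected)
    )


if __name__ == "__main__":
    # Hypothetical three-turn dialogue: no call, then two API calls.
    expected = [None,
                make_call("GetWeather", city="Paris"),
                make_call("AddAlarm", time="07:00")]
    predicted = [None,
                 make_call("GetWeather", city="Paris"),
                 make_call("AddAlarm", time="7am")]
    print(round(call_accuracy(predicted, expected), 3))  # 0.667
    print(chain_solved(predicted, expected))  # False
```

Here two of three turns match, so per-turn accuracy looks reasonable while the chained task still fails. Exact matching is a deliberate simplification; a real evaluator might normalize argument values such as `7am` versus `07:00` before comparing.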