API-Bank: Benchmarking Language Models’ Tool Use

Blog post from Deepgram

Post Details
Company: Deepgram
Date Published:
Author: Brad Nikkel
Word Count: 2,334
Company Posts That Month: 26
Language: English
Hacker News Points: -
Summary

Researchers have developed a new benchmark called API-Bank for testing how well large language models (LLMs) use external tools such as APIs to accomplish tasks. The benchmark evaluates LLMs' abilities in three main areas: deciding when to call an API, finding the right tool for the job, and employing multiple APIs to complete a task. GPT-4 outperforms GPT-3.5 Turbo on most of the tests, but both models struggle with tasks requiring multiple rounds of interdependent API calls. The results highlight the potential for LLMs to become more efficient and useful by incorporating external tools, as well as areas where further improvements are needed.
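
To make the three abilities concrete, here is a minimal Python sketch of how a predicted sequence of API calls could be scored against a reference sequence. It is an illustration only, not API-Bank's actual scoring code: the `ApiCall` schema, the `make_call` and `call_accuracy` helpers, and the API names `GetToday` and `AddAgenda` are all hypothetical, and strict position-by-position matching is an assumption chosen to show why interdependent multi-call tasks are hard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCall:
    """One tool invocation (hypothetical schema, not the benchmark's own)."""
    name: str
    args: tuple = ()  # sorted (key, value) pairs, so equal calls compare equal


def make_call(name, **kwargs):
    """Build an ApiCall with keyword arguments in a canonical order."""
    return ApiCall(name, tuple(sorted(kwargs.items())))


def call_accuracy(predicted, expected):
    """Fraction of positions where the predicted call equals the expected one.

    Dividing by the longer of the two lists means both missing calls and
    unnecessary extra calls lower the score.
    """
    if not predicted and not expected:
        return 1.0  # correctly decided that no API call was needed
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / max(len(predicted), len(expected))


# A two-step task where the second call depends on the first call's result.
expected = [
    make_call("GetToday"),
    make_call("AddAgenda", date="2023-06-01", content="dentist"),
]
predicted = [
    make_call("GetToday"),
    make_call("AddAgenda", date="2023-06-02", content="dentist"),  # wrong date
]
score = call_accuracy(predicted, expected)
```

Under these assumptions, a single wrong argument in the second call halves the score, and calling an API when none was needed scores zero, which mirrors the "when to call" and "multiple interdependent calls" abilities described in the summary.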

Trends Found in this Post
| Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM |
|---|---|---|---|---|---|
| LLM | 77 | 2,871 | 337 | 112 | +58% |
| AI Model Fine-tuning | 3 | 653 | 128 | 64 | -3% |
| Vector Search | 1 | 1,743 | 241 | 77 | +53% |