API-Bank: Benchmarking Language Models’ Tool Use

Post Details

Company

Deepgram

Date Published

Aug. 28, 2023

Author

Brad Nikkel

Word Count

2,334

Company Posts That Month

26

Language

English

Hacker News Points

-

Source URL

deepgram.com/learn/apibank-llm-benchmark

Summary

Researchers have developed API-Bank, a benchmark for testing how well large language models (LLMs) use external tools such as APIs to accomplish tasks. It evaluates three abilities: deciding when to call an API, finding the right tool for the job, and employing multiple APIs to complete a task. GPT-4 outperforms GPT-3.5 Turbo on most of the tests, but both models struggle with tasks that require multiple rounds of interdependent API calls. The results suggest that external tools could make LLMs more efficient and useful, and they point to areas where further improvement is needed.

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	77	2,871	337	112	+58%
AI Model Fine-tuning	3	653	128	64	-3%
Vector Search	1	1,743	241	77	+53%