SuperGLUE: Understanding a Sticky Benchmark for LLMs

Company

Deepgram

Date Published

Aug. 9, 2023

Author

Zian (Andy) Wang

Word count

1208

Language

English

Hacker News points

None

URL

deepgram.com/learn/superglue-llm-benchmark-explained

Summary

SuperGLUE, introduced in 2019, is a more difficult benchmark for evaluating large language models (LLMs) than its predecessor, GLUE. It offers a new set of tasks along with a public leaderboard for comparing model performance. The benchmark includes eight subtasks and two additional "metrics" that analyze a model at a broader scale. The tasks are designed to be solvable by an English-speaking college student yet beyond what language models of the time (late 2019) could accomplish. The final SuperGLUE score is computed as the simple, unweighted average across all tasks. Unlike the Hugging Face leaderboard for LLMs, the SuperGLUE leaderboard is populated mainly by models from smaller research labs rather than well-known closed-source models such as Claude and GPT.
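
The scoring rule mentioned in the summary can be illustrated with a minimal Python sketch. It assumes that a task reporting more than one metric is first reduced to the mean of those metrics, a detail the summary does not spell out. The function name, task names, and numbers are hypothetical placeholders, not real leaderboard results.

```python
from statistics import mean


def superglue_style_score(task_metrics):
    """Return the unweighted mean of per-task scores.

    task_metrics maps a task name to a list of metric values for that task.
    Assumption: a task with several metrics is scored as the mean of them.
    """
    if not task_metrics:
        raise ValueError("need at least one task")
    # Collapse each task to a single number first.
    per_task = [mean(values) for values in task_metrics.values()]
    # Then average across tasks, giving every task equal weight.
    return mean(per_task)


# Placeholder numbers for illustration only, not real results.
example_scores = {
    "task_a": [80.0],        # single-metric task
    "task_b": [70.0, 90.0],  # two metrics, averaged to 80.0
    "task_c": [60.0],
}

overall = superglue_style_score(example_scores)
print(round(overall, 2))
```

Because every task carries equal weight, a task with two metrics does not count twice toward the overall score.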