SuperGLUE: Understanding a Sticky Benchmark for LLMs

Post Details

Company

Deepgram

Date Published

Aug. 9, 2023

Author

Zian (Andy) Wang

Word Count

1,208

Company Posts That Month

26

Language

English

Hacker News Points

-

Source URL

deepgram.com/learn/superglue-llm-benchmark-explained

Summary

SuperGLUE, introduced in 2019, is a more demanding benchmark for evaluating large language models (LLMs) than its predecessor, GLUE. It offers a new set of tasks along with a public leaderboard for comparing model performance. The benchmark includes eight subtasks and two additional "metrics" that analyze the model at a broader scale. These tasks are designed to be solvable by an English-speaking college student but to exceed what language models of the time (late 2019) could accomplish. The final SuperGLUE score is computed as the simple average across all tasks. Unlike the Hugging Face leaderboard for LLMs, the SuperGLUE leaderboard is populated mainly by models developed by smaller research labs rather than well-known closed-source models such as Claude and GPT.
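
The "simple average across all tasks" can be illustrated with a minimal Python sketch. The numbers below are hypothetical placeholders, not real leaderboard results, and the function name superglue_score is made up for this example. The sketch also assumes that a task reporting more than one metric (for example accuracy and F1) is first collapsed to a single task score by averaging its metrics; the summary above does not spell out that step.

```python
from statistics import mean

# Hypothetical per-task metric values on a 0-100 scale.
# These are illustrative placeholders, NOT real leaderboard results.
task_metrics = {
    "BoolQ":   [80.0],
    "CB":      [90.0, 94.0],   # assumed: two metrics for one task
    "COPA":    [88.0],
    "MultiRC": [70.0, 30.0],   # assumed: two metrics for one task
    "ReCoRD":  [84.0, 86.0],   # assumed: two metrics for one task
    "RTE":     [82.0],
    "WiC":     [70.0],
    "WSC":     [89.0],
}


def superglue_score(metrics_by_task):
    """Average each task's metrics, then take the unweighted mean over tasks."""
    # Step 1 (assumption): collapse multi-metric tasks to one score per task.
    task_scores = [mean(values) for values in metrics_by_task.values()]
    # Step 2: simple (unweighted) average across all tasks.
    return mean(task_scores)


print(superglue_score(task_metrics))
```

Because every task carries equal weight, a weak result on a single task lowers the overall score by the same amount regardless of how large that task's dataset is.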

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
LLM	7	2,871	337	112	+58%