Benchmarking Supermaven's Long-Context Capabilities

Post Details

Company

Supermaven

Date Published

March 14, 2024

Author

Jacob Jackson, CEO

Word Count

663

Language

English

Hacker News Points

-

Source URL

supermaven.com/blog/benchmarking-long-context

Summary

Supermaven, a code completion tool with a 300,000-token context window, is tested to check whether it actually makes use of that much context. In the "needle in a haystack" test, Supermaven retrieves a specific piece of information embedded in a large body of text, but the test is comparatively easy because the inserted "needle" stands out from its surroundings. A harder dense retrieval task is therefore devised, in which Supermaven must recall key-value pairs spread across a long sequence. The results indicate that retrieval is strongest when the occurrences lie near the beginning or end of the sequence and weaker when they lie in the middle, with accuracy of around 75% when the occurrences are separated by 50,000 tokens. Finally, an analysis of prediction error against context length shows that the error rate decreases as more context is included, which suggests that Supermaven uses the full context available to it.
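To make the dense retrieval task more concrete, here is a minimal Python sketch of how such a test case might be constructed. It is not Supermaven's actual benchmark: the `key = value` line format, the helper names (`random_token`, `build_dense_retrieval_case`, `accuracy`), and the `complete` callback standing in for the model are all assumptions made for illustration. The distance between the two occurrences of a key is counted in lines here, not tokens.

```python
import random
import string


def random_token(rng, length=8):
    """Return a random lowercase identifier (hypothetical key/value format)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def build_dense_retrieval_case(num_pairs, query_index, seed=0):
    """Build one test case: many `key = value` lines, then a repeated key.

    Returns (prompt, expected_value). The prompt ends with the queried key
    followed by " = ", so a correct completion starts with the expected value.
    """
    rng = random.Random(seed)
    pairs = {}
    # A dict keeps keys unique, so each query has exactly one right answer.
    while len(pairs) < num_pairs:
        pairs[random_token(rng)] = random_token(rng)
    items = list(pairs.items())
    key, value = items[query_index]
    lines = [f"{k} = {v}" for k, v in items]
    prompt = "\n".join(lines) + f"\n{key} = "
    return prompt, value


def accuracy(cases, complete):
    """Fraction of cases where `complete(prompt)` starts with the expected value.

    `complete` is a placeholder for whatever model call is under test.
    """
    hits = sum(
        1 for prompt, expected in cases if complete(prompt).startswith(expected)
    )
    return hits / len(cases)
```

Varying `query_index` for a fixed `num_pairs` changes how far back the first occurrence of the key sits, so accuracy can be grouped by that distance to look for the position effects described in the summary.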