Code Runner: Secure, scalable code execution for model evaluation

Post Details

Company

Labelbox

Date Published

Dec. 20, 2024

Author

Dmytro Apollonin

Word Count

866

Language

-

Hacker News Points

-

Source URL

labelbox.com/blog/code-runner-secure-scalable-code-execution-for-model-evaluation-2

Summary

Labelbox has introduced Code Runner, a platform feature that lets users execute code directly within the evaluation workflow for large language models (LLMs). It is aimed at coding-related projects, where it helps improve response quality by returning precise outputs (standard output, standard error, execution time, and warnings) without users needing to leave the platform. Code Runner is built on Google Cloud Run, which provides a secure, scalable environment that executes code in isolated, temporary containers tailored to specific programming languages such as Python and JavaScript. Security measures include hosting the service in separate Google Cloud Platform projects and communicating over Private Service Connect, which prevents public exposure and restricts network access. The architecture is designed for scalability, handling multiple requests efficiently, and for reliability, since each execution runs in a clean, stateless environment. With this feature, Labelbox lets users test code dynamically and interactively during evaluation, which supports feedback and continuous improvement.
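To make the shape of such an execution concrete, here is a minimal local sketch in Python. It is not Labelbox's implementation and does not provide the container or network isolation described above: it only illustrates the idea of running a snippet in a fresh, throwaway working directory and returning standard output, standard error, exit code, and execution time. The names `ExecutionResult` and `run_snippet` are hypothetical, and the timeout value is an arbitrary assumption.

```python
import subprocess
import sys
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ExecutionResult:
    """Hypothetical result record; field names are illustrative only."""
    stdout: str
    stderr: str
    exit_code: int
    duration_seconds: float
    timed_out: bool


def run_snippet(source: str, timeout_seconds: float = 5.0) -> ExecutionResult:
    """Run Python source in a child process inside a temporary directory.

    The directory is deleted afterwards, so no files persist between runs.
    This is process-level separation only, not a security sandbox.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(source, encoding="utf-8")
        started = time.perf_counter()
        try:
            completed = subprocess.run(
                # -I runs the interpreter in isolated mode (ignores
                # PYTHON* environment variables and the user site directory).
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_seconds,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising on timeout.
            return ExecutionResult(
                stdout="",
                stderr=f"timed out after {timeout_seconds} s",
                exit_code=-1,
                duration_seconds=time.perf_counter() - started,
                timed_out=True,
            )
        return ExecutionResult(
            stdout=completed.stdout,
            stderr=completed.stderr,
            exit_code=completed.returncode,
            duration_seconds=time.perf_counter() - started,
            timed_out=False,
        )


if __name__ == "__main__":
    result = run_snippet("print(sum(range(10)))")
    print(result.stdout.strip(), result.exit_code, result.timed_out)
```

In this sketch, Python warnings emitted by the snippet simply appear in `stderr`; a production service running in per-language containers would likely report them separately and enforce far stricter resource and network limits.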