Company
Date Published
Author
Andrew Benton
Word count
976
Language
English
Hacker News points
None

Summary

Riza describes how to securely evaluate the code-generation capabilities of large language models (LLMs) using its Code Interpreter API, which runs untrusted code in an isolated environment. The approach uses Riza as the execution engine for HumanEval, a benchmark that traditionally requires running potentially unsafe model-generated code directly on the user's machine. By routing execution through Riza's API instead, users mitigate that security risk while still measuring whether an LLM produces functionally correct code. The guide walks through integrating Riza into the HumanEval framework: setting up the necessary API keys, generating completions with Meta's llama3 70b model, and modifying the existing execution scripts to submit code to Riza for sandboxed execution. In this evaluation, llama3 70b passed approximately 44% of the 164 HumanEval problems (roughly 72) on its first attempt.
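The substitution described above can be sketched as follows. This is a minimal illustration, not the author's exact code: `build_check_program` mirrors how HumanEval's harness concatenates a problem's prompt, a model completion, and the test suite into one script, and `exec_on_riza` is a hypothetical helper showing where that script would be submitted to Riza's API instead of being executed locally (the endpoint URL, request shape, and `RIZA_API_KEY` variable are assumptions; consult Riza's documentation for the real interface).

```python
import json
import os
import urllib.request

# Assumed endpoint for Riza's Code Interpreter API (illustrative only).
RIZA_EXEC_URL = "https://api.riza.io/v1/execute"


def build_check_program(prompt: str, completion: str,
                        test: str, entry_point: str) -> str:
    """Assemble one HumanEval problem into a self-contained script:
    function signature/docstring (prompt) + model output (completion)
    + the benchmark's test code + a call to its check() entry point."""
    return prompt + completion + "\n" + test + f"\ncheck({entry_point})\n"


def exec_on_riza(code: str) -> dict:
    """Submit the assembled script to Riza rather than exec()-ing it
    locally, so untrusted model output never runs on this machine.
    Hypothetical request shape -- hedged, not Riza's documented schema."""
    req = urllib.request.Request(
        RIZA_EXEC_URL,
        data=json.dumps({"language": "PYTHON", "code": code}).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['RIZA_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A problem counts as passed when the submitted script exits cleanly (all assertions in `check()` hold); the pass rate is simply passed problems divided by the 164 total.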