Benchmarking AI-Powered Code Fix Generation for Mobile App Crashes

Company

Instabug

Date Published

April 10, 2025

Author

Sherief Abul-Ezz

Word count

1412

Language

English

Hacker News points

None

URL

www.instabug.com/blog/benchmarking-ai-code-fix-mobile-crashes

Summary

SmartResolve's AI model evaluation compares how well various large language models (LLMs) generate code fixes for mobile app crashes. The top-performing models on iOS are GPT-4o, Claude 3.5 Haiku V1, and Claude 3.5 Sonnet V1, which demonstrate strong coherence and correctness. In contrast, LLaMA-3-70b and OpenAI o1 rank poorly, largely because of weak results on Android, particularly in correctness and relevance. The post recommends a hybrid model selection strategy for SmartResolve's production use: high-coherence models for structured responses, combined with stable models that perform evenly across platforms. The evaluation results will be updated as new models enter the market.
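
The hybrid selection strategy can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the model names (`model_a`, `model_b`), the scores, the `min_correctness` threshold, and the `pick_model` function are hypothetical placeholders, not figures or code from the benchmark or from SmartResolve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """Per-platform evaluation scores on a 0.0 to 1.0 scale (placeholder values)."""
    coherence: float
    correctness: float
    relevance: float


# Hypothetical scores keyed by (model, platform). These are NOT benchmark results.
SCORES = {
    ("model_a", "ios"): Score(coherence=0.9, correctness=0.8, relevance=0.8),
    ("model_a", "android"): Score(coherence=0.9, correctness=0.4, relevance=0.5),
    ("model_b", "ios"): Score(coherence=0.7, correctness=0.7, relevance=0.7),
    ("model_b", "android"): Score(coherence=0.7, correctness=0.7, relevance=0.7),
}


def pick_model(platform: str, need_structured: bool, min_correctness: float = 0.6) -> str:
    """Pick a model for one platform.

    Models below the correctness floor on that platform are excluded first.
    If a structured response is needed, prefer the highest coherence;
    otherwise prefer the model whose weakest metric is highest (the most balanced).
    """
    candidates = {
        model: score
        for (model, plat), score in SCORES.items()
        if plat == platform and score.correctness >= min_correctness
    }
    if not candidates:
        raise LookupError(f"no model meets the correctness floor on {platform}")

    if need_structured:
        return max(candidates, key=lambda m: candidates[m].coherence)
    return max(
        candidates,
        key=lambda m: min(
            candidates[m].coherence,
            candidates[m].correctness,
            candidates[m].relevance,
        ),
    )


if __name__ == "__main__":
    for platform in ("ios", "android"):
        print(platform, pick_model(platform, need_structured=True))
```

With these placeholder numbers, the high-coherence model is chosen on iOS, while on Android it falls below the correctness floor and the more stable model is chosen instead. Real routing would need the actual per-platform scores from the evaluation.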