
A technical look at SonarSweep for GPT-OSS-20B

Blog post from Sonar

Post Details

Company: Sonar
Date Published: -
Author: Joe Tyler
Word Count: 634
Language: English
Hacker News Points: -
Summary

SonarSweep-java-gpt-oss-20b is a fine-tuned version of OpenAI's gpt-oss-20b that targets high-quality Java code generation. It was trained with the SonarSweep pipeline, which improves training-data quality to reduce bugs and vulnerabilities without increasing model size or latency. Compared with the base model, it achieves a ~41% reduction in bugs and security vulnerabilities and an ~18% reduction in code smells, while preserving functional correctness and general question-answering ability. Fine-tuning used a dataset of 70k Java examples curated to follow coding best practices, yielding a model suited to fast, standard tasks rather than complex reasoning and demonstrating how high-quality training data can improve the reliability and safety of LLM-generated code. The release invites the community to explore the model on HuggingFace and shares insights into the benefits of SonarSweep fine-tuning for the security and maintainability of code produced by language models.