
A technical look at SonarSweep for GPT-OSS-20B

Blog post from Sonar

Post Details

Company: Sonar
Date Published: -
Author: Joe Tyler
Word Count: 634
Language: English
Hacker News Points: -
Summary

SonarSweep-java-gpt-oss-20b is a fine-tuned version of OpenAI's gpt-oss-20b that targets high-quality Java code generation. It was trained with the SonarSweep pipeline, which improves training-data quality to reduce bugs and vulnerabilities without increasing model size or latency. Compared with the base model, it achieves a ~41% reduction in bugs and security vulnerabilities and an ~18% reduction in code smells, while preserving functional correctness and general question-answering ability. Fine-tuning used a dataset of 70k Java examples curated to follow coding best practices, yielding a model suited to fast, standard tasks rather than complex reasoning and demonstrating how high-quality training data can improve the reliability and safety of LLM-generated code. The release invites the community to explore the model on HuggingFace and shares insights into the benefits of SonarSweep fine-tuning for the security and maintainability of code produced by language models.