Code Suggestion Attribution
Blog post from Windsurf
Public code repositories often include a LICENSE file stating whether the code is covered by a permissive license, such as MIT, or a non-permissive one, such as GPL. Permissive licenses allow commercial use, while non-permissive licenses require sharing the source code and can have legal consequences if violated. Large language models (LLMs) complicate this picture: developers who accept AI-generated code could unknowingly use non-permissively licensed code and take on liability. To address this, AI code generation tools must ensure compliance either proactively, by filtering training data, or reactively, by filtering generated suggestions. One method precomputes fingerprints of public codebases and checks generated code against them to identify matches; the check must be fast enough to meet the latency demands of code generation. Audit logs can additionally track code matches at the individual and team levels. Codeium offers attribution filters for enterprise customers and is exploring improved matching algorithms, configurability for specific repositories, and compliance solutions for companies using generative AI.
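
The fingerprint idea can be illustrated with a minimal sketch. This is not Codeium's implementation, which the post does not describe; it assumes a simple scheme that hashes overlapping runs of tokens (shingles) and flags a suggestion when enough of its hashes appear in a precomputed index. The names `FingerprintIndex`, `check_suggestion`, `NON_PERMISSIVE`, the shingle size `K`, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import hashlib
import re
from collections import defaultdict

K = 5                         # tokens per shingle (hypothetical choice)
NON_PERMISSIVE = {"GPL-3.0"}  # hypothetical policy: licenses to block


def tokenize(code: str) -> list[str]:
    # Identifiers, numbers, and single punctuation characters, so that
    # whitespace and formatting differences do not affect matching.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def fingerprints(code: str, k: int = K) -> set[str]:
    # Hash every run of k consecutive tokens; truncated SHA-1 keeps keys small.
    tokens = tokenize(code)
    shingles = (" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1))
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in shingles}


class FingerprintIndex:
    """Precomputed map: fingerprint -> set of (repo, license) sources."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, repo: str, license_id: str, code: str) -> None:
        for fp in fingerprints(code):
            self._index[fp].add((repo, license_id))

    def match(self, suggestion: str, threshold: float = 0.5):
        # Return (repo, license, overlap) for each source whose share of the
        # suggestion's fingerprints is at least `threshold`.
        fps = fingerprints(suggestion)
        if not fps:
            return []
        hits = defaultdict(int)
        for fp in fps:
            for source in self._index.get(fp, ()):
                hits[source] += 1
        return sorted(
            (repo, lic, count / len(fps))
            for (repo, lic), count in hits.items()
            if count / len(fps) >= threshold
        )


def check_suggestion(index, suggestion, user, team, audit_log) -> bool:
    # Record every match for later review, then allow the suggestion only
    # if no matching source carries a blocked license.
    matches = index.match(suggestion)
    for repo, lic, overlap in matches:
        audit_log.append({"user": user, "team": team, "repo": repo,
                          "license": lic, "overlap": overlap})
    return not any(lic in NON_PERMISSIVE for _, lic, _ in matches)


if __name__ == "__main__":
    index = FingerprintIndex()
    index.add("example/gpl-repo", "GPL-3.0",
              "def add(a, b):\n    total = a + b\n    return total\n")
    log = []
    allowed = check_suggestion(
        index, "def add(a,b):\n  total = a+b\n  return total", "alice", "core", log)
    print(allowed, log)
```

Because the index is built ahead of time, checking a suggestion costs one dictionary lookup per fingerprint, which is what makes it plausible within a completion latency budget. A production system would likely need something more robust to renamed identifiers and partial copies than exact token shingles.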