Introducing Provenance and Attribution: Minimize IP liability for GenAI output
Blog post from Tabnine
State-of-the-art large language models (LLMs) such as Claude 3.5 Sonnet and GPT-4o have significantly improved generative AI applications like AI code assistants. Because these models are trained on vast amounts of internet data, which may include code under restrictive licenses, their output can reproduce copyleft-licensed code.

Tabnine's new Provenance and Attribution feature mitigates this risk. It checks AI-generated code against publicly visible GitHub repositories, flags any matches, and reports the source and license type of each match, which helps software development teams stay compliant. The feature offers two levels of protection: training-time protection, through the Tabnine Protected 2 model, which is trained exclusively on permissively licensed code; and inference-time protection, which notifies users when LLM output matches existing code.

Provenance and Attribution supports a broad range of development activities. It tracks code matches and lets administrators manage potential IP infringements, so teams can meet the requirements of a specific project or organization. The capability is currently in private preview for Tabnine Enterprise customers and is intended to give legal and compliance teams more confidence in using a variety of models while keeping generated code compliant and reducing IP risk.
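
To make the inference-time check more concrete, here is a minimal sketch of how matching generated code against known sources could work in principle. It is not Tabnine's implementation, which the post does not describe. Every name in it (`KnownSource`, `Match`, `shingles`, `check_provenance`), the repository names, the token-shingle fingerprinting technique, the 0.6 threshold, and the set of licenses treated as copyleft are assumptions chosen for illustration. A small in-memory list stands in for an index of public GitHub repositories.

```python
import hashlib
import re
from dataclasses import dataclass

# Illustrative set of licenses treated as restricted (an assumption, not a policy).
COPYLEFT = {"GPL-2.0", "GPL-3.0", "AGPL-3.0", "LGPL-3.0"}


@dataclass(frozen=True)
class KnownSource:
    repo: str      # hypothetical repository name
    license: str   # SPDX-style license identifier
    code: str


@dataclass(frozen=True)
class Match:
    repo: str
    license: str
    restricted: bool  # True when the license is in COPYLEFT
    overlap: float    # share of the generated snippet's shingles found in the source


def shingles(code: str, k: int = 5) -> set[str]:
    """Hash every run of k consecutive tokens, so whitespace changes are ignored."""
    tokens = re.findall(r"\w+|[^\w\s]", code)
    return {
        hashlib.sha1(" ".join(tokens[i:i + k]).encode()).hexdigest()
        for i in range(max(len(tokens) - k + 1, 0))
    }


def check_provenance(
    generated: str, corpus: list[KnownSource], threshold: float = 0.6
) -> list[Match]:
    """Return sources whose code overlaps the generated snippet, best match first."""
    gen = shingles(generated)
    if not gen:
        return []  # snippet too short to fingerprint
    matches = []
    for src in corpus:
        overlap = len(gen & shingles(src.code)) / len(gen)
        if overlap >= threshold:
            matches.append(
                Match(src.repo, src.license, src.license in COPYLEFT, overlap)
            )
    return sorted(matches, key=lambda m: m.overlap, reverse=True)


# Hypothetical corpus standing in for an index of public repositories.
corpus = [
    KnownSource(
        "example/gpl-utils",
        "GPL-3.0",
        "def clamp(value, low, high):\n    return max(low, min(value, high))\n",
    ),
    KnownSource("example/mit-utils", "MIT", "def add(a, b):\n    return a + b\n"),
]

# Same code as the GPL source, but with different indentation.
generated = "def clamp(value, low, high):\n        return max(low, min(value, high))"
matches = check_provenance(generated, corpus)

for m in matches:
    status = "restricted" if m.restricted else "permissive"
    print(f"{m.repo} ({m.license}, {status}): {m.overlap:.0%} overlap")
```

Hashing overlapping token windows, rather than comparing raw strings, keeps the check tolerant of formatting differences. A production system would also need a scalable index and a license policy that administrators can configure, both of which are outside the scope of this sketch.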