Secrets Story: The Prefixed Secrets That Tried to Get Away
Blog post from Semgrep
Secret scanning tools, essential for application security, often miss valid secrets due to design choices aimed at minimizing false positives, such as reliance on non-word boundaries and keywords. This leads to undetected leaks of sensitive data like API keys and tokens across platforms such as GitHub, OpenAI, and Anthropic. The blog post explores how secret scanners work, their methods to reduce false positives, and the resultant false negatives with examples from real repositories. Issues arise from prefix collisions, lack of unique identifiers, and overly strict boundary checks, which prevent detection of legitimate secrets. Recommendations include refining detection rules, ensuring precise token format specifications, and encouraging third-party services to document token formats and establish verification endpoints. The post also suggests that services should consider monitoring public repositories and implementing measures like token expiration to mitigate risks associated with leaked secrets.
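To make the boundary problem concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `tok_` prefix and 20-character body are an invented token format, not any real provider's, and `STRICT` and `RELAXED` are hypothetical rules rather than the rules of any particular scanner. It shows how a rule wrapped in word boundaries (`\b`) can miss a valid token that sits next to an underscore, and one possible way to loosen the check without matching inside longer alphanumeric runs.

```python
import re

# Hypothetical token format, invented for this example:
# the literal prefix "tok_" followed by exactly 20 alphanumeric characters.
TOKEN_BODY = r"tok_[A-Za-z0-9]{20}"

# Strict rule: a word boundary is required on both sides of the token.
# Because "_" counts as a word character, there is no boundary between
# "_" and a neighboring letter or digit, so such tokens are skipped.
STRICT = re.compile(r"\b" + TOKEN_BODY + r"\b")

# Relaxed rule (one possible refinement, an assumption): only reject a
# candidate when it is glued to another letter or digit, which would mean
# it is part of a longer alphanumeric string.
RELAXED = re.compile(r"(?<![A-Za-z0-9])" + TOKEN_BODY + r"(?![A-Za-z0-9])")

# A fake token that fits the invented format (not a real credential).
FAKE = "tok_" + "a1B2c3D4e5F6g7H8i9J0"

SAMPLES = [
    "API_KEY=" + FAKE,   # clean boundaries on both sides
    "secret_" + FAKE,    # underscore directly before the token
    FAKE + "_backup",    # underscore directly after the token
    "x" + FAKE,          # embedded in a longer alphanumeric run
]

if __name__ == "__main__":
    for sample in SAMPLES:
        strict_hit = bool(STRICT.search(sample))
        relaxed_hit = bool(RELAXED.search(sample))
        print(f"{sample!r}: strict={strict_hit} relaxed={relaxed_hit}")
```

In this sketch the strict rule only finds the first sample, while the relaxed rule also finds the two tokens adjacent to an underscore and still ignores the last one. Whether loosening a rule this way is acceptable depends on how distinctive the token format is: the less unique the prefix, the more false positives a relaxed boundary is likely to let through.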