Building reliable secrets detection - Secrets in source code (episode 3/3)

Post Details

Company

GitGuardian

Date Published

Dec. 18, 2020

Author

Mackenzie Jackson

Word Count

2,379

Language

English

Hacker News Points

2

Source URL

blog.gitguardian.com/secrets-in-source-code-episode-3-3-building-reliable-secrets-detection

Summary

This GitGuardian article discusses the challenges and methods involved in detecting secrets such as API keys and credentials in code repositories, particularly on platforms like GitHub. Secrets sprawl, where sensitive data is dispersed across systems and codebases, poses significant security risks because secrets must be both tightly controlled and widely distributed. Detection is difficult because secrets rarely follow a consistent pattern, so algorithms must first identify probable candidates. GitGuardian uses a two-step process: it first identifies candidate secrets with methods such as high-entropy detection and regular expressions, then filters out false positives using techniques that include API validation and context analysis. The article emphasizes that reliable detection and filtering require a combination of methods, and that GitGuardian refines its algorithms using the data set it has built by scanning billions of GitHub commits. According to the article, this large-scale analysis helps GitGuardian identify the subtle indicators that separate true positives from false positives, giving it an edge in secrets detection.
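
The two-step process described in the summary can be illustrated with a minimal Python sketch. This is not GitGuardian's implementation: the regular expressions, the entropy threshold of 4.0 bits per character, the placeholder hint list, and every function name below are assumptions chosen for illustration. Step one collects candidates with a prefixed-key pattern and a Shannon entropy check on long token-like strings; step two drops candidates whose surrounding line suggests a placeholder. API validation, which the summary also mentions, is left out because it would require provider-specific network calls.

```python
import math
import re
from collections import Counter
from typing import List

# All patterns and thresholds here are illustrative assumptions.
PREFIXED_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")    # shape of an AWS access key ID
GENERIC_TOKEN = re.compile(r"[A-Za-z0-9+/_\-]{20,}")  # any long token-like string
ENTROPY_THRESHOLD = 4.0                                # bits per character (assumed)
PLACEHOLDER_HINTS = ("example", "xxxx", "dummy", "placeholder", "your_")


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_candidates(line: str) -> List[str]:
    """Step 1: collect strings that might be secrets."""
    candidates = PREFIXED_KEY.findall(line)
    for token in GENERIC_TOKEN.findall(line):
        # Keep long tokens only when their characters look random enough.
        if token not in candidates and shannon_entropy(token) >= ENTROPY_THRESHOLD:
            candidates.append(token)
    return candidates


def line_suggests_placeholder(line: str) -> bool:
    """Step 2 helper: a crude context check for documentation-style values."""
    lowered = line.lower()
    return any(hint in lowered for hint in PLACEHOLDER_HINTS)


def scan(text: str) -> List[str]:
    """Run both steps over every line and return the surviving candidates."""
    findings = []
    for line in text.splitlines():
        if line_suggests_placeholder(line):
            continue  # treat every candidate on this line as a false positive
        findings.extend(find_candidates(line))
    return findings
```

For example, calling `scan` on a line that assigns `AKIAIOSFODNN7EXAMPLE` (the sample key ID used in AWS documentation) returns nothing, because the line contains the hint "example", while a long string of repeated characters is never flagged because its entropy is zero. A single heuristic of this kind is easy to fool in both directions, which is consistent with the summary's point that reliable detection depends on combining several methods.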