
Secrets Detection - Optimizing filter processes

Blog post from GitGuardian

Post Details
Company: GitGuardian
Date Published:
Author: Guardians
Word Count: 1,514
Language: English
Hacker News Points: -
Summary

In the article, Henri Hubert describes how the team optimized the performance of GitGuardian's secrets detection engine, a task that means balancing precision, recall, and speed. The engine runs in three stages: prevalidation, matching, and postvalidation. Each prevalidation call is short, but the stage is called so often that it accounts for the largest share of total time. By analyzing benchmarks, the team found that prevalidation could be improved by caching frequently accessed properties, reordering steps for efficiency, and adding lightweight keyword searches. These optimizations led to significant speed improvements without sacrificing precision or recall. The study underscores the importance of the initial filtering steps in data processing pipelines and argues that each step should have a single, clear purpose so the system stays understandable and adaptable. The article also hints at further optimizations through regex engine experimentation, to be covered in future publications.
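
The three prevalidation ideas named in the summary (caching a frequently accessed property, running cheaper checks first, and using a lightweight keyword search before any regex) can be illustrated with a minimal Python sketch. This is not GitGuardian's code: the `Document` and `Detector` classes, their method names, the banned-extension list, and the postvalidation rule are all hypothetical and exist only to show the shape of the technique.

```python
import re
from functools import cached_property


class Document:
    """Hypothetical document wrapper (an assumption, not GitGuardian's class)."""

    def __init__(self, filename: str, content: str):
        self.filename = filename
        self.content = content

    @cached_property
    def lowered(self) -> str:
        # Computed on first access, then reused by every detector
        # that inspects this document.
        return self.content.lower()


class Detector:
    """Hypothetical detector with the three stages named in the summary."""

    def __init__(self, keywords, pattern, banned_extensions=(".png", ".lock")):
        self.keywords = tuple(k.lower() for k in keywords)
        self.pattern = re.compile(pattern)
        self.banned_extensions = tuple(banned_extensions)

    def prevalidate(self, doc: Document) -> bool:
        # Cheapest check first: a filename suffix test.
        if doc.filename.endswith(self.banned_extensions):
            return False
        # Lightweight keyword search before any regex is run.
        return any(k in doc.lowered for k in self.keywords)

    def match(self, doc: Document) -> list:
        # The comparatively expensive regex stage.
        return [m.group(0) for m in self.pattern.finditer(doc.content)]

    def postvalidate(self, candidate: str) -> bool:
        # Placeholder rule: reject a single repeated character.
        return len(set(candidate)) > 1

    def scan(self, doc: Document) -> list:
        if not self.prevalidate(doc):
            return []
        return [c for c in self.match(doc) if self.postvalidate(c)]


detector = Detector(keywords=["api_key"], pattern=r"[A-Za-z0-9]{16}")
doc = Document("config.py", 'API_KEY = "abcd1234efgh5678"')
found = detector.scan(doc)
```

Because prevalidation runs for every document and every detector, the sketch keeps it to a suffix test and substring lookups, and the regex only runs on documents that pass both.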