Garbage Day: How Pinecone Safely Deletes Billions of Objects at Scale

Blog post from Pinecone

Post Details
Company
Date Published
Author
Lea Wang-Tomic
Word Count
1,396
Language
English
Summary

Pinecone built a system named Janitor to find and delete stale and orphaned objects in its immutable blob storage, which reduces storage costs. Because the storage is immutable, each write creates a new file rather than modifying an existing one, so superseded data accumulates and cannot be deleted immediately without risking service reliability. Janitor splits deletions into three modes (normal, orphan, and customer deletion), each with its own safety protocols and timing. It follows an identify, verify, and execute protocol so that only verified data is deleted, and it takes a conservative approach to minimize errors. Its testing framework uses property-based tests and a mock clock to simulate long-term behavior in seconds, so bugs can be found and fixed quickly. With Janitor, data deletion at Pinecone has become more predictable and auditable, easing earlier operational challenges and keeping storage cost-effective and operationally efficient without impacting customer service.
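
To make the identify, verify, and execute protocol and the mock-clock testing idea concrete, here is a minimal Python sketch. It is not Pinecone's implementation: the `Janitor` class shape, the `MockClock` helper, the in-memory `store` and `live_keys` structures, and the grace periods are all hypothetical, chosen only to illustrate the flow the summary describes.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"      # stale objects superseded by newer writes
    ORPHAN = "orphan"      # objects nothing refers to any more
    CUSTOMER = "customer"  # objects removed at a customer's request


# Hypothetical grace periods in seconds; the source gives no real values.
GRACE_SECONDS = {Mode.NORMAL: 3600, Mode.ORPHAN: 7 * 86400, Mode.CUSTOMER: 86400}


class MockClock:
    """Manually advanced clock, so a test can 'wait' a week instantly."""

    def __init__(self, now: float = 0.0) -> None:
        self._now = now

    def now(self) -> float:
        return self._now

    def advance(self, seconds: float) -> None:
        self._now += seconds


@dataclass
class Candidate:
    key: str
    mode: Mode
    identified_at: float


class Janitor:
    """Toy janitor over an in-memory store (assumed shape, for illustration)."""

    def __init__(self, store: dict, live_keys: set, clock: MockClock) -> None:
        self.store = store          # key -> object bytes
        self.live_keys = live_keys  # keys that are still referenced
        self.clock = clock
        self.candidates: dict[str, Candidate] = {}

    def identify(self, mode: Mode) -> None:
        # Step 1: record unreferenced objects as candidates; delete nothing yet.
        for key in self.store:
            if key not in self.live_keys and key not in self.candidates:
                self.candidates[key] = Candidate(key, mode, self.clock.now())

    def verify(self, c: Candidate) -> bool:
        # Step 2: re-check the reference and require the mode's grace period.
        aged = self.clock.now() - c.identified_at >= GRACE_SECONDS[c.mode]
        return aged and c.key not in self.live_keys

    def execute(self) -> list[str]:
        # Step 3: delete only candidates that pass verification right now.
        deleted = []
        for key, c in list(self.candidates.items()):
            if key in self.live_keys:
                del self.candidates[key]  # referenced again: drop the candidate
            elif self.verify(c):
                self.store.pop(key, None)
                del self.candidates[key]
                deleted.append(key)
        return sorted(deleted)
```

In a test, one would construct `Janitor` with a `MockClock`, call `identify`, confirm that `execute` deletes nothing, then call `clock.advance(...)` past the grace period and confirm the deletion happens. A property-based test could generate random sequences of writes, reference changes, and clock advances, and assert the invariant that no key in `live_keys` is ever removed from `store`.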