From “Secondary Storage” To Just “Storage”: A Tale of Lambdas, LZ4, and Garbage Collection
Blog post from Honeycomb
Two years after introducing Secondary Storage as a cheaper but slower alternative to primary NVMe storage, Honeycomb has dramatically improved query performance by adding AWS Lambda and optimizing how data is processed. Originally, data moved to S3 for Secondary Storage was compressed with gzip, and queries against it were noticeably slow: they needed a large amount of CPU, and the available compute had not grown to match. Lambda provides CPUs on demand in 100 ms increments, so Honeycomb can now process large queries concurrently and finish them much sooner without the prohibitive cost of permanently provisioned resources. Switching from gzip to LZ4 compression and adopting more efficient memory management further cut query times and CPU overhead, making Secondary Storage queries nearly as fast as those served from primary storage. Together with code optimizations that remove unnecessary memory allocations, these changes have effectively doubled backend query performance.
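The concurrent processing described above follows a fan-out and merge shape: split the query across storage segments, let many short-lived workers each aggregate one segment, then combine the partial results. The Python sketch below is only an illustration of that shape, not Honeycomb's implementation. The names `Partial`, `scan_segment`, `merge`, and `fan_out_query` are hypothetical, in-memory lists stand in for segments stored in S3, and a local thread pool stands in for Lambda invocations.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Partial:
    """Partial aggregate computed over one storage segment (hypothetical type)."""
    count: int = 0
    total: float = 0.0


def scan_segment(segment, predicate):
    """Stand-in for one short-lived worker: filter and aggregate one segment."""
    partial = Partial()
    for value in segment:
        if predicate(value):
            partial.count += 1
            partial.total += value
    return partial


def merge(partials):
    """Combine per-segment partials into a single result on the coordinator."""
    result = Partial()
    for p in partials:
        result.count += p.count
        result.total += p.total
    return result


def fan_out_query(segments, predicate, max_workers=8):
    """Scan all segments concurrently, then merge the partial results.

    Assumption: the thread pool only models the fan-out shape. In the
    setup the post describes, each unit of work would run on its own
    on-demand Lambda CPU rather than on a local thread.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda seg: scan_segment(seg, predicate), segments))
    return merge(partials)


# Four made-up segments of latency values in milliseconds.
segments = [[120.0, 80.0, 310.0], [45.0, 500.0], [], [220.0]]
result = fan_out_query(segments, lambda latency_ms: latency_ms > 100.0)
print(result.count, result.total)
```

Because each worker returns only a small partial aggregate, the coordinator's merge step stays cheap no matter how many segments the query touches, which is what makes renting a burst of short-lived CPUs attractive compared with keeping the same capacity running permanently.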