How Refinery Helps With Sampling Complex Event Data
Blog post from Honeycomb
Sampling extracts a subset of data from a larger dataset in order to make inferences about the whole. It is not flawless, but it can be highly effective for managing large volumes of complex event data when implemented with Refinery, Honeycomb’s trace-aware sampling proxy. GOAT, an e-commerce platform specializing in designer sneakers and apparel, illustrates the need for an efficient sampling solution because of its high volume of customer-facing requests. At hnycon 2021, Kevan Carstensen, a Backend Engineer at GOAT, shared that their small team relies on sampling with Refinery to manage this data volume, cut through noise, and resolve issues efficiently. Kevan also emphasized that sampling does not suit every team and should not be the default choice: it carries trade-offs, such as increased cognitive load and ongoing maintenance, that need to be understood first. GOAT integrated Refinery into their internal Platform as a Service (PaaS) and tuned it to their specific load requirements, which improved both visibility and cost management. The process highlighted the importance of rules-based sampling for staying within event quotas and budgets, allowing GOAT to keep its infrastructure stable while focusing on the insights the sampled data provides.
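To make the idea of rules-based, trace-aware sampling more concrete, here is a minimal Python sketch. It is not Refinery's configuration format or its implementation, and it does not reflect GOAT's actual rules. The span fields (`trace_id`, `status_code`, `duration_ms`), the rule names, and the sample rates are all hypothetical, chosen only for illustration. The sketch shows two things: a decision made once per trace so that all of its spans are kept or dropped together, and a different sample rate per rule so that interesting traffic (errors, slow requests) is kept more often than routine traffic.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Callable

# Hypothetical span shape: {"trace_id": str, "status_code": int, "duration_ms": float}
Span = dict


@dataclass
class Rule:
    name: str
    matches: Callable[[list[Span]], bool]
    sample_rate: int  # keep roughly 1 in N matching traces


# Rules are checked in order; the first match wins. All values are illustrative.
RULES = [
    Rule("errors", lambda spans: any(s.get("status_code", 200) >= 500 for s in spans), 1),
    Rule("slow", lambda spans: any(s.get("duration_ms", 0) > 1000 for s in spans), 5),
    Rule("default", lambda spans: True, 100),
]


def choose_rule(spans: list[Span]) -> Rule:
    """Return the first rule that matches the trace (the last rule matches everything)."""
    for rule in RULES:
        if rule.matches(spans):
            return rule
    raise ValueError("no rule matched; RULES should end with a catch-all")


def keep_trace(trace_id: str, sample_rate: int) -> bool:
    """Deterministic decision: the same trace ID always gets the same answer."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket % sample_rate == 0


def sample(spans: list[Span]) -> list[Span]:
    """Keep or drop a whole trace; kept spans record the rate used for the decision."""
    rule = choose_rule(spans)
    if not keep_trace(spans[0]["trace_id"], rule.sample_rate):
        return []
    return [{**s, "sample_rate": rule.sample_rate} for s in spans]
```

Recording the sample rate on each kept span is what lets a backend weight the sampled events when estimating totals: a span kept at a rate of 100 stands in for roughly 100 similar ones. In this sketch, every trace containing a 5xx span is kept, while routine traffic is reduced to about 1 percent, which is the kind of trade-off between visibility and event budget that the talk describes.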