Understanding lag in a streaming pipeline
Blog post from New Relic
New Relic operates a large-scale streaming ETL pipeline that ingests, processes, and stores telemetry data, and customers depend on timely alerts for insight into their systems' behavior. Because timeliness is crucial in such a pipeline, unexpected latency is addressed with several complementary strategies that preserve both functional correctness and real-time observability.

To cope with lag, the pipeline maintains data order where possible, buffers and reorders late-arriving records, and adapts its buffering dynamically based on measured latency. Lag itself is detected in several ways: through traffic-flow metadata carried alongside the data, through synthetic "time waves" injected into the pipeline to measure end-to-end delay, and through external monitoring of individual pipeline segments. Each method has limitations, such as the difficulty of detecting a fully blocked path or of guaranteeing coverage across every route through the pipeline.

By combining these strategies and scoping each one to the pipeline segments it covers best, New Relic can adjust data processing dynamically to minimize the impact of latency, which is vital for maintaining the accuracy and reliability of alerts and overall system performance.
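The "time wave" idea can be sketched as injecting synthetic, timestamped marker records at the head of the pipeline and timing how long they take to emerge at the tail. This is a minimal illustration, not New Relic's implementation; the class and field names (`TimeWaveMonitor`, `injected_at`) are assumptions made for the example.

```python
import time


class TimeWaveMonitor:
    """Estimates pipeline lag by injecting synthetic marker records
    ("time waves") and timing their trip to the pipeline tail."""

    def __init__(self):
        self.last_lag_seconds = 0.0

    def make_wave(self):
        # A wave is just a marker record carrying its injection time.
        return {"type": "time_wave", "injected_at": time.time()}

    def observe(self, record):
        # Called at the pipeline tail. Arriving waves update the lag
        # estimate; a wave that never arrives hints at a blocked path.
        if record.get("type") == "time_wave":
            self.last_lag_seconds = time.time() - record["injected_at"]
            return True  # wave consumed, not real data
        return False
```

One useful property of this approach is that it measures lag even when real traffic is sparse, since the waves themselves provide a steady probe signal.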
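Buffering, reordering, and latency-adaptive hold times can likewise be sketched with a small timestamp-ordered buffer. This is a simplified illustration under assumed names (`AdaptiveReorderBuffer`, `min_hold`, `max_hold`), not the production design: records are held until they are older than a hold window, so late arrivals can still be emitted in timestamp order, and the window is widened or narrowed as observed lag changes.

```python
import heapq
import itertools


class AdaptiveReorderBuffer:
    """Holds records in a min-heap keyed by event timestamp and releases
    them in order once they are older than an adaptive hold window."""

    def __init__(self, min_hold=1.0, max_hold=30.0):
        self.min_hold = min_hold
        self.max_hold = max_hold
        self.hold = min_hold
        self.heap = []
        self.counter = itertools.count()  # tie-breaker for equal timestamps

    def adapt(self, observed_lag):
        # Track latency: hold records roughly as long as the lag we are
        # currently seeing, clamped to a configured range.
        self.hold = max(self.min_hold, min(self.max_hold, observed_lag))

    def push(self, event_time, record):
        heapq.heappush(self.heap, (event_time, next(self.counter), record))

    def pop_ready(self, now):
        # Release, oldest first, every record outside the hold window.
        out = []
        while self.heap and self.heap[0][0] <= now - self.hold:
            out.append(heapq.heappop(self.heap)[2])
        return out
```

Widening the window when lag rises trades extra delay for correct ordering; shrinking it when the pipeline catches up restores timeliness, which mirrors the dynamic adjustment the post describes.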