Company
Date Published
Author
Alex Marquardt
Word count
948
Language
English
Hacker News points
None

Summary

Logstash uses in-memory queues to buffer events between pipeline stages, but enabling persistent queues to prevent data loss can significantly reduce performance; in one case, throughput fell by 75% after persistent queues were enabled. The slowdown occurs because a single pipeline writes to its persistent queue on disk from only one thread, even when it has multiple inputs. To mitigate this, the blog suggests running multiple parallel Logstash pipelines within a single process and load balancing the input data across them, which increases throughput by allowing several threads to write to disk simultaneously. In the real-world scenario described, running four parallel pipelines brought throughput much closer to the level seen without persistent queues, though it remained about 25% lower. This approach works around the single-threaded disk I/O of an individual pipeline by increasing the number of concurrent writers to disk.
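The approach described above maps onto Logstash's pipelines.yml, which defines multiple pipelines running inside one Logstash process. Below is a minimal sketch, assuming four pipelines that each read their own config file and use a persistent queue; the pipeline IDs and config paths are illustrative and not taken from the original post.

# pipelines.yml -- four parallel pipelines, each with its own persistent queue
- pipeline.id: parallel_1
  path.config: "/etc/logstash/conf.d/parallel_1.conf"
  queue.type: persisted
- pipeline.id: parallel_2
  path.config: "/etc/logstash/conf.d/parallel_2.conf"
  queue.type: persisted
- pipeline.id: parallel_3
  path.config: "/etc/logstash/conf.d/parallel_3.conf"
  queue.type: persisted
- pipeline.id: parallel_4
  path.config: "/etc/logstash/conf.d/parallel_4.conf"
  queue.type: persisted

Each pipeline maintains its own queue on disk, so four pipelines give four concurrent writers. Input load balancing would then be handled upstream, for example by having each pipeline's config listen on a different port and pointing a shipper such as Filebeat at all of them with load balancing enabled.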