
Using parallel Logstash pipelines to improve persistent queue performance

Blog post from Elastic

Post Details
Company: Elastic
Author: Alex Marquardt
Word Count: 948
Language: English
Summary

Logstash uses in-memory queues by default to buffer events between pipeline stages, but enabling persistent queues to prevent data loss can significantly reduce performance because each pipeline's queue writes to disk from a single thread. This drop is highlighted by a case where throughput fell by 75% once persistent queues were enabled: a single pipeline cannot drive the disk with more than one writer thread, even when it has multiple inputs. To mitigate this, the blog suggests running multiple parallel Logstash pipelines within a single process and load balancing the input data across them, which increases throughput by allowing more simultaneous disk I/O operations. In the real-world scenario described, four parallel pipelines brought throughput much closer to the in-memory baseline, though still about 25% lower. This approach works around the single-threaded persistent-queue writes by increasing the number of pipelines, and therefore threads, writing to disk concurrently.
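A minimal sketch of the approach described, using Logstash's `pipelines.yml` to define parallel pipelines in one process. The pipeline ids, ports, and config paths here are hypothetical; the idea is that each pipeline gets its own persistent queue, and an upstream load balancer (or a Beats `hosts` list naming all four ports) spreads events across them.

```yaml
# pipelines.yml -- four parallel pipelines, each with its own
# persistent queue, so four threads can write to disk concurrently.
# Each pipeline would listen on its own port (e.g. 5044-5047) and
# upstream senders load balance across those ports.
- pipeline.id: ingest-1
  path.config: "/etc/logstash/conf.d/ingest-1.conf"
  queue.type: persisted
- pipeline.id: ingest-2
  path.config: "/etc/logstash/conf.d/ingest-2.conf"
  queue.type: persisted
- pipeline.id: ingest-3
  path.config: "/etc/logstash/conf.d/ingest-3.conf"
  queue.type: persisted
- pipeline.id: ingest-4
  path.config: "/etc/logstash/conf.d/ingest-4.conf"
  queue.type: persisted
```

The per-pipeline `.conf` files would be identical apart from the input port, keeping filters and outputs the same so the split is purely for parallel queue I/O.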