Company
Date Published
Author
David Pilato
Word count
657
Language
-
Hacker News points
None

Summary

In this third installment of a series on enriching postal addresses using the Elastic Stack, David Pilato demonstrates how to enrich an existing dataset with the BANO dataset using Logstash. The process begins by reading a CSV file with Filebeat instead of using the http input plugin; Logstash's beats input plugin receives the events that Filebeat ships. A CSV filter then parses each line, which includes a geolocation point, and the event is enriched through an Elasticsearch lookup whose results are sorted by geographical distance from that point. The enrichment runs at approximately 140 documents per second, with an average event latency of 20-40 ms. Despite some slowdown due to the Elasticsearch lookups, the approach remains efficient for ETL operations compared with using an Elasticsearch ingest pipeline. Pilato also suggests alternatives, such as reading data from SQL databases with the jdbc input plugin or connecting to data that already lives in Elasticsearch for further enrichment. The series concludes with the prospect of indexing other open data sources to cover regions beyond France.
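
The enrichment step summarized above (parse a CSV line, then pick the reference address closest to its geolocation point) can be sketched in plain Python. This is only an illustration of the idea, not the Logstash pipeline from the post: the lookup here is an in-memory scan using the haversine formula rather than an Elasticsearch query, and the column names (`name`, `lat`, `lon`), the `REFERENCE` list, and its labels are all hypothetical.

```python
import csv
import io
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Hypothetical stand-in for the reference address index (labels are made up).
REFERENCE = [
    {"address": "ref-paris", "lat": 48.8566, "lon": 2.3522},
    {"address": "ref-lyon", "lat": 45.7640, "lon": 4.8357},
]

# Hypothetical input file content; the real CSV layout is not given here.
SAMPLE_CSV = "name,lat,lon\nalpha,48.857,2.352\nbeta,45.76,4.83\n"


def enrich(row, reference):
    """Attach the nearest reference address to one parsed CSV row."""
    lat, lon = float(row["lat"]), float(row["lon"])
    nearest = min(
        reference,
        key=lambda ref: haversine_km(lat, lon, ref["lat"], ref["lon"]),
    )
    return {**row, "address": nearest["address"]}


def enrich_csv(text, reference):
    """Parse CSV text and enrich every row with its nearest address."""
    reader = csv.DictReader(io.StringIO(text))
    return [enrich(row, reference) for row in reader]


if __name__ == "__main__":
    for event in enrich_csv(SAMPLE_CSV, REFERENCE):
        print(event)
```

A linear scan like this only works for a tiny reference list; delegating the nearest-point search to a geo-aware index is what makes the same idea practical on a dataset the size of BANO.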