Enriching Your Postal Addresses With the Elastic Stack - Part 3

Post Details

Company

Elastic

Date Published

May 21, 2018

Author

David Pilato

Word Count

657

Language

-

Hacker News Points

-

Source URL

www.elastic.co/blog/enriching-your-postal-addresses-with-the-elastic-stack-part-3

Summary

In this third installment of a series on enriching postal addresses with the Elastic Stack, David Pilato shows how to enrich an existing dataset with BANO address data using Logstash. Instead of the http-input plugin, the pipeline reads a CSV file with Filebeat and receives the lines in Logstash through the beats-input plugin. A CSV filter then parses each line, which includes a geolocation point, and the record is enriched through an Elasticsearch lookup whose results are sorted by geographical distance. The enrichment runs at approximately 140 documents per second, with an average event latency of 20-40 ms. Although the Elasticsearch lookups slow the pipeline down somewhat, the method remains efficient for ETL operations compared with using Elasticsearch as an ingest pipeline. Pilato also suggests alternatives, such as reading the data from a SQL database with the jdbc-input plugin or connecting to data that already lives in Elasticsearch for further enrichment. The series concludes with the prospect of indexing other open data sources to cover regions beyond France.
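
The distance-based enrichment described in the summary can be illustrated with a small standalone sketch. It is not the Logstash pipeline from the post: it swaps the Elasticsearch lookup for an in-memory list and a haversine distance, only to show the idea of attaching the nearest known address to each CSV row. The column names (`name`, `lat`, `lon`), the `REFERENCE` entries, and the function names are all assumptions made for this example.

```python
import csv
import io
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical stand-in for the indexed address data (not real BANO records).
REFERENCE = [
    {"address": "sample address in Paris", "lat": 48.8566, "lon": 2.3522},
    {"address": "sample address in Lyon", "lat": 45.7640, "lon": 4.8357},
    {"address": "sample address in Marseille", "lat": 43.2965, "lon": 5.3698},
]


def enrich(row, reference):
    """Return a copy of row with the address of the closest reference point."""
    lat, lon = float(row["lat"]), float(row["lon"])
    nearest = min(
        reference,
        key=lambda ref: haversine_km(lat, lon, ref["lat"], ref["lon"]),
    )
    return {**row, "address": nearest["address"]}


def enrich_csv(text, reference):
    """Parse CSV text (assumed header: name,lat,lon) and enrich every row."""
    reader = csv.DictReader(io.StringIO(text))
    return [enrich(row, reference) for row in reader]


SAMPLE_CSV = "name,lat,lon\nfirst,48.86,2.35\nsecond,45.75,4.85\n"

if __name__ == "__main__":
    for enriched in enrich_csv(SAMPLE_CSV, REFERENCE):
        print(enriched["name"], "->", enriched["address"])
```

A linear scan like this only works for a handful of reference points. With a dataset the size of BANO, the nearest-point search is better delegated to a geo-aware index, which is the role Elasticsearch plays in the pipeline the summary describes.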