🚂 On Track with Apache Kafka – Building a Streaming ETL Solution with Rail Data

Company

Confluent

Date Published

Oct. 16, 2019

Author

Robin Moffatt

Word count

3127

Language

English

Hacker News points

None

URL

www.confluent.io/blog/build-streaming-etl-solutions-with-kafka-and-rail-data

Summary

The post uses Apache Kafka to build a data system that ingests events from an external system, enriches them with other data, transforms them, and drives both analytics and real-time notification applications. KSQL handles the data transformations, Kafka Connect streams the enriched data to target databases and other systems, and Elasticsearch backs the interactive dashboards. The pipeline reserializes JSON data to Avro, flattens nested columns, and resolves foreign keys such as location codes. Aggregations and timestamp filters use event time rather than system time. The benefits cited for Apache Kafka include its robust log-based architecture, scalability, and versatility, which let it handle both streaming and queue processing.
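
To make the transformation steps in the summary concrete, here is a minimal sketch in plain Python. It is not the article's KSQL; it only illustrates the ideas of flattening nested columns, resolving a location code against reference data, and aggregating by event time rather than processing time. The field names (`body`, `loc_code`, `event_ts`), the `LOCATIONS` lookup table, and the sample events are all hypothetical and chosen for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical reference data: location code -> readable name.
LOCATIONS = {"LDS": "Leeds", "YRK": "York"}


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def enrich(event):
    """Flatten an event and resolve its location code (a foreign key)."""
    flat = flatten(event)
    code = flat.get("body_loc_code")  # hypothetical field name
    flat["location_name"] = LOCATIONS.get(code, "UNKNOWN")
    return flat


def event_hour(flat_event):
    """Bucket by the timestamp carried in the event (epoch milliseconds),
    not by the wall-clock time at which the event is processed."""
    ts_ms = flat_event["body_event_ts"]  # hypothetical field name
    moment = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:00")


def count_by_event_hour(flat_events):
    """Count events per event-time hour."""
    return Counter(event_hour(e) for e in flat_events)


if __name__ == "__main__":
    # Made-up sample events, for illustration only.
    sample = [
        {"header": {"msg_type": "0003"},
         "body": {"loc_code": "LDS", "event_ts": 1571184000000}},
        {"header": {"msg_type": "0003"},
         "body": {"loc_code": "YRK", "event_ts": 1571187600000}},
    ]
    enriched = [enrich(e) for e in sample]
    print(enriched)
    print(count_by_event_hour(enriched))
```

Because the hour bucket is derived from the event's own timestamp, replaying old events later produces the same counts, which is the reason for preferring event time over system time in aggregations.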