
How to Process GitHub Data with Kafka Streams

Blog post from Confluent

Post Details

Company: Confluent
Date Published: -
Author: Lucia Cerchie, Bill Bejeck
Word Count: 1,528
Language: English
Hacker News Points: -
Summary

The post discusses using Apache Kafka to track events in a large codebase, drawing on GitHub's data sources (its REST and GraphQL APIs). It explains how to use the Confluent GitHub source connector to land GitHub events in a Kafka topic and then process those events with a Kafka Streams topology. The authors also give an overview of data pipelines, sources, and sinks, along with details on implementing a state store in Kafka Streams. Finally, the post touches on extending the project by adding a sink and points to further resources on Kafka demos, Flink SQL tutorials, and resolving "unknown magic byte" errors.
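As a rough illustration of the first step the summary describes, a Kafka Connect source connector is typically registered with a JSON configuration like the sketch below. The property names (`github.service.url`, `github.access.token`, `github.repositories`, `github.resources`, `topic.name.pattern`) follow the general style of Confluent's GitHub source connector but are assumptions here, not values taken from the post; check the connector's own documentation before using them.

```json
{
  "name": "github-source",
  "config": {
    "connector.class": "io.confluent.connect.github.GithubSourceConnector",
    "github.service.url": "https://api.github.com",
    "github.access.token": "<your-personal-access-token>",
    "github.repositories": "apache/kafka",
    "github.resources": "issues,pull_requests",
    "topic.name.pattern": "github-${resourceName}",
    "tasks.max": "1"
  }
}
```

Once a configuration like this is POSTed to the Kafka Connect REST API, events from the listed repositories start flowing into the matching Kafka topics, where a Kafka Streams application can consume and process them.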