
Using Apache Spark with Serverless Kafka

Blog post from Upstash

Post Details
Company: Upstash
Author: Omer Aytac
Word Count: 2,508
Language: English
Summary

This post walks through building a simple data pipeline with serverless Kafka, Apache Spark, and Cassandra to collect and process real-time data from a React Native mobile app. The app generates log messages as users interact with products, and serverless Kafka captures them. Apache Spark, a distributed processing engine, consumes these logs and streams them into a Cassandra database, where they are stored for further analysis. The post explains how to set up and configure each component, including creating the Cassandra keyspaces and tables that hold the log messages and their timestamps. Two streaming approaches are covered, both implemented in Java: the legacy Spark Streaming DStream API and the newer Structured Streaming API. The post concludes that a pipeline like this makes it possible to collect, process, and store data in order to gain insight into product performance and user interaction.
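
To make the configuration step more concrete, here is a minimal sketch in plain Java of the two pieces of setup the summary mentions: connection options for Spark's Kafka source, and CQL statements for a keyspace and a log table. It is not the post's actual code. The class name `PipelineConfig`, the column names, the replication settings, and the choice of SASL_SSL with SCRAM-SHA-256 are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; every name below is illustrative, not taken from the post.
final class PipelineConfig {

    // Builds options for Spark's Kafka source. Keys prefixed with "kafka."
    // are forwarded to the underlying Kafka consumer.
    static Map<String, String> kafkaOptions(String bootstrap, String topic,
                                            String username, String password) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("kafka.bootstrap.servers", bootstrap);
        options.put("subscribe", topic);
        options.put("startingOffsets", "earliest");
        // Assumption: the cluster requires SASL over TLS with SCRAM-SHA-256.
        options.put("kafka.security.protocol", "SASL_SSL");
        options.put("kafka.sasl.mechanism", "SCRAM-SHA-256");
        options.put("kafka.sasl.jaas.config",
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            + "username=\"" + username + "\" password=\"" + password + "\";");
        return options;
    }

    // CQL for a single-node development keyspace (assumed replication settings).
    static String createKeyspaceCql(String keyspace) {
        return "CREATE KEYSPACE IF NOT EXISTS " + keyspace
            + " WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};";
    }

    // CQL for a table holding each log message and its timestamp.
    // The uuid key is an assumption so that logs sharing a timestamp do not collide.
    static String createTableCql(String keyspace, String table) {
        return "CREATE TABLE IF NOT EXISTS " + keyspace + "." + table
            + " (id uuid PRIMARY KEY, message text, log_time timestamp);";
    }

    public static void main(String[] args) {
        // Placeholder values; real credentials would come from the Kafka provider.
        kafkaOptions("example-host:9092", "app-logs", "user", "secret")
            .forEach((key, value) -> System.out.println(key + " = " + value));
        System.out.println(createKeyspaceCql("logs_ks"));
        System.out.println(createTableCql("logs_ks", "app_logs"));
    }
}
```

In a Structured Streaming job, a map like this could be passed to `spark.readStream().format("kafka").options(...)`, and the CQL strings could be run once in `cqlsh` before the job starts. Keeping the settings in one place is only a convenience for this sketch.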