This post opens a series on integrating Apache Spark with ScyllaDB, outlining the key components of both systems. Spark is introduced as a platform for distributed data processing whose core abstraction, the Resilient Distributed Dataset (RDD), partitions a computation across cluster nodes so it can run in parallel. The post walks through running Spark with Docker and stresses the importance of understanding Spark's execution model, in which transformations on RDDs are recorded lazily and only evaluated when an action is invoked. ScyllaDB, a high-performance NoSQL database compatible with Apache Cassandra, is highlighted for efficient data storage and retrieval through the Cassandra Query Language (CQL). Finally, the post covers the DataStax Spark/Cassandra Connector, which moves data between Spark and ScyllaDB and lets users run analytical workloads that draw on the strengths of both systems; a couple of sketches below illustrate the two building blocks. This overview sets the stage for deeper exploration of specific topics in subsequent posts.
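
To make the RDD model concrete, here is a minimal sketch of a Spark job, assuming a local Spark installation; the `local[*]` master and the app name are placeholders, not anything prescribed by the post. It distributes a small collection across partitions and reduces the results in parallel:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddSketch {
  def main(args: Array[String]): Unit = {
    // Run on all local cores; app name is a hypothetical placeholder
    val conf = new SparkConf().setMaster("local[*]").setAppName("rdd-sketch")
    val sc = new SparkContext(conf)

    // Split the collection into 8 partitions so map/reduce run in parallel
    val rdd = sc.parallelize(1 to 1000, numSlices = 8)

    // map is a lazy transformation; reduce is the action that triggers execution
    val sumOfSquares = rdd.map(x => x.toLong * x).reduce(_ + _)
    println(s"Sum of squares: $sumOfSquares")

    sc.stop()
  }
}
```

Nothing runs until `reduce` is called: `map` only extends the RDD's lineage, which is what lets Spark schedule and recompute partitions across the cluster.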
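
And here is a sketch of reading from and writing to ScyllaDB through the DataStax Spark/Cassandra Connector. The `cassandraTable`, `saveToCassandra`, and `SomeColumns` calls are the connector's actual API, but the contact point, keyspace, table, and column names below are hypothetical stand-ins for whatever schema your cluster holds:

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object ConnectorSketch {
  def main(args: Array[String]): Unit = {
    // Point the connector at ScyllaDB's CQL endpoint; host is a placeholder
    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("connector-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Read a ScyllaDB table as an RDD of CassandraRow
    // (keyspace "quotes" and table "quotes_by_symbol" are hypothetical)
    val rows = sc.cassandraTable("quotes", "quotes_by_symbol")
    println(s"Row count: ${rows.count()}")

    // Write derived results back; tuple fields map onto the listed columns
    rows
      .map(row => (row.getString("symbol"), row.getDouble("price") * 2))
      .saveToCassandra("quotes", "doubled_prices", SomeColumns("symbol", "price"))

    sc.stop()
  }
}
```

Because the table is exposed as an ordinary RDD, every Spark transformation is available on ScyllaDB data, which is exactly the combination the series goes on to explore.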