
Hooking up Spark and ScyllaDB: Part 3

Blog post from ScyllaDB

Post Details

Company: ScyllaDB
Date Published: -
Author: Itamar Ravid
Word Count: 1,511
Language: English
Hacker News Points: -
Summary

In this installment of the blog series on integrating Apache Spark with ScyllaDB, Itamar Ravid builds on the earlier posts about reading data and shows how Spark DataFrames can be written back to ScyllaDB. Using the DataStax Cassandra connector, the post walks through creating DataFrames, mapping their schemas onto ScyllaDB tables, and executing write operations in parallel across partitions, while highlighting the importance of schema compatibility and the problems that data type mismatches can cause. It also notes that rows are fetched lazily during these writes, so only the batches currently being sent need to be held in memory. The post closes with a teaser for the next installment, which will cover streaming workloads and their integration with ScyllaDB.
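To make the write path concrete, here is a minimal sketch of writing a DataFrame to ScyllaDB through the spark-cassandra-connector's DataFrame API. The keyspace (`demo`), table (`users`), column names, and contact point are illustrative assumptions, not values taken from the post.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object WriteToScylla {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("scylla-write-example")
      // Hypothetical contact point; ScyllaDB speaks the Cassandra protocol,
      // so the standard connector setting applies.
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()

    import spark.implicits._

    // Column names must match the target table's schema, and the Spark
    // data types must map cleanly onto the CQL column types; mismatches
    // surface as write-time errors.
    val df = Seq(
      (1L, "alice", 30),
      (2L, "bob", 25)
    ).toDF("id", "name", "age")

    // Each DataFrame partition is written in parallel, and rows are
    // fetched lazily, batch by batch, as they are sent to ScyllaDB.
    df.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "demo", "table" -> "users"))
      .mode(SaveMode.Append)
      .save()

    spark.stop()
  }
}
```

Assuming a `demo.users` table already exists with compatible column types, this appends the two rows; `SaveMode.Append` is the usual choice here, since the connector writes into an existing CQL table rather than creating one.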