Run Real-Time Applications with Spark and the SingleStore Spark Connector

Company

SingleStore

Date Published

Feb. 10, 2015

Author

Wayne Song

Word count

325

Language

English

Hacker News points

None

URL

www.singlestore.com/blog/memsql-spark-connector

Summary

Apache Spark is a distributed computing framework that excels at processing large datasets, but it does not provide its own persistent storage. To address this, the SingleStore team has released the SingleStore Spark connector, which integrates Spark with SingleStore. The connector includes several optimizations, such as reading data from SingleStore in parallel and keeping data local by running Spark and SingleStore nodes on the same physical machines. It has two main components: a `SingleStoreRDD` class for loading data from SingleStore queries and a `saveToSingleStore` function for persisting results to SingleStore tables. The connector is open source, and a 79-page guide provides code samples and performance recommendations for deploying Spark applications with SingleStore.
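
The load, compute, and save pattern described in the summary can be illustrated with a minimal, self-contained sketch. It does not use Spark or the connector, and it is not the connector's API: Python's built-in `sqlite3` stands in for SingleStore, a thread pool stands in for parallel Spark tasks, and every name in it (`read_partition`, `load_in_parallel`, `save_results`, the `events` and `event_results` tables) is hypothetical and chosen only for illustration.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing


def make_demo_db(path, n_rows=100):
    """Create a stand-in source table (a real deployment would query SingleStore)."""
    with closing(sqlite3.connect(path)) as conn, conn:
        conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, value INTEGER)")
        conn.executemany(
            "INSERT INTO events (id, value) VALUES (?, ?)",
            [(i, i * 2) for i in range(n_rows)],
        )


def read_partition(path, lo, hi):
    """Read one id range over its own connection, like one task reading one partition."""
    with closing(sqlite3.connect(path)) as conn:
        return conn.execute(
            "SELECT id, value FROM events WHERE id >= ? AND id < ? ORDER BY id",
            (lo, hi),
        ).fetchall()


def load_in_parallel(path, n_rows, n_partitions=4):
    """Split the id space into contiguous ranges and read them concurrently."""
    step = -(-n_rows // n_partitions)  # ceiling division
    bounds = [(lo, min(lo + step, n_rows)) for lo in range(0, n_rows, step)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        parts = pool.map(lambda b: read_partition(path, *b), bounds)
        return [row for part in parts for row in part]


def save_results(path, rows):
    """Persist computed rows to a results table with one batched insert."""
    with closing(sqlite3.connect(path)) as conn, conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS event_results "
            "(id INTEGER PRIMARY KEY, doubled INTEGER)"
        )
        conn.executemany(
            "INSERT INTO event_results (id, doubled) VALUES (?, ?)", rows
        )


def run_demo():
    """Load in parallel, transform, save, then return (row count, sum of results)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        make_demo_db(path, n_rows=100)
        rows = load_in_parallel(path, n_rows=100)
        results = [(row_id, value * 2) for row_id, value in rows]
        save_results(path, results)
        with closing(sqlite3.connect(path)) as conn:
            return conn.execute(
                "SELECT COUNT(*), SUM(doubled) FROM event_results"
            ).fetchone()


if __name__ == "__main__":
    print(run_demo())
```

Each reader opens its own connection and pulls a disjoint slice of the table, which is the general idea behind reading partitions in parallel. The actual connector's classes and signatures should be taken from its own documentation rather than from this sketch.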