Why Use Spark With NoSQL Databases?

Company

Couchbase

Date Published

June 6, 2016

Author

Will Gardella, Director, Product Management, Couchbase

Word count

1440

Language

English

Hacker News points

None

URL

www.couchbase.com/blog/why-spark-and-nosql

Summary

Spark is a big data processing framework that provides analytics, machine learning, graph processing, and more over large volumes of data, but it is not a database. It reads data from sources such as HDFS, Amazon S3, or Couchbase Server, processes it, and then writes the results out for further use. Spark is designed for high throughput at the expense of latency.

Combining Spark with a NoSQL database such as Couchbase lets the pair address a wider range of problems. Couchbase adds faster access to operational data, a powerful query language, native SDKs, sub-millisecond latencies, elastic scaling, ease of administration, Cross Data Center Replication (XDCR), and high availability. The Couchbase Spark Connector is an open-source integration between the two technologies. It shortens the initial round trip of data processing, delivers fast performance, and supports a range of data access methods. Spark can also serve as a toolkit for data integration, letting developers combine data from multiple sources and feed it to applications or other consumers in a convenient, scalable way.