Tutorial for Operationalizing Spark with MongoDB

Company

MongoDB

Date Published

Oct. 26, 2015

Author

Matt Kalan

Word count

2466

Language

English

Hacker News points

None

URL

www.mongodb.com/blog/post/tutorial-for-operationalizing-spark-with-mongodb

Summary

This tutorial explains how to operationalize Apache Spark with MongoDB, a NoSQL database. It covers setting up a Spark environment with MongoDB, reading data from MongoDB with Spark DataFrames, and writing data to MongoDB. It highlights the benefits of using MongoDB as an input or output for Hadoop jobs, including secondary indexes that support fast data retrieval and low-latency reporting. It also shows how to run Spark queries on any slice of data in MongoDB without table scans by relying on MongoDB's indexes. Overall, the tutorial demonstrates how easily Spark can be combined with MongoDB to meet the operational requirements of analytics and a data lake environment.