Using MongoDB with Hadoop & Spark: Part 1 - Introduction & Setup

Company

MongoDB

Date Published

Feb. 17, 2015

Author

Matt Kalan

Word count

986

Language

English

Hacker News points

URL

www.mongodb.com/blog/post/using-mongodb-hadoop-spark-part-1-introduction-setup

Summary

In the first part of a three-part series on using MongoDB with Hadoop and Spark, Matt Kalan explains the value of combining these technologies for big data analytics and machine learning. He introduces a simple example: aggregating 1-minute intervals of stock prices into 5-minute intervals, with the data stored in MongoDB and processed in Hive or Spark via the MongoDB Hadoop Connector. He then sets up a Cloudera VM, downloads sample data, installs MongoDB and the Hadoop Connector, and loads the data into MongoDB, which prepares the environment for further analysis using Hive and Spark.
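
To make the 1-minute to 5-minute aggregation concrete, here is a minimal sketch in plain Python. It only illustrates the roll-up logic; the article performs this step in Hive or Spark, not in code like this. The field names (`symbol`, `ts`, `open`, `high`, `low`, `close`) and the helper names `bucket_start` and `aggregate_bars` are assumptions made for illustration, not the schema or functions used in the original post.

```python
from datetime import datetime, timedelta


def bucket_start(ts, minutes=5):
    """Floor a timestamp to the start of its N-minute interval."""
    return ts - timedelta(
        minutes=ts.minute % minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


def aggregate_bars(bars, minutes=5):
    """Roll 1-minute price bars up into N-minute bars.

    Each bar is a dict with hypothetical keys:
    symbol, ts (datetime), open, high, low, close.
    """
    buckets = {}
    # Process bars in time order so the first bar seen in a bucket
    # supplies the open and the last bar seen supplies the close.
    for bar in sorted(bars, key=lambda b: b["ts"]):
        key = (bar["symbol"], bucket_start(bar["ts"], minutes))
        agg = buckets.get(key)
        if agg is None:
            buckets[key] = {
                "symbol": bar["symbol"],
                "ts": key[1],
                "open": bar["open"],
                "high": bar["high"],
                "low": bar["low"],
                "close": bar["close"],
            }
        else:
            agg["high"] = max(agg["high"], bar["high"])
            agg["low"] = min(agg["low"], bar["low"])
            agg["close"] = bar["close"]
    # Return bars ordered by symbol, then by interval start.
    return [buckets[k] for k in sorted(buckets)]
```

Given seven consecutive 1-minute bars starting at 09:30, this yields two 5-minute bars: one for 09:30 to 09:34 and one for 09:35 onward. The open comes from the first bar in each interval, the close from the last, and high and low are the extremes across the interval.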