Company
Date Published
Author
Matt Kalan
Word count
986
Language
English
Hacker News points
3

Summary

In this introduction and setup installment of a three-part series on using MongoDB with Hadoop and Spark, Matt Kalan explains the value of combining these technologies for big data analytics and machine learning. He uses a simple example: aggregating 1-minute intervals of stock prices into 5-minute intervals, with the data stored in MongoDB and processed in Hive or Spark via the MongoDB Hadoop Connector. He sets up a Cloudera VM environment, downloads sample data, installs MongoDB and the Hadoop Connector, and loads the data into MongoDB. He then outlines the remaining steps needed to prepare the environment for further analysis in Hive and Spark.
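The summary does not reproduce the article's code or its sample data schema, so the following is only a minimal in-memory sketch of the 1-minute to 5-minute roll-up. It assumes OHLC-style fields (symbol, open, high, low, close, volume) on each bar; these field names and the `Bar`, `bucket_start` and `aggregate` helpers are hypothetical. In the pipeline the article describes, this grouping would run in Hive or Spark over data read from MongoDB through the connector, not in plain Python.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass
class Bar:
    # Hypothetical OHLC bar; the article's actual document fields may differ.
    symbol: str
    start: datetime
    open: float
    high: float
    low: float
    close: float
    volume: int


def bucket_start(ts: datetime, minutes: int = 5) -> datetime:
    """Floor a timestamp to the start of its N-minute window.

    Assumes the window length divides 60 so windows align within each hour.
    """
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def aggregate(bars: List[Bar], minutes: int = 5) -> List[Bar]:
    """Roll 1-minute bars up into N-minute bars, grouped per symbol."""
    groups: Dict[Tuple[str, datetime], List[Bar]] = {}
    for bar in bars:
        key = (bar.symbol, bucket_start(bar.start, minutes))
        groups.setdefault(key, []).append(bar)

    result: List[Bar] = []
    for (symbol, start), group in sorted(groups.items(), key=lambda kv: kv[0]):
        group.sort(key=lambda b: b.start)  # order bars in time within the window
        result.append(Bar(
            symbol=symbol,
            start=start,
            open=group[0].open,                    # first open in the window
            high=max(b.high for b in group),       # highest high
            low=min(b.low for b in group),         # lowest low
            close=group[-1].close,                 # last close in the window
            volume=sum(b.volume for b in group),   # total traded volume
        ))
    return result


if __name__ == "__main__":
    t0 = datetime(2024, 1, 2, 9, 30)
    # Seven synthetic 1-minute bars (made-up values) spanning two 5-minute windows.
    minute_bars = [
        Bar("MSFT", t0 + timedelta(minutes=i), 10.0 + i, 10.5 + i, 9.5 + i, 10.2 + i, 100)
        for i in range(7)
    ]
    for bar in aggregate(minute_bars):
        print(bar.symbol, bar.start.time(), bar.open, bar.high, bar.low, bar.close, bar.volume)
```

The design mirrors a GROUP BY on (symbol, window start): open and close come from the first and last bars in each window, high and low from the extremes, and volume from the sum, which is the same shape a Hive query or Spark job would express.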