Getting started with MongoDB, PySpark, and Jupyter Notebook

Company

MongoDB

Date Published

Oct. 9, 2020

Author

Robert Walters

Word count

1264

Language

English

Hacker News points

None

URL

www.mongodb.com/blog/post/getting-started-with-mongodb-pyspark-and-jupyter-notebook

Summary

A JupyterLab notebook was created to work with MongoDB data using PySpark, the Python API for Apache Spark, an open-source, general-purpose cluster-computing framework for processing large-scale data. The notebook loaded financial securities data from MongoDB through the MongoDB Spark Connector, calculated a moving average of the security's price, and wrote the new calculation back to MongoDB. The environment included a MongoDB cluster, an Apache Spark deployment, and JupyterLab, which allowed ad-hoc queries against the data. The example demonstrates how MongoDB data can be integrated into a Spark data science application with the MongoDB Connector for Spark.
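The moving-average step mentioned in the summary can be illustrated with a minimal sketch in plain Python. This is not the notebook's code, which runs on Spark and reads from MongoDB. The field names (`symbol`, `day`, `price`, `moving_average`), the sample values, and the window size of 2 are all assumptions made for illustration.

```python
from collections import deque
from typing import Iterable, List


def trailing_moving_average(prices: Iterable[float], window: int) -> List[float]:
    """Return, for each price, the mean of up to `window` most recent prices."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # holds only the prices inside the window
    running_sum = 0.0
    averages = []
    for price in prices:
        if len(recent) == window:
            # The oldest price is about to fall out of the window.
            running_sum -= recent[0]
        recent.append(price)
        running_sum += price
        averages.append(running_sum / len(recent))
    return averages


# Hypothetical documents standing in for records loaded from MongoDB.
docs = [
    {"symbol": "ABC", "day": 1, "price": 10.0},
    {"symbol": "ABC", "day": 2, "price": 20.0},
    {"symbol": "ABC", "day": 3, "price": 30.0},
    {"symbol": "ABC", "day": 4, "price": 40.0},
]

# Order by day, compute the averages, and attach them to each document,
# mirroring the "calculate, then update" flow described in the summary.
docs.sort(key=lambda d: d["day"])
averages = trailing_moving_average((d["price"] for d in docs), window=2)
for doc, avg in zip(docs, averages):
    doc["moving_average"] = avg
```

In a Spark setting the same calculation would normally be expressed over a DataFrame instead of a Python list, so that it is distributed across the cluster. The plain-Python version only shows the arithmetic.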