Company
Date Published
Author
MongoDB
Word count
1570
Language
English
Hacker News points
None

Summary

Sweet Song and Daniel Alabi of MongoDB used the Flights dataset to compute the PageRank of every airport in it, parallelizing the computation with MongoDB's MapReduce framework. They modeled the airports as a graph in which each document in the Flights dataset represents an edge between two airports, then computed each airport's PageRank over that graph. Each PageRank iteration wrote its results to a new collection, and the algorithm stopped once the average percentage change in PageRank values dropped below 0.1%. The computation converged after 20 iterations and took 6.203 seconds. As expected, the resulting PageRank scores correlated with the number of flights between airports. The team did run into challenges: updates on large collections are slower than inserts, and the dataset lacked information on international flights. Despite these limitations, the project showed that MongoDB's MapReduce framework can be used to compute iterative algorithms such as PageRank.
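To make the approach concrete, here is a minimal pure-Python sketch of PageRank expressed as map and reduce phases over flight documents. It does not use MongoDB's mapReduce command. Several details are assumptions not stated in the summary: the field names `Origin` and `Dest`, the damping factor of 0.85, the even redistribution of rank from airports with no outgoing flights, and the tiny sample dataset. Each flight counts as its own edge, so routes with more flights pass along more rank. Each iteration builds a fresh dictionary, loosely mirroring the "new collection per iteration" design, and the loop stops when the average percentage change falls below 0.1%.

```python
from collections import defaultdict

DAMPING = 0.85     # assumed damping factor; the summary does not state one
THRESHOLD = 0.001  # stop when the average relative change drops below 0.1%


def build_graph(flights):
    """Map each origin airport to a list of destinations (one entry per flight)."""
    graph = defaultdict(list)
    for doc in flights:  # hypothetical documents with "Origin" and "Dest" fields
        graph[doc["Origin"]].append(doc["Dest"])
        graph.setdefault(doc["Dest"], [])  # make sure destination-only airports exist
    return dict(graph)


def map_phase(ranks, graph):
    """Emit (airport, contribution) pairs, like a MapReduce map step."""
    for airport, dests in graph.items():
        yield airport, 0.0  # keep every airport present in the reduce step
        if dests:
            share = ranks[airport] / len(dests)
            for dest in dests:
                yield dest, share


def reduce_phase(pairs, n, dangling_mass):
    """Sum contributions per airport and apply the damping formula."""
    totals = defaultdict(float)
    for airport, value in pairs:
        totals[airport] += value
    # Rank from airports with no outgoing flights is spread evenly (an assumption).
    base = (1 - DAMPING) / n + DAMPING * dangling_mass / n
    return {airport: base + DAMPING * total for airport, total in totals.items()}


def pagerank(flights, max_iter=100):
    graph = build_graph(flights)
    n = len(graph)
    ranks = {airport: 1.0 / n for airport in graph}
    for iteration in range(1, max_iter + 1):
        dangling = sum(ranks[a] for a, dests in graph.items() if not dests)
        # A brand-new dict per iteration, instead of updating ranks in place.
        new_ranks = reduce_phase(map_phase(ranks, graph), n, dangling)
        avg_change = sum(abs(new_ranks[a] - ranks[a]) / ranks[a] for a in graph) / n
        ranks = new_ranks
        if avg_change < THRESHOLD:
            break
    return ranks, iteration


# Hypothetical sample data for illustration only.
flights = [
    {"Origin": "ATL", "Dest": "ORD"},
    {"Origin": "ORD", "Dest": "ATL"},
    {"Origin": "ATL", "Dest": "LAX"},
    {"Origin": "LAX", "Dest": "ATL"},
    {"Origin": "SFO", "Dest": "ATL"},
]
scores, iterations = pagerank(flights)
for airport, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{airport}: {score:.4f}")
print("iterations:", iterations)
```

In this toy graph ATL receives flights from every other airport, so it ends up with the highest score. SFO has no inbound flights and keeps only the baseline share, $(1 - 0.85)/4$.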