
Analyzing flight delays with ScyllaDB on top of Spark

Blog post from ScyllaDB

Post Details
Company
Date Published
Author
Seva Morotskiy
Word Count
2,095
Language
English
Hacker News Points
-
Summary

The blog post by Seva Morotskiy shows how to use the Spark Scala API with ScyllaDB to analyze flight delays and cancellations in a public dataset from the Research and Innovative Technology Administration (RITA). The dataset holds approximately 120 million records of U.S. commercial flights from 1987 to 2008. The analysis has two stages: a "Loader" module loads the dataset into ScyllaDB, and an "Extractor" module queries it to compute average arrival and departure delays and cancellation counts per air carrier, and to identify the destinations and carriers with the highest delays and the most cancellations. The post covers the setup and configuration of the required environment (ScyllaDB, Java, Scala, sbt, and Spark) and provides code snippets for running the analysis. The results include the top three destinations by average arrival delay and the carriers with the most cancellations, which the post presents as evidence that ScyllaDB can handle large datasets efficiently for real-time data analysis in the airline industry.
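
The kind of aggregation the "Extractor" stage is described as performing can be sketched in plain Scala. This is a minimal illustration, not the post's code: it uses standard Scala collections in place of Spark and ScyllaDB, and the `Flight` case class, its field names, the function names, and the sample rows are all hypothetical. In Spark, a comparable result would come from grouping and aggregating a dataset read from the ScyllaDB table.

```scala
// Hypothetical record shape; the real dataset has many more columns.
final case class Flight(
  carrier: String,          // air carrier code
  dest: String,             // destination airport code
  arrDelayMin: Option[Int], // arrival delay in minutes; None when the flight did not arrive
  cancelled: Boolean
)

object FlightDelaySketch {
  // Made-up sample rows for illustration only; not taken from the RITA dataset.
  val sample: Seq[Flight] = Seq(
    Flight("AA", "ORD", Some(30), cancelled = false),
    Flight("AA", "ORD", Some(10), cancelled = false),
    Flight("UA", "SFO", Some(5), cancelled = false),
    Flight("UA", "SFO", None, cancelled = true),
    Flight("DL", "ATL", Some(60), cancelled = false),
    Flight("DL", "JFK", Some(20), cancelled = false),
    Flight("AA", "JFK", None, cancelled = true),
    Flight("AA", "LAX", None, cancelled = true)
  )

  // Average arrival delay per destination, highest first (ties broken by
  // airport code), limited to the first n entries.
  def topDestinationsByAvgArrDelay(flights: Seq[Flight], n: Int): Seq[(String, Double)] =
    flights
      .collect { case Flight(_, dest, Some(delay), _) => dest -> delay }
      .groupBy(_._1)
      .map { case (dest, rows) => dest -> rows.map(_._2).sum.toDouble / rows.size }
      .toSeq
      .sortWith((a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1))
      .take(n)

  // Number of cancelled flights per carrier, highest first.
  def cancellationsByCarrier(flights: Seq[Flight]): Seq[(String, Int)] =
    flights
      .filter(_.cancelled)
      .groupBy(_.carrier)
      .map { case (carrier, rows) => carrier -> rows.size }
      .toSeq
      .sortBy { case (carrier, count) => (-count, carrier) }

  def main(args: Array[String]): Unit = {
    topDestinationsByAvgArrDelay(sample, 3).foreach(println)
    cancellationsByCarrier(sample).foreach(println)
  }
}
```

Flights without a recorded arrival delay are left out of the average rather than counted as zero, so cancelled flights do not pull a destination's average down. How the original post treats missing values is not stated in this summary, so that choice is an assumption.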