ScyllaDB Student Projects, Part I: Parquet
Blog post from ScyllaDB
In 2019, ScyllaDB supported a student program at the University of Warsaw in which undergraduate Computer Science students worked with ScyllaDB engineers to enhance ScyllaDB and its Seastar engine. One of the projects was implementing Parquet support. Its goal was to adapt the Apache Parquet format for use with ScyllaDB, which traditionally stores data in SSTables, to offer more efficient data storage options. The students wrote a new library, parquet4seastar, from scratch to integrate Parquet with Seastar's asynchronous framework and support low-latency operation. They also built a demo application, parquet2cql, which translates Parquet files into CQL queries for ScyllaDB. Testing showed promising results for both performance and storage efficiency; Parquet outperformed SSTables particularly in scenarios with fewer unique data values. The work contributed to the students' Bachelor’s thesis and laid the groundwork for future collaborations between ScyllaDB and the University of Warsaw.
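
To make the idea of translating file contents into CQL queries concrete, here is a minimal Python sketch. It is not the parquet2cql implementation (which is built on parquet4seastar and Seastar); it assumes the rows have already been decoded from a Parquet file into plain tuples, and the function names `cql_literal` and `rows_to_cql`, as well as the table `ks.users` in the usage note, are hypothetical. Only a few basic value types are handled.

```python
from typing import Any, Iterable, List, Sequence


def cql_literal(value: Any) -> str:
    """Render a Python value as a CQL literal (hypothetical helper, basic types only)."""
    if value is None:
        return "null"
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # CQL string literals use single quotes; embedded quotes are doubled.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def rows_to_cql(
    table: str, columns: Sequence[str], rows: Iterable[Sequence[Any]]
) -> List[str]:
    """Turn already-decoded rows into one INSERT statement per row."""
    column_list = ", ".join(columns)
    statements = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row length does not match the number of columns")
        values = ", ".join(cql_literal(v) for v in row)
        statements.append(f"INSERT INTO {table} ({column_list}) VALUES ({values});")
    return statements
```

For example, `rows_to_cql("ks.users", ["id", "name"], [(1, "O'Neil")])` yields a single statement, `INSERT INTO ks.users (id, name) VALUES (1, 'O''Neil');`. A real converter would also need to map Parquet's nested and logical types to CQL types, which this sketch leaves out.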