Migrate Parquet Files with the ScyllaDB Migrator
Blog post from ScyllaDB
ScyllaDB has announced an enhancement to its open-source ScyllaDB Migrator that lets users import Apache Parquet files directly into ScyllaDB tables, leveraging Apache Spark's distributed execution model to insert data in parallel. Previously, the Migrator supported loading data from Cassandra or DynamoDB into ScyllaDB, as well as migrations between ScyllaDB clusters.

The new feature uses Spark's DataFrame abstraction to load Parquet files stored on AWS S3. Users configure the source settings and then execute the migration on a Spark cluster. This development is part of ScyllaDB's broader plan to expand the Migrator's supported source and target types, with the aim of turning it into a versatile database-to-database migration tool.
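As a rough illustration of how such a migration might be configured, the sketch below shows a hypothetical Migrator-style YAML file with a Parquet source on S3 and a ScyllaDB target. The exact key names, bucket path, hostnames, and keyspace/table names here are illustrative assumptions, not taken from the announcement; consult the Migrator's own documentation for the authoritative schema.

```yaml
# Hypothetical configuration sketch -- key names and values are assumptions.
source:
  type: parquet
  path: s3a://example-bucket/exports/users/   # hypothetical S3 location of the Parquet files
  credentials:                                # AWS credentials, if the bucket is not public
    accessKey: <access-key>
    secretKey: <secret-key>

target:
  type: scylla
  host: scylla-node-1.example.internal        # hypothetical contact point of the target cluster
  port: 9042
  keyspace: migrated
  table: users
```

With a configuration like this in place, the migration would then be launched as a Spark job (for example via `spark-submit`) against a Spark cluster, which distributes the Parquet reads and the writes into ScyllaDB across its executors.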