Migrate Parquet Files with the ScyllaDB Migrator
Blog post from ScyllaDB
ScyllaDB has announced an enhancement to its open-source ScyllaDB Migrator that lets users import Apache Parquet files directly into ScyllaDB tables, leveraging Apache Spark's distributed execution model to insert data in parallel. Previously, the Migrator supported loading data from Cassandra or DynamoDB into ScyllaDB, as well as migrations between ScyllaDB clusters.

The new feature uses Spark's DataFrame abstraction to load Parquet files stored on AWS S3. Users configure the source settings and then execute the migration on a Spark cluster. This development is part of ScyllaDB's broader plan to expand the Migrator's supported source and target types, with the aim of turning it into a versatile database-to-database migration tool.
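As a rough illustration of how such a migration might be configured, the sketch below shows a hypothetical Migrator-style YAML file with a Parquet source on S3 and a ScyllaDB target. The exact key names, bucket path, hostnames, and keyspace/table names here are illustrative assumptions, not taken from the announcement; consult the Migrator's own documentation for the authoritative schema.

```yaml
# Hypothetical configuration sketch -- key names and values are assumptions.
source:
  type: parquet
  path: s3a://example-bucket/exports/users/   # hypothetical S3 location of the Parquet files
  credentials:                                # AWS credentials, if the bucket is not public
    accessKey: <access-key>
    secretKey: <secret-key>

target:
  type: scylla
  host: scylla-node-1.example.internal        # hypothetical contact point of the target cluster
  port: 9042
  keyspace: migrated
  table: users
```

With a configuration like this in place, the migration would then be launched as a Spark job (for example via `spark-submit`) against a Spark cluster, which distributes the Parquet reads and the writes into ScyllaDB across its executors.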