
Export and Import Data with Azure Databricks and YugabyteDB

Blog post from Yugabyte

Post Details
Company: Yugabyte
Date Published:
Author: Balachandar Seetharaman
Word Count: 1,015
Language: English
Hacker News Points: -
Summary

Azure Databricks is a cloud analytics platform that can export data from, and import data into, a YugabyteDB database using supported file formats such as Avro and Parquet, helping developers and data engineers build complete end-to-end data analytics workloads. To achieve this, Azure Databricks uses the Databricks File System (DBFS) and Azure Data Lake Storage (ADLS), along with Spark clusters that can be auto-scaled to match data volumes.

The import process reads Parquet or Avro files from DBFS or ADLS into a DataFrame, writes the data into YugabyteDB tables without transformation, and then queries the tables to confirm the data has been imported. The export process works in reverse: it loads data from YugabyteDB into a DataFrame, writes it out in Parquet or Avro format, and saves it to a DBFS folder or ADLS. The prerequisites are a Spark cluster in Azure Databricks configured with the required PostgreSQL JDBC driver or YugabyteDB Cassandra driver, installed through Maven or another repository. Sketches of both directions follow below.
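
A minimal PySpark sketch of the import direction, assuming a hypothetical orders table and placeholder connection details (the host, database name, and credentials are illustrative, not taken from the post), with the PostgreSQL JDBC driver already attached to the cluster:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("yb-import").getOrCreate()

    # Read Parquet files staged in DBFS (or an ADLS mount) into a DataFrame.
    df = spark.read.parquet("dbfs:/FileStore/imports/orders.parquet")

    # Write the DataFrame into a YugabyteDB table without transformation,
    # over the PostgreSQL-compatible YSQL interface (default port 5433).
    (df.write
       .format("jdbc")
       .option("url", "jdbc:postgresql://<yb-host>:5433/yugabyte")
       .option("dbtable", "orders")
       .option("user", "yugabyte")
       .option("password", "<password>")
       .option("driver", "org.postgresql.Driver")
       .mode("append")
       .save())

A SELECT against the orders table (for example from ysqlsh, or via spark.read over the same JDBC connection) then confirms the rows have arrived.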
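
The export direction reverses the flow; the sketch below uses the same placeholder names, and the Avro write assumes the spark-avro package is available on the cluster, as it is in recent Databricks runtimes:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("yb-export").getOrCreate()

    # Load the YugabyteDB table into a DataFrame over JDBC.
    df = (spark.read
            .format("jdbc")
            .option("url", "jdbc:postgresql://<yb-host>:5433/yugabyte")
            .option("dbtable", "orders")
            .option("user", "yugabyte")
            .option("password", "<password>")
            .option("driver", "org.postgresql.Driver")
            .load())

    # Save the data into DBFS in both supported formats.
    df.write.mode("overwrite").parquet("dbfs:/FileStore/exports/orders.parquet")
    df.write.mode("overwrite").format("avro").save("dbfs:/FileStore/exports/orders_avro")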