
Export and Import Data with Azure Databricks and YugabyteDB

Blog post from Yugabyte

Post Details
Company: Yugabyte
Date Published:
Author: Balachandar Seetharaman
Word Count: 1,015
Language: English
Hacker News Points: -
Summary

Azure Databricks is a cloud analytics platform that can export data from, and import data into, a YugabyteDB database using supported file formats such as Avro and Parquet, helping developers and data engineers build complete end-to-end data analytics workloads. To achieve this, Azure Databricks uses the Databricks File System (DBFS) and Azure Data Lake Storage (ADLS), along with Spark clusters that can be auto-scaled to match data volumes.

The import process reads Parquet or Avro files from DBFS or ADLS into a DataFrame, writes the data into YugabyteDB tables without transformation, and then queries the tables to confirm the data has been imported. The export process works in reverse: it loads data from YugabyteDB into a DataFrame, writes it out in Parquet or Avro format, and saves it to a DBFS folder or ADLS. The prerequisites are a Spark cluster in Azure Databricks configured with the required PostgreSQL JDBC driver or YugabyteDB Cassandra driver, installed through Maven or another repository. Sketches of both directions follow below.
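
A minimal PySpark sketch of the import direction, assuming a hypothetical orders table and placeholder connection details (the host, database name, and credentials are illustrative, not taken from the post), with the PostgreSQL JDBC driver already attached to the cluster:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("yb-import").getOrCreate()

    # Read Parquet files staged in DBFS (or an ADLS mount) into a DataFrame.
    df = spark.read.parquet("dbfs:/FileStore/imports/orders.parquet")

    # Write the DataFrame into a YugabyteDB table without transformation,
    # over the PostgreSQL-compatible YSQL interface (default port 5433).
    (df.write
       .format("jdbc")
       .option("url", "jdbc:postgresql://<yb-host>:5433/yugabyte")
       .option("dbtable", "orders")
       .option("user", "yugabyte")
       .option("password", "<password>")
       .option("driver", "org.postgresql.Driver")
       .mode("append")
       .save())

A SELECT against the orders table (for example from ysqlsh, or via spark.read over the same JDBC connection) then confirms the rows have arrived.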
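
The export direction reverses the flow; the sketch below uses the same placeholder names, and the Avro write assumes the spark-avro package is available on the cluster, as it is in recent Databricks runtimes:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("yb-export").getOrCreate()

    # Load the YugabyteDB table into a DataFrame over JDBC.
    df = (spark.read
            .format("jdbc")
            .option("url", "jdbc:postgresql://<yb-host>:5433/yugabyte")
            .option("dbtable", "orders")
            .option("user", "yugabyte")
            .option("password", "<password>")
            .option("driver", "org.postgresql.Driver")
            .load())

    # Save the data into DBFS in both supported formats.
    df.write.mode("overwrite").parquet("dbfs:/FileStore/exports/orders.parquet")
    df.write.mode("overwrite").format("avro").save("dbfs:/FileStore/exports/orders_avro")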