A deep dive into the TileDB data format & storage engine

Post Details

Company

TileDB

Date Published

Sept. 21, 2021

Author

Stavros Papadopoulos

Word Count

848

Language

English

Hacker News Points

-

Source URL

www.tiledb.com/blog/a-deep-dive-into-the-tiledb-data-format-storage-engine

Summary

In a recent webinar, Stavros Papadopoulos, founder and CEO of TileDB, presented the features and applications of TileDB Embedded, the open-source storage engine that underpins TileDB Cloud. The session gave an overview of using TileDB from Python, R, and SQL, and showed how it stores and accesses data efficiently as multi-dimensional arrays. It covered dense and sparse arrays, the basics of the data format, and internal mechanics such as versioning, and it introduced two new features: attribute filter condition push-down and schema evolution. Participants also heard how TileDB compares with other storage systems such as HDF5 and Parquet. Jupyter notebooks were provided for hands-on practice; they can be downloaded or run directly on TileDB Cloud, where new sign-ups receive $10 in free credits. The webinar emphasized that TileDB's design optimizes data layout to improve I/O performance across storage backends, particularly cloud object stores such as AWS S3 and Google Cloud Storage.