Governing your lakehouse with Fivetran Managed Data Lake Service
Blog post from Fivetran
Fivetran's Managed Data Lake Service simplifies governance on data lakes by keeping metadata management and access control consistent, and by addressing common problems such as snapshot sprawl, orphaned files, and schema drift. The service writes data in Parquet format and maintains both Iceberg and Delta Lake metadata. Fivetran is the sole writer of that metadata and updates it during every sync, which prevents inconsistencies across multiple query engines and external catalogs. A REST Catalog serves as the single source of truth.

Access control is enforced through OAuth2 for catalog access, scoped S3 credentials, and cross-account IAM roles, while optional features such as AWS PrivateLink add network-level governance. Query engines can integrate through the Iceberg REST protocol or AWS Glue, with automatic synchronization preventing drift between catalogs.

Lifecycle management is automated: it covers configurable snapshot retention, metadata file management, and orphan file cleanup, and automated table maintenance also enforces schema governance. The service extends beyond AWS to Azure and Google Cloud with the same governance approach. On every cloud, Fivetran retains control over metadata to maintain data integrity and compliance.
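To make the lifecycle ideas concrete, here is a minimal Python sketch of how snapshot retention and orphan file detection could work in principle. It is not Fivetran's implementation and does not use any Fivetran or Iceberg API. The `Snapshot` class, the `expire_snapshots` and `find_orphans` functions, and the file names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a table snapshot and the data files it references."""
    snapshot_id: int
    committed_at: datetime
    data_files: frozenset


def expire_snapshots(snapshots, now, retention):
    """Return the snapshots to keep: those inside the retention window,
    plus the most recent snapshot regardless of its age."""
    if not snapshots:
        return []
    ordered = sorted(snapshots, key=lambda s: s.committed_at)
    latest = ordered[-1]
    cutoff = now - retention
    return [s for s in ordered if s is latest or s.committed_at >= cutoff]


def find_orphans(files_in_storage, retained_snapshots):
    """Return files present in storage that no retained snapshot references."""
    referenced = set()
    for snapshot in retained_snapshots:
        referenced |= snapshot.data_files
    return set(files_in_storage) - referenced


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    history = [
        Snapshot(1, now - timedelta(days=9), frozenset({"a.parquet"})),
        Snapshot(2, now - timedelta(days=3), frozenset({"b.parquet"})),
        Snapshot(3, now - timedelta(days=1), frozenset({"b.parquet", "c.parquet"})),
    ]
    kept = expire_snapshots(history, now, retention=timedelta(days=7))
    storage = {"a.parquet", "b.parquet", "c.parquet", "d.parquet"}
    print([s.snapshot_id for s in kept])
    print(sorted(find_orphans(storage, kept)))
```

A production cleanup would typically also skip recently written files, so that data from an in-flight sync is not mistaken for an orphan. The sketch leaves that safeguard out for brevity.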