Enriching your Snowflake data with external functions
Blog post from Fivetran
Data enrichment involves enhancing customer data with additional context, but integrating it into existing Extract, Load, Transform (ELT) pipelines poses challenges such as maintaining data freshness, consistency, and managing resource usage. At Fivetran Activations, this process is optimized using the Clearbit API, Snowflake External Functions, and dbt to enrich large datasets efficiently. The approach distinguishes between enriching data via third-party datasets, which can be joined within a data warehouse, and APIs that provide data on a request basis, posing more challenges. Snowflake's external functions enable calling external APIs, like Clearbit's, within SQL queries, which are then managed through AWS infrastructure. This setup allows for caching to save API quotas and improve performance, with dbt managing incremental updates to maintain enriched data. Implementing this in production can be complex, requiring careful consideration of system components and potential pitfalls such as creating cycles in dbt's dependency graph, which are circumvented using source references. This method ultimately streamlines incorporating API data into declarative ELT workflows, presenting a promising alternative to traditional enrichment jobs, though it necessitates a good understanding of the involved technologies.