Company
Date Published
Author
Tal Gluck
Word count
2079
Language
English
Hacker News points
None

Summary

In this blog post, the author shows how to combine NYC real estate sales data from several sources (GlareDB Cloud, local files, and Postgres) with NYC tree census data stored in Snowflake, in order to explore the correlation between sold properties and nearby trees. Great Expectations (GX) is used to run data quality checks and build an Expectation Suite, which confirms that the data meets specified criteria before it is loaded into tables. The post walks step by step through setting up a GlareDB connection, joining data across the different sources, and using GX to validate assumptions about the data, such as the number of trees near each property. It stresses why these validations matter by showing how an assumption can be tested and then adjusted when new sales data becomes available, so that data integrity is checked before the data is integrated into larger pipelines. The author also mentions future posts on further integrations with data tools such as dbt and invites readers to engage with GlareDB through various platforms.
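The validation idea described above can be sketched without GX itself. The following is a minimal plain-Python stand-in, not the author's code and not the GX API. It assumes a joined table with one row per sold property and a hypothetical `trees_nearby` count column; the function name `expect_values_between`, the bounds of 1 to 50, and the sample rows are all illustrative assumptions. The check is similar in spirit to GX's `expect_column_values_to_be_between` expectation: it passes on an initial batch, then fails once a new batch of sales includes a property with no nearby trees, which is the point where the assumption would be revisited.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    """Outcome of a single range check (hypothetical stand-in for a GX result)."""
    success: bool
    unexpected: list = field(default_factory=list)


def expect_values_between(rows, column, min_value, max_value):
    """Pass only if every row's value in `column` lies in [min_value, max_value].

    Plain-Python illustration of a range expectation; not part of any library.
    """
    unexpected = [r for r in rows if not (min_value <= r[column] <= max_value)]
    return CheckResult(success=not unexpected, unexpected=unexpected)


# Hypothetical joined rows: one per sold property, with a count of nearby trees.
batch_1 = [
    {"sale_id": 1, "trees_nearby": 4},
    {"sale_id": 2, "trees_nearby": 12},
]
# A later batch of sales adds a property with no trees nearby.
batch_2 = batch_1 + [{"sale_id": 3, "trees_nearby": 0}]

# Assumed (illustrative) bounds for the "trees near a sold property" assumption.
result_1 = expect_values_between(batch_1, "trees_nearby", 1, 50)
result_2 = expect_values_between(batch_2, "trees_nearby", 1, 50)

print(result_1.success)  # True
print(result_2.success, [r["sale_id"] for r in result_2.unexpected])  # False [3]
```

In GX itself, an assumption like this would be stored in an Expectation Suite and run against each new batch; the sketch only reproduces the pass/fail logic, and relaxing `min_value` to 0 is one way the assumption could be adjusted after reviewing the failing rows.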