Company
Date Published
Author
Sharath Vandanapu, Olivia Greene, Keshav Mathur, Ahmed Saef Zamzam, Prabha Manepalli, Shay Lin, Weifan Liang
Word count
2409
Language
English
Hacker News points
None

Summary

The Salesforce data streaming pipeline described in the text combines Apache Kafka, Confluent Cloud connectors, ksqlDB, and BigQuery to load Salesforce data into a real-time data warehouse for analytics. Change data capture (CDC) events from Salesforce are streamed into raw topics, which ksqlDB applications then process to filter out gap events and reconcile records. The resulting reconciliation stream is combined with the raw CDC data in BigQuery to form a complete snapshot of the Salesforce data. For ingestion, the pipeline relies on Confluent Cloud connectors for Salesforce, including CDC Source, Bulk API Source, PushTopic Source, Platform Event Source and Sink, SObjects Sink, and more. The text also discusses how gap events are handled: ksqlDB applications process them in real time, and API calls to Salesforce retrieve the current state of the affected records.
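
The gap-event filtering step can be illustrated with a minimal Python sketch. It is not the ksqlDB logic from the original text. It assumes each CDC event arrives as a dictionary carrying a `ChangeEventHeader` with `changeType` and `recordIds` fields, and it relies on the Salesforce convention that gap events have a `changeType` starting with `GAP_` (for example `GAP_UPDATE`). The function names `is_gap_event`, `split_events`, and `record_ids_to_refetch` are hypothetical, and the actual record layout in the Kafka topics may differ.

```python
from typing import Dict, Iterable, List, Tuple

# Salesforce marks gap events with a changeType such as GAP_CREATE or GAP_UPDATE.
GAP_PREFIX = "GAP_"


def is_gap_event(event: Dict) -> bool:
    """Return True when the event's header marks it as a gap event."""
    header = event.get("ChangeEventHeader") or {}
    return str(header.get("changeType", "")).startswith(GAP_PREFIX)


def split_events(events: Iterable[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate regular change events from gap events, preserving order."""
    regular: List[Dict] = []
    gaps: List[Dict] = []
    for event in events:
        (gaps if is_gap_event(event) else regular).append(event)
    return regular, gaps


def record_ids_to_refetch(gap_events: Iterable[Dict]) -> List[str]:
    """Collect the distinct record IDs that gap events refer to.

    Gap events carry no field values, so these IDs are the records whose
    current state would have to be fetched from Salesforce (assumed step).
    """
    seen: List[str] = []
    for event in gap_events:
        header = event.get("ChangeEventHeader") or {}
        for record_id in header.get("recordIds", []):
            if record_id not in seen:
                seen.append(record_id)
    return seen


# Hypothetical sample events, for illustration only.
sample = [
    {"ChangeEventHeader": {"changeType": "UPDATE", "recordIds": ["001A"]}},
    {"ChangeEventHeader": {"changeType": "GAP_UPDATE", "recordIds": ["001B", "001C"]}},
    {"ChangeEventHeader": {"changeType": "GAP_CREATE", "recordIds": ["001C"]}},
]
regular_events, gap_events = split_events(sample)
ids_needing_refetch = record_ids_to_refetch(gap_events)
```

In this sketch the regular events would flow straight to the warehouse, while the collected IDs would drive the reconciliation calls to Salesforce; how those calls are batched and written back is left out.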