Making Shard-Aware Drivers for CDC

Blog post from ScyllaDB

Post Details
Author: Piotr Dulikowski
Word Count: 1,121
Language: English
Summary

Change Data Capture (CDC) in ScyllaDB became production-ready in version 4.3. It lets users track and respond to data changes through a CQL-compatible interface, so existing tools and drivers can process CDC data. For each CDC-enabled table, ScyllaDB creates a "CDC log" table in which changes to the base table are recorded in streams, each corresponding to a portion of the token ring. Stream IDs were initially generated with a computationally expensive Las Vegas-type algorithm; a newer deterministic approach encodes the token directly into the stream ID. This change can confuse shard-aware drivers that still compute tokens with MurmurHash3: they may send queries to the wrong node or shard, which increases latency. The fix is to teach drivers to detect custom partitioning schemes and compute tokens for partition keys accordingly. ScyllaDB is updating its drivers to support this: the GoCQL and Java drivers are already updated, and support is planned for the C++, Python, and Rust drivers.
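
To make the token mismatch concrete, here is a minimal Python sketch of how a driver might choose between hashing a partition key and decoding the token from a CDC stream ID. The byte layout (an 8-byte big-endian signed token followed by 8 other bytes), the partitioner name, and all function names are assumptions for illustration, not ScyllaDB's actual format or any driver's API. `default_token` is a stand-in hash, not real MurmurHash3.

```python
import hashlib
import struct
from typing import Optional

# Placeholder name (assumption); a real driver would read the table's
# partitioner from schema metadata.
CDC_PARTITIONER = "example.CDCPartitioner"


def default_token(partition_key: bytes) -> int:
    """Stand-in for the default hash-based partitioner (NOT real MurmurHash3)."""
    digest = hashlib.blake2b(partition_key, digest_size=8).digest()
    return struct.unpack(">q", digest)[0]


def encode_stream_id(token: int, suffix: int = 0) -> bytes:
    """Build a 16-byte stream ID under the assumed layout: token first, then a suffix."""
    return struct.pack(">qQ", token, suffix)


def cdc_token(stream_id: bytes) -> int:
    """Recover the token directly from the stream ID instead of hashing it."""
    return struct.unpack(">q", stream_id[:8])[0]


def token_for(partition_key: bytes, partitioner: Optional[str]) -> int:
    """Pick the token function based on the table's partitioner."""
    if partitioner == CDC_PARTITIONER:
        return cdc_token(partition_key)
    return default_token(partition_key)
```

A driver that ignores the partitioner would call `default_token` on the stream ID and, in general, get a token unrelated to the one encoded in it, so the query could be routed to a node or shard that does not own the data. Checking the partitioner first avoids that extra hop.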