We built SmithDB, the data layer for agent observability
Blog post from LangChain
SmithDB is a newly launched distributed database built for agent observability in LangSmith. It is designed to handle the large, complex trace data that modern AI agents generate, and it improves performance on that workload. Written in Rust on top of the Apache DataFusion query engine, SmithDB supports the query patterns needed to analyze intricate agent behavior, including random access, interactive filtering, and full-text search, while keeping latency low. Its architecture has three parts: object storage for trace data, a Postgres metastore for metadata, and stateless services for ingestion and query processing. This design lets it manage workloads efficiently and scale across self-hosted and multi-cloud environments. A log-structured merge (LSM) tree with a time-tiered compaction strategy optimizes for both write latency and query efficiency. Large, unbounded payloads are handled through late materialization and custom inverted indexing, which keeps queries fast. Clay, Vanta, Unify, and Cogent have already adopted SmithDB and report that analyzing large projects is noticeably faster and easier, which makes it an important part of how AI agents are developed and evaluated.
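To make the combination of inverted indexing and late materialization more concrete, here is a minimal Python sketch. It is not SmithDB's implementation: the `TraceStore` class, its methods, and the `latency_ms` field are hypothetical names chosen for illustration, and an in-memory dictionary stands in for object storage. The idea it shows is that a query first narrows the candidate set using the index and small metadata, and only then reads the large payloads for the rows that survive.

```python
from collections import defaultdict


class TraceStore:
    """Toy trace store (hypothetical, for illustration only)."""

    def __init__(self):
        self.metadata = {}              # run_id -> small, cheap-to-scan fields
        self.payloads = {}              # run_id -> large text; stands in for object storage
        self.index = defaultdict(set)   # token -> run_ids whose payload contains it
        self.payload_reads = 0          # counts how many large payloads were fetched

    def ingest(self, run_id, latency_ms, payload):
        self.metadata[run_id] = {"latency_ms": latency_ms}
        self.payloads[run_id] = payload
        # Build the inverted index at write time with naive whitespace tokenization.
        for token in payload.lower().split():
            self.index[token].add(run_id)

    def _fetch_payload(self, run_id):
        # In a real system this would be the expensive read.
        self.payload_reads += 1
        return self.payloads[run_id]

    def search(self, term, min_latency_ms=0, limit=10):
        # Step 1: the inverted index yields candidate ids without touching payloads.
        candidates = self.index.get(term.lower(), set())
        # Step 2: filter on small metadata only.
        matching = sorted(
            run_id for run_id in candidates
            if self.metadata[run_id]["latency_ms"] >= min_latency_ms
        )
        # Step 3: late materialization, fetching payloads only for the final rows.
        return [(run_id, self._fetch_payload(run_id)) for run_id in matching[:limit]]


store = TraceStore()
store.ingest("run-1", 120, "tool call failed with timeout")
store.ingest("run-2", 900, "agent finished without errors")
store.ingest("run-3", 1500, "retry after timeout in search tool")

hits = store.search("timeout", min_latency_ms=1000)
print(hits, store.payload_reads)
```

In this sketch two runs mention "timeout", but only one passes the latency filter, so only one payload is read. Deferring the payload fetch until after filtering is what keeps the cost of a query tied to the size of the result rather than the size of the stored data.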