The perfect data ingestion API design
Blog post from Tinybird
The data ingestion API described in the post is designed to be easy to use and to fit naturally with web technologies. It accepts NDJSON and JSON, and Parquet support is under consideration.

It takes a schema-based approach: each attribute in the incoming data becomes a column in a columnar database. Compared with a schemaless approach, this significantly improves storage efficiency and processing performance.

Once data has been received and stored, the API returns an acknowledgment. Requests are idempotent, so a client can retry for up to five hours without creating duplicate entries.

The API also buffers incoming data, which improves performance and protects the database from overload. This lets it handle high queries-per-second (QPS) loads and large payloads while still making data available in near real time, typically within four seconds or less. Overall, the design aims to address common ingestion challenges, and the post invites feedback on how well it does so.
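To make the interplay of schema mapping, idempotent retries and buffering more concrete, here is a minimal Python sketch of the server-side logic. It is not Tinybird's implementation. The post does not say how retries are recognized, so this sketch assumes each request carries a client-chosen idempotency key. The names `Ingestor`, `SCHEMA`, `FLUSH_INTERVAL_S` and `FLUSH_MAX_ROWS` and both flush thresholds are hypothetical. In-memory per-column lists stand in for the columnar database, and the in-memory buffer stands in for "stored" when the sketch acknowledges.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical schema: column name -> expected Python type.
# Each JSON attribute maps to exactly one column.
SCHEMA = {"timestamp": str, "user_id": int, "event": str}

IDEMPOTENCY_WINDOW_S = 5 * 60 * 60  # five-hour retry window, as described in the post
FLUSH_INTERVAL_S = 4.0              # assumption: flush often enough to keep latency around 4 s
FLUSH_MAX_ROWS = 10_000             # assumption: flush early when a large burst arrives


@dataclass
class Ingestor:
    clock: Callable[[], float] = time.monotonic
    columns: dict = field(default_factory=lambda: {name: [] for name in SCHEMA})
    buffer: list = field(default_factory=list)
    seen: dict = field(default_factory=dict)  # idempotency key -> time first accepted
    last_flush: float = 0.0

    def ingest(self, key: str, ndjson_body: str) -> dict:
        now = self.clock()
        # Forget keys whose retry window has expired.
        self.seen = {k: t for k, t in self.seen.items() if now - t < IDEMPOTENCY_WINDOW_S}
        if key in self.seen:
            # A retry of an already accepted request: acknowledge, insert nothing.
            return {"status": "ok", "duplicate": True, "rows": 0}
        rows = []
        for line in ndjson_body.splitlines():
            if not line.strip():
                continue  # tolerate blank lines between NDJSON records
            record = json.loads(line)
            # Schema check: exactly the expected columns, each with the expected type.
            if set(record) != set(SCHEMA) or not all(
                isinstance(record[col], typ) for col, typ in SCHEMA.items()
            ):
                raise ValueError(f"row does not match schema: {record}")
            rows.append(record)
        # All rows validated, so accept the whole request at once.
        self.buffer.extend(rows)
        self.seen[key] = now
        self.maybe_flush(now)
        return {"status": "ok", "duplicate": False, "rows": len(rows)}

    def maybe_flush(self, now: float) -> None:
        # Flush on size or age, whichever comes first.
        if len(self.buffer) >= FLUSH_MAX_ROWS or now - self.last_flush >= FLUSH_INTERVAL_S:
            self.flush(now)

    def flush(self, now: float) -> None:
        # Pivot buffered rows into per-column arrays, as a columnar store would hold them.
        for row in self.buffer:
            for name in SCHEMA:
                self.columns[name].append(row[name])
        self.buffer.clear()
        self.last_flush = now
```

Batching rows in the buffer means the database sees a few large writes instead of one write per request. A request is rejected as a whole if any row fails validation, and its key is not recorded, so the client can fix the payload and retry. This sketch only flushes when a request arrives. A real service would presumably also call `maybe_flush` from a background timer and persist the buffer durably before acknowledging.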