Why OpenAI Chose ClickHouse for Petabyte-Scale Observability
Blog post from ClickHouse
OpenAI ingests an immense volume of log data every day, comparable to billions of iPhone photos. It therefore needs an observability infrastructure that can handle rapid data influx and keep systems reliable across operations as diverse as model research, ChatGPT, and its enterprise APIs. To meet these demands, OpenAI uses ClickHouse, an open-source database chosen for its performance, scalability, and flexibility, which let the team run complex queries and absorb sudden data spikes efficiently.

The system was put to the test by the traffic surge that followed the release of GPT-4o's image generation feature, when CPU usage spiked and caused performance issues. The team resolved the problem by optimizing their Bloom filter operations, cutting CPU usage by 40%. The post presents this episode as evidence of the benefits of ClickHouse's open-source nature and cloud-native design.

Looking ahead, OpenAI continues to improve its observability system, focusing on better query planning and a more autonomous observability stack, with the ultimate goal of integrating AI agents into incident-response workflows.
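The Bloom filter fix is the most technical detail in this summary, and the summary does not say exactly what OpenAI changed. As background, a Bloom filter is a compact probabilistic set that answers either "definitely absent" or "possibly present". ClickHouse can use Bloom-filter-based data-skipping indexes to avoid reading blocks of data that cannot match a filter condition. The following is a minimal Python sketch of that idea under stated assumptions, not ClickHouse's or OpenAI's implementation. The block layout, the whitespace tokenization, and the names `BloomFilter`, `build_block_filters`, and `blocks_to_scan` are hypothetical. The default sizes (1,024 bits, 3 hashes) are arbitrary illustrations.

```python
import hashlib
from typing import Iterator, List, Sequence


class BloomFilter:
    """Minimal Bloom filter: each item sets num_hashes bits in a num_bits-bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False => definitely absent; True => possibly present (false positives allowed).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_block_filters(blocks: Sequence[Sequence[str]]) -> List[BloomFilter]:
    """Hypothetical helper: one filter per block of log lines, indexing whitespace tokens."""
    filters = []
    for lines in blocks:
        bf = BloomFilter()
        for line in lines:
            for token in line.split():
                bf.add(token)
        filters.append(bf)
    return filters


def blocks_to_scan(filters: Sequence[BloomFilter], token: str) -> List[int]:
    """Hypothetical helper: indices of blocks that may contain token; the rest can be skipped."""
    return [i for i, bf in enumerate(filters) if bf.might_contain(token)]


if __name__ == "__main__":
    blocks = [
        ["GET /v1/chat status=200", "POST /v1/images status=500"],
        ["GET /healthz status=200"],
    ]
    filters = build_block_filters(blocks)
    # Block 0 is always returned; block 1 appears only on a false positive.
    print(blocks_to_scan(filters, "status=500"))
```

Because `might_contain` never returns a false negative, skipping a block on a `False` result is always safe. The trade-off is that every lookup computes `num_hashes` hashes for each block it checks, so filter checks are not free when query volume is high.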