In AI observability, the traditional pillars of metrics, logs, and traces fall short for AI systems, which are probabilistic and data-coupled. The focus shifts instead to traces, evals, and annotation. At Braintrust, these three pillars enable a comprehensive understanding and continuous improvement of AI systems. Traces reconstruct the decision paths across model calls and other components. Evals measure performance in both development and production, making improvement systematic rather than ad hoc. Annotation brings in expert input to correct and refine AI behavior, and that feedback flows back into the system to improve it continuously.

To support these workflows at scale, Braintrust built Brainstore, a database purpose-built for large volumes of AI data, enabling efficient tracing, evaluation, and annotation. This approach shifts the central question from whether the system is up to whether its outputs are good, and it fosters collaboration among engineers, product managers, and domain experts so that AI systems are both reliable and aligned with user expectations.
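To make the evals pillar concrete, here is a minimal sketch of an eval loop in plain Python: run a task over a dataset of cases, score each output against an expectation, and aggregate. This is an illustrative toy, not the Braintrust SDK; the names `EvalCase`, `exact_match`, `run_eval`, and `toy_task` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    input: str
    expected: str

def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the output matches the expected answer (case-insensitive)."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def run_eval(cases, task, scorer) -> float:
    """Run `task` over each case, score each output, and return the mean score."""
    scores = [scorer(task(case.input), case.expected) for case in cases]
    return sum(scores) / len(scores)

# A stand-in "model" for demonstration: uppercases its input.
def toy_task(prompt: str) -> str:
    return prompt.upper()

cases = [
    EvalCase(input="hello", expected="HELLO"),  # task gets this right
    EvalCase(input="world", expected="word"),   # task gets this wrong
]
print(run_eval(cases, toy_task, exact_match))  # → 0.5
```

In practice the task would call a model, the scorer might itself be an LLM judge or a human annotation, and each run would emit a trace so regressions can be inspected case by case.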