Opik is an open-source LLM evaluation framework whose Human-in-the-Loop annotation workflow combines human insight with scalable evaluation and observability. The workflow is aimed at developers of agentic AI applications, whose complex, multi-step processes need end-to-end evaluation rather than trace-level checks alone. Domain experts can flag issues, rate conversations, and leave comments with little friction, which lets developers collect expert feedback at scale. That feedback flows back into the workflow to refine prompts, models, and overall system behavior. The same feedback can be automated into an LLM-as-a-Judge metric that reflects the reasoning of domain experts, so the application can keep improving against the experts' own criteria. The aim is AI systems that align with user goals and adapt to real-world complexity, making them more reliable and effective across industries.
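One way to picture the step from expert feedback to an LLM-as-a-Judge metric is a few-shot judge prompt built from annotated conversations. The sketch below is a minimal illustration in plain Python and does not use the Opik SDK; the `Annotation` schema, the 1 to 5 rating scale, and the `build_judge_prompt` helper are all assumptions made for this example, not Opik's actual data model or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    """One expert review of a conversation (hypothetical schema)."""
    conversation: str
    rating: int   # assumed scale: 1 (poor) to 5 (excellent)
    comment: str  # the expert's free-text reasoning; may be empty


def build_judge_prompt(
    annotations: List[Annotation],
    candidate: str,
    max_examples: int = 3,
) -> str:
    """Turn expert annotations into a few-shot prompt for a judge model."""
    # Keep only annotations with a comment, since the comment carries
    # the expert's reasoning that the judge is meant to imitate.
    with_reasoning = [a for a in annotations if a.comment.strip()]
    examples = with_reasoning[:max_examples]

    parts = [
        "You are grading a conversation the way a domain expert would.",
        "Rate it from 1 to 5 and explain your reasoning.",
        "",
    ]
    for i, a in enumerate(examples, start=1):
        parts += [
            f"Example {i}:",
            f"Conversation: {a.conversation}",
            f"Expert rating: {a.rating}",
            f"Expert reasoning: {a.comment}",
            "",
        ]
    parts += [
        "Now grade this conversation:",
        f"Conversation: {candidate}",
        "Rating:",
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    # Illustrative data only; not taken from any real annotation set.
    reviews = [
        Annotation("User: Cancel my order.\nAgent: Done, refund issued.", 5,
                   "Resolved the request in one step."),
        Annotation("User: Where is my package?\nAgent: Please wait.", 2,
                   "Did not look up the tracking status."),
    ]
    print(build_judge_prompt(reviews, "User: Change my address.\nAgent: Updated."))
```

The resulting string would be sent to whichever judge model is in use, and the score it returns could then be tracked like any other evaluation metric. Filtering to annotations that include a comment is a design choice of this sketch: a bare rating tells the judge what the expert decided but not why.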