monday Service + LangSmith: Building a Code-First Evaluation Strategy from Day 1
Blog post from LangChain
monday.com has developed an evals-driven development framework for its AI Native Enterprise Service Management platform, which automates and resolves inquiries across service departments. By treating evaluation as a core component from the outset, the team has dramatically shortened feedback loops: comprehensive test runs across large example sets now complete in minutes rather than hours.

The framework employs a dual-layered evaluation approach. Offline evaluations act as a safety net, running against curated datasets to verify that core logic and specific edge cases remain robust; online evaluations continuously monitor quality against real production traffic.

Evaluations are managed as version-controlled code and deployed through GitOps-style CI/CD, which improves agent observability and keeps AI interactions at a consistently high quality bar. Because the platform's architecture supports parallel, concurrent test execution, evaluation runs are drastically faster. This structured approach, built on tools like LangSmith and Vitest, holds evaluations to the same rigorous standards as production code, keeping the AI service workforce reliable and adaptable across enterprise service management use cases.
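The offline-evaluation pattern described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not monday.com's actual implementation: `runAgent`, the exact-match `score` function, and the inline dataset are all hypothetical stand-ins, whereas a real setup would pull curated examples from LangSmith and assert results inside a Vitest suite. The key idea it demonstrates is running every example concurrently, which is what turns hours of sequential evaluation into minutes.

```typescript
type Example = { input: string; expected: string };

// Hypothetical agent under test; a real system would call the AI service.
async function runAgent(input: string): Promise<string> {
  return input.toLowerCase().includes("reset") ? "password-reset" : "general";
}

// Simple exact-match evaluator; production evaluators are often
// LLM-graded or heuristic scorers registered in LangSmith.
function score(output: string, expected: string): number {
  return output === expected ? 1 : 0;
}

async function evaluateDataset(examples: Example[]): Promise<number> {
  // Launch all examples concurrently instead of awaiting one at a time.
  const scores = await Promise.all(
    examples.map(async (ex) => score(await runAgent(ex.input), ex.expected))
  );
  return scores.reduce((a, b) => a + b, 0) / scores.length;
}

// Hypothetical curated dataset; in practice this lives in version control
// or LangSmith, alongside the evaluation code itself.
const dataset: Example[] = [
  { input: "Please reset my password", expected: "password-reset" },
  { input: "What are your office hours?", expected: "general" },
];

evaluateDataset(dataset).then((avg) => console.log(`average score: ${avg}`));
```

In a Vitest-based setup, each example would typically become a concurrent test case, so a failing edge case is reported individually and the whole suite gates the CI/CD pipeline.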