As generative AI applications move from prototype to production, evaluating large language models (LLMs) becomes essential. Current state-of-the-art evaluation combines automated and human assessments, but human evaluation remains resource-intensive. To address this, Labelbox and Google Cloud have collaborated on an integrated, managed service within the Vertex AI platform. Users select the evaluation type and criteria, then receive quality-reviewed results quickly. The service gives customers access to human raters who can evaluate models across a wide range of criteria, and it integrates with existing Google Cloud services such as BigQuery and Cloud SQL. Labelbox also offers its full product suite on Google Cloud Marketplace, including AI-assisted labeling, data curation, and model diagnostics, which combine AI assistance with human oversight to support the development of intelligent applications. The partnership aims to make human evaluation a routine part of LLM development, so that organizations spend less manual effort on evaluation and more on building and shipping AI products.