ProfBench is a new benchmark that tests large language models (LLMs) on complex, open-ended tasks requiring professional-grade knowledge in domains such as Finance, Chemistry, and Physics, evaluating whether AI can handle the kind of nuanced reasoning expected of PhD- and MBA-level professionals. Supported by the NVIDIA NeMo Evaluator SDK, ProfBench contains over 7,000 expert-written response-criterion pairs that assess models along three dimensions: data extraction, reasoning, and style. The benchmark underscores how far current models still have to go: even top performers such as GPT-5-High score significantly below human experts, particularly in domains like Physics. By providing a robust, rubric-based evaluation framework, ProfBench aims to advance AI systems that can tackle real-world professional challenges, serving as a critical tool for both the open-source community and enterprise users.
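
To make the rubric-based setup concrete, here is a minimal, hypothetical sketch of scoring a response against expert-written criteria: each criterion is judged independently, and the weighted fraction of satisfied criteria becomes the response's score. This is not the NeMo Evaluator SDK API; the `Criterion` fields, the `score_response` helper, and the toy substring judge are all illustrative assumptions, and in practice the judge would be an LLM-as-judge call.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    """One expert-written requirement a response may or may not satisfy (illustrative)."""
    text: str            # the requirement, e.g. "states the 10% discount rate"
    dimension: str       # one of the three axes: "extraction", "reasoning", "style"
    weight: float = 1.0  # relative importance of this criterion

def score_response(response: str,
                   rubric: list[Criterion],
                   judge: Callable[[str, str], bool]) -> float:
    """Return the weighted fraction of rubric criteria the judge deems satisfied."""
    total = sum(c.weight for c in rubric)
    met = sum(c.weight for c in rubric if judge(response, c.text))
    return met / total if total else 0.0

if __name__ == "__main__":
    # Stand-in judge for demonstration only: a substring check. A real
    # evaluation would ask an LLM judge whether the criterion is met.
    toy_judge = lambda response, criterion: criterion.lower() in response.lower()

    rubric = [
        Criterion("discounted cash flow", dimension="reasoning"),
        Criterion("10% discount rate", dimension="extraction"),
    ]
    answer = "The valuation uses discounted cash flow with a 10% discount rate."
    print(f"rubric score: {score_response(answer, rubric, toy_judge):.2f}")
```

Judging each criterion in isolation is what makes this style of evaluation tractable for open-ended professional tasks: a binary yes/no per criterion is far easier for an LLM judge to decide reliably than a single holistic grade for a long free-form response.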