OpenAI's o1-preview model demonstrates significant advances in language-model reasoning, but it still produces incorrect responses, or "hallucinations." The Trustworthy Language Model (TLM), designed to evaluate and improve response accuracy, can reduce the rate of these erroneous outputs by over 20% when o1-preview is used as its base model. Benchmarks on the TriviaQA, SVAMP, and PII Detection datasets show that TLM improves accuracy and catches errors by scoring the trustworthiness of each response, enabling more reliable AI workflows. In particular, TLM boosts o1-preview's accuracy across these datasets and flags responses that may be unreliable, making it a valuable tool for trustworthy AI applications, including human-in-the-loop processes where low-trust outputs are escalated for human review.
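
To make the human-in-the-loop pattern concrete, here is a minimal sketch of routing low-trust answers to a reviewer, assuming the `cleanlab_studio` Python client; the exact `options={"model": "o1-preview"}` parameter and the 0.7 review threshold are illustrative assumptions rather than prescribed values.

```python
# Minimal human-in-the-loop sketch using Cleanlab's TLM.
# Assumptions: the `cleanlab_studio` Python client and an API key;
# `options={"model": "o1-preview"}` is assumed to select the base model,
# and the 0.7 threshold is an arbitrary illustrative choice.
from cleanlab_studio import Studio

studio = Studio("<YOUR_API_KEY>")
tlm = studio.TLM(options={"model": "o1-preview"})  # TLM wrapping o1-preview

REVIEW_THRESHOLD = 0.7  # tune per application; lower means fewer escalations


def answer_with_oversight(prompt: str) -> dict:
    """Return the TLM response, flagging low-trust answers for human review."""
    result = tlm.prompt(prompt)  # returns the response plus a trustworthiness score
    needs_review = result["trustworthiness_score"] < REVIEW_THRESHOLD
    return {
        "response": result["response"],
        "trustworthiness_score": result["trustworthiness_score"],
        "needs_human_review": needs_review,
    }


if __name__ == "__main__":
    out = answer_with_oversight("Which mountain on Earth is tallest measured from base to peak?")
    print(out)
```

In a production workflow, responses with `needs_human_review` set to `True` would be queued for a person to verify, while high-trust responses pass through automatically; the threshold trades off automation rate against error tolerance.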