The Trustworthy Language Model (TLM) is an AI system that augments responses from large language models (LLMs) with a trustworthiness score to improve reliability, and is now compatible with models from OpenAI and Anthropic such as GPT-4o and Claude 3 Haiku. The article presents comprehensive benchmarks evaluating TLM's hallucination detection against other strategies, such as Self-Eval and Probability, across several datasets including TriviaQA and ARC. The results show that TLM detects erroneous LLM responses with higher precision and recall, making it an effective tool for ensuring accuracy in AI applications, especially when abstaining from low-confidence responses in human-in-the-loop workflows (see the sketch below). Because TLM provides a comprehensive, model-agnostic form of uncertainty quantification, whereas the baseline methods each capture only part of a model's uncertainty, it offers a practical way to mitigate hallucinations and improve trustworthiness across different LLMs.
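
To make the abstention workflow concrete, here is a minimal Python sketch of threshold-based routing on a trustworthiness score. The helper names `score_with_tlm` and `escalate_to_human`, and the threshold value, are assumptions for illustration only, not part of the TLM API described in the article.

```python
# Minimal sketch of a human-in-the-loop abstention policy driven by a
# trustworthiness score. `score_with_tlm` and `escalate_to_human` are
# hypothetical placeholders, not calls from any specific TLM library.

from dataclasses import dataclass


@dataclass
class ScoredResponse:
    response: str            # the LLM's answer text
    trustworthiness: float   # score in [0, 1]; higher means more reliable


def score_with_tlm(prompt: str) -> ScoredResponse:
    """Placeholder: call a trustworthiness-scoring LLM wrapper here."""
    raise NotImplementedError


def escalate_to_human(prompt: str, scored: ScoredResponse) -> str:
    """Placeholder: route the query and draft answer to a human reviewer."""
    raise NotImplementedError


def answer_or_abstain(prompt: str, threshold: float = 0.8) -> str:
    """Return the LLM answer only if its trustworthiness clears the threshold;
    otherwise abstain and hand the query off for human review."""
    scored = score_with_tlm(prompt)
    if scored.trustworthiness >= threshold:
        return scored.response
    return escalate_to_human(prompt, scored)
```

Raising the threshold makes abstention more aggressive: fewer low-confidence answers reach end users, at the cost of routing more queries to human reviewers.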