GPT 4.5 Released: Here Are the Benchmarks
Blog post from Helicone
GPT-4.5, OpenAI's latest model, emphasizes conversational abilities and emotional intelligence over reasoning power, marking it as their largest and most knowledgeable model to date. It excels in generating factual content and natural conversations, with improved emotional intelligence and reduced hallucination rates, but struggles with complex problem-solving tasks. While it outperforms previous models in coding tasks like the SWE-Lancer benchmark, it remains expensive, costing significantly more than alternatives like GPT-4o and Claude 3.7 Sonnet. Despite its strengths, the release has received a mixed reception due to its limitations, such as basic reasoning errors exemplified by the "strawberry test." The model is gradually being rolled out to users, with implications that future OpenAI models will incorporate stronger reasoning capabilities alongside the conversational improvements seen in GPT-4.5.