Home / Companies / Clarifai / Blog / Post Details
Content Deep Dive

Choosing the Right Models for Vision, OCR and Language Tasks

Blog post from Clarifai

Post Details
Company
Date Published
Author
Clarifai
Word Count
1,917
Language
English
Hacker News Points
-
Summary

Clarifai's platform has undergone significant changes, moving away from older, task-specific models to embrace modern large language and vision-language models that handle multiple tasks within a single family, offering improved stability and performance across diverse inputs. Legacy models are being deprecated in favor of these newer, more capable models, with compute orchestration managing scheduling and resource allocation for seamless operation across both open-source and custom deployments. The updated platform supports core tasks such as visual classification, recognition, moderation, OCR, and NLP, leveraging models like MiniCPM, Qwen, and MM-Poly for efficient and reliable outcomes. Users can access these models via Clarifai’s OpenAI-compatible API, utilize them in the Playground, or deploy their own custom models using Compute Orchestration, which offers flexibility across various cloud environments. With a focus on zero-shot and instruction-tuned classification, the platform enables streamlined workflows, reducing the need for dedicated training while supporting multilingual and complex document tasks.