Scribe is presented as the world's most accurate speech-to-text model. It transcribes speech in 99 languages, and the accuracy claim is supported by its results on the FLEURS and Common Voice benchmarks. It significantly reduces transcription errors relative to competitors such as Gemini 2.0 Flash and Whisper Large V3, especially in languages that are typically underserved. The model also provides word-level timestamps, speaker diarization, and audio-event tagging, which makes it suitable for applications ranging from meeting summaries to movie subtitles. Developers can access Scribe through an API that returns structured JSON transcripts, while creators and businesses can use it directly in the ElevenLabs dashboard. Several experts contributed to the model's development, and a low-latency version for real-time applications is planned for release soon.
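
To illustrate how word-level timestamps, speaker labels, and audio-event tags might be consumed in a meeting-summary or subtitle workflow, here is a minimal Python sketch. It does not call the Scribe API. The transcript dictionary and its field names (`words`, `text`, `start`, `end`, `type`, `speaker_id`) are assumptions made for illustration, not a confirmed response schema, and `speaker_turns` and `format_turn` are hypothetical helpers.

```python
from typing import Any

# Hypothetical transcript: the shape and field names are assumptions,
# chosen to mirror the features described above (word timestamps,
# speaker diarization, audio-event tags).
SAMPLE_TRANSCRIPT: dict[str, Any] = {
    "language_code": "en",
    "words": [
        {"text": "Hello", "start": 0.00, "end": 0.40, "type": "word", "speaker_id": "speaker_0"},
        {"text": "everyone", "start": 0.45, "end": 0.95, "type": "word", "speaker_id": "speaker_0"},
        {"text": "(laughter)", "start": 1.00, "end": 1.80, "type": "audio_event", "speaker_id": "speaker_0"},
        {"text": "Hi", "start": 2.00, "end": 2.20, "type": "word", "speaker_id": "speaker_1"},
        {"text": "there", "start": 2.25, "end": 2.60, "type": "word", "speaker_id": "speaker_1"},
    ],
}


def speaker_turns(transcript: dict[str, Any]) -> list[dict[str, Any]]:
    """Group consecutive tokens from the same speaker into one turn."""
    turns: list[dict[str, Any]] = []
    for token in transcript["words"]:
        if turns and turns[-1]["speaker"] == token["speaker_id"]:
            turn = turns[-1]  # same speaker: extend the current turn
        else:
            # speaker changed (or first token): open a new turn
            turn = {"speaker": token["speaker_id"], "start": token["start"],
                    "end": token["end"], "tokens": []}
            turns.append(turn)
        turn["tokens"].append(token["text"])
        turn["end"] = token["end"]
    return [
        {"speaker": t["speaker"], "start": t["start"], "end": t["end"],
         "text": " ".join(t["tokens"])}
        for t in turns
    ]


def format_turn(turn: dict[str, Any]) -> str:
    """Render one turn as a subtitle-style line with its time span."""
    return f"[{turn['start']:.2f}-{turn['end']:.2f}] {turn['speaker']}: {turn['text']}"


if __name__ == "__main__":
    for turn in speaker_turns(SAMPLE_TRANSCRIPT):
        print(format_turn(turn))
```

Grouping by speaker change keeps audio events such as laughter inline with the turn in which they occur. A subtitle generator would more likely split on time gaps or line length instead.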