
What Is Spoken Language Understanding (SLU)?

Blog post from Deepgram

Post Details
Company: Deepgram
Author: Bridget McGillivray
Word Count: 2,369
Language: English
Summary

Spoken Language Understanding (SLU) extracts structured meaning, such as intents, slots, and domain classifications, directly from audio, so that applications can act on what users want rather than merely transcribe their words. The post compares cascade STT→NLU architectures with end-to-end SLU systems and notes that the choice affects error propagation, latency, and training data requirements. Modern STT APIs such as Deepgram Nova-3 now offer low-latency, high-accuracy transcription, which lets cascade architectures compete with end-to-end systems while remaining modular. The key criterion for choosing between the two is the Word Error Rate (WER) on production audio: cascade architectures work well when WER stays below roughly 5–8%, and end-to-end systems become preferable when WER exceeds that range. Cascade systems let each component be upgraded independently and preserve transcripts where compliance requires them, whereas end-to-end systems are described as requiring less training data but adapting less easily to change. The post stresses measuring WER on actual production samples so that the decision reflects the requirements of the specific application.
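
To make the WER criterion concrete, the following is a minimal sketch of how WER might be measured on a handful of production samples and compared with the 5–8% range mentioned above. It is not taken from the post. The function names (`edit_distance`, `corpus_wer`, `suggest_architecture`), the lowercase-and-split tokenization, and the reading of the range as "below 5% favors cascade, above 8% favors end-to-end, in between is a judgment call" are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    # prev[j] holds the distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """WER over (reference, hypothesis) pairs: total edits / total reference words."""
    total_edits = 0
    total_ref_words = 0
    for reference, hypothesis in pairs:
        # Assumed normalization: lowercase and whitespace tokenization only.
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        total_edits += edit_distance(ref_words, hyp_words)
        total_ref_words += len(ref_words)
    if total_ref_words == 0:
        raise ValueError("need at least one non-empty reference transcript")
    return total_edits / total_ref_words


def suggest_architecture(wer: float, low: float = 0.05, high: float = 0.08) -> str:
    """Hypothetical rule of thumb built on the 5-8% range; thresholds are assumptions."""
    if wer < low:
        return "cascade"
    if wer > high:
        return "end-to-end"
    return "borderline"


if __name__ == "__main__":
    # Made-up sample pairs: (human reference transcript, STT output).
    samples = [
        ("turn on the kitchen lights", "turn on the kitchen light"),
        ("set a timer for ten minutes", "set a timer for ten minutes"),
    ]
    wer = corpus_wer(samples)
    print(f"WER = {wer:.3f}, suggestion = {suggest_architecture(wer)}")
```

In practice the sample pairs would come from real production audio with human reference transcripts, and the normalization (punctuation, numerals, casing) would need to match the conventions the NLU stage expects, since that choice can shift the measured WER noticeably.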