Making Celebrity Voice

Post Details

Company

Chroma

Date Published

March 6, 2023

Author

-

Word Count

130

Language

English

Hacker News Points

-

Source URL

trychroma.com/blog/voice

Summary

An audio embedding model can embed an uploaded clip of a person's voice so that it can be compared against a dataset of celebrity voices stored in Chroma. Chroma makes it straightforward to scale this from a simple prototype in a Jupyter notebook to a fully deployed application. The post demonstrates the process with the VoxCeleb dataset, which contains 145,265 utterances from 1,251 speakers; each utterance is a few seconds long and stored as a WAV file. The initial prototype took only a few lines of code in a Jupyter notebook, and the same embed-and-compare approach carries over from the prototype to the deployed version.
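The core step is embedding the uploaded clip and then finding the closest stored voices. In the post this lookup is handled by Chroma. The sketch below is only an illustration of the underlying idea and is not the post's code. It swaps in a brute-force nearest-neighbour search in plain Python and uses cosine distance as an assumed metric. The speaker names and the three-dimensional vectors are hypothetical stand-ins for real audio-embedding output, and the helper names `cosine_distance` and `nearest_speakers` are likewise hypothetical.

```python
import math

# Hypothetical toy embeddings standing in for audio-embedding-model output.
# Real embeddings have far more dimensions; these values are illustrative only.
CELEBRITY_EMBEDDINGS = {
    "speaker_a": [0.9, 0.1, 0.0],
    "speaker_b": [0.1, 0.8, 0.3],
    "speaker_c": [0.0, 0.2, 0.95],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def nearest_speakers(query, embeddings, k=1):
    """Brute-force search: rank every stored speaker by distance to the query."""
    ranked = sorted(embeddings, key=lambda name: cosine_distance(query, embeddings[name]))
    return ranked[:k]


# Hypothetical embedding of an uploaded clip.
uploaded_clip = [0.85, 0.15, 0.05]
print(nearest_speakers(uploaded_clip, CELEBRITY_EMBEDDINGS, k=2))
```

A linear scan like this is fine for a handful of vectors. Handing storage and search to a vector database such as Chroma is what lets the same workflow grow from a notebook prototype to the full VoxCeleb set and a deployed application.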