We've built a video conferencing system that can manage multiple speakers in the same room. It uses artificial intelligence to separate individual speakers' utterances and identify each speaker uniquely, so meeting notes and transcripts stay accurate even when several people share one audio source. The system uses the Vonage Video API to capture raw audio streams from live video sessions and sends them to Deepgram for real-time processing. Because speakers are separated out of a single audio source, the same approach works for room systems, including traditional SIP room conferencing systems, as well as hybrid video scenarios. The implementation requires a Vonage Video API account and a Deepgram account. The Audio Connector provides secure server-side access to the raw audio, which helps keep processing times fast.
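To make the speaker-separation step more concrete, here is a minimal sketch of how word-level results could be folded into speaker turns for a transcript. It assumes each word arrives as a dictionary with `word`, `start`, `end`, and an integer `speaker` field, which is the shape Deepgram returns when diarization is enabled. The names `Turn`, `group_words_by_speaker`, and `format_transcript` are hypothetical helpers for illustration, not part of either vendor's SDK, and the sample words are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Turn:
    """One uninterrupted stretch of speech from a single speaker."""
    speaker: int
    start: float
    end: float
    text: str


def group_words_by_speaker(words: Iterable[Dict]) -> List[Turn]:
    """Merge consecutive words with the same speaker label into turns.

    Assumption: each word dict has "word", "start", "end", and "speaker".
    A missing "speaker" key is treated as speaker 0.
    """
    turns: List[Turn] = []
    for w in words:
        speaker = w.get("speaker", 0)
        # Prefer the punctuated form when the word dict provides one.
        token = w.get("punctuated_word", w["word"])
        if turns and turns[-1].speaker == speaker:
            # Same speaker is still talking: extend the current turn.
            turns[-1].end = w["end"]
            turns[-1].text += " " + token
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(Turn(speaker, w["start"], w["end"], token))
    return turns


def format_transcript(turns: Iterable[Turn]) -> str:
    """Render turns as one line each, prefixed with start time and speaker."""
    return "\n".join(
        f"[{t.start:6.2f}s] Speaker {t.speaker}: {t.text}" for t in turns
    )


# Made-up sample data in the assumed word shape.
SAMPLE_WORDS = [
    {"word": "hello", "start": 0.0, "end": 0.4, "speaker": 0},
    {"word": "everyone", "start": 0.5, "end": 0.9, "speaker": 0},
    {"word": "hi", "start": 1.2, "end": 1.4, "speaker": 1},
    {"word": "thanks", "start": 2.0, "end": 2.4, "speaker": 0},
]

if __name__ == "__main__":
    print(format_transcript(group_words_by_speaker(SAMPLE_WORDS)))
```

Grouping by consecutive speaker label rather than by speaker alone preserves the order of the conversation, which is what meeting notes usually need. Mapping the numeric labels to real names would be a separate step that this sketch does not attempt.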