Coding an AI Voice Bot from Scratch: Real-Time Conversation with Python
Company
AssemblyAI
Date published
March 6, 2024
In this video, I'll show you how to build an AI voice bot in Python. It will be able to understand real-time audio input and generate real-time audio responses at the same time. Here's a scenario where our AI voice bot is working at a dental clinic:

AI: Thank you for calling Vancouver Dental Clinic. My name is Sandy, how may I assist you?
Caller: Hi Sandy, my name is Smitha and I would like to book an appointment with the dentist tomorrow.
AI: Hello Smitha, I can definitely help you with that. Let me check our schedule for availability. Could you please tell me your preferred time for the appointment tomorrow?
Caller: Of course. I would like to meet the dentist tomorrow at twelve noon.
AI: Great choice, Smitha. I have an opening at noon tomorrow with the dentist. Shall I go ahead and book that appointment for you?
Caller: That will be perfect. Thanks, Sandy.
AI: You're welcome, Smitha. I have successfully booked your appointment with the dentist for tomorrow at noon. Please make sure to bring your insurance information.

There are four steps involved in building our AI voice bot. The first is installing all of the necessary Python libraries: AssemblyAI, OpenAI, and ElevenLabs. AssemblyAI is used for accurate, real-time speech-to-text transcription: whatever you say is transcribed in real time, and that transcript is passed to OpenAI, which generates a text response the way a dental assistant would. That text response is then passed to ElevenLabs, where the audio is generated. And that is exactly how our AI voice bot will work.

First, let's install the necessary Python libraries. I've gone ahead and created a virtual environment in my project folder. Once it's activated, I'll install the dependencies: brew install portaudio, then pip install "assemblyai[extras]", then pip install elevenlabs, followed by brew install mpv, and lastly pip install openai. All of these commands and the code for this project are in the description box below, so do check out the GitHub link.

Next, let's import the Python libraries we've just installed: AssemblyAI first, then ElevenLabs (specifically, the generate and stream functions), and then OpenAI. Once we've done that, let's create a class called AI_Assistant and initialize it. Most importantly, we need API keys for all three of the services we're using. To get an AssemblyAI API key, click on the link in the description box below. Once you've created API keys for all three services, you can declare them here: set the AssemblyAI settings API key to your AssemblyAI key, and define the OpenAI and ElevenLabs keys in the same way. With all three API keys defined, let's create an empty transcriber object, and then a list called full_transcript that will hold everything you say, and everything the AI assistant says, over the course of the conversation. Before the conversation starts, full_transcript should contain only a single item: the prompt we give to OpenAI.
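Collected in one place, here's a minimal sketch of the setup commands and imports described above. I'm assuming the elevenlabs 0.x SDK, where generate and stream are importable top-level functions, and the openai v1 SDK; the exact code is in the GitHub link below.

```python
# macOS setup, as described above:
#   brew install portaudio            # microphone access for assemblyai[extras]
#   brew install mpv                  # ElevenLabs streams audio through mpv
#   pip install "assemblyai[extras]" elevenlabs openai

import assemblyai as aai
from elevenlabs import generate, stream   # elevenlabs 0.x top-level helpers
from openai import OpenAI
```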
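And a sketch of the class initialization just described, with placeholder strings standing in for your own API keys:

```python
class AI_Assistant:
    def __init__(self):
        # API keys for the three services (replace the placeholders)
        aai.settings.api_key = "ASSEMBLYAI-API-KEY"
        self.openai_client = OpenAI(api_key="OPENAI-API-KEY")
        self.elevenlabs_api_key = "ELEVENLABS-API-KEY"

        # Empty transcriber object; created later in start_transcription()
        self.transcriber = None

        # Full conversation history sent to OpenAI on every request,
        # seeded with the system prompt
        self.full_transcript = [
            {"role": "system",
             "content": "You are a receptionist at a dental clinic. Be resourceful and efficient."},
        ]
```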
That prompt defines the role as "system", and the content is the prompt text itself: "You are a receptionist at a dental clinic. Be resourceful and efficient." That's all our prompt will contain, and that's all the full_transcript list will contain at the start. This full_transcript list is really important, because every time we call OpenAI's API we will be sending the full transcript of everything said by you and by the voice bot, so it's important to follow this exact format.

Next, we can move on to step two: real-time transcription with AssemblyAI. The first thing we want to do is create a method called start_transcription. Inside it, we create a transcriber object and store it in the transcriber variable we just defined, using AssemblyAI's RealtimeTranscriber. We'll set the sample rate to 16,000, and we'll also set a parameter called end_utterance_silence_threshold to 1000. This defines how long, in milliseconds, the program waits before deciding that you've finished a sentence while talking in real time. What this code does is connect your microphone and stream its audio to AssemblyAI's API. Next, we define a method called stop_transcription, which closes the transcriber and sets it back to None.

After that, we need to define four methods: on_data, on_error, on_open, and on_close. These four methods define how the real-time transcriber behaves, so let's head over to AssemblyAI's documentation for real-time streaming and look at the first code example, which contains the four functions we need. Copy them into our code, and then make a few changes. First, add self as a parameter to each of these methods. I'm also going to comment out the print statement in on_open, because I don't want anything printed to the terminal besides the transcripts, and simply write return instead; I'll do the same in on_error and on_close. We also want to change the on_data method, which is really important because it's where we define what to do with the real-time transcript coming in from AssemblyAI's API. The second if statement is where we receive the final real-time transcript: whenever you finish saying a sentence, that entire sentence arrives there. Instead of printing it out, we'll send it to a new method called generate_ai_response, which we'll define shortly, with the transcript as its parameter.
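Here's a sketch of those two methods, assuming the RealtimeTranscriber constructor accepts the end_utterance_silence_threshold parameter and using the MicrophoneStream helper from assemblyai[extras]:

```python
    # (inside the AI_Assistant class)

    def start_transcription(self):
        # Real-time transcriber; waits 1000 ms of silence before
        # deciding an utterance has ended
        self.transcriber = aai.RealtimeTranscriber(
            sample_rate=16_000,
            on_data=self.on_data,
            on_error=self.on_error,
            on_open=self.on_open,
            on_close=self.on_close,
            end_utterance_silence_threshold=1000,
        )
        self.transcriber.connect()

        # Connect the microphone and stream its audio to AssemblyAI's API
        microphone_stream = aai.extras.MicrophoneStream(sample_rate=16_000)
        self.transcriber.stream(microphone_stream)

    def stop_transcription(self):
        # Close the transcriber and reset it to None
        if self.transcriber:
            self.transcriber.close()
            self.transcriber = None
```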
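And the four callbacks after the edits described above, with the documentation's print statements commented out in on_open, on_error, and on_close, and on_data handing the final transcript to generate_ai_response. The exact starting point is the first code example in AssemblyAI's real-time streaming docs; this is an approximation of it:

```python
    # (inside the AI_Assistant class)

    def on_open(self, session_opened: aai.RealtimeSessionOpened):
        # print("Session ID:", session_opened.session_id)
        return

    def on_data(self, transcript: aai.RealtimeTranscript):
        if not transcript.text:
            return
        if isinstance(transcript, aai.RealtimeFinalTranscript):
            # A finished sentence: hand it off to OpenAI
            self.generate_ai_response(transcript)
        else:
            # A partial transcript: overwrite the current terminal line
            print(transcript.text, end="\r")

    def on_error(self, error: aai.RealtimeError):
        # print("An error occurred:", error)
        return

    def on_close(self):
        # print("Closing Session")
        return
```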
We're now at step three, where we write the code that passes the real-time transcript to OpenAI's API. We'll start with a method called generate_ai_response, whose parameters are self and transcript. The very first thing this method does is call stop_transcription, because we want to pause the real-time transcription stream while we're communicating with OpenAI's API. After that, we add the real-time transcript to our full_transcript list and print out what the user just said. Now we're ready to pass the transcript to OpenAI's API: for the model we'll use GPT-3.5 Turbo, and for the messages we pass the full transcript. Then we define a variable called ai_response and store the message content from the response's choices in it; this is the line that retrieves the response from OpenAI's API. At this point we can generate audio, so we call self.generate_audio, a method we'll create next, with ai_response as the parameter. Once the audio has been generated, we restart the real-time transcription by calling start_transcription, so you can continue the conversation.

We're now at the last and final step, where we generate audio with ElevenLabs. We'll create a method called generate_audio, whose parameters are self and text; this text is the response from OpenAI's API. The first thing we do is add it to full_transcript. Next, we print out the text, labeled as coming from the AI assistant. Then we write the code that sends a request to ElevenLabs' API, using the generate function we imported at the beginning. For the voice I'm selecting Rachel, but there are a bunch of different voices available on ElevenLabs, so feel free to browse and pick the one you want. I'm also setting the stream parameter to True, and then I call the stream function and pass it the audio stream. That's the end of the generate_audio method.

Next, we define the start and end of our project. We'll begin with the initial greeting our AI voice bot has to say: "Thank you for calling Vancouver Dental Clinic. My name is Sandy, how may I assist you?" This is the greeting our voice bot reads out before starting the real-time transcription, passing the transcript to OpenAI, and generating more audio. Now let's initialize the AI_Assistant class, call the generate_audio method with the greeting inside, and then call the start_transcription function.
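A sketch of generate_ai_response, assuming the openai v1 client created in __init__:

```python
    # (inside the AI_Assistant class)

    def generate_ai_response(self, transcript):
        # Pause the real-time stream while we talk to OpenAI
        self.stop_transcription()

        # Add what the user just said to the conversation history, and show it
        self.full_transcript.append({"role": "user", "content": transcript.text})
        print(f"\nPatient: {transcript.text}")

        # Send the full conversation so far to OpenAI
        response = self.openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=self.full_transcript,
        )
        ai_response = response.choices[0].message.content

        # Speak the response, then resume listening
        self.generate_audio(ai_response)
        self.start_transcription()
```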
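Finally, a sketch of generate_audio and the program's entry point, again assuming the elevenlabs 0.x generate and stream helpers (stream plays the audio through mpv):

```python
    # (inside the AI_Assistant class)

    def generate_audio(self, text):
        # Add the assistant's reply to the conversation history, and show it
        self.full_transcript.append({"role": "assistant", "content": text})
        print(f"\nAI Receptionist: {text}")

        # Request streaming audio from ElevenLabs and play it as it arrives
        audio_stream = generate(
            api_key=self.elevenlabs_api_key,
            text=text,
            voice="Rachel",
            stream=True,
        )
        stream(audio_stream)


# Start and end of the project: greet the caller, then begin listening
greeting = "Thank you for calling Vancouver Dental Clinic. My name is Sandy, how may I assist you?"

ai_assistant = AI_Assistant()
ai_assistant.generate_audio(greeting)
ai_assistant.start_transcription()
```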
At this point you can hit save, go into the terminal, and run the Python file. Here's the finished voice bot in action:

AI: Thank you for calling Vancouver Dental Clinic. My name is Sandy, how may I assist you?
Caller: Hi Sandy, I'm Smitha and I'd like to book an appointment with Dr. Lee tomorrow.
AI: Hello Smitha, I'm happy to help you with that. Let me check Dr. Lee's availability for tomorrow. Could you please tell me your preferred time for the appointment?
Caller: I would like to book it at 3:00 p.m. tomorrow.
AI: I'm sorry, but Dr. Lee is fully booked tomorrow afternoon. However, we do have availability at 10:00 a.m. or 1:00 p.m. Would either of these times work for you, Smitha?
Caller: Yes, 1:00 p.m. actually works for me, Sandy.
AI: Great. I've successfully scheduled your appointment with Dr. Lee for tomorrow at 1:00 p.m. Can I have your phone number to confirm the appointment, Smitha?
Caller: Yes, my phone number is 123-45-6789.
AI: Thank you, Smitha. I have your phone number as 123-45-6789. You will receive a confirmation call or text shortly. If you have any other questions or need further assistance, feel free to let me know. Thank you for choosing Vancouver Dental Clinic.

Check out this next video to learn how to transcribe a live phone call in Python using AssemblyAI and Twilio.