YouTube Q&A Chatbot with OpenAI Whisper, Embeddings, ChatGPT & Pinecone


In today's world, chatbots have become an essential tool for businesses to interact with their customers. With the advancements in Natural Language Processing (NLP) technology, chatbots are becoming smarter and more efficient. OpenAI, a leading AI research organization, has developed a powerful suite of tools for NLP tasks, including text embeddings and transcription services through their Whisper API.

In this blog, we will explore OpenAI Whisper and its use cases, then build a YouTube question-answering chatbot that uses OpenAI's Whisper API to transcribe audio and video content. We will use OpenAI's text embedding model to represent the transcripts and user queries as vectors, a vector database to find content that semantically matches a query, and ChatGPT to generate the answer from that content.

By the end of this blog, you will have a deep understanding of how to leverage OpenAI's tools to create a powerful chatbot that can answer your queries in a natural and seamless way. So, let's dive in and see how we can create an intelligent chatbot from scratch.

Unboxing OpenAI Whisper

OpenAI has recently unveiled its latest breakthrough in automatic speech recognition (ASR) technology - Whisper. This system is trained on a massive 680,000 hours of multilingual and multitask supervised data sourced from the web, making it a highly accurate and robust transcription tool.

According to OpenAI, Whisper makes about 50% fewer errors than models specialized for a single benchmark when it is evaluated zero-shot across many diverse datasets. It is designed to be robust to accents, background noise, and technical language, which suits it to a wide range of applications.

Whisper also offers impressive multilingual capabilities, enabling transcription in 99 different languages, as well as translation from those languages into English. It also supports full punctuation, making it even easier to use for various applications.

OpenAI Whisper Use Cases

OpenAI’s Whisper API allows transcription service providers to transcribe audio and video content in multiple languages accurately and efficiently. The API's advanced machine-learning algorithms enable it to transcribe the audio in near real-time, ensuring faster turnaround times.

Another advantage of using the Whisper API is its support for multiple file formats, including MP3, WAV, and FLAC. This feature provides greater flexibility, allowing transcription service providers to work with a wide range of audio and video files.

So now the time has come to do some hands-on coding by exploring a few use-cases of Whisper.

Use Case 1: Subtitle Generation

Step 1

pip install openai-whisper

Install the OpenAI Whisper library

Step 2

import whisper

Import the library

Step 3

model = whisper.load_model("base")
result = model.transcribe("provide path of your audio file")
print(result["text"])

Load the base model from the whisper library and store it in a variable. Then pass the path of your audio file to the model's transcribe method, which returns a dictionary containing the transcription. Finally, print the transcribed text stored under the "text" key.
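A subtitle file needs timestamps as well as text. Besides "text", the dictionary returned by transcribe carries a "segments" list whose entries have "start", "end", and "text" fields. The sketch below turns such segments into SRT-formatted subtitles. The helper names format_timestamp and segments_to_srt are our own, and the sample segments are hand-written stand-ins for result["segments"].

```python
def format_timestamp(seconds):
    """Convert seconds (float) to an SRT timestamp: HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text from Whisper-style segments (dicts with start, end, text)."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{number}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Hand-written sample; in practice pass result["segments"] from model.transcribe
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 5.04, "text": " Let's get started."},
]
print(segments_to_srt(sample))
```

Writing the returned string to a file with the .srt extension should give a subtitle file that most video players can load.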

Use Case 2: Audio Language Translation

Whisper is able to perform voice translation from a vast set of languages to English.

Follow Steps 1 and 2 of Whisper Use Case 1

Step 3

model = whisper.load_model("base")
result = model.transcribe("provide path of your audio file", task = 'translate')
print(result["text"])

Load the 'base' model from the whisper library and store it in a variable. Then pass the audio path and task = 'translate' to the transcribe method, store the returned dictionary in a variable, and print the English translation stored under the "text" key.

Let's Create A Youtube Question Answering Chatbot

With this app, we can interact with YouTube videos through a Q/A chatbot. First, we generate subtitles for the YouTube videos we are interested in. Then we create embeddings from those subtitles and store them in a vector database (in this case, Pinecone). Finally, we use Streamlit to build the interface of our chatbot app.

Install The Requirements


Fork the repository from GitHub -
Open the folder on your local machine and execute the following command in your terminal
pip install -r requirements
This will install all the libraries required to execute the project.

Create a .env file To Initialize API Keys

openai_key = "Enter your openai key"
pinecone_key = "Provide your pinecone key"

Get your API keys from your OpenAI and Pinecone accounts and paste them into this file.
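The code in the following sections reads these keys with os.getenv, which returns None when a key is missing, so a typo in the .env file only shows up later as an authentication error. Here is a minimal standard-library sketch that loads the file and fails early instead. The helpers load_env_file and require_keys are hypothetical names of our own, the key values are dummies, and a package such as python-dotenv may be what your project actually uses for the loading step.

```python
import os
import tempfile


def load_env_file(path):
    """Read KEY = "value" lines from a file into os.environ and return them."""
    loaded = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            os.environ[key] = value
            loaded[key] = value
    return loaded


def require_keys(names):
    """Fail early with a clear message if any key is missing or empty."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing keys in environment: {', '.join(missing)}")


# Demo with a throwaway file; in the project you would point this at .env
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    tmp.write('openai_key = "sk-demo"\npinecone_key = "pc-demo"\n')
load_env_file(tmp.name)
require_keys(["openai_key", "pinecone_key"])
os.remove(tmp.name)
```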

Retrieving Content From Youtube Videos

Import The Libraries

# file name
import openai
import tempfile
import numpy as np
import pandas as pd
from pytube import YouTube, Search
import os

Initialize a dictionary to store the data extracted from youtube videos using pytube and whisper libraries

# file name
openai.api_key = os.getenv("openai_key")

video_dict = {
    "url": [],
    "title": [],
    "content": []
}

Extracting Audio Content And Transcribing It Using Whisper

# file name
def video_to_audio(video_URL):
    # Get the video
    video = YouTube(video_URL)
    # Store the title; fall back to a placeholder if pytube cannot read it
    try:
        video_dict["title"].append(video.title)
    except Exception:
        video_dict["title"].append("Title not found")

    # Convert video to Audio
    audio = video.streams.filter(only_audio=True).first()

    # Temporary directory and a randomized file name for the audio
    temp_dir = tempfile.mkdtemp()
    variable = np.random.randint(1111, 1111111)
    file_name = f'recording{variable}.mp3'

    # Save to destination
    # (this call is reconstructed; the original line was cut off after "output =")
    output = audio.download(output_path=temp_dir, filename=file_name)

    # Send the audio to the Whisper API and keep only the text
    with open(output, "rb") as audio_file:
        textt = openai.Audio.translate("whisper-1", audio_file)["text"]

    return textt

video_to_audio: The video_to_audio method takes a YouTube URL as an argument. We create an object of the YouTube class, name it video, and read attributes such as the title from it.
Next, we pick an audio-only stream of the video using the code below

audio = video.streams.filter(only_audio=True).first()

and create a temporary directory to store the audio file.

Finally, we send the audio to OpenAI's 'whisper-1' model, store the returned text in a variable, and return it. Because the code calls the translate endpoint, the text comes back in English even when the video is in another language.
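One practical detail when sending audio to the API: OpenAI documents a file size limit for uploads to its audio endpoints (25 MB at the time of writing, which you should verify against the current documentation). A long video can exceed it, so a small pre-check avoids a failed request. The sketch below is our own illustration; fits_upload_limit and MAX_UPLOAD_BYTES are hypothetical names, and the demo uses a tiny dummy file instead of real audio.

```python
import os
import tempfile

# Assumed limit; check the current OpenAI documentation before relying on it
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def fits_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at path is small enough to upload."""
    return os.path.getsize(path) <= limit


# Demo with a 1 KB dummy file standing in for a downloaded audio track
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 1024)

small_ok = fits_upload_limit(tmp.name)           # well under the default limit
too_big = fits_upload_limit(tmp.name, limit=512)  # over an artificially low limit
os.remove(tmp.name)

print(small_ok, too_big)
```

If a file is too large, one option is to split the audio into shorter pieces and transcribe each piece separately.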

Saving The Data To A Dataframe

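def create_dataframe(data):
    df = pd.DataFrame(data)
    # Placeholder path; replace it with the location of your CSV file
    df.to_csv("provide path of your csv file", index=False)

s = Search("Youtube video title")

# Transcribe the first five search results and collect their data
for ele in s.results[0:5:1]:
    transcription = video_to_audio(ele.watch_url)
    video_dict["url"].append(ele.watch_url)
    video_dict["content"].append(transcription)

create_dataframe(video_dict)
print("Created Dataframe")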

create_dataframe: This method takes a dictionary as an argument, converts it into a pandas data frame, and stores it in a CSV file.

We search for YouTube videos using the Search class. Then we loop through the first five results, generate the subtitles for each video, store them in the dictionary we initialized earlier, and save all the data to a CSV file.

Transferring Youtube Content To Pinecone

Import The Libraries

import pinecone
import pandas as pd
import openai
import os

Pinecone is a cloud-native vector database that allows us to build high-performance vector search applications. If you are interested in learning more about Pinecone and its use cases, refer to this link -

Instantiate Pinecone Index

pinecone.init(api_key=os.getenv("pinecone_key"), environment="us-east-1-aws")


index = pinecone.Index("demo-youtube-app")

Get Embeddings From The Context

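def get_embedding(text):
    # The arguments of this call were cut off in the original;
    # "text-embedding-ada-002" is an assumed model name
    response = openai.Embedding.create(
        input=text,
        model="text-embedding-ada-002"
    )
    return response['data'][0]['embedding']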

An embedding is a numerical vector representation of words or phrases in a high-dimensional space. It captures the context and meaning of words by analyzing large amounts of text data using machine learning algorithms.
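To make "captures the meaning" concrete: texts with similar meaning get vectors that point in similar directions, and that closeness is commonly measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings have hundreds or thousands of dimensions, and the numbers here do not come from any model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_a * norm_b)


# Toy vectors, invented for this example
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))  # close to 1: similar direction
print(cosine_similarity(cat, car))     # close to 0: nearly unrelated
```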

If you want to know more about embeddings and their use cases then refer to our blog

Save The Embeddings To Pinecone Database

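def addData(index, url, title, context):
    # Use the current vector count as the id of the new vector
    my_id = index.describe_index_stats()['total_vector_count']

    # Vector tuple: (id, embedding values, metadata)
    chunkInfo = (str(my_id),
                 get_embedding(context),
                 {'video_url': url, 'title': title, 'context': context})

    index.upsert(vectors=[chunkInfo])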

The arguments passed to the method are -
index: Pinecone index where we will upsert our data.
url: YouTube video URL.
title: YouTube video title. We pass this as metadata so that when we query our database, we can also look at the title and URL of the video to confirm the accuracy of the generated answer.
context: Subtitles generated from the video.

Finally, with the help of the upsert method, we append data to the vector database.
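Storing one vector per video works for short clips, but embedding models have an input length limit and a single vector for a long transcript makes retrieval coarse. A common workaround, not part of the code above, is to split the transcript into overlapping chunks and upsert each chunk separately. Here is a minimal word-based sketch; chunk_text is a hypothetical helper and the default sizes are arbitrary values you would tune.

```python
def chunk_text(text, max_words=200, overlap=20):
    """Split text into chunks of up to max_words words, sharing overlap words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk has reached the end of the text
        if start + max_words >= len(words):
            break
    return chunks


# Tiny demo: ten words, chunks of four words with one word of overlap
demo = " ".join(f"w{i}" for i in range(10))
for piece in chunk_text(demo, max_words=4, overlap=1):
    print(piece)
```

Each chunk could then be passed as the context argument of addData, so that a query retrieves only the relevant part of a video.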

Answering User Queries From Stored Youtube Content

Import The Libraries

import streamlit as st
import openai
from streamlit_chat import message
import pinecone
import os
import pinecone_utils

Find Context Semantically Similar To User Query

def find_top_match(query, k):
    query_em = pinecone_utils.get_embedding(query)
    result = index.query(query_em, top_k=k, includeMetadata=True)

    # Collect the URL, title, and transcript stored as metadata for each match
    matches = result['matches']
    urls = [m['metadata']['video_url'] for m in matches]
    titles = [m['metadata']['title'] for m in matches]
    contexts = [m['metadata']['context'] for m in matches]
    return urls, titles, contexts

With this method, we are able to find the most related video based on the input query.
The method takes two parameters - query which is the user query and k which specifies the number of top results to return.

The variable query_em stores the vector representation of the query generated by the openai embedding model.

We then call index.query() to search the vector database. The top_k parameter specifies how many of the most relevant results to return, and the includeMetadata parameter specifies whether the metadata stored with each result should be included in the response.
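To build intuition for what such a query does, here is a small in-memory stand-in that scores every stored vector against the query and returns the best k. It is only an illustration: query_top_k is a hypothetical function, the two-dimensional vectors are made up, the dot product is used as the score (Pinecone lets you choose the metric when creating an index), and a real vector database uses approximate search structures instead of scanning everything. The returned dictionary imitates the 'matches' and 'metadata' layout read by find_top_match.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def query_top_k(vectors, query_em, top_k):
    """Score every stored (id, values, metadata) tuple and keep the best top_k."""
    scored = [
        {"id": vec_id, "score": dot(values, query_em), "metadata": metadata}
        for vec_id, values, metadata in vectors
    ]
    scored.sort(key=lambda match: match["score"], reverse=True)
    return {"matches": scored[:top_k]}


# Made-up unit-length vectors standing in for transcript embeddings
stored = [
    ("0", [1.0, 0.0], {"title": "Video about cats"}),
    ("1", [0.0, 1.0], {"title": "Video about cars"}),
    ("2", [0.6, 0.8], {"title": "Video about pets and travel"}),
]

result = query_top_k(stored, [1.0, 0.0], top_k=2)
for match in result["matches"]:
    print(match["id"], match["score"], match["metadata"]["title"])
```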

Integrate ChatGPT To Provide Answers

def get_message_history(contexts):

    # The first system message sets the rules; the second carries the retrieved context
    message_hist = [
        {"role": "system",
         "content": """As a Bot, it's important to show empathy and understanding when answering questions. You are a smart AI who has to answer the question only from the provided context. If you 
     are unable to understand the question and need more clarity then your response should be 'Could you please be 
     more specific?'. If you are unable to find the answer from the given context then your response should be 'Answer is not present in the provided video' \n"""},
        {"role": "system", "content": contexts},
    ]

    return message_hist

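def chat(user_query, message_history, role="user"):
    # Add the user's question to the conversation
    message_history.append({"role": role, "content": f"{user_query}"})
    # The arguments of this call were cut off in the original;
    # "gpt-3.5-turbo" is an assumed model name
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=message_history
    )
    reply = completion.choices[0].message.content
    message_history.append({"role": "assistant", "content": f"{reply}"})
    return reply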

get_message_history sets up the initial conversation between the user and the bot.
We pass the most similar context returned by the vector database to this method so that answers are generated only from the provided context.

chat takes the user query, the message history, and the role as input, makes an API call to ChatGPT, and returns the generated response.
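get_message_history accepts a single context string, while find_top_match can return several contexts when k is greater than 1. If you want the model to see more than one, they have to be merged into one string without exceeding the model's input limit. The sketch below joins contexts in ranked order until a character budget is used up. The function join_contexts, the separator, and the 6000-character default are our own assumptions; a token-based budget would be more precise.

```python
def join_contexts(contexts, max_chars=6000, separator="\n---\n"):
    """Concatenate contexts in order, stopping before max_chars is exceeded."""
    selected = []
    used = 0
    for text in contexts:
        # A separator is only needed between two contexts
        cost = len(text) + (len(separator) if selected else 0)
        if used + cost > max_chars:
            break
        selected.append(text)
        used += cost
    return separator.join(selected)


# Tiny demo with an artificially small budget
merged = join_contexts(["aaaa", "bbbb", "cccc"], max_chars=10, separator="|")
print(merged)
```

The merged string could then be passed to get_message_history in place of a single context.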

Process User Query And Generate Bot Response

# container for chat history
response_container = st.container()
# container for text box
textcontainer = st.container()

with textcontainer:
    user_input = get_text()

    if st.session_state.past or user_input:
        urls, title, context = find_top_match(user_input, 1)
        message_history = get_message_history(context[0])

        with st.spinner("Generating the answer..."):
            response = chat(user_input, message_history)



        link_expander = st.expander("Context obtained from url")

First, the user asks a query and that query gets stored in the user_input variable.

Next, we pass the user query to the find_top_match() method, which queries the vector database and returns the document with the highest semantic similarity score.

Then we pass this document and the query to ChatGPT, and it answers our query based on the context provided to it.
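The same retrieve-then-generate flow can be written as one small function, which makes it easy to test without calling Pinecone or OpenAI. This is only a sketch: answer_query is a hypothetical wrapper, the retrieve argument plays the role of find_top_match, the generate argument plays the role of the get_message_history and chat pair, and the fake functions in the demo return canned values.

```python
def answer_query(user_query, retrieve, generate, k=1):
    """Fetch the best-matching context, then ask the model to answer from it."""
    urls, titles, contexts = retrieve(user_query, k)
    if not contexts:
        # Same fallback wording as the system prompt uses
        return "Answer is not present in the provided video", []
    reply = generate(user_query, contexts[0])
    sources = list(zip(titles, urls))
    return reply, sources


# Fake stand-ins so the flow can run offline
def fake_retrieve(query, k):
    return ["https://example.com/watch"], ["Demo video"], ["Whisper supports 99 languages."]


def fake_generate(query, context):
    return f"Based on the video: {context}"


reply, sources = answer_query("How many languages?", fake_retrieve, fake_generate)
print(reply)
print(sources)
```

Returning the titles and URLs alongside the reply lets the interface show which video the answer came from.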

ChatBot In Action

Congratulations, you have made your AI chatbot with semantic search functionalities.

Now it is time to execute your chatbot

Execute the following command in your terminal

streamlit run


In conclusion, building a YouTube Q/A chatbot with OpenAI Whisper, Pinecone vector database, ChatGPT, and OpenAI embeddings was an exciting project that showcases the potential of cutting-edge technologies in natural language processing. By integrating these tools and technologies, we were able to create a chatbot that accurately understands and responds to user queries in real time. Furthermore, with the addition of Streamlit as a frontend, we were able to create an engaging and interactive user interface that enhances the overall user experience. This project is just the beginning of what is possible with advanced NLP techniques and AI technologies, and we look forward to seeing how these innovations will continue to transform the world of conversational AI.

If you would like to learn more about embeddings and their use cases, be sure to check out the YouTube tutorial below.

To learn more about vector databases and their applications, refer to the following video

Also, check out the video demonstrations of a chatbot integrated with ChatGPT

To learn about more interesting applications of LLMs, look into our other blogs and YouTube channel.

Want to keep up with the state of the art in AI? Don't forget to subscribe to AI Demos, a place to learn about the latest cutting-edge tools in AI!