Voice/text AI chatbot help

I’m a newbie to programming in general and only know the very basics. I managed to build a simple AI voice chatbot that I can place on a website. You can open and close the chat window via the chat icon or the descriptive tooltip next to it. It’s powered by OpenAI Whisper and GPT-3.5 Turbo, which I found to be a good combination cost-wise. You can swap GPT-3.5 Turbo out for Groq, but I found the responses sub-optimal by comparison.

The bot works fine in both voice and text, but I’ve run into a problem I can’t solve. While the bot is busy with speech recognition or with playing back the output.mp3 audio file, you can still open and close the chat window via the chat icon or tooltip, which re-enables the buttons. The user can then tap the buttons and interact with the chatbot while it’s still listening or talking. When that happens the bot responds to its own speech, or doubles up on the user’s text or voice input, layering everything on top of each other and creating an incoherent mess. I managed to sort out the listening case by checking the isListening variable inside the toggleChatContainer function, but the playback case still breaks.

Can you look at the code, run it, and help me fix it so it works as it should? Below are index.html (the frontend) and app.py (the engine running the backend). I should also highlight that the problem is in the frontend, not the backend; the backend is working as it should. Save the HTML file as index.html and the Python file as app.py, run the Python script locally, and open the index.html file in your browser.
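To show what I mean, here’s roughly the kind of guard I think I’m still missing for playback, mirroring what I already did for listening. This is only a sketch, not working code: isSpeaking is a hypothetical flag that doesn’t exist in my code yet.

let isSpeaking = false; // hypothetical flag: set to true before audio.play() and back to false in audio.onended

function toggleChatContainer() {
    // Ignore the chat icon/tooltip while the bot is listening or talking,
    // so the buttons can't be re-enabled mid-interaction.
    if (isListening || isSpeaking) {
        return;
    }

    chatOpened = !chatOpened;
    chatContainer.style.display = chatOpened ? 'block' : 'none';
    if (!chatOpened) {
        recognition.stop();
    }
}

I just don’t know where all the flag updates need to go so that nothing slips through between recognition ending and playback starting.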

Here’s the index.html frontend:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/logo.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Talking AI Assistant</title>
        <style>
        /* Styles for the floating chat icon and chat container */
        /* Style for the floating chat icon */
        #chatIcon {
            position: fixed;
            bottom: 20px;
            right: 20px;
            z-index: 9999;
            background-color: #007bff;
            color: #fff;
            width: 50px;
            height: 50px;
            border-radius: 50%;
            text-align: center;
            line-height: 50px;
            cursor: pointer;
            font-size: 25px; /* Default size for desktop */
        }

        /* Tooltip style */
        #chatTooltip {
            font-size: 14px;
            position: fixed;
            bottom: 25px;
            right: 85px; /* Adjust as needed */
            background-color: #007bff;
            color: #fff;
            padding: 10px;
            border-radius: 5px;
            z-index: 9999;
            width: 220px; /* Fixed width for the tooltip */
            white-space: normal; /* Allow text to wrap */
            display: flex; /* Use flexbox layout */
            justify-content: center; /* Center horizontally */
            align-items: center; /* Center vertically */
            text-align: center; /* Align text within the tooltip */
            cursor: pointer; /* Change cursor to pointer on hover */
        }

        #chatContainer {
            display: none;
            position: fixed;
            bottom: 100px;
            right: 50px;
            z-index: 9999;
            width: 25%;
            max-width: 320px;
            height: 550px;
            overflow-y: auto;
            overflow-x: hidden;
            background-color: white;
            color: black;
            border-radius: 5px;
        }

        @media only screen and (max-width: 768px) {
            #chatIcon {
                width: 40px;
                height: 40px;
                line-height: 38px;
            }
            #chatContainer {
                width: 300px;
                height: 500px;
            }
        }

        /* Style for the text input */
        #textInput {
            margin-top: 20px;
            width: calc(100% - 20px);
            padding: 8px;
            box-sizing: border-box;
            margin-bottom: 10px;
            border: 1px solid #ccc;
            border-radius: 4px;
            resize: vertical;
            overflow-y: auto;
            height: 90px;
            font-size: 14px;
            background-color: white;
        }

        body {
            overflow-x: hidden;
            margin: 0;
            padding: 0;
        }

        /* Style for the processing indicator */
        #processingIndicator {
            display: none;
            position: fixed;
            top: 20px;
            right: 160px;
            z-index: 9999;
            text-align: center;
            margin-bottom: 10px;
        }

        /* Style for chat entries */
        .chat-entry {
            margin-bottom: 8px;
        }

        .user-text {
            color: black; /* Example color for user's text */
            word-wrap: break-word; /* Allow long words to wrap */
            padding-left: 20px;
            padding-right: 20px;
            font-size: 14px;
        }

        .ai-text {
            color: green; /* Example color for AI's text */
            word-wrap: break-word; /* Allow long words to wrap */
            padding-left: 20px;
            padding-right: 20px;
            font-size: 14px;
        }

        .chat-title {
            padding-left: 20px;
            padding-right: 20px;
            padding-top: 20px;
            padding-bottom: 10px;
            font-weight: bold;
            font-size: 18px;
        }

        .chat-paragraph {
            padding-left: 20px;
            padding-right: 20px;
            font-size: 14px;
        }
        
        #startButtonContainer {
            padding-left: 20px;
            padding-top: 20px;
        }

        #startButton {
          font-size: 14px;
          background-color: #007bff;
          padding: 10px;
          border-radius: 6px;
          color: white;
          transition: background-color 0.3s ease; /* Adding transition for smooth effect */
        }

        #startButton:hover {
          background-color: #0056b3; /* New background color on hover */
        }

        #sendTextButton {
          font-size: 14px;
          background-color: #007bff;
          padding: 10px;
          border-radius: 6px;
          color: white;
          transition: background-color 0.3s ease; /* Adding transition for smooth effect */
          margin-bottom: 20px;
        }

        #sendTextButton:hover {
          background-color: #0056b3; /* New background color on hover */
        }
    </style>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/src/main.jsx"></script>

<!-- Floating chat icon -->
<div id="chatIcon">💬</div>

<!-- Tooltip label -->
<div id="chatTooltip">Click here to speak to the AI assistant</div>

<!-- Chat container -->
<div id="chatContainer" class="shadow p-3 rounded">
    <h5 class="chat-title">Talking AI Assistant</h5>
    <p class="chat-paragraph">Speak to the assistant by clicking on "Start Listening" or type your message below and hit "Send".</p>
    <div id="startButtonContainer">
        <button id="startButton" class="btn btn-primary talk-btn">🎙️ Start Listening</button>
        <textarea id="textInput" placeholder="Type your message..." rows="3"></textarea>
        <button id="sendTextButton" class="btn btn-primary">Send</button>
    </div>
    <div id="echoText" class="mb-3"></div>
</div>

<!-- Processing indicator -->
<div id="processingIndicator">
    <div class="spinner-border text-primary" role="status">
        <span class="sr-only">Loading...</span>
    </div>
    <p>Processing...</p>
</div>
<script>
const chatIcon = document.getElementById('chatIcon');
const chatTooltip = document.getElementById('chatTooltip');
const chatContainer = document.getElementById('chatContainer');
const startButton = document.getElementById('startButton');
const textInput = document.getElementById('textInput');
const echoTextEl = document.getElementById('echoText');
const sendTextButton = document.getElementById('sendTextButton');
const processingIndicator = document.getElementById('processingIndicator');

let recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
recognition.lang = 'en-US';
recognition.interimResults = false;
recognition.maxAlternatives = 1;
recognition.continuous = true;
let isListening = false;
let chatOpened = false;
let listeningTimeout = null;
const TIMEOUT_DELAY_MS = 15000; // 15 seconds in milliseconds

document.addEventListener('DOMContentLoaded', startSession);

chatIcon.addEventListener('click', toggleChatContainer);
chatTooltip.addEventListener('click', toggleChatContainer); // Add click listener to chatTooltip
startButton.addEventListener('click', toggleListening);
sendTextButton.addEventListener('click', sendMessage);
textInput.addEventListener('keypress', function(event) {
    if (event.key === 'Enter') {
        sendMessage();
    }
});

function toggleChatContainer() {
    if (isListening) {
        // If speech recognition is active, do nothing
        return;
    }

    chatOpened = !chatOpened;

    if (chatOpened) {
        chatContainer.style.display = 'block';
        // Re-enable buttons when chat container is opened
        startButton.disabled = false;
        sendTextButton.disabled = false;
        if (resetButton) {
            resetButton.disabled = false;
        }
    } else {
        chatContainer.style.display = 'none';
        recognition.stop();
    }
}

function toggleListening() {
    if (isListening) {
        recognition.stop();
    } else {
        recognition.start();
    }
}

function sendMessage() {
    const text = textInput.value.trim();
    if (text) {
        processSpeech(text);
        textInput.value = '';
        
        // Disable buttons after sending message
        startButton.disabled = true;
        sendTextButton.disabled = true;
        resetButton.disabled = true;
    }
}

recognition.onstart = function () {
    isListening = true;
    startButton.disabled = true;
    startButton.textContent = "Listening...";
    sendTextButton.disabled = true;
    resetButton.disabled = true;

    // Set a timeout to stop recognition after TIMEOUT_DELAY_MS milliseconds
    listeningTimeout = setTimeout(() => {
        if (isListening) {
            recognition.stop();
            isListening = false;
            startButton.textContent = "🎙️ Start Listening";
            reEnableButtons();
            resetButton.disabled = false;
        }
    }, TIMEOUT_DELAY_MS);
};

recognition.onend = function () {
    isListening = false;
    startButton.textContent = "🎙️ Start Listening";
    clearTimeout(listeningTimeout);
};



recognition.onresult = function (event) {
    clearTimeout(listeningTimeout);

    const currentResultIndex = event.resultIndex;
    for (let i = currentResultIndex; i < event.results.length; ++i) {
        if (event.results[i].isFinal) {
            const transcript = event.results[i][0].transcript.trim();
            startButton.disabled = true;
            sendTextButton.disabled = true;
            resetButton.disabled = true;
            processSpeech(transcript);
            recognition.stop();
            break;
        }
    }
};

recognition.onerror = function (event) {
    console.error('Speech recognition error', event.error);
    clearTimeout(listeningTimeout);
    resetButton.disabled = false;
    reEnableButtons();
};

function startSession() {
    // You can add any initialization logic here if needed
    
    // Create and append the reset button
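    // Note: because the button gets the id 'resetButton' (and the container div already has the id
    // 'startButtonContainer'), the rest of this script can reach them via the browser's implicit
    // window.<id> globals once the elements are in the DOM.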
    const resetButton = document.createElement('button');
    resetButton.id = 'resetButton';
    resetButton.className = 'btn btn-secondary';
    resetButton.textContent = 'Enable Disabled Buttons';
    startButtonContainer.appendChild(resetButton);

    // Add event listener to the reset button
    resetButton.addEventListener('click', function() {
        startButton.disabled = false;
        sendTextButton.disabled = false;
        resetButton.disabled = false;
    });

   resetButton.style.display = 'none'; // Hide the reset button
}

function processSpeech(text) {
    const userEntry = document.createElement('div');
    userEntry.className = 'chat-entry user-text'; // Apply user-text class
    userEntry.innerHTML = `<strong>You:</strong> ${text}`;

    // Prepend the user's chat entry to the top of the chat container
    echoTextEl.insertBefore(userEntry, echoTextEl.firstChild);

    fetch('http://localhost:5000/process-speech', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
        },
        body: JSON.stringify({ text: text }),
    })
    .then(response => {
        if (!response.ok) {
            throw new Error('Network response was not ok');
        }
        return response.json();
    })
    .then(data => {
        const aiText = data.response;
        const aiEntry = document.createElement('div');
        aiEntry.className = 'chat-entry ai-text';
        aiEntry.innerHTML = `<strong>AI:</strong> ${aiText}`;

        // Prepend the AI's chat entry to the top of the chat container
        echoTextEl.insertBefore(aiEntry, echoTextEl.firstChild);

        speak(aiText);
    })
    .catch(error => {
        console.error('Error during AI processing:', error);
        reEnableButtons();
    });
}


function speak(text) {
    processingIndicator.style.display = 'block';

    startButton.disabled = true;
    sendTextButton.disabled = true;
    resetButton.disabled = true;

    fetch('http://localhost:5000/synthesize-speech', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
        },
        body: JSON.stringify({ text: text }),
    })
    .then(response => {
        if (!response.ok) {
            throw new Error('Network response was not ok');
        }
        return response.blob();
    })
    .then(blob => {
        const audioUrl = URL.createObjectURL(blob);
        const audio = new Audio(audioUrl);

        audio.onended = function () {
            processingIndicator.style.display = 'none';
            reEnableButtons();
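            // Automatically resume listening once the reply has finished playing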
            recognition.start();
        };

        audio.play();
    })
    .catch(error => {
        console.error('Error during TTS synthesis:', error);
        reEnableButtons();
    });
}

function reEnableButtons() {
    startButton.disabled = false;
    sendTextButton.disabled = false;
}
</script>
  </body>
</html>

Here’s the backend app.py file:

from flask import Flask, request, send_file, jsonify, render_template
from flask_cors import CORS
from openai import OpenAI
from groq import Groq
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
import schedule
import threading
import time
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

app = Flask(__name__)
CORS(app)

# Initialize the OpenAI clients with environment variables
# client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
# openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Or use the lines below and paste your actual keys directly between the double quotes
# (fine for quick local testing, but don't commit real keys).
client = OpenAI(api_key="")
openai_client = OpenAI(api_key="")

history_messages = []

# Function to save responses to a file
def save_to_file(user_text, ai_response):
    try:
        with open("chat_logs.txt", "a") as file:
            file.write(f"User: {user_text}\n")
            file.write(f"AI: {ai_response}\n\n")
    except Exception as e:
        print("Error writing to file:", e)

# System prompt that seeds the conversation
INIT_MESSAGE = {
    "role": "system",
    "content": """You are a helpful assistant ready to answer any of the user's questions.""",
}
history_messages.append(INIT_MESSAGE)

@app.route('/synthesize-speech', methods=['POST'])
def synthesize_speech():
    data = request.json
    text = data['text']
    sound = "output.mp3"
    
    voice_response = openai_client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=text,
    )

    voice_response.stream_to_file(sound)
    
    return send_file(sound, mimetype="audio/mpeg")

@app.route('/process-speech', methods=['POST'])
def process_speech():
    data = request.json
    user_text = data['text']
    history_messages.append({"role": "user", "content": user_text})
    
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=history_messages,
    )
    ai_response = completion.choices[0].message.content
    history_messages.append({"role": "assistant", "content": ai_response})

    # Save user input and AI response to file
    save_to_file(user_text, ai_response)

    return jsonify({'response': ai_response})

@app.route('/start-speech', methods=['POST'])
def start_speech():
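    # Resets the conversation history; note that the current frontend never calls this endpoint.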
    global history_messages
    history_messages = []  # Reset message history
    history_messages.append(INIT_MESSAGE)  # Re-append the initial message
    return jsonify({'response': 'OK'})

def send_email():
    sender_email = os.getenv("EMAIL_SENDER")
    receiver_email = os.getenv("EMAIL_RECEIVER")
    api_key = os.getenv("EMAIL_API_KEY")

    message = MIMEMultipart()
    message["From"] = sender_email
    message["To"] = receiver_email
    message["Subject"] = "Daily Chat Logs"

    # Read the content of chat_logs.txt
    with open("chat_logs.txt", "r") as file:
        body = file.readlines()  # Read lines to preserve formatting

    # Use HTML formatting for the email body
    html_body = "<html><body>"
    for line in body:
        html_body += f"<p>{line}</p>"  # Wrap each line in a paragraph tag
    html_body += "</body></html>"

    message.attach(MIMEText(html_body, "html"))

    # Send email using SMTP
    server = None
    try:
        server = smtplib.SMTP('smtp.elasticemail.com', 587)
        server.starttls()
        server.login(sender_email, api_key)
        server.sendmail(sender_email, receiver_email, message.as_string())
        print("Email sent successfully!")
        
        # Clear the chat_logs.txt file after sending email
        with open("chat_logs.txt", "w") as file:
            file.truncate(0)
            print("Chat log file cleared successfully!")
            
    except Exception as e:
        print("Error sending email:", e)
    finally:
        if server is not None:
            server.quit()

# Schedule email sending once a day
schedule.every().day.at("17:51").do(send_email)  # Adjust the time as needed

# Route for serving the index.html file (render_template expects it in a templates/ folder;
# for local testing you can simply open index.html directly in the browser instead)
@app.route('/')
def index():
    return render_template('index.html')

if __name__ == '__main__':
    # Start the Flask app in a separate thread without the reloader
    threading.Thread(target=app.run, kwargs={'host': '0.0.0.0', 'port': 5000, 'debug': True, 'use_reloader': False}).start()

    # Run the schedule in the main thread
    while True:
        schedule.run_pending()
        time.sleep(1)

Here’s the requirements.txt file for the Python backend:

aiohttp==3.9.4
aiosignal==1.3.1
alembic==1.13.1
altair==5.3.0
annotated-types==0.6.0
anyio==4.3.0
appdirs==1.4.4
asgiref==3.8.1
attrs==23.2.0
backoff==2.2.1
bcrypt==4.1.2
beautifulsoup4==4.12.3
blinker==1.7.0
Brotli==1.1.0
build==1.2.1
cachetools==5.3.3
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
chroma-hnswlib==0.7.3
chromadb==0.4.24
click==8.1.7
colorama==0.4.6
coloredlogs==15.0.1
contextlib2==21.6.0
crewai==0.28.8
crewai-tools==0.1.7
cryptography==42.0.5
dataclasses-json==0.6.4
decorator==5.1.1
Deprecated==1.2.14
deprecation==2.1.0
distro==1.9.0
docstring-parser==0.15
embedchain==0.1.100
fastapi==0.110.1
filelock==3.13.4
Flask==3.0.3
Flask-Cors==4.0.0
flatbuffers==24.3.25
frozenlist==1.4.1
fsspec==2024.3.1
gitdb==4.0.11
GitPython==3.1.43
google-api-core==2.18.0
google-auth==2.29.0
google-cloud-aiplatform==1.47.0
google-cloud-bigquery==3.20.1
google-cloud-core==2.4.1
google-cloud-resource-manager==1.12.3
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
gptcache==0.1.43
greenlet==3.0.3
groq==0.5.0
grpc-google-iam-v1==0.13.0
grpcio==1.62.1
grpcio-status==1.62.1
gTTS==2.5.1
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.22.2
humanfriendly==10.0
idna==3.7
importlib-metadata==7.0.0
importlib_resources==6.4.0
iniconfig==2.0.0
instructor==0.5.2
itsdangerous==2.1.2
Jinja2==3.1.3
jsonpatch==1.33
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
kubernetes==29.0.0
lancedb==0.5.7
langchain==0.1.16
langchain-community==0.0.33
langchain-core==0.1.43
langchain-openai==0.0.5
langchain-text-splitters==0.0.1
langsmith==0.1.48
Mako==1.3.3
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.21.1
mdurl==0.1.2
mmh3==4.1.0
monotonic==1.6
mpmath==1.3.0
multidict==6.0.5
mutagen==1.47.0
mypy-extensions==1.0.0
nodeenv==1.8.0
numpy==1.26.4
oauthlib==3.2.2
onnxruntime==1.17.3
openai==1.17.1
opentelemetry-api==1.24.0
opentelemetry-exporter-otlp-proto-common==1.24.0
opentelemetry-exporter-otlp-proto-grpc==1.24.0
opentelemetry-exporter-otlp-proto-http==1.24.0
opentelemetry-instrumentation==0.45b0
opentelemetry-instrumentation-asgi==0.45b0
opentelemetry-instrumentation-fastapi==0.45b0
opentelemetry-proto==1.24.0
opentelemetry-sdk==1.24.0
opentelemetry-semantic-conventions==0.45b0
opentelemetry-util-http==0.45b0
orjson==3.10.1
outcome==1.3.0.post0
overrides==7.7.0
packaging==23.2
pandas==2.2.2
pillow==10.3.0
pluggy==1.4.0
posthog==3.5.0
proto-plus==1.23.0
protobuf==4.25.3
pulsar-client==3.5.0
py==1.11.0
pyarrow==15.0.2
pyasn1==0.6.0
pyasn1_modules==0.4.0
pycparser==2.22
pycryptodomex==3.20.0
pydantic==2.7.0
pydantic_core==2.18.1
pydeck==0.8.1b0
PyGithub==1.59.1
Pygments==2.17.2
PyJWT==2.8.0
pylance==0.9.18
PyNaCl==1.5.0
pypdf==3.17.4
PyPika==0.48.9
pyproject_hooks==1.0.0
pyreadline3==3.4.1
pyright==1.1.358
pysbd==0.3.4
PySocks==1.7.1
pytest==8.1.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.0
pytube==15.0.0
pytz==2024.1
PyYAML==6.0.1
ratelimiter==1.2.0.post0
referencing==0.34.0
regex==2023.12.25
requests==2.31.0
requests-oauthlib==2.0.0
retry==0.9.2
rich==13.7.1
rpds-py==0.18.0
rsa==4.9
schedule==1.2.1
schema==0.7.5
selenium==4.19.0
semver==3.0.2
setuptools==69.5.1
shapely==2.0.3
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
sortedcontainers==2.4.0
soupsieve==2.5
SQLAlchemy==2.0.29
starlette==0.37.2
streamlit==1.33.0
sympy==1.12
tenacity==8.2.3
text-to-speech==1.6.1
tiktoken==0.5.2
tokenizers==0.15.2
toml==0.10.2
toolz==0.12.1
tornado==6.4
tqdm==4.66.2
trio==0.25.0
trio-websocket==0.11.1
typer==0.9.4
typing-inspect==0.9.0
typing_extensions==4.11.0
tzdata==2024.1
urllib3==2.2.1
uvicorn==0.29.0
watchdog==4.0.0
watchfiles==0.21.0
websocket-client==1.7.0
websockets==12.0
Werkzeug==3.0.2
wrapt==1.16.0
wsproto==1.2.0
yarl==1.9.4
youtube-transcript-api==0.6.2
yt-dlp==2023.12.30
zipp==3.18.1

Hello, can you go into greater detail on how to get this up and running on my computer?
In other words, how do you run the Python file locally?
Do you simply paste it into the console, and edit the HTML with Notepad, etc.?

Hi,

Thanks to everyone who tried to help. I’ve managed to sort everything out. I completely redesigned the frontend and reworked the UI to look more appealing. It now works perfectly in the desktop versions of Edge and Chrome, and I’ve also made it responsive using the device emulation in those browsers’ developer tools, covering most major devices. The only thing left is to deploy it and do live testing on mobile devices; hopefully no issues will arise in the deployed version. After that, I’m going to try adding a backend knowledge base in JSON format for the bot to reference, instead of including all of the knowledge base information in the prompt.