How to quickly create a multilingual audio transcription service for IP telephony


In corporate VoIP systems, calls are routed to voicemail, where the caller leaves a message. The audio message then needs to be converted to text and emailed together with the audio file. This is a common scenario, but how can we transcribe automatically when callers speak different languages? In the past this was a challenge, and callers had to navigate an annoying IVR with prompts like “For English, press 1…”. Now a speech-recognition model from OpenAI handles it, detecting the spoken language on its own.

Solution: For high-quality multilingual transcription, we use OpenAI’s whisper-1 audio model. The service works in three steps:


  1. Develop a REST API service that receives a POST request with the caller’s phone number and a link to the call recording.
  2. The service downloads the recording and passes it to OpenAI, which returns the transcription.
  3. The resulting text is sent by email along with the audio file.

Below is simplified code that illustrates the process:

import logging
import os
import smtplib
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

import openai  # the code below uses the pre-1.0 SDK interface (openai.Audio)
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

openai.api_key = os.getenv("OPENAI_API_KEY")

UPLOAD_FOLDER = 'uploads'
os.makedirs(UPLOAD_FOLDER, exist_ok=True)


@app.route('/transcribe', methods=['POST'])
def transcribe_audio():
    # silent=True returns None on a malformed body so the check below answers with 400
    data = request.get_json(silent=True)

    if not data or not data.get('recordingUrl') or not data.get('callerNumber'):
        return jsonify({'error': 'Missing recording URL or callerNumber'}), 400

    recording_url, caller_number = data['recordingUrl'], data['callerNumber']
    # Simplification: a fixed file name means concurrent requests overwrite each other
    file_path = os.path.join(UPLOAD_FOLDER, "audio.wav")

    try:
        # Download the recording
        response = requests.get(recording_url, timeout=30)
        if response.status_code == 200:
            with open(file_path, 'wb') as f:
                f.write(response.content)
        else:
            return jsonify({'error': 'Failed to download recording'}), 500

        # Send the file to OpenAI; no language hint is passed, Whisper detects it
        with open(file_path, 'rb') as audio_file:
            transcription = openai.Audio.transcribe(model="whisper-1", file=audio_file)['text']

    except Exception as e:
        return jsonify({'error': f'Error during processing: {str(e)}'}), 500

    send_email(transcription, caller_number, file_path)
    return jsonify({'message': 'Transcription completed', 'transcription': transcription}), 200

def send_email(transcription_text, caller_number, attachment_path):
    msg = MIMEMultipart()
    msg['From'] = "info@sender.com"
    msg['To'] = ", ".join(["mail@mydomain.com"])
    msg['Subject'] = f"Incoming Voicemail from {caller_number}"
    msg.attach(MIMEText(f"Phone number: {caller_number}\n\nText: {transcription_text}", 'plain'))

    try:
        # Attach the recording as a base64-encoded binary part
        with open(attachment_path, 'rb') as attachment:
            part = MIMEBase('application', 'octet-stream')
            part.set_payload(attachment.read())
        encoders.encode_base64(part)
        part.add_header('Content-Disposition', f'attachment; filename={os.path.basename(attachment_path)}')
        msg.attach(part)

        # Placeholder host and credentials; load real ones from configuration
        with smtplib.SMTP_SSL('sender.com', 465) as server:
            server.login("info@sender.com", "password")
            server.sendmail(msg['From'], msg['To'].split(", "), msg.as_string())
    except Exception as e:
        logging.error(f"Failed to send email: {e}")

if __name__ == '__main__':
    app.run(host='', port=9810, debug=True)
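
A note on SDK versions: `openai.Audio.transcribe` belongs to the pre-1.0 `openai` Python package. From version 1.0 on, the same request goes through a client object. Below is a minimal sketch of that variant, not a drop-in replacement. The helper name `transcribe_file` is made up for illustration, and the client is passed in as a parameter so the function can be tried without network access. It asks for the `verbose_json` response format, which, to our understanding, also reports the language Whisper detected.

```python
def transcribe_file(client, path, model="whisper-1"):
    """Return (text, detected_language) for the audio file at `path`.

    Assumption: `client` is an `openai.OpenAI()` instance (openai>=1.0).
    """
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
            response_format="verbose_json",  # assumed to include `language`
        )
    # getattr keeps the sketch tolerant if `language` is missing from the response
    return result.text, getattr(result, "language", None)
```

In the service this could be called as `text, language = transcribe_file(OpenAI(), file_path)` after `from openai import OpenAI`. The detected language could then be added to the email subject, for example.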


We test the service with two recordings in different languages (English and Spanish).

We send a POST request and receive a response.
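
For reference, a request like this can be built with the standard library alone. This is a sketch under assumptions: the service is presumed to listen on `localhost:9810`, the function names are made up for illustration, and the phone number and recording URL in the usage note are placeholders rather than the values used in our test.

```python
import json
import urllib.request


def build_transcribe_request(base_url, caller_number, recording_url):
    """Build the JSON POST request expected by the /transcribe endpoint."""
    payload = {"callerNumber": caller_number, "recordingUrl": recording_url}
    return urllib.request.Request(
        url=f"{base_url}/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_recording(req):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Calling `post_recording(build_transcribe_request("http://localhost:9810", "+15550100", "https://example.com/voicemail-en.wav"))` should return a dictionary with the `message` and `transcription` keys on success. Note that `urlopen` raises `urllib.error.HTTPError` for the 400 and 500 responses.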

We also receive an email with the phone number, transcription text, and the audio file.

We then repeat the request with the Spanish recording.

That’s all you need to create a quick service. I hope this information was helpful!