How to quickly create a multilingual audio transcription service for IP telephony

In corporate VoIP systems, calls are routed to voicemail, where the caller leaves a message. The audio message then needs to be converted to text and emailed together with the audio file. This is a common scenario, but how do you transcribe automatically when callers speak different languages? In the past this was a challenge, and callers had to navigate an annoying IVR with prompts like “For English, press ‘1’…”. Now OpenAI's Whisper speech-recognition model detects the spoken language and transcribes it in a single step.

Solution: For high-quality multilingual transcription, we use OpenAI's whisper-1 speech-to-text model.
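Because the model detects the language itself, no language hint has to come from the caller. A minimal sketch of such a call, assuming the 1.x openai Python client (client.audio.transcriptions.create) and its verbose_json response format, which reports the detected language next to the text. The helper name transcribe_with_language is hypothetical, and the client is passed in as an argument so the function can be tried with a stub.

```python
from typing import Any, Dict


def transcribe_with_language(client: Any, audio_path: str) -> Dict[str, str]:
    """Transcribe a recording and report the language the model detected.

    Assumption: `client` is an `openai.OpenAI()` instance (1.x client).
    No `language` argument is passed, so the model detects it on its own.
    """
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="verbose_json",  # reply includes the detected language
        )
    return {"language": result.language, "text": result.text}
```

The detected language could, for example, be added to the email subject so the recipient knows what to expect before opening the message.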

Steps:

1. Develop a REST API service that receives a POST request with the caller’s phone number and a link to the call recording.
2. The service downloads the recording and passes the file to OpenAI, which returns the transcription.
3. The resulting text is sent by email along with the audio file.
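The request from the first step can be sketched with the standard library alone. The field names callerNumber and recordingUrl, the /transcribe path, and port 9810 are taken from the service code in this article; the host, phone number, and recording URL in the usage note are placeholders, and build_transcribe_request is a hypothetical helper.

```python
import json
import urllib.request


def build_transcribe_request(base_url: str, caller_number: str,
                             recording_url: str) -> urllib.request.Request:
    """Build the POST request that a PBX (or a test script) sends to the service."""
    payload = {"callerNumber": caller_number, "recordingUrl": recording_url}
    return urllib.request.Request(
        url=f"{base_url}/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        # Flask's request.get_json() expects the body to be declared as JSON.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the service running locally, a request built as build_transcribe_request("http://localhost:9810", "+15550100", "https://pbx.example.com/rec/1.wav") can be sent with urllib.request.urlopen(req, timeout=60); the JSON reply contains the transcription.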

Below is simplified code that illustrates the process:

    import logging
    import os
    import smtplib
    from email import encoders
    from email.mime.base import MIMEBase
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText

    import openai
    import requests
    from flask import Flask, request, jsonify

    app = Flask(__name__)
    openai.api_key = os.getenv("OPENAI_API_KEY")

    UPLOAD_FOLDER = 'uploads'
    os.makedirs(UPLOAD_FOLDER, exist_ok=True)

    logging.basicConfig(level=logging.DEBUG)


    @app.route('/transcribe', methods=['POST'])
    def transcribe_audio():
        """Download the recording, transcribe it, and email the result."""
        data = request.get_json()
        if not data or not data.get('recordingUrl') or not data.get('callerNumber'):
            return jsonify({'error': 'Missing recording URL or callerNumber'}), 400

        recording_url, caller_number = data['recordingUrl'], data['callerNumber']
        # A fixed file name keeps the example short; concurrent requests would overwrite it.
        file_path = os.path.join(UPLOAD_FOLDER, "audio.wav")

        try:
            # Download the recording
            logging.debug("Downloading %s", recording_url)
            response = requests.get(recording_url, timeout=30)
            if response.status_code == 200:
                with open(file_path, 'wb') as f:
                    f.write(response.content)
            else:
                return jsonify({'error': 'Failed to download recording'}), 500

            # Transcribe the file (legacy interface of the openai package, before 1.0)
            with open(file_path, 'rb') as audio_file:
                transcription = openai.Audio.transcribe(model="whisper-1", file=audio_file)['text']
        except Exception as e:
            return jsonify({'error': f'Error during processing: {str(e)}'}), 500

        logging.debug("Transcription: %s", transcription)
        send_email(transcription, caller_number, file_path)
        return jsonify({'message': 'Transcription completed', 'transcription': transcription}), 200


    def send_email(transcription_text, caller_number, attachment_path):
        """Email the transcription with the recording attached."""
        msg = MIMEMultipart()
        msg['From'] = "info@sender.com"
        msg['To'] = ", ".join(["mail@mydomain.com"])
        msg['Subject'] = f"Incoming Voicemail from {caller_number}"
        msg.attach(MIMEText(f"Phone number: {caller_number}\n\nText: {transcription_text}", 'plain'))

        # Attach the audio file as a base64-encoded binary part
        with open(attachment_path, 'rb') as attachment:
            part = MIMEBase('application', 'octet-stream')
            part.set_payload(attachment.read())
        encoders.encode_base64(part)
        part.add_header('Content-Disposition',
                        f'attachment; filename={os.path.basename(attachment_path)}')
        msg.attach(part)

        try:
            server = smtplib.SMTP_SSL('sender.com', 465)
            server.login("info@sender.com", "password")
            server.sendmail(msg['From'], msg['To'].split(", "), msg.as_string())
            server.quit()
        except Exception as e:
            logging.error(f"Failed to send email: {e}")


    if __name__ == '__main__':
        # debug=True is for local testing only; disable it on a reachable host.
        app.run(host='0.0.0.0', port=9810, debug=True)
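The listing writes every recording to the same uploads/audio.wav, so two voicemails arriving at the same moment could overwrite each other. One possible way around that, sketched here with the standard library only, is a uniquely named file per request that is removed once processing ends. The helper name recording_file is hypothetical and not part of the original service.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def recording_file(content: bytes, folder: str = "uploads",
                   suffix: str = ".wav") -> Iterator[str]:
    """Write `content` to a uniquely named file and delete it afterwards."""
    os.makedirs(folder, exist_ok=True)
    # mkstemp creates a new file whose name cannot clash with another request.
    fd, path = tempfile.mkstemp(suffix=suffix, dir=folder)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        yield path
    finally:
        # Runs even if transcription or emailing raises an exception.
        if os.path.exists(path):
            os.remove(path)
```

In the handler, the download result could then be wrapped as "with recording_file(response.content) as file_path:", with the transcription and the send_email call placed inside that block. The attachment would carry the generated file name unless a friendlier one is set in the Content-Disposition header.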