Speech recognition (or Speech To Text) is still far from perfect. However, the SpeechRecognition library provides an easy way to interact with many speech-to-text APIs. In this post, we will show how to use the Python SpeechRecognition library to easily start converting the spoken language in our audio files to text.
Speech To Text with SpeechRecognition
SpeechRecognition is a library for performing speech recognition, with support for several engines and APIs, online and offline.
Speech recognition engine/API support:
- CMU Sphinx (works offline)
- Google Speech Recognition
- Google Cloud Speech API
- Wit.ai
- Microsoft Azure Speech
- Microsoft Bing Voice Recognition (Deprecated)
- Houndify API
- IBM Speech to Text
- Snowboy Hotword Detection (works offline)
For our example we will use recognize_google(); there are also other choices, such as recognize_bing() and recognize_wit(). The audio .wav file that we are going to use for this example can be found here. Note that recognize_google() allows 50 free calls per day.
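Before sending a .wav file to a recognizer, it can help to check its basic properties. The following is a minimal sketch that uses only Python's standard wave module; the helper names describe_wav and write_silent_wav are our own for illustration and are not part of SpeechRecognition.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file as a dictionary."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def write_silent_wav(path, seconds=1.0, rate=16000):
    """Write a mono, 16-bit, silent WAV file (only used as demo input)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
```

With the file from this post, the call would be describe_wav("staytuned.wav"), assuming the file has been saved in the working directory.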
Example of Speech to Text in Python
    # Importing the speech_recognition library
    import speech_recognition as sr

    # Create an instance of the Recognizer class
    recognizer = sr.Recognizer()

    # Set the energy threshold
    recognizer.energy_threshold = 300

    # Convert audio to AudioFile
    clean_support_call = sr.AudioFile("staytuned.wav")

    # Convert AudioFile to AudioData
    with clean_support_call as source:
        clean_support_call_audio = recognizer.record(source)

    # Transcribe AudioData to text
    text = recognizer.recognize_google(clean_support_call_audio, language="en-US")
    print(text)
And the output that we get is:
hello everybody today we are going to talk about speech-to-text stay tuned
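Recognition calls can fail: SpeechRecognition raises sr.UnknownValueError when the speech is unintelligible and sr.RequestError when the engine or API cannot be reached. Since the library offers several recognize_*() methods, one option is to try them in order. The sketch below is one possible way to do that; transcribe_with_fallback is a hypothetical helper of our own, not a library function, and it takes the engines as plain callables.

```python
def transcribe_with_fallback(audio, engines, errors=(Exception,)):
    """Try each (name, recognize) pair in order.

    Returns (name, text) for the first engine that succeeds, or
    (None, None) if every engine raised one of `errors`.
    """
    for name, recognize in engines:
        try:
            return name, recognize(audio)
        except errors as exc:
            # Report the failure and move on to the next engine
            print(f"{name} failed: {exc!r}")
    return None, None


# Possible usage with the recognizer and audio from the example above
# (recognize_sphinx assumes PocketSphinx is installed):
#
# engines = [
#     ("google", lambda a: recognizer.recognize_google(a, language="en-US")),
#     ("sphinx", recognizer.recognize_sphinx),
# ]
# name, text = transcribe_with_fallback(
#     clean_support_call_audio,
#     engines,
#     errors=(sr.UnknownValueError, sr.RequestError),
# )
```

Passing the exception types as a parameter keeps the helper independent of the library, so it can be tried out with simple stand-in functions first.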
Speech to Text with Noisy Audio
Sometimes we have to deal with noisy audio files. We can use the adjust_for_ambient_noise() method of Recognizer to compensate for the background noise. We will use this audio file for our example.
    # Importing the speech_recognition library
    import speech_recognition as sr

    recognizer = sr.Recognizer()

    # Convert audio to AudioFile
    noisy_support_call = sr.AudioFile("2-noisy-support-call.wav")

    # Record the audio from the noisy support call
    with noisy_support_call as source:
        # Adjust the recognizer energy threshold for ambient noise
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        noisy_support_call_audio = recognizer.record(source)

    # Transcribe the speech from the noisy support call
    text = recognizer.recognize_google(noisy_support_call_audio, language="en-US")
    print(text)
And the output that we get is:
hello I'd like to get to help setting up my account please
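Both energy_threshold and adjust_for_ambient_noise() deal with how loud the audio is. To give a feeling for what such a threshold means, here is a simplified illustration that computes the root-mean-square (RMS) amplitude of 16-bit PCM samples and compares it with a threshold such as the 300 used earlier. This is only a sketch of the idea under our own assumptions, not the library's internal code; rms_energy and exceeds_threshold are hypothetical names.

```python
import math
import struct


def rms_energy(pcm_bytes):
    """RMS amplitude of 16-bit signed little-endian PCM samples."""
    if not pcm_bytes:
        return 0.0
    # Each sample is 2 bytes; "<h" reads one little-endian signed 16-bit value
    samples = [s for (s,) in struct.iter_unpack("<h", pcm_bytes)]
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def exceeds_threshold(pcm_bytes, threshold=300):
    """True if the chunk is louder than the given energy threshold."""
    return rms_energy(pcm_bytes) > threshold


# Two tiny synthetic chunks: one quiet, one loud
quiet = struct.pack("<4h", 100, -100, 100, -100)
loud = struct.pack("<4h", 1000, -1000, 1000, -1000)
print(rms_energy(quiet), exceeds_threshold(quiet))
print(rms_energy(loud), exceeds_threshold(loud))
```

The intuition is that a chunk of background noise has a lower energy than speech, so raising the threshold to sit just above the ambient level helps separate the two.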
Discussion
That was a simple, reproducible example of how you can easily convert speech to text. In the following posts, we will give more examples. Feel free to send us your preferences about the new posts.