Under construction…
For many years, the only speech-to-text applications widely available were costly proprietary products such as Dragon speech-to-text software. Speech recognition is now built into many apps and services, both online and offline, and most of them are free.
In this tutorial:
Required knowledge:
1. Terminology
Speech recognition (or voice recognition) software is any software able to recognise human speech. It can be used to operate software and/or a device.
Speech-to-Text software types out the words that the user speaks into the microphone. Traditionally, a businessperson who needed something typed would dictate to a typist, either in person while the typist typed, or by making a recording on a dictaphone for the typist to transcribe (type out) later. If you have watched a legal series or movie on TV, you will have seen the person in court who types out everything as it is spoken to create a legal record.
Transcribe software is like Speech-to-Text software, but instead of listening to a microphone, it takes an uploaded audio file and returns the text.
2. Speech recognition
The following software can recognise and respond to spoken commands:
- Siri, Apple’s virtual personal assistant.
- Cortana, Microsoft’s personal assistant, included in Windows 10.
- Google Voice Search, which lets users search Google by speaking.
- Google Assistant, Google’s virtual assistant.
- Alexa, Amazon’s personal assistant.
3. Speech-to-Text
Speech-to-Text software recognises speech and outputs a text version of the spoken words.
3.1 Word Dictate
3.2 Google Docs Voice typing
Open a Google Doc. Select the Tools menu and then select the Voice typing option. You will need to grant permission for the browser to access your microphone.
The words you speak into the microphone will be typed in the Doc.
4. Transcribe
4.1 Word Transcribe
Upload an audio file, and the text of what is spoken in it is returned.
4.2 Cloud Speech-to-Text
Further research required…
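As a starting point for that research, here is a minimal sketch of what a transcription request to Cloud Speech-to-Text might look like from a program rather than a web page. It assumes the REST `speech:recognize` method for short audio, with the audio sent inline as base64 text. The function names `build_recognize_request` and `extract_transcript`, the default settings, and the sample data are all hypothetical and chosen for illustration. Nothing is sent over the network here; a real request also needs a Google Cloud project and credentials.

```python
import base64
import json

# Endpoint of the REST "recognize" method (assumed here; check the current docs).
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes, language_code="en-US",
                            encoding="LINEAR16", sample_rate_hertz=16000):
    """Return the JSON body for a short-audio transcription request."""
    return {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": {
            # The audio file's bytes are sent inline as base64 text.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }


def extract_transcript(response):
    """Join the top alternative of each result into one string."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0].get("transcript", "").strip())
    return " ".join(part for part in parts if part)


# Made-up audio bytes and a hand-written sample response, for illustration only.
body = build_recognize_request(b"\x00\x01\x02\x03")
sample_response = {
    "results": [
        {"alternatives": [{"transcript": "hello world", "confidence": 0.9}]}
    ]
}

print(json.dumps(body, indent=2))
print(extract_transcript(sample_response))
```

The same pattern described under Terminology applies: audio goes in, text comes back. The request and the response are kept as plain dictionaries so that you can inspect them before involving any network library.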
When you select the option to upload a file, the following file types are accepted:
*.opus, *.flac, *.webm, *.weba, *.wav, *.ogg, *.m4a, *.oga, *.mid, *.mp3, *.aiff, *.wma, *.au, *.ogm, *.wmv, *.mpg, *.ogv, *.mov, *.asx, *.mpeg, *.mp4, *.m4v, *.avi
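If you have many recordings, it can help to check the file type before you try to upload. The sketch below copies the extensions from the list above into a set and tests a file name against it. The function name `is_accepted` and the example file names are made up for illustration, and the check looks only at the file name, not at what the file actually contains.

```python
from pathlib import Path

# Extensions copied from the list above.
ACCEPTED_EXTENSIONS = {
    ".opus", ".flac", ".webm", ".weba", ".wav", ".ogg", ".m4a", ".oga",
    ".mid", ".mp3", ".aiff", ".wma", ".au", ".ogm", ".wmv", ".mpg",
    ".ogv", ".mov", ".asx", ".mpeg", ".mp4", ".m4v", ".avi",
}


def is_accepted(filename):
    """Return True if the file name ends in one of the listed extensions."""
    # suffix is the final extension, e.g. ".MP3"; lower() ignores letter case.
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS


# Hypothetical file names, for illustration only.
for name in ["interview.mp3", "Lecture.WAV", "notes.txt"]:
    print(name, is_accepted(name))
```

Comparing in lower case means that a file saved as `Lecture.WAV` is treated the same as `lecture.wav`.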
References:
- Google Assistant (2023) Wikipedia. Wikimedia Foundation. Available at: https://en.wikipedia.org/wiki/Google_Assistant (Accessed: 12 November 2023).
- Google Voice Search (2023) Wikipedia. Wikimedia Foundation. Available at: https://en.wikipedia.org/wiki/Google_Voice_Search (Accessed: 12 November 2023).
- Cortana (virtual assistant) (2023) Wikipedia. Wikimedia Foundation. Available at: https://en.wikipedia.org/wiki/Cortana_(virtual_assistant) (Accessed: 12 November 2023).
- Speech-to-Text: Automatic Speech Recognition | Google Cloud (no date) Google. Google. Available at: https://cloud.google.com/speech-to-text/ (Accessed: 12 November 2023).