Speech to Text | Speech Recognition | Voice Recognition and People with Disabilities

Page site map

What is Speech to Text, Speech Recognition and Voice Recognition?

Background on modeling speech for voice recognition

Speaker Independent versus Speaker Dependent Speech Recognition

How speech to text STT benefits the person who is blind or has low vision

Voice recognition or speech recognition is used in creating cell phone accessibility for blind or low vision cell phone users

How speech to text STT benefits the person who is physically disabled

Voice recognition or speech recognition is used in creating cell phone accessibility for physically disabled cell phone users

Voice Dialing Services

Voice recognition or speech recognition is used in creating computer accessibility for physically disabled users

Computer access with DNS

Digital recorder with DNS

Multi-pairing Bluetooth headset with DNS

Built in speech recognition on computers

Mac computers

PC computers

Voice recognition or speech recognition is used in creating home or residential accessibility for physically disabled users

Environmental Control Units


What is Speech to Text, Speech Recognition and Voice Recognition?

Speech to text (STT) is the conversion of spoken words into text by a computer or microprocessor based system. The term speech recognition refers to a speech to text system that can recognize general human speech and does not require training to a specific human voice. The term voice recognition refers to a speech to text system that implies the need for training to a specific human voice, although a voice recognition system can be capable of recognizing general human speech as well. When the system can recognize general human speech and not only that of a specific person, the tag "speaker independent" is applied, as in "speaker independent voice recognition". When the system can recognize only the speech of the specific person to whom it is trained, the tag "speaker dependent" is applied, as in "speaker dependent voice recognition".

Background on Modeling Speech for Voice Recognition

Speech is modeled as energy at a given frequency over a time period. Speech is very difficult to model in developing a speech to text STT voice recognition system. Among the difficulties are:

·         individual speech factors such as the speaker’s particular voice, emotional state, and mood

·         environmental speech factors such as background noise

·         creation speech factors as there are different ways in which speech is created

o primary vowels – speech and sounds come from the mouth with an open vocal tract, air is forced through the vocal cords, frequencies are amplified

o fricatives – speech and sounds come with constant closure of the mouth, high frequency, little formant structure

o stops – airflow and speech stops
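To make "energy at a given frequency over a time period" concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how any particular recognition product works: the signal is synthetic (a 400 Hz tone followed by silence), and the function names, sample rate and frame length are hypothetical choices made for this example.

```python
import math

def band_energy(frame, freq_hz, sample_rate):
    """Energy of one frame at a single frequency (one DFT-style bin)."""
    angle = 2 * math.pi * freq_hz / sample_rate
    re = sum(s * math.cos(angle * n) for n, s in enumerate(frame))
    im = sum(s * math.sin(angle * n) for n, s in enumerate(frame))
    return (re * re + im * im) / len(frame)

def energy_over_time(samples, freqs_hz, sample_rate, frame_len):
    """Split the signal into frames; return one row of energies per frame."""
    rows = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rows.append([band_energy(frame, f, sample_rate) for f in freqs_hz])
    return rows

# Assumed values for illustration only.
SAMPLE_RATE = 8000   # samples per second
FRAME_LEN = 200      # 25 ms per frame

# Synthetic input: 0.1 s of a 400 Hz tone, then 0.1 s of silence.
tone = [math.sin(2 * math.pi * 400 * n / SAMPLE_RATE) for n in range(800)]
silence = [0.0] * 800
signal = tone + silence

# Each row is a time frame; the columns are energy at 400 Hz and at 1200 Hz.
grid = energy_over_time(signal, [400, 1200], SAMPLE_RATE, FRAME_LEN)
for row in grid:
    print([round(value, 3) for value in row])
```

In this sketch the 400 Hz column is large while the tone is playing and drops to zero in the silent frames, while the 1200 Hz column stays near zero throughout. Real speech spreads energy over many frequencies at once, which is part of why it is so much harder to model.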

The same word, when spoken repeatedly by the same person, is generated differently each time, and when analyzed by a computer system it shows different energy levels at different frequencies. So how does a computer or microprocessor recognize human speech and what a person is actually saying? Essentially, it guesses intelligently. Two components are used to develop a successful speech to text (STT) voice recognition system.

The first component is the use of probability models, which tell the system what is most likely to have been said. In comparing what the speaker actually says with what the computer thinks was said, probability models allow the system to narrow the comparison choices. For example, if you say “I’ll meet you at…”, the system will know that the next word is most likely a time such as “3pm” or a place such as “home”. It can therefore narrow its recognition focus to a much smaller number of possible word choices. Another example: if you say a short b sound, the system will know that the next sound will most likely not be a “kw” sound. The same kind of probability model is used not only at the sound and word level but also at the sentence and phrase level. For example, if you say “Hi Kristen…”, you are much more likely to follow up with “How are you?” than with gibberish such as “whaling, walking, whining.” Again, these probability models allow the system to focus its comparison on what you are actually saying, with a higher probability of correct recognition.

The second component is the use of speech libraries. The system employs large libraries and databases in which sounds are recorded by many different human voices in a wide range of backgrounds and conditions. If need be, real-time data can be added to the library to improve its accuracy. This might take the form of recording speech from consumers’ calls to a call center, or recording your own voice for specific commands, and adding that speech data to the library. As computer technology and microprocessor performance have improved, the libraries have grown in the amount of data they contain, and the speed with which the system can access them has also improved.

In summary, the speech to text voice recognition system takes in human speech, models it as energy at a given frequency over a time period, compares it to large libraries of human voices, and uses probability models to narrow the comparison. If the speech libraries are sufficient and the probability models closely mimic actual human speech, there will be very few speech recognition errors.
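The way a probability model narrows the word choices can be sketched with a toy example in Python. This is a simplified illustration, not the method of any specific product: the tiny set of training sentences, the made-up "acoustic scores" (how closely each candidate matched the sound that was heard) and all function names are assumptions introduced for this sketch.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probability(counts, prev, candidate):
    """Estimate how likely `candidate` is to follow `prev`."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[candidate] / total if total else 0.0

def pick_word(counts, prev, acoustic_scores):
    """Weigh each candidate's sound match by how likely it is to come next."""
    return max(
        acoustic_scores,
        key=lambda w: acoustic_scores[w] * next_word_probability(counts, prev, w),
    )

# Hypothetical training text standing in for a large body of real speech.
corpus = [
    "I'll meet you at home",
    "I'll meet you at three",
    "I'll meet you at home tomorrow",
    "look at that",
]
counts = train_bigrams(corpus)

# Hypothetical sound match: "hum" matched the audio slightly better than "home".
acoustic_scores = {"hum": 0.6, "home": 0.4}
best = pick_word(counts, "at", acoustic_scores)
print(best)
```

Here the sound alone slightly favors "hum", but because "hum" never follows "at" in the training text while "home" often does, the combined guess is "home". Real systems use far larger models and smooth the probabilities so that unseen words are not ruled out entirely.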

Speaker Independent versus Speaker Dependent Speech Recognition

The earliest speech to text (STT) voice recognition installed on cell phones was speaker dependent. Before the speech recognition on the phone could be used, the user had to record their voice for each command and each contact. For people with many contacts, recording their voice for each one was a burden. Even with a recording of the user’s voice for each command and contact, the speech recognition accuracy was still poor. This was especially true if you recorded your voice against a quiet background and then tried to issue a voice command in a noisy environment.

Speech recognition software installed on a cell phone or smartphone these days is considered speaker independent voice recognition. In most cases, sensitivity settings can be adjusted if the voice recognition accuracy is poor. In other cases, digits or commands can be trained to the user’s voice to increase accuracy. The reason behind this shift to speaker independent voice recognition has been improvements in the speech recognition software (speech models and speech libraries) as well as in cell phone capabilities. The benefit is that in most cases the cell phone user can start using the speech recognition software to dial contacts without spending the considerable time it might take to train their voice to every contact in their contacts menu.
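A rough sketch of why speaker dependent recognition needed a recording for every contact, and why noise hurt it, is given below in Python. It assumes a much simplified scheme in which each recording is reduced to a short list of numbers and matched to the nearest stored recording. The feature values, the `max_distance` threshold (standing in loosely for a sensitivity setting) and the function names are all hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(utterance, templates, max_distance):
    """Return the contact whose recorded template is closest,
    or None if no template is close enough."""
    best_name, best_dist = None, float("inf")
    for name, template in templates.items():
        d = distance(utterance, template)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= max_distance else None

# Hypothetical templates the user recorded in a quiet room, one per contact.
templates = {
    "kristen": [0.9, 0.1, 0.4],
    "home": [0.2, 0.8, 0.5],
}

# The same name spoken later: once in quiet, once with background noise.
quiet = [0.85, 0.15, 0.4]
noisy = [0.6, 0.5, 0.7]

print(recognize(quiet, templates, max_distance=0.3))
print(recognize(noisy, templates, max_distance=0.3))
```

In this sketch the quiet utterance lands close to the stored "kristen" template and is recognized, while the noisy one is too far from every template and is rejected. A contact with no stored template can never be matched at all, which is why every contact had to be recorded.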

How speech to text STT benefits the person who is blind or has low vision

Voice recognition or speech recognition is used in creating cell phone accessibility for blind or low vision cell phone users

Speech to text (STT) voice recognition allows the blind or low vision user to dial phone numbers by name, dial by speaking the specific phone digits, and access specific phone information such as battery level, signal strength or coverage. Using just the speech to text feature of voice recognition, a cell phone can be made accessible to a blind or low vision user to a limited degree: the voice recognition gives the user the basic functionality of making and receiving cell phone calls independently.

Even when a blind or low vision user has a smartphone with screen reader software, the accessibility is often supplemented with the smartphone’s built in voice recognition. Built-in or added-on speech to text (STT) voice recognition software gives the user broader control of the cell phone or smartphone, because more of its features can be accessed by voice. By using voice recognition as an input method, rather than relying solely on the keypad or touchpad, the blind or low vision user can operate the phone faster and more directly, bypassing menu options. Usually voice recognition and the phone’s buttons and keypad are used together, in whichever combination most benefits the user. For example, even on a smartphone with a built-in or third-party screen reader, the user can dictate a text message reply by voice rather than typing it out.

How speech to text STT benefits the person who is physically disabled

Voice recognition or speech recognition is used in creating cell phone accessibility for physically disabled cell phone users

Cell phones or smartphones with speech to text (STT) voice recognition allow physically disabled users to access many of the phone features using only their voice. A Bluetooth headset or accessory is used in combination with a cell phone or smartphone that supports voice dialing over Bluetooth.

The term hands free must be defined, because the cell phone industry and a physically disabled (PD) user mean different things by it. The cell phone industry considers hands free to include having to press a button or switch to access speech or voice recognition on a cell phone. A PD user considers hands free to mean just that: access to the cell phone or smartphone with voice recognition only, without the use of hands. Often either a press of the cell phone or smartphone screen or voice command button, or a press of the Bluetooth headset’s multi-function button (MFB), is required to access the STT voice recognition features. If the PD user is able to press these buttons, no further adaptation is needed. If pressing these buttons on the headset or phone is difficult, a switch adapted headset or speakerphone can be used.

Switch adaptation means that an accessible switch is connected to a switch adapted headset or speakerphone so that, rather than pressing the button on the headset or phone, the user presses the accessible switch with the same result. Finding a Bluetooth headset or speakerphone that lends itself to being switch adapted is very difficult and is getting more so all the time. Bluetooth headsets have been getting smaller, models change quickly, and they are manufactured such that opening the casing is impossible in almost all cases. Bluetooth speakerphones are usually easier to switch adapt because the casing is bigger and more accessible, but they are becoming rarer as built in car systems and GPS systems come to dominate the automotive market.

There are also fully hands free options for those for whom pressing these buttons on the headset or phone is difficult. These are achieved either by obtaining the right Bluetooth accessory or by installing the right voice command software.

Voice Dialing Services

Voice dialing services use speech to text (STT) voice recognition to allow the PD user to find information and phone numbers, and in some cases initiate calls, using their voice. Examples of these services are Bing 411 and AT&T VoiceDial.

Voice recognition or speech recognition is used in creating computer accessibility for physically disabled users

Computer access with DNS

Dragon NaturallySpeaking (DNS) is speech recognition software that is installed on a computer. DNS provides both dictation and voice command input to the computer and to running software programs. The PD user is able to issue commands to control a running software application, control the computer operating system, or have their speech transcribed as written text into a document.

Digital recorder with DNS

A lecture or meeting captured on a digital recorder can later be transcribed with DNS and accessed by the PD user as text. Speech to text (STT) voice recognition thus allows the PD user to work and study independently, without the assistance of a personal note-taker or transcriber.

Multi-pairing Bluetooth headset with DNS

A multi-pairing Bluetooth headset is one that can pair or bond to more than one device simultaneously, for example a cell phone or smartphone that supports voice commands and a computer running Dragon NaturallySpeaking (DNS). With a multi-pairing Bluetooth headset, the PD user can wear one headset to have both computer and cell phone access. Some examples of Bluetooth headsets that give a PD user simultaneous access to a computer using DNS (tested and rated by Nuance with Dragon Stars) and to a cell phone or smartphone with voice control capability are:

Manufacturer	Bluetooth Headset
Plantronics	Voyager Pro
Plantronics	Savi Go

If a headset has multi-pairing capability but has not been rated for DNS, it can still be used. However, its microphone performance with DNS will be unknown until tested by the PD user.

Built in speech recognition on computers

Mac

Mac OS X has speech to text (STT) voice recognition, called Speakable Items, built into the operating system. Speakable Items is located in the Speech pane of System Preferences. It lets the PD user control the Mac computer using their voice instead of the keyboard, mouse or other input device. Speakable Items is speaker independent voice recognition and doesn’t require the PD user to train the Mac computer to their voice. It allows the PD user to navigate menus; enter keyboard shortcuts; speak checkbox names, radio button names, list items, and button names; and open, close, control, and switch among applications.

PC

Windows Speech Recognition is integrated into the Windows operating system in Windows Vista and later. It uses speech to text (STT) voice recognition to allow the PD user to control their computer by saying specific voice commands, including running programs and interacting with the Windows operating system. Windows Speech Recognition can also be used for dictation: the PD user can dictate words into word-processing programs, fill out online forms, or edit text on their PC. It has voice commands for controlling dictation, operating the mouse and keyboard, working with Windows, and operating programs.

Voice recognition or speech recognition is used in creating home or residential accessibility for physically disabled users

Environmental Control Units (ECUs)

An environmental control unit (ECU) is a computer or microprocessor based system that allows the PD user to control electrical devices in their home or apartment, such as the telephone, bed, doors, and lights. Speech to text (STT) voice recognition allows a PD person to operate their ECU and control their environment using their voice. The advantage of voice recognition is that it gives the PD ECU user direct access to the ECU capabilities. Without it, the PD ECU user would have to access the ECU using an accessible switch. Switch only access requires the user to navigate multiple menus, waiting through the options in each until the desired one is presented, and is a much slower process than voice controlled access. The original voice controlled ECUs used speaker dependent speech to text voice recognition and required the user to train their voice to the allowed menu choices. Newer voice controlled ECUs, however, use speaker independent voice recognition, taking advantage of advances in speech recognition technology available on the market.
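The cost of waiting through menus with switch only access can be estimated with a small Python sketch. It assumes a simple scanning model in which the ECU highlights each option in turn for a fixed time and the user presses the switch when the desired option is highlighted. The menu layout, the submenu entries, the two-second step and the function name are hypothetical and are not taken from any particular ECU.

```python
def scanning_wait_seconds(menu_path, menus, step_seconds):
    """Total time spent waiting for the scan highlight to reach
    each choice along the path through the menus."""
    total = 0.0
    for menu_name, choice in menu_path:
        position = menus[menu_name].index(choice) + 1  # 1-based scan position
        total += position * step_seconds
    return total

# Hypothetical two-level menu for illustration.
menus = {
    "main": ["telephone", "bed", "doors", "lights"],
    "lights": ["bedroom on", "bedroom off", "kitchen on", "kitchen off"],
}

# To turn on the kitchen light by switch: choose "lights", then "kitchen on".
path = [("main", "lights"), ("lights", "kitchen on")]
wait = scanning_wait_seconds(path, menus, step_seconds=2.0)
print(wait)
```

Under these assumptions the user waits through four options in the first menu and three in the second before the light turns on. A voice command names the target directly, so none of that scan time is needed.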
