DESIGNED & ENGINEERED IN LONDON, UK

© 2019 Creoqode

Voice recognition, also called speech recognition, is the ability of a computer-based device to decode the human voice or specific sounds. It is commonly used to operate a device, issue commands, or write text without using a keyboard, a mouse, or buttons.

To convert speech to a computer command, a computer has to go through several complex steps. When you speak, you create vibrations in the air. The analog-to-digital converter (ADC) translates this analog wave into digital data that the computer can understand. To do this, it samples, or digitizes, the sound by taking precise measurements of the wave at frequent intervals. The system filters the digitized sound to remove unwanted noise, and sometimes to separate it into different bands of frequency (frequency is the number of vibrations per second in the sound wave, heard by humans as differences in pitch). It also normalizes the sound, or adjusts it to a constant volume level.
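The sampling and normalizing steps described above can be sketched in a few lines of Python. This is only an illustration, not the code running on any particular device: the function names, the 8000 Hz sample rate, the 8-bit resolution, and the 440 Hz test tone are all assumptions chosen to keep the example small.

```python
import math

def sample_tone(freq_hz, sample_rate_hz, duration_s, amplitude=0.3):
    """Measure a pure sine tone at evenly spaced intervals, as an ADC would."""
    n = round(sample_rate_hz * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n)]

def quantize(samples, bits=8):
    """Round each sample in [-1, 1] to the nearest signed integer level."""
    max_level = 2 ** (bits - 1) - 1
    return [round(s * max_level) for s in samples]

def normalize(samples, target_peak=1.0):
    """Scale the signal so that its loudest sample reaches target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    return [s * target_peak / peak for s in samples]

# Assumed values: a quiet 440 Hz tone, sampled 8000 times per second for 10 ms.
samples = sample_tone(freq_hz=440, sample_rate_hz=8000, duration_s=0.01)
digital = quantize(samples)      # what the ADC would hand to the computer
normalized = normalize(samples)  # same shape, adjusted to a constant peak level
```

A quiet recording and a loud recording of the same word end up with the same peak level after normalize, which makes later comparisons less sensitive to how loudly the word was spoken.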

Next, the signal is divided into small segments as short as a few hundredths of a second, or even thousandths in the case of plosive consonant sounds -- consonant stops produced by obstructing airflow in the vocal tract -- like "p" or "t." The program then matches these segments to known phonemes in the appropriate language. A phoneme is the smallest unit of sound in a language: one of the basic sounds we put together to form meaningful expressions. There are roughly 40 phonemes in the English language, while other languages have more or fewer.
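Dividing the signal into short segments can be sketched as follows. The 20 ms segment length matches "a few hundredths of a second"; the 10 ms step, which makes neighbouring segments overlap, is an assumption based on common practice and is not something this guide specifies. The function name is hypothetical.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=20, step_ms=10):
    """Cut a list of samples into short, fixed-length segments (frames)."""
    frame_len = sample_rate_hz * frame_ms // 1000  # samples per frame
    step = sample_rate_hz * step_ms // 1000        # samples between frame starts
    frames = []
    # Stop early enough that the last frame is still full length.
    for start in range(0, len(samples) - frame_len + 1, step):
        frames.append(samples[start:start + frame_len])
    return frames

# One second of (silent) audio at an assumed rate of 8000 samples per second.
one_second = [0.0] * 8000
frames = split_into_frames(one_second, sample_rate_hz=8000)
```

Each frame is then short enough to be treated as a single sound when it is compared against known phonemes.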

The next step seems simple, but it is actually the most difficult to accomplish and is the focus of most speech recognition research. The program examines each phoneme in the context of the phonemes around it. It runs this contextual phoneme plot through a complex statistical model and compares it to a large library of known words, phrases and sentences. The program then determines what the user was probably saying and issues a computer command.
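Real recognizers use statistical models far more elaborate than anything that fits here. As a deliberately simplified stand-in, the sketch below compares a heard phoneme sequence against a tiny word library using the sequence similarity ratio from Python's standard difflib module. The three-word lexicon and the phoneme symbols are hypothetical and for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical pronunciation library; the phoneme symbols are illustrative only.
LEXICON = {
    "left":  ["L", "EH", "F", "T"],
    "right": ["R", "AY", "T"],
    "stop":  ["S", "T", "AA", "P"],
}

def best_word(heard, lexicon=LEXICON):
    """Return the library word whose phonemes most resemble the heard ones."""
    scored = [(SequenceMatcher(None, heard, phonemes).ratio(), word)
              for word, phonemes in lexicon.items()]
    score, word = max(scored)  # highest similarity wins
    return word, score

# The "F" was missed, but "left" is still the closest entry in the library.
word, score = best_word(["L", "EH", "T"])
```

The point of the sketch is only that the program picks the most probable candidate rather than demanding an exact match, which is why it can tolerate a phoneme that was misheard or dropped.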

Below are some examples of voice recognition systems:

Speaker dependent system - The software must be trained before it can be used, which means you have to read it a series of words and phrases.

Speaker independent system - The software recognizes most users' voices with no training.

Discrete speech recognition - The user must pause between words so that the software can identify each word separately.

Continuous speech recognition - The software can understand speech at a normal speaking rate.

Natural language - The software can not only understand the voice but also return answers to questions or other queries.

With add-on voice recognition modules (that is, unless a smartphone or PC is used), the methods we will mostly be using on Nova are "Speaker Dependent System" and "Discrete Speech Recognition". Voice recognition modules record and save your own words or commands. When you repeat the same words with a similar intonation and speed, the module is able to match the new sound data with the previously saved recording. If the two are similar enough, Nova can then carry out the action it has been programmed to perform for that command.
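The idea of matching a new utterance against saved recordings can be sketched with dynamic time warping (DTW), a classic way to compare two sequences spoken at slightly different speeds. This is an assumption made for illustration; this guide does not say which algorithm the modules actually use. The command names, the feature values, and the distance threshold below are all hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three possible alignments so far.
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[rows][cols]

def recognize(utterance, templates, max_distance):
    """Return the closest saved command, or None if nothing is similar enough."""
    best_name, best_dist = None, float("inf")
    for name, template in templates.items():
        dist = dtw_distance(utterance, template)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None

# Hypothetical saved recordings, reduced to one loudness value per segment.
templates = {
    "forward": [1, 3, 5, 3, 1],
    "stop":    [5, 5, 1, 1, 5],
}
spoken = [1, 3, 3, 5, 5, 3, 1]  # "forward", said a little more slowly
command = recognize(spoken, templates, max_distance=2.0)
```

The threshold is what lets the system ignore sounds that resemble none of the saved commands: recognize returns None for them instead of forcing a match.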

The next chapter explains in detail how you can control Nova with voice commands.