Basic concepts of speech recognition

Speech is a complex phenomenon. People rarely understand how is it produced and perceived. The naive perception is often that speech is built with words and each word consists of phones. The reality is unfortunately very different. Speech is a dynamic process without clearly distinguished parts. It’s always useful to get a sound editor and look into the recording of the speech and listen to it. Here is for example the speech recording in an audio editor.

All modern descriptions of speech are to some degree probabilistic. That means that there are no certain boundaries between units, or between words. Speech to text translation and other applications of speech are never 100% correct. That idea is rather unusual for software developers, who usually work with deterministic systems. And it creates a lot of issues specific only to speech technology.

career-form
close slider

    Get in touch