Thursday, March 04, 2010

Speech Recognition: Men harder to understand than women

Researchers at Stanford University (USA) and the University of Edinburgh (UK) have tested various automatic speech recognition (ASR) systems and found that, in general, the systems have more difficulty recognizing speech from men than from women. One cause the researchers mentioned was that men tended to use fillers such as "emm" more often.

I should point out that the tests were conducted on recordings of telephone calls. This matters because the "channel" the audio travels over makes a significant difference to a computer-based speech recognition system.
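To make the channel point a bit more concrete, here is a minimal Python sketch (assuming NumPy is available) that mimics a classic narrowband landline by discarding everything outside roughly 300 to 3400 Hz. The names `telephone_channel` and `band_energy` and the toy harmonic "voice" are my own illustrative inventions. A brick-wall FFT filter also ignores the codec compression and line noise a real call adds. Notice that a 120 Hz fundamental falls below the passband, so only its higher harmonics survive.

```python
import numpy as np

def telephone_channel(signal, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Crude telephone-line model: keep only frequencies in [low_hz, high_hz]."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Brick-wall band-pass: zero every FFT bin outside the passband.
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def band_energy(signal, sample_rate, low_hz, high_hz):
    """Total spectral energy of `signal` between low_hz and high_hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return power[(freqs >= low_hz) & (freqs <= high_hz)].sum()

sr = 16000                      # assumed wideband recording rate
t = np.arange(sr) / sr          # one second of sample times
# Toy "voice": ten harmonics of a 120 Hz fundamental, plus some 5 kHz energy.
voice = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 11))
voice = voice + 0.3 * np.sin(2 * np.pi * 5000 * t)
phone = telephone_channel(voice, sr)

for lo, hi in [(100, 140), (350, 370), (4900, 5100)]:
    print(f"{lo}-{hi} Hz: before={band_energy(voice, sr, lo, hi):.3g}, "
          f"after={band_energy(phone, sr, lo, hi):.3g}")
```

The before/after energies show the 120 Hz and 5 kHz bands emptied while the 360 Hz harmonic passes through unchanged.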

The work was sponsored by the US Office of Naval Research.

For a quick overview of the research, you can read the BBC (center-left news media, UK) article on it.



Allison said...

I know the article mentioned how pitch could affect the ability to recognize speech, but couldn't that also be why the computer has a hard time distinguishing between phrases like "I see him" and "I see them"? People tend to trail off at the end of sentences, and for men that would lower the pitch as well. Would the computer have difficulty telling "him" from "them" if they were in the middle of a sentence?

Keith said...

Hi, Allison!
Yes, I believe that you are correct.

In addition to the pitch change you pointed out, the "falling off" at the end of the word also implies a decrease in amplitude and (often) in how clearly the ending sound is enunciated. Lower amplitude makes it less likely that the microphone will capture the sound distinctly and that the computer algorithm will detect and recognize it. The same goes for a mumbled ending.
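As a rough illustration of the amplitude point, here is a hypothetical Python sketch (assuming NumPy) of the simplest kind of detector, a frame-energy threshold. Real recognizers use far more robust features. The 20 ms frames, the -30 dB threshold, and the exponentially decaying 200 Hz tone standing in for a trailing-off word are all arbitrary choices of mine.

```python
import numpy as np

def frame_energies_db(signal, sample_rate, frame_ms=20):
    """Split the signal into non-overlapping frames; return each frame's RMS level in dB."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20 * np.log10(rms + 1e-12)  # small offset avoids log(0)

def detected_frames(signal, sample_rate, threshold_db=-30.0):
    """Boolean mask: True where a frame is loud enough to count as speech."""
    return frame_energies_db(signal, sample_rate) > threshold_db

sr = 8000
t = np.arange(sr) / sr  # one second of sample times
# Toy "word": a 200 Hz tone whose amplitude decays, mimicking a trailing-off ending.
envelope = np.exp(-6 * t)
word = envelope * np.sin(2 * np.pi * 200 * t)
mask = detected_frames(word, sr)
print(f"{mask.sum()} of {mask.size} frames detected; the quiet tail is dropped")
```

Because the envelope keeps shrinking, every frame past a certain point falls below the threshold, so the tail of the "word" never gets passed on for recognition at all.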

The computer tracks pitch (and the corresponding harmonic structure), speed, and variation over frequency (presumably what the article means by "tone") as major components of its recognition procedure. This applies to humans too, of course!
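Pitch tracking can likewise be sketched with the classic autocorrelation method: the lag at which a frame best matches a shifted copy of itself is one pitch period. This is a minimal illustration (assuming NumPy), not the procedure any particular recognizer uses. The function names, the 60 to 400 Hz search range, and the synthetic harmonic tones are my assumptions.

```python
import numpy as np

def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame by autocorrelation."""
    frame = frame - frame.mean()
    # Full autocorrelation; keep the non-negative lags only (index 0 = lag 0).
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    # The lag with the strongest self-similarity is taken as one pitch period.
    best_lag = min_lag + np.argmax(corr[min_lag:max_lag + 1])
    return sample_rate / best_lag

def harmonic_tone(f0, sample_rate, seconds=0.04, n_harmonics=8):
    """Toy voiced sound: a fundamental plus progressively weaker harmonics."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    return sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, n_harmonics + 1))

sr = 8000
for label, f0 in [("lower voice", 120.0), ("higher voice", 210.0)]:
    est = estimate_pitch(harmonic_tone(f0, sr), sr)
    print(f"{label}: true {f0:.0f} Hz, estimated {est:.1f} Hz")
```

Plain autocorrelation like this can be prone to octave errors on real speech, which is one reason practical pitch trackers add normalization and smoothing across frames.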

Good to hear from you again. I hope you are well.