
Formants and speech perception

simonson at ACS.BU.EDU
Mon Dec 4 23:06:21 EST 1995

Hi there,

Perhaps I am posting this question in entirely the wrong newsgroup. In
that case, just ignore it.

Right now I am working on my master's thesis on speech recognition with a
"new" kind of neural network (called "Little Linguistic Creature"). With
this network I try to model the ear and the brain, anatomically and
functionally, as closely as possible.

Collecting information about the anatomical part isn't such a hard task,
but less is known about the way the brain computes speech from the signals
delivered by the ear and the auditory pathway. The ear converts the sound
waves to a frequency spectrum, which is sent to the auditory cortex. Speech
is known to be built up from phonemes, and phonemes can be identified by their
formants, or even by formant ratios (for speaker independence). The question
that arises now is: does the brain compute speech from the entire frequency
spectrum, or does it use just the formants?
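To make the formant and formant-ratio idea concrete, here is a minimal
sketch of formant estimation via linear prediction (LPC), one common way
to extract formants from a spectrum. The "vowel" here is synthetic (an
impulse train filtered through two resonators at frequencies I chose for
illustration), and the sample rate, bandwidth, and order-4 model are all
my assumptions, not anything from this post:

```python
# Minimal LPC formant-estimation sketch on a synthetic two-formant vowel.
# All parameter values below are illustrative assumptions.
import numpy as np
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

fs = 8000                        # sample rate (Hz)
f1_true, f2_true = 700.0, 1200.0 # assumed formants of an /a/-like vowel
bw = 80.0                        # resonance bandwidth (Hz)

# Source: glottal-like impulse train at 100 Hz, 1 second long
src = np.zeros(fs)
src[::fs // 100] = 1.0

# Vocal-tract model: cascade of two second-order all-pole resonators
sig = src
for f in (f1_true, f2_true):
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * f / fs
    sig = lfilter([1.0], [1.0, -2 * r * np.cos(theta), r * r], sig)

# LPC via the autocorrelation method (order 4: two pole pairs)
order = 4
w = sig * np.hamming(len(sig))
ac = np.correlate(w, w, mode="full")[len(w) - 1:len(w) + order]
a_lpc = solve_toeplitz(ac[:order], ac[1:order + 1])

# Formants = angles of the upper-half-plane roots of A(z) = 1 - sum a_k z^-k
roots = np.roots(np.concatenate(([1.0], -a_lpc)))
roots = roots[np.imag(roots) > 0.01]
formants = sorted(f for f in np.angle(roots) * fs / (2 * np.pi) if f > 90)

print([round(f) for f in formants])       # close to the synthesis values
print(round(formants[1] / formants[0], 2))  # formant ratio F2/F1
```

The printed frequencies should land close to the 700 and 1200 Hz used in
the synthesis, and the F2/F1 ratio is the kind of speaker-normalized
quantity the paragraph above refers to.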

Does somebody know the answer to this question (which can be summarized as
"are formants biologically plausible?"), or perhaps a reference to a
publication with a discussion of this subject?

Thanks in advance,

Martin Grim
Student Computer Science - Natural Language Processing
University of Twente, The Netherlands

mgrim at cs.utwente.nl

It has been a while since I last wrote in English, so I hope you
will forgive any errors I've made.

The question you pose is a good one, and it is an open question, to my
knowledge.  Two conflicting sources of information come to mind.  The
first is the most recent literature on signal processing for cochlear
implants (see Sandlin's book on digitally programmable hearing aids,
published by Allyn & Bacon, for a good summary of this research).  The
results seem to show better speech understanding when the signal is
NOT digitally reduced to features (formants), but rather when the
frequency spectrum is maintained.

The second area of research worth looking at is a collection of work
by Remez et al. (sorry, I don't have the references here at home)
on time-varying sinusoidal speech (TVS).  The premise was that synthetic
speech was created with only sinusoidal tracings of the formants, so that
the speech sounded very distorted but was intelligible to some listeners.
No other acoustic cues were available, so the movement of the formants must
be important for speech perception.  I may not be explaining this
all that well, as it is late and I have been on the net too long
tonight, but I'd be happy to search out the references if you desire.
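The synthesis idea behind that work can be sketched as follows: each
formant track is replaced by a single time-varying sinusoid, and the
sinusoids are summed. The formant tracks and amplitudes below are made up
for illustration (Remez et al. derived theirs from recorded utterances),
so this shows only the principle, not their actual procedure:

```python
# Sketch of sinewave-speech-style synthesis from hypothetical formant tracks.
import numpy as np

fs = 8000
t = np.arange(int(0.5 * fs)) / fs       # 0.5 s of samples

# Made-up formant tracks for a vowel glide (roughly /a/ -> /i/)
f1 = np.linspace(700, 300, t.size)      # F1 falls
f2 = np.linspace(1200, 2300, t.size)    # F2 rises
f3 = np.full(t.size, 2800.0)            # F3 roughly constant

def tone(f_track):
    # Phase is the running integral of the instantaneous frequency
    phase = 2 * np.pi * np.cumsum(f_track) / fs
    return np.sin(phase)

# One sinusoid per formant, with arbitrarily chosen relative amplitudes
sws = tone(f1) + 0.5 * tone(f2) + 0.25 * tone(f3)
sws /= np.max(np.abs(sws))              # normalize to [-1, 1]
```

Played back, a signal like `sws` carries only the formant trajectories
and none of the other acoustic cues of natural speech, which is what made
it a useful probe of whether formant movement alone supports perception.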

Good luck on your work,

Andrea Simonson
Boston University
