Perhaps I am posting this question in entirely the wrong newsgroup. In
that case, just ignore it.
Right now I am working on my master's thesis on speech recognition with a
"new" kind of neural network (called "Little Linguistic Creature"). With
this network I try to model the ear and the brain, anatomically and
functionally, as faithfully as possible.
Collecting information about the anatomical part isn't such a hard task,
but less is known about the way the brain computes speech from the signals
delivered by the ear and the auditory pathway. The ear converts the sound
waves to a frequency spectrum, which is sent to the auditory cortex. Speech
is known to be built up from phonemes, and phonemes can be identified by their
formants, or even by formant ratios (for speaker independence). The question
that arises now is: does the brain compute speech from the entire frequency
spectrum, or does it use just the formants?
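As a rough illustration of the formant-ratio idea mentioned above, here is a
minimal sketch. The formant values are hypothetical, loosely modeled on
published vowel averages; real values would come from an acoustic analysis of
the signal:

```python
# Hypothetical formant values (Hz) for the vowel /i/ from two speakers.
# These numbers are illustrative, not measurements.
male = {"F1": 270.0, "F2": 2290.0}
female = {"F1": 310.0, "F2": 2790.0}

def f2_f1_ratio(formants):
    """Candidate speaker-independent cue: ratio of second to first formant."""
    return formants["F2"] / formants["F1"]

def rel_diff(a, b):
    """Relative difference between two cue values."""
    return abs(a - b) / a

# Raw F2 differs by roughly 22% between the two speakers...
raw = rel_diff(male["F2"], female["F2"])
# ...while the F2/F1 ratio differs by only about 6%.
ratio = rel_diff(f2_f1_ratio(male), f2_f1_ratio(female))

print(f"raw F2 difference:      {raw:.2f}")
print(f"F2/F1 ratio difference: {ratio:.2f}")
```

The point of the sketch is only that ratios vary less across speakers than
the raw formant frequencies do, which is why they are attractive for
speaker-independent recognition.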
Does anybody know the answer to this question (which can be summarized as
"are formants biologically plausible?"), or perhaps a reference to a
publication that discusses this subject?
Thanks in advance,
Student Computer Science - Natural Language Processing
University of Twente, The Netherlands
mgrim at cs.utwente.nl
It has been a while since I last wrote in English, so I hope you
will forgive any errors I've made.
The question that you pose is a good one, and to my knowledge it is still
open. Two conflicting sources of information come to mind. The
first is the most recent literature on signal processing for cochlear
implants (see Sandlin's book on digitally programmable hearing aids,
published by Allyn & Bacon, for a good summary of this research). The
results suggest better speech understanding when the signal is
NOT digitally reduced to features (formants), but rather when the
full frequency spectrum is maintained.
The second area of research worth looking at is a collection of work
by Remez et al. (sorry, I don't have the references here, at home)
on time-varying sinusoidal speech (TVS). The premise was that synthetic
speech created from only sinusoidal tracings of the formants sounded
very distorted, yet was still intelligible to some listeners. Since no
other acoustic cues were available, the movement of the formants must
be important for speech perception. I may not be explaining this
all that well, as it is late and I have been on the net too long
tonight, but I'd be happy to search out the references if you desire.
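For what it's worth, the sine-wave speech idea can be sketched in a few
lines: replace each formant with a single sinusoid that follows the
formant's frequency track. The tracks below are made up for illustration
only; the actual Remez et al. stimuli were derived from analyses of real
utterances:

```python
import numpy as np

sr = 16000                      # sample rate (Hz)
t = np.arange(0, 0.5, 1 / sr)   # half a second of "speech"

def sweep(f_start, f_end):
    """Sinusoid whose frequency moves linearly from f_start to f_end,
    built by integrating the instantaneous frequency into phase."""
    freq = np.linspace(f_start, f_end, t.size)
    phase = 2 * np.pi * np.cumsum(freq) / sr
    return np.sin(phase)

# Three invented "formant" tracks, e.g. a rough /ba/-like transition.
signal = (sweep(300, 700)            # F1 rising
          + 0.5 * sweep(1000, 1200)  # F2
          + 0.25 * sweep(2400, 2500))  # F3
signal /= np.abs(signal).max()       # normalize to [-1, 1]
```

Played back, a signal like this sounds nothing like natural speech, which is
exactly what makes the intelligibility result interesting.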
Good luck on your work,