A New Speech Coding Strategy for Cochlear Implant

Journal of Medical and Biological Engineering, 30(5): 335-342
Technical Note
Wei-Dong Wang1,†,*
Hong-Yun Liu1,†
Hu Yuan2
Qing Ang1
1 Department of Biomedical Engineering, General Hospital of Chinese PLA, Beijing 100853, China
2 Department of Otolaryngology, General Hospital of Chinese PLA, Beijing 100853, China
Received 5 Aug 2009; Accepted 1 Feb 2010; doi: 10.5405/jmbe.30.5.10
Abstract
Cochlear implants are widely accepted as the most effective way for individuals with severe to profound hearing loss to restore some degree of hearing. Speech coding strategies play an extremely important role in optimizing the cochlear implant user's communicative potential. Various speech coding strategies have been developed over the past fifty years to simulate the peripheral auditory system as naturally as possible. Most of them mimic the human cochlea's spatial encoding pattern, which stimulates auditory fibers according to the frequency characteristics of speech. However, these strategies cannot simulate the human cochlea's temporal encoding pattern well, and current spreading and channel interactions remain major problems. This paper presents a new solution, called wavelet zero-crossings stimulation (WZCS), which generates stimulating pulsatile series at zero-crossings in the wavelet transform domain. By encoding both amplitude modulation and phase information (zero-crossings), WZCS aims to improve the recognition of tonal language speech and of speech in multi-talker backgrounds. WZCS, frequency amplitude modulation encoding (FAME) and continuous interleaved sampling (CIS) were compared by computer simulation. Hearing test results showed that speech synthesized through WZCS was recognized better than speech synthesized through CIS in both quiet and noisy environments, and that WZCS significantly improved tone recognition over FAME in both quiet and noisy conditions. Further study demonstrated that WZCS preserves temporal cues (phase information) and that some fine structure of speech remains in the stimulating pulsatile series. Most importantly, signals reconstituted through WZCS were found to correlate more closely with the original sounds than signals reconstituted through the CIS and FAME strategies. Thus, applying the WZCS strategy to cochlear implants may be a significant improvement.
Keywords: Stimulating pulsatile series, One-octave wavelet transform, Zero-crossings
1. Introduction
Cochlear implants are widely accepted as the most effective way for individuals with severe to profound hearing loss to restore some degree of hearing. Typically, these devices consist of a microphone, a speech processor, a transmitter, a receiver, and an electrode array located inside the cochlea. The speech processor is responsible for decomposing the input audio signal into different frequency bands or channels and delivering the most appropriate stimulation pattern to the electrodes [1,2]. Speech coding strategies play an extremely important role in optimizing the cochlear implant user's communicative potential, and new and better speech coding strategies have led to great strides during the past 30 years in the performance and widespread application of cochlear implants. In short, remarkable progress has been made in the development of speech coding strategies and cochlear implants, but much room remains for improvement, especially for patients with low recognition in noise and for the problems of current spreading and channel interactions [1-5].
† These authors contributed equally to this work
* Corresponding author: Wei-Dong Wang
Tel: +86-010-66936921; Fax: +86-010-66936921
E-mail: [email protected]
So far, there are two types of methods for synthesizing the stimulating pulsatile sequences for cochlear implants. One school of thought, based on the vocal model, extracts the fundamental frequency (F0), formants (F1, F2, F3) and other parameters of the speech signal to synthesize the corresponding stimulating pulsatile sequences; this approach is known as feature extraction [6-8]. F0/F1, F0/F1/F2, F0/F1/F2/F3 and Multi-peak are all schemes of this kind. The other type of coding, called the filter bank strategy, is based on the hearing model: the speech signal is passed through a digital band-pass filter bank, and the output of each band is then processed separately to generate the stimulating pulsatile sequences. Filter bank strategies include the spectral maxima sound processor (SMSP), spectral peak (SPEAK), compressed analog (CA), continuous interleaved sampling (CIS), asynchronous interleaved sampling (AIS), frequency amplitude modulation
encoding (FAME) and so on [8-15]. In cochlear implants, either feature extraction or the filter bank strategy is applied; the former provides too little information, while the latter provides too much indiscriminate information. Although cochlear implant systems based on these strategies can already restore partial hearing to deaf people, there are still great variations among individuals in their speech-communication ability. The problems of current spreading, channel interaction, “fixed stimulating frequency” and other issues have not been solved well. This paper presents a new solution, called wavelet zero-crossings stimulation (WZCS), which generates stimulating pulsatile series at zero-crossings in the wavelet transform domain. This solution can mitigate the interactions among channels and preserve the temporal cues and some fine structure of speech in the stimulating pulsatile series, with a flexible stimulating rate determined by the original acoustic signal itself, which some other strategies do not provide [16,17].
J. Med. Biol. Eng., Vol. 30. No. 5 2010
Figure 1. Block diagram of the WZCS strategy. [Diagram not reproduced. Blocks shown: microphone (MIC), pre-emphasis, one-octave wavelet transform Φ1(t) ... ΦN(t); for each channel: absolute value, LPF, sampling holder, threshold (Threshold1 ... ThresholdN), sign detector, delay, amplifier, modulation, asynchronous algorithm, differentiator, electrode (E1 ... EN).]
2. Materials and methods
Cochlear implants simulate the physiological mechanism of normal hearing, but the hearing induced by electrical stimulation is different from that induced by acoustic stimulation. It is thought that by filtering audio information into multiple frequency bands and selectively stimulating different locations of the basilar membrane through electrodes implanted in the cochlea, sound information can be recognized by the brain [16-21].
The auditory system has traditionally been viewed as a frequency analyzer, which provides a faithful spectral representation of the acoustic waveform for higher-level processing. Studies also indicate that the human cochlea is like a set of band-pass filters with equal relative bandwidth and regularly distributed center frequencies; when passed through the cochlea, the speech signal is wavelet-transformed, and the output of each band-pass filter activates the corresponding auditory nerve fibers [21]. However, the insulation of implanted electrodes in cochlear implant systems is not as good as that of hair cells, so it is impossible to eliminate the interactions between electrodes. To mitigate these interactions, conventional strategies use interleaved sampling pulses for stimulation, which consequently breaks up the temporal cues and some fine structure of the acoustic signal. The CA approach can keep the temporal cues and fine structure of speech, but perception is adversely affected by the interaction among electrodes. Other strategies with fixed-frequency biphasic pulse modulation also destroy the temporal cues and fine structure of the audio signal [20,22].
By appropriately combining amplitude modulation and frequency modulation, a strategy based on the one-octave wavelet transform and zero-crossings is proposed in this paper. The strategy, which preserves temporal cues and some fine structure of the original speech in the stimulus signals, is expected to enhance speech perception in noise as well as tonal language recognition. Figure 1 is a flow diagram representing an acoustic simulation of the WZCS strategy.
In the functional block diagram, the input audio signal captured by a microphone is pre-emphasized to compensate for the high-frequency components. The emphasized signal is then passed to a set of wavelet functions, with center frequencies arranged from low to high, which implement the one-octave wavelet transform. The outputs of the N channels are processed through two independent parallel pathways to extract, in each band, the amplitude envelope and the zero-crossings, which carry phase information. The wavelet function Φ(t) is selected to implement a one-octave continuous wavelet transform. The one-octave wavelet transform is a linear operation that decomposes the audio signal into components that appear at different scales.
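The front end described above can be illustrated with a minimal sketch. It is not the authors' implementation: the pre-emphasis coefficient `alpha = 0.95` is an assumed value (the paper does not state one), the function names are hypothetical, and idealized FFT masks stand in for the wavelet functions Φ(t). The default band edges follow the eight one-octave bands used in the computer simulation of this paper (30-60 Hz up to 3840-7680 Hz).

```python
import numpy as np

def pre_emphasis(x, alpha=0.95):
    """First-order high-frequency boost: y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= alpha * x[:-1]
    return y

def octave_edges(f_low=30.0, n_bands=8):
    """One-octave bands: each upper edge is twice the lower edge."""
    return [(f_low * 2 ** k, f_low * 2 ** (k + 1)) for k in range(n_bands)]

def octave_split(x, fs, edges):
    """Split x into sub-bands by masking its FFT (idealized stand-in for the wavelet filters)."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    bands = []
    for lo, hi in edges:
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(np.fft.irfft(spectrum * mask, n=len(x)))
    return np.array(bands)

# Usage: a 1 kHz test tone sampled at 16 kHz should land in the 960-1920 Hz band.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
bands = octave_split(pre_emphasis(tone), fs, octave_edges())
strongest = int(np.argmax((bands ** 2).sum(axis=1)))
print(strongest)  # 5, the index of the 960-1920 Hz band
```

A real wavelet filter bank would have smooth, overlapping band edges; the brick-wall masks are used here only to keep the example short.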
At each scale, the amplitude envelope of the one-octave wavelet-transformed acoustic signal is extracted through full-wave rectification and low-pass filtering. The cut-off frequency of the low-pass filter determines how much slowly varying rate information is preserved in the envelope. In addition, the sampling holder keeps the envelope detection and the zero-crossing pulse extraction synchronous.
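The envelope pathway can be sketched as follows. This is an illustration under stated assumptions rather than the filter used in the paper: the low-pass filter is a simple one-pole smoother, and the 400 Hz cut-off and the function name `envelope` are hypothetical choices, since the paper does not specify the filter design.

```python
import numpy as np

def envelope(band, fs, cutoff_hz=400.0):
    """Full-wave rectification followed by a one-pole low-pass filter (assumed design)."""
    rectified = np.abs(np.asarray(band, dtype=float))    # full-wave rectification
    a = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / fs)      # smoothing coefficient for the cut-off
    env = np.empty_like(rectified)
    state = 0.0
    for n, value in enumerate(rectified):
        state += a * (value - state)                     # y[n] = y[n-1] + a * (x[n] - y[n-1])
        env[n] = state
    return env

# Usage: for a unit-amplitude 1 kHz tone the smoothed envelope settles near
# the mean of |sin|, which is about 2/pi.
fs = 16000
t = np.arange(fs) / fs
env = envelope(np.sin(2 * np.pi * 1000 * t), fs)
settled = env[-4000:].mean()
```

Lowering `cutoff_hz` gives a smoother envelope that carries less of the slowly varying rate information mentioned above; raising it lets more of the fluctuation through.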
Speech Processing Strategy for Cochlear Implant
Simultaneously, in the other pathway, a threshold determined by the characteristics of the noise at each scale is subtracted from the output of the one-octave wavelet transform in each band to reduce the effect of noise [23]. The sign detector generates a positive pulse when the processed signal crosses the baseline from positive to negative, and a negative pulse when it crosses the baseline from negative to positive. The pulse series generated by the sign detector is delayed by one unit of time in each band, and the zero-crossing pulsatile series (the FM or phase information) is obtained by subtracting the undelayed pulse series from the delayed one. The sampling holder keeps the envelope signal and the zero-crossings signal synchronous, and the amplifier sets the amplitude of the zero-crossing pulses to 1. The stimuli are then obtained by amplitude modulating each band's zero-crossings (frequency and phase information). The stimuli of each band are processed by an asynchronous algorithm, implemented in software, that monitors the pulses of the 8 channels at all times; if pulses appear in two or more channels at the same time, the algorithm delivers them one after another so that only one electrode is stimulated at a time. Finally, the differentiators make the pulses biphasic to keep charge and current balanced. The synthesized speech signal is obtained by summing the stimuli of all sub-bands.
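The zero-crossing pathway and the asynchronous algorithm can be sketched as below. This is one possible reading of the description, not the authors' code: the sign detector is assumed to output ±0.5 (a 0.5 block appears in Figure 1), so that the delayed-minus-undelayed difference has unit amplitude, and the serialization rule (lower channel index first, later pulses moved to the next free sample) is an assumption, because the paper only says the pulses are delivered "in proper order". The function names are hypothetical.

```python
import numpy as np

def zero_crossing_pulses(band, threshold=0.0):
    """Return +1 at positive-to-negative crossings, -1 at negative-to-positive ones, else 0."""
    shifted = np.asarray(band, dtype=float) - threshold
    sign = np.where(shifted >= 0.0, 0.5, -0.5)        # sign detector output (assumed +/-0.5)
    delayed = np.concatenate(([sign[0]], sign[:-1]))  # one-sample delay
    return delayed - sign                             # delayed minus undelayed series

def serialize(stimuli):
    """Allow at most one active channel per sample by deferring colliding pulses."""
    stimuli = np.asarray(stimuli, dtype=float)
    n_channels, n_samples = stimuli.shape
    out = np.zeros_like(stimuli)
    busy = np.zeros(n_samples, dtype=bool)
    for t in range(n_samples):
        for ch in range(n_channels):                  # assumed priority: low index first
            if stimuli[ch, t] != 0.0:
                slot = t
                while slot < n_samples and busy[slot]:
                    slot += 1                         # move to the next free sample
                if slot < n_samples:                  # pulses pushed past the end are dropped
                    out[ch, slot] = stimuli[ch, t]
                    busy[slot] = True
    return out

# Usage on toy data.
pulses = zero_crossing_pulses(np.array([1.0, 1.0, -1.0, -1.0, 1.0]))
collided = np.array([[0.0, 1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])
ordered = serialize(collided)
```

In a full simulation each channel's pulse train would be multiplied by its envelope before serialization, and a differentiator would then turn each pulse into a biphasic one; both steps are omitted here for brevity.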
It is worth noting that the algorithm for generating stimulation pulsatile sequences in WZCS is fundamentally different from that of other strategies. Although the amplitudes of the activating pulses are determined by the envelope in the WZCS, CIS and SMSP strategies, the timing of the sequences, or stimulating rate, is artificial and fixed in AIS, CIS and SMSP, whereas in WZCS it depends on the audio signal itself. Zero-crossings of the acoustic signal contain some phase information, so the stimulus obtained from WZCS can preserve some fine structure of the acoustic signal.
It is essential to point out that various analytic wavelets can be used for the one-octave wavelet transform, such as the Mexican hat function, the Meyer wavelet, the Gaussian function and so on. Take the Meyer wavelet as an example: its temporal waveform is shown in Figure 2(a), and its Fourier transform, which shows its band-pass characteristic, is shown in Figure 2(b). More crucially, the Meyer wavelet is biorthogonal; consequently, it can be configured as a one-octave function according to its band-pass characteristic. Unlike conventional audio signal processing based on filter banks, the output of each channel can then be reconstructed completely from the zero-crossings of its output signal. This is the well-known Logan's theorem; we describe the theorem in some detail because it provides a good understanding of the mathematical issues [24,25]. Let $f(x) \in L^2$, where $L^2$ denotes the Hilbert space of measurable, square-integrable one-dimensional functions, and suppose that the Fourier transform of $f(x)$ has its support included in a one-octave interval. Logan's theorem proves that if $f(x)$ does not share any zero-crossings with its Hilbert transform, then it is uniquely characterized, up to a multiplicative constant, by its zero-crossings.
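For reference, the statement above can be written compactly as follows. This is an informal restatement of the theorem as described in the text, not a quotation from [24]; the symbols $\hat{f}$ (Fourier transform), $\mathcal{H}$ (Hilbert transform), $\omega_0$ (lower band edge) and $Z(\cdot)$ (set of zero-crossings) are notation introduced here for illustration.

```latex
% Informal restatement; \hat{f}, \mathcal{H}, \omega_0 and Z(\cdot) are illustrative notation.
\[
  f \in L^2(\mathbb{R}), \qquad
  \operatorname{supp}\hat{f} \subseteq [-2\omega_0, -\omega_0] \cup [\omega_0, 2\omega_0]
  \quad (\omega_0 > 0),
\]
\[
  Z(f) \cap Z(\mathcal{H}f) = \emptyset
  \;\Longrightarrow\;
  f \text{ is determined by } Z(f) \text{ up to a multiplicative constant.}
\]
```

As an example, the 960-1920 Hz band used in the simulation of this paper corresponds to $\omega_0 = 2\pi \cdot 960$ rad/s.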
Figure 2. Meyer wavelet and its spectrum. (a) Temporal characteristic of Meyer wavelet. (b) Amplitude and frequency characteristic of Meyer wavelet.
WZCS is a strategy that incorporates both spatial and temporal codes from the standpoint of hearing physiology while taking mathematics and signal processing into consideration; it is also a multi-resolution approach in both the temporal and frequency domains. By comparison, other strategies, such as CIS, can damage the temporal fine structure of the acoustic signal, so that only the variation of intensity remains. The frequency resolution of the cochlea is calculated to be about 30 Hz at 1000 Hz, but after processing by the central auditory nervous system it reaches about 3 Hz. This indicates that the cochlea not only processes the audio signal but also provides excitation pulsatile series that can be perceived effectively by the central auditory nervous system [26]. So, with the temporal code included in the WZCS strategy, the audio signal processed by the one-octave wavelet transform can keep its temporal characteristics on the basis of zero-crossings.
To demonstrate the advantages of WZCS over the CIS and FAME strategies, the characteristics of the stimuli generated by these strategies were compared. In the computer simulation, the filter bands used for CIS and FAME were the same as those used for WZCS: 30-60 Hz, 60-120 Hz, 120-240 Hz, 240-480 Hz, 480-960 Hz, 960-1920 Hz, 1920-3840 Hz and 3840-7680 Hz. Figure 3 shows a segment of a Chinese speech signal, “da jia hao”.
Figure 3. Original speech signal.
We also conducted a hearing test experiment to assess the differences in speech perception among the CIS, FAME and WZCS strategies in quiet and in noise, with the SNR of the processed test materials fixed at 5 dB in the latter condition. Fifteen normal-hearing, well-educated subjects were recruited and listened to the test materials through headphones. Thirty Chinese sentences, 40 Chinese words and 50 tone variations (level tone, rising tone, falling-rising tone and falling tone) of Chinese characters were processed by CIS, FAME and WZCS, respectively, and then presented to the subjects to test speech recognition [25,26]. All speech test materials were digitized at a sampling rate of 16 kHz and stored in a 16-bit format.
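A fixed SNR such as the 5 dB condition above is usually obtained by scaling the noise relative to the speech power. The sketch below shows that arithmetic only; the paper does not describe the noise type or the mixing stage, so the white Gaussian noise, the test tone standing in for speech, and the function name `mix_at_snr` are all assumptions made for illustration.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = gain * noise
    return speech + scaled, scaled

# Usage with stand-in signals (a tone instead of speech, white noise as the masker).
fs = 16000
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)
noisy, scaled = mix_at_snr(speech, rng.standard_normal(fs), snr_db=5.0)
achieved = 10.0 * np.log10(np.mean(speech ** 2) / np.mean(scaled ** 2))
```

Here `achieved` recovers the requested 5 dB, which is a quick way to check that the scaling was applied to power rather than amplitude.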
All subjects were tested in a sound-attenuated laboratory. In the tone recognition experiment, a custom graphical user interface was created in MATLAB to present 50 Chinese characters with different tones and to collect responses. When a processed Chinese character with a certain tone was presented in random order, the subject chose the answer they thought was correct by clicking the button corresponding to the presented tone. When a test condition was finished, the percent-correct score was calculated for further statistical analysis. The synthesized stimulus was presented via a headphone (HYUNDAI CJC-860A), and the order of all experimental conditions was randomized for each subject.
In the word and sentence recognition experiments, the subject was presented with words and sentences, respectively, and was instructed to type in as many words as possible from the presented words or sentence using a computer keyboard. The number of correctly recognized words was used to calculate the final recognition rate. All words and sentences were likewise presented in random order through the headphone.
Before the experiment, some synthesized speech materials were provided to the subjects via the headphone for practice; this procedure lasted about 5 minutes. In the noisy condition, the SNR of the synthesized speech materials was 5 dB, and all other conditions were the same as in quiet. During the test, guessing was encouraged, but no feedback was given in the tone, word and sentence recognition experiments.
3. Results
The results of the computer simulation for CIS, FAME and WZCS are shown in Figure 4. The stimuli of the 1st to 8th channels obtained through the CIS, FAME and WZCS strategies are shown in Figure 4(a); the envelopes of corresponding channels were almost the same across strategies, but the stimulating rates of corresponding channels were very different. Figure 4(b) shows zoomed-in details of Figure 4(a). As shown in Figure 4(b), for the CIS strategy the stimulating rates of the 8 channels were identical and fixed at 900 pps. For FAME, the zero-crossings in each band were frequency-modulated by the corresponding band's center frequency and then band-limited by using the slowly varying FM component to generate pulses; thus the stimulating rate of FAME in each band was limited to about 400 pps [15]. For the WZCS strategy, the stimulating rates ranged from about 50 pps to more than 4000 pps across the 8 channels and were determined by all the zero-crossings of the wavelet-transformed speech signal in each sub-band. Because the zero-crossings carry the frequency and phase information of the original speech signal, the stimulus generated by WZCS contained part of the fine structure of the original speech signal. Figure 4(c) shows the speech signal “da jia hao” synthesized through the CIS, FAME and WZCS strategies; each was obtained by summing the sub-bands of the respective strategy.
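Because the WZCS stimulating rate follows the zero-crossings of each sub-band, it can be estimated simply by counting sign changes per second. The following sketch does this for test tones; the function name `zero_crossing_rate` and the test signals are illustrative and are not taken from the paper.

```python
import numpy as np

def zero_crossing_rate(band, fs):
    """Number of sign changes per second in a sub-band signal."""
    band = np.asarray(band, dtype=float)
    negative = np.signbit(band)
    crossings = np.count_nonzero(negative[1:] != negative[:-1])
    return crossings * fs / len(band)

# Usage: a pure tone at f Hz crosses zero about 2*f times per second.
fs = 16000
t = np.arange(fs) / fs
rate_low = zero_crossing_rate(np.sin(2 * np.pi * 45 * t + 0.1), fs)
rate_high = zero_crossing_rate(np.sin(2 * np.pi * 1000 * t + 0.1), fs)
```

The rate therefore scales with the frequency content of each band, in contrast to a fixed-rate carrier such as the 900 pps used for CIS in the simulation.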
Figure 5 shows the spectra of the original signal and of the signals synthesized by the CIS, FAME and WZCS strategies. The spectra were obtained by calculating the power spectral density of the original and synthesized speech signals with an FFT length of 1024. The synthesized stimulus based on WZCS was more natural and closer to the original speech signal than that of CIS. The main difference among the three synthesized stimuli lies in their frequency components, as shown in the figure. The dash-dotted curve in Figure 5 is the spectrum of the synthesized stimulus for WZCS, which matches the main frequency components of the spectrum of the original signal (solid curve) very well. In contrast, the frequency components of the stimuli for CIS (dashed curve) and FAME (dotted curve) differ widely from those of the original signal.
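A power spectral density with an FFT length of 1024 can be estimated, for example, by averaging windowed periodograms. The paper does not name the estimator, so the Hann window, the 50% overlap and the function name `welch_psd` below are assumptions; the sketch only illustrates the kind of computation involved.

```python
import numpy as np

def welch_psd(x, fs, nfft=1024):
    """Averaged periodogram (Hann window, 50% overlap); returns one-sided PSD."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(nfft)
    hop = nfft // 2
    segments = [x[i:i + nfft] * window for i in range(0, len(x) - nfft + 1, hop)]
    power = np.abs(np.fft.rfft(segments, axis=1)) ** 2
    psd = power.mean(axis=0) / (fs * np.sum(window ** 2))
    psd[1:-1] *= 2.0                       # fold negative frequencies into the one-sided PSD
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return freqs, psd

# Usage: the PSD of a 1 kHz tone sampled at 16 kHz peaks at 1 kHz.
fs = 16000
t = np.arange(fs) / fs
freqs, psd = welch_psd(np.sin(2 * np.pi * 1000 * t), fs)
peak_hz = freqs[np.argmax(psd)]
```

With `nfft = 1024` and a 16 kHz sampling rate, the frequency resolution is 15.625 Hz per bin.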
The results of the experiment were analyzed using SPSS. Paired t-tests between WZCS and CIS were carried out under the different conditions with the 3 kinds of test materials, and a similar analysis was performed between WZCS and FAME. Table 1(a) shows that WZCS produced significantly better performance than the CIS strategy (p < 0.01) in both quiet and noise for recognition of sentences, words and tones. In quiet, the improvement was about 11 percentage points for sentence and word recognition, and the largest WZCS advantage, about 31 percentage points, was for tone recognition, as shown in Figure 6(a). Figure 6(b) illustrates that in noise, normal-hearing subjects scored at least 35 percentage points higher with WZCS than with the CIS strategy in recognition of the test materials. Table 1(b), Figure 6(a) and Figure 6(b) show that normal-hearing subjects achieved high recognition rates with both the WZCS and FAME strategies. They also performed significantly better with WZCS than with FAME on tone recognition in both quiet and noise (p = 0.019 in quiet and p = 0.004 in noise). For sentence and word recognition, performance with WZCS and FAME was similar in both quiet and noise (p > 0.05). These results indicate that while current speech coding strategies can help cochlear implant users recognize what is said in quiet, they may have difficulty with the perception of tones and of speech in noise.
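The paired t-test used above compares two strategies on the same subjects by testing whether the mean of the per-subject score differences is zero. A minimal sketch of the statistic, using only the Python standard library, is given below; the scores are toy numbers chosen for easy arithmetic and are not data from this study, and the function name `paired_t` is hypothetical.

```python
import math
import statistics

def paired_t(scores_a, scores_b):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)            # sample standard deviation (n - 1)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1

# Toy example (NOT data from the paper): three subjects, two conditions.
t_stat, dof = paired_t([3.0, 4.0, 5.0], [1.0, 2.0, 2.0])
print(round(t_stat, 6), dof)  # 7.0 2
```

The p value then follows from the t distribution with `dof` degrees of freedom; statistical packages such as SPSS, or `scipy.stats.ttest_rel` in Python, report it directly.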
Figure 4. (a) Stimulus of 8 channels for CIS, FAME and WZCS, (b) zoom-in details of stimulus processed by CIS, FAME and WZCS and (c) original speech signal and synthesized speech signals through CIS, FAME and WZCS.
Figure 5. Spectrum of original speech signal and synthesized speech signals.
Figure 6. (a) Recognition of different test materials with different strategies in quiet. (b) Recognition of different test materials with different strategies in noise. [Plots not reproduced. Vertical axis: Recognition Rate (%); horizontal axis: Sentence, Tone, Words; series: CIS, FAME, WZCS.]
Table 1. Results of paired t-test between WZCS and CIS as well as WZCS and FAME in quiet and noise.
(a) Paired t-test between WZCS and CIS (p value)
SNR      Sentence   Words    Tone
Quiet    <0.01      <0.01    <0.01
5 dB     <0.01      <0.01    <0.01
(b) Paired t-test between WZCS and FAME (p value)
SNR      Sentence   Words    Tone
Quiet    0.06       0.093    0.019
5 dB     0.447      0.109    0.004
4. Discussion
The present study has offered strong evidence for the corresponding contribution of temporal envelope and phase information. Heretofore, some speech coding strategies have
presented temporal envelope information to cochlear implant users, whereas most or all phase information or frequency modulation was discarded. The analog coding strategies (CA or SAS), which deliver the compressed analogue outputs of the sub-bands directly, allow some degree of phase or frequency information to be represented, but their effect is limited by interactions among frequency bands and electrodes. The CIS strategy uses a high stimulation rate to pass higher-frequency components of speech through the envelope detector and does present phase or frequency information in this way, but how much of this information cochlear implant users can perceive is the critical question. While the temporal envelope cues from several frequency bands are sufficient to support speech recognition in quiet, the phase cue is critical for speech recognition in noise, which better reflects realistic listening situations. In the FAME strategy, the carrier likely contains different FM cues for the target and the masker, allowing the target envelope to form a separate stream for better sound segregation [5]. Similarly to the FAME strategy, in the WZCS strategy the temporal envelope is used to amplitude modulate the phase information instead of a common, fixed carrier, preserving the listener's ability to separate the target speech signal from noise.
The proposed algorithm is based mostly on Logan's pioneering work on the information in the zero-crossings of band-pass signals [24]. Some recent studies on the representation of fine structure and frequency modulation were also taken into account [13-14]. FAME and AIS, two recent and popular algorithms based on the Hilbert transform, were developed to obtain the temporal envelope and the frequency modulation or fine structure components from the original speech signal [9,15]. For these two strategies, the signal must be divided into many more frequency bands to guarantee that each sub-band is narrow; otherwise, in theory, the temporal envelope derived from the Hilbert-transformed signal is not equal to that of the corresponding sub-band. The WZCS strategy adopts eight one-octave wavelet filters to realize the function of the band-pass filters, which facilitates implementation. The experiments presented herein yield the very first results for the WZCS speech coding strategy. Given the significant improvements obtained in the hearing tests with the new strategy, this implementation of a phase modulation strategy may offer a new quality of hearing with cochlear implants.
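The Hilbert-transform decomposition mentioned above, into a temporal envelope and a frequency modulation, can be sketched with an FFT-based analytic signal. This is a generic textbook construction, not the FAME or AIS implementation; the function names and the amplitude-modulated test tone are chosen here for illustration.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: zero the negative frequencies, double the positive ones."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def envelope_and_frequency(x, fs):
    """Hilbert envelope and instantaneous frequency (Hz) of a narrow-band signal."""
    z = analytic_signal(x)
    env = np.abs(z)
    inst_freq = np.diff(np.unwrap(np.angle(z))) * fs / (2.0 * np.pi)
    return env, inst_freq

# Usage: a 1 kHz carrier with a slow 5 Hz amplitude modulation (a narrow-band signal).
fs = 16000
t = np.arange(fs) / fs
am_tone = (1.0 + 0.5 * np.cos(2 * np.pi * 5 * t)) * np.cos(2 * np.pi * 1000 * t)
env, inst_freq = envelope_and_frequency(am_tone, fs)
```

For this narrow-band test signal the envelope recovers the 5 Hz modulation and the instantaneous frequency stays at the 1 kHz carrier; for wide sub-bands the two quantities are no longer so cleanly separated, which is the narrow-band requirement discussed above.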
Prior studies, such as those on the FAME strategy, have shown that continuously varying the presentation frequency improves speech recognition over constant-frequency strategies [13,15]. We hypothesized that the improvement could be achieved with a discrete number of presentations of temporal envelope and phase information. Our hearing test results support this hypothesis, suggesting that the strategy is feasible. However, the WZCS strategy is difficult to apply directly to cochlear implants because the derived phase information generally varies too widely in range and too rapidly in rate, which could limit cochlear implant users' ability to perceive it. While this strategy is very encouraging, there is still a great deal to be learned about electrical stimulation of the auditory nerve, and many questions remain to be answered. Further study is needed before the presented strategy can be applied to cochlear implants.
5. Conclusions
Cochlear implant users show widely varying results for many reasons, such as the history of their deafness, the procedure of their implant surgery, the speech coding strategy and so on. The success of cochlear implants owes much to the improvement of speech coding strategies over the past decades. In this study, we have proposed the one-octave wavelet zero-crossings stimulation strategy and discussed this approach to synthesizing stimulation pulsatile series, particularly with regard to computer simulation and a hearing test experiment. The temporal dynamic characteristics of the audio signal could be completely reconstructed from its zero-crossings. Through computer simulation and the hearing test experiment, we found that although present strategies with amplitude modulation may be sufficient for speech perception in quiet, they may not work well in noise; the WZCS strategy with zero-crossings modulation (phase information) clearly enhanced speech recognition in noise and tone perception.
Though the existing strategies based on amplitude modulation and filter banks can provide good speech perception in quiet, results from previous and recent studies reveal that the utility of these amplitude and spectral cues is seriously limited to ideal listening conditions [13-15]. The WZCS strategy, which generates stimulating pulsatile series at zero-crossings in the domain of the one-octave wavelet transform, could mitigate the problems of channel interaction and noise. With varying stimulating rates determined by the zero-crossings (phase information) of the audio signal, WZCS can preserve phase information and some fine structure in the stimulus. This characteristic is not provided by some conventional strategies such as CIS and SPEAK. In conclusion, the WZCS strategy encodes both zero-crossings modulation and amplitude modulation. The results highlight the limitations of current speech coding strategies in cochlear implants and the importance of encoding phase information, or zero-crossings, to improve speech recognition in noise and tonal language speech perception in realistic listening environments.
Acknowledgements
This work was supported financially by the National
Natural Science Funds. We thank all the people who
participated in the research work. We also appreciate the
helpful comments made by several anonymous reviewers on a
previous version of this manuscript.
References
[1] F. G. Zeng, “Cochlear implants in China,” Audiology, 34: 61-75, 1995.
[2] N. Waldo, B. Andreas, L. Thomas and E. Bernd, “A psychoacoustic “N of M”-type speech coding strategy for cochlear implants,” EURASIP J. Appl. Signal Processing, 18: 3044-3059, 2005.
[3] B. S. Wilson, D. T. Lawson, M. Zerbi, C. C. Finley and R. D. Wolford, “New processing strategies in cochlear implantation,” Am. J. Otol., 16: 669-675, 1995.
[4] B. S. Wilson, C. C. Finley, D. T. Lawson, R. D. Wolford, D. K. Eddington and M. R. William, “Better speech recognition with cochlear implants,” Nature, 352: 236-238, 1991.
[5] K. B. Nie, S. Ginger and F. G. Zeng, “Encoding frequency modulation to improve cochlear implant performance in noise,” IEEE Trans. Biomed. Eng., 52: 64-73, 2005.
[6] B. S. Wilson, C. C. Finley, D. T. Lawson, R. D. Wolford and M. Zerbi, “Design and evaluation of continuous interleaved sampling (CIS) processing strategy for multi-channel cochlear implants,” J. Rehabil. Res. Dev., 30: 110-116, 1993.
[7] F. G. Zeng, “Temporal pitch in electric hearing,” Hear. Res., 174: 101-106, 2002.
[8] K. H. Kim, S. J. Choi, J. H. Kim and D. H. Kim, “An improved speech processing strategy for cochlear implants based on an active nonlinear filterbank model of the biological cochlea,” IEEE Trans. Biomed. Eng., 56: 828-836, 2009.
[9] J. J. Sit, A. M. Simonson, A. J. Oxenham, M. A. Faltys and R. Sarpeshkar, “A low-power asynchronous interleaved sampling algorithm for cochlear implants that encodes envelope and phase information,” IEEE Trans. Biomed. Eng., 54: 138-149, 2007.
[10] C. M. Zierhofer, I. J. Hochmair and E. S. Hochmair, “Electronic design of a cochlear implant for multi-channel high rate pulsatile stimulation strategies,” IEEE Trans. Rehabil. Eng., 3: 112-116, 1995.
[11] H. J. McDermott, A. E. Vandali, R. J. M. Van Hoesel, C. M. McKay, J. M. Harrison and L. T. Cohen, “A portable programmable digital sound processor for cochlear implant research,” IEEE Trans. Rehabil. Eng., 1: 94-100, 2002.
[12] H. J. McDermott, C. M. McKay and A. E. Vandali, “A new portable sound processor for the University of Melbourne Nucleus Limited multi-electrode cochlear implant,” J. Acoust. Soc. Am., 91: 3367-3371, 1992.
[13] K. B. Nie, B. Amy and F. G. Zeng, “Spectral and temporal cues in cochlear implant speech perception,” Ear Hear., 27: 208-217, 2006.
[14] X. Luo, Q. J. Fu, C. G. Wei and K. L. Cao, “Speech recognition and temporal amplitude modulation processing by Mandarin-speaking cochlear implant users,” Ear Hear., 29: 957-970, 2008.
[15] F. G. Zeng, K. B. Nie, S. S. Ginger, Y. Y. Kong, V. Michael, B. Ashish, C. G. Wei and K. L. Cao, “Speech recognition with amplitude and frequency modulations,” Proc. Natl. Acad. Sci. USA, 102: 2293-2298, 2005.
[16] C. A. Miller, N. Hu, F. Zhang, B. K. Robinson and P. J. Abbas, “Changes across time in the temporal responses of auditory nerve fibers stimulated by electric pulse trains,” JARO, 9: 122-137, 2008.
[17] C. M. John, “Auditory cortex phase locking to amplitude-modulated cochlear implant pulse trains,” J. Neurophysiol., 100: 76-91, 2008.
[18] K. Wang and S. A. Shamma, “Auditory analysis of
spectro-temporal information in acoustic signal,” IEEE Eng.
Med. Biol. Mag., 14: 186-194, 1995.
[19] P. J. Blamey, R. C. Dowell and G. M. Clark, “Acoustic
parameters measured by a formant-estimating speech processor
for a multiple-channel cochlear implant,” J. Acoust. Soc. Am.,
82: 38-47, 1987.
[20] F. G. Zeng, S. Rebscher, W. Harrison, X. A. Sun and H. H. Feng,
“Cochlear implants: system design, integration and evaluation,”
IEEE Rev. Biomed. Eng., 1: 115-142, 2008.
[21] C. N. Jolly, F. A. Spelman and B. M. Clopton, “Quadrupolar
stimulation for cochlear prostheses: modeling and experimental
data,” IEEE Trans. Biomed. Eng., 43: 857-865, 1996.
[22] D. Marr (Ed.), Vision, New York: W. H. Freeman and Company,
1982.
[23] H. Q. Wang, Q. Y. Zhang and J. B. Xue, “Research of speech
de-noising method based on multi-resolution of wavelet
transform,” Comput. Eng. Des., 27: 235-237, 2006.
[24] B. Logan, “Information in the zero-crossings of band pass
signals,” Bell Syst. Tech. J., 56: 487-510, 1977.
[25] G. E. Loeb, C. L. Byers, S. J. Rebscher, D. E. Casey, M. M.
Fong, R. A. Schindler, R. F. Gray and M. M. Merzenich,
“Design and fabrication of experimental cochlear prosthesis,”
Med. Biol. Eng. Comput., 21: 241-254, 1983.
[26] L. Xu and E. P. Bryan, “Spectral and temporal cues for speech
recognition: implications for auditory prostheses,” Hear. Res.,
242: 132-140, 2008.