Lâm Thị Diễm, Faculty of Electronics and Telecommunications, University of Science

Lâm Thị Diễm
Faculty of Electronics and Telecommunications
University of Science, Vietnam National University Ho Chi Minh City (VNU-HCM)
ACHIEVEMENTS FROM 2011 TO THE PRESENT:
1) Consolation prize in the 4th ELABS academic competition, basic division (4/2011).
The competition is organized annually by the Faculty of Electronics and
Telecommunications, University of Science.
2) First prize in the 6th ELABS academic competition, advanced division (5/2013).
The competition is organized annually by the Faculty of Electronics and
Telecommunications, University of Science.
3) Participated in the 17th LSI Contest, held in March 2014 in Okinawa, Japan, with
the project: “Design and construction of a hardware architecture for a speech noise
filter based on speech spectral amplitude estimation using the adaptive Maximum A
Posteriori (MAP) algorithm”
Contest website: http://www.lsi-contest.com/index_e.html
4) Member of the research group for the scientific research project “Research and
implementation of an H.264 decoder on FPGA”, under the direct supervision of the Dean
of the Faculty of Electronics and Telecommunications, Dr. Huỳnh Hữu Thuận.

The 17th LSI Design Contest 2014
A Novel Hardware Architecture for Noise Cancellation based on Adaptive MAP
Speech Spectral Amplitude Estimator
Duy-Hoang HOANG, Doc-Truong DAO, Thi-Diem LAM
Faculty of Electronics and Telecommunication
University of Science – Ho Chi Minh City
Ho Chi Minh City, Vietnam
Abstract—A novel hardware architecture for noise
cancellation (NC) based on adaptive MAP speech spectral
amplitude estimator is proposed in this paper. The speech
probability density function (PDF) used has two adaptive shape
parameters which affect the quality of the enhanced speech. Noise
can be efficiently suppressed by properly estimating these
parameters so that the shape of the adaptive speech PDF fits that
of the real speech PDF. A novel hardware implementation is
developed based on the proposed algorithm. The proposed hardware
system can run at a high clock frequency of 127 MHz. Several
techniques are also applied to reduce the resource usage of the
design. A parameter controller configured by system firmware is
implemented to make the NC system more flexible. Experimental
results show that the proposed NC system can operate in real time
with high quality of enhanced speech.
Keywords—Hardware Architecture, Noise Cancellation,
Adaptive MAP, Speech Spectral Amplitude Estimator
I. INTRODUCTION
Continuous improvement of communication and
multimedia systems has led to the widespread use of speech
recording and processing devices, e.g., mobile phones and speech
recognition tools. In most practical situations, these devices
are used in environments where undesirable background
noise exists. Degraded speech can cause problems for both
mobile communication and speech recognition systems
[1],[9].
Speech enhancement is therefore necessary in a wide
range of applications, including mobile communication and
speech recognition systems. Single-microphone speech
enhancement has been a research topic for decades [8], and
one of the best-known methods in the spectral domain is the
spectral subtraction algorithm proposed by Boll [4].
Spectral Subtraction (SS) [4] is the most popular
method among stationary noise reduction techniques. The SS
method achieves noise reduction by simply subtracting an a priori
estimated noise spectral amplitude from the observed one.
Since the SS method is easy to implement and effectively
reduces stationary noise, it has been extensively researched.
The SS algorithm has evolved into more efficient methods, the
so-called spectral gain approaches [17]. However, those methods
also require an a priori estimated noise spectral amplitude.
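The subtraction step described above can be illustrated with a minimal sketch. This is not the implementation from [4] or from this paper: the function names, the averaging of noise-only frames as the a priori noise estimate, and the optional `floor` parameter are assumptions chosen only for illustration.

```python
def estimate_noise_amplitude(noise_frames):
    """Average the amplitude spectra of noise-only frames, bin by bin.

    Assumption: the a priori noise estimate is a plain mean over frames
    that are known to contain no speech.
    """
    n_frames = len(noise_frames)
    n_bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / n_frames
            for k in range(n_bins)]


def spectral_subtraction(observed_amp, noise_amp, floor=0.0):
    """Subtract the estimated noise amplitude from the observed amplitude.

    Each result is clamped to `floor * observed` (hypothetical parameter)
    so that no bin ends up with a negative amplitude.
    """
    if len(observed_amp) != len(noise_amp):
        raise ValueError("spectra must have the same number of bins")
    return [max(y - n, floor * y) for y, n in zip(observed_amp, noise_amp)]


# Two noise-only frames and one observed frame, three frequency bins each.
noise = estimate_noise_amplitude([[1.0, 2.0, 0.5], [3.0, 2.0, 1.5]])
enhanced = spectral_subtraction([5.0, 1.0, 4.0], noise)
print(noise)     # [2.0, 2.0, 1.0]
print(enhanced)  # [3.0, 0.0, 3.0]
```

The second bin shows why clamping is needed: the noise estimate exceeds the observed amplitude there, so the raw difference would be negative.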
An improved method of SS is Adaptive Noise
Cancellation (ANC) [1]. ANC systems can effectively
suppress noise in non-stationary conditions. However, the
drawback of ANC systems is that they need at least two
microphones, one for the input speech and one for the reference
noise [1]. Unfortunately, it introduces annoying artifacts called
“musical noise” into the enhanced speech [10].
Ephraim and Malah thus proposed an effective
method for removing musical noise, called the MMSE-STSA
(Minimum Mean Square Error Short Time Spectral
Amplitude) method [17]. MMSE-STSA has become a powerful
tool for speech enhancement. Improved methods have also
been proposed in [8], and a noise suppressor employing the
algorithm has been implemented in a cellular phone [8]. The
MMSE-STSA method minimizes the mean square error of the short
time spectral amplitude. This method assumes that the
Discrete Fourier Transform (DFT) coefficient of speech follows a
Gaussian probability density function (PDF). The
speech spectral amplitude then follows a Rayleigh distribution.
However, Martin has pointed out that the DFT coefficient is
more likely to fit a Gamma PDF and has shown that the
estimator designed under the Gamma model decreases the mean
square error compared with the one under the Gaussian model
[8]. However, neither of these speech models fits the actual DFT
coefficient of speech sufficiently well.
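As a worked step for the Gaussian assumption mentioned above, the following sketch shows why the amplitude becomes Rayleigh distributed. The symbols $S_R$, $S_I$, $A$, and $\sigma_S^2$ do not appear in the original text and are introduced here only for illustration. It is assumed that the real and imaginary parts of the DFT coefficient are independent, zero-mean, and share the speech variance equally.

```latex
% Assumption: S = S_R + j S_I with S_R, S_I independent, zero-mean Gaussian,
% each with variance \sigma_S^2 / 2.
p(S_R, S_I) = \frac{1}{\pi \sigma_S^2}
              \exp\!\left( -\frac{S_R^2 + S_I^2}{\sigma_S^2} \right)

% Change to polar coordinates (S_R = A\cos\theta, S_I = A\sin\theta,
% Jacobian A) and integrate the phase \theta over [0, 2\pi):
p(A) = \int_0^{2\pi} \frac{A}{\pi \sigma_S^2}
       \exp\!\left( -\frac{A^2}{\sigma_S^2} \right) d\theta
     = \frac{2A}{\sigma_S^2}
       \exp\!\left( -\frac{A^2}{\sigma_S^2} \right), \qquad A \ge 0
```

The last expression is the Rayleigh PDF, which is the amplitude model underlying the Gaussian assumption.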
In recent years, Lotter and Vary proposed a new method
using MAP (Maximum a Posteriori) estimation with a
parametric PDF [2]. In this
method, the speech spectral density is approximated by a PDF
whose parameters are calculated from a large
amount of clean speech data and then fixed. Based on this PDF,
speech enhancement is achieved by applying the MAP estimation
rule. They showed that the performance of the MAP estimator
is superior to that of the MMSE-STSA method in terms of
noise attenuation. However, the speech intelligibility in a
speech segment may not be sufficiently good because the
parameters of the speech spectral density are determined once
from speech data that includes both speech
segments and non-speech segments. In the same way, the
noise reduction in a non-speech segment may not be sufficient
because of the fixed parameters [10].
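For orientation, the parametric amplitude PDF used with this kind of MAP estimator is commonly written in the literature in the form sketched below. This section does not state the formula, so the notation is an assumption: $A$ is the speech spectral amplitude, $\sigma_S$ its standard deviation, and $\mu$ and $\nu$ are the two shape parameters.

```latex
% Assumed notation (not given in this section): A >= 0 is the speech
% spectral amplitude, \sigma_S^2 the speech variance, and \mu, \nu the
% two shape parameters of the PDF.
p(A) = \frac{\mu^{\nu+1}}{\Gamma(\nu+1)}
       \, \frac{A^{\nu}}{\sigma_S^{\nu+1}}
       \exp\!\left( -\mu \frac{A}{\sigma_S} \right)
```

In the fixed-parameter approach described above, $\mu$ and $\nu$ would be fitted once to clean speech data and then held constant for all segments, which is the limitation the paragraph points out.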
To solve this problem, Arata Kawamura, W. Thanhikam,
and Youji Iiguni previously proposed an adaptive
speech enhancement algorithm that changes the PDF
parameters depending on whether the observed
signal is in a speech segment or in a non-speech segment. In a
speech segment, they adjust the parameters so that the speech
PDF approaches a Rayleigh distribution, under the assumption
that the speech PDF in a speech segment is close to a Rayleigh
distribution [8].