Lâm Thị Diễm, Faculty of Electronics and Telecommunications, University of Science
Lâm Thị Diễm
Faculty of Electronics and Telecommunications, University of Science – Vietnam National University, Ho Chi Minh City (ĐHQG TPHCM)

ACHIEVEMENTS FROM 2011 TO PRESENT:
1) Consolation prize in the 4th ELABS academic competition, basic division (4/2011). The competition is organized annually by the Faculty of Electronics and Telecommunications, University of Science.
2) First prize in the 6th ELABS academic competition, advanced division (5/2013). The competition is organized annually by the Faculty of Electronics and Telecommunications, University of Science.
3) Participated in the 17th LSI Contest, held in March 2014 in Okinawa, Japan, with the project: “Design and construction of a hardware architecture for a speech noise filter based on speech spectral amplitude estimation using the adaptive Maximum A Posteriori (MAP) algorithm”. Contest website: http://www.lsi-contest.com/index_e.html
4) Member of the research group for the scientific research project “Research and implementation of an H.264 decoder on FPGA”, under the direct supervision of the Dean of the Faculty of Electronics and Telecommunications, Dr. Huỳnh Hữu Thuận.

The 17th LSI Design Contest 2014

A Novel Hardware Architecture for Noise Cancellation based on Adaptive MAP Speech Spectral Amplitude Estimator

Duy-Hoang HOANG, Doc-Truong DAO, Thi-Diem LAM
Faculty of Electronics and Telecommunication
University of Science – Ho Chi Minh City
Ho Chi Minh City, Vietnam
[email protected], [email protected], [email protected]

Abstract: A novel hardware architecture for noise cancellation (NC) based on an adaptive MAP speech spectral amplitude estimator is proposed in this paper. The speech probability density function (PDF) used has two adaptive shape parameters, which affect the quality of the enhanced speech. Noise can be efficiently suppressed by properly estimating these parameters so that the shape of the adaptive speech PDF fits that of the real speech PDF. A novel hardware implementation based on the proposed algorithm is presented. The proposed hardware system can run at a clock frequency of 127 MHz.
Several techniques are also applied to reduce the hardware resources required by the design. A parameter controller configured by system firmware is implemented to make the NC system more flexible. Experimental results show that the proposed NC system can operate in real time and produce high-quality enhanced speech.

Keywords: Hardware Architecture, Noise Cancellation, Adaptive MAP, Speech Spectral Amplitude Estimator

I. INTRODUCTION

Continuous improvement of communication and multimedia systems has led to the widespread use of speech recording and processing devices, e.g., mobile phones and speech recognition tools. In most practical situations, these devices are used in environments where undesirable background noise exists. Degraded speech can cause problems for both mobile communication and speech recognition systems [1],[9]. Speech enhancement is therefore necessary in a wide range of applications.

Single-microphone speech enhancement has been a research topic for decades [8], and one of the best-known methods in the spectral domain is the Spectral Subtraction (SS) algorithm proposed by Boll [4]. SS is the most popular of the stationary noise reduction techniques. It achieves noise reduction by simply subtracting an a priori estimated noise spectral amplitude from the observed one. Because the SS method is easy to implement and effectively reduces stationary noise, it has been extensively researched. The SS algorithm has evolved into more efficient methods, the so-called spectral gain approaches [17]. However, those methods also require an a priori estimated noise spectral amplitude.

An improved alternative to SS is Adaptive Noise Cancellation (ANC) [1]. ANC systems can effectively suppress noise in non-stationary conditions. However, their drawback is that they need at least two microphones: one for the input speech and one for the reference noise [1].
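The subtraction step of the SS method described above can be sketched in a few lines of Python for a single frame of amplitude spectra. This is only a minimal illustration under stated assumptions: the function name `spectral_subtraction`, the `floor` parameter, and the sample values are hypothetical and are not taken from the proposed system, and the noise amplitude is assumed to be known a priori.

```python
from typing import List


def spectral_subtraction(observed_amp: List[float],
                         noise_amp: List[float],
                         floor: float = 0.0) -> List[float]:
    """Subtract an a priori noise amplitude estimate from each frequency bin.

    observed_amp: amplitude spectrum of the noisy frame (one value per bin).
    noise_amp:    estimated noise amplitude spectrum (same length).
    floor:        hypothetical spectral floor, as a fraction of the observed
                  amplitude, used when the difference would be negative.
    """
    if len(observed_amp) != len(noise_amp):
        raise ValueError("spectra must have the same number of bins")
    enhanced = []
    for r, n in zip(observed_amp, noise_amp):
        # An amplitude cannot be negative, so clamp the difference.
        enhanced.append(max(r - n, floor * r))
    return enhanced


# Illustrative values only: four bins and a flat noise estimate.
observed = [1.0, 0.5, 0.2, 2.0]
noise = [0.3, 0.3, 0.3, 0.3]
print(spectral_subtraction(observed, noise))
```

In this sketch, a bin whose observed amplitude falls below the noise estimate is clamped to the floor, while neighbouring bins that happen to exceed the estimate survive as isolated peaks.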
Unfortunately, the SS method produces annoying artifacts called “musical noise” in the enhanced speech [10]. Ephraim and Malah therefore proposed an effective method for removing musical noise, called the MMSE-STSA (Minimum Mean Square Error Short Time Spectral Amplitude) method [17]. MMSE-STSA has become a strong tool for speech enhancement. Improved variants are proposed in [8], and a noise suppressor employing the algorithm has been implemented in a cellular phone [8].

The MMSE-STSA method minimizes the mean square error of the short-time spectral amplitude. It assumes that the Discrete Fourier Transform (DFT) coefficients of speech obey a Gaussian probability density function (PDF), so the PDF of the speech spectral amplitude is a Rayleigh distribution. However, Martin has pointed out that the DFT coefficients are more likely to fit a Gamma PDF and has shown that an estimator designed under the Gamma model decreases the mean square error compared with one designed under the Gaussian model [8]. Even so, neither speech model fits the actual DFT coefficients of speech sufficiently well.

More recently, Lotter and Vary proposed a method using MAP (Maximum a Posteriori) estimation with a parametric PDF [2]. In this method, the speech spectral density is approximated by a PDF whose parameters are calculated from a large amount of clean speech data and then fixed. Based on this PDF, speech enhancement is achieved by applying the MAP estimation rule. They showed that the MAP estimator is superior to the MMSE-STSA method in terms of noise attenuation. However, the speech intelligibility in a speech segment may not be sufficiently good, because the parameters of the speech spectral density are determined once from speech data that includes both speech segments and non-speech segments. In the same way, the noise reduction in a non-speech segment may not be sufficient because the parameters are fixed [10].
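To illustrate how the shape parameters of a parametric PDF influence a MAP estimator, the following Python sketch computes a spectral gain under assumptions that are not stated in this text: the speech amplitude prior is assumed to have the form p(A) proportional to A^nu * exp(-mu * A / sigma_s), the noise is assumed to be complex Gaussian with variance sigma_n^2, and amplitude and phase are estimated jointly. Under these assumptions, setting the derivative of the log posterior to zero gives a quadratic equation in the gain G = A / R, whose positive root is used below. The names `map_gain`, `xi`, `gamma`, `nu`, and `mu`, and all numeric values, are hypothetical.

```python
import math


def map_gain(xi: float, gamma: float, nu: float, mu: float) -> float:
    """Gain G = A_hat / R of a joint MAP estimator under the assumed prior.

    xi:    a priori SNR, sigma_s^2 / sigma_n^2 (must be > 0).
    gamma: a posteriori SNR, R^2 / sigma_n^2 (must be > 0).
    nu:    shape parameter of the assumed prior (must be >= 0).
    mu:    shape parameter of the assumed prior (must be >= 0).

    Maximising -(R - A)^2 / sigma_n^2 + nu*ln(A) - mu*A/sigma_s over A
    leads to G^2 - 2*u*G - nu / (2*gamma) = 0 with u as defined below.
    """
    if xi <= 0.0 or gamma <= 0.0:
        raise ValueError("xi and gamma must be positive")
    if nu < 0.0 or mu < 0.0:
        raise ValueError("nu and mu must be non-negative")
    u = 0.5 - mu / (4.0 * math.sqrt(gamma * xi))
    # Positive root of the quadratic in G.
    return u + math.sqrt(u * u + nu / (2.0 * gamma))


# Illustrative values only: the same SNRs with two different values of mu.
for mu in (1.5, 2.5):
    print(mu, map_gain(xi=1.0, gamma=4.0, nu=0.1, mu=mu))
```

With nu = 0 and mu = 0 the assumed prior is flat and the gain reduces to 1, that is, the observed amplitude is passed through unchanged. Increasing mu lowers the gain, which shows why a single fixed pair of parameters cannot be ideal for both speech and non-speech segments.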
To solve this problem, Arata Kawamura, W. Thanhikam, and Youji Iiguni previously proposed an adaptive speech enhancement algorithm that changes the PDF parameters depending on whether the observed signal is in a speech segment or in a non-speech segment. In a speech segment, they adjust the parameters so that the speech PDF approaches a Rayleigh distribution, under the assumption that the speech PDF in a speech segment is close to a Rayleigh distribution [8].