Spatial Acoustic Signal Processing
Thesis No. 5240 (2012), presented on 16 January 2012 at the School of Computer and Communication Sciences, Audiovisual Communications Laboratory, Doctoral Program in Computer, Communication and Information Sciences, École Polytechnique Fédérale de Lausanne, for the award of the degree of Doctor of Science, by Mihailo Kolundžija.

Accepted on the recommendation of the jury:
Prof. P. Thiran, jury president
Prof. M. Vetterli, Dr. C. Faller, thesis directors
Prof. K. Brandenburg, examiner
Prof. D. de Vries, examiner
Prof. M. Hasler, examiner

Switzerland, 2012

Abstract

A sound field on a line or in a plane has an effectively limited spatial bandwidth determined by the temporal frequency. The same holds for sound fields from far-field sources when analyzed on circular and spherical apertures: for a given frequency and aperture size, a sound field is effectively composed of a finite number of circular or spherical harmonic components. Based on these two observations, it follows that adequately sampled sound fields can be represented and manipulated in a digital domain with negligible loss of information. The optimal sampling surface depends on the problem geometry, and the set of sampling points needs to be in accordance with the Nyquist criterion relative to the mentioned effective sound field bandwidth.

In this thesis, we address the problems of sound field capture and reproduction from a practical perspective. More specifically, we present approaches that do not depend on acoustical models, but rely instead on obtaining an acoustic MIMO channel between transducers (microphones or loudspeakers) and a set of sampling (or control) points. Subsequently, sound field capture and reproduction are formulated as constrained optimization problems in a spatially discrete domain and solved using conventional numerical optimization tools.

The first part of the thesis deals with spatial sound capture. We present a framework for analyzing and designing differential microphone arrays based on spatio-temporal sound field gradients. We also show how to record two- and three-dimensional sound fields with differential, circular, and spherical microphone arrays. Finally, we use the mentioned discrete optimization for computing filters for directional and sound field microphone arrays.

In the second part of the thesis, we focus on spatial sound reproduction. We first present a design of a baffled loudspeaker array for reproducing sound with high directivity over a wide frequency range, which combines beamforming at low frequencies with scattering from a rigid baffle at high frequencies. We next present Sound Field Reconstruction (SFR), an approach for optimally reproducing a desired sound field in a wide listening area by inverting a discrete, MIMO acoustic channel. In the end, we propose a single- and multi-channel low-frequency room equalization method, formulated as a discrete constrained optimization problem, with constraints designed to prevent excessive equalization filter gains, localization bias, and temporal distortions in the form of pre- and post-echoes.

Keywords: Microphone arrays, loudspeaker arrays, gradient microphones, differential microphone arrays, sound field microphones, sound field capture, directional sound reproduction, beamforming, sound field reproduction, room equalization.
Zusammenfassung

A sound field observed on a line or in a plane has a limited bandwidth in the spatial dimension, determined by the frequency of the propagating sound waves. The same can be said of sound fields from far-field sources observed on circles or spherical surfaces. That is, for a given frequency and circle or sphere size, there is a finite number of circular or spherical harmonics that determine the sound field. From these two properties it follows that sound fields can be sampled discretely in space without appreciable loss of information. The optimal sampling surface depends on the geometry of the problem, and the sampling points must satisfy the Nyquist–Shannon sampling theorem.

This thesis addresses the problems of capturing and reproducing sound fields. We present methods that are not based on analytical acoustic models, but on measurements of the acoustic transfer functions between the acoustic transducers (microphones or loudspeakers) and the defined sound field sampling points. Sound field capture and reproduction can then be treated as constrained optimization problems, and conventional numerical optimization techniques can be applied to find solutions.

The first part of this thesis deals with the capture of sound and sound fields. A theory is presented by means of which differential microphone arrays can be analyzed and designed; it is based on spatio-temporal sound field gradients. We also show how sound fields in two and three dimensions can be recorded with differential, circular, and spherical microphone arrays. Finally, we show how the mentioned method of sound field sampling combined with optimization is used to compute filters for directional microphone arrays and for microphone arrays for sound field capture.

The second part of this thesis deals with the reproduction of sound and sound fields. First, we present a circular loudspeaker array embedded in a cylindrical scattering body for highly directional sound reproduction. Optimized beamforming that accounts for the scattering off the cylinder at higher frequencies enables high directivity over a wide frequency range. We also present a method for reproducing entire sound fields. To reproduce the sound field precisely over a large area, the multiple-input multiple-output system defined between the loudspeaker array and the sound field sampling points is inverted. Finally, we present a procedure that adapts single- and multi-channel loudspeaker setups to a room by means of low-frequency equalization through discrete optimization. Again, we use discrete sampling of the sound field combined with constrained optimization to obtain filters without excessive amplification of certain frequencies and to avoid disturbing pre-echoes and echoes.

Keywords: Microphone arrays, loudspeaker arrays, pressure-gradient microphones, differential microphone arrays, sound field microphones, sound field capture, directional sound reproduction, beamforming, sound field reproduction, room equalization.

Acknowledgments

First and foremost, I would like to thank my two supervisors, Prof. Martin Vetterli and Dr. Christof Faller.
Martin was a source of great ideas from the first day on, had all the patience to let me find my way, and believed in my work probably much more than I did, for which I am very grateful. He is an erudite scientist, a brilliant teacher and supervisor, and a great source of positive energy and humor.[1] Christof showed me how to take things pragmatically and practically, how to focus on realistic goals, and how to make things work in the end. On a more personal note, he and his family have been great friends over the last four years, and in moments of honesty,[2] he dared admit that I was his only drinking friend (which I took both as a compliment and as criticism).

I would also like to express my gratitude to Professors Diemer de Vries, Karlheinz Brandenburg, Martin Hasler, and Patrick Thiran for accepting to be on my thesis committee and giving me useful comments and remarks.

I owe many thanks to our lab's secretaries, Jocelyne and Jacqueline, for all their help and for keeping the lab so well organized, and to all the colleagues and friends in LCAV for the great time on and off work. The list may not be comprehensive, but I will try: Ali, Amina, Andreas, Christophe, Clément, Florence, Francisco, Guillermo, Gunnar, Ivan, Ivana, Juri, Jay, Patrick, Pedro, RK, Roberto, Simon, Yann, and Zichong. I am also thankful to my office-mate Feng for all the interesting conversations and jokes, and for some great pieces of Chinese wisdom that helped me adjust some of my views in life.

I am grateful to my friends at and outside of EPFL for all the good times we had together. The list may not be complete: Aleksandar (3x), Ana, Andrei, Bojana, Danica, Daniel, Dejan, Duško, Ivana (2x), Jelena (2x), Jugoslava, Marija, Marko, Miloš, Mirjana, Miroslav, Nedeljko, Nenad, Nevena, Nikodin, Nikola, Radu, Roberto, Ružica, Smiljka, Tamara, Tanja, Viktor, Violeta, Vladan, Vojin, and Zorana. I am thankful to Veronika for the time we spent together and for putting up with me almost until the end of this thesis. I am grateful to my cousin Branka, her husband Nebojša, and their children Marija, Dragiša, and Nataša for all the feasts and haircuts, and the great time we had during my visits to Bern.

Last but not least, I am grateful to my parents, Nikola and Jovanka, and my sister Marija, for their unconditional love and support during all these years.

[1] This is a typical east-European understatement.
[2] In vino veritas.

Contents

Abstract
Zusammenfassung
Acknowledgments

1 Introduction
1.1 Thesis motivation
1.2 Thesis outline

2 Acoustics Fundamentals
2.1 Fundamental acoustic equations
2.1.1 Euler's equation
2.1.2 The acoustic wave equation
2.2 Point source and Green's function
2.2.1 Point source
2.2.2 Green's function
2.2.3 Time-dependent Green's function
2.2.4 General solutions of the acoustic wave equation
2.3 Helmholtz integral equation
2.3.1 Rayleigh's integrals
2.4 Plane waves
2.4.1 Evanescent waves
2.4.2 The angular spectrum
2.5 Cylindrical waves
2.5.1 Boundary value problems
2.5.2 Helical wave spectrum
2.5.3 Rayleigh's integral
2.5.4 Piston in a cylindrical baffle
2.5.5 Scattering from rigid cylinders
2.6 Spherical waves
2.6.1 Boundary value problems
2.6.2 Spherical wave spectrum
2.6.3 Scattering from rigid spheres
2.7 Room acoustics
2.7.1 Wave theory of room acoustics
2.7.2 Statistical room acoustics
2.7.3 Geometrical acoustics
2.7.4 Reverberation time
2.7.5 Critical distance
2.8 Acoustic beamforming
2.8.1 Beamformer filter design

3 Microphone Arrays For Directional Sound Capture
3.1 Introduction
3.1.1 Background
3.1.2 Chapter outline
3.2 Differential microphone arrays
3.2.1 Spatial derivatives of a far-field sound pressure field
3.2.2 Spatio-temporal derivatives of a far-field sound pressure field
3.2.3 Differential microphone arrays
3.3 Directional microphone arrays as acoustic beamformers
3.3.1 Discussion
3.4 Conclusions

4 Microphone Arrays For Sound Field Capture
4.1 Introduction
4.1.1 Background
4.1.2 Chapter outline
4.2 Wave field decomposition
4.2.1 Horizontal sound field decomposition
4.2.2 Three-dimensional sound field decomposition
4.3 Measuring a horizontal sound field with gradient microphone arrays
4.3.1 Gradient-based horizontal sound field microphones
4.4 Circular microphone arrays
4.4.1 Continuous circular microphone apertures
4.4.2 Sampling circular microphone apertures
4.5 Spherical microphone arrays
4.5.1 Continuous spherical microphone apertures
4.5.2 Sampling spherical microphone apertures
4.6 Sound field microphones as acoustic beamformers
4.6.1 Filter design for a circular microphone array mounted on a rigid cylinder
4.6.2 Soundfield microphone non-coincidence correction filter design
4.7 Conclusions

5 Baffled Loudspeaker Array For Spatial Sound Reproduction
5.1 Introduction
5.1.1 Background
5.1.2 Chapter outline
5.2 Acoustical design
5.2.1 Baffled loudspeaker model
5.3 Beamformer design
5.3.1 Filter design procedure
5.4 Simulations
5.5 Experiments
5.6 Applications
5.7 Conclusions

6 Reproducing Sound Fields Using MIMO Acoustic Channel Inversion
6.1 Introduction
6.1.1 Background
6.1.2 Chapter outline
6.2 Sound Field Reconstruction
6.2.1 Plenacoustic sampling and interpolation
6.2.2 Sound Field Reconstruction using MIMO channel inversion
6.2.3 Practical extensions of Sound Field Reconstruction
6.2.4 Designing discrete-time filters for Sound Field Reconstruction
6.3 Evaluation
6.3.1 Sound field snapshot analysis
6.3.2 Impulse response analysis
6.3.3 Discussion
6.4 Practical considerations
6.4.1 Computational complexity
6.4.2 Performing system measurements
6.5 Conclusions

7 Multichannel Room Equalization Considering Psychoacoustics
7.1 Introduction
7.1.1 Chapter outline
7.2 Proposed room equalization
7.2.1 Problem description
7.2.2 Desired response calculation
7.2.3 Choice of a cost function
7.2.4 Equalization filter constraints
7.2.5 Filter computation procedure
7.3 Simulations
7.3.1 MIMO equalization
7.3.2 SIMO equalization
7.3.3 Discussion
7.4 Conclusions

8 Conclusion

Bibliography

Curriculum Vitæ

Chapter 1
Introduction

1.1 Thesis motivation

Recent work by Ajdler et al. (2006) has shown that sound fields are effectively band-limited in the Fourier basis on linear and planar geometries, where the effective spatial bandwidth is determined by the temporal frequency. Similarly, a two- or three-dimensional sound field from far-field sound sources analyzed on a circular or spherical aperture has effectively a finite number of non-zero circular or spherical harmonics, respectively (e.g., see Ajdler et al., 2008; Rafaely et al., 2007b). The effective number of harmonics depends on the temporal frequency and the aperture radius.

The above observations have important implications for sound field capture, representation, and reproduction. Namely, it is well known that signals of finite bandwidth can be sampled without loss of information, provided that sampling is done according to the Nyquist criterion. Thus, given the geometry of a problem, an appropriate choice of a sampling strategy converts the problem of sound field capture or reproduction into a discrete domain. The advantages of working with sampled sound fields are manifold.

• One avoids dependence on idealized models, commonly used when deriving sound field capture and reproduction strategies from acoustic principles such as the Helmholtz integral equation and Rayleigh's integrals, boundary value problems, and scattering from rigid bodies.[1] Instead, one can measure a microphone or loudspeaker array's characteristics using an appropriately designed sampling strategy, and implicitly account for any effects that make them inconsistent with a theoretical model.

• One can formulate the tasks of sound field capture and reproduction in terms of a well-known array signal processing framework (Van Trees, 2002; Johnson and Dudgeon, 1993) and solve them efficiently using tools for numerical optimization (Boyd and Vandenberghe, 2004); a minimal sketch of such a formulation is given after this list.

• Physical limitations or psychoacoustical observations relevant to the analyzed problem can be made part of the numerical optimization procedure, e.g., through constraints or as a part of the cost function.

[1] The listed acoustic principles are presented in Chapter 2.

Figure 1.1: (a) Directional sound capture; (b) sound field capture; (c) directional sound reproduction; (d) sound field reproduction.

We would like to particularly stress the last bullet point, since it gives additional flexibility and an advantage over purely physical approaches. Given these observations, we expect that working with discretized sound fields is a flexible way to achieve optimal sound field capture and reproduction for a particular capture or reproduction system, taking into account its characteristics and including its limitations.
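To make the constrained-optimization idea concrete, here is a minimal sketch—not the thesis implementation—of a single-frequency filter design over sampled control points. The names are illustrative assumptions: A holds a measured acoustic MIMO transfer matrix from N transducers to M sampling points, d is a desired response at those points, and the norm bound stands in for a physical gain limit.

```python
# Minimal sketch of a discrete, constrained filter-design problem at one
# frequency: minimize the multipoint match error subject to a gain limit.
# A, d, and the bound 2.0 are illustrative placeholders, not measured data.
import numpy as np
import cvxpy as cp

M, N = 64, 12                       # control points, transducers
rng = np.random.default_rng(0)
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
d = np.ones(M, dtype=complex)       # desired response at the control points

w = cp.Variable(N, complex=True)    # transducer weights at this frequency
objective = cp.Minimize(cp.norm(A @ w - d, 2))   # multipoint error
constraints = [cp.norm(w, 2) <= 2.0]             # limited-effort constraint
cp.Problem(objective, constraints).solve()
print(np.asarray(w.value))
```

Repeating such a solve per frequency bin, with constraints swapped in or out, is the general pattern the later chapters instantiate.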
This thesis describes the application of discretized sound field processing through constrained numerical optimization to the following problems in acoustics:

• Directional sound capture with microphone arrays, illustrated in Figure 1.1(a)
• Sound field capture with microphone arrays, illustrated in Figure 1.1(b)
• Directional sound reproduction with baffled loudspeaker arrays, illustrated in Figure 1.1(c)
• Sound field reproduction with loudspeaker arrays, illustrated in Figure 1.1(d)
• Single-channel and multichannel room equalization.

1.2 Thesis outline

In Chapter 2, we give an overview of the acoustics fundamentals relevant to the problems presented in later chapters. We review the acoustic wave equation and its eigenfunction decompositions in different coordinate systems, the Helmholtz integral equation, and radiation and scattering from rigid cylinders and spheres. In addition, we present the room acoustics fundamentals relevant to room equalization, and we give a detailed account of acoustic beamforming.

In Chapter 3, we describe directional sound capture. We give a framework for the analysis and design of differential and gradient microphone arrays, which is based on analyzing spatio-temporal gradients of a sound field from far-field sources (Kolundžija et al., 2009a, 2011a). We also show how directional microphone arrays can be designed by discretizing microphones' directional responses and formulating and solving a numerical optimization problem for synthesizing a desired directional response. We verify the effectiveness of this approach through simulations.

Chapter 4 focuses on microphone arrays for sound field capture. We relate decompositions of two- and three-dimensional sound fields in terms of circular and spherical harmonics, respectively, to sound capture with directional microphones. We show a way to capture a complete representation of a two-dimensional sound field with gradient microphones (Kolundžija et al., 2010b). In addition, we show how to capture two- and three-dimensional sound fields using unbaffled and baffled circular and spherical arrays of pressure or first-order gradient microphones. Similarly to the directional microphone arrays, we show how filters for sound field microphone arrays can be designed by discretizing the microphones' directional responses and formulating and solving a numerical optimization problem for desired response synthesis. Using simulations, we show how this approach can be used to measure a circular harmonic decomposition with a circular microphone array, and to obtain the so-called non-coincidence correction filters for the Soundfield microphone (Faller and Kolundžija, 2009).

In Chapter 5, we describe a design of a baffled loudspeaker array used for highly directional sound reproduction over a wide frequency range (Kolundžija et al., 2010a, 2011b). Our design is based on optimal beamforming in the magnitude sense for directional reproduction at low frequencies, and on scattering of sound from rigid bodies for directional reproduction at high frequencies. We show that for directional sound reproduction, a baffled loudspeaker delivers an essentially consistent directional performance at high frequencies. In addition, we use beamforming with multiple loudspeakers at low frequencies to synthesize the directional response of a single loudspeaker at high frequencies. We show the feasibility of our approach through simulations of an acoustical model, and through experiments with a prototype loudspeaker array.
Chapter 6 describes an approach for reproducing sound fields, termed Sound Field Reconstruction (SFR), motivated by the essential spatio-temporal band-limitedness of sound fields (Kolundžija et al., 2009b,c, 2011c). This property allows expressing sound field reproduction as an inversion of the MIMO acoustic channel between a loudspeaker array and a grid of control points constrained by the Nyquist sampling criterion. The use of a MIMO channel inversion based on truncated singular value decomposition (SVD) of the acoustic channel matrix allows for optimal reproduction accuracy subject to a limited-effort constraint. We present a detailed procedure for obtaining loudspeaker driving signals that involves selection of active loudspeakers, coverage of the listening area with control points, and frequency-domain FIR filter design. Through extensive simulations comparing SFR with Wave Field Synthesis, we show that on average SFR provides higher sound field reproduction accuracy.

In Chapter 7, we consider the problem of multiple-loudspeaker low-frequency room equalization for a wide listening area, where the equalized loudspeaker is assisted by the remaining ones. Using a spatial discretization of the listening area, we formulate the problem as a multipoint error minimization between desired and synthesized magnitude frequency responses. The desired response and cost function are formulated with the goal of capturing the room's spectral power profile and penalizing strong resonances. Considering physical and psychoacoustical observations, we argue for the use of gain-limited, short, and well-localized equalization filters, with an additional delay for loudspeakers that assist the equalized one. We propose an optimization framework for computing room equalization filters, where the mentioned filter requirements are incorporated as convex constraints. We verify the effectiveness of our equalization approach through simulations.

Chapter 8 gives a summary of the thesis and some directions for future work.
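As an illustration of the truncated-SVD channel inversion mentioned above for Chapter 6, the following is a minimal numerical sketch; the matrix H, the tolerance rel_tol, and the test signals are assumed placeholders, not the thesis's actual measurement data or filter-design procedure.

```python
# Illustrative sketch of truncated-SVD MIMO channel inversion. H is assumed
# to hold acoustic transfer functions from N loudspeakers to M control
# points at one frequency; p_des is the desired pressure at those points.
import numpy as np

def sfr_driving_signals(H, p_des, rel_tol=1e-2):
    """Least-squares loudspeaker weights with small singular values dropped,
    which bounds the effort (gain) of the resulting inverse filters."""
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    keep = s >= rel_tol * s[0]               # truncate ill-conditioned modes
    s_inv = np.where(keep, 1.0 / np.where(keep, s, 1.0), 0.0)
    return Vh.conj().T @ (s_inv * (U.conj().T @ p_des))

rng = np.random.default_rng(1)
H = rng.standard_normal((48, 16)) + 1j * rng.standard_normal((48, 16))
p_des = np.exp(1j * rng.uniform(0, 2 * np.pi, 48))
w = sfr_driving_signals(H, p_des)
print(np.linalg.norm(H @ w - p_des) / np.linalg.norm(p_des))  # residual error
```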
Chapter 2
Acoustics Fundamentals

In this chapter, we give an overview of the acoustic concepts that are used throughout the remaining chapters of this thesis. Without attempting strict derivations of every concept, we start from the fundamental principles—the fundamental acoustic equations—and outline the paths that lead from those fundamentals to some of the more sophisticated acoustic concepts, including the acoustic integral equations, the plane wave decomposition, radiation and scattering from cylindrical and spherical bodies, and some important aspects of the sound fields formed in rooms.

This chapter is organized as follows. Section 2.1 presents two important equations of linear acoustics—Euler's equation and the acoustic wave equation. The related concepts of a point source and Green's function are described in Section 2.2. Section 2.3 presents the Helmholtz integral equation and Rayleigh's integrals, obtained by introducing integral theorems into the realm of acoustics. The concept of plane waves, used very often in analyses of sound fields from distant sources, is discussed in Section 2.4. Section 2.5 presents the solution of the wave equation in cylindrical coordinates and analyzes the problem of sound radiation from cylindrical geometries. Similarly, Section 2.6 discusses the solution of the wave equation in spherical coordinates and the related problems of sound scattering and radiation from spheres. Section 2.7 presents some fundamental properties of sound fields in rooms, which one needs to be aware of when designing systems that alter the effects of room acoustics. Finally, Section 2.8 presents spatial filtering, or beamforming—a technique for controlling the spatial characteristic of sound acquisition or radiation with an array of microphones or loudspeakers, respectively.

2.1 Fundamental acoustic equations

In this section, we present two fundamental equations of linear acoustics. These equations are obtained under the simplifying assumptions that the non-viscous fluid medium outside the region occupied by sources is homogeneous and at rest, and that the acoustic pressure and particle disturbances generated outside the source are small enough that one can take first-order approximations of the general equations of fluid dynamics.

2.1.1 Euler's equation

Euler's equation, obtained by applying Newton's second law to an infinitesimal fluid volume, relates the velocity vector of the fluid particles u(r, t) and the sound pressure p(r, t) at the point r. It is given by (Morse and Ingard, 1968; Williams, 1999)

\rho_0 \frac{\partial}{\partial t} u(r, t) = -\nabla p(r, t) ,   (2.1)

where ρ0 is the fluid density at equilibrium, and ∇p(r, t) is the spatial gradient of the sound pressure. The spatial gradient in Cartesian coordinates has the form

\nabla p(x, y, z, t) = \frac{\partial p}{\partial x} e_x + \frac{\partial p}{\partial y} e_y + \frac{\partial p}{\partial z} e_z ,   (2.2)

where e_x, e_y, and e_z are unit vectors that point in the positive coordinate directions. The forms of the spatial gradient in cylindrical and spherical coordinates, given by (2.34) and (2.62), respectively, are derived using coordinate system transforms (Morse and Ingard, 1968).

Acoustic phenomena are commonly analyzed in the steady state. The Fourier transform is the tool for interchangeably switching between the time and frequency domains, where the latter serves for steady-state analysis. The frequency-domain version of Euler's equation is obtained by applying the Fourier transform to both sides of (2.1), and it takes the form

U(r, \omega) = -\frac{1}{i \omega \rho_0} \nabla P(r, \omega) .   (2.3)

2.1.2 The acoustic wave equation

If the considered source-free, homogeneous, non-viscous fluid medium is initially at rest, and fluid pressure and particle disturbances are small, the propagation of sound waves in the medium is governed by the homogeneous acoustic wave equation. The linear homogeneous acoustic wave equation in the time domain reads (Morse and Ingard, 1968)

\nabla^2 p(r, t) - \frac{1}{c^2} \frac{\partial^2 p(r, t)}{\partial t^2} = 0 ,   (2.4)

where c is the speed of sound in the fluid medium,[1] and ∇²p(r, t) = div grad p(r, t) is the Laplacian of the sound pressure. As will be seen in later sections of this chapter, the Laplacian takes on different forms in different coordinate systems.

[1] The speed of sound in air at 20 °C is 343 m/s.

Like Euler's equation, the acoustic wave equation is commonly used in its steady-state, frequency-domain form, better known as the Helmholtz equation. The homogeneous Helmholtz equation is obtained by applying the Fourier transform to (2.4), and has the form

\nabla^2 P(r, \omega) + k^2 P(r, \omega) = 0 ,   (2.5)

where k denotes the acoustic wave number, which depends on the angular frequency ω, or equivalently on the acoustic wavelength λ, through

k = \frac{\omega}{c} = \frac{2\pi}{\lambda} .   (2.6)
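As a quick numerical check of (2.6) (an illustrative example, assuming air at 20 °C with c = 343 m/s): a temporal frequency of f = 1 kHz corresponds to

k = \frac{2\pi f}{c} = \frac{2\pi \cdot 1000}{343} \approx 18.3\ \mathrm{rad/m}, \qquad \lambda = \frac{c}{f} = 0.343\ \mathrm{m},

so the effective spatial bandwidth discussed in Chapter 1 grows linearly with the temporal frequency.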
The general solutions of the acoustic wave equation can take on different forms. As will be seen later in the chapter, the solutions are usually expressed as a superposition of eigenfunctions of the homogeneous wave equation,[2] which are determined by the geometry and the boundary conditions of the considered problem.

[2] In many cases, when steady-state analysis is used, the solutions are expressed in the frequency domain as a superposition of the eigenfunctions of the homogeneous Helmholtz equation.

2.2 Point source and Green's function

2.2.1 Point source

A point source is a model of an infinitesimally small volume acting as a source of energy that produces acoustic waves. For a given frequency ω, it is defined as the solution of the inhomogeneous Helmholtz equation for an unbounded medium (Morse and Ingard, 1968)

\nabla^2 g_\omega(r|r') + k^2 g_\omega(r|r') = -\delta(r - r') ,   (2.7)

where r denotes the observation point, r' the position of the point source, and δ(r − r') is the three-dimensional Dirac delta function.

2.2.2 Green's function

The solution of (2.7) is also known under the name free-field Green's function, and it represents the spatial kernel of the wave equation. The free-field Green's function has the form (Morse and Ingard, 1968)

g_\omega(r|r') = \frac{e^{ik\|r - r'\|}}{4\pi \|r - r'\|} .   (2.8)

Note that a general solution of the inhomogeneous Helmholtz equation has the form

G_\omega(r|r') = g_\omega(r|r') + \chi_\omega(r) ,   (2.9)

where χ_ω(r) is any solution to the homogeneous Helmholtz equation (2.5). The solution χ_ω(r) is added to the free-field Green's function in order to satisfy some predefined boundary conditions.

2.2.3 Time-dependent Green's function

The time-dependent free-field Green's function is the solution of the inhomogeneous acoustic wave equation (Morse and Ingard, 1968)

\nabla^2 g(r, t|r', t') - \frac{1}{c^2} \frac{\partial^2 g(r, t|r', t')}{\partial t^2} = -\delta(r - r')\, \delta(t - t') ,   (2.10)

where the expression on the right denotes a pulse wave traveling from a point source at r' and starting at t'. The time-dependent free-field Green's function takes the following form:

g(r, t|r', t') = \frac{1}{4\pi \|r - r'\|}\, \delta\!\left(t - t' - \frac{\|r - r'\|}{c}\right) .   (2.11)

Without loss of generality, the pulse-starting instant t' can be set to zero, giving the time-dependent Green's function the form

g_{r|r'}(t) = g(r, t|r', 0) = \frac{1}{4\pi \|r - r'\|}\, \delta\!\left(t - \frac{\|r - r'\|}{c}\right) ,   (2.12)

which can be viewed as the free-field acoustic channel impulse response between the two points r' and r.
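A minimal discrete-time sketch of (2.12), with the Dirac delay rounded to the nearest sample (a simplification—fractional-delay interpolation would be more faithful) and with assumed sampling rate, geometry, and filter length:

```python
# Sketch of the free-field impulse response (2.12) between r0 and r,
# discretized at sampling rate fs; the delta is rounded to one sample.
import numpy as np

def free_field_ir(r, r0, fs=48000.0, c=343.0, length=2048):
    dist = np.linalg.norm(np.asarray(r) - np.asarray(r0))
    h = np.zeros(length)
    n = int(round(fs * dist / c))           # propagation delay in samples
    if n < length:
        h[n] = 1.0 / (4.0 * np.pi * dist)   # 1/r amplitude decay
    return h

h = free_field_ir(r=[2.0, 0.0, 0.0], r0=[0.0, 0.0, 0.0])
print(h.argmax(), h.max())   # delay ~ 280 samples, gain = 1/(8*pi) ~ 0.04
```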
2.2.4 General solutions of the acoustic wave equation

If an arbitrary simple-harmonic source distribution F(r, ω) radiates sound waves into a fluid medium occupying a volume V0, the sound pressure distribution inside the volume V0 and on its boundary surface S0 satisfies the inhomogeneous Helmholtz equation

\nabla^2 P(r, \omega) + k^2 P(r, \omega) = -F(r, \omega) .   (2.13)

The solution P(r, ω) of (2.13) is obtained by combining (2.13) and (2.7), and has the form (Morse and Ingard, 1968)

P(r, \omega) = \iiint_{V_0} G_\omega(r|r_0)\, F(r_0, \omega)\, dV_0 + \iint_{S_0} \left[ G_\omega(r|r_0)\, \frac{\partial}{\partial n_0} P(r_0, \omega) - P(r_0, \omega)\, \frac{\partial}{\partial n_0} G_\omega(r|r_0) \right] dS_0 ,   (2.14)

where n0 denotes an outward-pointing normal to the surface S0. The sound pressure distribution defined by (2.14) emanates from a distribution of sources and from reflections off the volume's boundary surface. In the case of an unbounded medium, the second term in (2.14) disappears, and the sound pressure distribution depends only on the active sound sources:

P(r, \omega) = \iiint_{V_0} G_\omega(r|r_s)\, F(r_s, \omega)\, dV_0 .   (2.15)

If the source distribution is defined as a function f(r, t) of space and time, the spatio-temporal sound pressure distribution is obtained using the time-dependent Green's function. For instance, in the case of an unbounded medium, the sound pressure distribution is obtained as follows:

p(r, t) = \iiint_{V_0} \int_{-\infty}^{\infty} f(r_0, t_0)\, g(r, t|r_0, t_0)\, dt_0\, dV_0 .   (2.16)

2.3 Helmholtz integral equation

The Helmholtz integral equation (HIE) relates the pressure inside a source-free volume to the pressure and the normal component of the particle velocity on the volume's boundary surface. Broadly speaking, there are two types of HIE: interior and exterior.

Figure 2.1: The shaded regions represent the source-free volume V where the sound pressure field is computed. (a) Interior HIE, where the sound sources are outside of V; (b) exterior HIE, where the volume V encloses the sound sources and spreads to infinity.

In the interior HIE, illustrated in Figure 2.1(a), one is interested in the sound pressure field inside a finite source-free volume V, while the radiating sound sources are located outside of V. On the other hand, the exterior HIE, shown in Figure 2.1(b), relates to problems where a source-free domain V spreads to infinity, and its interior boundary surfaces ∂V1, ∂V2, etc. enclose the sound sources that evoke the analyzed field. The boundary surface ∂V represents the union of all source-enclosing boundary surfaces ∂Vi and the space-enclosing infinite boundary surface ∂V∞: ∂V = ∂V1 ∪ ∂V2 ∪ · · · ∪ ∂V∞.

Both the interior and exterior HIEs are derived using Green's second identity and the inhomogeneous Helmholtz equation, and have the following form (Williams, 1999):

P(r', \omega) = \iint_{\partial V} \left[ P(r, \omega)\, \frac{\partial}{\partial n} G_\omega(r|r') - i \rho_0 \omega\, G_\omega(r|r')\, V_n(r, \omega) \right] d(\partial V) .   (2.17)

In (2.17), ∂V denotes the bounding surface of the volume V, G_ω(r|r') is Green's function, V_n(r, ω) is the projection of the particle velocity vector onto the inward-pointing boundary surface normal n, and ∂/∂n stands for the operation of taking a directional derivative along the direction of n.

2.3.1 Rayleigh's integrals

Rayleigh's integrals are a special form of the HIE, obtained from (2.17) by taking a particular volume V and a suitable choice of Green's function G_ω(r|r').

Under the assumption that all radiating sound sources are located in the half-space z < 0, the interior HIE is applied to the volume V that covers the half-space z > 0. In these circumstances, the integration surface ∂V includes the plane z = 0 and the infinite hemisphere centered at the origin and "enclosing" the half-space z > 0. Figure 2.2 illustrates the particular domain used for deriving Rayleigh's integrals.

Figure 2.2: Shaded regions represent the particular half-space source-free volume V used for deriving Rayleigh's integrals.

Assuming that Green's function satisfies the Sommerfeld radiation condition[3]

\lim_{\|r\| \to \infty} \|r\| \left( \frac{\partial}{\partial \|r\|} - ik \right) G_\omega(r|r') = 0 ,   (2.18)

the surface integral over the bounding surface ∂V∞ vanishes, turning the HIE into

P(r', \omega) = \iint_{\partial V_{xy}} \left[ P(r, \omega)\, \frac{\partial}{\partial n} G_\omega(r|r') - i \rho_0 \omega\, G_\omega(r|r')\, V_n(r, \omega) \right] d(\partial V_{xy}) ,   (2.19)

where ∂V_xy denotes the xy-plane.

[3] The Sommerfeld radiation condition in two dimensions reads
\lim_{\|r\| \to \infty} \|r\|^{1/2} \left( \frac{\partial}{\partial \|r\|} - ik \right) G_\omega(r|r') = 0 .
Rayleigh's I integral

Rayleigh's I integral is a particular form of (2.19), where Green's function G_ω(r|r') is chosen such that the term P(r, ω) ∂G_ω(r|r')/∂n vanishes in the xy-plane. This particular Green's function has the form

G_\omega(r|r') = \frac{e^{ik\|r - r'\|}}{4\pi \|r - r'\|} + \frac{e^{ik\|r - r'_M\|}}{4\pi \|r - r'_M\|} ,   (2.20)

where the vector r'_M = (x, y, −z) is the mirror image of r' = (x, y, z) in the xy-plane. After substituting (2.20) into (2.19), Rayleigh's I integral takes the form

P(r', \omega) = -2 \iint_{\partial V_{xy}} i \rho_0 \omega\, G_\omega(r|r')\, V_n(r, \omega)\, d(\partial V_{xy}) .   (2.21)

Rayleigh's II integral

Unlike Rayleigh's I integral, where Green's function's derivative term was made zero in the xy-plane, for obtaining Rayleigh's II integral, Green's function G_ω(r|r') is chosen such that the term iρ0ω G_ω(r|r') V_n(r, ω) vanishes in the xy-plane. This choice of Green's function is given by

G_\omega(r|r') = \frac{e^{ik\|r - r'\|}}{4\pi \|r - r'\|} - \frac{e^{ik\|r - r'_M\|}}{4\pi \|r - r'_M\|} ,   (2.22)

where the vector r'_M is the same mirror image as in (2.20). Rayleigh's II integral is given by

P(r', \omega) = 2 \iint_{\partial V_{xy}} P(r, \omega)\, \frac{\partial}{\partial n} G_\omega(r|r')\, d(\partial V_{xy}) = 2 \iint_{\partial V_{xy}} P(r, \omega)\, ik \left( 1 + \frac{i}{k \|r - r'\|} \right) \frac{e^{ik\|r - r'\|}}{4\pi \|r - r'\|} \cos\vartheta\, d(\partial V_{xy}) ,   (2.23)

where ϑ is the angle between the vector r' − r and the z-axis, as shown in Figure 2.2.

Discussion

Rayleigh's I and II integrals offer a practical interpretation, which is the essence of some sound field reproduction approaches, like Wave Field Synthesis, discussed in Chapter 6. In the case of Rayleigh's I integral, it can be seen that a sound field emanating from sound sources in the half-space z < 0 can be reproduced by driving omnidirectional point sources in the xy-plane with signals proportional to the z-component of the particle velocity vector V_z(r, ω) in the xy-plane. Rayleigh's II integral can be interpreted slightly differently. There, a sound field from radiating sound sources in the half-space z < 0 can be reproduced by driving dipole[4] sources in the xy-plane with signals proportional to the sound pressure P(r, ω) in the xy-plane.

[4] A dipole source has a bidirectional (or figure-of-eight) radiation characteristic, i.e., the waves radiated by the source towards direction ϑ relative to its axis have an amplitude proportional to cos ϑ.

2.4 Plane waves

Plane waves are simple-harmonic functions of space and time, obtained by solving the homogeneous acoustic wave equation in Cartesian coordinates, which is given by

\frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} + \frac{\partial^2 p}{\partial z^2} - \frac{1}{c^2} \frac{\partial^2 p}{\partial t^2} = 0 .   (2.24)

They are usually expressed as more general, analytic (or complex) functions of the spatial coordinate r = (x, y, z) and time t,

p(r, t) = P_0\, e^{i(k^T r - \omega t)} ,   (2.25)

where P0 is a complex amplitude that accounts for the magnitude and phase at the origin, ω is the angular frequency, and k = (k_x, k_y, k_z) is the wave vector, which has the notion of a three-dimensional spatial frequency. The wave vector components k_x, k_y, and k_z, denoting the spatial frequencies along the axes x, y, and z, respectively, satisfy

\sqrt{k_x^2 + k_y^2 + k_z^2} = k = \frac{\omega}{c} .   (2.26)

An equivalent way to represent a plane wave is through its Fourier transform with respect to time,

P(r, \omega) = 2\pi P_0\, \delta(\omega - \omega_0)\, e^{i k^T r} .   (2.27)
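As a quick illustrative check of (2.25) and (2.26) (an example, not from the text): a plane wave propagating in the horizontal plane at azimuth φ0 has the wave vector

k = k \left( \cos\varphi_0, \, \sin\varphi_0, \, 0 \right)^T, \qquad \sqrt{k_x^2 + k_y^2 + k_z^2} = k \sqrt{\cos^2\varphi_0 + \sin^2\varphi_0} = \frac{\omega}{c} ,

so (2.26) is satisfied for any propagation direction.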
2.4.1 Evanescent waves

Propagating plane waves, for which all the spatial frequencies k_x, k_y, and k_z have real values, are characterized by harmonic oscillations of sound pressure with the same amplitude at any point in space. However, wave vectors k = (k_x, k_y, k_z) with real-valued components are not the only solutions of the homogeneous acoustic wave equation. For instance, even if k_x² + k_y² > k², the acoustic wave equation is satisfied when

k_z = i \sqrt{k_x^2 + k_y^2 - k^2} = i k_z' .   (2.28)

This particular case defines an evanescent wave, which takes the form

p(r, t) = P_0\, e^{-k_z' z}\, e^{i(k_x x + k_y y - \omega t)} .   (2.29)

The evanescent wave defined by (2.29) is a plane wave that propagates parallel to the xy-plane, in the direction k_x e_x + k_y e_y, while its magnitude decays exponentially with the coordinate z. Evanescent waves are important in the analysis of vibrating structures and of wave transmission and reflection, as they develop close to the surface of a vibrating structure and on boundaries between two differing media. However, in the problems of sound field reproduction or capture of sound waves from distant sources, the spatially ephemeral evanescent waves have less relevance.

2.4.2 The angular spectrum

Consider the sound radiation problem used for deriving Rayleigh's integrals, shown in Figure 2.2, where the radiating sound sources are located in the half-space z < 0, and the sound field is analyzed in the other half-space, z ≥ 0. This case somewhat simplifies the following analysis, but does not sacrifice generality.

It follows directly from the definition of the multidimensional Fourier transform that any finite-energy field can be seen as a superposition of plane waves. However, sound fields come with an additional property. In source-free regions, sound fields satisfy the homogeneous Helmholtz equation, which limits the arbitrariness of the spatial frequencies k_x, k_y, and k_z. In particular, given the values of the spatial frequencies k_x and k_y along the directions x and y, the spatial frequency k_z in the direction z cannot take on an arbitrary value; instead, it needs to satisfy (2.26). Consequently, the plane wave spectrum, also known under the name angular spectrum, can be defined as a function P(k_x, k_y, z, ω) of two spatial frequencies, k_x and k_y, and the spatial coordinate z. The angular spectrum then completely determines the sound pressure field through the inverse Fourier transform given by

P(r, \omega) = \frac{1}{4\pi^2} \iint P(k_x, k_y, 0, \omega)\, e^{i(k_x x + k_y y + k_z z)}\, dk_x\, dk_y ,   (2.30)

with

k_z = \sqrt{k^2 - k_x^2 - k_y^2} .   (2.31)

For the considered sound radiation problem, it is physically justified to choose positive real values of k_z for propagating plane waves, as the radiation in the half-space z ≥ 0 can only take place in the positive z direction. It should be noted that (2.30) holds for any z ≥ 0, and that the angular spectrum P(k_x, k_y, 0, ω), or equivalently the sound pressure field P(x, y, 0, ω) in the plane z = 0, provides complete knowledge of the sound field in the entire half-space z ≥ 0. The relationship is given by

P(k_x, k_y, z, \omega) = P(k_x, k_y, 0, \omega)\, e^{i k_z z} ,   (2.32)

where k_z is given by (2.31). Naturally, this comes as no surprise, as it was already shown by Rayleigh's II integral that knowing the sound pressure field in the plane z = 0 completely defines the sound field in the upper half-space z > 0.
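The relationship (2.32) translates directly into a numerical recipe: take the 2-D spatial FFT of the pressure sampled in the plane z = 0, multiply by e^{ik_z z}, and invert the FFT. The following is a minimal sketch under assumed grid, frequency, and sound-speed values (not code from the thesis):

```python
# Sketch of angular-spectrum propagation, eq. (2.32): FFT the plane z = 0,
# multiply by exp(i*kz*z), inverse FFT. Evanescent components (2.28) get an
# imaginary kz and decay automatically for z > 0.
import numpy as np

def propagate(P0, dx, f, z, c=343.0):
    """Propagate a sampled pressure field P0 (plane z=0) to the plane z."""
    k = 2 * np.pi * f / c
    n = P0.shape[0]
    kx = 2 * np.pi * np.fft.fftfreq(n, d=dx)
    KX, KY = np.meshgrid(kx, kx, indexing="ij")
    kz = np.sqrt((k**2 - KX**2 - KY**2).astype(complex))   # eq. (2.31)
    return np.fft.ifft2(np.fft.fft2(P0) * np.exp(1j * kz * z))

# example: plane wave along x at 1 kHz, sampled every 2 cm on a 64x64 grid
dx, f = 0.02, 1000.0
x = np.arange(64) * dx
P0 = np.exp(1j * (2 * np.pi * f / 343.0) * x)[:, None] * np.ones((1, 64))
print(np.abs(propagate(P0, dx, f, z=0.5)).mean())   # magnitude stays ~1
```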
Euler’s and Helmholtz equation have the basic forms (2.3) and (2.5), respectively. The Laplace operator in cylindrical coordinates has the form (Morse and Ingard, 1968) 1 ∂ 1 ∂2 ∂2 ∂2 + 2 2+ 2, (2.33) ∇2 = 2 + ∂r r ∂r r ∂φ ∂z Acoustics Fundamentals 14 while the gradient operator reads ∇= ∂ 1 ∂ ∂ er + eφ + ez . ∂r r ∂φ ∂z (2.34) The unit vectors er , eφ , and ez point in the positive coordinate directions. The general frequency-domain solution of the homogeneous acoustic wave equation in cylindrical coordinates can be derived using separation of variables, and has the form (Williams, 1999) P (r, φ, z, ω) = ∞ 1 X inφ e 2π n=−∞ Z ∞h i An (kz , ω) Hn(1) (kr r) + Bn (kz , ω) Hn(2) (kr r) eikz z dkz , (2.35) −∞ where kr = p k 2 − kz2 . (1) In (2.35), the Hankel function of the first kind Hn (kr r) represents an outgoing wave, (2) while the Hankel function of the second kind Hn (kr r) stands for an incoming wave. The coefficients An (kz , ω) and Bn (kz , ω) thus determine the strength of diverging and converging waves, respectively. Their values depend on boundary conditions, which are commonly specified on coordinate surfaces. Equivalently, the steady-state solution of the wave equation in cylindrical coordinates can be expressed using the Bessel Jn (· ) and Neumann Nn (· ) functions as (Williams, 1999) P (r, φ, z, ω) = ∞ 1 X inφ e 2π n=−∞ Z ∞ [Cn (kz , ω) Jn (kr r) + Dn (kz , ω) Nn (kr r)] eikz z dkz , (2.36) −∞ where the relation between the Hankel functions of the first and second kind on one, and Bessel and Neumann functions on the other hand, is given by Hn(1) (x) = Jn (x) + i Nn (x) Hn(2) (x) = Jn (x) − i Nn (x) . The time-domain solution p(r, φ, z, t) of the wave equation in cylindrical coordinates is obtained by applying the inverse temporal Fourier transform to P (r, φ, z, ω) Z ∞ 1 p(r, φ, z, t) = P (r, φ, z, ω) eiωt dω . (2.37) 2π −∞ 2.5.1 Boundary value problems There are two boundary value problems that are of importance in acoustics, and they are depicted in Figure 2.4. In the exterior boundary value problem in cylindrical 2.5 Cylindrical waves 15 Figure 2.4: Boundary value problem in cylindrical coordinates. (a) Interior boundary value problem, where all the sources si are outside the cylindrical boundary surface defined by r = b. (b) Exterior boundary value problem, where all the sources si are inside the cylindrical boundary surface defined by r = a. coordinates, all the sources are within an infinite cylindrical boundary defined by r = a, and the volume of validity of the homogeneous wave equation is defined by r > a. On the other hand, in the interior boundary value problem, all the sources are located outside the cylindrical boundary surface defined by r = b, whose interior is the cylindrical volume of validity of the homogeneous wave equation. For the interior boundary value problem in cylindrical coordinates, the general solution is of the standing-wave type. It is derived from (2.36), taking note of the fact that it needs to be finite at the origin. The solution has the following form: P (r, φ, z, ω) = 1 2π ∞ X einφ Z ∞ Cn (kz , ω) eikz z Jn (kr r) dkz , (2.38) −∞ n=−∞ where function Cn (kz , ω) is determined by the boundary condition on the surface r = b. The general solution to the exterior value problem in cylindrical coordinates must consist of outgoing waves only. It is thus derived from (2.35) by forcing to zero the incoming-wave part, and takes the form P (r, φ, z, ω) = 1 2π ∞ X n=−∞ einφ Z ∞ −∞ An (kz , ω) eikz z Hn(1) (kr r) dkz . 
Specifying the boundary condition on the surface r = a determines the functions A_n(k_z, ω), which allow one to obtain the radiated sound field in the region defined by r > a.

2.5.2 Helical wave spectrum

Consider the exterior boundary value problem on the infinite cylindrical boundary defined by r = a. The steady-state sound pressure field on the boundary is given by

P(a, \phi, z, \omega) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{in\phi} \int_{-\infty}^{\infty} A_n(k_z, \omega)\, e^{i k_z z}\, H_n^{(1)}(k_r a)\, dk_z .   (2.40)

Taking the Fourier series expansion with respect to the angle φ and the Fourier transform with respect to the coordinate z of the steady-state sound pressure field P(r, φ, z, ω) yields the following Fourier series/transform pair:

P_n(r, k_z, \omega) = \frac{1}{2\pi} \int_0^{2\pi} \int_{-\infty}^{\infty} P(r, \phi, z, \omega)\, e^{-in\phi}\, e^{-i k_z z}\, dz\, d\phi   (2.41)

P(r, \phi, z, \omega) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{in\phi} \int_{-\infty}^{\infty} P_n(r, k_z)\, e^{i k_z z}\, dk_z .   (2.42)

By merely comparing (2.40) and (2.42), one can see that

P_n(a, k_z, \omega) = A_n(k_z, \omega)\, H_n^{(1)}(k_r a) .   (2.43)

The term P_n(r, k_z, ω) defines the helical wave spectrum of a sound field in cylindrical coordinates. Furthermore, expressing A_n(k_z, ω) from (2.43) and substituting it into (2.39) leads to the following expression:

P(r, \phi, z, \omega) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{in\phi} \int_{-\infty}^{\infty} P_n(a, k_z, \omega)\, \frac{H_n^{(1)}(k_r r)}{H_n^{(1)}(k_r a)}\, e^{i k_z z}\, dk_z .   (2.44)

Again, a simple comparison of (2.44) with (2.42) reveals the relationship between the helical spectra on cylindrical surfaces of different radii:

P_n(r, k_z, \omega) = \frac{H_n^{(1)}(k_r r)}{H_n^{(1)}(k_r a)}\, P_n(a, k_z, \omega) .   (2.45)

2.5.3 Rayleigh's integral

Using Euler's equation in cylindrical coordinates, and a definition of the helical wave particle velocity spectrum analogous to (2.41), one can derive the following relationship between the pressure spectrum P_n(r, k_z, ω) and the radial velocity spectrum Ẇ_n(a, k_z, ω) (Williams, 1999):

P_n(r, k_z, \omega) = \frac{i \rho_0 c k}{k_r}\, \frac{H_n(k_r r)}{H_n'(k_r a)}\, \dot{W}_n(a, k_z, \omega) ,   (2.46)

where the superscript on the Hankel function of the first kind has been dropped for notational simplicity. Rayleigh's I integral formula is obtained by applying the inverse Fourier transform to (2.46):

P(r, \phi, z, \omega) = \frac{i \rho_0 c k}{2\pi} \sum_{n=-\infty}^{\infty} e^{in\phi} \int_{-\infty}^{\infty} \dot{W}_n(a, k_z, \omega)\, \frac{H_n(k_r r)}{k_r H_n'(k_r a)}\, e^{i k_z z}\, dk_z ,   (2.47)

where

\dot{W}_n(a, k_z, \omega) = \frac{1}{2\pi} \int_0^{2\pi} \int_{-\infty}^{\infty} \dot{w}(a, \phi', z')\, e^{-in\phi'}\, e^{-i k_z z'}\, dz'\, d\phi' .   (2.48)

In order to solve for the sound pressure distribution evoked by a particle velocity distribution Ẇ_n(a, k_z) on the cylindrical surface r = a, one would need to evaluate the integral in (2.47) numerically. However, the far-field sound pressure distribution can be determined approximately using the so-called stationary phase integral approximation (Williams, 1999):

P(r, \theta, \phi, \omega) \approx \frac{\rho_0 c}{\pi}\, \frac{e^{ikr}}{r} \sum_{n=-\infty}^{\infty} (-i)^n\, \frac{\dot{W}_n(a, k \cos\theta, \omega)}{\sin\theta\, H_n'(ka \sin\theta)}\, e^{in\phi} .   (2.49)

The far-field sound pressure decay indicated by (2.49) is not in accordance with the far-field decay rate of helical waves, which is 1/√r. However, the derivation of (2.49) hinges on the assumed smoothness of Ẇ_n(a, k_z), which is roughly equivalent to treating the vibrating area of the cylinder as having finite extent, making the 1/r decay of the sound pressure amplitude less of a surprise (Williams, 1999).
2.5.4 Piston in a cylindrical baffle

Figure 2.5: Geometry of a rectangular piston of length 2L and width 2αa on the surface of an infinite cylindrical baffle of radius a.

Since the model of a vibrating piston mounted on the surface of an acoustically rigid cylinder is used in Chapter 5, we give a short overview of how to obtain its far-field radiation pattern. The piston is modeled as a rectangle folded along the circumference of an infinite cylindrical baffle, as shown in Figure 2.5. The cylindrical baffle has radius a, and the piston has length 2L and width 2aα. The velocity of the piston is denoted by b, while the velocity on the rest of the cylindrical surface is zero due to the baffle's rigidity.

Taking the Fourier series expansion with respect to the angle φ and the Fourier transform with respect to the spatial coordinate z of the radial velocity W(a, φ, z), one obtains

\dot{W}_n(a, k_z, \omega) = \frac{b}{2\pi} \int_{-\alpha}^{\alpha} e^{-in\phi}\, d\phi \int_{-L}^{L} e^{-i k_z z}\, dz ,   (2.50)

where it is arbitrarily—but without loss of generality—assumed that the piston is centered at φ = 0. The solution of (2.50) contains a product of two sinc functions:

\dot{W}_n(a, k_z, \omega) = \frac{4 b \alpha L}{2\pi}\, \mathrm{sinc}(n\alpha)\, \mathrm{sinc}(k_z L) .   (2.51)

Substituting (2.51) into the far-field approximation of Rayleigh's I integral (2.49), and using k_z = k cos θ, one obtains the radiated sound pressure in the far field (Williams, 1999):

P(r, \theta, \phi, \omega) \approx \frac{\rho_0 c}{2\pi^2}\, \frac{e^{ikr}}{r} \sum_{n=-N}^{N} (-i)^n\, e^{in\phi}\, \frac{4 b \alpha L\, \mathrm{sinc}(n\alpha)\, \mathrm{sinc}(k_z L)}{\sin\theta\, H_n'(ka \sin\theta)} .   (2.52)

A more detailed analysis of the radiation pattern in the audible frequency range is given in Chapter 5. Here we show approximations of the radiation patterns at very low and very high frequencies, which are the two extreme cases in terms of the piston's ability to reproduce sound directionally.

Low-frequency radiation pattern

At very low frequencies, the radiation pattern of a piston in a rigid cylindrical baffle can be analyzed using the small-argument behavior of H_n'(x), given by (Williams, 1999)

H_n'(x) \sim \frac{i\, n!}{\epsilon_n \pi} \left( \frac{2}{x} \right)^{n+1}, \qquad \epsilon_n = \begin{cases} 1 & n = 0 \\ 2 & n \geq 1 \end{cases} .

It turns out that when ka → 0, the term for n = 0 dominates the sum in (2.52), giving the approximate low-frequency far-field radiation pattern (Williams, 1999)

P(r, \theta, \phi, \omega) \approx -\frac{i \rho_0 c k}{4\pi}\, \frac{e^{ikr}}{r}\, Q ,   (2.53)

where Q = 4αaLb defines the volume flow. From (2.53), it is apparent that at low frequencies, the radiation pattern of a vibrating piston in a rigid cylindrical baffle resembles that of a point source.

High-frequency radiation pattern

The high-frequency radiation pattern of a piston in a cylindrical baffle can be analyzed by considering the large-argument behavior of H_n'(ka sin θ), which has the form (Williams, 1999)

H_n'(ka \sin\theta) \sim (-i)^n\, e^{i\pi/4}\, \sqrt{\frac{2}{\pi k a \sin\theta}}\, e^{i k a \sin\theta} .   (2.54)

Using the previous approximation in (2.52) and marginalizing the effect of the term sinc(nα), one gets the following approximation:

P(r, \theta, \phi, \omega) \sim \frac{e^{ikr}}{r}\, \sqrt{\frac{\pi k a}{2 \sin\theta}}\, e^{-i k a \sin\theta}\, \mathrm{sinc}(kL \cos\theta) \sum_{n=-\infty}^{\infty} e^{in\phi} .   (2.55)

The last summation in (2.55) can be represented as the Fourier series expansion of a stream of Diracs,

\sum_{n=-\infty}^{\infty} e^{in\phi} = 2\pi \sum_{m=-\infty}^{\infty} \delta(\phi - 2\pi m) .

Thus, the high-frequency radiation pattern of a piston in a cylindrical baffle collapses to a Dirac function pointing in the azimuthal direction defined by the piston's center, indicating single-direction radiation at high frequencies.
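As a hedged numerical sketch (not the thesis's design code), the truncated series (2.52) can be evaluated directly with SciPy's Hankel-function derivative; all dimensions, the frequency, and the truncation order N below are example assumptions, and the constant prefactors in front of the sum are dropped:

```python
# Sketch of the far-field pattern in (2.52) for a piston on a rigid cylinder
# (constant prefactors dropped). Note: np.sinc is sin(pi*x)/(pi*x), while the
# text's sinc is sin(x)/x, hence the rescaling below.
import numpy as np
from scipy.special import h1vp   # derivative of the Hankel function H_n^(1)

def piston_pattern(theta, phi, f, a=0.1, L=0.05, alpha=0.3, c=343.0, N=40):
    k = 2 * np.pi * f / c
    kz = k * np.cos(theta)
    sinc = lambda x: np.sinc(x / np.pi)          # unnormalized sinc(x)
    n = np.arange(-N, N + 1)
    num = (-1j) ** n * np.exp(1j * n * phi) * sinc(n * alpha) * sinc(kz * L)
    den = np.sin(theta) * h1vp(n, k * a * np.sin(theta))
    return np.abs(np.sum(num / den))

# horizontal-plane pattern: magnitude vs. azimuth at 4 kHz
phis = np.linspace(-np.pi, np.pi, 13)
print([round(piston_pattern(np.pi / 2, p, f=4000.0), 2) for p in phis])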
2.5.5 Scattering from rigid cylinders

The interaction of radiated sound with objects in the acoustic medium is characterized by the changes to the sound field caused by the object, usually modeled through the so-called scattered sound field. Analyzing sound scattering is of high importance in underwater acoustics, but here it is analyzed for its importance in the microphone array design problems described in Chapter 4.

The usual way of analyzing scattering by an object is by considering a sound field consisting of a single plane wave. To simplify the analysis a little, let the plane wave with unit magnitude arrive from the angle φ = 0, with wave fronts parallel to the z-axis. Due to the particular value of the wave vector k, with k_z = 0, k_x = k, and k_y = 0, the incoming wave field has the form

P_i(r, \phi, z, \omega) = e^{i k r \cos\phi} .   (2.56)

The incoming wave field admits the Jacobi–Anger expansion, given by (Abramowitz and Stegun, 1976)

P_i(r, \phi, z, \omega) = \sum_{n=-\infty}^{\infty} i^n J_n(kr)\, e^{in\phi} .   (2.57)

The sound field P_t(r, φ, z, ω) that results from the scattering of the incoming wave field from a rigid cylinder can be represented as a sum of the incoming sound field P_i(r, φ, z, ω) and the scattered sound field P_s(r, φ, z, ω):

P_t(r, \phi, z, \omega) = P_i(r, \phi, z, \omega) + P_s(r, \phi, z, \omega) .   (2.58)

For an infinite rigid cylindrical scatterer of radius a, the radial component of the particle velocity vector needs to vanish on the cylindrical surface r = a, i.e.,

\dot{W}_i(a, \phi, z, \omega) + \dot{W}_s(a, \phi, z, \omega) = 0 .   (2.59)

Equivalently, using Euler's equation (2.3), the boundary condition (2.59) can be expressed as

\frac{\partial}{\partial r} \left( P_i(r, \phi, z, \omega) + P_s(r, \phi, z, \omega) \right) \Big|_{r=a} = 0 .   (2.60)

The scattered sound field P_s(r, φ, z, ω) can be modeled as a superposition of outgoing waves only, and can thus be represented using (2.39). After substituting into (2.60) the two models (2.57) and (2.39) for P_i(r, φ, z, ω) and P_s(r, φ, z, ω), and solving, one obtains the following expression describing the total sound field:

P_t(r, \phi, z, \omega) = \sum_{n=-\infty}^{\infty} i^n \left[ J_n(kr) - \frac{J_n'(ka)}{H_n'(ka)}\, H_n(kr) \right] e^{in\phi} .   (2.61)
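Equation (2.61) is straightforward to evaluate numerically; the following sketch (with assumed radius, frequency, and truncation order) computes the total pressure at two points on the surface of the cylinder:

```python
# Sketch of (2.61): total pressure around a rigid cylinder of radius a hit
# by a unit plane wave. The radius, frequency, and truncation N are examples.
import numpy as np
from scipy.special import jv, jvp, hankel1, h1vp

def cylinder_total_field(r, phi, f, a=0.05, c=343.0, N=30):
    k = 2 * np.pi * f / c
    n = np.arange(-N, N + 1)
    radial = jv(n, k * r) - jvp(n, k * a) / h1vp(n, k * a) * hankel1(n, k * r)
    return np.sum(1j ** n * radial * np.exp(1j * n * phi))

# pressure buildup on one side of the surface, shadowing on the other
print(abs(cylinder_total_field(0.05, 0.0, 4000.0)),
      abs(cylinder_total_field(0.05, np.pi, 4000.0)))
```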
where $P_n^m(\cos\theta)$ denote the associated Legendre functions (see Arfken et al., 1985).

Equivalently, one can express the standing-wave-type general steady-state solution of the wave equation in terms of the spherical Bessel and spherical Neumann functions, $j_n(\cdot)$ and $y_n(\cdot)$, respectively. The solution of the standing-wave type has the form (Williams, 1999)

$$P(r,\theta,\phi,\omega) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} \left( C_{mn}(\omega)\, j_n(kr) + D_{mn}(\omega)\, y_n(kr) \right) Y_n^m(\theta,\phi) \,. \qquad (2.66)$$

The expansion parameters $A_{mn}(\omega)$, $B_{mn}(\omega)$, $C_{mn}(\omega)$, $D_{mn}(\omega)$, which in both solution types depend on the specified boundary conditions, satisfy the following relationship:

$$C_{mn}(\omega) = \tfrac{1}{2}\left( A_{mn}(\omega) + B_{mn}(\omega) \right) \,, \qquad D_{mn}(\omega) = \tfrac{i}{2}\left( A_{mn}(\omega) - B_{mn}(\omega) \right) \,.$$

2.6.1 Boundary value problems

The exterior boundary value problem is closely related to radiation from compact bodies. It gives the solution for the sound pressure field from a source (or a number of sources) enclosed by a sphere of a given radius $a$ centered at the origin, as shown in Figure 2.7(b). The solution of the exterior boundary value problem is of the traveling-wave type, given by (2.64), where the incoming-wave part vanishes:

$$P(r,\theta,\phi,\omega) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} A_{mn}(\omega)\, h_n^{(1)}(kr)\, Y_n^m(\theta,\phi) \,. \qquad (2.67)$$

Figure 2.7: Boundary value problems in spherical coordinates. (a) Interior boundary value problem, where all the sources $s_i$ are outside the spherical boundary surface defined by $r = b$. (b) Exterior boundary value problem, where all the sources $s_i$ are inside the spherical boundary surface defined by $r = a$.

It should be noted that the solution (2.67) is valid only for $r \geq a$. One important property of the sound field representation (2.67) is that knowing the coefficients $A_{mn}(\omega)$ on a separable surface, such as the sphere of radius $a$, gives the full description of the sound field outside the sphere $r = a$. For the sphere of radius $r = a$, the coefficients $A_{mn}(\omega)$ can be determined from the knowledge of the sound pressure field $P(a,\theta,\phi,\omega)$ on the sphere. Using the orthonormality of the spherical harmonics, they are obtained through

$$A_{mn}(\omega) = \frac{1}{h_n^{(1)}(ka)} \int_0^{2\pi}\!\!\int_0^{\pi} P(a,\theta,\phi,\omega)\, Y_n^m(\theta,\phi)^*\, \sin\theta\, d\theta\, d\phi \,. \qquad (2.68)$$

In the interior boundary value problem, one is interested in the sound field formed by sound sources outside the sphere of radius $b$, as illustrated in Figure 2.7(a). The solution of the interior boundary value problem is of the standing-wave type, given by (2.66). The solution needs to be finite at the origin, so only the terms containing the spherical Bessel function $j_n(\cdot)$ can be non-zero. Thus, the solution has the form

$$P(r,\theta,\phi,\omega) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} C_{mn}(\omega)\, j_n(kr)\, Y_n^m(\theta,\phi) \,. \qquad (2.69)$$

2.6.2 Spherical wave spectrum

Sound pressure spherical wave spectrum

The spherical wave spectrum of the sound pressure denotes the spherical harmonics expansion of a sound field on a sphere of a given radius $r$. It is given by

$$P_{mn}(r,\omega) = \int_0^{2\pi}\!\!\int_0^{\pi} P(r,\theta,\phi,\omega)\, Y_n^m(\theta,\phi)^*\, \sin\theta\, d\theta\, d\phi \,. \qquad (2.70)$$

The inverse spherical harmonic transform, which gives the sound pressure field from the sound pressure spherical wave spectrum, is

$$P(r,\theta,\phi,\omega) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} P_{mn}(r,\omega)\, Y_n^m(\theta,\phi) \,. \qquad (2.71)$$

Also, the spherical wave spectra at two radii, $r_0$ and $r$, are related by

$$P_{mn}(r,\omega) = \frac{h_n^{(1)}(kr)}{h_n^{(1)}(kr_0)}\, P_{mn}(r_0,\omega) \,. \qquad (2.72)$$

Velocity spherical wave spectrum

One can define the spherical wave spectrum of the particle velocity vector in the same way the sound pressure spherical wave spectrum was defined in (2.70).
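The spectrum (2.70) and the radial propagator (2.72) can be verified numerically for a simple outgoing field. In the sketch below, a point source is placed slightly off the origin, its field is sampled on two spheres, and the ratio of the resulting spectra is compared to the Hankel-function ratio in (2.72). The source offset, radii, and quadrature order are illustrative assumptions.

```python
# Spherical wave spectrum (2.70) of an outgoing point-source field on two
# spheres, and the radial propagation relation (2.72).
import numpy as np
from scipy.special import sph_harm, spherical_jn, spherical_yn

c, f = 343.0, 1000.0
k = 2 * np.pi * f / c
rs = np.array([0.0, 0.0, 0.05])          # point source 5 cm from the origin

def h1(n, x):                             # spherical Hankel function h_n^(1)
    return spherical_jn(n, x) + 1j * spherical_yn(n, x)

def spectrum(r0, m, n, nq=30):
    """P_mn(r0) via quadrature of (2.70) on the sphere of radius r0."""
    x, w = np.polynomial.legendre.leggauss(nq)
    theta = np.arccos(x)
    phi = np.linspace(0, 2 * np.pi, 2 * nq, endpoint=False)
    TH, PH = np.meshgrid(theta, phi, indexing="ij")
    pts = r0 * np.stack([np.sin(TH) * np.cos(PH),
                         np.sin(TH) * np.sin(PH), np.cos(TH)], axis=-1)
    d = np.linalg.norm(pts - rs, axis=-1)
    P = np.exp(1j * k * d) / (4 * np.pi * d)   # outgoing point-source field
    Y = sph_harm(m, n, PH, TH)
    return (w[:, None] * P * np.conj(Y)).sum() * (2 * np.pi / len(phi))

m, n, r0, r1 = 1, 2, 0.2, 0.5
print(spectrum(r1, m, n) / spectrum(r0, m, n))   # measured ratio
print(h1(n, k * r1) / h1(n, k * r0))             # prediction of (2.72)
```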
The relationship between the spherical wave spectra of the sound pressure $P(r,\theta,\phi,\omega)$ and the radial component of the particle velocity vector $\dot{W}(r,\theta,\phi,\omega)$ is given by (Williams, 1999)

$$\dot{W}_{mn}(r,\omega) = \frac{1}{i\rho_0 c}\, \frac{h_n'(kr)}{h_n(kr_0)}\, P_{mn}(r_0,\omega) \,. \qquad (2.73)$$

In (2.73), $h_n(\cdot)$ is used instead of $h_n^{(1)}(\cdot)$ for notational simplicity.

Axisymmetric source

An axisymmetric source on a sphere is a useful concept that is later used for analyzing the radiation pattern of a vibrating piston mounted on a spherical body. The axisymmetric source is modeled with an azimuth-independent surface velocity

$$\dot{W}(a,\theta,\phi,\omega) = \dot{W}(a,\theta,\omega) \,. \qquad (2.74)$$

In terms of the spherical wave spectrum, this makes the coefficients $A_{mn}(\omega)$ in (2.67) vanish for $m \neq 0$. Using the definition of the spherical harmonics (2.65) and the relationship between the spherical wave spectra of sound pressure and velocity (2.73), the radiated sound pressure can be expressed as

$$P(r,\theta,\omega) = i\rho_0 c \sum_{n=0}^{\infty} \frac{2n+1}{2}\, \frac{h_n(kr)}{h_n'(ka)}\, P_n(\cos\theta) \int_0^{\pi} \dot{W}(a,\theta',\omega)\, P_n(\cos\theta')\, \sin\theta'\, d\theta' \,. \qquad (2.75)$$

Extracting the part that corresponds to the expansion of the surface velocity $\dot{W}(a,\theta,\omega)$ in the Legendre polynomial basis, given by

$$\dot{W}_n(a,\omega) = \frac{2n+1}{2} \int_0^{\pi} \dot{W}(a,\theta',\omega)\, P_n(\cos\theta')\, \sin\theta'\, d\theta' \,, \qquad (2.76)$$

and using the orthogonality of the Legendre polynomials, one obtains

$$P(r,\theta,\omega) = i\rho_0 c \sum_{n=0}^{\infty} \dot{W}_n(a,\omega)\, \frac{h_n(kr)}{h_n'(ka)}\, P_n(\cos\theta) \,. \qquad (2.77)$$

Circular piston in a spherical baffle

Figure 2.8: Geometry of a circular piston of radius $r_p = a\alpha$ on the surface of a spherical baffle of radius $a$.

A circular piston mounted on the pole of a rigid sphere, shown in Figure 2.8, is an example of an axisymmetric source that can model a compact piston loudspeaker. Denoting by $b$ the velocity of the piston, and noting that the baffle is acoustically rigid, the velocity distribution on the surface of the sphere of radius $r = a$ is given by

$$\dot{W}(a,\theta,\omega) = \begin{cases} b & 0 \leq \theta \leq \alpha \\ 0 & \alpha < \theta \leq \pi \end{cases} \,. \qquad (2.78)$$

Computing the Legendre polynomial expansion of $\dot{W}(a,\theta,\omega)$ using (2.76) and the recurrence formula for the Legendre polynomials (Arfken et al., 1985)

$$(2n+1)\, P_n(x) = \frac{dP_{n+1}}{dx} - \frac{dP_{n-1}}{dx} \,, \qquad (2.79)$$

and substituting it into (2.77), results in the radiated pressure field (Williams, 1999)

$$P(r,\theta,\omega) = \frac{i\rho_0 c b}{2} \sum_{n=0}^{\infty} \left[ P_{n-1}(\cos\alpha) - P_{n+1}(\cos\alpha) \right] \frac{h_n(kr)}{h_n'(ka)}\, P_n(\cos\theta) \,. \qquad (2.80)$$

2.6.3 Scattering from rigid spheres

The procedure used for obtaining the sound field formed by scattering from an infinite rigid cylinder can be followed for the case of a rigid spherical scatterer. Let the plane wave arrive from the direction defined by the angles $(\vartheta,\varphi)$. The incoming wave field admits a spherical harmonics expansion of the form (Williams, 1999)

$$P_i(r,\theta,\phi,\omega) = 4\pi \sum_{n=0}^{\infty} i^n j_n(kr) \sum_{m=-n}^{n} Y_n^m(\theta,\phi)\, Y_n^m(\vartheta,\varphi)^* \,. \qquad (2.81)$$

The total sound field $P_t(r,\theta,\phi,\omega)$ is a sum of the incoming sound field $P_i(r,\theta,\phi,\omega)$ and the scattered sound field $P_s(r,\theta,\phi,\omega)$:

$$P_t(r,\theta,\phi,\omega) = P_i(r,\theta,\phi,\omega) + P_s(r,\theta,\phi,\omega) \,. \qquad (2.82)$$

Let $a$ be the radius of the spherical scatterer. Due to the nature of the boundary, the radial component of the particle velocity vector needs to vanish on the spherical surface $r = a$, i.e.,

$$\dot{W}_i(a,\theta,\phi,\omega) + \dot{W}_s(a,\theta,\phi,\omega) = 0 \,. \qquad (2.83)$$

Equivalently, using Euler's equation (2.3), the boundary condition (2.83) can be expressed as

$$\frac{\partial}{\partial r}\Big( P_i(r,\theta,\phi,\omega) + P_s(r,\theta,\phi,\omega) \Big)\Big|_{r=a} = 0 \,. \qquad (2.84)$$

Due to its nature, the scattered sound field $P_s(r,\theta,\phi,\omega)$ can be modeled as a superposition of outgoing waves only.
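A minimal numerical sketch of (2.80) follows; the sphere radius, piston aperture, truncation order, and evaluation distance are assumed values chosen for illustration.

```python
# Directivity of a circular piston on a rigid spherical baffle, evaluated
# from the truncated Legendre series (2.80).
import numpy as np
from scipy.special import spherical_jn, spherical_yn, eval_legendre

def h1(n, x):
    return spherical_jn(n, x) + 1j * spherical_yn(n, x)

def h1p(n, x):
    """Derivative of the spherical Hankel function h_n^(1)."""
    return (spherical_jn(n, x, derivative=True)
            + 1j * spherical_yn(n, x, derivative=True))

def piston_on_sphere(f, theta, a=0.15, alpha=0.35, b=1.0, r=5.0,
                     c=343.0, rho0=1.2, N=60):
    k = 2 * np.pi * f / c
    p = 0j
    for n in range(N):
        # P_{n-1}(cos alpha) - P_{n+1}(cos alpha); by convention P_{-1} = 1
        Pm1 = 1.0 if n == 0 else eval_legendre(n - 1, np.cos(alpha))
        coef = Pm1 - eval_legendre(n + 1, np.cos(alpha))
        p += coef * h1(n, k * r) / h1p(n, k * a) * eval_legendre(n, np.cos(theta))
    return 1j * rho0 * c * b / 2 * p

for th_deg in (0, 90, 180):   # on-axis, side, and rear responses at 2 kHz
    print(th_deg, abs(piston_on_sphere(2000.0, np.radians(th_deg))))
```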
It can thus be represented using (2.67). After substituting the two models for $P_i(r,\theta,\phi,\omega)$ and $P_s(r,\theta,\phi,\omega)$, (2.81) and (2.67), and solving (2.84), one obtains the following expression for the total sound field:

$$P_t(r,\theta,\phi,\omega) = 4\pi \sum_{n=0}^{\infty} i^n \left( j_n(kr) - \frac{j_n'(ka)}{h_n'(ka)}\, h_n(kr) \right) \sum_{m=-n}^{n} Y_n^m(\theta,\phi)\, Y_n^m(\vartheta,\varphi)^* \,. \qquad (2.85)$$
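The series (2.85) is straightforward to evaluate numerically. The following sketch computes the total surface pressure at two opposite points on the sphere; the frequency, radius, and truncation order are illustrative assumptions.

```python
# Total field (incident + scattered) of a plane wave on a rigid sphere,
# evaluated from the truncated series (2.85).
import numpy as np
from scipy.special import sph_harm, spherical_jn, spherical_yn

def h1(n, x):
    return spherical_jn(n, x) + 1j * spherical_yn(n, x)

def h1p(n, x):
    return (spherical_jn(n, x, derivative=True)
            + 1j * spherical_yn(n, x, derivative=True))

def total_field(r, theta, phi, vartheta, varphi, f=2000.0, a=0.05,
                c=343.0, N=40):
    """Evaluate (2.85) at (r, theta, phi) for incidence angles (vartheta, varphi)."""
    k = 2 * np.pi * f / c
    p = 0j
    for n in range(N):
        radial = (spherical_jn(n, k * r)
                  - spherical_jn(n, k * a, derivative=True) / h1p(n, k * a)
                  * h1(n, k * r))
        m = np.arange(-n, n + 1)
        ang = (sph_harm(m, n, phi, theta)
               * np.conj(sph_harm(m, n, varphi, vartheta))).sum()
        p += 1j ** n * radial * ang
    return 4 * np.pi * p

# Surface pressure at two diametrically opposite points for incidence
# (vartheta, varphi) = (pi/2, 0); one side is boosted, the other shadowed.
print(abs(total_field(0.05, np.pi / 2, 0.0, np.pi / 2, 0.0)))
print(abs(total_field(0.05, np.pi / 2, np.pi, np.pi / 2, 0.0)))
```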
2.7 Room acoustics

Sound propagation in rooms results from the interaction between the active sound sources, the room geometry, and the properties of the walls, floor, ceiling, and any other objects that occupy the room. In general, this interaction is difficult to model precisely, and even more difficult to solve analytically. However, there are models of sound propagation in rooms with acceptable accuracy in targeted frequency ranges. One can roughly identify three frequency ranges with different characteristic behavior of room acoustics:

• At low frequencies, which extend roughly up to the Schroeder frequency $f_S$ (defined later), sound propagation in rooms is best described by the wave theory of room acoustics.

• At medium frequencies, above the Schroeder frequency $f_S$ and up to the high frequencies where room dimensions are much larger than the sound wavelength (roughly $4 f_S$), the statistical model of room acoustics is commonly used.

• At very high frequencies, where sound wavelengths are vanishingly small compared to the room dimensions, the model of ray acoustics or geometrical acoustics is the most appropriate.

2.7.1 Wave theory of room acoustics

Sound propagation in rooms is characterized by standing-wave sound motion. The analysis of a sound field in a room in terms of the room's normal modes is most appropriate when the sound wavelengths are of the order of magnitude of the room's dimensions.

The homogeneous Helmholtz equation

The starting point for analyzing the modal behavior of a room is the homogeneous Helmholtz equation,

$$\nabla^2 P_\omega(\mathbf{r}) + k^2 P_\omega(\mathbf{r}) = 0 \,, \qquad (2.86)$$

with $k = \omega/c$. Additionally, the solution of the acoustic wave equation in a room needs to satisfy the boundary conditions on the wall surfaces. These are characterized by the wall impedance, defined by

$$Z(\mathbf{r}_s,\omega) = \frac{P(\mathbf{r}_s,\omega)}{\dot{W}_n(\mathbf{r}_s,\omega)} \,, \qquad (2.87)$$

where $\dot{W}_n(\mathbf{r}_s,\omega)$ is the outward-pointing normal component of the particle velocity vector at the surface point $\mathbf{r}_s$. Using the definition of the wall impedance (2.87) and Euler's equation (2.3), the boundary condition can be expressed as

$$Z(\mathbf{r}_s,\omega)\, \frac{\partial}{\partial n(\mathbf{r}_s)} P(\mathbf{r}_s,\omega) + i\omega\rho_0\, P(\mathbf{r}_s,\omega) = 0 \,, \qquad (2.88)$$

where $\frac{\partial}{\partial n(\mathbf{r}_s)} P(\mathbf{r}_s,\omega)$ is the directional derivative of $P(\mathbf{r}_s,\omega)$ along the outward-pointing surface normal at the point $\mathbf{r}_s$.

Green's function of a room

It has been shown (e.g., see Morse and Ingard, 1968) that satisfying both the homogeneous Helmholtz equation (2.86) and the wall boundary conditions (2.88) is possible only for a discrete set of values of $k$, denoted by $K_n$ and called the room eigenvalues. Each eigenvalue $K_n$ yields a solution $\Psi_n(\mathbf{r},\omega)$ called a room eigenfunction or normal mode. The eigenfunctions are orthogonal, and they satisfy

$$\iiint_V \Psi_m(\mathbf{r},\omega)\, \Psi_n^*(\mathbf{r},\omega)\, dV = \begin{cases} V\Lambda_n & n = m \\ 0 & n \neq m \end{cases} \,, \qquad (2.89)$$

where the integration is done over the entire volume $V$ of the room. Knowing the eigenfunctions of a room, one can express Green's function of the room, which is the solution of the inhomogeneous Helmholtz equation (2.7), as a series of normal modes:

$$G_\omega(\mathbf{r}|\mathbf{r}') = \sum_n \frac{\Psi_n(\mathbf{r},\omega)\, \Psi_n^*(\mathbf{r}',\omega)}{V\Lambda_n \left( K_n^2 - k^2 \right)} \,. \qquad (2.90)$$

In general, the eigenvalues $K_n$ of a room are complex quantities, which can be expressed in the form

$$K_n = \frac{\omega_n}{c} + i\,\frac{\delta_n}{c} \,, \qquad (2.91)$$

where $\delta_n$ represents a damping constant. Substituting (2.91) into (2.90) and assuming $\delta_n \ll \omega_n$, one obtains (Morse and Ingard, 1968; Kuttruff, 2000)

$$G_\omega(\mathbf{r}|\mathbf{r}') = c^2 \sum_n \frac{\Psi_n(\mathbf{r},\omega)\, \Psi_n^*(\mathbf{r}',\omega)}{V\Lambda_n \left( \omega^2 - \omega_n^2 - 2i\delta_n\omega_n \right)} \,. \qquad (2.92)$$

From (2.92), one can see that as the frequency $f = \omega/2\pi$ approaches the $n$th resonant frequency $f_n = \omega_n/2\pi$, the sound field in the room becomes dominated by the $n$th resonant mode, whose amplitude is inversely proportional to the damping constant $\delta_n$.

Room impulse response

The time-dependent Green's function of a room $g(\mathbf{r},t|\mathbf{r}',t_0)$, also known as the room impulse response (RIR) or the acoustic transfer function from $\mathbf{r}'$ to $\mathbf{r}$, is obtained by applying the inverse Fourier transform to (2.92). The RIR is often written more succinctly as $g_{\mathbf{r}|\mathbf{r}'}(t)$, where the initial time $t_0$ implicitly takes the value of zero. More often than not, the locations of the source and destination, $\mathbf{r}'$ and $\mathbf{r}$, respectively, are known from the context, and the RIR is denoted simply by $g(t)$.

The RIR between source and destination points can be viewed as an acoustic channel. If a point source located at $\mathbf{r}'$ emits the signal $s(t)$, the signal observed at the point $\mathbf{r}$ is given by

$$s_r(t) = \int_{-\infty}^{\infty} s(\tau)\, g(t-\tau)\, d\tau \,, \qquad (2.93)$$

which follows from (2.15).

Normal modes in a perfectly rigid rectangular room

The preceding analysis of the sound motion and the general solution of the wave equation in rooms give insight into the resonant, standing-wave nature of sound in rooms. Any additional detail of a room's acoustics may be blurred by the difficulty of precisely modeling or measuring the reflective properties of the walls and the room geometry, which are needed to obtain the room's exact eigenfunction-based Green's function.

Additional insight into wave phenomena in rooms can be obtained by analyzing a simple model of a rectangular room with perfectly rigid walls; this particular case provides a solution in closed form. Even though rooms in practice are neither perfectly rectangular nor have perfectly reflecting walls, the properties of rooms met in practice follow the same trends. The model of a rectangular room of size $(L_x, L_y, L_z)$ is shown in Figure 2.9.

Figure 2.9: Rectangular room of dimensions $(L_x, L_y, L_z)$.

The eigenfunctions that satisfy both the homogeneous Helmholtz equation (2.86) and the rigid-wall boundary conditions

$$\frac{\partial}{\partial n(\mathbf{r}_s)} P(\mathbf{r}_s,\omega) = 0 \qquad (2.94)$$

are obtained by separation of variables and have the form

$$\Psi_{\mathbf{m}}(\mathbf{r},\omega) = \cos(k_{x,m_x} x)\, \cos(k_{y,m_y} y)\, \cos(k_{z,m_z} z) \,. \qquad (2.95)$$

The three-dimensional index $\mathbf{m} = (m_x, m_y, m_z)$ has integer components, and the eigenvalues $\mathbf{K_m}$ of the wave equation have the three-dimensional form

$$\mathbf{K_m} = \left[ k_{x,m_x}\;\; k_{y,m_y}\;\; k_{z,m_z} \right]^T = \pi \left[ \frac{m_x}{L_x}\;\; \frac{m_y}{L_y}\;\; \frac{m_z}{L_z} \right]^T \,. \qquad (2.96)$$

The eigenfrequencies $f_{\mathbf{m}}$ of the room are real-valued, and are related to the room eigenvalues $\mathbf{K_m}$ by

$$f_{\mathbf{m}} = \frac{c}{2\pi} \left\| \mathbf{K_m} \right\| = \frac{c}{2} \sqrt{ \left(\frac{m_x}{L_x}\right)^2 + \left(\frac{m_y}{L_y}\right)^2 + \left(\frac{m_z}{L_z}\right)^2 } \,. \qquad (2.97)$$

Room modal density

One parameter of particular interest for the analysis of the steady-state solution of the wave equation in rooms is the number of room modes as a function of frequency, $N_m(f)$. This quantity is determined by noting that the room eigenfrequencies, given by (2.97), form a 3D lattice generated by the basis

$$F = \left[ \frac{c}{2L_x}\;\; \frac{c}{2L_y}\;\; \frac{c}{2L_z} \right]^T \,. \qquad (2.98)$$
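The lattice view of (2.96)–(2.98) suggests a direct way of enumerating room modes. The sketch below lists all eigenfrequencies (2.97) below a limit frequency; the room dimensions anticipate the small-living-room example discussed below.

```python
# Eigenfrequencies (2.97) of a rigid-walled rectangular room: list all modes
# below a limit frequency by scanning the integer lattice (2.96).
import itertools
import numpy as np

def room_modes(Lx, Ly, Lz, f_max, c=343.0):
    """Return sorted (f_m, (mx, my, mz)) for all modes with 0 < f_m <= f_max."""
    modes = []
    # c/(2L) per axis bounds the largest index that can stay below f_max
    M = [int(2 * L * f_max / c) + 1 for L in (Lx, Ly, Lz)]
    for mx, my, mz in itertools.product(range(M[0] + 1), range(M[1] + 1),
                                        range(M[2] + 1)):
        if (mx, my, mz) == (0, 0, 0):
            continue  # the (0,0,0) term is the static pressure component
        f = c / 2 * np.sqrt((mx / Lx) ** 2 + (my / Ly) ** 2 + (mz / Lz) ** 2)
        if f <= f_max:
            modes.append((f, (mx, my, mz)))
    return sorted(modes)

modes = room_modes(6.0, 5.0, 2.5, 163.0)
print(len(modes), "modes below 163 Hz; lowest:",
      [f"{f:.1f}" for f, _ in modes[:5]])
```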
The number of positive eigenfrequencies below the frequency $f$ equals the number of lattice points in the first octant that lie inside the sphere of radius $f$ centered at the origin. With a small error due to not correctly accounting for eigenfrequencies on the coordinate planes and axes, this number can be approximated by

$$N_m(f) \approx \frac{4\pi}{3}\, V\, \frac{f^3}{c^3} \,. \qquad (2.99)$$

Additionally, it is interesting to know the modal density on the frequency axis, i.e., the number of modes per unit frequency. The modal density is obtained by differentiating (2.99), and is given by

$$\frac{dN_m(f)}{df} = 4\pi\, V\, \frac{f^2}{c^3} \,. \qquad (2.100)$$

Steady-state room acoustics

So far, it was shown that the steady-state solution of the wave equation in a room is a superposition of infinitely many room modes. Each mode $\Psi_n(\mathbf{r},\omega)$ becomes dominant as the frequency $\omega$ approaches the real part $\omega_n$ of the corresponding room eigenvalue $K_n$, pushing the room frequency response to a local maximum. Under the assumption of a small damping constant $\delta_n \ll \omega_n$, each modal term in (2.92) can be approximated by the transfer function of a resonant system, yielding (Kuttruff, 2000)

$$G_\omega(\mathbf{r}|\mathbf{r}') = c^2 \sum_n \frac{\Psi_n(\mathbf{r},\omega)\, \Psi_n^*(\mathbf{r}',\omega)}{V\Lambda_n \left( \omega^2 - \omega_n^2 - 2i\delta_n\omega \right)} \,. \qquad (2.101)$$

In words, the steady-state solution of the room wave equation can be viewed as a combination of resonances, each resonance having a 3 dB half-width (half of the width of the frequency range where its main lobe is above 50% of its maximum power) of (Kuttruff, 2000)

$$\Delta f_n = \frac{\delta_n}{\pi} \,. \qquad (2.102)$$

The damping constants $\delta_n$ in most rooms are in the range $[1, 20]\ \mathrm{s}^{-1}$. Hence, the half-widths of the room resonances are of the order of 1 Hz. If one contrasts the half-width of resonances with the modal density given by (2.100), or with its inverse, representing the average spacing between two resonances, the following can be observed:

• At low frequencies, resonances are well separated, with distances between successive resonant frequencies exceeding their half-width.

• As the frequency increases, the half-width of individual room resonances becomes larger than the average distance between successive resonances, causing resonances to overlap. The resulting steady-state acoustic behavior of the room is characterized by a strong interaction of multiple, densely-spaced resonances.

In order to distinguish between the two frequency regions—the low-frequency region with clear separation of room resonances and the high-frequency region characterized by the interaction of densely-spaced room resonances—Schroeder defined a limiting frequency $f_S$, later named the Schroeder frequency, as the frequency where on average three modal frequencies fall within a single resonance half-width. The Schroeder frequency is given by (Kuttruff, 2000)

$$f_S \approx \frac{5500}{\sqrt{V \langle\delta_n\rangle}}\ \mathrm{Hz} \approx 2000 \sqrt{\frac{RT_{60}}{V}}\ \mathrm{Hz} \,, \qquad (2.103)$$

where $\langle\delta_n\rangle$ is the average value of the damping constant $\delta_n$, $V$ is the room's volume, and $RT_{60} = 6.91/\langle\delta_n\rangle$ is the so-called reverberation time of the room.

In large halls, the Schroeder frequency is usually below 50 Hz (Kuttruff, 2000), which confines the wave theory approach to a very narrow band and renders it of little practical use there. On the other hand, in a small living room of dimensions $6 \times 5 \times 2.5$ m with reverberation time $RT_{60} = 0.5$ s, the Schroeder frequency is around 163 Hz. According to (2.99), the same room has around 32 resonant frequencies below the Schroeder frequency, which dominate its low-frequency behavior.
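The living-room example can be checked directly from (2.103) and (2.99):

```python
# Worked check of the living-room example: Schroeder frequency (2.103) and
# the mode-count approximation (2.99).
import numpy as np

V = 6.0 * 5.0 * 2.5          # room volume in m^3
RT60 = 0.5                   # reverberation time in s
c = 343.0

f_S = 2000 * np.sqrt(RT60 / V)
N_below = 4 * np.pi / 3 * V * (f_S / c) ** 3
print(f"Schroeder frequency: {f_S:.0f} Hz")                  # ~163 Hz
print(f"Modes below f_S per (2.99): {N_below:.0f}")          # ~32-34
```

The small discrepancy between the approximation (2.99) and an exact lattice count (as in the previous sketch) comes from the modes lying on the coordinate planes and axes, as noted above.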
2.7.2 Statistical room acoustics

At frequencies above the Schroeder frequency, the sound pressure field in a room is not dominated by a single room mode; instead, it is influenced by many overlapping modal resonances. Since every mode of a room results from a different interaction of sound waves with the room, the modes can be assumed independent, with random phases at any frequency. Consequently, the sound pressure field at any point in the room has real and imaginary parts that can be modeled as centered Gaussian random variables, and a magnitude that follows a Rayleigh distribution. Furthermore, the frequency characteristic of the sound field $P_r(\omega)$ at any point $\mathbf{r}$ in the room can be modeled as a Gaussian random process for $\omega \geq 2\pi f_S$. The same can be said for the spatial dependence of the sound pressure field $P_\omega(\mathbf{r})$ in that frequency range.

If, in addition to modal overlap, one assumes that no active sound sources or room boundaries are close to the points of observation, and that the energy density flow does not exhibit a preferred direction, then one talks about a diffuse sound field. In a diffuse sound field, sound propagation at any point is purely isotropic.

Spatial coherence function

One characteristic of a diffuse sound field is its spatial coherence function $\Phi_\omega(\mathbf{r})$, which represents the correlation coefficient of the sound pressure between two points separated by the vector $\mathbf{r}$. The spatial coherence function is given by (Cook et al., 1955)

$$\Phi_\omega(\mathbf{r}) = \frac{E\left[ P(\mathbf{r}_0,\omega)\, P^*(\mathbf{r}_0+\mathbf{r},\omega) \right]}{E\left[ |P(\mathbf{r}_0,\omega)|^2 \right]} = \frac{\sin(k\|\mathbf{r}\|)}{k\|\mathbf{r}\|} \,, \qquad (2.104)$$

where $E[\cdot]$ denotes the expectation operator.

Considering the spatial coherence function of a diffuse sound field, one can see that its absolute value drops off with distance. Being Gaussian, the real and imaginary parts of the sound field at two points at a given frequency $\omega$ become roughly independent when the absolute value of the spatial coherence drops below 0.2. The corresponding critical distance, called the reach of causality, is given by (Kuttruff, 2000)

$$r_{corr} \approx \frac{\lambda}{\pi} \,. \qquad (2.105)$$

Furthermore, one can analyze the magnitude of the sound pressure field at a given frequency $\omega$ as a function of the spatial location, $P_\omega(\mathbf{r})$, and observe a random pattern characterized by an interchange of maxima and minima. The average distance between adjacent maxima of a diffuse sound field is given by (Kuttruff, 2000)

$$\langle r_{max} \rangle \approx 0.79\, \lambda \,. \qquad (2.106)$$

Frequency coherence function

Similarly to the spatial coherence function of a diffuse sound field, one can define its frequency coherence function $\Phi_r(\Delta f)$ as the correlation coefficient of the sound pressure at a point $\mathbf{r}$ between two frequencies separated by $\Delta f$ (Kuttruff, 2000):

$$\Phi_r(\Delta f) = \frac{E\left[ P(\mathbf{r},\omega)\, P^*(\mathbf{r},\omega+2\pi\Delta f) \right]}{E\left[ |P(\mathbf{r},\omega)|^2 \right]} = \frac{\langle\delta_n\rangle^2}{\langle\delta_n\rangle^2 + (\pi\Delta f)^2} \,, \qquad (2.107)$$

where $\langle\delta_n\rangle$ is the average damping constant. It should be noted that the frequency coherence function is independent of location, as long as the sound field is diffuse.

From the definition of the frequency coherence function, one can derive the critical frequency shift, defined as the frequency difference $\Delta f_{corr}$ for which the sound pressures at frequencies $f$ and $f + \Delta f_{corr}$ become approximately independent (coherence below 0.2). The critical frequency shift is given by (Kuttruff, 2000)

$$\Delta f_{corr} \approx 0.64\, \langle\delta_n\rangle \,. \qquad (2.108)$$
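The coherence (2.104) can be reproduced with a simple Monte-Carlo experiment: superpose many plane waves with random directions and phases, and correlate the resulting pressures at pairs of points. The numbers of waves and trials below are arbitrary choices.

```python
# Monte-Carlo check of the diffuse-field spatial coherence (2.104).
import numpy as np

rng = np.random.default_rng(0)
c, f = 343.0, 1000.0
k = 2 * np.pi * f / c

def diffuse_pressure(points, n_waves=2000):
    """Sum of plane waves with uniformly random directions and phases."""
    u = rng.normal(size=(n_waves, 3))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # uniform on the sphere
    phases = rng.uniform(0, 2 * np.pi, n_waves)
    return np.exp(1j * (points @ u.T * k + phases)).sum(axis=1)

d = np.linspace(0.01, 0.5, 8)
pts = np.zeros((len(d) + 1, 3))
pts[1:, 0] = d                                      # reference point + offsets
trials = np.array([diffuse_pressure(pts) for _ in range(200)])
num = (trials[:, 1:] * np.conj(trials[:, :1])).mean(axis=0)
den = (np.abs(trials[:, :1]) ** 2).mean()
print(np.round(np.real(num / den), 3))              # measured coherence
print(np.round(np.sin(k * d) / (k * d), 3))         # prediction of (2.104)
```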
2.7.3 Geometrical acoustics

It was already mentioned that above the Schroeder frequency, defined by (2.103), the density of modes becomes very high, so that the modal description of room acoustics is of little use. At high frequencies, where the acoustic wavelength becomes negligible relative to the room dimensions, one can apply geometrical (ray) acoustics, similar to the methods used in geometrical optics.

The main concept of geometrical acoustics is the abstraction of a sound ray, which represents a portion of a spherical wave that can be visualized as a beam of vanishing aperture. The sound intensity carried by a sound ray follows the $1/r^2$ decay characteristic of a spherical wave.

Geometrical acoustics makes additional simplifications regarding the interaction of sound rays with walls, and also the mutual interaction of sound rays. In particular, diffraction effects are neglected, and sound rays propagate in straight line segments. Also, the interference between sound rays is not considered, making their superposition purely additive in the intensity or energy sense (Kuttruff, 2000).

The concept of specular reflections, used in geometrical acoustics, can be modeled through the abstraction of so-called image sources. Although the concept of image sources is of limited use in the general case, the assumptions of geometrical acoustics stated above make it sufficiently accurate in practice (Kuttruff, 2000).

Figure 2.10: Image source model in a 3D rectangular room of dimensions $(L_x, L_y, L_z)$.

Similarly to the analysis of modal density, one can analyze the temporal distribution of sound reflections modeled with image sources using the model of a rectangular room, shown in Figure 2.9. For a point source placed at the location $\mathbf{r}_s$ inside the room, the image sources are obtained by recursively mirroring the sources in the room walls, starting from the original source position $\mathbf{r}_s$. The set of image source points in space forms a regular pattern that can be described by a 3D lattice, shown in Figure 2.10. It should be noted that each image source can account for the directivity of the original sound source, as well as for the energy losses due to the absorption coefficients of the walls.

Each sound ray reaching the destination after a number of reflections can be seen as coming from one of the image sources. It is characterized by its delay, energy, and direction. If the absorption or reflection characteristics of the walls are independent of frequency, then according to the geometrical acoustics model described above, the room impulse response is given by

$$g(t) = \sum_n A_n\, \delta(t - t_n) \,, \qquad (2.109)$$

where $A_n$ is an amplitude factor accounting for energy losses, and $t_n$ is the delay of the $n$th sound ray.

The temporal and energy distribution of a room impulse response is commonly depicted using a reflection diagram or echogram, such as the one in Figure 2.11.

Figure 2.11: Reflection diagram of a room impulse response, with a distinct direct sound and two reflection types: early reflections, up to 80 ms after the direct sound, and densely grouped reflections, called the room reverberation, that follow the early reflections.

Based on a reflection diagram, one can distinguish three parts of a room impulse response:

• Direct sound, which represents the sound reaching the observation point from the source without any interaction with the room boundaries.
• Early reflections, which denote the first few sparsely-spaced strong reflections, usually from the closest image sources and within 80 ms after the direct sound. The early reflections have a great influence on the perceived spaciousness of sound in rooms.

• Reverberation, which denotes the late part of the room response, characterized by a dense distribution of low-energy reflections. The late reflections determine the subjective characteristic known as listener envelopment (Toole, 2008).

Reflection density

The average temporal density of reflections in a room can be obtained in the same way the frequency density of room modes was obtained. Using the image source model, shown in Figure 2.10, one can estimate the number of reflections up to the time instant $t$ by counting the number of image sources inside the sphere of radius $r = ct$:

$$N_r(t) = \frac{4\pi}{3}\, \frac{c^3 t^3}{V} \,. \qquad (2.110)$$

Taking the temporal derivative of (2.110), one arrives at the average temporal density of reflections, given by

$$\frac{dN_r(t)}{dt} = 4\pi\, \frac{c^3 t^2}{V} \,. \qquad (2.111)$$

2.7.4 Reverberation time

The early works on architectural acoustics by Sabine (Kuttruff, 2000) led to the definition of the reverberation time, which is an important measure of the evolution of sound energy in rooms. The reverberation time quantifies the rate of energy decay in a room due to air absorption and wall reflection losses.

In order to derive the reverberation time, assume that due to air absorption, the sound energy density decays exponentially with distance $r$ according to $e^{-mr}$, where $m$ is an attenuation constant. Furthermore, let each reflection from a wall decrease the sound intensity carried by a sound wave by the factor $1 - \alpha$, where $\alpha$ is the absorption coefficient of the wall. The evolution of the sound energy density is then described by

$$w(t) = w_0\, e^{-mct}\, (1-\alpha)^{\bar{n}t} \,, \qquad (2.112)$$

where $\bar{n}$ is the average number of reflections per unit time. In the case of diffuse sound propagation in a room, the average number of reflections per unit time can be approximated by (Kuttruff, 2000)

$$\bar{n} \approx \frac{cS}{4V} \,, \qquad (2.113)$$

where $S$ is the total area of the walls. Computing the reverberation time as the time it takes the energy density $w(t)$ to drop by 60 dB from its initial value $w_0$, one obtains Eyring's reverberation formula, given by (Kuttruff, 2000)

$$RT_{60} = 0.161\, \frac{V}{4mV - S\ln(1-\alpha)} \,. \qquad (2.114)$$

Sabine's reverberation formula is similar to Eyring's reverberation formula (2.114), and has the form (Kuttruff, 2000)

$$RT_{60} = 0.161\, \frac{V}{4mV + S\bar{\alpha}} \,, \qquad (2.115)$$

where $\bar{\alpha}$ is the average absorption coefficient.

The reverberation time, as observed by Sabine, depends on the volume of the room and the absorption properties of the air and the room walls. Since both the air and the walls absorb better at higher frequencies, the reverberation time is frequency-dependent and tends to be longer at low frequencies. This is particularly noticeable with low-frequency room modes, where the reverberation time $RT_{60}$ takes on values much larger than the average reverberation time over all frequencies.

The reverberation time can be measured by exciting the room with a stationary white noise signal from a single source and abruptly switching it off. The time it takes for the measured energy to decay to $-60$ dB of its steady-state value defines the reverberation time $RT_{60}$. The energy decay can be described by the energy decay curve $EDC(t)$ (Kuttruff, 2000), which can be either measured or computed from the room impulse response $g(t)$ using the following formula:

$$EDC(t) = 10 \log_{10} \left( \frac{ \int_t^{\infty} g^2(\tau)\, d\tau }{ \int_0^{\infty} g^2(\tau)\, d\tau } \right) \,. \qquad (2.116)$$
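In practice, (2.116) is computed by Schroeder's backward integration of a measured impulse response. The sketch below applies it to a synthetic, exponentially decaying RIR standing in for a measurement, and estimates $RT_{60}$ from the slope of the decay (a T30-style fit, one of several common conventions).

```python
# Energy decay curve (2.116) via Schroeder backward integration, and an RT60
# estimate from the -5..-35 dB portion of the decay, extrapolated to 60 dB.
import numpy as np

fs = 16000
rng = np.random.default_rng(1)
t = np.arange(int(0.8 * fs)) / fs
g = rng.normal(size=t.size) * np.exp(-6.91 * t / 0.5)   # models RT60 = 0.5 s

def edc_db(g):
    e = np.cumsum(g[::-1] ** 2)[::-1]        # backward integral of g^2
    return 10 * np.log10(e / e[0])

def rt60_from_edc(edc, fs, lo=-5.0, hi=-35.0):
    i0, i1 = np.argmax(edc <= lo), np.argmax(edc <= hi)
    slope = (edc[i1] - edc[i0]) / ((i1 - i0) / fs)       # dB per second
    return -60.0 / slope

print(f"estimated RT60: {rt60_from_edc(edc_db(g), fs):.2f} s")   # ~0.5 s
```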
Figure 2.12 illustrates the energy decay curve and the definition of the reverberation time.

Figure 2.12: Energy decay curve in a room and the definition of the reverberation time $RT_{60}$ as the time it takes the energy density to decrease by 60 dB after an abrupt interruption of the sound source.

2.7.5 Critical distance

It was already mentioned that the sound field in a room can be decomposed into a direct and a reverberant component, $P_d(\mathbf{r},\omega)$ and $P_r(\mathbf{r},\omega)$, respectively. The direct sound conveys all the properties of free-field sound propagation from a point source. The energy density $W_d$ of the direct sound field depends on the distance $d$ between the source $\mathbf{r}_s$ and the observation point $\mathbf{r}$. Assuming the source emits sound waves with constant power $P(\omega)$, the energy density is given by (Morse and Ingard, 1968; Kuttruff, 2000)

$$W_d(d,\omega) = \frac{P(\omega)}{4\pi c d^2} \,. \qquad (2.117)$$

One can also account for a possible directivity of the source, denoted by the gain $D$, giving the energy density

$$W_d(d,\omega) = \frac{P(\omega)\, D}{4\pi c d^2} \,. \qquad (2.118)$$

Note that for an omnidirectional point source, $D = 1$.

The reverberant sound is commonly modeled as perfectly diffuse. The energy density of a diffuse sound field is obtained from the law of sound decay in a room; for a source radiating at constant power $P(\omega)$, it is given by (Kuttruff, 2000)

$$W_r(\omega) = \frac{4\, P(\omega)}{c A} \,, \qquad (2.119)$$

where $A$ is the so-called equivalent absorption area of the room. Note that the energy density in a diffuse sound field is isotropic.

The critical distance or diffuse-field distance is the distance from the source at which the energy densities of the direct and reverberant sound fields become equal, as illustrated in Figure 2.13.

Figure 2.13: Distance dependence of the energy densities of the direct and reverberant sound fields, $W_d(d)$ and $W_r(d)$, respectively, and the critical distance $d_c$ where the two are equal.

The critical distance is obtained by combining (2.118) and (2.119), and is given by (Kuttruff, 2000)

$$d_c = \sqrt{\frac{D A}{16\pi}}\ [\mathrm{m}] \approx 0.1 \sqrt{\frac{D V}{\pi\, RT_{60}}}\ [\mathrm{m}] \,. \qquad (2.120)$$

The critical distance defines the reach of the direct sound: up to the distance $d_c$ from the source, the sound field is dominated by the direct sound component. It should be mentioned that the above expression for the critical distance has its limitations, since it was developed using a reverberation model that assumes low wall absorption coefficients. In practice this is often not the case, and (2.120) often underestimates the true critical distance.

In rooms where speech communication takes place, such as classrooms, office spaces, and lecture halls, speech intelligibility is highly dependent on the distance from the source. Investigations have shown that speech intelligibility decreases as the receiver moves away from the source, and that from the critical distance onward it remains roughly constant (Peutz, 1971).
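As a worked example, (2.120) gives the following critical distances for the living-room example used earlier; the directive gain $D = 4$ is an arbitrary illustrative value.

```python
# Critical distance (2.120) for V = 75 m^3, RT60 = 0.5 s.
import numpy as np

V, RT60 = 75.0, 0.5
for D in (1.0, 4.0):
    dc = 0.1 * np.sqrt(D * V / (np.pi * RT60))
    print(f"D = {D:.0f}: d_c = {dc:.2f} m")   # ~0.69 m and ~1.38 m
```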
2.8 Acoustic beamforming

In this section, we present the fundamentals of spatial filtering, or beamforming (e.g., see Van Veen and Buckley, 1988), which is used throughout Chapters 3, 4, and 5. In essence, beamforming is a signal processing technique used to control the spatial aspect of wave radiation or acquisition performed by an array of transducers (in acoustic applications, transducers are either microphones or loudspeakers). Thus, a beamforming problem has the following three components:

• A medium in which the physical field of interest propagates, usually an acoustic or electromagnetic channel. Field propagation in the medium is characterized by Green's function $G_\omega(\mathbf{r}|\mathbf{r}')$.

• An array of transducers used for acquiring or generating the physical field.

• A transducer signal pre- or post-processor used for controlling the spatial aspects of physical field generation or acquisition, respectively.

The system comprising the last two components, i.e., the array of transducers and the attached signal processor, is denoted as a beamformer.

Intrinsically related to beamforming is a property of a transducer (or beamformer) called the directional response, which quantifies the combined effect of the transducer (or beamformer) and the propagation medium. Assuming a transducer centered at the origin with its axis aligned with the x-axis, the directional response $d(\mathbf{r},\omega)$ captures the propagation characteristic from the point $\mathbf{r}$ all the way to the transducer's output. Hence, it combines the propagation characteristic of the medium, captured by Green's function, and the spatial properties of the transduction process. Note that one is usually interested in far-field characteristics, where the source is located at infinity and effectively radiates plane waves. In the far-field case, the directional response is written more simply as $d(\theta,\phi,\omega)$, and if one is interested in the directional response only in the xy-plane, it takes the form $d(\phi,\omega)$. As an example, an ideal pressure microphone located at the origin has the directional response $d(\theta,\phi,\omega) = 1$, while a dipole microphone located at the origin has the directional response $d(\theta,\phi,\omega) = \sin\theta\cos\phi$.

When designing a beamformer, one usually defines a desired directional response $d(\mathbf{r},\omega)$ on an enclosing control circle or sphere, depending on the problem geometry. The circle or sphere can be of infinite radius, in the case of plane-wave incidence, or of finite radius $r_s$ if point sources are considered. (The finitely-distant source does not have to be a point source, but one needs to be able to obtain its response on a circle or sphere enclosing the designed array.) Unless stated otherwise, the desired directional response is frequency-independent, and is denoted by $d(\theta,\phi)$ in the far-field case and by $d(\mathbf{r})$ in the near-field case.

Consider a transducer array with $M$ transducers positioned at $\mathbf{r}_m$ and characterized by their directional responses $d_m(\mathbf{r},\omega)$, $m = 1, \ldots, M$. Note that, due to the transducers' translations and rotations, the response from each transducer to the control surface, $G_m(\mathbf{r},\omega)$, gets modified accordingly. As an example, consider a beamformer design in the xy-plane and a transducer located at $\mathbf{r}_m = (r_m, \phi_m)$ with a far-field directional response $d_m(\phi)$. The "propagation–transduction" channel $G_m(\varphi,\omega)$ between the direction $\varphi$ and the output of transducer $m$ then takes the form

$$G_m(\varphi,\omega) = d_m(\varphi - \phi_m, \omega)\; e^{-ikr_m \cos(\varphi - \phi_m)} \,, \qquad (2.121)$$

with $k = \omega/c$.

An acoustic beamformer design is defined as the problem of finding a linear combination of the transducer responses $G_m(\mathbf{r},\omega)$ that best approximates the desired directional response $d(\mathbf{r},\omega)$ on the control surface at every frequency $\omega$. In optimization-theory parlance, this is a semi-infinite programming problem, which is difficult to solve. Hence, it is usually discretized by considering a finite number $N_f$ of frequencies $\omega_n$ below the sampling frequency, and a finite number $N$ of control points on the control surface.
2.8.1 Beamformer filter design

Figure 2.14: Illustration of microphone (a) and loudspeaker (b) array beamformer design problems.

The two beamforming applications dealt with in this thesis are designs of microphone and loudspeaker arrays, illustrated in Figure 2.14. Leaving out the radius in the case of finitely-distant control points, let $(\vartheta_n, \varphi_n)$ denote the spherical angular coordinates of control point $n$. Furthermore, denote by $G_{nm}(\omega)$ the channel between the control point (or direction) $n$ and the output of transducer $m$, i.e.,

$$G_{nm}(\omega) = G_m(\vartheta_n, \varphi_n, \omega) \,.$$

The process of obtaining the set of channels $G_{nm}(\omega)$ for transducer $m$ can be viewed as a discretization or sampling of its modified directional response. (When responses are obtained from points at a finite distance, it is the transducer's near-field directional response that gets discretized.) The discretized version is given by the vector

$$V_m(\omega) = \left[ G_{1m}(\omega)\;\; G_{2m}(\omega)\;\; \ldots\;\; G_{Nm}(\omega) \right]^T \,. \qquad (2.122)$$

In a similar manner, an acoustic beamformer filter design uses a discretized version of the desired directional response,

$$D(\omega) = \left[ d(\theta_1,\phi_1)\;\; d(\theta_2,\phi_2)\;\; \ldots\;\; d(\theta_N,\phi_N) \right]^T \,. \qquad (2.123)$$

Denote by $X_m(\omega)$ the input signal of transducer $m$, and let the signals from all transducers be aggregated into a vector

$$X(\omega) = \left[ X_1(\omega)\;\; X_2(\omega)\;\; \ldots\;\; X_M(\omega) \right]^T \,. \qquad (2.124)$$

The task of the transducer array design is to produce transducer pre- or post-filters

$$H(\omega) = \left[ H_1(\omega)\;\; H_2(\omega)\;\; \ldots\;\; H_M(\omega) \right]^T \,, \qquad (2.125)$$

so that the "virtual transducer", whose signal is given by

$$Y(\omega) = X^T(\omega)\, H(\omega) \,, \qquad (2.126)$$

has a response that approximates the desired one in the control points. The frequency response of the array from control point $n$ is defined by the following linear combination of the frequency responses from control point $n$ to the outputs of all transducers:

$$Y_n(\omega) = \left[ G_{n1}(\omega)\;\; G_{n2}(\omega)\;\; \ldots\;\; G_{nM}(\omega) \right] H(\omega) \,. \qquad (2.127)$$

Thus, the directional response of the array discretized at the control points is given by

$$Y(\omega) = \left[ Y_1(\omega)\;\; Y_2(\omega)\;\; \ldots\;\; Y_N(\omega) \right]^T = G(\omega)\, H(\omega) \,, \qquad (2.128)$$

where $G(\omega)$ is the $N \times M$ matrix containing the frequency responses of the acoustic channels between the control points and the transducers.

Given this description of the system setup, the task of designing transducer array filters can be formulated as a constrained optimization problem. The objective function is commonly an error norm between the desired and obtained frequency responses in the control points. It can involve all of the control points, or a subset thereof—such as when the main lobe of the directional response is of high importance, while the side lobes are controlled by constraints.

The constraints can be manifold. They can relate to the array's response $Y(\omega)$ or to the filter gains $H(\omega)$. The directional response constraints can be of the equality type, where the directional response at a set of control points needs to exactly match the desired one, be it only in magnitude or in both phase and magnitude. One can also use inequality constraints on the directional response. For instance, the strength of the response at certain control points can be limited in order to achieve so-called side-lobe suppression. In the transducer design problems we present here, the focus is on achieving a desired response at all control points; therefore, directional response constraints are not used. However, in Chapter 5 we will use a constraint on the directional response that gives guarantees on the response in the beamformer's look direction.
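The quantities (2.121)–(2.128) translate directly into code. The sketch below assembles $G(\omega)$ and a desired response $D$ for a uniform circular array in the xy-plane; the cardioid element model, the array geometry, and the desired pattern are illustrative assumptions.

```python
# Discretized beamformer design quantities: the N x M channel matrix G(omega)
# built from (2.121), and a sampled desired response D as in (2.123).
import numpy as np

c = 343.0
M, N = 8, 72                        # transducers, control directions
r_arr = 0.05                        # array radius in m
phi_m = 2 * np.pi * np.arange(M) / M
ctrl = 2 * np.pi * np.arange(N) / N

def element_response(phi):
    """Far-field response of one element; a cardioid as an example."""
    return 0.5 * (1 + np.cos(phi))

def channel_matrix(f):
    k = 2 * np.pi * f / c
    dphi = ctrl[:, None] - phi_m[None, :]
    return element_response(dphi) * np.exp(-1j * k * r_arr * np.cos(dphi))

# Desired response: unit gain toward phi = 0, tapering away from it.
D = np.maximum(np.cos(ctrl), 0.0) ** 2

G = channel_matrix(1000.0)
print(G.shape, D.shape)             # (72, 8) (72,)
```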
Filter gains are usually constrained by limiting their $l_2$-norm, which is related to the so-called white noise gain (Cox et al., 1987). In addition, one can limit the gain of each filter separately. The purpose of such constraints is to limit the beamformer's sensitivity to random errors that can come from the transducers' self-noise, miscalibration, placement errors, or computational noise.

Complex-gain optimized transducer arrays

Let the desired response $d(\theta,\phi)$ be specified in terms of both amplitude and phase, i.e., as a vector of possibly complex gains. As the notation suggests, it is assumed that the desired response is frequency-independent. Denote by

$$D = \left[ d(\theta_1,\phi_1)\;\; d(\theta_2,\phi_2)\;\; \ldots\;\; d(\theta_N,\phi_N) \right]^T$$

the values of the desired directional response in the $N$ control points or directions. Assume that the maximum allowed $l_2$-norm of the filter vector $H(\omega)$ is $H_{max}$ at all frequencies. The transducer array design problem can then be stated as the following frequency-domain optimization problem:

$$\begin{aligned} \text{minimize} \quad & \left\| G(\omega)\, H(\omega) - D \right\|_x \\ \text{subject to} \quad & \left\| H(\omega) \right\|_2 \leq H_{max} \,, \end{aligned} \qquad (2.129)$$

where $x \in \{2, \infty\}$ is the minimized error norm (i.e., Euclidean or min–max). As previously mentioned, this semi-infinite optimization problem is solved at $N_f$ discrete frequencies $\omega_n$. To obtain discrete-time finite impulse response (FIR) filters $h_i[n]$ of length $N_h$, one can use one of the following two options:

1. Separate frequency optimization. Solve (2.129) on a uniform grid of $N_f = N_h/2 + 1$ normalized frequencies in the range $[0, \pi]$ (it is assumed that the computed filters are real and have a conjugate-symmetric spectrum), and find the FIR filters which best approximate the obtained spectra, either by an inverse DFT (Kolundžija et al., 2009c) or by some frequency-domain filter design procedure (Oppenheim and Schafer, 1989; Berchin, 2007).

2. Direct time-domain FIR filter computation (Lebret and Boyd, 1997; Yan and Ma, 2005). Denote by the vector $h_m$ the $N_h$ coefficients of the impulse response $h_m[i]$ of the filter used by transducer $m$, and let the vector $h$ contain all impulse responses $h_m$ stacked:

$$h = \left[ h_1^T\;\; h_2^T\;\; \ldots\;\; h_M^T \right]^T \,. \qquad (2.130)$$

The vector of the filters' frequency responses at a given normalized frequency $\omega$ is then given by

$$H(\omega) = V(\omega)\, h \,, \qquad (2.131)$$

where the matrix $V(\omega)$ is defined by

$$V(\omega) = I_{M\times M} \otimes v^T(\omega) = \begin{bmatrix} v^T(\omega) & 0_{1\times N_h} & \ldots & 0_{1\times N_h} \\ 0_{1\times N_h} & v^T(\omega) & \ldots & 0_{1\times N_h} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{1\times N_h} & 0_{1\times N_h} & \ldots & v^T(\omega) \end{bmatrix} \,, \qquad (2.132)$$

with

$$v(\omega) = \left[ 1\;\; e^{-i\omega}\;\; \ldots\;\; e^{-i(N_h-1)\omega} \right]^T \,. \qquad (2.133)$$

To obtain the filter coefficients at a single normalized frequency $\omega$, one substitutes (2.131) into (2.129), giving rise to the following formulation:

$$\begin{aligned} \text{minimize} \quad & \left\| F(\omega)\, h - D \right\|_x \\ \text{subject to} \quad & \left\| V(\omega)\, h \right\|_2 \leq H_{max} \,, \end{aligned} \qquad (2.134)$$

where $F(\omega) = G(\omega)\, V(\omega)$. In order to solve the optimization problem at $N_f$ normalized frequencies $\omega_n$, $n = 1, \ldots, N_f$ (where the number of evaluation frequencies $N_f$ does not have to be equal to the filter length $N_h$), one stacks the matrices in order to optimize over all frequencies jointly:

$$F_A = \begin{bmatrix} F(\omega_1) \\ F(\omega_2) \\ \vdots \\ F(\omega_{N_f}) \end{bmatrix} \,, \qquad D_A = 1_{N_f\times 1} \otimes D \,. \qquad (2.135)$$

The transducer array design is then stated as follows:

$$\begin{aligned} \text{minimize} \quad & \left\| F_A\, h - D_A \right\|_x \\ \text{subject to} \quad & \left\| V(\omega_l)\, h \right\|_2 \leq H_{max} \,, \quad l = 1, \ldots, N_f \,. \end{aligned} \qquad (2.136)$$

Both the separate frequency optimization and the direct FIR computation can be carried out using interior-point methods (Boyd and Vandenberghe, 2004), which are efficient algorithms for solving convex programs. The separate frequency optimization has lower complexity, since it deals with solutions of $N_h$ times lower dimension.
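One simple way of solving the per-frequency problem (2.129) for the Euclidean norm, without a general-purpose convex solver, is Tikhonov regularization with the weight found by bisection: by the KKT conditions, the $l_2$-ball constraint is equivalent to ridge regression with an appropriate multiplier. This is a sketch, not the thesis's own solver; the function name solve_2129 is ours, and random matrices stand in for $G(\omega)$ and $D$.

```python
# Constrained least squares min ||G H - D||_2 s.t. ||H||_2 <= H_max, solved
# by bisecting on the Tikhonov regularization weight.
import numpy as np

def solve_2129(G, D, H_max, iters=60):
    def h_of(mu):
        A = G.conj().T @ G + mu * np.eye(G.shape[1])
        return np.linalg.solve(A, G.conj().T @ D)
    h = h_of(0.0)
    if np.linalg.norm(h) <= H_max:
        return h                      # unconstrained optimum already feasible
    lo, hi = 0.0, 1.0
    while np.linalg.norm(h_of(hi)) > H_max:
        hi *= 2.0                     # grow until the constraint is satisfied
    for _ in range(iters):            # bisect on the regularization weight
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.linalg.norm(h_of(mid)) > H_max else (lo, mid)
    return h_of(hi)

rng = np.random.default_rng(2)
G = rng.normal(size=(72, 8)) + 1j * rng.normal(size=(72, 8))
D = rng.normal(size=72)
H = solve_2129(G, D, H_max=1.0)
print(np.linalg.norm(H))              # <= 1.0
```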
However, one needs to be careful when choosing the filter length, since separate frequency optimization can in general produce frequency responses $H_i(\omega)$ that cannot be well approximated by FIR filters of short length $N_h$.

Magnitude-optimized transducer arrays

In some cases, it is of interest to match a desired directional response in the magnitude sense, without regard to phase. The previously described optimization procedures minimize the difference between two vectors of complex gains—one of the transducer array and the other of the desired response—implicitly taking the phase into the optimization criterion. In order to optimize the transducer array's directional response in the magnitude sense, the objective function needs to be changed. For a single frequency $\omega$, the optimization problem takes the form

$$\begin{aligned} \text{minimize} \quad & \left\| \left| G(\omega)\, H(\omega) \right| - \left| D \right| \right\|_x \\ \text{subject to} \quad & \left\| H(\omega) \right\|_2 \leq H_{max} \,. \end{aligned} \qquad (2.137)$$

The objective function in (2.137) is not convex and in general has multiple local minima. Hence, the conventional tools of convex optimization (Boyd and Vandenberghe, 2004) cannot be used directly. Instead, one can use so-called local solutions, which iteratively improve on an initial solution for the array filter coefficients. The quality of the final solution relative to the optimum then depends on the initial solution.

We present here a variation of a local optimization algorithm by Kassakian (2006); a similar procedure was proposed by Wang et al. (2003). The algorithm in (Kassakian, 2006) was given for the problem called magnitude least squares (MLS). Algorithm 2.1, given below, has a more general form that allows one to minimize an arbitrary convex norm of the magnitude error, and also to include convex constraints.

Algorithm 2.1 Minimize the error norm of the directional response's magnitude (adapted from Kassakian, 2006).

1. Choose the solution tolerance $\epsilon$
2. Choose the initial solution $H(\omega)$
3. repeat
4.   $E \leftarrow \| |G(\omega) H(\omega)| - |D| \|_x$
5.   Compute $\hat{D}(\omega)$ such that, for all $j \in \{1, \ldots, N\}$, $|\hat{D}_j(\omega)| = |D_j|$ and $\angle \hat{D}_j(\omega) = \angle (G(\omega) H(\omega))_j$
6.   Solve the following convex program:
       minimize $\| G(\omega) H(\omega) - \hat{D}(\omega) \|_x$
       subject to $\| H(\omega) \|_2 \leq H_{max}$
7.   $E' \leftarrow \| |G(\omega) H(\omega)| - |D| \|_x$
8. until $|E' - E| < \epsilon$

In Algorithm 2.1, $x$ stands for any convex norm, including the most widely used $l_2$- and $l_\infty$-norms. Since $\big| |x| - |y| \big| \leq |x - y|$, with equality only when the complex numbers $x$ and $y$ have the same argument (phase, in this context), step 6 can only decrease the objective function. Furthermore, since the objective function is non-negative, Algorithm 2.1 provides a solution that lies in a local minimum of the objective function.

Algorithm 2.1 adds an additional, outer iteration loop to the complex-gain optimized beamformer computation. The resulting increase in complexity might render the direct FIR filter computation prohibitively complex, especially for longer FIR filters. Thus, if complexity is a concern, it is more suitable to use the separate frequency optimization.
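A compact sketch of Algorithm 2.1 with the $l_2$-norm follows, reusing the hypothetical solve_2129 helper (and, in the usage lines, the G and D arrays) from the previous sketches.

```python
# Local iteration for the magnitude-matching design (2.137): alternate
# between rotating the target phases to those of the current response
# (step 5) and solving the resulting complex-gain problem (step 6).
import numpy as np

def magnitude_design(G, D_mag, H_max, tol=1e-8, max_iter=200):
    rng = np.random.default_rng(3)
    H = rng.normal(size=G.shape[1]) + 1j * rng.normal(size=G.shape[1])
    E = np.inf
    for _ in range(max_iter):
        y = G @ H
        D_hat = D_mag * np.exp(1j * np.angle(y))   # step 5
        H = solve_2129(G, D_hat, H_max)            # step 6
        E_new = np.linalg.norm(np.abs(G @ H) - D_mag)
        if abs(E - E_new) < tol:                   # step 8
            break
        E = E_new
    return H, E_new

H, E = magnitude_design(G, np.abs(D), H_max=1.0)
print(f"final magnitude error: {E:.4f}")
```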
Chapter 3: Microphone Arrays for Directional Sound Capture

3.1 Introduction

3.1.1 Background

The history of directional microphones goes back to the first half of the twentieth century, with the works of Olson (Olson, 1946, 1967, 1980) figuring most prominently. Olson and his coworkers at RCA Laboratories were seeking a solution to replace pressure-sensitive, omnidirectional microphones (omnidirectional meaning that the microphone picks up sound in a direction-independent fashion) for sound pickup in motion picture studios. The problem with pressure microphones stemmed from the requirement of non-intrusive recording, where the microphone would be placed several meters away from the actors. As discussed in Section 2.7, sound pickup at such distances may go well past the critical distance of a room, making the recorded sound less natural and notably decreasing speech intelligibility due to the dominance of reflected and reverberant sound.

Along with some sophisticated constructions for directional sound acquisition that did not gain widespread use, the design that stood out for its simplicity, adequate form factor, and consistent directional performance over a wide range of frequencies was the pressure gradient or velocity microphone, and variations thereof known as the unidirectional or cardioid microphone (Olson, 1946, 1967, 1980). Cardioid microphones enjoyed great popularity in studio recording for their suppression of sounds from the back half-space, and they remain the most widely used directional microphones to date (Olson, 1980; Elko, 2000).

Around the same time as Olson, Blumlein started experimenting with what later became popular under the name of two-channel stereo, as a way to reproduce sound beyond the anchored reproduction positions defined by the loudspeakers used (Blumlein, 1931). The law of intensity stereo, expressed through the law of sines or the law of tangents (Pulkki and Karjalainen, 2001), needed an appropriate recording setup which would automatically, in the recording process, mix the incoming sound such that it gets faithfully reproduced over a two-channel stereo system. As a solution, Blumlein came up with the so-called Blumlein XY pair—a technique widely used even today—which consists of two matched velocity microphones pointing towards ±45° relative to the front. The Blumlein XY pair is illustrated in Figure 3.1.

Figure 3.1: Blumlein XY stereo microphone pair consisting of velocity microphones with dipole directional responses pointing towards ±45°. The x axis points towards the front.

As velocity microphones with a bidirectional or figure-of-eight directional response, Blumlein used pairs of closely-spaced pressure microphone capsules, which will be described in Section 3.2. Apart from spot and surround recording, gradient microphones are used for particle velocity and sound intensity measurements under the name of p-p probes (e.g., see Fahy, 1977; Raangs et al., 2003; de Bree et al., 2007; Merimaa, 2002).

More recent research on directional microphones expands into the realm of microphone arrays. What was previously achieved through careful acoustical design can, to a certain extent, be achieved and even extended using multiple microphones and array signal processing techniques. The Blumlein pair can be considered one of the earliest and simplest microphone arrays. Olson (1946) described various microphone array configurations, including any-order pressure gradient microphones, combinations thereof called unidirectional microphones, and unidirectional microphones that combine gradient microphones and delay elements.
Although Olson denoted all these different microphone types as gradient microphones, a more recent naming convention classifies as gradient only the microphone arrays that combine pressure elements, while microphone arrays that also employ delay elements are termed differential microphone arrays (Elko, 2000). Differential microphone arrays have found applications in hands-free communication (e.g., see Elko, 1996) and hearing aid devices (e.g., see Preves et al., 1998; Geluk and de Klerk, 2001).

Microphone arrays capture sound in a directional fashion using beamforming, which can roughly be classified into two main directions. The first category is the classical adaptive beamforming application of acquiring a desired signal while trying to suppress noise sources and interference (Cox et al., 1987). The adaptation is not restricted to the undesired sources, but applies to the desired source as well, in the sense that the desired source might change location and the microphone array needs to be able to track it. As the name suggests, adaptive beamforming is a signal-dependent way of capturing sound, and its performance depends on a number of factors, such as the ability to locate the desired source, the statistical properties of the different sources, and the mutual dependence between active sound sources. Also, adaptive beamforming is usually performed in the frequency domain, and in general achieves a frequency-dependent directional capture of sound.

The other category of microphone arrays, which is treated in detail in Chapter 4, focuses on decomposing the captured sound field in a way that facilitates its analysis. This type of decomposition is also related to the design of multichannel sound reproduction strategies, such as ambisonic decoding (Gerzon, 1980a, 1973; Cooper and Shiga, 1972), which is based on matching the sound field's orthogonal harmonic components in a single point. These strategies are discussed in more detail in Chapter 6.

Unless stated otherwise, the analyses of microphone arrays consider far-field conditions in the audible frequency range. As previously mentioned, far-field conditions correspond to the simplifying assumption that sound sources emit plane waves. Also, the problems of sound field capture and analysis, unlike radiation problems, are usually concerned with the direction of arrival of the incoming sound waves. Thus, the wave vector $\mathbf{k}$ used in this chapter points toward the direction of arrival of the sound waves, and as a consequence, a plane wave has a slightly different form, given by

$$p(\mathbf{r},t) = P_0\, e^{-i(\mathbf{k}^T \mathbf{r} + \omega t)} \,,$$

where $P_0$ is its complex amplitude and $\omega$ its frequency.

3.1.2 Chapter outline

Section 3.2 presents an analysis of a plane-wave sound pressure field viewed as both a spatial and a temporal phenomenon, i.e., as a multivariate function of spatial location and time. The presented analysis exposes the operations of taking gradients and directional derivatives of a sound pressure field as combinations of its spatial and temporal derivatives. It also gives a clear interpretation of gradient and differential microphone arrays: the former as devices that measure only spatial derivatives, and the latter as devices that measure spatio-temporal derivatives of the sound pressure field. In essence, it shows the equivalence of the two microphone array types. Our framework also allows for easy design of differential microphone arrays given a desired directional response.
In Section 3.3, we show how a discretized microphone array beamformer design problem can come to the same end. In other words, we show that the following steps can be used to design directional microphones:

• Assembling a microphone array

• Obtaining the directional responses of each microphone in a number of directions relative to a single, central reference point

• Computing microphone post-filters that optimally synthesize the desired directional response in the control directions

Since this design procedure can account for the non-ideal characteristics of different microphones through measurements, and can incorporate additional constraints such as maximum filter gains, it is more flexible and practical than the conventional theoretical approaches. Conclusions are given in Section 3.4.

3.2 Differential microphone arrays

3.2.1 Spatial derivatives of a far-field sound pressure field

Assume that a single far-field source emits a simple complex sinusoid with temporal frequency $\omega$. Furthermore, let the formed sound field be represented by a plane wave with wave vector $\mathbf{k}$ (which, as mentioned in the introduction to this chapter, points towards the direction of arrival of the sound waves), given by

$$p(\mathbf{r},t) = P_0\, e^{-i(\mathbf{k}^T \mathbf{r} + \omega t)} \,, \qquad (3.1)$$

where $\mathbf{r} = [x\; y\; z]^T$ and $k = \omega/c$. Let $\mathbf{n} = [n_x\; n_y\; n_z]^T$ be an arbitrary unit-norm vector. The spatial derivative of the sound pressure field $p(\mathbf{r},t)$ along the direction $\mathbf{n}$ quantifies its directional rate of change along $\mathbf{n}$. It is given by the projection of the sound pressure field's spatial gradient onto the vector $\mathbf{n}$:

$$\frac{\partial}{\partial n}\, p(\mathbf{r},t) = \mathbf{n}^T \nabla p(\mathbf{r},t) = -i\, p(\mathbf{r},t)\, \mathbf{n}^T \mathbf{k} = -ik\cos\alpha\; p(\mathbf{r},t) \,, \qquad (3.2)$$

where $\alpha$ is the angle between the vectors $\mathbf{k}$ and $\mathbf{n}$. Iterating the operation of taking a directional derivative along $\mathbf{n}$ leads to the expression for the $m$th-order spatial derivative along $\mathbf{n}$, which reads

$$\frac{\partial^m}{\partial n^m}\, p(\mathbf{r},t) = (-ik)^m (\cos\alpha)^m\, p(\mathbf{r},t) \,. \qquad (3.3)$$

In expression (3.3), the frequency- and angle-dependent terms are clearly separated. The frequency-dependent term $(-ik)^m$ indicates a high-pass magnitude frequency characteristic irrespective of the angle $\alpha$, which is specific to an $m$th-order differentiator. The frequency characteristics for different orders $m$ are shown in Figure 3.2(a). The directional characteristic, determined by the angle $\alpha$, goes from omnidirectional for $m = 0$ to highly directional high-order cosine terms, as shown in Figure 3.2(b).

Figure 3.2: Frequency (a) and directional (b) characteristics of spatial derivatives of different orders $m$ for a plane-wave sound field.

3.2.2 Spatio-temporal derivatives of a far-field sound pressure field

Consider again a single-frequency plane wave with wave vector

$$\mathbf{k} = \left[ k\cos\phi\sin\theta\;\; k\sin\phi\sin\theta\;\; k\cos\theta \right]^T \,,$$

which gives rise to the sound pressure field given by (3.1). Being a function of three spatial coordinates, $x$, $y$, and $z$, and a temporal coordinate $t$, the total gradient of the sound pressure field (3.1) is given by

$$\nabla p(x,y,z,t) = \left[ \frac{\partial p}{\partial x}\;\; \frac{\partial p}{\partial y}\;\; \frac{\partial p}{\partial z}\;\; \frac{\partial p}{\partial t} \right]^T = -ik\, \left[ \cos\phi\sin\theta\;\; \sin\phi\sin\theta\;\; \cos\theta\;\; c \right]^T p(x,y,z,t) \,, \qquad (3.4)$$

with $k = \omega/c$.

First-order spatio-temporal derivative

To compute a spatio-temporal derivative, one needs to project the gradient, given by (3.4), onto a vector with both spatial and temporal components.
Let us define the somewhat unintuitive notion of a spatio-temporal direction with the following unit vector:

$$\mathbf{u} = \left[ \rho_u \cos\phi_u \sin\theta_u\;\; \rho_u \sin\phi_u \sin\theta_u\;\; \rho_u \cos\theta_u\;\; u_t \right]^T \,, \qquad (3.5)$$

where $\rho_u \in [0,1]$, $\theta_u \in [0,\pi]$, and $\phi_u \in [0,2\pi]$ define the spatial coordinates, and $u_t \in [-1,1]$ the temporal coordinate of the vector $\mathbf{u}$. Note that having a unit norm implies $\rho_u^2 + u_t^2 = 1$. Also, the ratio $\rho_u/u_t$ gives the relation between the spatial and temporal parts of the vector $\mathbf{u}$.

The spatio-temporal derivative of a plane-wave sound field along the direction given by the vector $\mathbf{u}$ then reads

$$\frac{\partial}{\partial u}\, p(x,y,z,t) = \mathbf{u}^T \nabla p(x,y,z,t) = -ik\, (\rho_u \cos\alpha + c\, u_t)\; p(x,y,z,t) \,, \qquad (3.6)$$

where $\alpha$ is the angle between the spatial directions defined by $(\theta,\phi)$ and $(\theta_u,\phi_u)$. As with the spatial derivative, there is a clearly separated high-pass frequency term $-ik$ that is characteristic of differentiators. The directional characteristic is a combination of the first-order bidirectional characteristic $\rho_u \cos\alpha$ and the omnidirectional characteristic $c\, u_t$. The overall directional response is determined by the relative contributions of the two derivatives—spatial and temporal—given by the ratio $\rho_u/u_t$. In the two extreme cases, $\rho_u = 0$ and $\rho_u = 1$, the directional characteristic is omnidirectional and bidirectional, respectively. Other interesting combinations, such as the cardioid and the so-called sub-, hyper-, and super-cardioids, are defined in Table 3.1 and shown in Figure 3.3.

Table 3.1: Some well-known first-order directional characteristics expressed through the ratio $\rho_u/u_t$ of the spatio-temporal derivative.

Response type      $\rho_u/u_t$
Cardioid           $c$
Sub-cardioid       any value in $(0, c)$
Hyper-cardioid     $3c$
Super-cardioid     $\dfrac{(3-\sqrt{3})\, c}{\sqrt{3}-1}$

Figure 3.3: Directional characteristics of first-order spatio-temporal derivatives of a plane-wave sound field for different ratios $\rho_u/u_t$, as given in Table 3.1 (cardioid, sub-cardioid, hyper-cardioid, super-cardioid).

Higher-order spatio-temporal derivatives

Iterating the operation of taking the spatio-temporal derivative of the sound pressure field along the same spatio-temporal direction $\mathbf{u}$ $m$ times gives its $m$th-order spatio-temporal derivative. It has the following form:

$$\frac{\partial^m}{\partial u^m}\, p(x,y,z,t) = (-ik)^m\, (\rho_u \cos\alpha + c\, u_t)^m\, p(x,y,z,t) \,. \qquad (3.7)$$

Generalizing further, derivatives along a single direction $\mathbf{u}$ can be replaced by spatio-temporal derivatives along multiple directions. Consider an $m$-tuple of spatio-temporal directions defined by

$$U = (\mathbf{u}_1, \ldots, \mathbf{u}_m) \,, \qquad (3.8)$$

where each vector $\mathbf{u}_i$ from the $m$-tuple is given by

$$\mathbf{u}_i = \left[ \rho_{u_i} \cos\phi_{u_i} \sin\theta_{u_i}\;\; \rho_{u_i} \sin\phi_{u_i} \sin\theta_{u_i}\;\; \rho_{u_i} \cos\theta_{u_i}\;\; u_{t_i} \right]^T \,. \qquad (3.9)$$

The mixed spatio-temporal derivative of a plane-wave sound pressure field along the directions given by $U$ has the form

$$\frac{\partial^m}{\partial u_1 \cdots \partial u_m}\, p(x,y,z,t) = (-ik)^m\, p(x,y,z,t) \prod_{i=1}^{m} \left( \rho_{u_i} \cos\alpha_i + c\, u_{t_i} \right) \,, \qquad (3.10)$$

where $\alpha_i$ is the angle between the directions $(\theta,\phi)$ and $(\theta_{u_i},\phi_{u_i})$.
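As a quick numerical illustration of (3.6) and (3.10), the sketch below evaluates normalized first-order patterns for the ratios of Table 3.1 and a second-order pattern as a product of two first-order factors; the specific sub-cardioid ratio is an arbitrary choice from the interval in the table.

```python
# Directional factors of spatio-temporal derivatives: first order from (3.6)
# (pattern ~ q*cos(alpha) + c, with q = rho_u/u_t), second order from (3.10).
import numpy as np

c = 343.0
ratios = {
    "cardioid": c,
    "sub-cardioid": 0.5 * c,        # an arbitrary value from (0, c)
    "hyper-cardioid": 3 * c,
    "super-cardioid": (3 - np.sqrt(3)) * c / (np.sqrt(3) - 1),
}
alpha = np.radians([0, 90, 180])
for name, q in ratios.items():
    pattern = (q * np.cos(alpha) + c) / (q + c)   # normalized to alpha = 0
    print(f"{name:15s}", np.round(pattern, 3))

# A second-order cardioid: two identical first-order cardioid factors.
second = ((c * np.cos(alpha) + c) / (2 * c)) ** 2
print("2nd-order cardioid", np.round(second, 3))
```

The printed values match the familiar side- and rear-lobe levels of these patterns (e.g., 0.25 at 90° for the hyper-cardioid, and a null at 180° for the cardioid).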
Like spatial derivatives, the spatio-temporal derivatives have the high-pass frequency characteristic $(-ik)^m$ of an $m$th-order differentiator, shown in Figure 3.2(a). The directional characteristic is proportional to a linear combination of spatial gradients of different orders, which is seen by expanding the product $\prod_{i=1}^{m} (\rho_{u_i} \cos\alpha_i + c\, u_{t_i})$ in (3.10). The same can be said for the term $(\rho_u \cos\alpha + c\, u_t)^m$ in (3.7), which is a special case of (3.10).

As with the first order, the shape of the directional characteristic of a high-order spatio-temporal derivative of a plane-wave sound field is determined by the choice of the vectors $\mathbf{u}_i$, i.e., the parameters $\rho_{u_i}$, $\theta_{u_i}$, $\phi_{u_i}$, and $u_{t_i}$. Some well-known second-order directional patterns, resulting from different choices of the ratios $\rho_{u_1}/u_{t_1}$ and $\rho_{u_2}/u_{t_2}$ and the angle difference $\Delta\alpha = \alpha_1 - \alpha_2$, are given in Table 3.2 and shown in Figure 3.4.

Figure 3.4: Directional characteristics (cardioid, hyper-cardioid, super-cardioid) of second-order spatio-temporal derivatives of a plane-wave sound field for different ratios $\rho_u/u_t$ and angle differences $\Delta\alpha$, as given in Table 3.2.

3.2.3 Differential microphone arrays

The previously presented theoretical analysis of spatio-temporal derivatives of a plane-wave sound field serves as a basis for designing gradient and differential microphone arrays. Practical differential microphone arrays are based on approximating derivatives with finite differences. They combine values of the sound pressure field at closely-spaced points in space and time (points separated by a distance much shorter than the wavelength, and by a time much shorter than the period of a plane wave), either acoustically (pressure at the two faces of a diaphragm, where the acoustic paths to the two faces have different lengths) or electronically (pressure at different microphones of a microphone array, combined with delay elements). In the remainder of this section, we present a few practical differential microphone array realizations based on the preceding spatio-temporal gradient analysis.

First-order differential microphone arrays: cardioid, hyper-cardioid, and super-cardioid

Figure 3.5: First-order differential microphone realization using two pressure microphones and a delay element.

The first-order directional characteristics of the sound field's spatio-temporal derivatives can be approximated using a finite difference in space and time of a sound pressure field. The concept is illustrated in Figure 3.5, where signals from two closely-spaced microphones, at a distance $d$, are combined with a delay element. The illustrated device is the simplest first-order differential microphone array.

The response to a plane wave (3.1) of the first-order differential microphone array from Figure 3.5 is given by

$p_d(t) = -2i \sin\!\left(\frac{k}{2}\, (d \cos\alpha + c\, t_d)\right) p\!\left(\mathbf{r},\, t - \frac{t_d}{2}\right)$ ,  (3.11)

where $k$ is the wave number, $d$ the inter-microphone distance, $t_d$ the used delay, and $\mathbf{r}$ the position of the microphone array's center (the mid-point between the two microphones).
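The response (3.11) can be checked numerically by combining the two microphone signals directly, as in the following sketch (ours; it assumes the ideal free-field plane-wave model above, with the microphones at $\pm d/2$ on the array axis).

```python
# A small numerical sketch (ours) of the two-microphone/delay array of
# Figure 3.5: m1(t) - m2(t - t_d) for a plane wave from angle alpha.
import numpy as np

c = 343.0
d = 0.02                       # inter-microphone distance [m]
t_d = d/c                      # delay for a cardioid (d/t_d = c, Table 3.1)

def response(f, alpha):
    """Complex array response at frequency f and arrival angle alpha."""
    k = 2*np.pi*f/c
    m1 = np.exp(-1j*k*(d/2)*np.cos(alpha))                       # mic at +d/2
    m2 = np.exp(+1j*k*(d/2)*np.cos(alpha))*np.exp(1j*2*np.pi*f*t_d)
    return m1 - m2

alpha = np.linspace(0, 2*np.pi, 361)
for f in (300.0, 3000.0, 7000.0):          # the frequencies of Figure 3.6
    r = np.abs(response(f, alpha))
    # at 300 Hz, r is proportional to the cardioid (1 + cos(alpha))/2;
    # at 7 kHz the shape visibly deviates, as discussed below.
```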
At low frequencies, (3.11) can be approximated by

$p_d(t) \approx -ik\, (d \cos\alpha + c\, t_d)\; p\!\left(\mathbf{r},\, t - \frac{t_d}{2}\right)$ .  (3.12)

From (3.12), one can see that the ratio $d/t_d$ determines the directional response of a practical differential microphone array in the same way the ratio $\rho_u/u_t$ determines the directional characteristic of the spatio-temporal derivative of a plane wave, given in (3.6).

Figure 3.6: Directional responses at different frequencies (f = 300, 3000, and 7000 Hz) of first-order differential microphone arrays realized as shown in Figure 3.5, with $d = 2$ cm and $t_d = d/c$ (cardioid), $t_d = \frac{d(\sqrt{3}-1)}{c(3-\sqrt{3})}$ (super-cardioid), and $t_d = \frac{d}{3c}$ (hyper-cardioid).

Figure 3.6 shows the directional responses of the practical cardioid, super-cardioid, and hyper-cardioid microphones realized with the microphone combination shown in Figure 3.5, with $d = 2$ cm. From Figure 3.6, it can be seen that the shape of the directional responses of first-order microphone arrays is frequency-dependent, and that it corresponds to the desired responses, shown in Figure 3.3, only at low frequencies. Above the aliasing frequency, which depends on the inter-microphone distance $d$ and the used delay $t_d$, the directional responses deviate from the desired ones, as can be observed in Figure 3.6 for the frequency $f = 7000$ Hz.

Second-order differential microphone arrays

In this part, it is shown how the clover-leaf directional responses $\sin 2\alpha$ and $\cos 2\alpha$ can be obtained in two different ways based on the analysis from the beginning of this section.

Clover-leaf response $\sin 2\alpha$: quadrupole microphone array. The clover-leaf directional response $\sin 2\alpha$ can be represented as a product of the directional responses of two spatial derivatives of a plane-wave sound field: the spatial derivative along the $x$-axis, which has the directional response $\cos\alpha$, and the spatial derivative along the $y$-axis, which has the directional response $\sin\alpha$ (or $\cos(\alpha - \frac{\pi}{2})$). As such, the directional response $\sin 2\alpha$ can be realized as a cascade of two spatial derivative approximations: first along the $x$-axis, and then along the $y$-axis (or vice versa).

Figure 3.7: Quadrupole microphone array used for obtaining a clover-leaf directional response of the form $\sin 2\alpha$.

Figure 3.7 illustrates a configuration of four pressure microphones used to approximate the previously described cascade of spatial derivatives of a sound field. Figure 3.8 shows the directional responses at various frequencies of the quadrupole microphone array shown in Figure 3.7, when an inter-microphone distance $d = 2$ cm is used.

Figure 3.8: Directional responses at various frequencies (f = 300, 3000, and 15000 Hz) of the quadrupole microphone array shown in Figure 3.7, with inter-microphone distance $d = 2$ cm.

Clover-leaf response $\cos 2\alpha$: three-microphone line array. The clover-leaf directional response of the form $\cos 2\alpha$ can be represented as

$\cos 2\alpha = 2 \cos^2\alpha - 1$ ,  (3.13)

or equivalently, as

$\cos 2\alpha = (\sqrt{2} \cos\alpha - 1)(\sqrt{2} \cos\alpha + 1)$ ,  (3.14)

which is a product of the directional characteristics of two first-order spatio-temporal derivatives of a plane-wave sound pressure field.
Consequently, the response $\cos 2\alpha$ can be obtained by a combination of two spatio-temporal derivatives, one with $\rho_u/u_t = -\sqrt{2}\, c$ and the other with $\rho_u/u_t = \sqrt{2}\, c$, or equivalently, by two spatio-temporal finite differences, the first with $d/t_d = -\sqrt{2}\, c$ and the second with $d/t_d = \sqrt{2}\, c$ (or vice versa), as shown in Figure 3.9.

Figure 3.9: A line array with three microphones used to obtain the clover-leaf directional response $\cos 2\alpha$.

Figure 3.10 shows the directional responses at different frequencies of the microphone array shown in Figure 3.9, with the inter-microphone distance $d = 2$ cm and delay $t_d = \frac{d}{\sqrt{2}\, c}$.

Figure 3.10: Directional responses at various frequencies (f = 300, 3000, and 7000 Hz) of the microphone array shown in Figure 3.9, with inter-microphone distance $d = 2$ cm and inter-microphone delay $t_d = \frac{d}{\sqrt{2}\, c}$.

Like the first-order differential microphone arrays, the second-order differential microphone arrays have a frequency-dependent directional response. It is a good approximation of the desired response only below the aliasing frequency. This can be observed in Figure 3.10, which shows how the shape of the directional response of the microphone array from Figure 3.9 deforms at the frequency $f = 7000$ Hz. Note that the directional response of the form $\sin 2\alpha$ can also be obtained by rotating the microphone array from Figure 3.9 by $45°$.

3.3 Directional microphone arrays as acoustic beamformers

With the advent of powerful computational devices and the advances in the field of numerical mathematics, discretizing continuous problems has made it possible to solve, at least approximately, problems previously considered hard. The same holds for the problem of microphone array design. We have already seen how beamforming is used for obtaining a desired directional response with an array of transducers, each having its own directional response, possibly different from the others.

Suppose that one is given an array of microphones with the task of capturing sound with a desired directional response in the $xy$-plane. Using the presented spatio-temporal gradient analysis for differential microphone arrays, and assuming symmetry of the directional response relative to the array's main axis (symmetry relative to the array's axis makes the directional response an even function), one would need to factor the trigonometric polynomial that describes the desired directional response,

$d(\alpha) = P(\cos\alpha) = \sum_{n=0}^{N} a_n (\cos\alpha)^n$ ,  (3.15)

and build a differential microphone array cascade that approximates the desired response. In addition to building the said cascade of microphones and delay elements, one needs to post-equalize its high-pass frequency characteristic.

Instead of the mentioned set of steps, one could choose to assemble $M$ microphones into a desired topology (for instance, microphones can form a linear or rectangular grid, or be distributed on a circle), compute or measure the responses from each microphone toward $N$ control points on a large circle enclosing the array (the number of control points should be larger than the angular bandwidth of $d(\alpha)$ and also of the directional responses of the used microphones), and solve the following array design problem:

minimize $\| G(\omega)\, H(\omega) - D \|_2$
subject to $|H(\omega)| \preceq H_{max}\, 1_{M\times 1}$ ,  (3.16)

which can be expressed as a quadratically constrained quadratic program and solved using an interior point method (Lebret, 1996). In (3.16), $G(\omega)$ is an $N \times M$ matrix with frequency responses between the microphones and control points, $D$ an $N \times 1$ vector containing the desired directional response discretized at the control points, $H_{max}$ the maximum allowed filter gain, and $H(\omega)$ an $M \times 1$ vector with the microphone array beamformer post-filters.
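For illustration, (3.16) can be prototyped per frequency with a convex-optimization modeling package. The sketch below (ours; the thesis does not prescribe an implementation) uses the cvxpy package and an idealized free-field $G(\omega)$ for three omnidirectional microphones on a line, whereas in practice $G(\omega)$ would be measured.

```python
# A hedged sketch (ours) of the per-frequency beamformer design (3.16)
# using cvxpy; G is modeled analytically here instead of being measured.
import numpy as np
import cvxpy as cp

c = 343.0
d = 0.02
mic_x = np.array([-d, 0.0, d])                     # three-element line array
phi = np.linspace(0, 2*np.pi, 7, endpoint=False)   # N = 7 control directions
H_max = 10**(20/20)                                # 20 dB maximum filter gain

def design_filters(f, desired):
    k = 2*np.pi*f/c
    # N x M plane-wave responses, phase-referenced to the middle microphone
    G = np.exp(-1j*k*np.outer(np.cos(phi), mic_x))
    D = desired(phi)
    H = cp.Variable(len(mic_x), complex=True)
    cp.Problem(cp.Minimize(cp.norm(G @ H - D, 2)),
               [cp.abs(H) <= H_max]).solve()
    return H.value

# Example: post-filters for the dipole response d2(phi) = cos(phi) at 1 kHz
H_dipole = design_filters(1000.0, np.cos)
```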
Provided that the number of control points is sufficiently large, the obtained microphone post-filters optimally synthesize $d(\alpha)$ given the constraints.

In this section, we show how the latter approach for computing a set of beamformer filters can be used to obtain various directional responses with a simple linear microphone array consisting of three uniformly-spaced pressure microphones, shown in Figure 3.11.

Figure 3.11: Directional sound acquisition with a linear microphone array through beamforming.

The microphone array from Figure 3.11 is used to synthesize the following four directional responses:

$d_1(\phi) = 1$, $d_2(\phi) = \cos\phi$, $d_3(\phi) = \cos(2\phi)$, $d_4(\phi) = (\cos\phi)^2$ .  (3.17)

Each directional response is specified relative to the location of the middle microphone (microphone 2). The beamformer filters are computed using (3.16), with $H_{max} = 20$ dB and $N = 7$ uniformly-spaced azimuthal control directions. The resulting directional responses at different frequencies (f = 300, 3000, and 7000 Hz) are shown in Figure 3.12, while the same directional responses normalized by their maximum values are shown in Figure 3.13.

Figure 3.12: Directional responses of the three-element line microphone array from Figure 3.11 obtained with an optimized beamformer.

Figure 3.13: Directional responses of the three-element line microphone array from Figure 3.11 obtained with an optimized beamformer. The directional responses are normalized such that the maximum value corresponds to 0 dB.

As one would expect, the best way to obtain the omnidirectional response $d_1(\phi)$ in the center of the array is to take the signal from the middle microphone, which is what the beamformer optimization procedure ends up doing. This is verified in Figure 3.14(a). The obtained directional response corresponds to the desired one at all frequencies.

The dipole response $d_2(\phi)$ in the center of the array can be obtained by combining the two outer pressure microphones. Inspecting the frequency responses of the microphone post-filters $H_1(\omega)$, $H_2(\omega)$, and $H_3(\omega)$ in Figure 3.14(b), one can see that the beamformer design effectively does this. The high-frequency directional response degenerates due to the high-order aliased harmonic terms, and the aliasing frequency is determined by the distance $2d$ between microphones 1 and 3.

The second-order circular harmonic response $d_3(\phi)$ requires combining all three microphones, as with the differential microphone array described in Section 3.2.3. This can be seen in Figure 3.14(c). Figure 3.13 shows that the obtained directional response has the correct shape at all inspected frequencies.
However, the directional response at low frequencies is highly attenuated due to the maximum filter gain constraint, as can be observed in Figure 3.12.

Finally, the directional response $d_4(\phi)$ is matched well in shape at high frequencies. Since it can be decomposed into a sum of an omnidirectional and a second-order term,

$(\cos\phi)^2 = \frac{1 + \cos(2\phi)}{2}$ ,  (3.18)

the low-frequency directional response obtained with (3.16) can only match the omnidirectional part due to the maximum-gain constraint, which is apparent in Figure 3.12. From Figure 3.14(d), one can see that all three microphones are used, and that the constraint is effective up to above 1.5 kHz.

3.3.1 Discussion

The examples shown for a simple three-microphone line array make it evident that one can approach the problem of directional sound acquisition in the discrete-space or discrete-direction domain and obtain optimal solutions with a convex optimization solver (e.g., CVX by Grant et al. (Grant et al., 2011)). The advantage of stating the problem in a discrete form lies in the added flexibility. For instance, one can add physical constraints, such as a limit on the filters' gains to prevent high noise sensitivity, and obtain the optimal solution under these constraints. Additionally, the discrete problem requires knowledge of the microphones' directional responses in only a few points, which can be obtained with an anechoic measurement procedure. In this way, one avoids sensitivity to the microphones' imperfect characteristics and miscalibration.

3.4 Conclusions

This chapter presented an analysis of a plane-wave sound pressure field as a multivariate function of spatial location and time. In the light of this analysis, gradient and differential microphone arrays appear as devices for approximating the sound pressure field's spatio-temporal derivatives. They thus conceptually realize the same functionality, with the former being a special case of the latter. The presented analysis framework enables not only analyzing the response of a given gradient or differential microphone array, but can also be used for designing differential microphone arrays. Appropriate adjustment of the microphone array's parameters, such as the array orientation and shape, the inter-microphone distances, and the microphone signal delays, enables meeting the desired response requirements.

We also showed how the traditional directional microphone array design can be stated as a space- and frequency-discrete beamformer design problem and optimally solved using efficient numerical procedures. The beamformer design problem offers additional flexibility, such as relying on measured rather than theoretical microphone directional responses, or constraining the gains of the microphone post-filters in order to prevent high noise sensitivity.

Figure 3.14: Beamformer filters used to synthesize the omnidirectional response $d_1(\phi) = 1$ (a), the dipole response $d_2(\phi) = \cos\phi$ (b), the clover-leaf response $d_3(\phi) = \cos(2\phi)$ (c), and the squared-cosine response $d_4(\phi) = (\cos\phi)^2$ (d).
Chapter 4

Microphone Arrays For Sound Field Capture

4.1 Introduction

4.1.1 Background

This chapter treats microphone arrays which have received substantial attention in recent years, with a wide range of applications commonly termed sound field capture. One obvious application of sound field capture is recording for high-fidelity spatial sound reproduction, and the microphone arrays built for this purpose are commonly denoted as sound field microphones. Loosely speaking, the Blumlein XY microphone pair is a predecessor of sound field microphones, since the combination of the two microphones' directional responses, the so-called microphone encoding functions (Poletti, 1996), encodes the captured sources in a way that naturally translates to two-channel intensity stereo (Lipshitz, 1986). The first more comprehensive microphone array for sound field capture was the Soundfield microphone (Gerzon, 1975; Farrar, 1979), commonly associated with Ambisonics and the efforts to decouple the recording and reproduction stages in the production of spatial sound (Furness, 1990). Sound field microphones of higher order have been proposed more recently by various authors (e.g., see Cotterell, 2002; Abhayapala and Ward, 2002; Meyer and Elko, 2002; Bertet et al., 2006). However, due to their limited capabilities as wide-band devices, higher-order sound field microphones are not widely used in live recording practice.

Another application of microphone arrays for sound field capture is the analysis of auditory scenes (Teutsch and Kellermann, 2005, 2006) and acoustic spaces (Rafaely et al., 2007a) in terms of the directions and number of active or passive sound sources (wall reflections, for instance, are considered to be passive sound sources). The decomposition of a sound field in terms of circular or spherical harmonic components, discussed in Section 4.2, serves as the front-end of the aforementioned analysis. It turns the sound localization problem into a formulation equivalent to harmonic spectral analysis (Stoica and Moses, 1997), which is then solved using some of the well-known spectral estimation methods.

Finally, sound field microphones can be used for the directional sound capture discussed in Chapter 3. In essence, once the captured sound field is decomposed into a set of circular or spherical harmonic components, one can combine these components with so-called modal beamforming (Meyer and Elko, 2002) to obtain an arbitrary directional response of the order defined by the fidelity of the microphone array.

4.1.2 Chapter outline

In Section 4.2, we present ways to decompose two-dimensional ("horizontal") and three-dimensional sound fields in terms of circular and spherical harmonic components, respectively. As it turns out, these decompositions possess a straightforward relation with directional sound capture, and give a way to record a sound field using differential, circular, or spherical microphone arrays, with an angular resolution controlled by the complexity of the used microphone array. Section 4.3 shows ways to capture harmonic components of a horizontal sound field using microphone arrays built from gradient microphone capsules. Section 4.4 presents circular microphone arrays, both with and without a baffle, as devices for sampling a sound field on a circle and thereby capturing its circular harmonic components.
Similarly, Section 4.5 presents unbaffled and baffled spherical microphone arrays, which are used for obtaining spherical harmonic components of a three-dimensional sound field. Section 4.6 gives an analysis that focuses on the array-processing aspect of microphone arrays for sound field capture, where responses of different microphones are combined to optimally achieve a desired sound field capture. We show that the advantages offered by the array-design approach, underlined in Chapter 3, hold for sound field microphone arrays, and we show a design of Soundfield microphone non-coincidence correction filters that corroborates this observation. Conclusions are given in Section 4.7.

4.2 Wave field decomposition

The task of recording a sound field is to capture a representation of it which is informative enough to allow for analyzing its spatial characteristics, like the number and distribution of active sound sources. It should also give rise to a decoding procedure that would, based on the used reproduction setup, produce the signals to be fed to the loudspeakers, such that the captured sound field is reproduced with high accuracy over a desired listening area.

The microphone setup most appropriate for capturing a sound field depends on the particular spatial acoustic scenario. For instance, if sound sources are distributed in arbitrary directions in 3D space, spherical microphone array topologies seem most suited. For planar problems, where sound sources are coplanar, circular arrays provide the most intuitive option.

In Chapter 2, a common pattern appeared in all the sound radiation problems, be it in different coordinates, in free field or in rooms. Each geometry gave rise to a set of eigenfunctions of the wave equation. The particular solutions, resulting from the radiation of active sound sources in a considered geometry, were expressed as a superposition of the mentioned eigenfunctions.

Without knowledge of the problem geometry and the distribution of sound sources, nothing can be said about the sound field in a region of interest unless one covers the entire region with microphones and acquires the sound pressure field $p(\mathbf{r}, t)$. Knowledge of the geometry goes a step further, allowing for sampling the sound field more economically while retaining sufficient information for its perfect reconstruction. This usually requires microphone apertures of continuous type, which to date have not been built in practice. In order to be practical, one has to use an array of discrete microphone elements. The topology of the array is dictated by the geometry of the acoustic problem, while the sound field acquisition fidelity depends on the number of used microphones, the number of active sources, and the frequency range of interest.

4.2.1 Horizontal sound field decomposition

In problems of sound capture and reproduction, it is very common to focus only on the horizontal-plane analysis. This is characteristic of the first stereo systems (Blumlein, 1931; Snow, 1955), Quadraphonics (Bauer, 1974), some matrix surround systems (e.g., see ITU-775, 1994), and planar Ambisonics (Furness, 1990) and Wave Field Synthesis (Berkhout et al., 1993). The same can be said of the sound recording techniques used for providing content for the mentioned multichannel systems. In many cases, the assumption that a single-plane analysis of sound suffices is not far from reality.
Whether listeners are in the open air or in a room, most sound sources are usually within a small height difference of the listeners' ears. Thus, the main sound localization task one faces in everyday life happens roughly in what is referred to as the horizontal plane, where the human localization ability is indeed most effective (Blauert, 1997).

In the following analyses, the sources are assumed to be sufficiently far away from the analyzed region for the far-field conditions to hold, i.e., the sources can be modeled as plane-wave radiators. Since the sound field is analyzed in the horizontal, $xy$-plane, it can safely be assumed that propagation happens perpendicularly to the $z$-axis, and that the $z$-component of the wave vectors vanishes ($k_z = 0$).

Angular spectrum

The angular spectrum, described in Section 2.4.2, gives a plane-wave description of a sound field in a plane, and is a viable tool for analyzing a horizontal sound field. Let $P(k_x, k_y, \omega)$ denote the angular spectrum in the $xy$-plane (the $z$-coordinate has been omitted, since a horizontal sound field is independent of it). The sound pressure field expressed through the angular spectrum, given by (2.30), simplifies in this case to

$P(x, y, \omega) = \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} P(k_x, k_y, \omega)\, e^{-i(k_x x + k_y y)}\, dk_x\, dk_y$ .  (4.1)

Note the sign change in the complex exponential due to the direction reversal of the wave vector $\mathbf{k}$ such that it points toward the direction of sound arrival.

Let the angular spectrum be expressed in polar coordinates, using $k_x = k_r \cos\varphi$ and $k_y = k_r \sin\varphi$. If the same is done with the sound pressure field, using $x = r\cos\phi$ and $y = r\sin\phi$, (4.1) becomes

$P(r, \phi, \omega) = \frac{1}{4\pi^2} \int_0^{2\pi}\!\!\int_0^{\infty} P(k_r, \varphi, \omega)\, e^{-i k_r r \cos(\phi - \varphi)}\, k_r\, dk_r\, d\varphi$ .  (4.2)

Furthermore, due to the one-to-one relation between the frequency $\omega$ and the radial wave number $k_r$, the angular spectrum $P(k_r, \varphi, \omega)$ takes a non-zero value only when $k_r = \omega/c$. This can be represented by

$P(k_r, \varphi, \omega) = 2\pi\, \delta\!\left(k_r - \frac{\omega}{c}\right) P(\varphi, \omega)$ .  (4.3)

We call $P(\varphi, \omega)$ the horizontal angular spectrum. Finally, the exponential term $e^{-ik_r r\cos(\phi-\varphi)}$ admits the Jacobi-Anger expansion (Abramowitz and Stegun, 1976)

$e^{-ik_r r \cos(\phi - \varphi)} = \sum_{n=-\infty}^{\infty} (-i)^n J_n(k_r r)\, e^{in(\phi - \varphi)}$ .  (4.4)

Substituting (4.3) and (4.4) into (4.2) leads to

$P(r, \phi, \omega) = \frac{\omega}{c} \sum_{n=-\infty}^{\infty} (-i)^n P_n(\omega)\, J_n(k_r r)\, e^{in\phi}$ ,  (4.5)

where $P_n(\omega)$ is the $n$th Fourier series, or circular harmonic (the complex exponential functions $e^{in\phi}$ are also called circular harmonics), coefficient of the $2\pi$-periodic horizontal angular spectrum $P(\phi, \omega)$, given by

$P_n(\omega) = \frac{1}{2\pi} \int_0^{2\pi} P(\varphi, \omega)\, e^{-in\varphi}\, d\varphi$ ,  (4.6)

with

$P(\phi, \omega) = \sum_{n=-\infty}^{\infty} P_n(\omega)\, e^{in\phi}$ .  (4.7)

Let us first come back to the notion of a directional microphone. In Section 2.8, it was mentioned that the directional response of a microphone, $d(\theta, \phi)$, determines with which (complex) gain the microphone reacts to a plane wave coming from a given direction $(\theta, \phi)$. If a directional microphone captures a sound field composed of a continuum of plane waves, it actually integrates the angular spectrum weighted by its directional response. For a horizontal sound field, the signal captured by a directional microphone can thus be expressed by

$S(\omega) = \int_0^{2\pi} P(\varphi, \omega)\, d(\varphi)\, d\varphi$ ,  (4.8)

where $d(\varphi)$ is the directional response of the microphone in the horizontal plane.
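The expansion (4.4) and the coefficient extraction (4.6) are easy to verify numerically: sampling a plane wave on a circle and taking an FFT recovers the coefficients $(-i)^n J_n(k_r r)\, e^{-in\varphi}$. The following sketch (ours, with arbitrarily chosen parameters) performs this check.

```python
# A numerical check (ours) of the circular harmonic expansion (4.4):
# FFT coefficients of a plane wave on a circle vs. the closed form.
import numpy as np
from scipy.special import jv

c, f = 343.0, 1000.0
k_r = 2*np.pi*f/c
r = 0.1                      # observation circle radius [m]
N = 64                       # angular samples (plenty, so aliasing is negligible)
phi = 2*np.pi*np.arange(N)/N
varphi = 0.3                 # plane-wave arrival azimuth [rad]

p = np.exp(-1j*k_r*r*np.cos(phi - varphi))    # pressure on the circle
coeffs = np.fft.fft(p)/N                      # empirical coefficients of e^{in phi}

n = 3
analytic = (-1j)**n * jv(n, k_r*r) * np.exp(-1j*n*varphi)
print(np.allclose(coeffs[n], analytic))       # True
```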
One can now make the following two observations about the horizontal sound field expressed in (4.5):

• The Fourier series coefficients $P_n(\omega)$ of the angular spectrum are a complete representation of a horizontal sound field.
• A comparison between (4.6) and (4.8) reveals that for obtaining a full description of a horizontal sound field, it suffices to have coincident recordings from directional microphones whose directional responses are of the form $e^{-in\phi}$, for $n \in \mathbb{Z}$.

An equivalent and physically more meaningful combination of directional responses involves cosines and sines, $\cos(n\phi)$ and $\sin(n\phi)$, with $n \in \mathbb{Z}^+ \cup \{0\}$. In that case, the Fourier series expansion of the horizontal angular spectrum takes the form

$P(\phi, \omega) = A_0(\omega) + \sum_{n=1}^{\infty} A_n(\omega) \cos(n\phi) + \sum_{n=1}^{\infty} B_n(\omega) \sin(n\phi)$ ,  (4.9)

with

$A_n(\omega) = \frac{\epsilon_n}{\pi} \int_0^{2\pi} P(\varphi, \omega)\, \cos(n\varphi)\, d\varphi$ ,
$B_n(\omega) = \frac{\epsilon_n}{\pi} \int_0^{2\pi} P(\varphi, \omega)\, \sin(n\varphi)\, d\varphi$ ,
$\epsilon_n = \begin{cases} \frac{1}{2}, & n = 0 \\ 1, & n > 0 \end{cases}$ .  (4.10)

Helical wave spectrum

Under the scenario characteristic of a horizontal sound field, cylindrical waves arise as a natural analysis framework. In particular, the interior value problem in cylindrical coordinates, described in Section 2.5.1, corresponds exactly to the horizontal sound field of far-field sound sources. Recall equation (2.38), which expresses the sound field inside an infinite source-free cylinder of radius $b$ whose axis coincides with the $z$-axis (again, due to the direction reversal of the wave vector $\mathbf{k}$, the signs inside the complex exponentials are changed):

$P(r, \phi, z, \omega) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{in\phi} \int_{-\infty}^{\infty} C_n(k_z, \omega)\, e^{-ik_z z}\, J_n(k_r r)\, dk_z$ .  (4.11)

A horizontal sound field is identical in any plane normal to the $z$-axis. This makes the term inside the integral in (4.11) non-zero only when $k_z = 0$, which can be represented by

$C_n(k_z, \omega) = 2\pi\, \delta(k_z)\, C_n(\omega)$ .  (4.12)

Replacing (4.12) into (4.11) yields the following cylindrical wave expansion:

$P(r, \phi, \omega) = \sum_{n=-\infty}^{\infty} C_n(\omega)\, J_n(k_r r)\, e^{in\phi}$  (4.13)
$\quad\quad = \sum_{n=-\infty}^{\infty} P_n(r, \omega)\, e^{in\phi}$ ,  (4.14)

with $k_r = \omega/c$ and $P_n(r, \omega) = C_n(\omega)\, J_n(k_r r)$. Similar to the helical wave spectrum defined for the exterior boundary value problem in Section 2.5.2, we denote $P_n(r, \omega)$ as the horizontal helical wave spectrum.

Note the separation between the spatial variables $r$ and $\phi$ in (4.13). The expansion coefficients $C_n(\omega)$ are independent of the two spatial coordinates, and provide a sufficient description of a horizontal sound field. There is a strong similarity between the representation obtained from the angular spectrum, expressed in (4.5), and the horizontal helical wave spectrum (4.13). Both express a circular harmonic expansion, and the two sets of coefficients, $C_n(\omega)$ and $P_n(\omega)$, are related through

$C_n(\omega) = (-i)^n\, \frac{\omega}{c}\, P_n(\omega)$ .  (4.15)

Capturing the coefficients $C_n(\omega)$ requires a continuous circular aperture of an arbitrary radius $b$, where each point on the aperture acts as an omnidirectional pressure-sensing element. Using the orthogonality of the circular harmonics $e^{in\phi}$, the measurement amounts to applying a circular harmonic weighting function and summing the contributions along the aperture:

$C_n(\omega) = \frac{1}{2\pi\, J_n(k_r b)} \int_0^{2\pi} P(b, \varphi, \omega)\, e^{-in\varphi}\, d\varphi$ .  (4.16)

Note that there is a problem with the microphone equalization factor $1/J_n(k_r b)$: the Bessel functions $J_n(\cdot)$ have oscillatory behavior around zero, meaning that for some frequencies the expansion coefficients $C_n(\omega)$ are unobtainable, or can only be obtained with high sensitivity to noise.
This problem, as described later in the chapter, can be circumvented by using directional apertures (i.e., with a cardioid directional response) or by mounting microphones on a rigid cylindrical baffle. Another way of addressing the problem is a combination of concentric microphone apertures (e.g., two concentric apertures).

4.2.2 Three-dimensional sound field decomposition

In the most general case, nothing can be assumed about the locations of sound sources, apart from the fact that they enclose the volume where the sound field is analyzed, i.e., the listening volume. These circumstances do not provide any simplifications, such as those in the case of a horizontal sound field. From the perspective of sound field capture, the most appropriate analysis framework involves spherical waves, and more specifically, the interior value problem in spherical coordinates, described in Section 2.6.1.

Recall that the sound field inside a source-free sphere of radius $b$ centered at the origin is given by

$P(r, \theta, \phi, \omega) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} C_{mn}(\omega)\, j_n(kr)\, Y_n^m(\theta, \phi)$ ,  (4.17)

with $k = \omega/c$ and $r \le b$. In order to come up with a full description of the sound field inside the sphere, it suffices to obtain the expansion coefficients $C_{mn}(\omega)$. Similarly to the acquisition of the helical wave spectrum of a horizontal sound field, the spherical spectrum of a three-dimensional sound field can be captured with a continuous spherical aperture of radius $b$, where each point on the aperture acts as a sound pressure sensor. Using the orthogonality of the spherical harmonics $Y_n^m(\theta, \phi)$,

$\int_0^{\pi}\!\!\int_0^{2\pi} Y_n^m(\theta, \phi)\, Y_{n'}^{m'}(\theta, \phi)^*\, \sin\theta\, d\phi\, d\theta = \begin{cases} 1, & n = n',\ m = m' \\ 0, & \text{otherwise} \end{cases}$ ,  (4.18)

the expansion coefficients $C_{mn}(\omega)$ can be obtained by applying spherical harmonic weighting functions $Y_n^m(\theta, \phi)^*$ and summing the contributions around the aperture:

$C_{mn}(\omega) = \frac{1}{j_n(\frac{\omega}{c} b)} \int_0^{\pi}\!\!\int_0^{2\pi} P(b, \theta, \phi, \omega)\, Y_n^{m\,*}(\theta, \phi)\, \sin\theta\, d\phi\, d\theta$ .  (4.19)

Note that the expansion coefficients $C_{mn}(\omega)$ obtained by (4.19) can be used to extrapolate the sound field outside the sphere of radius $b$, as long as the volume where the sound field is extrapolated is free from sound sources.

Similarly to the case of circular apertures, the expansion coefficients $C_{mn}(\omega)$ obtained from spherical apertures with (4.19) suffer from ill-conditioning at frequencies where the spherical Bessel function $j_n(\cdot)$ takes zero or very small values. This problem is circumvented by using directional apertures or by mounting an aperture on a rigid baffle, as described later in this chapter. Another way to avoid the ill-conditioning involves using two concentric spherical apertures (Rafaely et al., 2007a).

4.3 Measuring a horizontal sound field with gradient microphone arrays

Conceptually, a gradient microphone of order $n$ has a directional response proportional to $(\cos\phi)^n$, where $\phi$ is the angle of arrival of an incoming wave relative to the microphone's look direction. As mentioned in Chapter 3, gradient microphone arrays are a particular case of the differential microphone arrays described in Section 3.2. Their directional response approximates the desired $(\cos\phi)^n$ up to a given aliasing frequency, which depends on the inter-microphone spacing.

In the previous section, it was shown how one can record a horizontal sound field with coincident microphones whose directional responses are $\sin(n\phi)$ and $\cos(n\phi)$.
In this section, we show how gradient microphones of different orders can be used to achieve the same goal, i.e., to acquire the Fourier series coefficients $A_n(\omega)$ and $B_n(\omega)$ of the horizontal angular spectrum $P(\phi, \omega)$.

For gradient microphones of orders zero and one, the directivity patterns are equal (and hence equivalent) to the circular harmonics of orders zero and one. For orders higher than one, one can use the definition of the Chebyshev polynomials of the first kind (Abramowitz and Stegun, 1976),

$T_n(\cos\theta) = \cos(n\theta)$ ,  (4.20)

for obtaining the representation of circular harmonics in terms of the gradient microphones' directional responses $(\cos\theta)^m$. This observation is formalized in the following two propositions:

Proposition 1. The directivity pattern of the form $\cos(n\theta)$ can be obtained as a linear combination of directivity patterns $(\cos\theta)^m$ of different orders $m$, where $m \le n$.

Proof. The proof follows directly from the definition of the Chebyshev polynomial of the first kind, given in (4.20).

Proposition 2. The directivity pattern of the form $\sin(n\theta)$ can be obtained as a linear combination of the directivity patterns $(\cos(\theta - \frac{\pi}{2n}))^m$ or $(\cos(\theta + \frac{3\pi}{2n}))^m$ of different orders $m$, where $m \le n$.

Proof. Using the identity

$\sin\theta = \cos\!\left(\theta - \frac{\pi}{2}\right)$ ,  (4.21)

$\sin(n\theta)$ can be expressed as a cosine:

$\sin(n\theta) = \cos\!\left(n\theta - \frac{\pi}{2}\right) = \cos\!\left(n\left(\theta - \frac{\pi}{2n}\right)\right)$ .  (4.22)

Equivalently, $\sin(n\theta)$ can be expressed as:

$\sin(n\theta) = \cos\!\left(n\theta + \frac{3\pi}{2}\right) = \cos\!\left(n\left(\theta + \frac{3\pi}{2n}\right)\right)$ .  (4.23)

Applying Proposition 1 to the right side of (4.22) or (4.23) for the angle $(\theta - \frac{\pi}{2n})$ or $(\theta + \frac{3\pi}{2n})$, respectively, gives

$\sin(n\theta) = T_n\!\left(\cos\!\left(\theta - \frac{\pi}{2n}\right)\right)$  (4.24)

and

$\sin(n\theta) = T_n\!\left(\cos\!\left(\theta + \frac{3\pi}{2n}\right)\right)$ ,  (4.25)

which completes the proof.

It should be noted that $(\cos(\theta - \frac{\pi}{2n}))^m$ and $(\cos(\theta + \frac{3\pi}{2n}))^m$ are directivity patterns of gradient microphones of order $m$ whose main axis is rotated relative to the $x$-axis by $\frac{\pi}{2n}$ and $-\frac{3\pi}{2n}$, respectively.

By now, it should be clear how the circular harmonic coefficients $A_n(\omega)$ and $B_n(\omega)$ can be obtained by the use of gradient microphones:

• The circular harmonic coefficient $A_n(\omega)$ can be obtained by linearly combining the outputs of gradient microphones of orders up to and including $n$, whose axes lie along the $x$-axis. The contribution of each gradient microphone is obtained from the definition of the Chebyshev polynomial of the first kind, given in (4.20).
• The circular harmonic coefficient $B_n(\omega)$ can be obtained by linearly combining the outputs of gradient microphones of orders up to and including $n$, whose axes lie along the line that goes through the origin (the microphone's center) and forms the angle $\frac{\pi}{2n}$ or $\frac{3\pi}{2n}$ with the $x$-axis. The contribution of each gradient microphone is obtained by applying expression (4.24) or (4.25), respectively.

4.3.1 Gradient-based horizontal sound field microphones

In order to show how gradient microphone arrays can be utilized for realizing higher-order horizontal sound field microphones, three horizontal sound field configurations, two of the second and one of the third order, are described in detail here. First-order horizontal sound field microphone arrays (e.g., see Elko and Pong, 1997; Merimaa, 2002; Pulkki and Faller, 2006; Kolundžija, 2007) are not discussed here. In the following, we denote by $m_i(t)$ and $M_i(\omega)$ the time- and frequency-domain representations of the signal captured by the microphone with index $i$.
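Before turning to concrete configurations, Propositions 1 and 2 lend themselves to a quick numerical check. The sketch below (ours) obtains the monomial coefficients of the Chebyshev polynomial $T_n$ with numpy and verifies the third-order identities used later in (4.33).

```python
# A numerical check (ours) of Propositions 1 and 2 via Chebyshev polynomials.
import numpy as np
from numpy.polynomial import chebyshev

n = 3
# Monomial coefficients of T_3(x) = 4x^3 - 3x
T3 = np.polynomial.Polynomial(chebyshev.cheb2poly([0]*n + [1]))

theta = np.linspace(0, 2*np.pi, 181)
print(np.allclose(np.cos(n*theta), T3(np.cos(theta))))          # Prop. 1: True

# Prop. 2 (cf. (4.33)): sin(3 theta) = 4 cos(theta + pi/2)^3 + 3 sin(theta)
print(np.allclose(np.sin(3*theta),
                  4*np.cos(theta + np.pi/2)**3 + 3*np.sin(theta)))  # True
```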
(a) Second-order horizontal sound field microphone array with omnidirectional microphones

Figure 4.1: Second-order horizontal sound field microphone array consisting of five pressure microphone elements.

The configuration shown in Figure 4.1 can be used for capturing the Fourier coefficients $A_n(\omega)$ and $B_n(\omega)$ of orders up to two at the point O. These signals are captured in the following way:

• The sound pressure signal $A_0(\omega)$, or the zero-order harmonic, is taken from microphone 1,

$A_0(\omega) = M_1(\omega)$ .  (4.26)

• The first-order Fourier coefficient $A_1(\omega)$, which corresponds to the circular harmonic $\cos\theta$, is obtained by the following combination of signals from microphones 2 and 4:

$A_1(\omega) = (M_2(\omega) - M_4(\omega))\, H_1(\omega)$ ,  (4.27)

where $H_1(\omega)$ is a filter used for equalizing the high-pass frequency characteristic of a first-order gradient microphone array with an inter-microphone distance of $2d$.

• The first-order Fourier coefficient $B_1(\omega)$, which corresponds to the circular harmonic $\sin\theta$, is obtained by combining signals from microphones 2, 3, 4, and 5. Namely, combining signals from microphones 3 and 5 in the way given in (4.27) gives the signal

$C_1(\omega) = (M_5(\omega) - M_3(\omega))\, H_1(\omega)$ ,  (4.28)

whose directional response in the working frequency range is of the form $e_{c1}(\theta) = \cos(\theta - \frac{\pi}{4})$. Furthermore, for obtaining the desired response of the form $\sin\theta$, the signals $A_1(\omega)$ and $C_1(\omega)$ need to be combined in the following way:

$B_1(\omega) = \sqrt{2}\, C_1(\omega) - A_1(\omega)$ .  (4.29)

Note that the first-order Fourier series coefficients $A_1(\omega)$ and $B_1(\omega)$ could also be obtained by combining the microphone pairs (1, 2) and (1, 5). This approach, described in more detail in (Kolundžija, 2007), would allow increasing the aliasing frequency (due to the shorter inter-microphone distance) at the cost of lower low-frequency sensitivity (which translates to higher sensitivity to noise, or a lower signal-to-noise ratio) and response deviations at high frequencies due to non-coincidence.

• The second-order Fourier coefficients $A_2(\omega)$ and $B_2(\omega)$, which correspond to the circular harmonics $\cos 2\theta$ and $\sin 2\theta$, are obtained using the identities

$\cos 2\theta = 2(\cos\theta)^2 - 1$ ,
$\sin 2\theta = 2\left(\cos\!\left(\theta - \frac{\pi}{4}\right)\right)^2 - 1$ ,  (4.30)

which relate the circular harmonics $\cos 2\theta$ and $\sin 2\theta$ to the directivity patterns of a pressure microphone, $e_0(\theta) = 1$, and of second-order gradient microphones, $e_{g2,1}(\theta) = (\cos\theta)^2$ and $e_{g2,2}(\theta) = (\cos(\theta - \frac{\pi}{4}))^2$, respectively. The response $e_0(\theta)$ is obtained from microphone 1, the response $e_{g2,1}(\theta)$ by combining microphones 1, 2, and 4, and the response $e_{g2,2}(\theta)$ by combining microphones 1, 5, and 3. The desired signals are given by

$A_2(\omega) = 2\,[M_2(\omega) - 2M_1(\omega) + M_4(\omega)]\, H_2(\omega) - M_1(\omega)$
$B_2(\omega) = 2\,[M_5(\omega) - 2M_1(\omega) + M_3(\omega)]\, H_2(\omega) - M_1(\omega)$ ,  (4.31)

where $H_2(\omega)$ is an equalization filter for correcting the high-pass frequency characteristic of a second-order gradient microphone array with an inter-microphone spacing of $d$.
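The processing (4.26)-(4.31) can be simulated for a single plane wave, as in the sketch below (ours). The microphone placement is our reading of Figure 4.1, and the equalizers $H_1$ and $H_2$ are idealized low-frequency inverses of the gradient responses, so the sketch is only indicative.

```python
# A sketch (ours) of the second-order processing (4.26)-(4.31) for a single
# plane wave; mic placement is our reading of Figure 4.1, and H1, H2 are
# idealized low-frequency equalizers, so results hold below aliasing only.
import numpy as np

c, d, f = 343.0, 0.01, 500.0
k = 2*np.pi*f/c
H1 = 1/(-2j*k*d)                 # first-order equalizer (spacing 2d)
H2 = -1/(k*d)**2                 # second-order equalizer (spacing d)

def capture(theta):
    # mic 1 at O; mics 2/4 at +/-d on the x-axis; mics 5/3 at +/-d on the
    # 45-degree diagonal (assumed geometry).
    pos = {1: (0, 0), 2: (d, 0), 4: (-d, 0),
           5: (d/np.sqrt(2), d/np.sqrt(2)), 3: (-d/np.sqrt(2), -d/np.sqrt(2))}
    u = np.array([np.cos(theta), np.sin(theta)])
    M = {i: np.exp(-1j*k*np.dot(u, np.array(p))) for i, p in pos.items()}
    A0 = M[1]
    A1 = (M[2] - M[4])*H1                         # (4.27)
    C1 = (M[5] - M[3])*H1                         # (4.28)
    B1 = np.sqrt(2)*C1 - A1                       # (4.29)
    A2 = 2*(M[2] - 2*M[1] + M[4])*H2 - M[1]       # (4.31)
    B2 = 2*(M[5] - 2*M[1] + M[3])*H2 - M[1]
    return A0, A1, B1, A2, B2

theta = 0.7
vals = capture(theta)
# Should approach 1, cos(t), sin(t), cos(2t), sin(2t) at low frequencies:
print(np.round([abs(vals[0])] + [v.real for v in vals[1:]], 3))
```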
(b) Third-order horizontal sound field microphone array with omnidirectional microphones

Figure 4.2: Third-order horizontal sound field microphone configuration with 11 pressure microphone capsules.

The configuration shown in Figure 4.2 can be used for capturing the Fourier series coefficients of the horizontal angular spectrum, $A_n(\omega)$ and $B_n(\omega)$, of orders up to three at the point O. These signals are captured as follows:

• The Fourier coefficients $A_0(\omega)$, $A_1(\omega)$, $A_2(\omega)$, and $B_2(\omega)$ can be captured in the same way as described for the second-order sound field microphone.
• The first-order Fourier coefficient $B_1(\omega)$ is captured more simply by combining signals from microphones 6 and 7, in the same way as the signal $A_1(\omega)$ in (4.27):

$B_1(\omega) = (M_7(\omega) - M_6(\omega))\, H_1(\omega)$ .  (4.32)

• The third-order Fourier coefficients $A_3(\omega)$ and $B_3(\omega)$ are obtained using the identities

$\cos 3\theta = 4(\cos\theta)^3 - 3\cos\theta$ ,
$\sin 3\theta = 4\left(\cos\!\left(\theta + \frac{\pi}{2}\right)\right)^3 + 3\sin\theta$ ,  (4.33)

which relate the circular harmonics $\cos 3\theta$ and $\sin 3\theta$ to the directivity patterns of first-order ($e_{g1,1}(\theta) = \cos\theta$ and $e_{g1,2}(\theta) = \sin\theta$) and third-order ($e_{g3,1}(\theta) = (\cos\theta)^3$ and $e_{g3,2}(\theta) = (\cos(\theta + \frac{\pi}{2}))^3$) gradient microphones. The response $e_{g3,1}(\theta)$ is obtained by combining microphones 2, 4, 8, and 10, and the response $e_{g3,2}(\theta)$ by combining microphones 6, 7, 9, and 11; the ways to obtain the responses $e_{g1,1}(\theta)$ and $e_{g1,2}(\theta)$ are given in (4.27) and (4.32). The desired signals are then given by

$A_3(\omega) = 4\,(M_8(\omega) - 3M_2(\omega) + 3M_4(\omega) - M_{10}(\omega))\, H_3(\omega) - 3\,(M_2(\omega) - M_4(\omega))\, H_1(\omega)$
$B_3(\omega) = 4\,(M_9(\omega) - 3M_6(\omega) + 3M_7(\omega) - M_{11}(\omega))\, H_3(\omega) + 3\,(M_7(\omega) - M_6(\omega))\, H_1(\omega)$ ,  (4.34)

where $H_3(\omega)$ is an equalization filter for correcting the high-pass frequency characteristic of a third-order gradient microphone array with an inter-microphone spacing of $d$.

(c) Second-order horizontal sound field microphone array with omnidirectional and bidirectional microphones

Figure 4.3: Second-order horizontal sound field microphone array consisting of one omnidirectional and four bidirectional microphone elements.

Figure 4.3 shows a configuration similar to the one shown in Figure 4.1, but which uses one pressure microphone capsule in the center O and four bidirectional (figure-of-eight directional response) capsules around the center. Capturing the Fourier series coefficients $A_n(\omega)$ and $B_n(\omega)$ of orders up to two is done in a way similar to the previous horizontal sound field microphones:

• The pressure signal $A_0(\omega)$, or the zero-order harmonic, is taken from microphone 1,

$A_0(\omega) = M_1(\omega)$ .  (4.35)

• The first-order Fourier coefficient $A_1(\omega)$ is obtained by combining signals from microphones 2 and 4 as follows:

$A_1(\omega) = (M_2(\omega) + M_4(\omega))\, G_1(\omega)$ ,  (4.36)

where $G_1(\omega)$ is an equalization filter for correcting the frequency characteristic of the first-order gradient approximation obtained by averaging the responses of two bidirectional microphones spaced at the distance $2d$.

• The first-order Fourier coefficient $B_1(\omega)$ is obtained by combining signals from microphones 2, 3, 4, and 5. Combining signals from microphones 3 and 5 in the way given in (4.36) gives the signal

$C_1(\omega) = (M_5(\omega) + M_3(\omega))\, G_1(\omega)$ ,  (4.37)

whose directivity pattern is of the form $e_{c1}(\theta) = \cos(\theta - \frac{\pi}{4})$. Furthermore, the signals $A_1(\omega)$ and $C_1(\omega)$ are combined in the following way:

$B_1(\omega) = \sqrt{2}\, C_1(\omega) - A_1(\omega)$ .  (4.38)

• The second-order Fourier coefficients $A_2(\omega)$ and $B_2(\omega)$ are obtained using the identities (4.30). The response $e_0(\theta)$ is obtained from microphone 1, the response $e_{g2,1}(\theta)$ by combining microphones 2 and 4, and the response $e_{g2,2}(\theta)$ by combining microphones 5 and 3.
The desired signals are given by

$A_2(\omega) = 2\,(M_2(\omega) - M_4(\omega))\, G_2(\omega) - M_1(\omega)$
$B_2(\omega) = 2\,(M_5(\omega) - M_3(\omega))\, G_2(\omega) - M_1(\omega)$ ,  (4.39)

where $G_2(\omega)$ is an equalization filter used for correcting the frequency characteristic of a second-order gradient microphone array built from two bidirectional microphones spaced at the distance $2d$.

The second-order configuration from Figure 4.3 provides a better signal-to-noise ratio than the one shown in Figure 4.1, even though both use the same number of microphone capsules.

4.4 Circular microphone arrays

In Section 4.2, we briefly described how a continuous circular microphone aperture of radius $a$ can be used for capturing a sufficient representation of a horizontal sound field, i.e., the coefficients $C_n(\omega)$ of the helical wave spectrum (4.13). In practice, obtaining a continuous distribution of sound pressure on a circle is not possible, and a finite number of microphones is used instead.

As mentioned earlier, microphones are modeled as devices for directionally capturing sound waves at a single point. Thus, a circular array of microphones models the sampling of a continuous circular aperture. Additionally, if the used microphones are directional, then the circular microphone array models the sampling of a continuous circular aperture with the same directional response (this only holds if all the microphones have the same directional response and point outward in the radial direction).

The sampling operation is usually associated with band-limited signals, or signals which can be made band-limited by analog filtering prior to sampling. The sound field of a single plane wave, both in free field (4.4) and in the presence of a rigid cylinder (2.61), contains an infinite number of circular harmonics. Hence, it cannot be considered band-limited in the angular domain, and what is more, there are no acoustical filters that would make it such. (Strictly speaking, one can assume that some averaging takes place across the membrane of a practical microphone, which effectively serves as a low-pass filter. However, there is no good model for this effect, and it cannot be used as a reliable way of band-limiting a sound field in the angular domain.) When a signal that is not band-limited needs to be sampled, one can talk about its effective angular bandwidth. The following definition formulates the effective bandwidth of periodic functions used in the rest of this section.

Definition 1. A $2\pi$-periodic function $f(\phi) \in L^2([-\pi, \pi])$, with the Fourier series expansion

$f(\phi) = \sum_{n=-\infty}^{\infty} F_n\, e^{in\phi}$ ,  (4.40)

has $\epsilon$-effective angular bandwidth $N_{eff}$ if and only if

$\frac{\sum_{l=N_{eff}+1}^{\infty} (|F_{-l}|^2 + |F_l|^2)}{\sum_{m=-\infty}^{\infty} |F_m|^2} \le \epsilon$ .  (4.41)

As an example, assume that a horizontal sound field has a $-20$ dB-effective angular bandwidth $N_{eff}$ on a circle of radius $r$. When this sound field is uniformly sampled on the given circle with at least $2N_{eff}$ samples, the aliasing error is below $-20$ dB. For deciding on a sampling scheme, or the number of microphones in the present context, one needs to decide on the fidelity of the sound field acquisition and the amount of aliasing error incurred by sampling. In this case, fidelity denotes the number of circular harmonics of a horizontal sound field.
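Definition 1 translates directly into code. The following sketch (ours) computes the $\epsilon$-effective angular bandwidth of a plane wave observed on a circle of radius $a$, whose circular harmonic coefficients are Bessel functions $J_n(k_r a)$ (cf. (4.4); see also Section 4.4.1 below).

```python
# A direct transcription (ours) of Definition 1, applied to a plane wave
# observed on a circle; its coefficients follow from (4.4).
import numpy as np
from scipy.special import jv

def n_eff(coeff_fn, eps, n_max=200):
    """Smallest N whose out-of-band power fraction is <= eps."""
    n = np.arange(-n_max, n_max + 1)
    power = np.abs(coeff_fn(n))**2
    total = power.sum()
    for N in range(n_max):
        if power[np.abs(n) > N].sum()/total <= eps:
            return N
    return n_max

c, f, a = 343.0, 2000.0, 0.1
k_r_a = 2*np.pi*f/c*a
print(n_eff(lambda n: jv(n, k_r_a), eps=10**(-20/10)))   # -20 dB bandwidth
```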
4.4.1 Continuous circular microphone apertures

Before we present three different circular microphone array types, we consider the more general notion of a continuous circular microphone aperture. In particular, we analyze three different types of circular microphone apertures:

• An unbaffled omnidirectional circular microphone aperture, where any point along the aperture acts as an omnidirectional pressure microphone.
• An unbaffled first-order circular microphone aperture, whose points act as first-order microphones facing radially outward.
• A baffled omnidirectional circular microphone aperture, which is mounted on the surface of an infinite rigid cylindrical baffle. Any point along the aperture acts as an omnidirectional pressure microphone.

(a) Unbaffled omnidirectional circular microphone aperture

Figure 4.4: Unbaffled continuous circular microphone aperture.

Let a continuous omnidirectional circular microphone aperture of radius $a$ be centered at the origin, as shown in Figure 4.4. Furthermore, let a horizontal sound field be composed of a single plane wave with unit magnitude and frequency $\omega$, coming from the direction defined by azimuth $\varphi$,

$P(r, \phi, \omega) = e^{-ik_r r \cos(\varphi - \phi)}$ ,  (4.42)

with $k_r = \omega/c$. The sound pressure along the aperture is given by the Jacobi-Anger expansion (4.4), and has the form

$P(a, \phi, \omega) = \sum_{n=-\infty}^{\infty} (-i)^n J_n(k_r a)\, e^{in(\varphi - \phi)}$ .  (4.43)

The sound pressure field on the aperture is composed of countably many circular harmonics $e^{in(\varphi - \phi)}$. The circular harmonic coefficients are given by the horizontal helical wave spectrum

$P_n(a, \omega) = (-i)^n J_n(k_r a)$ .  (4.44)

Figure 4.5 shows the dependence of the circular harmonic coefficients of different orders on the value of $k_r a$.

Figure 4.5: Magnitude of the circular harmonic coefficients $P_n(a, \omega)$ (orders n = 0 to 4) of a single plane wave along a circle of radius a.

From Figure 4.5, it is apparent that the sound pressure field exhibits more rapid angular changes along larger apertures and at higher frequencies. In other words, low-order harmonics dominate at low frequencies (for small values of the product $k_r a$), while higher-order harmonics reach prominence as the frequency increases. At low frequencies, the gain of each circular harmonic coefficient $P_n(a, \omega)$ grows at the rate of $6n$ dB/oct. Additionally, the high-frequency behavior of the circular harmonic coefficients is described by the large-argument behavior of the Bessel functions, given by (Abramowitz and Stegun, 1976)

$J_n(x) \sim \sqrt{\frac{2}{\pi x}}\, \cos\!\left(x - \frac{n\pi}{2} - \frac{\pi}{4}\right)$ .  (4.45)

Hence, at high frequencies, the magnitudes of the circular harmonic coefficients exhibit ripples whose envelope decays as $\sim \frac{1}{\sqrt{k_r a}}$.

It is also of interest to analyze the effective angular bandwidth of the horizontal helical wave spectrum at different frequencies. Figure 4.6 shows the $-20$ dB-effective angular bandwidth (note that the effective angular bandwidth $N_{eff}$ is integer-valued) of the sound pressure field (4.42) for different values of $k_r a$. From Figure 4.6, it can be observed that the effective angular bandwidth increases linearly with frequency. At low frequencies, most of the power is contained within the first few circular harmonics, and as the frequency or the circle radius increases, one needs more circular harmonic components to represent the captured sound field faithfully.

Figure 4.6: $-20$ dB-effective angular bandwidth of a plane-wave sound field on a circle of radius a centered at the origin.

(b) Unbaffled first-order circular microphone aperture

Consider the same plane-wave sound field given by (4.42).
Let a circular microphone aperture of radius $a$ centered at the origin have a first-order, cardioid-type directional response. This implies that an infinitesimal element of the aperture, located at $(a, \phi)$, has a directional response $d(\varphi) = \alpha + (1 - \alpha)\cos(\varphi - \phi)$, and that the signal along the aperture is given by

$K(a, \phi, \omega) = \left[\alpha + (1 - \alpha)\cos(\varphi - \phi)\right] e^{-ik_r a \cos(\varphi - \phi)} = \left[\alpha + (1 - \alpha)\, i\, \frac{\partial}{\partial(k_r a)}\right] e^{-ik_r a \cos(\varphi - \phi)}$ .  (4.46)

Substituting (4.4) into (4.46) gives

$K(a, \phi, \omega) = \sum_{n=-\infty}^{\infty} (-i)^n \left[\alpha\, J_n(k_r a) + (1 - \alpha)\, i\, J_n'(k_r a)\right] e^{in(\varphi - \phi)}$ .  (4.47)

As in the case of the omnidirectional circular microphone aperture, the signal captured along the first-order one is composed of infinitely many circular harmonics. Figure 4.7 shows the magnitudes of the circular harmonic coefficients $K_n(a, \omega) = (-i)^n \left(\alpha\, J_n(k_r a) + (1 - \alpha)\, i\, J_n'(k_r a)\right)$.

Figure 4.7: Magnitude of the circular harmonic coefficients $K_n(a, \omega)$ (orders n = 0 to 4) of a single plane wave captured with a cardioid circular microphone aperture of radius a and parameter $\alpha = 0.5$.

From Figure 4.7, it can be observed that the circular harmonic coefficients $K_n(a, \omega)$ do not have ripples at high frequencies. This circumvents the ill-conditioning of the equalization needed for obtaining the cylindrical wave expansion coefficients $C_n(\omega)$. Additionally, both the zero- and first-order circular harmonics have constant low-frequency gains, avoiding the noise sensitivity when extracting these two circular harmonics at any frequency. The $-20$ dB-effective angular bandwidth is very similar to that of the omnidirectional circular microphone aperture.

(c) Baffled omnidirectional circular microphone aperture

Figure 4.8: Continuous circular microphone aperture mounted on a rigid cylindrical baffle.

Sound scattering from rigid cylinders was presented in Section 2.5.5. Recall that scattering was analyzed for a single plane wave with $k_z = 0$, which matches the description of a horizontal sound field. If the plane wave arrives from angle $\varphi$, the sound field in the presence of a rigid cylindrical scatterer of radius $a$ is given by (2.61). It then follows that the sound pressure field observed on an omnidirectional circular microphone aperture mounted on the surface of an infinite rigid cylinder of radius $a$, shown in Figure 4.8, is given by (note the sign change in the term $i^n$, which is a consequence of changing the direction of the wave vector $\mathbf{k}$)

$P(a, \phi, \omega) = \sum_{n=-\infty}^{\infty} (-i)^n \left[J_n(ka) - \frac{J_n'(ka)}{H_n'(ka)}\, H_n(ka)\right] e^{in(\varphi - \phi)}$ .  (4.48)

Compared to the free-field case (4.44), the circular harmonic coefficients

$P_n(a, \omega) = (-i)^n \left[J_n(ka) - \frac{J_n'(ka)}{H_n'(ka)}\, H_n(ka)\right]$

contain the additional term $\frac{J_n'(ka)}{H_n'(ka)} H_n(ka)$, which denotes the contribution of the scattered sound field. Figure 4.9 illustrates the magnitude of the circular harmonic coefficients $P_n(a, \omega)$ for different values of the wave-number-radius product $k_r a$.

Figure 4.9: Magnitude of the circular harmonic coefficients $P_n(a, \omega)$ (orders n = 0 to 4) of a single plane wave on the surface of a rigid cylindrical baffle.

From Figure 4.9, it is apparent that the circular harmonic coefficients do not oscillate at high frequencies, but roughly follow the Bessel function envelope $\sim \frac{1}{\sqrt{k_r a}}$. As in the case of a cardioid aperture, these functions circumvent the ill-conditioning associated with equalizing Bessel functions when extracting the cylindrical wave expansion coefficients $C_n(\omega)$.
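The contrast between the free-field coefficients (4.44) and the baffled-aperture coefficients in (4.48) is easy to reproduce numerically, as in the sketch below (ours; it uses the Hankel function of the first kind, and the appropriate kind depends on the adopted time convention).

```python
# A short sketch (ours) comparing the free-field coefficient magnitudes (4.44)
# with the baffled ones from (4.48); the latter avoid the Bessel zeros.
import numpy as np
from scipy.special import jv, jvp, hankel1, h1vp

kra = np.linspace(0.1, 10.0, 500)       # wave-number-radius product k_r a
for n in range(5):
    free    = np.abs(jv(n, kra))        # oscillates through zeros
    baffled = np.abs(jv(n, kra) - jvp(n, kra)/h1vp(n, kra)*hankel1(n, kra))
    # 'baffled' follows a smooth ~1/sqrt(k_r a) envelope at high k_r a,
    # which keeps the equalization in (4.16) well-conditioned.
```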
This makes baffled circular apertures more convenient for capturing a horizontal sound field.

Figure 4.10 shows the difference between the $-20$ dB-effective angular bandwidths of baffled and unbaffled circular omnidirectional microphone apertures,

$\Delta N_{eff}(k_r a) = N_{eff}^{baffled}(k_r a) - N_{eff}^{unbaffled}(k_r a)$ .

From Figure 4.10, it can be seen that the effective angular bandwidth of the sound pressure field along the baffled aperture is slightly increased at all frequencies. This also supports the argument that a baffle increases the effective radius of the microphone aperture (Teutsch and Kellermann, 2006).

Figure 4.10: The difference $\Delta N_{eff}(k_r a)$ between the $-20$ dB-effective angular bandwidths of a plane-wave sound field on a circular omnidirectional microphone aperture of radius a with and without a rigid cylindrical baffle.

4.4.2 Sampling circular microphone apertures

The three previously described continuous circular microphone apertures serve as a basis for analyzing the following three types of microphone arrays:

• Circular omnidirectional microphone arrays in free field.
• Circular cardioid microphone arrays in free field.
• Circular omnidirectional microphone arrays mounted on a rigid cylindrical baffle.

In essence, these microphone arrays are sampled versions of the corresponding continuous circular microphone apertures.

Let a continuous circular microphone aperture of any presented type have radius $a$ and be centered at the origin. Generally put, the microphone aperture captures the horizontal sound field $P(a, \phi, \omega)$ in a direction-dependent fashion. Denote by $X(a, \phi, \omega)$ the signal observed along the aperture (in the three analyzed cases, $X(a, \phi, \omega)$ takes the form (4.42), (4.46), or (4.48)). Let a microphone array of $N$ uniformly-spaced microphones sample the circular aperture. The sampling operation can be modeled as a multiplication between the signal $X(a, \phi, \omega)$ observed along the aperture and the sampling function (Poletti, 2000)

$\Delta_N(\phi) = \frac{2\pi}{N} \sum_{m=-\infty}^{\infty} \delta\!\left(\phi - \frac{2\pi}{N}\, m\right) = \sum_{m=-\infty}^{\infty} e^{imN\phi}$ .  (4.49)

The circular harmonic coefficients $X_n^s(a, \omega)$ obtained by a sampled circular aperture are then given by

$X_n^s(a, \omega) = \frac{1}{2\pi} \int_0^{2\pi} X(a, \phi, \omega)\, \Delta_N(\phi)\, e^{-in\phi}\, d\phi = \frac{1}{2\pi} \int_0^{2\pi} X(a, \phi, \omega)\, \frac{2\pi}{N} \sum_{m=-\infty}^{\infty} \delta\!\left(\phi - \frac{2\pi}{N} m\right) e^{-in\phi}\, d\phi = \frac{1}{N} \sum_{m=0}^{N-1} X\!\left(a, \frac{2m\pi}{N}, \omega\right) e^{-i\frac{2mn\pi}{N}}$ .  (4.50)

The expression for the $n$th sampled circular harmonic coefficient $X_n^s(a, \omega)$ corresponds to taking the Discrete Fourier Transform (DFT) of the uniformly-spaced samples of the sound pressure field around a circle of radius $a$. For this reason, circular microphone arrays are sometimes referred to as DFT arrays (Poletti, 2000). Equivalently, one can use the rightmost sum in (4.49) to obtain the following expression for the sampled circular harmonic coefficient $X_n^s(a, \omega)$:

$X_n^s(a, \omega) = \frac{1}{2\pi} \int_0^{2\pi} X(a, \phi, \omega) \sum_{m=-\infty}^{\infty} e^{imN\phi}\, e^{-in\phi}\, d\phi = \sum_{m=-\infty}^{\infty} X_{n-mN}(a, \omega)$ .  (4.51)

In (4.51), apart from the desired circular harmonic coefficient $X_n(a, \omega)$, obtained for $m = 0$, the sampled circular harmonic coefficient $X_n^s(a, \omega)$ contains the aliasing terms $X_{n-mN}(a, \omega)$.
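The identity between the DFT form (4.50) and the aliasing form (4.51) can be verified numerically for the free-field aperture, as below (ours; the parameters are deliberately chosen so that the aliasing terms are not negligible).

```python
# A numerical check (ours) of (4.50)-(4.51): an N-microphone DFT array folds
# the circular harmonic coefficients of orders n - mN onto order n.
import numpy as np
from scipy.special import jv

c, f, a, N = 343.0, 4000.0, 0.1, 8      # N microphones on the circle
k_r = 2*np.pi*f/c

phi = 2*np.pi*np.arange(N)/N
X = np.exp(-1j*k_r*a*np.cos(0.0 - phi))          # plane wave from azimuth 0

n = 2
sampled = np.sum(X*np.exp(-1j*n*phi))/N          # DFT coefficient, cf. (4.50)
m = np.arange(-30, 31)
aliased = np.sum((-1j)**(n - m*N)*jv(n - m*N, k_r*a))   # the sum in (4.51)
print(np.allclose(sampled, aliased))             # True
```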
There is, however, a caveat to decreasing the radius: the smaller the aperture, the weaker the magnitude of the circular harmonic coefficients, and the more susceptible to noise they are. A DFT microphone array can measure a finite number of circular harmonic coefficients, given by

\[
M = \begin{cases} \dfrac{N}{2} - 1 , & N = 2l \\[2mm] \dfrac{N-1}{2} , & N = 2l + 1 . \end{cases} \tag{4.52}
\]

4.5 Spherical microphone arrays

Section 4.2 presented a 3D sound field analysis framework based on the interior boundary value problem in spherical coordinates. It was briefly mentioned how continuous spherical microphone apertures could be used for capturing a complete representation of a 3D sound field of far-field sound sources. Similarly to Section 4.4 on circular microphone arrays, this section gives an analysis of 3D sound field capture with continuous spherical apertures, including free-field and baffled omnidirectional apertures and a free-field cardioid aperture. Spherical microphone arrays can then be seen as setups for sampling the corresponding continuous spherical apertures.

Before moving on to the analysis of spherical microphone apertures, we introduce a notion of effective bandwidth in the spherical harmonic domain. As in the case of a horizontal sound field, the motivation for defining the effective bandwidth follows from the fact that a far-field source has countably many spherical harmonic components, as seen in (2.81). However, over a limited range of frequencies, most of the sound field's power is contained in a finite number, often only a few, of spherical harmonics, making it nearly band-limited in the spherical harmonic domain.

Definition 2. A function f(θ, φ) on a unit sphere, with the spherical harmonic expansion

\[
f(\theta, \phi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} F_{mn}\, Y_n^m(\theta, \phi) , \tag{4.53}
\]

has ε-effective spherical harmonic bandwidth N_eff if and only if

\[
\frac{\sum_{l=N_{\text{eff}}+1}^{\infty} \sum_{m=-l}^{l} |F_{ml}|^2}{\sum_{l=0}^{\infty} \sum_{m=-l}^{l} |F_{ml}|^2} \leq \varepsilon . \tag{4.54}
\]

Similarly to the effective bandwidth on a circle, one can associate the effective bandwidth on a sphere with the power of the aliasing error introduced by sampling. In other words, if a function f(θ, φ) has ε-effective spherical harmonic bandwidth N_eff, then the power of the error due to aliasing when uniformly sampling the unit sphere with at least M = (N_eff + 1)² samples is bounded by ε. (Strictly uniform sampling of a sphere is possible only with the specific topologies defined by the so-called platonic solids; however, there are schemes which come very close to uniform sampling for an arbitrary number of points (Sloane et al.).) The decision on the number of sampling points should be guided by the number of spherical harmonics one seeks to capture and the tolerable amount of aliasing error. A numerical illustration of Definition 2 follows the list below.

4.5.1 Continuous spherical microphone apertures

Similarly to Section 4.4, this section focuses on three types of spherical microphone arrays: with omnidirectional and cardioid microphones in free field, and with omnidirectional microphones mounted on a rigid sphere. For analyzing these microphone arrays, it is instructive to consider their continuous generalizations, i.e., continuous spherical microphone apertures:

• Unbaffled omnidirectional spherical microphone aperture, where any point on the aperture acts as an omnidirectional pressure microphone.
• Unbaffled first-order spherical microphone aperture, whose points act as first-order microphones facing radially outward.
• Baffled omnidirectional spherical microphone aperture, which is mounted on the surface of a perfectly rigid spherical baffle. Any point on the aperture acts as an omnidirectional pressure microphone.
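As the announced illustration of Definition 2, the following minimal sketch computes the ε-effective spherical harmonic bandwidth of a single plane wave on a sphere, using the angular power spectrum S_n(k_r a) = (2n + 1) j_n(k_r a)², derived below as (4.60). The threshold ε = 0.01 corresponds to the −20 dB bandwidth used throughout this chapter; all other values are illustrative:

```python
# Effective spherical harmonic bandwidth of a plane wave on a sphere, per
# Definition 2, from the angular power spectrum (4.60). Illustrative sketch.
import numpy as np
from scipy.special import spherical_jn

def effective_bandwidth(ka, eps=0.01, n_max=200):
    n = np.arange(n_max + 1)
    S = (2 * n + 1) * spherical_jn(n, ka) ** 2   # power per degree n, cf. (4.60)
    tail = 1.0 - np.cumsum(S) / S.sum()          # relative out-of-band power
    return int(np.argmax(tail <= eps))           # smallest N_eff satisfying (4.54)

for ka in (1, 5, 10, 20):
    print(f"ka = {ka:2d}: -20 dB effective bandwidth N_eff = {effective_bandwidth(ka)}")
```

The printed values grow roughly linearly with k_r a, which anticipates the behavior shown later in Figure 4.13.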
Figure 4.11: Continuous spherical microphone aperture.

(a) Unbaffled omnidirectional spherical microphone aperture

Let a continuous omnidirectional spherical microphone aperture of radius a be centered at the origin, as shown in Figure 4.11. Furthermore, let the captured sound field be composed of a single plane wave with unit magnitude and frequency ω, coming from the direction defined by (ϑ, ϕ). The sound pressure field is given by

\[
P(r, \theta, \phi, \omega) = e^{-i k_r r \left( \sin\theta \sin\vartheta \cos(\varphi-\phi) + \cos\theta \cos\vartheta \right)} , \tag{4.55}
\]

with k_r = ω/c. We have seen in Section 2.6 that the sound pressure on the spherical aperture admits the spherical harmonic expansion (2.81) of the form

\[
P(a, \theta, \phi, \omega) = 4\pi \sum_{n=0}^{\infty} (-i)^n j_n(k_r a) \sum_{m=-n}^{n} Y_n^m(\theta, \phi)\, Y_n^m(\vartheta, \varphi)^* . \tag{4.56}
\]

In words, the plane-wave sound pressure field on a sphere is composed of countably many spherical harmonics Y_n^m(θ, φ), whose strengths are given by the spherical harmonic coefficients

\[
C_{mn}(a, \omega) = 4\pi (-i)^n j_n(k_r a)\, Y_n^m(\vartheta, \varphi)^* . \tag{4.57}
\]

In order to analyze the effective bandwidth on a sphere of a sound field due to a single plane wave, one can use the angular power spectrum. The angular power spectrum S_n(k_r a) quantifies the aggregate power of the spherical harmonics of degree n, and is given by

\[
S_n(k_r a) = \frac{1}{4\pi} \sum_{m=-n}^{n} |C_{mn}(a, \omega)|^2 . \tag{4.58}
\]

Using Unsöld's theorem (Unsöld, 1927; Arfken et al., 1985)

\[
\sum_{m=-n}^{n} Y_n^m(\theta, \phi)\, Y_n^m(\theta, \phi)^* = \frac{2n+1}{4\pi} \tag{4.59}
\]

in (4.58), the angular power spectrum evaluates to

\[
S_n(k_r a) = (2n+1)\, j_n(k_r a)^2 . \tag{4.60}
\]

Figure 4.12 shows the angular power spectrum for different values of k_r a.

Figure 4.12: Angular power spectrum S_n(k_r a), for n = 0, …, 4, of a plane-wave sound field on a sphere of radius a at different frequencies.

From Figure 4.12, it is apparent that the sound pressure field exhibits more rapid changes along larger spherical apertures and at higher frequencies. In other words, low-order harmonics dominate at low frequencies (for small values of the product k_r a), while higher-order spherical harmonics become comparably powerful as the frequency increases. In addition, the low-frequency gain of each angular power spectral coefficient S_n(k_r a) grows at the rate of 6n dB/oct.

Figure 4.13 shows the −20 dB-effective spherical harmonic bandwidth of the sound pressure field (4.55) for different values of k_r a. From Figure 4.13, it can be observed that the effective bandwidth increases linearly with frequency, similarly to circular microphone apertures. Furthermore, at low frequencies most of the power is contained within the first few spherical harmonics, and the number of significant spherical harmonics increases as the frequency increases.

(b) Unbaffled first-order spherical microphone aperture

Consider the same plane-wave sound field given by (4.55). Let now a free-field spherical aperture of radius a, centered at the origin, have a first-order, cardioid-type directional response.
This implies that an infinitesimal element of the aperture, located at (a, θ, φ), has the directional response

\[
d(\vartheta, \varphi) = \alpha + (1-\alpha)\left( \sin\vartheta \sin\theta \cos(\varphi - \phi) + \cos\vartheta \cos\theta \right) , \tag{4.61}
\]

and, similarly to the first-order circular aperture, the signal captured on the aperture is given by

\[
K(a, \theta, \phi, \omega) = \left[ \alpha + (1-\alpha)\, i \frac{\partial}{\partial (k_r a)} \right] e^{-i k_r a \left( \sin\vartheta \sin\theta \cos(\varphi-\phi) + \cos\vartheta \cos\theta \right)} . \tag{4.62}
\]

Figure 4.13: −20 dB-effective spherical harmonic bandwidth of a plane-wave sound field on a sphere of radius a centered at the origin.

Using the spherical harmonic expansion of a plane wave (4.56) in (4.62) gives

\[
K(a, \theta, \phi, \omega) = 4\pi \sum_{n=0}^{\infty} (-i)^n \left[ \alpha\, j_n(k_r a) + (1-\alpha)\, i\, j_n'(k_r a) \right] \sum_{m=-n}^{n} Y_n^m(\theta, \phi)\, Y_n^m(\vartheta, \varphi)^* . \tag{4.63}
\]

As in the case of an omnidirectional spherical microphone aperture, the signal captured on the first-order one is composed of infinitely many spherical harmonics. The angular power spectrum (4.58), given by

\[
S_n(k_r a) = 4 (2n+1) \left| \alpha\, j_n(k_r a) + (1-\alpha)\, i\, j_n'(k_r a) \right|^2 ,
\]

is shown in Figure 4.14.

Figure 4.14: Angular power spectrum S_n(k_r a), for n = 0, …, 4, of a plane-wave sound field captured by a cardioid spherical microphone aperture of radius a and parameter α = 0.5 at different frequencies.

From Figure 4.14, it can be observed that the angular power spectral coefficients S_n(k_r a) do not exhibit high-frequency ripples. This circumvents the ill-conditioning problems when extracting spherical harmonic components of a wave field, which are characteristic of unbaffled omnidirectional spherical microphone apertures. Additionally, both the zero- and first-degree spherical harmonics have constant low-frequency gains, thus avoiding noise sensitivity when extracting these spherical harmonics. The −20 dB-effective bandwidth is very similar to that of the omnidirectional spherical microphone aperture.

(c) Baffled omnidirectional spherical microphone aperture

Sound scattering from rigid spheres was analyzed in Section 2.6.3, where the analysis was given for an incoming sound field composed of a single plane wave arriving from direction (ϑ, ϕ). The sound field around a rigid spherical scatterer of radius a is given by (2.85). Thus, if an omnidirectional spherical microphone aperture is mounted on the surface of a rigid sphere, the sound pressure observed on the aperture is given by

\[
P(a, \theta, \phi, \omega) = 4\pi \sum_{n=0}^{\infty} (-i)^n \left[ j_n(k_r a) - \frac{j_n'(k_r a)}{h_n'(k_r a)}\, h_n(k_r a) \right] \sum_{m=-n}^{n} Y_n^m(\theta, \phi)\, Y_n^m(\vartheta, \varphi)^* . \tag{4.64}
\]

Compared to the case of the unbaffled spherical microphone aperture, the spherical harmonic coefficients of the sound pressure field on the baffled omnidirectional aperture, given by

\[
C_{mn}(a, \omega) = 4\pi (-i)^n \left[ j_n(k_r a) - \frac{j_n'(k_r a)}{h_n'(k_r a)}\, h_n(k_r a) \right] Y_n^m(\vartheta, \varphi)^* ,
\]

contain an additional term, −(j_n'(k_r a)/h_n'(k_r a)) h_n(k_r a), which denotes the contribution of the scattered sound field. The angular power spectrum (4.58) of the sound field resulting from scattering from a rigid spherical baffle is given by

\[
S_n(k_r a) = (2n+1) \left| j_n(k_r a) - \frac{j_n'(k_r a)}{h_n'(k_r a)}\, h_n(k_r a) \right|^2 ,
\]

and is shown in Figure 4.15.

From Figure 4.15, it is apparent that the angular power spectral coefficients S_n(k_r a) do not oscillate at high frequencies. Viewed as filters, these functions have better-behaved inverses, making baffled apertures more convenient for capturing the spherical harmonics of a 3D sound field.
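The three angular power spectra derived above are straightforward to compare numerically. The following sketch is a direct transcription of the formulas, with illustrative parameters and a small helper for the spherical Hankel function of the first kind (SciPy provides only the spherical Bessel functions of the first and second kind):

```python
# Compare the angular power spectra of the three spherical apertures:
# omnidirectional (4.60), cardioid (alpha = 0.5), and baffled. Illustrative.
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def sph_h1(n, z, derivative=False):
    """Spherical Hankel function of the first kind (or its derivative)."""
    return spherical_jn(n, z, derivative) + 1j * spherical_yn(n, z, derivative)

def S_omni(n, ka):
    return (2 * n + 1) * spherical_jn(n, ka) ** 2

def S_cardioid(n, ka, alpha=0.5):
    b = alpha * spherical_jn(n, ka) + (1 - alpha) * 1j * spherical_jn(n, ka, derivative=True)
    return 4 * (2 * n + 1) * np.abs(b) ** 2

def S_baffled(n, ka):
    b = spherical_jn(n, ka) - spherical_jn(n, ka, True) / sph_h1(n, ka, True) * sph_h1(n, ka)
    return (2 * n + 1) * np.abs(b) ** 2

n, ka = np.arange(5), 2.0
for name, S in (("omni", S_omni(n, ka)),
                ("cardioid", S_cardioid(n, ka)),
                ("baffled", S_baffled(n, ka))):
    print(name, np.round(10 * np.log10(S), 1), "dB")
```

Sweeping k_r a instead of fixing it should reproduce the qualitative behavior of Figures 4.12, 4.14, and 4.15: ripples for the omnidirectional aperture, and smooth decay for the other two.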
Figure 4.16 shows the difference between the −20 dB-effective spherical harmonic bandwidths of baffled and unbaffled spherical omnidirectional microphone apertures,

\[
\Delta N_{\text{eff}}(k_r a) = N_{\text{eff}}^{\text{baffled}}(k_r a) - N_{\text{eff}}^{\text{unbaffled}}(k_r a) .
\]

Figure 4.15: Angular power spectrum S_n(k_r a), for n = 0, …, 4, of a plane-wave sound field on a rigid spherical baffle of radius a at different frequencies.

Figure 4.16: The difference ∆N_eff(k_r a) between the −20 dB-effective spherical harmonic bandwidths of a plane-wave sound field on a spherical omnidirectional microphone aperture of radius a with and without a rigid spherical baffle.

From Figure 4.16, it can be seen that the effective spherical harmonic bandwidth of the sound pressure field on the baffled aperture is slightly increased at all frequencies. This also follows from the known observation that a spherical baffle provides a virtual enlargement of the microphone aperture.

4.5.2 Sampling spherical microphone apertures

The analysis of the effects of sampling on a sphere, characteristic of spherical microphone arrays, is much more involved than in the case of the uniform sampling on a circle used with circular apertures. This is partly due to the non-existence of general uniform sphere sampling strategies, apart from the five special cases of the so-called platonic solids. There is a number of different sphere sampling schemes, which differ in the number of sampling points and implementation complexity. Here we briefly present three sampling schemes, known under the names of equiangle sampling, Gaussian sampling, and spherical t-design sampling (Rafaely, 2005).

All the presented sampling strategies assume that the sampled function is band-limited in the spherical harmonic domain. Denoting by N the spherical harmonic bandwidth of a function f(θ, φ), this implies that F_mn = 0 for n > N.

(a) Equiangle sampling

Equiangle sampling, described in (Driscoll and Healy, 1994), requires M = 4(N + 1)² samples on a sphere, and is thus within a constant factor of the theoretical minimum of (N + 1)² samples. Both angles, θ and φ, are sampled uniformly in 2(N + 1) points:

\[
\theta_j = \frac{j \pi}{2(N+1)} , \quad j = 0, \ldots, 2N+1 , \qquad
\phi_l = \frac{l \pi}{N+1} , \quad l = 0, \ldots, 2N+1 . \tag{4.65}
\]

The discrete approximation of the spherical harmonic transform with equiangle sampling takes the form

\[
F_{mn} = \sum_{j=0}^{2N+1} \sum_{l=0}^{2N+1} \alpha_j\, f(\theta_j, \phi_l)\, Y_n^m(\theta_j, \phi_l)^* , \tag{4.66}
\]

where the weights α_j depend on the θ-coordinate of a sampling point; their values are provided in (Driscoll and Healy, 1994). The downside of this sampling scheme is the excess number of required samples. However, it gives rise to efficient processing algorithms (Driscoll and Healy, 1994).

(b) Gaussian sampling

Gaussian sampling is slightly more efficient, but still suboptimal, and requires M = 2(N + 1)² samples. The angle φ is sampled uniformly in 2(N + 1) points, while the angle θ is sampled non-uniformly in N + 1 points, which are obtained from the zeros of the Legendre polynomial P_{N+1}(cos θ). More precisely, if {ν_0, …, ν_N} are the zeros of the Legendre polynomial P_{N+1}(x), the Gaussian sampling points are defined by

\[
\theta_j = \cos^{-1} \nu_j , \quad j = 0, \ldots, N , \qquad
\phi_l = \frac{l \pi}{N+1} , \quad l = 0, \ldots, 2N+1 . \tag{4.67}
\]
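Since the Gaussian sampling nodes and weights are those of Gauss-Legendre quadrature, they are readily available in numerical libraries. A minimal sketch, assuming NumPy's Legendre module and an illustrative bandwidth N:

```python
# Generate a Gaussian sampling grid for spherical harmonic bandwidth N,
# per (4.67): Gauss-Legendre nodes in theta, uniform azimuths in phi.
import numpy as np

N = 3                                            # spherical harmonic bandwidth
x, w = np.polynomial.legendre.leggauss(N + 1)    # zeros/weights of P_{N+1}
theta = np.arccos(x)                             # N+1 non-uniform polar angles
phi = np.pi * np.arange(2 * (N + 1)) / (N + 1)   # 2(N+1) uniform azimuths

print(f"M = {theta.size * phi.size} samples for bandwidth N = {N}")  # 2(N+1)^2
print("theta [deg]:", np.round(np.degrees(theta), 1))
print("quadrature weights:", np.round(w, 4))
```

The returned quadrature weights play the role of the α_j in the transform approximation that follows.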
The approximation of the spherical harmonic transform for Gaussian sampling takes the form

\[
F_{mn} = \sum_{j=0}^{N} \sum_{l=0}^{2N+1} \alpha_j\, f(\theta_j, \phi_l)\, Y_n^m(\theta_j, \phi_l)^* , \tag{4.68}
\]

where the weights α_j depend on the θ-coordinate of a sampling point and can be taken from tables given in (Arfken et al., 1985).

(c) Spherical t-design sampling

Perfectly uniform sampling of a sphere is possible only for M ∈ {4, 6, 8, 12, 20}. It was already mentioned that for sampling a function with spherical harmonic bandwidth N, one needs at least M = (N + 1)² samples. Hardin and Sloane (1996) used the notion of a spherical t-design, which denotes a set of M points {(θ_1, φ_1), …, (θ_M, φ_M)} on a unit sphere such that the identity

\[
\frac{1}{4\pi} \int_0^{\pi} \int_0^{2\pi} f(\cos\theta, \phi)\, \sin\theta \, d\phi \, d\theta = \frac{1}{M} \sum_{l=1}^{M} f(\theta_l, \phi_l) \tag{4.69}
\]

holds for any polynomial f of degree not greater than t. Using numerical optimization, they were able to obtain sets of points which satisfy (4.69) with high accuracy for different M; these are tabulated in (Sloane et al.). For functions of spherical harmonic bandwidth N, spherical t-design sampling schemes with approximately 1.5(N + 1)² points provide a good approximation of the sampling condition (Rafaely, 2005)

\[
\sum_{j=1}^{M} \alpha_j\, Y_{n'}^{m'}(\theta_j, \phi_j)\, Y_n^m(\theta_j, \phi_j)^* = \delta_{n-n'}\, \delta_{m-m'} . \tag{4.70}
\]

This is still not at the theoretical optimum, but in terms of sampling efficiency it is the best of the three techniques mentioned here.

4.6 Sound field microphones as acoustic beamformers

In the preceding sections, it was shown how microphone arrays are designed starting from an analytical solution of the wave equation for different geometries. Each microphone array design used microphone elements with perfect omnidirectional, bidirectional, or cardioid directional responses at all frequencies.

Baffled circular and spherical microphones have some commonalities. On the one hand, a baffle provides better conditioning of the circular and spherical harmonics by eliminating the ripples at high frequencies, and provides a virtual aperture enlargement by increasing the low-frequency gain of the said harmonic components. On the other hand, a baffle acts as a shadow, and increasingly so as the frequency of the incoming sound waves increases. The baffle-microphone combination can thus be seen as a microphone with a frequency-dependent directional response, whose directivity increases with frequency. As an example, Figure 4.17 shows directional responses of a baffled pressure microphone at different frequencies.

Figure 4.17: Directional response of a microphone mounted on (a) a cylindrical and (b) a spherical baffle at different frequencies (150 Hz, 1.5 kHz, and 15 kHz).

In the previous sections, the baffled microphone array design was considered as a sampling problem: the function being sampled on a circle or sphere is a sound field whose properties are changed by the presence of a baffle. However, one could equally view the problem of baffled microphone array design as that of achieving the same design goals with a set of microphones with frequency-dependent directional responses. Moreover, this perspective can in many cases be more practical, as one never has ideal pressure or directional microphones, let alone a perfectly rigid and ideally shaped object to serve as a baffle.
In this section, we focus on the latter perspective and show a microphone array design approach that relies on fewer assumptions about the used microphones or enclosures. The design relies on obtaining responses of the array elements on a set of (possibly far-field) points enclosing the array, which can be a circle or a sphere, depending on the array geometry and the spatial characteristics of the captured sound field. These responses can be obtained using a theoretical model, some of which have already been shown, or through anechoic measurements.

We show two examples of designing microphone array filters by optimally synthesizing a given directional response on a set of control points. The first example involves designing filters for a circular microphone array mounted on a perfectly rigid, infinite cylindrical baffle, optimized for directional acquisition of a horizontal sound field. The second example presents a way to design the so-called Soundfield microphone non-coincidence compensation filters.

There is a number of reasons to use array techniques for designing microphone arrays. The theory presented in Sections 4.4 and 4.5 gives closed-form solutions in some very specific, idealized cases, but a microphone array one builds is not necessarily well modeled by any of them. In other words, one usually has only a general idea about the microphone array and the way it interacts with a sound field, such as an estimate of the effective circular or spherical harmonic bandwidth of the measured sound field at different frequencies. Additionally, even when a good model of the system exists, an imperfect manufacturing process can render the model less accurate. Therefore, an array design which takes into account the knowledge about the effective circular bandwidth of the measured sound field, together with a set of acoustic channel responses from control points to microphones (preferably obtained through anechoic measurements, in order to circumvent model inaccuracies), is more flexible and performs comparably to or better than the model-based theoretical designs described earlier in this chapter.

4.6.1 Filter design for a circular microphone array mounted on a rigid cylinder

Consider an array of M = 7 pressure microphones assembled in a uniform circular topology on the surface of an infinite, perfectly rigid cylindrical baffle. The microphone array is designed to capture and analyze a horizontal sound field, and thus needs to be able to extract its circular harmonic components. As a rule of thumb (e.g., see Van Trees, 2002), which can also be verified in Figure 4.6, the effective bandwidth of a circular aperture is approximately given by

\[
N_{\text{eff}}(r, \omega) \approx \frac{\omega}{c}\, r . \tag{4.71}
\]

This implies that the given microphone array can effectively capture circular harmonic components of the third and lower orders up to the frequency f_max ≈ 3.3 kHz: with seven microphones one can extract orders n ≤ 3, and for a baffle of radius a = 5 cm, (4.71) gives f_max ≈ 3c/(2πa) = 3 × 343/(2π × 0.05) ≈ 3.3 kHz.

Considering any of the expressions for a sound field on a circular aperture (the same holds for a spherical aperture), one can notice that the function is symmetric with respect to the angles (ϑ, ϕ) and (θ, φ) of the plane-wave incidence and the analysis point, respectively. Taking a different perspective and focusing on the microphones' directional responses, where the variable angles are (ϑ, ϕ), the function stays the same, and all the properties of a sound field on a circular aperture, such as the effective angular bandwidth, are unchanged.
Therefore, the sampling of the microphones' directional responses, i.e., the decision on the number of control points, needs to take the same effective bandwidth N_eff into account. In light of the fact that some aliasing is inevitable, the robustness of the microphone array filter design is improved by oversampling. Hence, in this example we use N = 13 uniformly spaced directions in the angular interval [0, 2π].

Figure 4.18 shows the directional responses corresponding to circular harmonics cos(nφ) of orders n = 0, 1, 2, 3, obtained by solving the following array optimization problem:

\[
\begin{aligned}
\text{minimize} \quad & \big\|\, |G(\omega)\, H(\omega)| - |D| \,\big\|_2 \\
\text{subject to} \quad & |H(\omega)| \preceq H_{\max}\, \mathbf{1}_{M \times 1} ,
\end{aligned} \tag{4.72}
\]

with H_max = 20 dB. In (4.72), G(ω) is the N × M matrix of the microphones' directional responses in the control directions, H(ω) is the vector of microphone post-filters, and D is the desired directional response. Note that (4.72) is solved using Algorithm 2.1.

Figure 4.18: Circular harmonic responses of different orders n (desired responses d(φ) = 1, cos φ, cos 2φ, cos 3φ) of a circular microphone array of M = 7 pressure microphones mounted on a rigid cylindrical baffle of radius a = 5 cm, at 500 Hz, 1500 Hz, and 3000 Hz.

Apart from the highest-order harmonic components at low frequencies, the synthesized directional responses correspond well to the desired responses. At low frequencies, the highest-order harmonics cannot be well approximated due to the maximum-gain constraints; had these constraints been less stringent, the approximation would have been better.

Table 4.1 shows the relative magnitude errors of the directional responses obtained (a) through the optimization procedure (4.72) and (b) with a theoretical DFT circular array:

(a) Optimization (4.72):
f | d(φ) = 1 | d(φ) = cos φ | d(φ) = cos(2φ) | d(φ) = cos(3φ)
500 Hz | −169.30 dB | −134.78 dB | −109.44 dB | −9.40 dB
1500 Hz | −86.10 dB | −77.41 dB | −57.19 dB | −28.44 dB
3000 Hz | −41.57 dB | −32.77 dB | −28.35 dB | −16.61 dB

(b) Theoretical DFT array:
f | d(φ) = 1 | d(φ) = cos φ | d(φ) = cos(2φ) | d(φ) = cos(3φ)
500 Hz | −170.59 dB | −134.78 dB | −121.58 dB | −38.94 dB
1500 Hz | −87.05 dB | −77.86 dB | −58.02 dB | −27.09 dB
3000 Hz | −42.35 dB | −33.85 dB | −30.16 dB | −13.41 dB

Table 4.1: Directional response relative magnitude error for a baffled circular microphone array obtained through the optimization procedure (4.72) (a) and with a theoretical DFT circular microphone array (b).

From Table 4.1, one can observe that the error of the directional response obtained through optimization is comparable to the error obtained when using a theoretical (i.e., optimal) DFT circular microphone array. Thus, in addition to being more flexible, the optimization-based microphone array filter synthesis achieves high accuracy.

Figure 4.19: Arrangement of cardioid capsules in the Soundfield microphone (image source: (Farrar, 1979)).

4.6.2 Soundfield microphone non-coincidence correction filter design

The Soundfield microphone, shown in Figure 4.19, is a device built from four outward-pointing cardioid capsules arranged at the tips of a regular tetrahedron. The directional response d(θ, φ) = α + (1 − α) sin θ cos φ of the used cardioid capsules can have various values α ∈ (0, 1), but those described in (Farrar, 1979) have a sub-cardioid response with α = 2/3.
The coordinates of the four microphone capsules are given by

\[
\mathbf{r}_{LF} = \frac{d}{\sqrt{3}} \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}^T , \quad
\mathbf{r}_{RB} = \frac{d}{\sqrt{3}} \begin{bmatrix} -1 & -1 & 1 \end{bmatrix}^T , \quad
\mathbf{r}_{LB} = \frac{d}{\sqrt{3}} \begin{bmatrix} -1 & 1 & -1 \end{bmatrix}^T , \quad
\mathbf{r}_{RF} = \frac{d}{\sqrt{3}} \begin{bmatrix} 1 & -1 & -1 \end{bmatrix}^T ,
\]

where, according to (Gerzon, 1975), d = 1.47 cm. The signals from the four capsules, S_LF(ω), S_RB(ω), S_LB(ω), and S_RF(ω), are often denoted A-format. However, the Soundfield microphone usually provides four signals W(ω), X(ω), Y(ω), and Z(ω), known under the name B-format and characterized by the following directional responses:

\[
d_W(\theta, \phi) = 1 , \quad
d_X(\theta, \phi) = \sqrt{2}\, \sin\theta \cos\phi , \quad
d_Y(\theta, \phi) = \sqrt{2}\, \sin\theta \sin\phi , \quad
d_Z(\theta, \phi) = \sqrt{2}\, \cos\theta .
\]

The usual way of converting the four capsule signals to B-format is through a linear combination represented by a matrix-signal product

\[
\begin{bmatrix} W(\omega) \\ X(\omega) \\ Y(\omega) \\ Z(\omega) \end{bmatrix}
= \frac{1}{4}
\begin{bmatrix}
\frac{1}{\alpha} & \frac{1}{\alpha} & \frac{1}{\alpha} & \frac{1}{\alpha} \\[1mm]
\frac{\sqrt{6}}{1-\alpha} & -\frac{\sqrt{6}}{1-\alpha} & -\frac{\sqrt{6}}{1-\alpha} & \frac{\sqrt{6}}{1-\alpha} \\[1mm]
\frac{\sqrt{6}}{1-\alpha} & -\frac{\sqrt{6}}{1-\alpha} & \frac{\sqrt{6}}{1-\alpha} & -\frac{\sqrt{6}}{1-\alpha} \\[1mm]
\frac{\sqrt{6}}{1-\alpha} & \frac{\sqrt{6}}{1-\alpha} & -\frac{\sqrt{6}}{1-\alpha} & -\frac{\sqrt{6}}{1-\alpha}
\end{bmatrix}
\begin{bmatrix} S_{LF}(\omega) \\ S_{RB}(\omega) \\ S_{LB}(\omega) \\ S_{RF}(\omega) \end{bmatrix} \tag{4.73}
\]

with α = 2/3, followed by the so-called B-format non-coincidence correction filters (see Gerzon, 1975; Faller and Kolundžija, 2009).

Instead of following the conventional approach, we start the presentation of non-coincidence filters by noting that the Soundfield microphone is an exact model of a sampled cardioid spherical microphone aperture described in Section 4.5.1. Furthermore, since a regular tetrahedron is a platonic solid, the sampling performed by the Soundfield microphone is uniform. It follows immediately from (4.63) that in order to obtain the signal W(ω), the four capsule signals need to be pre-filtered by

\[
H_W^s(\omega) = \frac{1}{4 \left( \alpha\, j_0(k_r d) + i (1-\alpha)\, j_0'(k_r d) \right)} , \tag{4.74}
\]

with k_r = ω/c. For obtaining the signal X(ω) (the signals Y(ω) and Z(ω) are obtained analogously), the four capsule signals need to be weighted by the directional response d_X(θ, φ) evaluated in their directions (for instance, S_LF(ω) is weighted by d_X(θ_LF, φ_LF), where (θ_LF, φ_LF) are the angular spherical coordinates of the vector r_LF) and additionally pre-filtered with

\[
H_X^s(\omega) = \frac{i}{4 \left( \alpha\, j_1(k_r d) + i (1-\alpha)\, j_1'(k_r d) \right)} . \tag{4.75}
\]

We also present an alternative way of obtaining the Soundfield capsule filters, based on microphone array optimization. As in the circular array case, we note that the signal of a spherical cardioid microphone aperture (4.62) is symmetric in the arguments (ϑ, ϕ) and (θ, φ). This implies that the directional response of a single microphone on the aperture has the same angular dependence as (4.62), and the analysis of the effective spherical harmonic bandwidth from Section 4.5.1 applies equally to the microphones' directional responses. Since the effective bandwidth of a cardioid spherical aperture is very similar to that of an omnidirectional one, it follows from Figure 4.13 that the effective bandwidth of the microphones' directional responses is approximately given by

\[
N_{\text{eff}} = k_r d , \tag{4.76}
\]

with k_r = ω/c. The same observation is found in the literature on spherical microphones (e.g., see Rafaely, 2005), and the corresponding aliasing frequency is thus f_max ≈ 3.71 kHz. For recovering the spherical harmonic directional responses of orders zero and one, the minimum number of control directions covering a full sphere is four, but we use N = 12 instead. This uniform sphere sampling with an icosahedron is a slightly oversampled scheme that makes the optimization procedure more robust to aliasing errors.
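As a reference for the comparison that follows, the theoretical filters (4.74) and (4.75) can be evaluated directly from spherical Bessel functions. A minimal sketch with the parameters quoted above (d = 1.47 cm, α = 2/3); the frequency grid is illustrative:

```python
# Evaluate the theoretical Soundfield non-coincidence filters (4.74)-(4.75)
# on a logarithmic frequency grid. Illustrative sketch only.
import numpy as np
from scipy.special import spherical_jn

c, d, alpha = 343.0, 0.0147, 2.0 / 3.0          # speed of sound, d = 1.47 cm
f = np.logspace(np.log10(100), np.log10(20000), 6)
kd = 2 * np.pi * f / c * d                      # k_r * d

def mode(n):
    """Denominator term alpha*j_n(kd) + i(1-alpha)*j_n'(kd)."""
    return alpha * spherical_jn(n, kd) + 1j * (1 - alpha) * spherical_jn(n, kd, derivative=True)

H_W = 1.0 / (4 * mode(0))          # (4.74)
H_X = 1j / (4 * mode(1))           # (4.75)
for fi, hw, hx in zip(f, H_W, H_X):
    print(f"f = {fi:7.0f} Hz: |H_W| = {20*np.log10(abs(hw)):6.1f} dB, "
          f"|H_X| = {20*np.log10(abs(hx)):6.1f} dB")
```

At low frequencies the printed gains approach 1/(4α) for W and 3/(4(1 − α)) for X, consistent with the flat low-frequency portions of the curves in Figure 4.20.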
The microphone pre-filters H_W(ω) and H_X(ω) are obtained by solving the following optimization problem at different frequencies:

\[
\text{minimize} \quad \| G(\omega)\, H(\omega) - D \|_2 , \tag{4.77}
\]

where G(ω) is a matrix containing the frequency responses of the acoustic channels between microphones and control directions, H(ω) is the vector of microphone filters being computed, and D is a vector with the values of the desired directional response, d_W(θ, φ) or d_X(θ, φ), evaluated in the control directions.

Figure 4.20 compares the theoretically derived filters H_W^s(ω) and H_X^s(ω) with the filters H_W(ω) and H_X(ω) obtained through optimization.

Figure 4.20: Soundfield microphone non-coincidence compensation filters |H_W(f)| and |H_X(f)| obtained from theory and by microphone array optimization on a full sphere.

From Figure 4.20, one can observe a good correspondence between the microphone filters H_W^s(ω) and H_W(ω) up to high frequencies, while the characteristics of H_X(ω) and H_X^s(ω) bifurcate above the aliasing frequency f_max. This can be explained by a stronger influence of aliasing errors on the array optimization procedure above f_max.

It is also interesting to observe the obtained directional responses in the horizontal plane in all four cases, which are shown in Figure 4.21.

Figure 4.21: Directional responses in the horizontal plane of the Soundfield microphone B-format signals W(ω) and X(ω) obtained by optimization on a full sphere (top) and from theory (bottom), at 300 Hz, 3 kHz, and 10 kHz.

From Figure 4.21, one can observe that the directional responses obtained in all four cases are good approximations of the desired responses, even well above the aliasing frequency f_max. The only difference is in the gain of the signals X(ω) and X^s(ω), where the former is excessively attenuated and the latter excessively amplified in the horizontal plane.

This surprising consistency of the directional responses well above the aliasing frequency is a consequence of a smart design. In other words, the particular capsule configuration used by the Soundfield microphone takes advantage of symmetries that make many higher-order spherical harmonic terms vanish in the horizontal plane. A careful analysis reveals that this consistency is lost in tilted planes, as shown in (Faller and Kolundžija, 2009).

To take full advantage of the mentioned symmetries, one can use a microphone array optimization procedure with control directions only in the horizontal plane. Figure 4.22 shows the filters H_W^h(ω) and H_X^h(ω) obtained by solving (4.77) with N = 13 uniformly spaced control angles covering a full circle in the horizontal plane.

Figure 4.22: Soundfield microphone non-coincidence compensation filters obtained from theory and by microphone array optimization in the horizontal plane.
From Figure 4.22, it can be observed that the optimized filters H_W^h(ω) and H_X^h(ω) are more similar to the theoretical filters H_W^s(ω) and H_X^s(ω) than those optimized on a full sphere. Figure 4.23 shows the directional responses of the signals W^h(ω) and X^h(ω) obtained with the filters H_W^h(ω) and H_X^h(ω), respectively. From Figure 4.23, one can see that the optimized microphone filters give rise to good approximations of the desired responses, both in shape and magnitude, over a wide range of frequencies.

Figure 4.23: Directional responses in the horizontal plane of the Soundfield microphone B-format signals W^h(ω) and X^h(ω) obtained by optimization in the horizontal plane, at 300 Hz, 3 kHz, and 10 kHz.

4.7 Conclusions

This chapter gave a broad treatment of microphone arrays for capturing horizontal and 3D sound fields, including gradient, circular, and spherical microphone arrays.

We presented orthogonal decompositions of horizontal sound fields, namely the Fourier series expansion of the angular spectrum and the horizontal helical wave spectrum. In the case of general 3D sound fields, the spherical harmonic decomposition was described. All three representations give rise to strategies for capturing sufficient representations of source-free sound fields using compact microphone apertures.

Microphone arrays were presented as sampling devices able to capture a low-resolution representation of a horizontal or 3D sound field. We saw that both horizontal and 3D sound fields have an effective angular or spherical harmonic bandwidth that increases linearly with frequency. Hence, the low-resolution representation of sound fields obtained with microphone arrays is sufficiently accurate up to a certain aliasing frequency, which depends on the array's size and the number of microphones.

Finally, we presented optimization techniques as a framework for designing microphone arrays for sound field capture, whose directional responses correspond to angular orthogonal functions, such as circular or spherical harmonics. This approach, unlike those that follow from theoretical analyses of various idealized circular or spherical apertures and their discrete equivalents, requires only general knowledge about the directional responses of the used microphones. The microphone filters are obtained from their directional responses in a discrete set of directions, using a constrained optimization procedure whose objective function quantifies the error between the synthesized and desired directional responses. In addition to flexibility, this technique is able to efficiently obtain highly precise solutions that are similar to, and in some specific cases even better than, those provided by theory.

Chapter 5

Baffled Loudspeaker Array For Spatial Sound Reproduction

5.1 Introduction

In this chapter, we deal with a sound radiation problem. In particular, we describe the design of a loudspeaker array which reproduces sound in a directional manner over a wide range of frequencies. The design combines two principles discussed in Chapter 2. The first one is sound radiation from a vibrating piston mounted on the surface of a scattering object, which was touched upon in Sections 2.5.4 and 2.6.2. The system "radiator-scatterer", whose radiation pattern becomes increasingly directional with frequency, can be viewed as a physical frequency-dependent beamformer.
The other principle is beamformer design, where a number of sound radiators are controlled with pre-filters in order to achieve a desired directional sound reproduction.

When designing microphone arrays as described in Chapter 4, the goal was to capture a sound field with directional responses corresponding to orthogonal components of the sound field, such as circular or spherical harmonics. Thus, when using array techniques, one had clearly defined desired directional responses which needed to be optimally synthesized using the microphones' post-filters. Unlike microphone arrays, the loudspeaker arrays described in this chapter are not required to have that sort of flexibility. As will be stated explicitly, they need to be able to steer sound towards a number of directions (but not a continuum thereof), and to do so in a directional fashion that is consistent over a wide range of frequencies. Additionally, the ultimate receiver of the reproduced sound, the human listener, is taken into account by incorporating the logarithmic sensitivity of hearing into the directional response error function. The design also relies on the fact that listeners are not highly sensitive to small spectral variations in the frequency response of electro-acoustic systems (Bücklein, 1981).

The following is a list of the design goals our procedure aims to address.

• High directivity. The main goal of our loudspeaker array design is highly directional reproduction of sound. Additionally, high directivity should be maintained over a wide frequency range, such that the desired spatial effects are as frequency-invariant as possible.

• Steering capability. Not only are we interested in reproducing sound directionally, but also in doing so towards a number of different directions. This property is highly desirable for public address (PA) systems or surround reproduction in rooms, and suggests the use of uniformly spaced circular loudspeaker arrays.

• Compact size. Although not a primary goal, having a compact loudspeaker array with only a few loudspeakers is advantageous for a number of reasons, from both the users' and the designers' perspective (e.g., saving listening room space and cost).

• Measurement-based design. Sound scattering and propagation on symmetric geometries are well analyzed and understood, but models are often not sufficient for practical system design. There is a number of reasons for discrepancies between a model and a real system, including manufacturing inaccuracies, model simplifications, and equipment miscalibration. In order to avoid these sources of error, we rely on measurements of the loudspeaker array in a number of control points, such that those inaccuracies are accounted for.

5.1.1 Background

In various sound reproduction scenarios, it is preferable to have a way of reproducing sound in a directional manner, and doing so consistently over a wide range of frequencies. These scenarios include various public address (PA) systems, both indoor and outdoor, and compact multichannel audio reproduction systems, such as sound boxes and sound bars (e.g., Yamaha, 2011; Sonic Emotion, 2011).

There are various solutions for directional sound reproduction, including flat large-membrane loudspeaker panels, ultrasonic flat loudspeakers, and loudspeaker arrays. Flat loudspeaker panels (Holophonics, 2011) use large vibrating plates in order to generate flat wavefronts and exhibit plane-wave-like reproduction.
Their directional reproduction performance is impressive, but the limited quality of the reproduced sound and the lack of dynamic range render them hardly usable for most applications. Ultrasonic loudspeakers (e.g., Yoneyama et al., 1983; Nakashima et al., 2006; Holosonics, 2011; Sennheiser, 2011) use an array of ultrasound transducers which can be combined with a beamformer in order to form a narrow front-facing radiation beam. The audio signal is modulated to ultrasonic frequencies, reproduced with the transducer array, and rendered back to the audible spectrum by a demodulation that happens as a result of non-linearities in the air. Similarly to flat loudspeaker panels, ultrasonic loudspeakers have outstanding directionality, but at the cost of a very limited dynamic range and poor reproduction quality. Finally, loudspeaker arrays use a larger number of loudspeakers and classical beamforming (Ward et al., 1995; Van der Wal et al., 1996). In order to achieve similar directional reproduction over a wide range of frequencies, the loudspeaker beamformer combines different loudspeakers from the array so as to keep the effective inter-loudspeaker distance inversely proportional to frequency. Loudspeaker arrays can achieve good and consistent directional reproduction of sound over a wide range of frequencies, but they require a large number of loudspeakers, are large in size, and cannot maintain the desired directional reproduction up to the highest audible frequencies, due to the limitations imposed by the minimum inter-loudspeaker distance.

5.1.2 Chapter outline

Section 5.2 analyzes the theoretical model of sound radiation from a loudspeaker mounted on a rigid cylinder and motivates the baffled loudspeaker array design. Section 5.3 describes different aspects of designing optimized beamforming filters for directional sound reproduction. The design of a beamformer using a simulated baffled loudspeaker model is presented in Section 5.4. A prototype loudspeaker array and its performance are described in Section 5.5. Section 5.6 describes how a directional loudspeaker array can be used for reproducing spatial sound. Conclusions are given in Section 5.7.

5.2 Acoustical design

We have seen the boundary value problems in cylindrical and spherical coordinates in Sections 2.5 and 2.6, respectively, and how they were used to derive the sound fields radiated by pressure or velocity distributions on cylindrical or spherical boundaries. Of particular interest in this chapter is the radiation pattern of a vibrating piston in a cylindrical baffle, presented in Section 2.5.4.

A vibrating piston in a cylindrical baffle serves as a model of the baffled piston loudspeaker that is the basis of our design. It was shown in Section 2.5.4 that the directional response (or radiation pattern) of a baffled vibrating piston changes from omnidirectional at low frequencies to highly directional at high frequencies. From the perspective of our design goals, this is not desirable, as we seek a directional response that is invariant over a wide range of audible frequencies. Thus, a single loudspeaker does not suffice, and our design uses a loudspeaker array in the way described in Section 5.3.

In this part, we show that one can identify two different trends in the radiation pattern of a baffled piston over the audible frequencies. Namely, at low frequencies, the directional response of a baffled piston changes rapidly from omnidirectional to highly directional.
This happens from the lowest audible frequencies up to a few kHz, depending on the size of the baffle and piston. As the frequency increases further, the directional response exhibits only slow changes at a macroscopic scale. By this we mean that the radiation pattern keeps roughly the same shape, with only small changes in the front, around the look direction (the radial line connecting the baffle's center and the piston), and relatively larger changes in the highly attenuated back directions.

5.2.1 Baffled loudspeaker model

To support the claim about the "bi-modality" of the baffled loudspeaker's directional response, we show an example involving a model of a baffled piston loudspeaker. The model, presented in more detail in Section 2.5.4, involves an infinite cylindrical baffle of radius a and a vibrating piston of length 2L and circumferential width 2αa. Figure 5.1 illustrates the geometry of the model.

Figure 5.1: Model of a piston loudspeaker mounted on an infinite rigid cylindrical baffle.

This simplified baffled loudspeaker model is convenient for illustrating the concept, as it possesses a closed-form analytical solution which readily serves for analysis. As will be stressed later, our design does not rely on this particular model, but on the more general insight it provides. For convenience, we repeat the expression (2.52) for the sound pressure radiated in the far field by a vibrating piston mounted on an infinite cylindrical baffle (Williams, 1999):

\[
P(r, \theta, \phi, \omega) \approx \frac{\rho_0 c\, e^{i k r}}{2 \pi^2 r} \sum_{n=-N}^{N} (-i)^n e^{i n \phi}\, \frac{4 b \alpha L\, \mathrm{sinc}(n \alpha)\, \mathrm{sinc}(k_z L)}{\sin\theta\, H_n'(k a \sin\theta)} , \tag{5.1}
\]

where r, θ, and φ are the standard spherical coordinates depicted in Figure 5.1, ρ_0 is the density of air, c the speed of sound propagation, b the piston's velocity, k = ω/c the wave number, and H_n'(x) the first derivative of the Hankel function of the first kind. In the following analysis, the piston has sides of equal length, i.e., L = aα.

Although the far-field response is not easy to predict from (5.1), it can be inferred that:

• High-frequency directivity grows inversely with the piston's size 2aα for a fixed baffle radius a.
• High-frequency directivity grows with the baffle radius a when the piston size is kept constant.

To confirm the above claims, we illustrate normalized frequency-dependent directional responses of the considered baffled loudspeaker model. Figure 5.2 illustrates the influence of the piston's size, whereas Figure 5.3 shows the influence of the baffle's radius. Note that the polar diagrams are somewhat unusual, as they show normalized directional responses clipped at a threshold value of −15 dB. The reason is to reveal the dependence of the high-frequency directivity on the two considered system parameters (the piston's and the baffle's size), where the directivity is related to the directional response's −15 dB level. As already mentioned, the directional response below the chosen threshold, although more variable, is low enough to be considered less relevant for the foreseen applications.
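The far-field pattern (5.1) is straightforward to evaluate numerically. The following minimal sketch does so in the horizontal plane (θ = π/2, so k_z = 0 and sin θ = 1), dropping the constant factors in front of the sum; parameter values mirror those used in the figures below and are otherwise illustrative:

```python
# Evaluate the normalized far-field radiation pattern (5.1) of a cylindrically
# baffled piston in the horizontal plane. Constant factors are dropped.
import numpy as np
from scipy.special import h1vp

c, a, L = 343.0, 0.08, 0.015          # baffle radius, piston half-length [m]
alpha = L / a                         # piston half-angle, so L = a*alpha
phi = np.radians(np.arange(0, 181, 15))

def pattern_db(f, n_terms=60):
    ka = 2 * np.pi * f / c * a
    n = np.arange(-n_terms, n_terms + 1)
    # sum over circular harmonics; note np.sinc(x) = sin(pi x)/(pi x),
    # so the unnormalized sinc(n*alpha) of (5.1) is np.sinc(n*alpha/pi)
    terms = ((-1j) ** n * np.exp(1j * n * phi[:, None]) *
             np.sinc(n * alpha / np.pi) / h1vp(n, ka))
    p = np.abs(terms.sum(axis=1))
    return 20 * np.log10(p / p.max())

print(np.round(pattern_db(5000.0), 1))   # normalized response vs angle [dB]
```

Running the sketch at increasing frequencies reproduces the transition from an omnidirectional to a strongly front-weighted pattern described above; the high-order terms are naturally suppressed by the growth of H_n'(ka).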
Figure 5.2: Frequency-dependent normalized directional responses of a cylindrically baffled vibrating piston, clipped at −15 dB, for piston side half-lengths L = aα of 0.01 m, 0.02 m, 0.03 m, and 0.04 m, with the baffle radius kept constant at a = 0.08 m.

From Figures 5.2 and 5.3, one can see that the directivity of a cylindrically baffled piston is roughly inversely proportional to the piston's dimension aα and directly proportional to the baffle's radius a. Furthermore, as a guide for choosing the dimensions of a piston and a baffle, one can use the observation that the −15 dB threshold appears around the angle φ_T = ±75° off the piston's axis when the piston covers a circumferential angle of 2α = 0.75 rad. Note, however, that the pressure in (5.1) is approximately proportional to the piston's area at low frequencies. Thus, even though smaller pistons can be used to increase the loudspeaker's directivity, their size cannot be made too small, as that may violate the design goals for the achievable sound pressure levels (SPL) at low frequencies.

Figure 5.3: Frequency-dependent normalized directional responses of a cylindrically baffled vibrating piston, clipped at −15 dB, for baffle radii a of 0.08 m, 0.16 m, 0.32 m, and 0.64 m, with the piston size kept constant at aα = L = 2 cm.

Figures 5.2 and 5.3 also confirm the claim of a roughly bi-modal behavior of the baffled loudspeaker's directional response. Low frequencies, up to a few kHz (the "border frequency" between the two modes depends on the baffle's and piston's dimensions), are characterized by a sharp transition from an omnidirectional to a unidirectional pattern. High frequencies, starting from a few kHz, are characterized by directional responses that vary little with frequency. (Responses at rear angles vary more with frequency, but they are highly attenuated, such that their variability is of little significance for the considered applications.)

Figure 5.4 provides a different presentation of the same data. For each angle, it shows the variation of the directional response's magnitude in the frequency range 5–20 kHz. From Figure 5.4, one can see that the directional response in the front directions does not vary significantly at high frequencies. The more prominent variations happen in the rear, highly attenuated directions, making them less relevant from the perspective of our design goals.

Figure 5.4: Variation of the directional response of a cylindrically baffled vibrating piston in the frequency range 5–20 kHz, with aα = L = 1.5 cm and a = 8 cm. The light-gray area shows the 25–75 percentiles, the dark-gray area the 1–99 percentiles, and the solid line the mean directional response.

5.3 Beamformer design

From the previous section, we have seen that an adequately sized baffled loudspeaker can provide the desired directional sound radiation at high frequencies. In order to maintain that behavior down to low frequencies, one needs to use beamforming
with multiple loudspeakers. Although sound scattering from a cylindrical baffle is used as a starting point for the loudspeaker array and its analysis, the beamformer design procedure does not rely explicitly on the cylindrical geometry. Similarly, it does not require a very precise placement of control points (i.e., forcing them to lie exactly on a circle centered at the array's center), which is a usual requirement of modal beamforming techniques.

We use a beamformer design that is fully based on measurements of the loudspeaker array. It relies on the previously described bi-modal nature of the directional response of a baffled loudspeaker, and on the observation that a single (front) loudspeaker can drive the high frequencies alone. Additionally, the beamformer design relies on the estimated effective angular band-limitedness of the directional response for choosing the number of control points, in order to limit the errors due to aliasing.

For deciding on the number of control points, one can use the analysis of the effective angular bandwidth (see Definition 1 in Section 4.4) of a loudspeaker model at different frequencies. As an example, Figure 5.5 shows the −20 dB-effective angular bandwidth of a baffled piston with dimensions similar to our loudspeaker array prototype described later in Section 5.5. Based on Figure 5.5, designing a beamformer up to the frequency f_0 = 5 kHz requires more than 16 control points.

Figure 5.5: −20 dB-effective angular bandwidth of a piston in a cylindrical baffle. The piston's dimensions are L = aα = 1.5 cm, while the baffle's radius is a = 8 cm.

5.3.1 Filter design procedure

Figure 5.6: Circular loudspeaker array configuration and control points on a reference circle used for designing beamforming filters.

We use a baffled circular loudspeaker array with L = 6 loudspeakers, illustrated in Figure 5.6. The loudspeaker array's look direction (we use "on-axis" and "look direction" interchangeably throughout the rest of the chapter) coincides with the look direction of one loudspeaker, which we denote the main loudspeaker. Without loss of generality, we assign index 1 to the main loudspeaker. Each loudspeaker's response is measured in M control points covering a circle of radius r centered at the array's center. Measurements are done in free-field or anechoic conditions. Additionally, without sacrificing generality, we assign index 1 to the control point lying in the loudspeaker array's look direction, which is also the look direction of the loudspeaker with index 1.

Denote by G(ω) the acoustic channel matrix, whose entry G_ij(ω) is a filter representing the transmission path from the j-th loudspeaker to the i-th measurement point. Note that each column G_j(ω) of the matrix G(ω) contains the directional response of the j-th loudspeaker at frequency ω. Further, let the vector H(ω) contain the loudspeaker filters H_i(ω).

Motivated by the previous observations, we select the directional response G_1(ω_0) of the main loudspeaker at some high frequency ω_0 to be the desired directional response of the array. Additionally, while keeping the shape of the directional response (polar pattern) unchanged, we scale it with a frequency-dependent factor

\[
C(\omega) = \frac{|G_{11}(\omega)|}{|G_{11}(\omega_0)|} ,
\]

where G_11(ω) and G_11(ω_0) are the on-axis responses of the main loudspeaker at frequencies ω and ω_0, respectively.
This makes the on-axis desired response of the entire array equal to the on-axis response of the main loudspeaker. (The goal of the described procedure is not to equalize the on-axis frequency response of the main loudspeaker; that can be done separately, using conventional loudspeaker equalization approaches.) The frequency-dependent desired response is thus given by

\[
D(\omega) = C(\omega)\, G_1(\omega_0) . \tag{5.2}
\]

Above the frequency ω_0, we suppress beamforming and use only the main loudspeaker.

It was shown in Section 2.8 that a beamformer design can be expressed in the frequency domain as a constrained optimization problem. The goal is to compute a vector H(ω) of loudspeaker filter complex gains, where the function being optimized is the error norm

\[
E(\omega) = G(\omega)\, H(\omega) - D(\omega) \tag{5.3}
\]

between the desired and the obtained directional responses in the control points, D(ω) and G(ω) H(ω), respectively. Before presenting the design procedure in detail, let us motivate the particular choices in the beamformer design formulation used for the problem at hand.

Weighted error

The human auditory system is sensitive to relative changes of sound quantities, and frequency response irregularities are no exception (Bücklein, 1981). Thus, in a scenario where the desired directional response is highly angle-varying, it is not reasonable to weight the beamformer response errors at all control points equally. It is more judicious to penalize the absolute error more at points where the desired response level is low, and less where it is high. We therefore minimize the relative error at the control points by dividing the error at each control point by the desired gain at that point,

\[
E_w(\omega) = \operatorname{diag}(|D(\omega)|)^{-1} \left( G(\omega)\, H(\omega) - D(\omega) \right) , \tag{5.4}
\]

where diag(x) of a vector x denotes a square diagonal matrix whose main diagonal contains the elements of x, and |x| denotes a vector containing the absolute values of the elements of x.

Robust design

We try to avoid driving any of the loudspeakers with a large gain at any frequency. There are two reasons for this design decision. First, driving loudspeakers with large gains to better match the desired pattern effectively decreases the dynamic range of the loudspeaker array. Second, as described in Section 2.8, the l2 norm of the beamformer filters' impulse responses is related to the beamformer's white noise gain (Cox et al., 1987)

\[
\mathrm{WNG} = \frac{1}{\| H(\omega) \|_2^2} , \tag{5.5}
\]

which quantifies its robustness to errors. More specifically, the larger the white noise gain, the less sensitive the designed beamformer is to random (measurement or placement) errors.

Favoring the front loudspeaker

We have shown in Section 5.2 that the high-frequency directional response of a baffled loudspeaker does not vary substantially with frequency. Thus, if the beamformer's desired response is set to the directional response of the front loudspeaker at some fixed high frequency ω_0, it is to be expected that, as the frequency approaches ω_0, the beamformer tends to mostly use the main loudspeaker. As a consequence, the response of the beamformer will be dominated by the response of the main loudspeaker, in terms of both magnitude and phase.
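The two quantities introduced above, the relative error (5.4) and the white noise gain (5.5), are cheap to evaluate for any candidate filter vector. A minimal sketch, in which the channel matrix, desired response, and filters are random placeholders for illustration only:

```python
# Evaluate the weighted relative error (5.4) and white noise gain (5.5)
# for a candidate filter vector. All data here are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(1)
M_pts, L_spk = 19, 6
G = rng.normal(size=(M_pts, L_spk)) + 1j * rng.normal(size=(M_pts, L_spk))
D = 0.9 * G[:, 0]                          # illustrative desired response
H = np.zeros(L_spk, complex)
H[0] = 1.0                                 # drive only the main loudspeaker

E_w = (G @ H - D) / np.abs(D)              # relative error (5.4)
wng_db = 10 * np.log10(1.0 / np.linalg.norm(H) ** 2)   # (5.5) in dB
print(f"||E_w||_2 = {np.linalg.norm(E_w):.3f}, WNG = {wng_db:.1f} dB")
```

Spreading the same total gain over several loudspeakers lowers ‖H‖₂ and therefore raises the WNG, which is exactly the robustness trade-off the constraints below are meant to control.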
Additionally, since our goal is reproducing sound with high directivity, phase errors in the synthesized directional response are of little relevance. We saw in Section 2.8 that one can formulate the beamformer design problem by considering the error between the magnitudes of the synthesized and desired responses. We also presented Algorithm 2.1, an iterative procedure for solving this problem, although without guarantees of convergence to the global optimum.

The crucial step in any iterative procedure is the initial solution. The initial solution selection for Algorithm 2.1 relies on the high-frequency dominance of the main loudspeaker, and consists of aligning the phase of the desired response to the phase of the main loudspeaker (Kolundžija et al., 2010a, 2011b),

\[
\tilde{D}(\omega) = \operatorname{diag}(|G_1(\omega)|)^{-1} \operatorname{diag}(G_1(\omega))\, |D(\omega)| . \tag{5.6}
\]

At low frequencies, the phase differences between the responses of the array's loudspeakers are small, and the impact of the phase correction on the synthesized directional response is negligible. However, the phase alignment improves the desired response synthesis at high frequencies (Kolundžija et al., 2010a, 2011b). It also enables a smooth transition from using all loudspeakers to using only the main loudspeaker, which is highly desirable when designing practical finite impulse response (FIR) filters.

Algorithm for computing the beamformer filters

Putting the above design decisions together, the beamformer design problem is solved using Algorithm 5.1 for different frequencies ω.

Algorithm 5.1 Minimize the directional response relative magnitude error norm with WNG constraints

1. Choose the solution tolerance ε.
2. Compute the initial solution H(ω) by solving the following quadratic program:
   minimize   ‖ diag(|D̃′(ω)|)⁻¹ (G′(ω) H(ω) − D̃′(ω)) ‖_x
   subject to |H_j(ω)| ≤ H_max , j = 1, …, L
              |R_1ᵀ(ω) H(ω) − D̃_1| ≤ τ |D̃_1|
3. repeat
4.   E ← ‖ diag(|D(ω)|)⁻¹ (|G(ω) H(ω)| − |D|) ‖_x
5.   Compute D̂(ω) such that, for all j ∈ {1, …, N}, |D̂_j(ω)| = |D_j| and ∠D̂_j(ω) = ∠(G(ω) H(ω))_j
6.   Solve the following quadratic program:
     minimize   ‖ diag(|D̃′(ω)|)⁻¹ (G′(ω) H(ω) − D̂′(ω)) ‖_x
     subject to |H_i(ω)| ≤ H_max , i = 1, …, L
                |R_1ᵀ(ω) H(ω) − D̂_1| ≤ τ |D̂_1|
7.   E′ ← ‖ diag(|D(ω)|)⁻¹ (|G(ω) H(ω)| − |D|) ‖_x
8. until |E′ − E| < ε

In Algorithm 5.1, x ∈ {2, ∞} selects the minimized relative error norm (Euclidean or min-max), G′(ω) is the matrix obtained by removing the first row of the matrix G(ω), D̃′(ω) and D̂′(ω) are vectors obtained by removing the first element of the vectors D̃(ω) and D̂(ω), respectively, R_1ᵀ(ω) is the first row of the matrix G(ω), and τ is a small constant used for controlling deviations of the on-axis frequency characteristic. If the on-axis frequency response of the beamformer needs to match the desired one exactly, the parameter τ is set to zero.
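For concreteness, the following sketch implements the core iteration of Algorithm 5.1 in a simplified, unconstrained form: the WNG and on-axis constraints of steps 2 and 6 would require a quadratic-program solver and are omitted here, so this illustrates the magnitude-matching loop rather than the full algorithm. The toy channel matrix is random and purely illustrative:

```python
# Simplified sketch of the iteration in Algorithm 5.1: alternate between a
# weighted least-squares fit and re-assigning the desired response the phase
# of the current synthesis. Constraints of the full algorithm are omitted.
import numpy as np

def magnitude_ls(G, D_mag, D_init_phase, n_iter=20):
    """G: (M x L) channel matrix, D_mag: desired magnitudes at control points."""
    W = np.diag(1.0 / D_mag)                   # relative-error weighting (5.4)
    D = D_mag * np.exp(1j * D_init_phase)      # phase-aligned start, cf. (5.6)
    for _ in range(n_iter):
        H, *_ = np.linalg.lstsq(W @ G, W @ D, rcond=None)
        D = D_mag * np.exp(1j * np.angle(G @ H))   # keep phase, restore magnitude
    return H

rng = np.random.default_rng(0)                 # toy data: 19 points, 6 speakers
G = rng.normal(size=(19, 6)) + 1j * rng.normal(size=(19, 6))
D_mag = np.abs(G[:, 0])                        # desired: main loudspeaker pattern
H = magnitude_ls(G, D_mag, np.angle(G[:, 0]))
print("relative magnitude error:",
      np.linalg.norm((np.abs(G @ H) - D_mag) / D_mag))
```

As in the full algorithm, the phase update in each iteration leaves the magnitude target untouched, so the loop can only reduce (or leave unchanged) the relative magnitude error it reports.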
5.4 Simulations

In order to assess the wide-band performance of the described beamformer, we simulated a model of a six-element loudspeaker array mounted on an infinite cylindrical baffle. The vibrating pistons, which model the loudspeaker membranes, were uniformly spaced on the baffle's circumference. The radius of the baffle was a = 8 cm, and the sides of the pistons were 2L = 2aα = 3 cm.

The desired directional response was taken to be the response of the main loudspeaker alone at frequency f0 = 5 kHz, which is shown in Figure 5.7. The control points used for computing the loudspeaker filters were placed uniformly on a circle of radius r = 3 m centered at the array's center. The number of reference points, M = 19, was chosen slightly higher than the 20 dB-effective angular bandwidth of the loudspeakers' directional responses below the frequency f0 (see Figure 5.5).

Figure 5.7 compares the directional response of the beamformer to the desired response at a number of frequencies ranging from 300 Hz to 12 kHz. We can observe a good match between the two at all frequencies, in the sense that deviations do not exceed a few decibels. To better illustrate the consistency of the obtained directional response over frequency, Figure 5.8 shows frequency responses at various angles on the reference circle. We can see that there are no large deviations in the frequency response at most angles. More prominent variations of the frequency response happen at highly attenuated rear angles, but, as mentioned earlier, variations at very low sound levels are not detrimental in the foreseen applications.

Figure 5.7: Loudspeaker array directional responses at various frequencies (300 Hz, 1.5 kHz, 5 kHz, and 12 kHz), compared with the desired response, on the reference circle of radius r = 3 m.

Figure 5.8: Loudspeaker array frequency responses at various angles (0° to 158°) on the reference circle of radius r = 3 m.

Figure 5.9 shows the frequency responses of the computed beamformer filters. As expected from the beamformer design procedure, the front loudspeaker drives the high frequencies independently, and the other loudspeakers effectively help at low frequencies. Furthermore, we can see a desirable smooth transition between array beamforming and acoustical beamforming (provided by the baffle).

Figure 5.9: Loudspeaker array beamformer filters' frequency responses.

5.5 Experiments

In addition to simulations, we tested the proposed approach to directional loudspeaker design in practice. We assembled a cylindrical array of six small Logitech Z4 loudspeakers, as illustrated in Figure 5.10.

Figure 5.10: Prototype loudspeaker array consisting of six Logitech Z4 loudspeakers arranged as a cylinder with a radius r ≈ 8 cm.

As stated earlier, our beamformer design procedure is based entirely on measured loudspeaker responses, not on a model. The measurement of the assembled loudspeaker array was made in an anechoic chamber. The loudspeaker array was fixed on a turntable and measured with a fixed omnidirectional microphone r = 2 m away from its center. The measurements of all loudspeakers were made at 13 turntable rotation steps of 2π/13 radians, which is equivalent to measuring the array uniformly at 13 equidistant points on a circle of radius r = 2 m. Measurements were done using swept sines covering the frequency range from 300 Hz to 12 kHz. Due to limited access to measurement resources, the measurements were performed only once, without averaging multiple responses.
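Assuming the measured impulse responses are stored in a NumPy array (a hypothetical layout, not one prescribed here), a sketch like the following converts them into the per-frequency response matrices used by the filter computation:

```python
import numpy as np

def responses_to_mimo(h, fs, nfft=4096):
    """Assemble per-frequency response matrices from measured impulse
    responses. Assumed layout: h[l, m, :] is the response from
    loudspeaker l to turntable position m (here 13 positions spaced
    by 2*pi/13 rad on the r = 2 m circle)."""
    L, M, _ = h.shape
    spectra = np.fft.rfft(h, n=nfft, axis=-1)   # (L, M, nfft//2 + 1)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)   # bin frequencies in Hz
    G = np.transpose(spectra, (2, 1, 0))        # (bins, M, L): one G per bin
    return freqs, G
```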
Figure 5.11 shows the frequency responses of one loudspeaker from the array at control points covering the angles in the range [0, π]. Figure 5.12 shows the loudspeaker's directional responses at different frequencies.

Figure 5.11: Measured frequency responses of one loudspeaker of the prototype loudspeaker array, at seven control points on the measurement circle of radius r = 2 m.

Figure 5.12: Measured directional responses of one loudspeaker of the prototype loudspeaker array at different frequencies, sampled at 13 control points on the measurement circle of radius r = 2 m.

As expected, the directional response at high frequencies, above 4 kHz, does not vary strongly with frequency. Also, the low-frequency directivity, below 2.5 kHz, increases with frequency. In the frequency range between 2.5 kHz and 4 kHz, the directional response becomes less directive. At frequencies around 3 kHz, the on-axis response is weaker than the responses at angles up to 90° off the look direction. Additionally, the responses towards side and rear directions are much less attenuated than at low and high frequencies. This mid-frequency behavior is the only example where the practical loudspeaker's behavior substantially deviates from the simplified theoretical model analyzed in Section 5.2.

To compute the beamformer filters, we specified as desired the directional response of the front loudspeaker at frequency f0 = 5 kHz. The frequency f0 = 5 kHz is also the frequency above which only the front loudspeaker is used for sound reproduction. We allowed the on-axis frequency response of the array to deviate by τ = 1 dB from the on-axis frequency response of the front loudspeaker.

The frequency responses of the beamformer at the control points belonging to the first two quadrants are shown in Figure 5.13 [7]. The frequency responses at different control points (except for the rear, highly attenuated directions) do not vary substantially at low frequencies up to 2 kHz, nor at high frequencies above 4 kHz. However, we were not able to achieve the desired directional response in the frequency range 2-4 kHz. Frequency responses at side directions have a high peak in this frequency range, which can manifest itself as an audible coloration (Bücklein, 1981). Figure 5.14 illustrates the described frequency-dependent behavior using polar plots.

[7] The responses in the other two quadrants look very similar.

Figure 5.13: Normalized beamformer frequency responses (0 dB represents the desired on-axis frequency response) at seven control points on the measurement circle of radius r = 2 m.

Apart from the mid-frequency anomalies at some of the rear directions, it can be said that the loudspeaker array achieves good wide-band directivity. For its foreseen practical applications (described in the following section), the achieved performance is quite satisfactory.

Figure 5.15 shows the frequency responses of the beamformer filters. As expected, the frequencies above f0 = 5 kHz are reproduced only by the front loudspeaker. Below f0 = 5 kHz, all loudspeakers are active, and the frequency responses exhibit smooth transitions between beamforming and using only the main loudspeaker.
This smoothness enables implementing the beamformer with short FIR filters. The conversion of the filters' frequency responses to FIR filters is thoroughly described in Section 6.2.4.

Figure 5.14: Beamformer directional responses at 13 control points on the measurement circle of radius r = 2 m, shown at 500 Hz, 1.5 kHz, 3 kHz, and 10 kHz, with and without the beamformer, compared with the desired response.

Figure 5.15: Beamformer filters' frequency responses shown for four loudspeakers of the prototype loudspeaker array.

5.6 Applications

As briefly mentioned in the introduction, one can foresee a number of uses for a loudspeaker system having high broadband directivity.

One possible application is the mitigation of adverse effects of a listening room on the reproduced sound. Although the listening room contributes to the naturalness and gives an important sense of space, its low-frequency modes can severely impair the reproduced sound. Bad interaction between a loudspeaker system and the listening room has a detrimental effect on the intelligibility of the reproduced material, be it speech or vocals. We have seen in Section 2.7 that the critical distance of a room, which can be considered an indicator of the room's detrimental effects on speech intelligibility, is directly proportional to the directivity of a source. Thus, by using a directional loudspeaker directed toward the audience, one can expect a reduction of the unwanted coloration and the lengthy reverberation tail, and a consequent increase of speech intelligibility.

Another application where the described loudspeaker array can be useful is targeted sound reproduction, as in public address systems, advertisement displays, or exhibition spaces, where it can help reduce "sound pollution".

Last, but not least, a loudspeaker array with high broadband directivity and steering capability can be used for reproducing surround (e.g., stereo or 5.1) content in rooms. More specifically, using the capability to steer the reproduced sound, it is possible to "project" channels towards different walls in order to evoke auditory events outside the loudspeaker array, widen the auditory scene, and generate ambience.

Figure 5.16: Examples of 5.1 surround reproduction with a six-element circular loudspeaker array. (a) Projecting front channels towards the front wall and surround channels towards the side walls. (b) Similar to (a), but the center channel is projected towards the listener in order to position the dialogue or vocals at the loudspeaker array.

Figure 5.16 illustrates two examples of how the loudspeaker array described in Section 5.5 can be used for reproducing 5.1 surround content. In the example shown in Figure 5.16(a), the front channels are projected towards the front wall in order to widen the frontal auditory scene, while the two surround channels are projected towards the side walls to create ambience and extend the auditory scene towards the sides. Figure 5.16(b) illustrates a slightly modified reproduction strategy, where the center channel is projected towards the listener in order to anchor the dialogue or vocals to the loudspeaker array.
We have done informal listening tests on various 5.1 contents reproduced using our six-loudspeaker prototype. With both of the previously described strategies, we were able to generate spatial effects from both the front and side walls of the listening room. Furthermore, the loudspeaker array did not suffer from noticeable timbral artifacts in any direction.

5.7 Conclusions

The goal of the work described in this chapter was to design a compact loudspeaker array having wide-band high directivity, with the ability to steer the sound in a number of different directions.

As a solution, we proposed a beamformer for a circular loudspeaker array which relies on two principles. One is the increase with frequency of the directivity of a loudspeaker mounted on a rigid cylindrical baffle. Since the baffle makes the high-frequency directional response of a loudspeaker approximately frequency-invariant, a single loudspeaker is sufficient for reproducing high frequencies. The other principle is magnitude-optimized beamforming, which uses all loudspeakers at low frequencies in order to synthesize the directional response of a single loudspeaker at high frequencies.

The effectiveness of the proposed approach was verified through simulations using a model of a baffled piston loudspeaker, and also with a prototype cylindrical loudspeaker array. Informal listening tests showed promising results in applications such as mitigating adverse room effects and surround sound playback in rooms.

Chapter 6

Reproducing Sound Fields Using MIMO Acoustic Channel Inversion

6.1 Introduction

In this chapter, we present an approach denoted Sound Field Reconstruction (SFR) (Kolundžija et al., 2009b,c). In essence, SFR is designed to optimally reproduce a desired sound field in a given listening area for a given finite setup, while keeping loudspeaker driving signals well behaved in order to respect physical constraints. SFR has three important aspects:

• Design of a control point grid covering the listening area
• Selection of active loudspeakers and of a subset of control points based on the position of the reproduced source and the geometry of the reproduction setup
• Computation of loudspeaker filters using multiple-input multiple-output (MIMO) channel inversion.

The grid of control points inside the listening area used in SFR is designed following the equivalence of sound reproduction in a continuous and a sampled spatial domain, proved in the next section. Furthermore, SFR uses only those loudspeakers that contribute most to the sound field reproduction, as this strategy is known to mitigate the high-frequency spatial aliasing problems characteristic of loudspeaker arrays (Verheijen, 1997; Corteel, 2006; Corteel et al., 2008). The active loudspeakers are selected based on geometry, similar to (Verheijen, 1997). Also, in order to avoid over-fitting, the control points where the desired sound field evolution cannot be locally matched are discarded following similar geometrical considerations. Finally, SFR uses a variant of MIMO channel pseudo-inversion with truncated singular value decomposition (SVD). This technique allows graceful degradation of sound field reproduction performance when the MIMO channel matrix is ill-conditioned, while keeping loudspeaker filters within practical physical system constraints.
Being optimized for both the setup and the listening area, SFR is able to achieve higher sound field reproduction accuracy than Wave Field Synthesis (Berkhout, 1988), as will be shown with simulations.

6.1.1 Background

The first spatial sound reproduction systems date back to the work of Blumlein (1931) on stereo systems in the first half of the last century. The successful two-channel stereo principle, still widely used today, was extended to the four-channel quadraphonic system (Scheiber, 1971) with the aim of providing full-circle spatial reproduction, but it was quickly abandoned, possibly due to unsatisfactory front localization, technical issues, and format incompatibilities. Surround systems using a higher number of channels, such as 5.1 (Allen, 1991; ITU-775, 1994) and 7.1, are based on the observation that accurate localization of sound coming from the front is more important, and they use more loudspeakers in front of the listener for improved frontal localization. Additionally, loudspeakers at the sides of and behind the listening position are used for providing ambience and side/rear localization.

All previously mentioned surround sound systems create sound fields with correct spatial attributes only within a narrow listening area called the "sweet spot". The problem of extending the listening area was addressed by two notable surround sound systems: Ambisonics (see Gerzon, 1973, 1980b) and Wave Field Synthesis (see Berkhout, 1988; Berkhout et al., 1992, 1993). Both approaches attempt to reproduce a desired sound field in an extended listening area.

The theoretical foundations of Ambisonics were laid down in the 1970s (see Cooper and Shiga, 1972; Gerzon, 1973, 1980b), primarily for circular and spherical loudspeaker arrangements. At the heart of the ambisonic reproduction technique is sound field mode matching in one central listening spot. In this particular case, mode matching implies matching orthogonal components, such as cylindrical or spherical harmonics, of the desired and reproduced sound fields. The early ambisonic systems suffered from a limited sweet-spot size, particularly at medium and high frequencies (Bamford and Vanderkooy, 1995), due to the use of modes of low order, an insufficient number of loudspeakers, and far-field loudspeaker models. Later, Daniel et al. (2003) provided extensions to the initial works on Ambisonics, considering higher-order modes and modeling loudspeakers as point sources to more accurately account for propagation effects. Near-field higher-order Ambisonics was shown to have comparable performance to WFS for enclosing loudspeaker configurations (Daniel et al., 2003), but for practical systems, it lacks recording support in the form of a wide-band high-order sound field microphone.

Wave Field Synthesis (WFS) systems, on the other hand, are based on the Helmholtz integral equation (HIE), presented in Section 2.3. Recall that the HIE shows how a desired sound field in a closed source-free (listening) domain can be reproduced by a continuous distribution of secondary monopole and dipole sources on the domain boundary. In the initial works (e.g., see Berkhout, 1988; Berkhout et al., 1993), WFS was derived starting from Rayleigh's I and II integrals. Recall that for Rayleigh's I and II integrals, the listening domain is a half-space and secondary sources (monopole in the former, dipole in the latter) are distributed on the bounding plane.
Since reproduction in the horizontal plane is far more important than in the vertical plane (Blauert, 1997), the creators of WFS focused on linear loudspeaker setups that approximate the performance of planar source distributions. Using the stationary phase approximation, they were able to derive the so-called 2½-dimensional WFS, which is able to approximately reproduce a desired sound field in the listening plane, while reproducing it exactly on a reference listening line. The initial WFS concept was extended by Start (1996) to include curved loudspeaker distributions, and by Verheijen (1997) and de Vries (1996) to reproduce arbitrarily directive sources and to use directional loudspeakers, respectively.

More recently, Ahrens and Spors (2010) proposed an approach called the Spectral Division Method (SDM), which formulates sound field reproduction using planar and linear loudspeaker distributions as a spatio-temporal spectral inversion. They have shown that in some particular cases, such as the reproduction of plane waves using monopole sources, one is able to obtain a correct closed-form solution for the loudspeaker driving signals.

However powerful as theoretical tools, the mentioned approaches to sound field reproduction need to cope with the limitations imposed by systems used in practice. Namely, they need to be applied to discrete loudspeaker distributions of limited spatial support, varying directivity, and possibly multi-path propagation, while listening domains are of finite size. Some of these issues have been addressed by Verheijen (1997), who proposed the use of geometry-based loudspeaker subset selection and spatial tapering towards the loudspeaker array edges, to mitigate the impairments due to spatial aliasing and diffraction, respectively. Corteel (2006, 2007) used a similar loudspeaker selection method, while Spors (2007) proposed a slightly different approach based on the direction of sound intensity vectors.

More recent approaches for computing loudspeaker filters for sound field reproduction use a discretization of the listening area and numerically solve a discrete optimization problem. This is partly due to the unsatisfactory performance of analytical solutions in practical problems, and partly to avoid restricting the reproduction setup (e.g., requiring calibrated loudspeakers of prescribed directivity, free-field conditions, etc.). One of the earliest numerical approaches, by Kirkeby and Nelson (1993), addresses the reproduction of plane waves based on pseudo-inversion of a multichannel acoustic propagation matrix. Similar to (Kirkeby and Nelson, 1993), but applied to WFS for the purpose of room equalization and directive source reproduction, are the works (Corteel, 2006, 2007), where the control points are arranged on four listening lines. Gauthier and Berry (2007) use only four control points arranged as a quadrupole to compute loudspeaker filters that optimize a cost function consisting of the reproduction error and the deviation from the initial WFS driving signals.

6.1.2 Chapter outline

Section 6.2 describes theoretical and practical aspects of SFR. Section 6.3 presents an extensive evaluation of SFR and its comparison with WFS. Practical considerations for realizing SFR systems are discussed in Section 6.4. Conclusions are presented in Section 6.5.
6.2 Sound Field Reconstruction

As mentioned in the introduction, Sound Field Reconstruction (SFR) is a spatial sound reproduction approach based on the spectral properties of the plenacoustic function shown by Ajdler et al. (2006). More particularly, it is based on the essential spatial band-limitedness [1] of the sound field that emanates from temporally band-limited sound sources. This section provides a description of sampling and interpolation of the plenacoustic function, and shows how these can be used for sound field reproduction with arbitrary reproduction setups. It also describes practical extensions that help improve sound field reproduction with SFR in specific finite domains, and briefly presents the design of discrete-time loudspeaker filters for SFR.

[1] An essentially band-limited function in this context refers to a function which has most of its energy in a finitely-supported spectral region, while the energy outside of that region decays exponentially.

6.2.1 Plenacoustic sampling and interpolation

In the most general sense, the plenacoustic function p(r, t) describes a sound field in space and time, irrespective of the sources evoking it. In the particular case where the sound field is evoked by a point source located at r0 emitting a Dirac pulse, the plenacoustic function equals the time-dependent Green's function g(r, t|r0) [2], i.e., the spatio-temporal impulse response of the acoustical medium from point r0 to point r.

[2] The time-dependent Green's function has the form g(r, t|r0, t0), but we implicitly assume that the starting time is t0 = 0 and omit it for brevity.

The changes of the plenacoustic function in space at a temporal frequency ω cannot happen at an arbitrary rate, but are limited by ω according to the relation (Ajdler et al., 2006)

$$k^2 \le \frac{\omega^2}{c^2}\,, \tag{6.1}$$

where k is the spatial frequency and c is the speed of sound propagation [3].

[3] The given limit on the spatial variations of sound pressure is not entirely correct, but it is to a large extent when sources are not inside of or very close to the considered spatial domain.

Based on the observation (6.1), one can define a minimum spatial sampling frequency for a sound field of limited temporal bandwidth. If the maximum temporal frequency of the sources evoking the sound field is equal to ωm, then a spatial sampling frequency of ks = 2ωm/c is sufficient for representing the sound field. This observation extrapolates to a large extent to finite spatial segments (e.g., finite-length lines and finite-area rectangles), as shown in (Ajdler et al., 2006).

The possibility of sampling a sound field has an implication that is useful in the context of SFR. Namely, it suggests that correct reproduction of a sound field on a grid of points can guarantee correctness of reproduction between the grid points. Without loss of generality, we show this result for the xy-plane in Theorem 1. However, we first give a lemma which simplifies proving the theorem.

Lemma 1. If two functions f(x, y, t) and h(x, y, t), both band-limited in the spatio-temporal frequency domain, with maximum temporal frequency ωm and maximum spatial frequency km = ωm/c (k² = kx² + ky² ≤ km²), are identical on a 2D grid

$$(n\Delta x,\, m\Delta y)\,, \quad n, m \in \mathbb{Z}\,,$$

where Δx ≤ π/km and Δy ≤ π/km, then they are identical everywhere.
Proof. The proof follows from the fact that a band-limited function (e.g., a sound field) is uniquely defined by its samples on a grid satisfying the Nyquist criterion. Since the functions f(r, t) and h(r, t) are both band-limited with the same spectral support, and have identical values on a sampling grid satisfying the Nyquist criterion, they must be identical everywhere.

Theorem 1. Consider two sets of sound sources, S1 = {s11(t), ..., s1k(t)} and S2 = {s21(t), ..., s2l(t)}, where each source is band-limited with maximum temporal frequency ωm. Denote by rij the location of source sij(t). Assume that the spectral support of each Green's function g(r, t|rij), evaluated in the xy-plane, is confined to the double cone

$$k^2 = k_x^2 + k_y^2 \le \frac{\omega^2}{c^2}\,.$$

The sound field in the xy-plane evoked by source sij(t) is given by

$$p_{ij}(x, y, t) = \int g(x, y, \tau\,|\,r_{ij})\, s_{ij}(t - \tau)\, d\tau\,.$$

Further, let P1(x, y, t) and P2(x, y, t) be the superposed sound fields of the sources in S1 and S2, respectively, given by

$$P_i(x, y, t) = \sum_j p_{ij}(x, y, t)\,.$$

The two sound fields, P1(x, y, t) and P2(x, y, t), are identical in the entire xy-plane if they are identical on the grid

$$(n\Delta x,\, m\Delta y)\,, \quad n, m \in \mathbb{Z}\,, \tag{6.2}$$

with Δx ≤ πc/ωm and Δy ≤ πc/ωm.

Proof. The sound field of each source is band-limited in time and space, with maximum temporal and spatial frequencies ωm and km = ωm/c, respectively. Consequently, the sound fields P1(x, y, t) and P2(x, y, t), being superpositions of functions band-limited in space and time, are also band-limited with the same maximum frequencies. Their equality follows from Lemma 1.

Note also that even if the spatio-temporal spectrum of a Green's function is not confined to the double cone defined by k² ≤ ω²/c², its propagating part is [4]. Therefore, it follows that the propagating parts of two sound fields are equal if they are equal on the grid given by (6.2).

[4] As mentioned previously, the propagating part contains essentially the entire energy of a sound field.

Based on this observation, the problem of reproducing a sound field that emanates from temporally band-limited sources with maximum temporal frequency ωm is equivalent to reproducing the sound field on a grid of control points spaced at or above the Nyquist spatial sampling frequency ks = 2ωm/c. In the case of practical sound field reproduction with an array of loudspeakers, the listening area, and thus also the control grid, are finite. Consequently, sound field reproduction can be expressed as a MIMO problem.
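As a minimal illustration of this grid design, the Python sketch below (our own; it assumes a speed of sound c = 343 m/s) generates a rectangular control grid at the spacing Δ = πc/ωm = c/(2 f_max) implied by Theorem 1:

```python
import numpy as np

def nyquist_control_grid(x_lim, y_lim, f_max, c=343.0):
    """Rectangular control-point grid at the spacing required by
    Theorem 1: delta <= pi*c/omega_m = c / (2*f_max).  With
    f_max = 4 kHz and c = 343 m/s this gives roughly 4.3 cm."""
    delta = c / (2.0 * f_max)
    x = np.arange(x_lim[0], x_lim[1] + delta, delta)
    y = np.arange(y_lim[0], y_lim[1] + delta, delta)
    X, Y = np.meshgrid(x, y)
    return np.column_stack((X.ravel(), Y.ravel()))  # (num_points, 2)
```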
6.2.2 Sound Field Reconstruction using MIMO channel inversion

MIMO channel inversion is a standard problem that reappears in many multichannel sound applications, such as multi-point room equalization, sound field reproduction, and beamforming (see Kirkeby and Nelson, 1993; Kirkeby et al., 1998; Corteel, 2006). For the sake of completeness, we present the MIMO channel inversion problem and the particular solution used in SFR.

The problem of MIMO channel inversion in the context of sound field reproduction is illustrated in Figure 6.1. The reproduction setup includes an array of L loudspeakers and a grid of M control points covering the listening area, illustrated in Figure 6.1(a). In addition, as shown in Figure 6.1(b), there is a desired acoustic scene that contains N sound sources that would evoke the desired sound field in the listening area. The positions of loudspeakers, control points, and desired sources are known.

The transfer function Aij(ω) denotes the sound propagation channel between the jth desired source and the ith control point. Similarly, Gik(ω) denotes the sound propagation channel between the kth loudspeaker and the ith control point. Both Aij(ω) and Gik(ω) are known for all desired source-control point and loudspeaker-control point pairs, respectively, either through a theoretical model or through measurement. The goal of MIMO channel inversion in the context of SFR is the reproduction of the desired sound scene at the M control points, i.e., the computation of loudspeaker driving signals that evoke the same signals at the control points as the original sound scene.

Figure 6.1: Multichannel inversion problem overview. (a) Reproduction setup: loudspeaker array, listening area, and control points; (b) desired sound scene: desired sources, listening area, and control points.

Note that the problem of multichannel inversion can be represented as a superposition of N independent sub-problems, each involving a single desired source. The loudspeaker signals can then be obtained by summing the contributions of each single-source sub-problem. Thus, without loss of generality, the following MIMO channel inversion analysis is presented only for the first desired source.

Denote by S1(ω), Xj(ω), and Yk(ω) the Fourier transforms of the signals of the desired source, the output of the jth loudspeaker, and the sound pressure at the kth control point, respectively. Furthermore, denote by Dl(ω) the signal at the lth control point in the desired sound scene containing only the first desired source. The signals Di(ω) are determined by the effects of the sound propagation paths from the desired source to the control points, and are described by the product

$$D(\omega) = A(\omega)\, S_1(\omega)\,, \tag{6.3}$$

where

$$D(\omega) = [D_1(\omega)\ D_2(\omega)\ \ldots\ D_M(\omega)]^T$$
$$A(\omega) = [A_{11}(\omega)\ A_{21}(\omega)\ \ldots\ A_{M1}(\omega)]^T\,.$$

The signals produced by the loudspeakers at the control points are determined by the sound propagation effects on the loudspeaker signals, and are given by

$$Y(\omega) = G(\omega)\, X(\omega)\,, \tag{6.4}$$

where

$$Y(\omega) = [Y_1(\omega)\ Y_2(\omega)\ \ldots\ Y_M(\omega)]^T$$

$$G(\omega) = \begin{bmatrix} G_{11}(\omega) & G_{12}(\omega) & \ldots & G_{1L}(\omega) \\ G_{21}(\omega) & G_{22}(\omega) & \ldots & G_{2L}(\omega) \\ \vdots & \vdots & \ddots & \vdots \\ G_{M1}(\omega) & G_{M2}(\omega) & \ldots & G_{ML}(\omega) \end{bmatrix}$$

$$X(\omega) = [X_1(\omega)\ X_2(\omega)\ \ldots\ X_L(\omega)]^T\,.$$

The task of the multichannel inversion is to compute the signals Xj(ω) from the desired signal S1(ω), i.e.,

$$X(\omega) = H_1(\omega)\, S_1(\omega)\,, \tag{6.5}$$

where

$$H_1(\omega) = [H_{11}(\omega)\ H_{21}(\omega)\ \ldots\ H_{L1}(\omega)]^T\,,$$

such that the difference (error) between the vector Y(ω) and the vector D(ω), corrected by a constant delay Δ accounting for the propagation time differences or the modeling delay, is minimized. The multichannel inversion problem is illustrated in Figure 6.2.

Figure 6.2: Block diagram illustrating the MIMO channel inversion problem.

The solution which minimizes the error power, i.e., the mean squared error (MSE) solution, is given by (e.g., see Nelson and Elliott, 1992)

$$H_1(\omega) = e^{-i\omega\Delta}\, G^+(\omega)\, A(\omega)\,. \tag{6.6}$$

Since it uses a pseudo-inverse G⁺(ω) of the transfer matrix G(ω), finding the pseudo-inverse of the matrix G(ω) becomes the central problem of MIMO channel inversion. The classical full-rank pseudo-inverse expression is given by

$$G^+(\omega) = \left( G^H(\omega)\, G(\omega) \right)^{-1} G^H(\omega)\,, \tag{6.7}$$

where the matrix G^H(ω) is the conjugate transpose of the matrix G(ω).
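In practice, A(ω) and G(ω) are measured, but for simulations a simple propagation model suffices. The sketch below (our illustration, not part of the original text) fills such matrices with free-field monopole transfer functions e^{-iωr/c}/(4πr):

```python
import numpy as np

def monopole_matrix(src, ctl, omega, c=343.0):
    """Free-field monopole transfer functions exp(-i*omega*r/c)/(4*pi*r)
    between each source (rows of src) and control point (rows of ctl);
    a stand-in for measured A(omega) and G(omega)."""
    r = np.linalg.norm(ctl[:, None, :] - src[None, :, :], axis=-1)
    return np.exp(-1j * omega * r / c) / (4.0 * np.pi * r)
```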
At low frequencies, where the condition number of the matrix G(ω) is large (making it effectively low-rank), (6.7) gives filters with gains beyond the physical limitations of practical loudspeakers. The regularized pseudo-inversion used in (Kirkeby et al., 1998; Corteel, 2006) is also of limited use, as it does not allow easy control of the trade-off between the reproduction accuracy and the maximum filter gains. Like a number of MIMO inversion solutions in acoustics (e.g., see Hannemann and Donohue, 2008), we use a pseudo-inversion method based on the truncated singular value decomposition (SVD), which prunes singular values that are below a defined threshold (see Golub and Kahan, 1965). In particular, if

$$G(\omega) = U(\omega)\, \Sigma(\omega)\, V^H(\omega) \tag{6.8}$$

is the SVD of the matrix G(ω), then the pseudo-inverse of the matrix G(ω) is given by

$$G^+(\omega) = V(\omega)\, \Sigma^+(\omega)\, U^H(\omega)\,, \tag{6.9}$$

where the matrix Σ⁺(ω) is obtained from Σ(ω) by first setting to zero the singular values whose absolute values are below a defined threshold ε, replacing the other singular values by their reciprocals, and taking the matrix transpose in the end (Golub and Kahan, 1965). The threshold can be adapted to the matrix G(ω), i.e., it can be set to a fraction of the largest singular value of G(ω) [5]. At high frequencies, where all singular values of the matrix G(ω) are larger than the threshold, this procedure gives a result identical to (6.7). However, at low frequencies, it gives near-optimal solutions while keeping the loudspeaker filter gains within practical limits. A more detailed treatment of this MIMO channel inversion problem is given in (Kolundžija et al., 2009b).

[5] In the simulations presented in Section 6.3, the threshold was 20 dB below the largest singular value of G(ω).

6.2.3 Practical extensions of Sound Field Reconstruction

Filter correction through power normalization on the reference line

All sound field reproduction approaches give loudspeaker driving signals that do not provide correct sound field reproduction above a certain aliasing frequency. Although stemming from the same physical limitations as in approaches such as WFS, which are inherent to the geometry of the loudspeaker array and the location of the reproduced source, the high-frequency problems of SFR can be explained from another perspective. Namely, at high frequencies, where constructive interference of the sound fields of different sources cannot be achieved, the least mean squared error solution is biased towards highly attenuating all signals, such that the reconstruction error approaches the desired signal [6].

[6] This is a known phenomenon in Wiener filtering, where at low SNRs, the gain of the Wiener filter approaches zero.

A way of avoiding the aforementioned problems, although not providing correct reproduction in the wide listening area, is normalizing the filters' gains at all frequencies such that, on a grid of control points, the average power of the reproduced field is equal to the average power of the desired field. In particular, if A1(ω), ..., AM(ω) and Y1(ω), ..., YM(ω) are respectively the amplitudes of the desired and the reproduced sound fields at frequency ω at the M control points, then each loudspeaker filter is corrected as

$$\tilde{H}_i(\omega) = c_f(\omega)\, H_i(\omega)\,, \tag{6.10}$$

where cf(ω) is a correction factor given by

$$c_f(\omega) = \sqrt{\frac{\sum_{i=1}^{M} A_i^2(\omega)}{\sum_{i=1}^{M} Y_i^2(\omega)}}\,. \tag{6.11}$$
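A minimal sketch of this filter computation, combining the truncated-SVD pseudo-inverse of (6.8)-(6.9) with the MSE solution (6.6) and the power normalization of (6.10)-(6.11), might look as follows (function names and the numerical guard against division by zero are our additions):

```python
import numpy as np

def sfr_filters(G, A, omega, delta, rel_tol=10 ** (-20 / 20)):
    """Filters of eq. (6.6) using the truncated-SVD pseudo-inverse of
    eqs. (6.8)-(6.9); singular values more than 20 dB below the largest
    one are discarded, as in the footnoted simulation setting."""
    U, s, Vh = np.linalg.svd(G, full_matrices=False)
    s_inv = np.where(s >= rel_tol * s.max(),
                     1.0 / np.maximum(s, 1e-12), 0.0)
    G_pinv = Vh.conj().T @ np.diag(s_inv) @ U.conj().T
    return np.exp(-1j * omega * delta) * (G_pinv @ A)

def power_normalize(H, G, A):
    """Correction of eqs. (6.10)-(6.11): scale the filters so the average
    reproduced power on the control grid matches the desired one."""
    Y = G @ H
    cf = np.sqrt(np.sum(np.abs(A) ** 2) / np.sum(np.abs(Y) ** 2))
    return cf * H
```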
Loudspeaker subset selection

While it might seem beneficial to use all loudspeakers for reproduction with SFR, there are many cases where using only a subset of loudspeakers gives better reproduction, provided the optimization is done for a specific finite listening area. This observation was made for WFS by Verheijen (1997), and was later used by various authors (see Corteel, 2006, 2007; Corteel et al., 2008), who showed how, based on the location of the primary (reproduced) source and the listening area, one can select a sub-array of loudspeakers which physically contribute the most to the sound field reproduction.

There is a plausible explanation for such a selection. Considering the case where an impulsive sound arrives from the primary source, one expects that at all locations in the listening area, the received sound is of similar duration and consequently without significant spectral impairments. However, using all loudspeakers makes the combination of the impulse responses (due to their different delays) more spread in time and more variable across positions than when only a subset of loudspeakers is used, causing both temporal and spectral deviations.

The selection procedure considers only those loudspeakers that are inside the cone defined by the primary source and the boundaries of the listening area, extended by a predefined selection margin, as shown in Figure 6.3. The rationale behind such a choice is twofold: first, it uses the loudspeakers whose contribution is largest when all loudspeakers are used, preserving most of the reconstruction accuracy; and second, the active loudspeakers have the lowest delay spreads due to differences in propagation distance and their position relative to the sound wavefront.

Figure 6.3: Illustration of loudspeaker selection based on the primary source position. The visible loudspeakers are inside the angle subtended by the listening area at the primary source. Loudspeakers within a selection margin of the visible loudspeakers are also selected.

Figure 6.4 illustrates how the reproduction accuracy of SFR does not change notably when only a subset of six (out of 18) loudspeakers is used for reproducing a sinusoid at frequency f = 500 Hz. The selected loudspeakers lie in the minimal cone centered at the position of the primary source that contains the listening area. Additionally, loudspeakers outside the minimal cone but within a selection margin can also be selected.

Control point selection

One also needs to be careful with control point selection, since, due to the physical limitations set by the loudspeaker array and primary source locations, in some parts of the listening area it is impossible to reproduce the evolution of the desired sound field. Thus, it is physically justified to place control points at locations where the sound wave fronts from the primary source and the loudspeakers move roughly in the same direction. In the case of primary point sources, the control points form a subset of the reference grid which lies inside the cone defined by the primary source and the active loudspeakers, as shown in Figure 6.5(a).
For plane wave sources, the control points form a subset of the reference grid which lies inside the stripe defined by the active loudspeakers and the plane wave propagation direction, as shown in Figure 6.5(b).

It should also be noted that the selected control points should not lie near the loudspeaker array or the primary source, in order to avoid the solution's sensitivity to evanescent (non-propagating) waves. Evanescent waves (Williams, 1999) are a local phenomenon which does not persist with increasing distance. Thus, using sound propagation functions that contain significant evanescent wave energy amounts to model over-fitting and compromises the sound field reproduction accuracy in a larger listening area.

Figure 6.4: Comparison of SFR with the entire loudspeaker array from Figure 6.9 and SFR with only a sub-array of six selected loudspeakers used to reproduce a point source with frequency f = 500 Hz located at rm = (3 m, 1 m). The used loudspeakers are marked with squares. (a) Snapshot of the desired sound field; (b) snapshot of the sound field reproduced using all loudspeakers; (c) snapshot of the sound field reproduced with the selected loudspeaker sub-array; (d) magnitude response of the three sound fields on the reference line at frequency f = 500 Hz.

Figure 6.5: Illustration of control point selection based on the positions of the used loudspeakers and the position or direction of the primary source. The selected control points lie inside the cone or stripe defined by the loudspeaker positions and the primary source position or direction, respectively. (a) Point source reproduction; (b) plane source reproduction.

6.2.4 Designing discrete-time filters for Sound Field Reconstruction

The frequency-domain SFR filter design procedure uses a non-linear step of discarding small singular values of the system matrix G(ω). The resulting frequency response and the distribution of singular values at different frequencies are shown in Figure 6.6. Apparently, the SFR filters H̃k(ω) have abrupt changes around frequencies where singular values cross the predefined threshold ε. As a consequence, the filters H̃k(ω) have long impulse responses, which is the main obstacle to designing practical, short discrete-time SFR filters.

Figure 6.6: (a) Singular values of the loudspeaker propagation matrix G(ω) at different frequencies ω; (b) magnitude responses of the SFR filters Hk(ω) obtained from the SFR frequency-domain filter calculation procedure.

However, it turns out that the filters h̃k(t), despite being piecewise smooth functions with a few discontinuities, are well localized in time, and most of their energy is concentrated around one main pulse, as shown in Figure 6.7. Therefore, shortening h̃k(t) does not severely affect the reproduction accuracy, and enables designing efficient discrete-time filters as combinations of a pure delay δNk[n] and a short FIR filter hk[n] (Kolundžija et al., 2009c).
Figure 6.7: Conceptual illustration of the discrete-time SFR filter design: (1) removing the delay dk; (2) frequency sampling using the IDFT; (3) shortening of the filter H̃k(ω) given in (6.10).

The SFR discrete-time filter design procedure, illustrated in Figure 6.7, uses the following three steps:

• Delay removal: The main peak of the filters h̃k(t) can have a long delay for sources far away from the loudspeaker array. In order to avoid using excessively long filters that can accommodate a wide range of different delays, the filters' delays are extracted and realized separately. The delay dk of the main peak of the SFR filter h̃k(t) is extracted considering the source-loudspeaker distance and using regression of the phase characteristic of the filter H̃k(ω) (Kolundžija et al., 2009c).

• Frequency sampling: At the same time, the problem of frequency sampling of the filters H̃k(ω) needs to be solved. In other words, it is necessary to choose the length NT of the inverse discrete Fourier transform (IDFT) used to obtain the discrete-time impulse response h̃k[n]. NT needs to be large enough to give a low time-domain aliasing error. In the setup we used for the evaluations, described in the next section, NT = 2048 turned out to be long enough to avoid notable aliasing artifacts at a sampling frequency fs = 48 kHz.

• Impulse response windowing and delaying: In the end, h̃k[n] is shortened with a tapering window w[n] of length NF (NF < NT) and delayed by NF/2 in order to make it causal.

Figure 6.8 shows an SFR filter of length NF = 512 samples obtained by the described procedure with IDFT length NT = 2048, for the sampling frequency fs = 48 kHz.

Figure 6.8: SFR filter of length NF = 512 samples obtained from the frequency response H̃k(ω) using a DFT of length NT = 2048.
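A compact sketch of these three steps, under the assumption that H̃k(ω) is available at the NT/2 + 1 non-negative DFT bins and that the main pulse sits near t = 0 after delay removal, could look like this (our illustration, not the exact implementation of (Kolundžija et al., 2009c)):

```python
import numpy as np

def sfr_fir(H_tilde, d_k, n_t=2048, n_f=512):
    """Three-step discrete-time design sketch: (1) remove the bulk
    delay d_k (in samples); (2) sample the response on an IDFT grid of
    length n_t; (3) shorten to n_f taps with a tapering window and delay
    by n_f/2 to keep the filter causal."""
    k = np.arange(n_t // 2 + 1)
    H_nodelay = H_tilde * np.exp(2j * np.pi * k * d_k / n_t)  # advance by d_k
    h = np.fft.irfft(H_nodelay, n=n_t)
    h = np.roll(h, n_f // 2)          # center the main pulse in the window
    return h[:n_f] * np.hanning(n_f)
```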
6.3 Evaluation

SFR was evaluated with simulations of the sound reproduction setup shown in Figure 6.9. The performance of SFR was compared with two different variants of WFS:

• WFS I: Basic WFS, as proposed in the initial works on WFS (Berkhout, 1988; Berkhout et al., 1992).

• WFS II: WFS that uses the loudspeaker selection procedure described in Section 6.2.3, variants of which were proposed by Verheijen (1997) and Corteel (2006); Corteel et al. (2008).

In both variants of WFS, we used a double-sided, frequency-independent half-cosine tapering window to mitigate the edge effects. The length of the taper on each end was 15% of the loudspeaker array length. WFS filters were computed starting from the loudspeaker driving function formulas found in the WFS literature (Berkhout et al., 1993; Verheijen, 1997). Additionally, a common correction filter cf(ω) was computed and applied to all active loudspeakers in order to achieve the desired average power at the points on the reference line. This procedure was described in Section 6.2.3.

The reproduction setup, shown in Figure 6.9, consists of 18 loudspeakers spaced at 15 cm. The loudspeakers are modeled as point sources emitting spherical waves. The reproduced primary sources, on the other hand, are modeled as both point and plane sources, emitting spherical and plane waves, respectively. The listening area is a square with a side of 4 m, located 2 m in front of the loudspeaker array.

Figure 6.9: A sound field reproduction setup using a linear loudspeaker array of 18 loudspeakers spaced at Δl = 15 cm, with a reference line, control points, and a 4 m × 4 m listening area containing the points RS, RC, and RE.

Two different sets of simulations were performed. The first set gives insight into the reproduction accuracy of the tested approaches in both the frequency and time domains through sound field snapshots. The second set of simulations gives a more thorough quantitative performance analysis of the tested approaches. It does so by exhaustively analyzing magnitude frequency responses and group delay errors for a large number of reproduced sources and a large number of listening positions.

6.3.1 Sound field snapshot analysis

Sinusoidal sources

The first simulation analyzes the spatial accuracy of the reproduction of a sinusoidal (single-frequency) point source. It compares snapshots of the desired sound field and the sound fields reproduced with the three tested approaches. Figures 6.10 and 6.11 show comparisons for the reproduction of a sinusoidal point source at frequencies f1 = 500 Hz and f2 = 2 kHz, respectively.

Low-frequency reproduction, as can be observed in Figure 6.10, is accurate with all three simulated approaches. However, as the frequency increases, aliasing artifacts begin to appear. Figure 6.11 shows the difference in the aliasing artifacts between the three approaches. While WFS I has visible aliasing artifacts across the entire listening area (visible as zero responses along multiple directions), SFR and WFS II have only a few directional nulls at the periphery of the listening area, and thus preserve spatial reproduction accuracy up to higher frequencies.

Figure 6.10: Comparison of WFS and SFR in reproducing a point source with frequency f1 = 500 Hz located at rm = (3 m, 1 m). The used loudspeakers are marked with squares. Sound field snapshots: (a) desired, (b) WFS I, (c) WFS II, and (d) SFR.

Low-pass filtered pulse train

The second simulation shows the differences between WFS and SFR from a different perspective. Namely, while the first simulation focused on spatial reproduction accuracy as a function of frequency, this simulation focuses on the spatial reproduction accuracy over a wide range of frequencies. The reproduced primary source is a plane source at angle φ = 180° emitting a train of low-pass filtered pulses p(t) spaced in time by Tp = 4 ms. The shape of a single pulse is shown in Figure 6.12.

Figure 6.13 shows snapshots of the desired sound field and the sound fields reproduced with SFR and the two variants of WFS. Note that in this scenario loudspeaker selection has no effect; as a consequence, all three approaches use all loudspeakers, and the two variants of WFS are identical. From the snapshots of the reproduced fields, it is apparent that the shape of the sound wave fronts is accurately reproduced across the listening area with both WFS and SFR. However, observing the amplitude of the emitted pulses across the listening area, one can see that with WFS the amplitude notably decreases towards the sides. SFR, on the other hand, does not suffer from this problem.

Figure 6.14, which shows magnitude frequency responses in the center and at both ends of the listening line (located four meters in front of the loudspeaker array), corroborates the previous visual observation from the sound field snapshots.
In particular, it shows that SFR's low-frequency magnitude response at both ends of the listening line is flatter and less attenuated relative to the desired characteristic when compared to WFS.

Figure 6.11: Comparison of WFS and SFR in reproducing a point source with frequency f2 = 2 kHz located at rm = (3 m, 1 m). The used loudspeakers are marked with squares. Sound field snapshots: (a) desired, (b) WFS I, (c) WFS II, and (d) SFR.

Figure 6.12: A low-pass pulse with a cut-off frequency fc = 3 kHz used for constructing a pulse train.

Figure 6.13: Comparison of WFS and SFR in reproducing a plane source located at an angle φ = π that emits a train of low-pass pulses with a period of Tp = 4 ms. The used loudspeakers are marked with squares. Sound field snapshots: (a) desired, (b) WFS I, (c) WFS II, and (d) SFR.

Figure 6.14: Normalized magnitude frequency responses of WFS and SFR at the control points (a) RS(8 m, 0), (b) RC(8 m, 2 m), and (c) RE(8 m, 4 m) for a plane wave source located at an angle φ = π.

At this point, some preliminary conclusions can be drawn about the advantages of SFR over WFS. Compared to WFS I, SFR provides more graceful degradation of reproduction accuracy across the listening area as the frequency increases. This effectively means that, compared to WFS I, SFR both increases the aliasing frequency margin and enlarges the effective listening area. Furthermore, the loudspeaker subset selection in WFS II helps decrease the aliasing artifacts as the frequency increases, but, as will be shown next, this improvement comes at the cost of increased average magnitude spectral deviations across the listening space.

6.3.2 Impulse response analysis

In order to remove the influence of particular source and listening positions and make a more general observation about the performance of WFS and SFR, we performed a number of simulations involving multiple primary source and listening positions. The simulated primary sources, 30 altogether, were divided into three categories, each containing ten sources (see Figure 6.15):

• Type I: Focused, frontal point sources located inside a triangle whose vertices coincide with the two outer loudspeakers and the point C(6 m, 2 m), regularly spaced along the x axis and with y coordinates chosen uniformly at random within the triangle boundaries.

• Type II: Point sources located closely behind the loudspeaker array. In the simulations, these sources were regularly spaced along the x axis between x = 0 and x = 4 m, and their y coordinates were chosen uniformly at random between y = 0 and y = 4 m.

• Type III: Point sources far away from the loudspeaker array, which were modeled as sources emitting plane waves. In the performed simulations, these plane wave sources were positioned at ten regularly-spaced directions between 165° and 180°.
The three categories of simulated primary sources are illustrated in Figure 6.15. The impulse responses for each simulated primary source were computed on a finite rectangular grid of listening points spaced at 1 m along the x axis and 50 cm along the y axis, also shown in Figure 6.15.

Figure 6.15: Sound field reproduction setup showing the three categories of simulated primary sources (Type I, Type II, and Type III) and the grid of listening points covering the listening area, where the responses are computed.

For each primary source category, we formed aggregated plots containing statistics of normalized magnitude frequency responses and group delay errors for all listening points and all primary sources in the category. The normalized magnitude frequency response is given by

$$Y_n(f) = \frac{Y(f)}{Y_d(f)}\,, \tag{6.12}$$

where Y(f) is the reproduced field's magnitude response at a listening point and Yd(f) is the desired magnitude response at that point. The group delay error eτ(f) is given by the difference between the group delay τg(f) of the reproduced impulse response at a listening point and the group delay τgd(f) of the desired impulse response at that point,

$$e_\tau(f) = \tau_g(f) - \tau_{gd}(f)\,. \tag{6.13}$$

The plots contain the 5-95 percentiles, the 25-75 percentiles, and the median value of these quantities across the audible frequency range. The aggregated statistical plots of magnitudes and group delays provide insight not only into how accurate the tested approaches are on average, but also into what extent the reproduction accuracy varies across space.
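For reference, the two error measures can be computed per listening point with a few lines of NumPy (our sketch; the group delay is obtained numerically from the unwrapped phase):

```python
import numpy as np

def normalized_magnitude_db(Y, Yd):
    """Eq. (6.12) in decibels: reproduced magnitude relative to desired."""
    return 20.0 * np.log10(np.abs(Y) / np.abs(Yd))

def group_delay_error(H, Hd, fs, nfft):
    """Eq. (6.13): group delay difference, from the negative derivative
    of the unwrapped phase with respect to angular frequency."""
    d_omega = 2.0 * np.pi * fs / nfft   # bin spacing in rad/s
    tau = lambda X: -np.gradient(np.unwrap(np.angle(X))) / d_omega
    return tau(H) - tau(Hd)
```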
Statistical magnitude frequency response plots

Figures 6.16, 6.17, and 6.18 show the previously described magnitude frequency response statistical plots for primary sources of Type I, II, and III, respectively. It can be seen that with SFR, the 25-75 percentiles of the magnitude frequency responses are within 2 dB of the desired responses up to around 4 kHz for all three primary source categories. The median of SFR's normalized magnitude response lies at 0 dB across low frequencies, as opposed to the median magnitude responses of the two WFS approaches, which vary around 0 dB. Although the 5-95 percentiles exhibit variations around the median of up to around 10 dB, meaning that for some source-listening position pairs the reproduced impulse response differs significantly from the desired one, they are notably smaller than the variations of the corresponding percentiles of the magnitude frequency responses of the two WFS approaches. Above 4 kHz, the three approaches perform similarly due to spatial aliasing.

It should be noted that WFS II exhibits more spectral magnitude artifacts in the extended listening area compared to WFS I. It can thus be said that the previously observed improvement in aliasing performance, apparent in Figure 6.11, comes at the cost of reducing the listening area size. This observation was also reported by Corteel et al. (2008).

Figure 6.16: Normalized magnitude frequency responses of WFS I, WFS II, and SFR for focused point sources (Type I) on a grid of listening points. The light-gray area shows the 25-75 percentiles, the dark-gray area shows the 5-95 percentiles, and the solid line shows the median.

Statistical group delay error plots

Figures 6.19, 6.20, and 6.21 show the group delay error statistical plots for primary sources of Type I, II, and III, respectively. From Figures 6.19 and 6.21, it can be observed that the two WFS approaches have virtually the same group delay performance for focused (Type I) and plane (Type III) sources. This is not surprising, as for most of the simulated focused and plane wave sources, both WFS approaches use all loudspeakers. For focused and plane sources, SFR's group delay performance is on average better than or comparable to both WFS approaches, as can be observed from the 25-75 group delay error percentiles. SFR's group delay error is more variable in the extreme cases in the frequency range 500-2000 Hz, which is apparent from the 5-95 percentiles. Note, however, that the group delay errors in the frequency range 500-2000 Hz are below the group delay discrimination threshold of 2 ms found by Flanagan et al. (2005) [7]. Therefore, the slightly higher group delay variance of SFR should not cause notable perceptual artifacts.

[7] See also (Blauert and Laws, 1978).

Figure 6.20 shows that the group delay performance of WFS II is superior to that of WFS I when reproducing point sources behind the loudspeaker array (Type II sources). SFR, on the other hand, has group delay errors similar to WFS II: on average, SFR is slightly better, but also slightly more variable in the frequency range 500-2000 Hz.

Figure 6.17: Normalized magnitude frequency responses of WFS I, WFS II, and SFR for point sources behind the loudspeaker array (Type II) on a grid of listening points. The light-gray area shows the 25-75 percentiles, the dark-gray area shows the 5-95 percentiles, and the solid line shows the median.

6.3.3 Discussion

From the frequency-domain analysis of the impulse responses on a grid of listening points, three observations can be made.

• The low-frequency response of SFR exhibits little spectral deviation up to almost 4 kHz for all categories of simulated primary sources and all listening points. Both WFS variants, on the other hand, suffer from higher coloration artifacts across space in the low-frequency range.

• More notable spectral deviations across space start at higher frequencies with SFR than with either WFS approach.

• The average group delay performance of SFR is slightly better than that of the two WFS variants, but slightly more variable in the low-frequency range 500-2000 Hz. Nevertheless, the range of group delay error variations in this frequency range is below the group delay discrimination threshold of 2 ms (Flanagan et al., 2005).

The presented extensive comparisons confirm the previous observation that SFR provides an effective extension of the listening area with correct sound field reproduction.

Figure 6.18: Normalized magnitude frequency responses of WFS I, WFS II, and SFR for plane wave sources at directions α ∈ [165°, 180°] (Type III) on a grid of listening points. The light-gray area shows the 25-75 percentiles, the dark-gray area shows the 5-95 percentiles, and the solid line shows the median.
Additionally, it raises the reproduction aliasing frequency when compared with both variants of WFS, as described in Section 6.2.3.

6.4 Practical considerations

6.4.1 Computational complexity

The presented approach to reproducing physical sound fields offers appealing reproduction accuracy. However, this comes at the cost of increased complexity, which stems from solving a MIMO inversion problem in the frequency domain. For each virtual source, the reproduction system needs to perform an SVD of the $M \times L$ matrix $G(\omega)$ at $N_F/2 + 1$ frequencies.⁸ Since the number of control points $M$ is usually larger than the number of loudspeakers $L$, the complexity of obtaining SFR filters for one virtual source is $\Theta(M^2 L N_F)$.

⁸ Real-valued filters have conjugate-symmetric spectra.

Figure 6.19: Group delay errors $e_\tau(f) = \tau_g(f) - \tau_{gd}(f)$ of WFS I, WFS II, and SFR for focused point sources (Type I) on a grid of listening points. The light-gray area shows the 25–75 percentiles, the dark-gray area shows the 5–95 percentiles, and the solid line shows the median.

The high computational complexity makes real-time calculation of loudspeaker filters with SFR, as with most numerical approaches, difficult. Instead, one can produce a database of reproduction filters offline, which can then be read in real time for sound field rendering purposes. This would entail dividing the reproduction zone with a polygonal mesh and pre-computing the filters corresponding to every element of the mesh. In the simplest case of a uniform rectangular mesh, the SFR filters for a single virtual source can be obtained in constant time based on its position. If the rectangular mesh is non-uniform, the filters are obtained in the time it takes to locate the rectangle that contains the virtual source. This complexity is $\Theta(\log N_x)$, where $N_x$ is the number of rectangles along the dimension that contains more rectangle "stripes". For a general mesh, the use of space-partitioning data structures, such as kd-trees (Bentley, 1975), makes the complexity of obtaining SFR filters $\Theta(\log N_M)$, where $N_M$ is the mesh size. The previously mentioned filter pre-computation methods allow for real-time sound field rendering, and have already been proposed and used in practical multichannel sound field reproduction systems; a minimal lookup sketch follows below.
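The following sketch, not part of the thesis, illustrates the two lookup schemes just described; the class, method names, and data layout are assumptions made for the example.

```python
# Sketch: looking up pre-computed SFR filters for a virtual source.
# A uniform rectangular mesh gives constant-time lookup; a non-uniform
# mesh is searched by bisection, matching the Theta(log Nx) behavior.
import bisect

class FilterDatabase:
    def __init__(self, x_edges, y_edges, filters):
        self.x_edges = x_edges    # sorted cell boundaries along x
        self.y_edges = y_edges    # sorted cell boundaries along y
        self.filters = filters    # filters[i][j]: filters of cell (i, j)

    def lookup_uniform(self, x, y, x0, y0, dx, dy):
        # Constant-time lookup: cell index from position and cell size
        i = int((x - x0) / dx)
        j = int((y - y0) / dy)
        return self.filters[i][j]

    def lookup_nonuniform(self, x, y):
        # Binary search over the cell boundaries, Theta(log Nx + log Ny)
        i = bisect.bisect_right(self.x_edges, x) - 1
        j = bisect.bisect_right(self.y_edges, y) - 1
        return self.filters[i][j]
```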
6.4.2 Performing system measurements

It has already been stressed that SFR does not put limiting constraints on the reproduction setup. It works irrespective of loudspeaker or desired source directivity, loudspeaker calibration, or sound propagation characteristics, as long as one is able to obtain the MIMO acoustic channel involving a dense grid of control points. In a practical reproduction system, this requirement might be too strict, as it is hard to imagine that one would measure loudspeaker responses on a fine grid, especially in larger venues. Instead, one could compromise by at least measuring the system on a contour in the reproduction plane that encloses the listening area. By doing so, one trades off some reproduction accuracy for practicality. Simulation experiments involving enclosing contours instead of covering grids, not presented in this chapter for the sake of space, show that SFR does not suffer from a noticeable performance loss with this simplification.

Figure 6.20: Group delay errors $e_\tau(f) = \tau_g(f) - \tau_{gd}(f)$ of WFS I, WFS II, and SFR for point sources behind the loudspeaker array (Type II) on a grid of listening points. The light-gray area shows the 25–75 percentiles, the dark-gray area shows the 5–95 percentiles, and the solid line shows the median.

6.5 Conclusions

We described Sound Field Reconstruction, a technique for reproducing sound fields in an extended listening area using an array of loudspeakers. SFR is based on a numerical optimization procedure for MIMO channel inversion. The control points covering the listening area, used by the MIMO channel inversion procedure, are spaced densely enough to satisfy the Nyquist criterion and thus avoid aliasing. Additionally, SFR uses geometry-based loudspeaker and control-point selection to mitigate artifacts due to aliasing and over-fitting.

SFR is a flexible sound field reproduction approach applicable to loudspeaker arrays with different topologies and directivities. It also enables reproducing directive sources, and does not restrict applications to free-field or anechoic sound propagation conditions.

Figure 6.21: Group delay errors $e_\tau(f) = \tau_g(f) - \tau_{gd}(f)$ of WFS I, WFS II, and SFR for plane wave sources at directions α ∈ [165°, 180°] (Type III) on a grid of listening points. The light-gray area shows the 25–75 percentiles, the dark-gray area shows the 5–95 percentiles, and the solid line shows the median.

We showed that, compared to Wave Field Synthesis, which is the state-of-the-art technique for sound field reproduction using loudspeaker arrays, SFR achieves better average reproduction accuracy in an extended listening area and preserves reproduction accuracy up to higher frequencies.

Chapter 7

Multichannel Room Equalization Considering Psychoacoustics

7.1 Introduction

Knowledge about the interaction of sound with architectural objects dates far back, and it has been used in the design of many venues, including concert halls, churches, opera houses, and theatres. However, only the last century has seen the development of electro-acoustic systems for audio recording and playback, which have revolutionized the way people consume audio content. Loudspeaker systems have become omnipresent, used in listening spaces that range from the above-mentioned venues to living rooms and cars.

The combination of a sound reproduction system with the room in which reproduction takes place can in some cases have undesired effects. This can be due to various reasons, such as an interior design that prioritizes aesthetic or other criteria, but also due to the characteristics of the audio content being played. For instance, it is well known that room acoustic properties considered favorable for classical music, such as a long reverberation time, can be detrimental to the playback of speech. The acoustical treatment of a room, which is used in big halls to cater to the acoustical requirements of a specific event, offers little flexibility. On the other hand, the development of signal processing, especially digital, has provided a flexible way to enhance the acoustic properties of a room, or to mitigate some of the detrimental effects it has on the reproduced sound.
We have seen in Section 2.7 that low-frequency room acoustics is dominated by usually clearly separated resonances, or room modes. We have also seen that the density of room resonances increases with the square of frequency (see (2.100) on page 29), and that, starting from the Schroeder frequency, room modes overlap and combine in a complex, location-dependent way that is best modeled by the statistical theory of room acoustics. The Schroeder frequency depends on the room's geometry. For concert halls, it is at the lower end of audible frequencies; in listening rooms, which we focus on in this chapter, it is roughly between 100 Hz and 200 Hz; and it can go up to several hundred hertz in cars.

The low-frequency room resonances below the Schroeder frequency are thus characteristic of the entire room's listening area, and they define the room's bass performance. For the playback of music, strong resonances affect the timbre in an audible way, while for speech, the long reverberation tails associated with strong resonances blur the syllables and decrease intelligibility. Thus, both in music and speech reproduction, it is important to reduce the effect of excessive resonances in order to improve the listening experience.

Above the Schroeder frequency, the high spatial variation of the resonant structure of room impulse responses (RIRs) makes resonance control a highly location-dependent effort. This is the reason why some systems for wide-area room equalization do not correct the room beyond the Schroeder frequency, and why we focus in this chapter only on equalizing low frequencies.

The first works on room equalization go back to the 1960s, and they were in the spirit of controlling room modes. Namely, Boner and Boner (1965) proposed the use of equalization to attenuate the resonances in sound systems. Groh (1974) analyzed the low-frequency modal behavior of a room and performed equalization by finding an adequate placement for a loudspeaker within the room. Similar, albeit more systematic, approaches to optimizing the placement and number of low-frequency loudspeakers have been investigated more recently by a number of authors (see Celestinos, 2006; Welti and Devantier, 2006). There has also been a fair amount of recent work focusing on correcting the low-frequency modal behavior of a room using infinite impulse response (IIR) filters (see Mäkivirta et al., 2003; Welti and Devantier, 2006; Wilson et al., 2003).

In this chapter, we present an approach for multiple-loudspeaker low-frequency room correction in an extended listening area, based on convex optimization. Our approach resembles some optimization-based multiple-point RIR equalization approaches (e.g., see Elliott and Nelson, 1989). However, it is more general, since it allows one to systematically incorporate physical and psychoacoustical aspects relevant to RIR correction through convex constraints. We focus on the problem of equalizing the response of one loudspeaker with the help of the remaining loudspeakers. We argue that the psychoacoustical phenomena of particular interest for room equalization are temporal masking and the precedence effect, and we show how they can be incorporated through convex constraints on the time-domain profile of the equalization filters' impulse responses.
Excessive driving of loudspeakers at some frequencies, characteristic of efforts to correct deep notches, is prevented by limiting the maximum gain of the equalization filters over frequency.

7.1.1 Chapter outline

Section 7.2 presents the room equalization problem and gives a detailed description of our multiple-loudspeaker room equalization approach. In Section 7.3, we show the effectiveness of our approach with a simulation of a five-channel surround setup, where one loudspeaker is equalized with the help of the remaining four. Conclusions are given in Section 7.4.

7.2 Proposed room equalization

7.2.1 Problem description

Figure 7.1: Equalization of a five-channel loudspeaker setup in a room of dimensions (6 m, 4 m, 2.5 m). In the illustration, the response of loudspeaker $S_1$ is equalized in the listening area around the central control point $C_1$.

Consider a listening room with a multichannel loudspeaker setup consisting of $L$ loudspeakers $S_1, \ldots, S_L$. An example setup, which is considered later in Section 7.3, is shown in Figure 7.1. Note that we focus on equalizing the response of one loudspeaker, called the main loudspeaker, with the help of the remaining ones, for this task called auxiliary loudspeakers. Without loss of generality, we assign index 1 to the main loudspeaker.

As the first step, one needs to measure the RIRs of all loudspeakers in $N$ control points $C_1, \ldots, C_N$ which cover the listening area where the RIRs are being equalized (the gray rectangular area in Figure 7.1). The placement of control points can be systematic or random, and as few as $N = 4$ control points can capture the 3D sound energy in a room with high accuracy, as reported in (Pedersen, 2007). We denote by $G_{ij}(\omega)$ the frequency response of the RIR between loudspeaker $j$ and control point $i$, and by $G(\omega)$ the $N \times L$ matrix containing the frequency responses $G_{ij}(\omega)$.

One also needs to decide on the length $N_h$ of the equalization filters $h_i[n]$. Note that working with highly downsampled signals allows using relatively short filters, and multirate filtering (e.g., see Crochiere and Rabiner, 1983) offers savings in computational complexity. $N_h$ can be up to the order of the length of the RIRs, which is still short in the downsampled domain corresponding to a highly reduced sampling frequency $f_S'$.

Let the vector
$$h_i = \big[h_i[0] \ \ldots \ h_i[N_h - 1]\big]^T \qquad (7.1)$$
contain the samples of the equalization filter of loudspeaker $i$, and the vector
$$h = \big[h_1^T \ \ldots \ h_L^T\big]^T \qquad (7.2)$$
contain the samples of all loudspeaker filters stacked together. The vector $h$ is what we are trying to obtain.

Since our design procedure considers the equalized RIRs in the frequency domain, we discretize the frequency axis into $N_f$ uniformly spaced normalized frequencies $\omega_0, \ldots, \omega_{N_f-1}$, where $\omega_0 = 0$ and $\omega_{N_f-1} = \pi$ corresponds to the Nyquist frequency $f_S'/2$. The frequency spacing needs to be of the order of the room resonances' bandwidth, which can go down to around 1 Hz (Kuttruff, 2000).

A vector
$$H(\omega_i) = \big[H_1(\omega_i) \ \ldots \ H_L(\omega_i)\big]^T, \qquad (7.3)$$
containing the frequency responses of the equalization filters, is obtained by the following product:
$$H(\omega_i) = V(\omega_i)\, h\,, \qquad (7.4)$$
where
$$V(\omega_i) = I_{L\times L} \otimes v(\omega_i) = \begin{bmatrix} v(\omega_i) & 0_{N_h\times 1} & \cdots & 0_{N_h\times 1}\\ 0_{N_h\times 1} & v(\omega_i) & \cdots & 0_{N_h\times 1}\\ \vdots & \vdots & \ddots & \vdots\\ 0_{N_h\times 1} & 0_{N_h\times 1} & \cdots & v(\omega_i) \end{bmatrix}^T, \qquad v(\omega_i) = \begin{bmatrix} 1 & e^{j\omega_i} & \cdots & e^{j(N_h-1)\omega_i} \end{bmatrix}^T, \qquad (7.5)$$
$I_{L\times L}$ is an $L \times L$ identity matrix, and $\otimes$ denotes the Kronecker product.
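As a concrete illustration of (7.4) and (7.5), the following minimal numpy sketch, with assumed dimensions, builds $v(\omega_i)$ and $V(\omega_i)$ and evaluates the filter responses:

```python
# Sketch: building v(omega_i) and V(omega_i) = I (x) v(omega_i), Eq. (7.5),
# and evaluating H(omega_i) = V(omega_i) h, Eq. (7.4). Dimensions and the
# placeholder coefficient vector are assumptions for the example.
import numpy as np

L, Nh = 5, 16                            # loudspeakers, filter length
h = np.random.default_rng(0).standard_normal(L * Nh)  # stacked coefficients
omega_i = 2 * np.pi * 50 / 400.0         # e.g. 50 Hz at fS' = 400 Hz

v = np.exp(1j * omega_i * np.arange(Nh))  # v(omega_i)
V = np.kron(np.eye(L), v)                 # L x (L*Nh): one row per loudspeaker
H = V @ h                                 # H(omega_i): DTFT value of each filter
```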
The frequency responses at the normalized frequency $\omega_i$ of the equalized RIRs in the control points are given by
$$Y(\omega_i) = \big[Y_1(\omega_i) \ \ldots \ Y_N(\omega_i)\big]^T = G(\omega_i)\, H(\omega_i)\,. \qquad (7.6)$$

In order to jointly consider the equalized frequency responses in all control points $C_1, \ldots, C_N$ and at all frequencies $\omega_0, \ldots, \omega_{N_f-1}$, the vectors $Y(\omega_i)$ are stacked into one long vector defined by
$$Y = \begin{bmatrix} Y(\omega_0) \\ \vdots \\ Y(\omega_{N_f-1}) \end{bmatrix} = \begin{bmatrix} G(\omega_0)\, V(\omega_0) \\ \vdots \\ G(\omega_{N_f-1})\, V(\omega_{N_f-1}) \end{bmatrix} h\,.$$

Essentially, the goal of equalization is to make the equalized RIRs as close as possible to the desired responses, discussed next, in all control points at all considered frequencies.

7.2.2 Desired response calculation

One of the main challenges when designing room equalizers is specifying the desired frequency response $D(\omega)$ that the equalized system needs to achieve, and there is no wide consensus on this issue. On the other hand, it has long been suggested that an equalization procedure should not undo the effect of a room and make the reproduced sound artificially anechoic, but should sensibly correct the room's undesired features, usually associated with strong low-frequency resonances (e.g., see Genereux, 1992).

As briefly mentioned at the beginning of this section, the listening area is sampled with control points in order to capture the essential properties of the sound field developed in the room. At the same time, sampling multiple points makes it possible to avoid position-dependent anomalies, such as deep frequency-response notches associated with nodes of some of the room modes. It was shown in (Pedersen, 2007) that the root mean square (RMS) value of the room magnitude frequency response taken over several measurement points gives a stable estimate of the room's spectral power profile. Hence, for defining the desired frequency characteristic, we combine the mentioned spatial power averaging with magnitude frequency response smoothing in fractional-octave bands. More specifically, the desired response $D(\omega_i)$ is obtained as follows:
$$D(\omega_i) = \sqrt{\frac{1}{N} \sum_{m=1}^{N} \big(\tilde{G}_{m1}(\omega_i)\big)^2}\,, \qquad (7.7)$$
where $\tilde{G}_{m1}(\omega)$ is the fractional-octave (e.g., 1/3-octave) smoothed magnitude characteristic of the RIR between the main loudspeaker and control point $m$.¹

¹ Fractional-octave smoothing is thoroughly described in (Hatziantoniou and Mourjopoulos, 2000).

Since the same desired frequency response is taken for each control point, the vector of desired responses at frequency $\omega_i$ is given by $D(\omega_i) = D(\omega_i)\, 1_{N\times 1}$, where the column vector $1_{N\times 1}$ contains $N$ ones. In order to compare the equalized with the desired frequency responses in the control points, the vectors $D(\omega_i)$ are stacked into a column vector $D = \big[D^T(\omega_0) \ \ldots \ D^T(\omega_{N_f-1})\big]^T$.

7.2.3 Choice of a cost function

The goal of an optimization procedure is to minimize a cost function, which in the case of our equalization filter design is a function of the difference between the equalized and desired frequency responses, $Y$ and $D$, respectively, taken over all control points. Before defining the cost function, we first define a resonance detection vector
$$R = \big[R(\omega_0) \ \ldots \ R(\omega_{N_f-1})\big]^T \qquad (7.8)$$
by
$$R(\omega_i) = \big|\max_m\big(|G_{m1}(\omega_i)|\big) - D(\omega_i)\big| + \epsilon\,, \qquad (7.9)$$
where $\epsilon$ is a predefined minimum weight that can be given to a spectral magnitude error. Note that $R$ is designed to peak at resonant frequencies.
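To make (7.7) and (7.9) concrete, the sketch below computes $D$ and $R$ from the main loudspeaker's magnitude responses. The fractional-octave smoothing is approximated here by a crude moving average, and the array layout and $\epsilon$ value are assumptions.

```python
# Sketch: desired response D(omega_i), Eq. (7.7), and resonance-detection
# vector R, Eq. (7.9). `G1` is an (Nf x N) array of main-loudspeaker
# magnitudes |G_m1(omega_i)|; true 1/3-octave smoothing (Hatziantoniou
# and Mourjopoulos, 2000) is replaced by a simple moving average.
import numpy as np

def smooth(mag, width=5):
    # crude stand-in for fractional-octave smoothing along frequency
    kernel = np.ones(width) / width
    return np.convolve(mag, kernel, mode="same")

def desired_and_resonance(G1, eps=0.5):
    G1_smooth = np.apply_along_axis(smooth, 0, G1)   # smooth each point's response
    D = np.sqrt(np.mean(G1_smooth**2, axis=1))       # spatial RMS, Eq. (7.7)
    R = np.abs(G1.max(axis=1) - D) + eps             # peaks at resonances, Eq. (7.9)
    return D, R
```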
For the cost function, we choose a weighted magnitude error that penalizes errors at resonant frequencies more heavily. It is given by
$$J = \big\| W \,\big(\, |Y| - |D| \,\big) \big\|^2\,, \qquad (7.10)$$
where $W = \operatorname{diag}(R) \otimes I_{N\times N}$, and $\operatorname{diag}(\cdot)$ makes a diagonal matrix from a vector, with the vector's entries on the main diagonal.

7.2.4 Equalization filter constraints

Here we present some psychoacoustical and physical considerations that are used to constrain the computed filters in order to avoid the location sensitivity characteristic of some other room equalization approaches.

Temporal-masking constraints

Temporal masking is a phenomenon where a sound stimulus renders inaudible sounds which precede it (backward masking) and follow it (forward masking) (Moore, 1989).

Figure 7.2: Illustration of temporal masking for a single pulse.

Backward masking, which is not entirely understood, varies greatly between trained and untrained listeners (Moore, 1989; Zwicker and Fastl, 1999), and can extend up to 20 ms for the latter. Raab (1961) investigated backward masking of clicks by clicks and determined that it vanishes above 15–20 ms. Thus, we decide to model backward masking by a curve of the form
$$m_b(t) = b_1 (-t)^{b_2}$$
in the interval $(-\infty, -1\,\mathrm{ms})$. The parameters $b_1$ and $b_2$ are determined such that
$$m_b(t)\big|_{t=-1\,\mathrm{ms}} = 0\ \mathrm{dB}, \qquad m_b(t)\big|_{t=-15\,\mathrm{ms}} = -60\ \mathrm{dB}\,.$$

Forward masking is the better understood of the two phenomena, and it is highly dependent on the relationship between the masking and masked signals. Moore (1989) explained forward masking as a phenomenon that starts as simultaneous masking and then decays exponentially over 100–200 ms. Fielder (2003) proposed to simplify the findings of several forward masking investigations involving different types of masking and masked signals (Raab, 1961; Zwicker and Fastl, 1999; Jesteadt et al., 1982), and to model forward masking by a combination of simultaneous masking that extends up to 4 ms, followed by a curve
$$m_f(t) = f_1 t^{f_2}$$
in the interval $(4\,\mathrm{ms}, \infty)$, whose parameters $f_1$ and $f_2$ are defined by
$$m_f(t)\big|_{t=4\,\mathrm{ms}} = 0\ \mathrm{dB}, \qquad m_f(t)\big|_{t=200\,\mathrm{ms}} = -60\ \mathrm{dB}\,.$$

Combining the backward and forward masking curves $m_b(t)$ and $m_f(t)$, and assuming that simultaneous masking (Moore, 1989; Zwicker and Fastl, 1999) takes place in the interval $[-1\,\mathrm{ms}, 4\,\mathrm{ms}]$, we obtain the masking curve $m(t)$ illustrated in Figure 7.2.

Fielder (2003) also proposed considering temporal masking as a criterion for time-domain distortions of equalization systems. In light of the arguments about the limitations of using long equalization filters or filters that exhibit pre-echos, and considering temporal masking, we can expect that the use of short equalization filters has a good chance of avoiding sensitivity to location changes. In other words, we argue that short and well-localized ("spiky") filters, whose amplitude profiles fit into the temporal masking threshold curve $m(t)$ from Figure 7.2, are good solutions for wide-area equalization. Our argument is corroborated by the fact that if a sharp transient is emitted by a loudspeaker, a listener close to the loudspeaker will not hear a distortion, thanks to temporal masking.
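The two anchor conditions per branch determine the parameters in closed form; with $t$ in milliseconds, $b_1 = 1$, $b_2 = -3/\log_{10} 15$, $f_2 = -3/\log_{10} 50$, and $f_1 = 4^{-f_2}$. The following sketch, with assumed names, constructs and samples the resulting masking curve:

```python
# Sketch: the masking curve m(t) from the anchor points above
# (0 dB at -1 ms, -60 dB at -15 ms; 0 dB at 4 ms, -60 dB at 200 ms).
import numpy as np

# Backward branch m_b(t) = b1 * (-t)**b2 on (-inf, -1 ms), t in ms
b1 = 1.0                            # from m_b(-1 ms) = 0 dB
b2 = -3.0 / np.log10(15.0)          # from m_b(-15 ms) = -60 dB
# Forward branch m_f(t) = f1 * t**f2 on (4 ms, inf)
f2 = -3.0 / np.log10(200.0 / 4.0)   # from m_f(200 ms) = -60 dB
f1 = 4.0 ** (-f2)                   # from m_f(4 ms) = 0 dB

def masking_curve(t_ms):
    """Amplitude masking threshold m(t); t_ms in milliseconds."""
    t = np.atleast_1d(np.asarray(t_ms, dtype=float))
    m = np.ones_like(t)             # simultaneous masking region: 0 dB
    back, fwd = t < -1.0, t > 4.0
    m[back] = b1 * (-t[back]) ** b2
    m[fwd] = f1 * t[fwd] ** f2
    return m

# Sampled causal constraint m[n] at fS' = 400 Hz (2.5 ms per sample)
m_n = masking_curve(np.arange(16) / 400.0 * 1000.0)
```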
Figure 7.3: Illustration of the temporal-masking constraint for the RIR equalization filter of the main loudspeaker. The filter coefficients can take on values only in the shaded area.

The temporal-masking constraint is defined as a maximum-amplitude limit on a filter's impulse response. If $m[n]$ is a sampled version of the temporal masking threshold curve $m(t)$, then the equalization filter $h_i[n]$ is constrained by
$$| h_i[n] | \leq m[n]\,. \qquad (7.11)$$
Since we do not use a delay for the main loudspeaker, its amplitude profile is constrained by the sampled version of the causal part of $m(t)$. Figure 7.3 illustrates the temporal-masking constraint (7.11) for the main loudspeaker, where the shaded area defines where the samples of $h_1[n]$ can take on values.

Note also that by using the vector representation
$$h_1 = \big[h_1[0] \ \ldots \ h_1[N_h-1]\big]^T, \qquad m_1 = \big[m[0] \ \ldots \ m[N_h-1]\big]^T,$$
the main-loudspeaker temporal-masking constraint (7.11) can be written as the following convex constraint:
$$| h_1 | \preceq m_1\,, \qquad (7.12)$$
where $\preceq$ denotes component-wise $\leq$.

Auxiliary loudspeaker filter constraints

The precedence effect (Wallach et al., 1949) is a phenomenon wherein a sound coming to a listener from different directions and with different delays is perceived in the direction of the earliest arrival. More precisely, inside the interval of 1 ms following the first arrival, multiple wavefronts can combine in what is called summing localization, where the sound is perceived in the direction defined by the relative strengths of the incoming wavefronts. Following the summing localization interval, the precedence effect is active, and delayed sounds do not affect localization² until the delay reaches the echo threshold. Beyond the echo threshold, delayed sounds are perceived as echos, which are auditory events well separated in time and direction (Blauert, 1997). The echo threshold depends on the type of audio stimulus: for speech it is around 50 ms, while for music it goes up to around 100 ms (Blauert, 1997).

² Note that during this interval, delayed sounds affect spatial perception, e.g., they can enlarge the perceived width of a source.

Motivated by the findings described above, we formulate two additional constraints for the auxiliary loudspeakers:

• To prevent the sounds from the auxiliary loudspeakers from appearing as echos, we use a combination of delay and gain relative to the main loudspeaker's filter. The delay and gain should be such that the auxiliary loudspeaker signals are below the echo threshold.

• To prevent the signal from being perceived away from the main loudspeaker, the delay needs to be at least about 1 ms over the whole listening area, such that the precedence effect is active and the sound is perceived at the main loudspeaker.

The described constraints for the auxiliary loudspeakers are realized by the following modification of the temporal-masking constraint (7.11):
$$| h_i[n] | \leq a_i\, m[n - n_i]\,, \qquad (7.13)$$
where $a_i$ is a positive attenuation factor and $n_i$ is a lag corresponding to a delay $t_i$, which we set between five and ten milliseconds. The auxiliary loudspeaker filter constraints are illustrated in Figure 7.4, where the shaded area defines where the samples of $h_i[n]$ can take on values.

Figure 7.4: Illustration of the temporal-masking constraint for the RIR equalization filters of the auxiliary loudspeakers. The filter coefficients are allowed to have values only in the shaded area.

Using the vector notation
$$h_i = \big[h_i[0] \ \ldots \ h_i[N_h-1]\big]^T, \qquad m_i = a_i\, \big[m[-n_i] \ \ldots \ m[N_h-1-n_i]\big]^T,$$
the auxiliary loudspeaker filter constraint (7.13) becomes
$$| h_i | \preceq m_i\,. \qquad (7.14)$$
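Continuing the earlier sketch (it reuses the `masking_curve` function defined there), the bound vectors $m_1, \ldots, m_L$ of (7.12) and (7.14) can be assembled as follows; the parameter values mirror those used later in Section 7.3 and are otherwise assumptions:

```python
# Sketch: stacked amplitude-bound vector m = [m1^T ... mL^T]^T, Eq. (7.15),
# from the causal masking curve for the main loudspeaker and delayed,
# attenuated copies, a_i * m[n - n_i] (Eq. 7.13), for the auxiliary ones.
import numpy as np

def amplitude_bounds(Nh=16, fs_ds=400.0, n_aux=4, a_aux=0.25, n_delay=4):
    n = np.arange(Nh)
    m1 = masking_curve(n / fs_ds * 1000.0)             # main loudspeaker, causal m(t)
    bounds = [m1]
    for _ in range(n_aux):                             # n_delay = 4 samples = 10 ms
        mi = a_aux * masking_curve((n - n_delay) / fs_ds * 1000.0)
        bounds.append(mi)
    return np.concatenate(bounds)                      # length (n_aux + 1) * Nh
```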
Like the temporal-masking constraint (7.12), the auxiliary loudspeaker filter constraint (7.14) is also convex. Combining (7.12) and (7.14), we obtain a single convex constraint of the form
$$| h | \preceq m\,, \qquad (7.15)$$
where $m = [m_1^T \ \ldots \ m_L^T]^T$.

Maximum-gain constraints

An additional insurance against location sensitivity is putting a limit on the equalization filters' gains over frequency. Namely, this avoids excessively driving a loudspeaker in order to correct a deep notch, which is usually highly position-dependent. The gains of the equalization filters are limited using the following set of convex constraints:
$$| V(\omega_i)\, h | \preceq H^{\max}, \qquad \forall \omega_i\,, \qquad (7.16)$$
where $H^{\max} = [H_1^{\max} \ \ldots \ H_L^{\max}]^T$ is a vector with the maximum gain for each loudspeaker. The maximum-gain constraint is illustrated in Figure 7.5, where the shaded area defines where the magnitude spectrum $|H_i(\omega)|$ can take on values.

Figure 7.5: Illustration of the maximum-gain constraint for a RIR equalization filter. The magnitude frequency response of the RIR equalization filter is allowed to have values only in the shaded area.

7.2.5 Filter computation procedure

The cost function (7.10) is not convex, preventing the use of conventional tools of convex optimization (Boyd and Vandenberghe, 2004) and making the problem difficult to solve. However, we have already seen a similar problem when we discussed magnitude-optimized transducer arrays in Section 2.8, and also in Section 5.3 when we discussed the beamformer design for a directional loudspeaker array. In both cases, the solution was based on the iterative, local optimization algorithm by Kassakian (2006). In this particular case we do the same and propose Algorithm 7.1, very similar to Algorithms 2.1 and 5.1.

Algorithm 7.1 Solving a constrained, weighted magnitude least squares problem.
1. Choose the solution tolerance $\epsilon$
2. Choose the initial solution $h$
3. repeat
4. $J \leftarrow \| W\,(\,|Y| - |D|\,) \|^2$
5. Compute $\hat{D}$ such that $\forall \omega_j \in \{\omega_0, \ldots, \omega_{N_f-1}\}$: $|\hat{D}(\omega_j)| = |D(\omega_j)|$ and $\angle \hat{D}(\omega_j) = \angle\big(G(\omega_j)\, H(\omega_j)\big)$
6. Solve the following convex program:
   minimize $\| W\,(\,Y - \hat{D}\,) \|^2$
   subject to $|h| \preceq m$, $\ |V(\omega_j)\, h| \preceq H^{\max}\ \forall \omega_j \in \{\omega_0, \ldots, \omega_{N_f-1}\}$
7. $J' \leftarrow \| W\,(\,|Y| - |D|\,) \|^2$
8. until $|J' - J| < \epsilon$

Using the fact that the main loudspeaker has a dominant role, which is also formulated through the constraints, we choose the initial solution $h = [h_1^T \ \ldots \ h_L^T]^T$ defined by
$$h_1 = [1\ \ 0\ \ldots\ 0]^T, \qquad h_i = 0_{N_h\times 1}, \quad i > 1\,, \qquad (7.17)$$
where $h_1$ contains $N_h - 1$ zeros. Using the same argument as in Section 2.8, it is easily shown that Algorithm 7.1 converges to a local minimum. However, convergence to the global minimum is not guaranteed, which makes the choice of the initial solution a crucial step. In our experience, the initial solution (7.17) consistently provides better results than the complex least squares solution (see Section 2.8 for details).
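Purely as an illustration, the following Python sketch mirrors Algorithm 7.1 using cvxpy, a Python analogue of the CVX package cited in this thesis (Grant et al., 2011). The matrix names and shapes are assumptions: `A` is the stacked $(N N_f) \times (L N_h)$ matrix mapping $h$ to $Y$, `D_abs` the stacked desired magnitudes, `Wd` the diagonal of $W$, `V_list` the per-frequency evaluation matrices, and `m_vec`, `H_max` the bounds of (7.15) and (7.16).

```python
# Sketch of Algorithm 7.1: iterative phase update plus a convex
# weighted least squares program with the constraints (7.15)-(7.16).
import numpy as np
import cvxpy as cp

def solve_eq_filters(A, D_abs, Wd, V_list, m_vec, H_max, tol=1e-4, max_iter=50):
    h = np.zeros(A.shape[1])
    h[0] = 1.0                                   # initial solution, Eq. (7.17)
    J_prev = np.inf
    for _ in range(max_iter):
        # Step 5: desired phases copied from the current solution
        D_hat = D_abs * np.exp(1j * np.angle(A @ h))
        # Step 6: convex program in h
        hv = cp.Variable(A.shape[1])
        constraints = [cp.abs(hv) <= m_vec]      # temporal-masking bound (7.15)
        constraints += [cp.abs(Vj @ hv) <= H_max for Vj in V_list]  # gain (7.16)
        objective = cp.Minimize(cp.sum_squares(cp.multiply(Wd, A @ hv - D_hat)))
        cp.Problem(objective, constraints).solve()
        h = hv.value
        # Steps 4/7/8: stop once the magnitude cost stalls
        J = np.sum((Wd * (np.abs(A @ h) - D_abs)) ** 2)
        if abs(J_prev - J) < tol:
            break
        J_prev = J
    return h
```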
7.3 Simulations

In order to verify the effectiveness of our room equalization strategy, we performed a simulation of a five-channel full-range loudspeaker setup, shown in Figure 7.1, using an image source model (Allen and Berkley, 1979) implementation from Habets (2006). We give two examples. The first example, denoted MIMO equalization, involves the described multiple-loudspeaker equalization of the front loudspeaker $S_1$ (center channel),³ with the help of the remaining four. In the second example, denoted SIMO equalization, the front loudspeaker $S_1$ is equalized by a single equalization filter, without the help of the remaining ones. Note that SIMO equalization can be viewed as a special case of MIMO equalization where the auxiliary loudspeaker filter constraints (7.13) are set such that $a_i = 0$.

³ Note that our numbering differs from the usual way channels are numbered in five-channel surround.

In both examples, equalization is done up to the frequency $f_{\max} = 200$ Hz, which enables filtering in the downsampled domain corresponding to the sampling frequency $f_S' = 400$ Hz. The maximum-gain constraints are set to $H_i^{\max} = 3$ dB, and the used frequency spacing is 1 Hz. Also, the equalization filters are computed relative to the control points $C_1, \ldots, C_5$ (see Figure 7.1).

7.3.1 MIMO equalization

In this part, we present a full multichannel equalization of the front loudspeaker $S_1$, where all equalization filters $h_i[n]$ have lengths of $N_h = 16$ samples. For the auxiliary loudspeakers' filters, we use a delay $n_i$ that corresponds to 10 ms, with attenuation $a_i = 0.25$. The resulting equalization filters, both in the time and frequency domain, together with the amplitude constraints, are shown in Figure 7.6.

Figure 7.6: Time-domain (a) and frequency-domain (b) characteristics of the loudspeaker equalization filters $h_i[n]$. Thin dashed lines mark the amplitude constraints for the main and auxiliary loudspeakers (a) and the maximum-gain constraint (b).

Figure 7.7 shows the variations of the RIR frequency characteristics, before and after equalization, on a rectangular grid of control points, spaced at 25 cm, which covers the listening area (see Figure 7.1). From Figure 7.7, it can be seen that after equalization, strong resonances get attenuated, as desired. Also, above 80 Hz, the average magnitude frequency characteristic of the equalized RIRs inside the listening area is significantly improved, exhibiting a smoother behavior and approximating the desired magnitude characteristic more closely.

7.3.2 SIMO equalization

In this part, we present a single-channel equalization of the front loudspeaker $S_1$ with an equalization filter whose impulse response has $N_h = 64$ samples. The impulse response of the resulting equalization filter $h_1[n]$, together with the amplitude constraint, is shown in Figure 7.8.

Panels (a) and (b) in Figure 7.9 show the values of the peak-detection vector $R$ and the frequency characteristic $H_1(\omega)$ of the equalization filter $h_1[n]$, respectively. The former shows that a larger weight is given to errors where strong resonances are present, and justifies the notches that $H_1(\omega)$ has at those frequencies.

Panels (c) and (d) in Figure 7.9 show the variations of the RIR frequency characteristics, before and after equalization, respectively, on a rectangular grid of control points, spaced at 25 cm, which covers the listening area (see Figure 7.1). From panels (c) and (d) in Figure 7.9, it can be seen that after equalization, strong resonances get only slightly attenuated. However, due to the short length and the temporal-masking constraint, the filter tends to attenuate all frequencies, bringing the equalized RIRs down by around 2 dB on average, which may not be desired.

7.3.3 Discussion

By comparing Figure 7.6(b) and Figure 7.9(b), we clearly see the differences in the ways MIMO and SIMO equalization attenuate room modes.
Figure 7.7: Mean (solid), 25–75 (light-gray), and 3–97 (dark-gray) percentiles of the magnitude frequency responses on a rectangular grid of control points spaced at 25 cm, shown in Figure 7.1, before (a) and after (b) equalizing loudspeaker $S_1$. The desired frequency characteristic $D(\omega)$ is shown with a dashed line.

Figure 7.8: Impulse response of the single-channel equalization filter $h_1[n]$. The thin dashed line marks the amplitude constraint for the main loudspeaker.

The magnitude responses of the MIMO equalization filters, shown in Figure 7.6(b), do not reflect the RIR resonance patterns. This indicates that the filters combine in a mode-cancelling manner, and it also explains why short filters suffice for good equalization performance. On the other hand, the magnitude response of the SIMO equalization filter, shown in Figure 7.9(b), reflects the RIR resonance pattern more closely, revealing magnitude response notches at the room's resonant frequencies. That is the reason why one needs longer SIMO equalization filters.

Figure 7.9: (a) Resonance-detection vector $R(\omega)$. (b) Magnitude frequency response $H_1(\omega)$ of the single-channel equalization filter $h_1[n]$. Mean (solid), 25–75 (light-gray), and 3–97 (dark-gray) percentiles of the magnitude frequency responses on a rectangular grid of control points spaced at 25 cm, shown in Figure 7.1, before (c) and after (d) equalizing loudspeaker $S_1$. The desired frequency characteristic $D(\omega)$ is shown with a dashed line in panels (c) and (d).

Resonances at very low frequencies, below 80 Hz, are easier to equalize, since they are more widely separated and require lower-Q correction filters. This is apparent in the SIMO equalization case, where these resonances are attenuated by a combination of notches at the corresponding frequencies, with an effectiveness similar to the MIMO equalization case. The resonances at higher frequencies are more densely spaced and require very high-Q correction filters, making single-loudspeaker equalization difficult. On the other hand, the mode-cancellation approach with multiple loudspeakers offers a way to attenuate these resonances without excessive gains or impractically long filters. This explains the superior performance of MIMO equalization above 80 Hz.

7.4 Conclusions

We presented an approach for low-frequency multiple-loudspeaker RIR equalization based on convex optimization. We showed a way to formulate typical psychoacoustical and physical RIR equalization design constraints as convex constraints on the equalization filters, allowing optimal solutions to be found in a systematic fashion. We have also shown the effectiveness of our approach at equalizing a simulated five-channel loudspeaker system in an extended listening area, using short filters and making sure that the equalization system does not cause undesired audible echoes and localization biases.
The proposed approach, using the temporal-masking and maximum-gain constraints, can also be applied to single-loudspeaker room equalization. Similarly to the multiple-loudspeaker equalization, this approach was verified on a simulated five-channel loudspeaker system. It turns out that even with a single channel, our approach offers some improvement over no equalization, being effective at attenuating, although subtly, the dominant resonances in a RIR.

Chapter 8

Conclusion

In this thesis, we have focused on spatial sound capture and reproduction. In each problem that we encountered along the way, there was a recurring pattern. Namely, at the start we used the spatio-temporal properties of the sound field in a considered spatial domain in order to carry out a suitable spatial discretization. The spatial discretization of an otherwise continuous problem consists of measuring the responses of the transducers (microphones or loudspeakers) in the used transducer array. From then on, we sought a suitable numerical optimization procedure for optimally solving the considered acoustic problem, while respecting particular physical and psychoacoustical constraints.

We have seen that the proposed general approach is flexible, since unlike most methods derived from acoustic theory, it does not require ideal and calibrated transducers or free-field propagation conditions. More specifically, any discrepancies between the acoustic environment and the used transducers on one hand, and their idealized models on the other, are accounted for through measurements.

A striking fact about the proposed approach is that it uses the same optimization tools for a wide variety of problems. Namely, in all the applications presented in this thesis, which include the design of directional and sound field microphones, a directional loudspeaker array, sound field reproduction, and room equalization, we used spatial discretization of the transducers' responses; after specifying a desired system behavior and constraints, we used the same optimization procedure for obtaining a solution. Even in idealized cases, with calibrated transducers having ideal characteristics, our approach achieves equally good (such as in microphone array design) or better (sound field reproduction) results.

We have also shown how problem-specific physical and psychophysical observations can be included in the formulation of the used optimization problem. In the problem of directional loudspeaker array design, we limited the loudspeaker filter gains, but we also discarded phase information from the cost function in order to achieve a more directional reproduction. For room equalization, we similarly discarded the phase from the cost function and limited the loudspeaker gains, but we also added filter amplitude profiles in order to avoid localization bias and temporal distortions in the form of pre- and post-echos.

Future work

The possible applications of wide-band directional sound reproduction have been mentioned in Chapter 5, but we have only done informal listening without investigating them in a systematic fashion. A further investigation into multichannel reproduction with a steerable directional loudspeaker array in a room could result in a more systematic strategy for reproducing ambience and lateral direct sounds.
The use of such a loudspeaker array for room equalization could also be assessed, since it has long been shown by Jacob (1985) that medium- and high-directivity loudspeakers notably improve speech intelligibility in highly reverberant rooms.

We have seen that our approach to reproducing sound fields (SFR) is based on inverting the MIMO acoustic channel from the loudspeakers to the control points. Loudspeaker characteristics and possible room effects are taken into account implicitly. However, we have not assessed those scenarios, and leave them for future work.

Finally, an extension of the proposed constrained optimization framework for multiple-loudspeaker room equalization could be sought. For instance, instead of using a temporal-masking constraint, the cost function could include temporal distortion quantified with the help of temporal masking. This would enlarge the feasible set of equalization filters' impulse responses, and consequently increase their ability to control room resonances more effectively. Also, the cost function could easily be extended to use frequency-domain averaging, opening the door to full-band equalization.

Bibliography

T.D. Abhayapala and D.B. Ward. Theory and design of high order sound field microphones using spherical microphone array. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002.

M. Abramowitz and I.A. Stegun. Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables. Dover, New York, 1976.

J. Ahrens and S. Spors. Sound field reproduction using planar and linear arrays of loudspeakers. IEEE Transactions on Audio, Speech, and Language Processing, 18(8):2038–2050, Nov. 2010.

T. Ajdler, L. Sbaiz, and M. Vetterli. The plenacoustic function and its sampling. IEEE Trans. Sig. Proc., 54(10):3790–3804, 2006.

T. Ajdler, C. Faller, L. Sbaiz, and M. Vetterli. Sound field analysis along a circle and its applications to HRTF interpolation. J. Audio Eng. Soc., 56(3):156–175, 2008.

I. Allen. Matching the sound to the picture. In AES 9th International Conference, 1991.

J.B. Allen and D.A. Berkley. Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am., 65(4):943–950, 1979.

G.B. Arfken, H.J. Weber, and H. Weber. Mathematical Methods for Physicists. Academic Press, New York, 1985.

J.S. Bamford and J. Vanderkooy. Ambisonic sound for us. Preprint 99th Conv. Aud. Eng. Soc., 1995.

B.B. Bauer. Quadraphonic reproducing system, 1974. US Patent 3,813,494.

J.L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509–517, 1975.

G. Berchin. Precise filter design [DSP tips & tricks]. IEEE Sig. Proc. Mag., 24(1):137–139, 2007.

A.J. Berkhout. A holographic approach to acoustic control. J. Audio Eng. Soc., 36(12):977–995, 1988.

A.J. Berkhout, D. de Vries, and P. Vogel. Wave front synthesis: a new direction in electroacoustics. Preprint 93rd Conv. Aud. Eng. Soc., 1992.

A.J. Berkhout, D. de Vries, and P. Vogel. Acoustic control by wave field synthesis. J. Acoust. Soc. Am., 93(5):2764–2778, May 1993.

S. Bertet, J. Daniel, and S. Moreau. 3D sound field recording with higher order ambisonics: objective measurements and validation of spherical microphone. Preprint 120th Conv. Aud. Eng. Soc., 2006.

J. Blauert. Spatial Hearing: The Psychophysics of Human Sound Localization. The MIT Press, Cambridge, Massachusetts, USA, revised edition, 1997.

J. Blauert and P. Laws. Group delay distortions in electroacoustical systems. J. Acoust. Soc. Am., 63(5):1478–1483, 1978.
A. Blumlein. Improvements in and relating to sound transmission, sound recording and sound reproduction systems. British Patent Specification 394325, 1931. Reprinted in Stereophonic Techniques, Aud. Eng. Soc., New York, 1986.

C.P. Boner and C.R. Boner. A procedure for controlling room-ring modes and feedback modes in sound systems with narrow-band filters. J. Audio Eng. Soc., 13(4):297–299, 1965.

S.P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

R. Bücklein. The audibility of frequency response irregularities. J. Audio Eng. Soc., 29(3):126–131, 1981.

A. Celestinos. Low frequency sound field enhancement system for rectangular rooms using multiple loudspeakers. PhD thesis, Aalborg University, 2006.

R.K. Cook, R.V. Waterhouse, R.D. Berendt, S. Edelman, and M.C. Thompson. Measurement of correlation coefficients in reverberant sound fields. J. Acoust. Soc. Am., 27:1072–1077, 1955.

D.H. Cooper and T. Shiga. Discrete-matrix multichannel stereo. J. Audio Eng. Soc., 20(5):346–360, 1972.

E. Corteel. Equalization in an extended area using multichannel inversion and wave field synthesis. J. Audio Eng. Soc., 54(12):1140–1161, 2006.

E. Corteel. Synthesis of directional sources using wave field synthesis, possibilities, and limitations. EURASIP Journal on Applied Signal Processing, 2007(1):188–188, 2007. ISSN 1110-8657.

E. Corteel, R. Pellegrini, and C. Kuhn-Rahloff. Wave field synthesis with increased aliasing frequency. Preprint 124th Conv. Aud. Eng. Soc., 2008.

P.S. Cotterell. On the theory of second-order soundfield microphone. PhD thesis, University of Reading, 2002.

H. Cox, R. Zeskind, and M. Owen. Robust adaptive beamforming. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(10):1365–1376, 1987.

R.E. Crochiere and L.R. Rabiner. Multirate Digital Signal Processing. Prentice Hall, 1983.

J. Daniel, R. Nicol, and S. Moreau. Further investigations of high order ambisonics and wavefield synthesis for holophonic sound imaging. Preprint 114th Conv. Aud. Eng. Soc., 2003.

H.E. de Bree, M. Iwaki, K. Ono, T. Sugimoto, and W. Woszczyk. Anechoic measurements of particle-velocity probes compared to pressure gradient and pressure microphones. Preprint 122nd Conv. Aud. Eng. Soc., 2007.

D. de Vries. Sound reinforcement by wavefield synthesis: adaptation of the synthesis operator to the loudspeaker directivity characteristics. J. Audio Eng. Soc., 44(12):1120–1131, 1996.

J.R. Driscoll and D.M. Healy. Computing Fourier transforms and convolutions on the 2-sphere. Advances in Applied Mathematics, 15(2):202–250, 1994.

G.W. Elko. Superdirectional microphone arrays. In Acoustic Signal Processing for Telecommunication, pages 181–238. Kluwer Academic Publishers, 2000.

G.W. Elko. Microphone array systems for hands-free telecommunication. Speech Communication, 20(3–4):229–240, Dec. 1996.

G.W. Elko and A.T.N. Pong. A steerable and variable first-order differential microphone array. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997.

S.J. Elliott and P.A. Nelson. Multiple-point equalization in a room using adaptive digital filters. J. Audio Eng. Soc., 37(11):899–907, 1989.

F.J. Fahy. Measurement of acoustic intensity using the cross-spectral density of two microphone signals. J. Acoust. Soc. Am., 62(4):1057–1059, 1977.

C. Faller and M. Kolundžija. Design and limitations of non-coincidence correction filters for soundfield microphones. Preprint 126th Conv. Aud. Eng. Soc., 2009.

K. Farrar. Soundfield microphone. Wireless World, 1979.
L.D. Fielder. Analysis of traditional and reverberation-reducing methods of room equalization. J. Audio Eng. Soc., 51(1/2):3–26, 2003.

S. Flanagan, B.C.J. Moore, and M.A. Stone. Discrimination of group delay in clicklike signals presented via headphones and loudspeakers. J. Audio Eng. Soc., 53(7-8):593–611, 2005.

R.K. Furness. Ambisonics: an overview. In Proceedings of the 8th International Conference of the Audio Engineering Society, pages 181–189, 1990.

P.A. Gauthier and A. Berry. Adaptive wave field synthesis for sound field reproduction: Theory, experiments, and future perspectives. J. Audio Eng. Soc., 55(12):1107, 2007.

R.J. Geluk and L. de Klerk. Microphone exhibiting frequency-dependent directivity. EU Patent Application 01201501.2, Apr. 2001.

R. Genereux. Adaptive filters for loudspeakers and rooms. Preprint 93rd Conv. Aud. Eng. Soc., 1992.

M.A. Gerzon. Practical periphony: The reproduction of full-sphere sound. Preprint 65th Conv. Aud. Eng. Soc., 1980a.

M.A. Gerzon. Periphony: With-height sound reproduction. J. Audio Eng. Soc., 21(1):2–10, 1973.

M.A. Gerzon. The design of precisely coincident microphone arrays for stereo and surround sound. Preprint 50th Conv. Aud. Eng. Soc., 1975.

M.A. Gerzon. Practical periphony: The reproduction of full-sphere sound. Preprint 65th Conv. Aud. Eng. Soc., 1980b.

G. Golub and W. Kahan. Calculating the singular values and pseudo-inverse of a matrix. Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis, 2(2):205–224, 1965.

M. Grant, S. Boyd, and Y. Ye. CVX: Matlab software for disciplined convex programming. http://cvxr.com/cvx/, 2011.

A.R. Groh. High-fidelity sound system equalization by analysis of standing waves. J. Audio Eng. Soc., 22(10):795–799, 1974.

E.A.P. Habets. Room impulse response generator, 2006.

J. Hannemann and K.D. Donohue. Virtual sound source rendering using a multipole-expansion and method-of-moments approach. J. Audio Eng. Soc., 56(6):473, 2008.

R.H. Hardin and N.J.A. Sloane. McLaren's improved snub cube and other new spherical designs in three dimensions. Discrete and Computational Geometry, 15:429–441, 1996.

P.D. Hatziantoniou and J.N. Mourjopoulos. Generalized fractional-octave smoothing of audio and acoustic responses. J. Audio Eng. Soc., 48(4):259–280, 2000.

Holophonics. Sound Shower. http://www.panphonics.com/directional-speakers.html, 2011.

Holosonics. Audio Spotlight. http://www.holosonics.com/products.html, 2011.

ITU-775. Multichannel stereophonic sound system with and without accompanying picture. Rec. BS.775.1, ITU, Geneva, 1994.

K.D. Jacob. Subjective and predictive measures of speech intelligibility: the role of loudspeaker directivity. J. Audio Eng. Soc., 33(12):950–955, 1985.

W. Jesteadt, S.P. Bacon, and J.R. Lehman. Forward masking as a function of frequency, masker level, and signal delay. J. Acoust. Soc. Am., 71:950–962, 1982.

D.H. Johnson and D.E. Dudgeon. Array Signal Processing. Prentice Hall, 1993.

P.W. Kassakian. Convex approximation and optimization with applications in magnitude filter design and radiation pattern synthesis. PhD thesis, University of California, Berkeley, 2006.

O. Kirkeby and P.A. Nelson. Reproduction of plane wave sound fields. J. Acoust. Soc. Am., 94:2992, 1993.

O. Kirkeby, P.A. Nelson, H. Hamada, and F. Orduna-Bustamante. Fast deconvolution of multichannel systems using regularization. IEEE Transactions on Speech and Audio Processing, 6(2):189–194, Mar. 1998.

M. Kolundžija. Microphone processing for sound field measurement. Master's thesis, EPFL, 2007.
M. Kolundžija, C. Faller, and M. Vetterli. Spatio-temporal gradient analysis of differential microphone arrays. Preprint 126th Conv. Aud. Eng. Soc., 2009a.

M. Kolundžija, C. Faller, and M. Vetterli. Sound field reconstruction: An improved approach for wave field synthesis. Preprint 126th Conv. Aud. Eng. Soc., 2009b.

M. Kolundžija, C. Faller, and M. Vetterli. Designing practical filters for sound field reconstruction. Preprint 127th Conv. Aud. Eng. Soc., 2009c.

M. Kolundžija, C. Faller, and M. Vetterli. Baffled circular loudspeaker array with broadband high directivity. IEEE International Conference on Acoustics, Speech, and Signal Processing, March 2010a.

M. Kolundžija, C. Faller, and M. Vetterli. Sound field recording by measuring gradients. Preprint 128th Conv. Aud. Eng. Soc., 2010b.

M. Kolundžija, C. Faller, and M. Vetterli. Spatiotemporal gradient analysis of differential microphone arrays. J. Audio Eng. Soc., 59(1/2):20–28, 2011a.

M. Kolundžija, C. Faller, and M. Vetterli. Design of a compact cylindrical loudspeaker array for spatial sound reproduction. Preprint 130th Conv. Aud. Eng. Soc., 2011b.

M. Kolundžija, C. Faller, and M. Vetterli. Reproducing sound fields using MIMO acoustic channel inversion. Accepted to J. Audio Eng. Soc., Nov. 2011c.

H. Kuttruff. Room Acoustics. Taylor & Francis, 2000.

H. Lebret. Optimal beamforming via interior point methods. Journal of VLSI Signal Processing, 14(1):29–41, 1996.

H. Lebret and S. Boyd. Antenna array pattern synthesis via convex optimization. IEEE Trans. Sig. Proc., 45(3):526–532, 1997.

S.P. Lipshitz. Stereo microphone techniques: Are the purists wrong? J. Audio Eng. Soc., 34(9):716–744, 1986.

A. Mäkivirta, P. Antsalo, M. Karjalainen, and V. Välimäki. Modal equalization of loudspeaker-room responses at low frequencies. J. Audio Eng. Soc., 51(5):324–343, 2003.

J. Merimaa. Applications of a 3-D microphone array. Preprint 112th Conv. Aud. Eng. Soc., 2002.

J. Meyer and G. Elko. A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002.

B.C.J. Moore. An Introduction to the Psychology of Hearing. Academic Press, 1989.

P.M. Morse and K.U. Ingard. Theoretical Acoustics. Princeton University Press, 1968.

Y. Nakashima, T. Yoshimura, N. Naka, and T. Ohya. Prototype of mobile super directional loudspeaker. NTT DoCoMo Technical Journal, 8(1):25–32, 2006.

P.A. Nelson and S.J. Elliott. Active Control of Sound. Academic Press, 1992.

H.F. Olson. Gradient microphones. J. Acoust. Soc. Am., 17:192–198, 1946.

H.F. Olson. Directional microphones. J. Audio Eng. Soc., 15(4):420–430, 1967.

H.F. Olson. The quest for directional microphones at RCA. J. Audio Eng. Soc., 28:776–786, 1980.

A.V. Oppenheim and R.W. Schafer. Discrete-Time Signal Processing. Prentice Hall, Englewood Cliffs, NJ, 1989.

J.A. Pedersen. Sampling the energy in a 3-D sound field. Preprint 130th Conv. Aud. Eng. Soc., 2007.

V.M.A. Peutz. Articulation loss of consonants as a criterion for speech transmission in a room. J. Audio Eng. Soc., 19(11):915–919, 1971.

M.A. Poletti. A unified theory of horizontal holographic sound systems. J. Audio Eng. Soc., 48(12):1155–1182, 2000.

M.A. Poletti. The design of encoding functions for stereophonic and polyphonic sound systems. J. Audio Eng. Soc., 44(11):948–963, 1996.

D.A. Preves, T.S. Peterson, and M.A. Bren. In-the-ear hearing aid with directional microphone system, May 1998. US Patent 5,757,933.
V. Pulkki and C. Faller. Directional audio coding: filterbank and STFT-based design. Preprint 120th Conv. Aud. Eng. Soc., 2006.

V. Pulkki and M. Karjalainen. Localization of amplitude-panned virtual sources I: stereophonic panning. J. Audio Eng. Soc., 49(9):739–752, 2001.

D.H. Raab. Forward and backward masking between acoustic clicks. J. Acoust. Soc. Am., 33:137–139, 1961.

R. Raangs, W.F. Druyvesteyn, and H.E. De Bree. A low-cost intensity probe. J. Audio Eng. Soc., 51(5):344–357, 2003.

B. Rafaely. Analysis and design of spherical microphone arrays. IEEE Transactions on Speech and Audio Processing, 13(1):135–143, 2005.

B. Rafaely, I. Balmages, and L. Eger. High-resolution plane-wave decomposition in an auditorium using a dual-radius scanning spherical microphone array. J. Acoust. Soc. Am., 122:2661–2668, 2007a.

B. Rafaely, B. Weiss, and E. Bachmat. Spatial aliasing in spherical microphone arrays. IEEE Trans. Sig. Proc., 55(3):1003–1010, 2007b.

P. Scheiber. Four channels and compatibility. J. Audio Eng. Soc., 19(4):267–279, 1971.

Sennheiser. Audiobeam. http://www.sennheiser.com, 2011.

N.J.A. Sloane, R.H. Hardin, and W.D. Smith. Tables of spherical codes. Published electronically at http://www.research.att.com/~njas/packings.

W. Snow. Basic principles of stereophonic sound. IRE Transactions on Audio, 3(2):42–53, 1955.

Sonic Emotion. 3D Sound. http://www.sonicemotion.com/se/ch/heartheworldin3D.html, 2011.

S. Spors. Extension of an analytic secondary source selection criterion for wave field synthesis. In Preprint 123rd Conv. Aud. Eng. Soc., 2007.

E.W. Start. Application of curved arrays in wave field synthesis. Preprint 100th Conv. Aud. Eng. Soc., 1996.

P. Stoica and R.L. Moses. Introduction to Spectral Analysis. Prentice Hall, Upper Saddle River, New Jersey, 1997.

H. Teutsch and W. Kellermann. EB-ESPRIT: 2D localization of multiple wideband acoustic sources using eigen-beams. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

H. Teutsch and W. Kellermann. Acoustic source detection and localization based on wavefield decomposition using circular microphone arrays. J. Acoust. Soc. Am., 120:2724–2736, 2006.

F.E. Toole. Sound Reproduction: Loudspeakers and Rooms. Focal Press, 2008.

A. Unsöld. Beiträge zur Quantenmechanik der Atome. Annalen der Physik, 387(3):355–393, 1927.

M. Van der Wal, E.W. Start, and D. de Vries. Design of logarithmically spaced constant-directivity transducer arrays. J. Audio Eng. Soc., 44:497–507, 1996.

H.L. Van Trees. Optimum Array Processing (Detection, Estimation, and Modulation Theory, Part IV). New York: John Wiley & Sons, Inc., 2002.

B.D. Van Veen and K.M. Buckley. Beamforming: A versatile approach to spatial filtering. IEEE ASSP Magazine, 5(2):4–24, 1988.

E.N.G. Verheijen. Sound Reproduction by Wave Field Synthesis. PhD thesis, Delft University of Technology, 1997.

H. Wallach, E.B. Newman, and M.R. Rosenzweig. The precedence effect in sound localization. The American Journal of Psychology, 62(3):315–336, 1949.

F. Wang, V. Balakrishnan, P.Y. Zhou, J.J. Chen, R. Yang, and C. Frank. Optimal array pattern synthesis using semidefinite programming. IEEE Transactions on Signal Processing, 51(5):1172–1183, 2003.

D.B. Ward, R.A. Kennedy, and R.C. Williamson. Theory and design of broadband sensor arrays with frequency invariant far-field beam patterns. J. Acoust. Soc. Am., 97(2):1023–1034, 1995.

T. Welti and A. Devantier. Low-frequency optimization using multiple subwoofers. J. Audio Eng. Soc., 54(5):347–364, 2006.
E.G. Williams. Fourier Acoustics. Academic Press, 1999.

R.J. Wilson, M.D. Capp, and J.R. Stuart. The loudspeaker-room interface: controlling excitation of room modes. In Proc. of the 23rd AES Conf., 2003.

Yamaha. Sound Bar / Digital Sound Projector. http://www.yamaha.com, 2011.

S. Yan and Y. Ma. Design of FIR beamformer with frequency invariant patterns via jointly optimizing spatial and frequency responses. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

M. Yoneyama, J. Fujimoto, Y. Kawamo, and S. Sasabe. The audio spotlight: An application of nonlinear interaction of sound waves to a new type of loudspeaker design. J. Acoust. Soc. Am., 73(5):1532–1536, 1983.

E. Zwicker and H. Fastl. Psychoacoustics: Facts and Models, volume 22. Springer Verlag, 1999.

Curriculum Vitæ

Mihailo Kolundžija
Audiovisual Communications Laboratory (LCAV)
Swiss Federal Institute of Technology (EPFL)
1015 Lausanne, Switzerland

Personal
Date of birth: March 28, 1981
Nationality: Serbian
Civil status: Single

Education
2007–2011  PhD candidate, School of Computer and Communication Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
2005–2007  MSc in Communication Systems, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
1999–2004  Dipl. Ing. in Electrical Engineering and Computer Science, Faculty of Technical Sciences, Novi Sad, Serbia

Professional experience
04/2007–present  Research and teaching assistant, Audiovisual Communications Laboratory (LCAV), Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
07/2010–09/2010  Software engineer intern, Google Inc., Mountain View, USA
03/2004–08/2004  Research intern, Micronas GmbH, Freiburg, Germany
07/2002–09/2002  Research intern, Montanuniversität Leoben, Leoben, Austria

Publications

Journal papers
1. M. Kolundžija, C. Faller, and M. Vetterli. Reproducing Sound Fields Using MIMO Acoustic Channel Inversion. Journal of the Audio Engineering Society. Nov. 2011.
2. M. Kolundžija, C. Faller, and M. Vetterli. Spatio-Temporal Gradient Analysis of Differential Microphone Arrays. Journal of the Audio Engineering Society. Feb. 2011.

Conference papers
1. M. Kolundžija, C. Faller, and M. Vetterli. Design of a Compact Cylindrical Loudspeaker Array for Spatial Sound Reproduction. AES 130th Convention. May 2011.
2. M. Kolundžija, C. Faller, and M. Vetterli. Sound Field Recording by Measuring Gradients. AES 128th Convention. May 2010.
3. M. Kolundžija, C. Faller, and M. Vetterli. Baffled Circular Loudspeaker Array With Broadband High Directivity. IEEE International Conference on Acoustics, Speech, and Signal Processing. Mar. 2010.
4. M. Kolundžija, C. Faller, and M. Vetterli. Designing Practical Filters For Sound Field Reconstruction. AES 127th Convention. Oct. 2009.
5. M. Kolundžija, C. Faller, and M. Vetterli. Sound Field Reconstruction: An Improved Approach For Wave Field Synthesis. AES 126th Convention. May 2009.
6. M. Kolundžija, C. Faller, and M. Vetterli. Spatio-Temporal Gradient Analysis of Differential Microphone Arrays. AES 126th Convention. May 2009.
7. C. Faller and M. Kolundžija. Design and Limitations of Non-Coincidence Correction Filters for Soundfield Microphones. AES 126th Convention.
May 2009.

Awards and honors
2011  130th AES Convention Technical Student Paper Award
2007  Landry prize of the EPFL for a master's thesis work
2003 & 2005  Mileva Marić Einstein prize of the University of Novi Sad, given to the best student in computer science and engineering
2004  Best student award of the Faculty of Technical Sciences
2003  Royal Norwegian Embassy in Belgrade award, given to the top 500 Serbian students

Languages
English (fluent), German (good), French (good), Czech (fair), Italian (basic), Serbian (native)