Spatial Acoustic Signal Processing
THESIS NO. 5240 (2012)
PRESENTED ON 16 JANUARY 2012
TO THE SCHOOL OF COMPUTER AND COMMUNICATION SCIENCES
AUDIOVISUAL COMMUNICATIONS LABORATORY
DOCTORAL PROGRAM IN COMPUTER, COMMUNICATION AND INFORMATION SCIENCES
ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE
FOR THE DEGREE OF DOCTOR OF SCIENCES
BY
Mihailo Kolundžija
accepted on the recommendation of the jury:
Prof. P. Thiran, jury president
Prof. M. Vetterli, Dr C. Faller, thesis directors
Prof. K. Brandenburg, examiner
Prof. D. de Vries, examiner
Prof. M. Hasler, examiner
Switzerland
2012
Abstract
A sound field on a line or in a plane has an effectively limited spatial bandwidth determined by the temporal frequency. The same can be said of sound fields from far-field sources when analyzed on circular and spherical apertures. Namely, for a given frequency and aperture size, a sound field is effectively composed of a finite number of circular or spherical harmonic components. Based on these two observations, it follows that, if adequately sampled, sound fields can be represented and manipulated in the digital domain with negligible loss of information. The optimal sampling surface depends on the problem geometry, and the set of sampling points needs to be in accordance with the Nyquist criterion relative to the mentioned effective sound field bandwidth.
In this thesis, we address the problems of sound field capture and reproduction
from a practical perspective. More specifically, we present approaches that do not depend on acoustical models, but rely instead on obtaining an acoustic MIMO channel
between transducers (microphones or loudspeakers) and a set of sampling (or control) points. Subsequently, sound field capture and reproduction are formulated as
constrained optimization problems in a spatially discrete domain and solved using
conventional numerical optimization tools.
The first part of the thesis deals with spatial sound capture. We present a framework for analyzing and designing differential microphone arrays based on spatio-temporal sound field gradients. We also show how to record two- and three-dimensional sound fields with differential, circular, and spherical microphone arrays. Finally, we use the aforementioned discrete optimization framework to compute filters for directional and sound field microphone arrays.
In the second part of the thesis, we focus on spatial sound reproduction. We first present a design of a baffled loudspeaker array for reproducing sound with high directivity over a wide frequency range, which combines beamforming at low frequencies with scattering from a rigid baffle at high frequencies. We next present Sound Field Reconstruction (SFR), an approach for optimally reproducing a desired sound field in a wide listening area by inverting a discrete MIMO acoustic channel. Finally, we propose a single- and multi-channel low-frequency room equalization method, formulated as a discrete constrained optimization problem, with constraints designed to prevent excessive equalization filter gains, localization bias, and temporal distortions in the form of pre- and post-echoes.
Keywords: Microphone arrays, loudspeaker arrays, gradient microphones, differential microphone arrays, sound field microphones, sound field capture, directional sound reproduction, beamforming, sound field reproduction, room equalization.
Zusammenfassung

A sound field observed on a line or in a plane has a limited bandwidth in the spatial dimension, determined by the frequency of the propagating sound waves. The same can be said of sound fields from far-field sources observed on circles or spherical surfaces. That is, for a given frequency and circle or sphere size, a finite number of circular or spherical harmonics determine the sound field. From these two properties it follows that sound fields can be sampled spatially without appreciable loss of information. The optimal sampling surface depends on the geometry of the problem, and the sampling points must satisfy the Nyquist-Shannon sampling theorem.

This thesis addresses the problems of sound field capture and reproduction. We present methods that are not based on analytical acoustic models, but on measurements of the acoustic transfer functions between the acoustic transducers (microphones or loudspeakers) and a defined set of sound field sampling points. Sound field capture and reproduction can then be treated as constrained optimization problems, and conventional numerical optimization techniques can be applied to find solutions.

The first part of this thesis deals with the capture of sound and sound fields. A theory is presented for analyzing and designing differential microphone arrays, based on spatio-temporal sound field gradients. We also show how to record two- and three-dimensional sound fields with differential, circular, and spherical microphone arrays. Finally, we show how the mentioned combination of sound field sampling and optimization is used to compute filters for directional microphone arrays and for microphone arrays that capture sound fields.

The second part of this thesis deals with the reproduction of sound and sound fields. We first present a circular loudspeaker array embedded in a cylindrical scattering body for highly directional sound reproduction. Optimized beamforming, taking into account the scattering from the cylinder at higher frequencies, enables high directivity over a wide frequency range. We also present a method for reproducing entire sound fields: to reproduce a sound field accurately in a large area, the MIMO system defined between the loudspeaker array and the sound field sampling points is inverted. Finally, we present a method that adapts single- and multi-channel loudspeaker systems to a room through low-frequency equalization based on discrete optimization. Again, we use discrete sampling of the sound field combined with constrained optimization to obtain filters that avoid excessive amplification of certain frequencies and prevent disturbing pre-echoes and echoes.

Keywords: Microphone arrays, loudspeaker arrays, pressure-gradient microphones, differential microphone arrays, sound field microphones, sound field capture, directional sound reproduction, beamforming, sound field reproduction, room equalization.
Acknowledgments
First and foremost, I would like to thank my two supervisors, Prof. Martin Vetterli
and Dr. Christof Faller. Martin was a source of great ideas from the first day on, had
all the patience to let me find my way, and believed in my work probably much more
than I did, for which I am very grateful. He is an erudite scientist, brilliant teacher
and supervisor, and a great source of positive energy and humor.1 Christof showed
me how to take things pragmatically and practically, how to focus on realistic goals
and make things work in the end. On a more personal note, he and his family have
been great friends over the last four years, and in moments of honesty,2 he dared admit that I was his only drinking friend (which I took both as a compliment and a criticism).
I would also like to express my gratitude to Professors Diemer de Vries, Karlheinz
Brandenburg, Martin Hasler, and Patrick Thiran for accepting to be on my thesis
committee and giving me useful comments and remarks.
I owe many thanks to our lab's secretaries, Jocelyne and Jacqueline, for all their help and for keeping the lab so well organized, and to all the colleagues and friends in LCAV
for the great time on and off work. The list may not be comprehensive, but I will try:
Ali, Amina, Andreas, Christophe, Clément, Florence, Francisco, Guillermo, Gunnar,
Ivan, Ivana, Juri, Jay, Patrick, Pedro, RK, Roberto, Simon, Yann, and Zichong. I am
also thankful to my office-mate Feng for all the interesting conversations and jokes,
and for some great pieces of Chinese wisdom that helped me adjust some of my views
in life.
I am grateful to my friends at and outside of EPFL for all the good times we
had together. The list may not be complete: Aleksandar (3x), Ana, Andrei, Bojana,
Danica, Daniel, Dejan, Duško, Ivana (2x), Jelena (2x), Jugoslava, Marija, Marko,
Miloš, Mirjana, Miroslav, Nedeljko, Nenad, Nevena, Nikodin, Nikola, Radu, Roberto,
Ružica, Smiljka, Tamara, Tanja, Viktor, Violeta, Vladan, Vojin, and Zorana.
I am thankful to Veronika for the time we spent together and for putting up with
me almost until the end of this thesis.
I am grateful to my cousin Branka, her husband Nebojša, and their children Marija, Dragiša, and Nataša for all the feasts and haircuts, and the great time we had
during my visits to Bern.
Last but not least, I am grateful to my parents, Nikola and Jovanka, and my sister
Marija, for their unconditional love and support during all these years.
1 This is a typical east-European understatement.
2 In vino veritas.
Contents

Abstract
Zusammenfassung
Acknowledgments

1 Introduction
  1.1 Thesis motivation
  1.2 Thesis outline

2 Acoustics Fundamentals
  2.1 Fundamental acoustic equations
    2.1.1 Euler's equation
    2.1.2 The acoustic wave equation
  2.2 Point source and Green's function
    2.2.1 Point source
    2.2.2 Green's function
    2.2.3 Time-dependent Green's function
    2.2.4 General solutions of the acoustic wave equation
  2.3 Helmholtz integral equation
    2.3.1 Rayleigh's integrals
  2.4 Plane waves
    2.4.1 Evanescent waves
    2.4.2 The angular spectrum
  2.5 Cylindrical waves
    2.5.1 Boundary value problems
    2.5.2 Helical wave spectrum
    2.5.3 Rayleigh's integral
    2.5.4 Piston in a cylindrical baffle
    2.5.5 Scattering from rigid cylinders
  2.6 Spherical waves
    2.6.1 Boundary value problems
    2.6.2 Spherical wave spectrum
    2.6.3 Scattering from rigid spheres
  2.7 Room acoustics
    2.7.1 Wave theory of room acoustics
    2.7.2 Statistical room acoustics
    2.7.3 Geometrical acoustics
    2.7.4 Reverberation time
    2.7.5 Critical distance
  2.8 Acoustic beamforming
    2.8.1 Beamformer filter design

3 Microphone Arrays For Directional Sound Capture
  3.1 Introduction
    3.1.1 Background
    3.1.2 Chapter outline
  3.2 Differential microphone arrays
    3.2.1 Spatial derivatives of a far-field sound pressure field
    3.2.2 Spatio-temporal derivatives of a far-field sound pressure field
    3.2.3 Differential microphone arrays
  3.3 Directional microphone arrays as acoustic beamformers
    3.3.1 Discussion
  3.4 Conclusions

4 Microphone Arrays For Sound Field Capture
  4.1 Introduction
    4.1.1 Background
    4.1.2 Chapter outline
  4.2 Wave field decomposition
    4.2.1 Horizontal sound field decomposition
    4.2.2 Three-dimensional sound field decomposition
  4.3 Measuring a horizontal sound field with gradient microphone arrays
    4.3.1 Gradient-based horizontal sound field microphones
  4.4 Circular microphone arrays
    4.4.1 Continuous circular microphone apertures
    4.4.2 Sampling circular microphone apertures
  4.5 Spherical microphone arrays
    4.5.1 Continuous spherical microphone apertures
    4.5.2 Sampling spherical microphone apertures
  4.6 Sound field microphones as acoustic beamformers
    4.6.1 Filter design for a circular microphone array mounted on a rigid cylinder
    4.6.2 Soundfield microphone non-coincidence correction filter design
  4.7 Conclusions

5 Baffled Loudspeaker Array For Spatial Sound Reproduction
  5.1 Introduction
    5.1.1 Background
    5.1.2 Chapter outline
  5.2 Acoustical design
    5.2.1 Baffled loudspeaker model
  5.3 Beamformer design
    5.3.1 Filter design procedure
  5.4 Simulations
  5.5 Experiments
  5.6 Applications
  5.7 Conclusions

6 Reproducing Sound Fields Using MIMO Acoustic Channel Inversion
  6.1 Introduction
    6.1.1 Background
    6.1.2 Chapter outline
  6.2 Sound Field Reconstruction
    6.2.1 Plenacoustic sampling and interpolation
    6.2.2 Sound Field Reconstruction using MIMO channel inversion
    6.2.3 Practical extensions of Sound Field Reconstruction
    6.2.4 Designing discrete-time filters for Sound Field Reconstruction
  6.3 Evaluation
    6.3.1 Sound field snapshot analysis
    6.3.2 Impulse response analysis
    6.3.3 Discussion
  6.4 Practical considerations
    6.4.1 Computational complexity
    6.4.2 Performing system measurements
  6.5 Conclusions

7 Multichannel Room Equalization Considering Psychoacoustics
  7.1 Introduction
    7.1.1 Chapter outline
  7.2 Proposed room equalization
    7.2.1 Problem description
    7.2.2 Desired response calculation
    7.2.3 Choice of a cost function
    7.2.4 Equalization filter constraints
    7.2.5 Filter computation procedure
  7.3 Simulations
    7.3.1 MIMO equalization
    7.3.2 SIMO equalization
    7.3.3 Discussion
  7.4 Conclusions

8 Conclusion

Bibliography

Curriculum Vitæ
Chapter 1
Introduction
1.1 Thesis motivation
Recent work by Ajdler et al. (2006) has shown that sound fields are effectively band-limited in the Fourier basis on linear and planar geometries, where the effective spatial bandwidth is determined by the temporal frequency. Similarly, a two- or three-dimensional sound field from far-field sound sources analyzed on a circular or spherical aperture effectively has a finite number of non-zero circular or spherical harmonics, respectively (e.g., see Ajdler et al., 2008; Rafaely et al., 2007b). The effective number of harmonics depends on the temporal frequency and the aperture radius.
The above observations have important implications for sound field capture, representation, and reproduction. Namely, it is well known that signals of finite bandwidth
can be sampled without loss of information, provided that sampling is done according to the Nyquist criterion. Thus, given the geometry of a problem, an appropriate
choice of a sampling strategy converts the problem of sound field capture or reproduction into a discrete domain. The advantages of working with sampled sound fields
are manifold.
• One avoids dependence on idealized models, commonly used when deriving sound field capture and reproduction strategies from acoustic principles such as the Helmholtz integral equation and Rayleigh's integrals, boundary value problems, and scattering from rigid bodies.1 Instead, one can measure a microphone or loudspeaker array's characteristics using an appropriately designed sampling strategy, and implicitly account for any effects that make them inconsistent with a theoretical model.
• One can formulate the tasks of sound field capture and reproduction in terms
of a well-known array signal processing framework (Van Trees, 2002; Johnson
and Dudgeon, 1993) and solve them efficiently using the tools for numerical
optimization (Boyd and Vandenberghe, 2004).
• Physical limitations or psychoacoustical observations relevant to the analyzed problem can be made part of the numerical optimization procedure, e.g., through constraints or as part of the cost function.

1 The listed acoustic principles are presented in Chapter 2.

Figure 1.1: (a) Directional sound capture; (b) sound field capture; (c) directional sound reproduction; (d) sound field reproduction.
We would like to particularly stress the last bullet point, since it gives additional flexibility and an advantage over purely physical approaches.

Given these observations, we expect that working with discretized sound fields is a flexible way to achieve optimal sound field capture and reproduction for a particular capture or reproduction system, taking into account its characteristics and including its limitations.
This thesis describes the application of discretized sound field processing through
constrained numerical optimization to the following problems in acoustics:
• Directional sound capture with microphone arrays, illustrated in Figure 1.1(a)
• Sound field capture with microphone arrays, illustrated in Figure 1.1(b)
• Directional sound reproduction with baffled loudspeaker arrays, illustrated in
Figure 1.1(c)
• Sound field reproduction with loudspeaker arrays, illustrated in Figure 1.1(d)
• Single-channel and multichannel room equalization.
1.2 Thesis outline
In Chapter 2, we give an overview of the acoustics fundamentals relevant for the
problems presented in later chapters. We give a review of the acoustic wave equation and its eigenfunction decompositions in different coordinate systems, Helmholtz
integral equation, and radiation and scattering from rigid cylinders and spheres. In
addition, we present the room acoustics fundamentals relevant for room equalization,
and we give a detailed account of acoustic beamforming.
In Chapter 3, we describe directional sound capture. We give a framework for
the analysis and design of differential and gradient microphone arrays, which is based
on analyzing spatio-temporal gradients of a sound field from far-field sources (Kolundžija et al., 2009a, 2011a). We also show how directional microphone arrays can be
designed by discretizing microphones’ directional responses and formulating and solving a numerical optimization problem for synthesizing a desired directional response.
We verify the effectiveness of this approach through simulations.
Chapter 4 focuses on microphone arrays for sound field capture. We relate decompositions of two- and three-dimensional sound fields in terms of circular and spherical
harmonics, respectively, to sound capture with directional microphones. We show a
way to capture a complete representation of a two-dimensional sound field with gradient microphones (Kolundžija et al., 2010b). In addition, we show how to capture two- and three-dimensional sound fields using unbaffled and baffled circular and spherical
arrays of pressure or first-order gradient microphones. Similarly to the directional microphone arrays, we show how filters for sound field microphone arrays can be designed
by discretizing the microphones’ directional responses and formulating and solving a
numerical optimization problem for desired response synthesis. Using simulations, we
show how this approach can be used to measure circular harmonic decomposition with
a circular microphone array, and to obtain the so-called non-coincidence correction
filters for the Soundfield microphone (Faller and Kolundžija, 2009).
In Chapter 5, we describe a design of a baffled loudspeaker array used for highly
directional sound reproduction over a wide frequency range (Kolundžija et al., 2010a,
2011b). Our design is based on optimal beamforming in the magnitude sense for directional reproduction at low frequencies, and scattering of sound from rigid bodies for
directional reproduction at high frequencies. We show that for directional sound reproduction, a baffled loudspeaker delivers an essentially consistent directional performance at high frequencies. In addition, we use beamforming with multiple loudspeakers at low frequencies to synthesize the directional response of a single loudspeaker at
high frequencies. We show the feasibility of our approach through simulations of an
acoustical model, and through experiments with a prototype loudspeaker array.
Chapter 6 describes an approach for reproducing sound fields, termed Sound Field
Reconstruction (SFR), motivated by the essential spatio-temporal band-limitedness
of sound fields (Kolundžija et al., 2009b,c, 2011c). This property allows expressing
sound field reproduction as an inversion of the MIMO acoustic channel between a
loudspeaker array and a grid of control points constrained by the Nyquist sampling
criterion. The use of a MIMO channel inversion based on truncated singular value decomposition (SVD) of the acoustic channel matrix allows for the optimal reproduction
accuracy subject to a limited effort constraint. We present a detailed procedure for
obtaining loudspeaker driving signals that involves selection of active loudspeakers,
coverage of the listening area with control points, and frequency-domain FIR filter
design. Through extensive simulations comparing SFR with Wave Field Synthesis,
we show that on average SFR provides higher sound field reproduction accuracy.
In Chapter 7, we consider the problem of multiple-loudspeaker low-frequency
room equalization for a wide listening area, where the equalized loudspeaker is assisted by the remaining ones. Using a spatial discretization of the listening area, we
formulate the problem as a multipoint error minimization between desired and synthesized magnitude frequency responses. The desired response and cost function are
formulated with a goal of capturing the room’s spectral power profile, and penalizing strong resonances. Considering physical and psychoacoustical observations, we
argue for the use of gain-limited, short, and well-localized equalization filters, with
an additional delay for loudspeakers that help the equalized one. We propose an optimization framework for computing room equalization filters, where the mentioned
filter requirements are incorporated as convex constraints. We verify the effectiveness
of our equalization approach through simulations.
Chapter 8 gives a summary of the thesis and some directions for future work.
Chapter 2
Acoustics Fundamentals
In this chapter, we give an overview of the acoustic concepts that are used throughout the remaining chapters of this thesis. Without attempting to present strict derivations of every concept, we start from the fundamental acoustic equations and show how they lead to some of the more sophisticated acoustic concepts, including the acoustic integral equations, the plane wave decomposition, radiation and scattering from cylindrical and spherical bodies, and some important aspects of the sound fields formed in rooms.
This chapter is organized as follows. Section 2.1 presents two important equations
of linear acoustics—Euler’s equation and the acoustic wave equation. The related concepts of a point source and Green’s function are described in Section 2.2. Section 2.3
presents Helmholtz integral equation and Rayleigh’s integrals, obtained by introducing integral theorems to the realm of acoustics. The concept of plane waves, used
very often in analyses of sound fields from distant sources, is discussed in Section 2.4.
Section 2.5 presents the solution to the wave equation in cylindrical coordinates, and
analyzes the problem of sound radiation from cylindrical geometries. Similarly, Section 2.6 discusses the solution to the wave equation in spherical coordinates, and the
related problems of sound scattering and radiation from spheres. Section 2.7 presents
some fundamental properties of sound fields in rooms, which one needs to be aware of
when designing systems that alter the effects of room acoustics. Finally, Section 2.8
presents spatial filtering or beamforming—a technique used for controlling the spatial characteristic of sound acquisition or radiation with an array of microphones or
loudspeakers, respectively.
2.1 Fundamental acoustic equations
In this section, we present two fundamental equations of linear acoustics. These
equations are obtained under the simplifying assumptions that the non-viscous fluid
medium outside the region occupied by sources is homogeneous and at rest, and that
the acoustic pressure and particle disturbances generated outside the source are small
enough that one can take the first-order approximations of the general equations of
fluid dynamics.
2.1.1 Euler's equation

Euler's equation, obtained by applying Newton's second law to an infinitesimal fluid volume, relates the velocity vector of the fluid particles u(r, t) and the sound pressure p(r, t) at the point r. It is given by (Morse and Ingard, 1968; Williams, 1999)

\[ \rho_0\, \frac{\partial}{\partial t}\, \mathbf{u}(\mathbf{r},t) = -\nabla p(\mathbf{r},t)\,, \tag{2.1} \]

where ρ0 is the fluid density at equilibrium, and ∇p(r, t) is the spatial gradient of the sound pressure. The spatial gradient in Cartesian coordinates has the form

\[ \nabla p(x,y,z,t) = \frac{\partial p}{\partial x}\,\mathbf{e}_x + \frac{\partial p}{\partial y}\,\mathbf{e}_y + \frac{\partial p}{\partial z}\,\mathbf{e}_z\,, \tag{2.2} \]
where ex , ey , and ez are unit vectors that point in positive coordinate directions. The
forms of the spatial gradient in cylindrical and spherical coordinates, given by (2.34)
and (2.62), respectively, are derived by using coordinate system transforms (Morse
and Ingard, 1968).
Acoustic phenomena are commonly analyzed in the steady state. The Fourier transform is the tool for switching between the time and frequency domains, the latter serving for steady-state analysis. The frequency-domain version of Euler's equation is obtained by applying the Fourier transform to both sides of (2.1), and it takes the form

\[ \mathbf{U}(\mathbf{r},\omega) = -\frac{1}{i\omega\rho_0}\,\nabla P(\mathbf{r},\omega)\,. \tag{2.3} \]
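As a quick numerical check of (2.3) (a sketch of our own, assuming a plane-wave pressure field and arbitrary parameters, not a procedure from the thesis), the axial particle velocity can be estimated from two closely spaced pressure readings by a central finite difference; this is also the operating principle behind the gradient microphones used in later chapters.

```python
import numpy as np

def particle_velocity_fd(p_minus, p_plus, d, omega, rho0=1.2):
    """Axial particle velocity via (2.3), with the pressure gradient
    replaced by a central finite difference over the spacing d."""
    grad_p = (p_plus - p_minus) / d
    return -grad_p / (1j * omega * rho0)

# Plane wave p(x) = exp(ikx) at 500 Hz: expect |U| = 1/(rho0*c) ~ 2.43e-3
c, f = 343.0, 500.0
k, omega, d = 2 * np.pi * f / c, 2 * np.pi * f, 0.01
u = particle_velocity_fd(np.exp(-1j * k * d / 2), np.exp(1j * k * d / 2), d, omega)
print(abs(u))
```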
2.1.2 The acoustic wave equation
If the considered source-free, homogeneous, non-viscous fluid medium is initially at rest, and the fluid pressure and particle disturbances are small, the propagation of sound waves in the medium is governed by the homogeneous acoustic wave equation. The linear homogeneous acoustic wave equation in the time domain reads (Morse and Ingard, 1968)

\[ \nabla^2 p(\mathbf{r},t) - \frac{1}{c^2}\,\frac{\partial^2 p(\mathbf{r},t)}{\partial t^2} = 0\,, \tag{2.4} \]

where c is the speed of sound in the fluid medium,1 and ∇²p(r, t) = div grad p(r, t) is the Laplacian of the sound pressure. As will be seen in later sections of this chapter, the Laplacian takes on different forms in different coordinate systems.

1 The speed of sound in air at 20 °C is 343 m/s.

Like Euler's equation, the acoustic wave equation is commonly used in its steady-state, frequency-domain form, better known as the Helmholtz equation. The homogeneous Helmholtz equation is obtained by applying the Fourier transform to (2.4), and has the form

\[ \nabla^2 P(\mathbf{r},\omega) + k^2 P(\mathbf{r},\omega) = 0\,, \tag{2.5} \]
where k denotes the acoustic wave number, which depends on the angular frequency ω, or equivalently on the acoustic wavelength λ, through

\[ k = \frac{\omega}{c} = \frac{2\pi}{\lambda}\,. \tag{2.6} \]
The general solutions of the acoustic wave equation can take on different forms. As will be seen later in the chapter, the solutions are usually expressed as a superposition of eigenfunctions of the homogeneous wave equation,2 and those are determined by the geometry and the boundary conditions of the considered problem.

2 In many cases, when steady-state analysis is used, the solutions are expressed in the frequency domain as a superposition of the eigenfunctions of the homogeneous Helmholtz equation.
2.2 Point source and Green's function

2.2.1 Point source

A point source is a model of an infinitesimally small volume acting as a source of energy that produces acoustic waves. For a given frequency ω, it is defined as the solution of the inhomogeneous Helmholtz equation for an unbounded medium (Morse and Ingard, 1968)

\[ \nabla^2 g_\omega(\mathbf{r}|\mathbf{r}_0) + k^2 g_\omega(\mathbf{r}|\mathbf{r}_0) = -\delta(\mathbf{r}-\mathbf{r}_0)\,, \tag{2.7} \]

where r denotes the observation point, r0 the position of the point source, and δ(r − r0) is the three-dimensional Dirac delta function.
2.2.2 Green's function

The solution of (2.7) is also known under the name free-field Green's function, and it represents the spatial kernel of the wave equation. The free-field Green's function has the form (Morse and Ingard, 1968)

\[ g_\omega(\mathbf{r}|\mathbf{r}_0) = \frac{e^{ik\|\mathbf{r}-\mathbf{r}_0\|}}{4\pi\|\mathbf{r}-\mathbf{r}_0\|}\,. \tag{2.8} \]

Note that a general solution of the inhomogeneous Helmholtz equation has the form

\[ G_\omega(\mathbf{r}|\mathbf{r}_0) = g_\omega(\mathbf{r}|\mathbf{r}_0) + \chi_\omega(\mathbf{r})\,, \tag{2.9} \]

where χω(r) is any solution to the homogeneous Helmholtz equation (2.5). The solution χω(r) is added to the free-field Green's function in order to satisfy some predefined boundary conditions.
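As a numerical aside, the free-field Green's function (2.8) is simple to evaluate; the sketch below (our own illustration, with an arbitrarily chosen frequency and geometry) computes the pressure that a unit point source produces at a given observation point.

```python
import numpy as np

def greens_function(r, r0, k):
    """Free-field Green's function (2.8) between points r0 and r."""
    d = np.linalg.norm(np.asarray(r, float) - np.asarray(r0, float))
    return np.exp(1j * k * d) / (4 * np.pi * d)

# Pressure 2 m from a unit point source at 1 kHz (c = 343 m/s)
k = 2 * np.pi * 1000.0 / 343.0
print(abs(greens_function([2, 0, 0], [0, 0, 0], k)))  # 1/(8*pi) ~ 0.0398
```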
2.2.3 Time-dependent Green's function

The time-dependent free-field Green's function is the solution of the inhomogeneous acoustic wave equation (Morse and Ingard, 1968)

\[ \nabla^2 g(\mathbf{r},t|\mathbf{r}_0,t_0) - \frac{1}{c^2}\,\frac{\partial^2 g(\mathbf{r},t|\mathbf{r}_0,t_0)}{\partial t^2} = -\delta(\mathbf{r}-\mathbf{r}_0)\,\delta(t-t_0)\,, \tag{2.10} \]

where the expression on the right denotes a pulse wave travelling from a point source at r0 and starting at t0. The time-dependent free-field Green's function takes the following form:

\[ g(\mathbf{r},t|\mathbf{r}_0,t_0) = \frac{1}{4\pi\|\mathbf{r}-\mathbf{r}_0\|}\,\delta\!\left(t - t_0 - \frac{\|\mathbf{r}-\mathbf{r}_0\|}{c}\right). \tag{2.11} \]

Without loss of generality, the pulse-starting instant t0 can be made equal to zero, giving the time-dependent Green's function the form

\[ g_{\mathbf{r}|\mathbf{r}_0}(t) = g(\mathbf{r},t|\mathbf{r}_0,0) = \frac{1}{4\pi\|\mathbf{r}-\mathbf{r}_0\|}\,\delta\!\left(t - \frac{\|\mathbf{r}-\mathbf{r}_0\|}{c}\right), \tag{2.12} \]

which can be viewed as the free-field acoustic channel impulse response between two points, r0 and r.
2.2.4 General solutions of the acoustic wave equation

If an arbitrary simple-harmonic source distribution F(r, ω) radiates sound waves into a fluid medium occupying a volume V0, the sound pressure distribution inside the volume V0 and on its boundary surface S0 satisfies the inhomogeneous Helmholtz equation

\[ \nabla^2 P(\mathbf{r},\omega) + k^2 P(\mathbf{r},\omega) = -F(\mathbf{r},\omega)\,. \tag{2.13} \]

The solution P(r, ω) of (2.13) is obtained by combining (2.13) and (2.7), and has the form (Morse and Ingard, 1968)

\[ P(\mathbf{r},\omega) = \iiint_{V_0} G_\omega(\mathbf{r}|\mathbf{r}_0)\, F(\mathbf{r}_0,\omega)\, dV_0 + \iint_{S_0} \left[ G_\omega(\mathbf{r}|\mathbf{r}_0)\,\frac{\partial}{\partial n_0} P(\mathbf{r}_0,\omega) - P(\mathbf{r}_0,\omega)\,\frac{\partial}{\partial n_0} G_\omega(\mathbf{r}|\mathbf{r}_0) \right] dS_0\,, \tag{2.14} \]

where n0 denotes an outward-pointing normal to the surface S0. The sound pressure distribution defined by (2.14) emanates from a distribution of sources and from reflections off the volume boundary surface.

In the case of an unbounded medium, the second term in (2.14) disappears, and the sound pressure distribution depends only on the active sound sources:

\[ P(\mathbf{r},\omega) = \iiint_{V_0} G_\omega(\mathbf{r}|\mathbf{r}_s)\, F(\mathbf{r}_s,\omega)\, dV_0\,. \tag{2.15} \]

If the source distribution is defined as a function f(r, t) of space and time, the spatio-temporal sound pressure distribution is obtained using the time-dependent Green's function. For instance, in the case of an unbounded medium, the sound pressure distribution is obtained as follows:

\[ p(\mathbf{r},t) = \iiint_{V_0} \int_{-\infty}^{\infty} f(\mathbf{r}_0,t_0)\, g(\mathbf{r},t|\mathbf{r}_0,t_0)\, dt_0\, dV_0\,. \tag{2.16} \]
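The unbounded-medium solution (2.15) suggests a direct numerical recipe: discretize the source distribution and superpose free-field Green's functions. The following sketch assumes a handful of point sources with given complex amplitudes; the positions and values are arbitrary examples, not data from the thesis.

```python
import numpy as np

def pressure_from_sources(r, src_pos, src_amp, k):
    """Discrete counterpart of (2.15): superpose the free-field Green's
    functions of point sources with complex amplitudes src_amp."""
    r = np.asarray(r, dtype=float)
    p = 0j
    for rs, a in zip(src_pos, src_amp):
        d = np.linalg.norm(r - np.asarray(rs, dtype=float))
        p += a * np.exp(1j * k * d) / (4 * np.pi * d)
    return p

# Two unit-amplitude sources at 500 Hz, observed at (1, 0.5, 0)
k = 2 * np.pi * 500.0 / 343.0
print(abs(pressure_from_sources([1, 0.5, 0], [[0, 0, 0], [0.2, 0, 0]], [1, 1], k)))
```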
2.3 Helmholtz integral equation

The Helmholtz integral equation (HIE) relates the pressure inside a source-free volume to the pressure and the normal component of the particle velocity on the volume's boundary surface. Broadly speaking, there are two types of HIE: interior and exterior.
Figure 2.1: The shaded regions represent the source-free volume V where the sound pressure field is computed. (a) Interior HIE illustration, where the sound sources are outside of V; (b) exterior HIE illustration, where the volume V extends to infinity and its interior boundaries enclose the sound sources.
In the interior HIE, illustrated in Figure 2.1(a), one is interested in the sound
pressure field inside of a finite source-free volume V , while the radiating sound sources
are located outside of V .
On the other hand, exterior HIE, shown in Figure 2.1(b), relates to problems
where a source-free domain V spreads to infinity, and its interior boundary surfaces
∂V1 , ∂V2 , etc. enclose the sound sources that evoke the analyzed field. Also, the
boundary surface ∂V represents the union of all source-enclosing boundary surfaces
∂Vi and the space-enclosing infinite boundary surface ∂V∞ :
∂V = ∂V1 ∪ ∂V2 ∪ · · · ∪ ∂V∞ .
Both the interior and exterior HIEs are derived using Green's second identity and the inhomogeneous Helmholtz equation, and have the following form (Williams, 1999):

\[ P(\mathbf{r}',\omega) = \iint_{\partial V} \left[ P(\mathbf{r},\omega)\,\frac{\partial}{\partial n} G_\omega(\mathbf{r}|\mathbf{r}') - i\rho_0\omega\, G_\omega(\mathbf{r}|\mathbf{r}')\, V_n(\mathbf{r},\omega) \right] d(\partial V)\,. \tag{2.17} \]

In (2.17), ∂V denotes the bounding surface of the volume V, Gω(r|r′) is Green's function, Vn(r, ω) is the projection of the particle velocity vector onto the inward-pointing boundary surface normal n, and ∂/∂n stands for the operation of taking a directional derivative along the direction of n.
2.3.1 Rayleigh's integrals

Rayleigh's integrals are a special form of the HIE, obtained from (2.17) by taking a particular volume V and a suitable choice of Green's function Gω(r|r′).
Under the assumption that all radiating sound sources are located in the half-space z < 0, the interior HIE is applied to the volume V that covers the half-space z > 0. In these circumstances, the integration surface ∂V includes the plane z = 0 and the infinite hemisphere centered at the origin and "enclosing" the half-space z > 0.
Figure 2.2 illustrates the particular domain used for deriving Rayleigh’s integrals.
Figure 2.2: Shaded regions represent the particular half-space source-free volume V
used for deriving Rayleigh’s integrals.
Assuming that Green's function satisfies the Sommerfeld radiation condition3

\[ \lim_{\|\mathbf{r}\|\to\infty} \|\mathbf{r}\| \left( \frac{\partial}{\partial \|\mathbf{r}\|} - ik \right) G_\omega(\mathbf{r}|\mathbf{r}') = 0\,, \tag{2.18} \]

the surface integral over the bounding surface ∂V∞ vanishes, turning the HIE into

\[ P(\mathbf{r}',\omega) = \iint_{\partial V_{xy}} \left[ P(\mathbf{r},\omega)\,\frac{\partial}{\partial n} G_\omega(\mathbf{r}|\mathbf{r}') - i\rho_0\omega\, G_\omega(\mathbf{r}|\mathbf{r}')\, V_n(\mathbf{r},\omega) \right] d(\partial V_{xy})\,, \tag{2.19} \]

where ∂Vxy denotes the xy-plane.

3 The Sommerfeld radiation condition in two dimensions reads
\[ \lim_{\|\mathbf{r}\|\to\infty} \|\mathbf{r}\|^{\frac{1}{2}} \left( \frac{\partial}{\partial \|\mathbf{r}\|} - ik \right) G_\omega(\mathbf{r}|\mathbf{r}') = 0\,. \]

Rayleigh's I integral

Rayleigh's I integral is a particular form of (2.19), where Green's function Gω(r|r′) is chosen such that the term P(r, ω) ∂/∂n Gω(r|r′) vanishes in the xy-plane. This particular Green's function has the form

\[ G_\omega(\mathbf{r}|\mathbf{r}') = \frac{e^{ik\|\mathbf{r}-\mathbf{r}'\|}}{4\pi\|\mathbf{r}-\mathbf{r}'\|} + \frac{e^{ik\|\mathbf{r}-\mathbf{r}'_M\|}}{4\pi\|\mathbf{r}-\mathbf{r}'_M\|}\,, \tag{2.20} \]

where the vector r′M = (x, y, −z) is the mirror image of r′ = (x, y, z) in the xy-plane.

After substituting (2.20) in (2.19), Rayleigh's I integral takes the form

\[ P(\mathbf{r}',\omega) = -2 \iint_{\partial V_{xy}} i\rho_0\omega\, G_\omega(\mathbf{r}|\mathbf{r}')\, V_n(\mathbf{r},\omega)\, d(\partial V_{xy})\,. \tag{2.21} \]
Rayleigh's II integral

Unlike Rayleigh's I integral, where Green's function's derivative term was made zero in the xy-plane, for obtaining Rayleigh's II integral, Green's function Gω(r|r′) is chosen such that the term iρ0ω Gω(r|r′) Vn(r, ω) vanishes in the xy-plane. This choice of Green's function is given by

\[ G_\omega(\mathbf{r}|\mathbf{r}') = \frac{e^{ik\|\mathbf{r}-\mathbf{r}'\|}}{4\pi\|\mathbf{r}-\mathbf{r}'\|} - \frac{e^{ik\|\mathbf{r}-\mathbf{r}'_M\|}}{4\pi\|\mathbf{r}-\mathbf{r}'_M\|}\,, \tag{2.22} \]

where the vector r′M is the same mirror image as in (2.20).

Rayleigh's II integral is given by

\[ P(\mathbf{r}',\omega) = 2 \iint_{\partial V_{xy}} P(\mathbf{r},\omega)\,\frac{\partial}{\partial n} G_\omega(\mathbf{r}|\mathbf{r}')\, d(\partial V_{xy}) = 2 \iint_{\partial V_{xy}} P(\mathbf{r},\omega)\, ik \left( 1 + \frac{1}{ik\,\|\mathbf{r}-\mathbf{r}'\|} \right) \cos\vartheta\; \frac{e^{ik\|\mathbf{r}-\mathbf{r}'\|}}{4\pi\|\mathbf{r}-\mathbf{r}'\|}\, d(\partial V_{xy})\,, \tag{2.23} \]

where ϑ is the angle between the vector r′ − r and the z axis, as shown in Figure 2.2.
Discussion
Rayleigh’s I and II integrals offer a practical interpretation, which is the essence of
some sound field reproduction approaches, like Wave Field Synthesis discussed in
Chapter 6.
In case of Rayleigh’s I integral, it can be seen that a sound field emanating from
sound sources in the half space z < 0 can be reproduced by driving omnidirectional
point sources in the xy-plane with signals proportional to the z-component of the
particle velocity vector Vz (r, ω) in the xy-plane.
Rayleigh's II integral can be interpreted slightly differently. There, a sound field from radiating sound sources in the half-space z < 0 can be reproduced by driving dipole4 sources in the xy-plane with signals proportional to the sound pressure P(r, ω) in the xy-plane.

4 A dipole source has a bidirectional (or figure-of-eight) radiation characteristic, i.e., the waves radiated by the source towards direction ϑ relative to its axis have an amplitude proportional to cos ϑ.
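To make the interpretation of Rayleigh's I integral concrete, the following sketch discretizes (2.21) on a finite grid of secondary point sources in the plane z = 0, driven by the normal velocity of a target plane wave. It is an illustration under assumed parameters (grid extent, spacing, frequency), not a reproduction method from this thesis; the finite aperture introduces truncation ripple.

```python
import numpy as np

def rayleigh_i(r_eval, grid, vn, k, dS, rho0=1.2, c=343.0):
    """Discrete approximation of Rayleigh's I integral (2.21): secondary
    omnidirectional sources in the plane z = 0 driven by the normal
    particle velocity samples vn."""
    omega = k * c
    d = np.linalg.norm(grid - np.asarray(r_eval, float), axis=1)
    g = np.exp(1j * k * d) / (4 * np.pi * d)
    return -2j * rho0 * omega * np.sum(g * vn) * dS

# Reproduce a unit plane wave travelling in +z at 500 Hz
c, f = 343.0, 500.0
k = 2 * np.pi * f / c
xs = np.arange(-5.0, 5.0, 0.1)            # 0.1 m spacing (< lambda/2)
X, Y = np.meshgrid(xs, xs)
grid = np.column_stack([X.ravel(), Y.ravel(), np.zeros(X.size)])
vn = np.full(X.size, 1.0 / (1.2 * c))     # V_z of the target plane wave
p = rayleigh_i([0.0, 0.0, 1.0], grid, vn, k, dS=0.1**2)
print(abs(p))  # close to 1 above the aperture, up to truncation ripple
```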
2.4 Plane waves

Plane waves are simple-harmonic functions of space and time, obtained by solving the homogeneous acoustic wave equation in Cartesian coordinates, which is given by

\[ \frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} + \frac{\partial^2 p}{\partial z^2} - \frac{1}{c^2}\frac{\partial^2 p}{\partial t^2} = 0\,. \tag{2.24} \]

They are usually expressed as more general, analytic (or complex) functions of the spatial coordinate r = (x, y, z) and time t,

\[ p(\mathbf{r},t) = P_0\, e^{i(\mathbf{k}^T \mathbf{r} - \omega t)}\,, \tag{2.25} \]

where P0 is a complex amplitude that accounts for the magnitude and phase at the origin, ω is the angular frequency, and k = (kx, ky, kz) is the wave vector, which has the notion of a three-dimensional spatial frequency. The wave vector components kx, ky, and kz, denoting the spatial frequencies along the axes x, y, and z, respectively, satisfy

\[ \sqrt{k_x^2 + k_y^2 + k_z^2} = k = \frac{\omega}{c}\,. \tag{2.26} \]

An equivalent way to represent a plane wave is through its Fourier transform with respect to time,

\[ P(\mathbf{r},\omega) = 2\pi P_0\, \delta(\omega-\omega_0)\, e^{i\mathbf{k}^T\mathbf{r}}\,. \tag{2.27} \]
2.4.1 Evanescent waves

Propagating plane waves, for which all the spatial frequencies kx, ky, and kz have real values, are characterized by harmonic oscillations of sound pressure with the same amplitude at any point in space. However, the wave vectors k = (kx, ky, kz) with real-valued components are not the only solutions of the homogeneous acoustic wave equation. For instance, even if kx² + ky² > k², the acoustic wave equation is satisfied when

\[ k_z = i\sqrt{k_x^2 + k_y^2 - k^2} = i k_z'\,. \tag{2.28} \]

This particular case defines an evanescent wave, which takes the form

\[ p(\mathbf{r},t) = P_0\, e^{-k_z' z}\, e^{i(k_x x + k_y y)}\,. \tag{2.29} \]

The evanescent wave defined by (2.29) is a plane wave that propagates parallel to the xy-plane, in the direction kx ex + ky ey, while its magnitude decays exponentially with the coordinate z.

Evanescent waves are important in the analysis of vibrating structures and of wave transmission and reflection, as they develop close to the surface of a vibrating structure and on boundaries between two differing media. However, in problems of sound field reproduction or capture of sound waves from distant sources, the spatially ephemeral evanescent waves have less relevance.
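A short numerical illustration of (2.25) and (2.28)-(2.29), with arbitrarily chosen frequency and spatial frequencies: making kz imaginary per (2.28) turns a propagating plane wave into an evanescent one whose magnitude decays exponentially with z.

```python
import numpy as np

def plane_wave(r, kvec, P0=1.0):
    """Complex plane-wave field (2.25) at t = 0; a complex kz per (2.28)
    turns it into the evanescent wave (2.29)."""
    return P0 * np.exp(1j * np.dot(np.asarray(kvec), np.asarray(r, float)))

k = 2 * np.pi * 1000.0 / 343.0               # wave number at 1 kHz
kx, ky = 1.5 * k, 0.0                        # kx^2 + ky^2 > k^2
kz = 1j * np.sqrt(kx**2 + ky**2 - k**2)      # evanescent kz, (2.28)
for z in (0.0, 0.05, 0.10):                  # exponential decay with z
    print(z, abs(plane_wave([0.0, 0.0, z], [kx, ky, kz])))
```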
2.4.2 The angular spectrum

Consider the sound radiation problem used for deriving Rayleigh's integrals, shown in Figure 2.2, where the radiating sound sources are located in the half-space z < 0, and the sound field is analyzed in the other half-space, z ≥ 0. This case somewhat simplifies the following analysis, but does not sacrifice generality.

It follows directly from the definition of the multidimensional Fourier transform that any finite-energy field can be seen as a superposition of plane waves. However, sound fields come with an additional property. In source-free regions, sound fields satisfy the homogeneous Helmholtz equation, which limits the arbitrariness of the spatial frequencies kx, ky, and kz. In particular, given the values of the spatial frequencies kx and ky along the directions x and y, respectively, the spatial frequency kz in the direction z cannot take on an arbitrary value; instead, it needs to satisfy (2.26). Consequently, the plane wave spectrum, also known under the name angular spectrum, can be defined as a function P(kx, ky, z, ω) of two spatial frequencies, kx and ky, and the spatial coordinate z.

The angular spectrum then completely determines the sound pressure field through the inverse Fourier transform given by

\[ P(\mathbf{r},\omega) = \frac{1}{4\pi^2} \iint P(k_x,k_y,z,\omega)\, e^{i(k_x x + k_y y)}\, dk_x\, dk_y\,, \tag{2.30} \]

with

\[ k_z = \sqrt{k^2 - k_x^2 - k_y^2}\,. \tag{2.31} \]

For the considered sound radiation problem, it is physically justified to choose positive real values of kz for propagating plane waves, as the radiation in the half-space z ≥ 0 can only take place in the positive z direction.

It should be noted that (2.30) holds for any z ≥ 0, and that the angular spectrum P(kx, ky, 0, ω), or equivalently the sound pressure field P(x, y, 0, ω) in the plane z = 0, provides complete knowledge of the sound field in the entire half-space z ≥ 0. The relationship is given by

\[ P(k_x,k_y,z,\omega) = P(k_x,k_y,0,\omega)\, e^{i k_z z}\,, \tag{2.32} \]

where kz is given by (2.31). Naturally, this comes as no surprise, as it was already shown by Rayleigh's II integral that knowing the sound pressure field in the plane z = 0 completely defines the sound field in the upper half-space z > 0.
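Relations (2.30)-(2.32) translate directly into the FFT-based angular-spectrum propagation sketched below. This is a minimal illustration assuming a square sampling grid and the periodic boundary conditions implied by the FFT; the grid size, spacing, and frequency are arbitrary choices.

```python
import numpy as np

def propagate_angular_spectrum(p0, dx, z, f, c=343.0):
    """Propagate a pressure field sampled in the plane z = 0 to height z
    using (2.30)-(2.32): FFT, multiply by exp(i*kz*z), inverse FFT.
    Components with kx^2 + ky^2 > k^2 get an imaginary kz and decay."""
    k = 2 * np.pi * f / c
    n = p0.shape[0]
    kx = 2 * np.pi * np.fft.fftfreq(n, d=dx)
    KX, KY = np.meshgrid(kx, kx)
    kz = np.sqrt((k**2 - KX**2 - KY**2).astype(complex))
    return np.fft.ifft2(np.fft.fft2(p0) * np.exp(1j * kz * z))

# A normally incident plane wave keeps unit magnitude after 1 m
p0 = np.ones((128, 128), dtype=complex)
pz = propagate_angular_spectrum(p0, dx=0.05, z=1.0, f=500.0)
print(abs(pz[64, 64]))
```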
2.5 Cylindrical waves

Figure 2.3: Relations between cylindrical and Cartesian coordinates.
The cylindrical coordinate system and its relation to the Cartesian coordinate system are shown in Figure 2.3. In this section, both the sound pressure and the particle velocity are represented as functions of the cylindrical coordinates (r, φ, z) and the angular frequency ω.

Euler's equation and the Helmholtz equation have the basic forms (2.3) and (2.5), respectively. The Laplace operator in cylindrical coordinates has the form (Morse and Ingard, 1968)

\[ \nabla^2 = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial \phi^2} + \frac{\partial^2}{\partial z^2}\,, \tag{2.33} \]

while the gradient operator reads

\[ \nabla = \frac{\partial}{\partial r}\,\mathbf{e}_r + \frac{1}{r}\frac{\partial}{\partial \phi}\,\mathbf{e}_\phi + \frac{\partial}{\partial z}\,\mathbf{e}_z\,. \tag{2.34} \]

The unit vectors er, eφ, and ez point in the positive coordinate directions.

The general frequency-domain solution of the homogeneous acoustic wave equation in cylindrical coordinates can be derived using separation of variables, and has the form (Williams, 1999)

\[ P(r,\phi,z,\omega) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{in\phi} \int_{-\infty}^{\infty} \left[ A_n(k_z,\omega)\, H_n^{(1)}(k_r r) + B_n(k_z,\omega)\, H_n^{(2)}(k_r r) \right] e^{ik_z z}\, dk_z\,, \tag{2.35} \]

where

\[ k_r = \sqrt{k^2 - k_z^2}\,. \]

In (2.35), the Hankel function of the first kind Hn(1)(kr r) represents an outgoing wave, while the Hankel function of the second kind Hn(2)(kr r) stands for an incoming wave. The coefficients An(kz, ω) and Bn(kz, ω) thus determine the strengths of the diverging and converging waves, respectively. Their values depend on boundary conditions, which are commonly specified on coordinate surfaces.

Equivalently, the steady-state solution of the wave equation in cylindrical coordinates can be expressed using the Bessel functions Jn(·) and Neumann functions Nn(·) as (Williams, 1999)

\[ P(r,\phi,z,\omega) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{in\phi} \int_{-\infty}^{\infty} \left[ C_n(k_z,\omega)\, J_n(k_r r) + D_n(k_z,\omega)\, N_n(k_r r) \right] e^{ik_z z}\, dk_z\,, \tag{2.36} \]

where the relation between the Hankel functions of the first and second kind on one hand, and the Bessel and Neumann functions on the other, is given by

\[ H_n^{(1)}(x) = J_n(x) + i\,N_n(x)\,, \qquad H_n^{(2)}(x) = J_n(x) - i\,N_n(x)\,. \]

The time-domain solution p(r, φ, z, t) of the wave equation in cylindrical coordinates is obtained by applying the inverse temporal Fourier transform to P(r, φ, z, ω):

\[ p(r,\phi,z,t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} P(r,\phi,z,\omega)\, e^{i\omega t}\, d\omega\,. \tag{2.37} \]
2.5.1 Boundary value problems

There are two boundary value problems that are of importance in acoustics, and they are depicted in Figure 2.4. In the exterior boundary value problem in cylindrical coordinates, all the sources are within an infinite cylindrical boundary defined by r = a, and the volume of validity of the homogeneous wave equation is defined by r > a. On the other hand, in the interior boundary value problem, all the sources are located outside the cylindrical boundary surface defined by r = b, whose interior is the cylindrical volume of validity of the homogeneous wave equation.

Figure 2.4: Boundary value problems in cylindrical coordinates. (a) Interior boundary value problem, where all the sources si are outside the cylindrical boundary surface defined by r = b. (b) Exterior boundary value problem, where all the sources si are inside the cylindrical boundary surface defined by r = a.

For the interior boundary value problem in cylindrical coordinates, the general solution is of the standing-wave type. It is derived from (2.36), taking note of the fact that it needs to be finite at the origin. The solution has the following form:

\[ P(r,\phi,z,\omega) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{in\phi} \int_{-\infty}^{\infty} C_n(k_z,\omega)\, e^{ik_z z}\, J_n(k_r r)\, dk_z\,, \tag{2.38} \]

where the function Cn(kz, ω) is determined by the boundary condition on the surface r = b.

The general solution to the exterior boundary value problem in cylindrical coordinates must consist of outgoing waves only. It is thus derived from (2.35) by forcing the incoming-wave part to zero, and takes the form

\[ P(r,\phi,z,\omega) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{in\phi} \int_{-\infty}^{\infty} A_n(k_z,\omega)\, e^{ik_z z}\, H_n^{(1)}(k_r r)\, dk_z\,. \tag{2.39} \]

Specifying the boundary condition on the surface r = a determines the functions An(kz, ω), which allow one to obtain the radiated sound field in the region defined by r > a.
2.5.2 Helical wave spectrum

Consider the exterior boundary value problem on the infinite cylindrical boundary defined by r = a. The steady-state sound pressure field on the boundary is given by

\[ P(a,\phi,z,\omega) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{in\phi} \int_{-\infty}^{\infty} A_n(k_z,\omega)\, e^{ik_z z}\, H_n^{(1)}(k_r a)\, dk_z\,. \tag{2.40} \]

Taking the Fourier series expansion with respect to the angle φ and the Fourier transform with respect to the coordinate z of the steady-state sound pressure field P(r, φ, z, ω) yields the following Fourier series/transform pair:

\[ P_n(r,k_z,\omega) = \frac{1}{2\pi} \int_0^{2\pi} \int_{-\infty}^{\infty} P(r,\phi,z,\omega)\, e^{-in\phi}\, e^{-ik_z z}\, dz\, d\phi\,, \tag{2.41} \]

\[ P(r,\phi,z,\omega) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{in\phi} \int_{-\infty}^{\infty} P_n(r,k_z,\omega)\, e^{ik_z z}\, dk_z\,. \tag{2.42} \]

By merely comparing (2.40) and (2.42), one can see that

\[ P_n(a,k_z,\omega) = A_n(k_z,\omega)\, H_n^{(1)}(k_r a)\,. \tag{2.43} \]

The term Pn(r, kz, ω) defines the helical wave spectrum of a sound field in cylindrical coordinates.

Furthermore, expressing An(kz, ω) from (2.43) and substituting it into (2.39) leads to the following expression:

\[ P(r,\phi,z,\omega) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{in\phi} \int_{-\infty}^{\infty} P_n(a,k_z,\omega)\, \frac{H_n^{(1)}(k_r r)}{H_n^{(1)}(k_r a)}\, e^{ik_z z}\, dk_z\,. \tag{2.44} \]

Again, a simple comparison of (2.44) with (2.42) reveals the relationship between helical spectra on cylindrical surfaces with different radii:

\[ P_n(r,k_z,\omega) = \frac{H_n^{(1)}(k_r r)}{H_n^{(1)}(k_r a)}\, P_n(a,k_z,\omega)\,. \tag{2.45} \]
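Relation (2.45) is easy to exercise numerically; the sketch below uses scipy.special.hankel1 to extrapolate a single helical-spectrum coefficient from the surface r = a outward. The order, radii, and coefficient value are arbitrary example choices.

```python
import numpy as np
from scipy.special import hankel1

def extrapolate_helical(Pn_a, n, kz, k, a, r):
    """Propagate the helical wave spectrum coefficient P_n(a, kz, w)
    outward to radius r >= a using (2.45)."""
    kr = np.sqrt(complex(k**2 - kz**2))      # radial wave number of (2.35)
    return Pn_a * hankel1(n, kr * r) / hankel1(n, kr * a)

k = 2 * np.pi * 1000.0 / 343.0               # 1 kHz
print(abs(extrapolate_helical(1.0, 2, 0.0, k, a=0.1, r=0.5)))
```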
2.5.3 Rayleigh's integral

Using Euler's equation in cylindrical coordinates, and a definition of the helical wave particle velocity spectrum analogous to (2.41), one can derive the following relationship between the pressure spectrum Pn(r, kz, ω) and the radial velocity spectrum Ẇn(a, kz, ω) (Williams, 1999):

\[ P_n(r,k_z,\omega) = \frac{i\rho_0 c k}{k_r}\,\frac{H_n(k_r r)}{H_n'(k_r a)}\,\dot{W}_n(a,k_z,\omega)\,, \tag{2.46} \]

where the superscript of the Hankel function of the first kind has been dropped for notational simplicity.
Rayleigh's I integral formula is obtained by applying the inverse Fourier transform to (2.46):

\[ P(r,\phi,z,\omega) = \frac{i\rho_0 c k}{2\pi} \sum_{n=-\infty}^{\infty} e^{in\phi} \int_{-\infty}^{\infty} \dot{W}_n(a,k_z,\omega)\, \frac{H_n(k_r r)}{k_r\, H_n'(k_r a)}\, e^{ik_z z}\, dk_z\,, \tag{2.47} \]

where

\[ \dot{W}_n(a,k_z,\omega) = \frac{1}{2\pi} \int_0^{2\pi} \int_{-\infty}^{\infty} \dot{w}(a,\phi',z')\, e^{-in\phi'}\, e^{-ik_z z'}\, dz'\, d\phi'\,. \tag{2.48} \]

In order to solve for the sound pressure distribution evoked by a particle velocity distribution Ẇn(a, kz) on the cylindrical surface r = a, one would need to solve the integral in (2.47) numerically. However, there is a method to approximately determine the far-field sound pressure distribution which uses the so-called stationary phase integral approximation (Williams, 1999):

\[ P(r,\phi,\theta,\omega) \approx \frac{\rho_0 c}{\pi}\,\frac{e^{ikr}}{r} \sum_{n=-\infty}^{\infty} (-i)^n\, \frac{\dot{W}_n(a,k\cos\theta,\omega)}{\sin\theta\; H_n'(ka\sin\theta)}\, e^{in\phi}\,. \tag{2.49} \]

The far-field sound pressure decay indicated by (2.49) is not in accordance with the far-field decay rate of helical waves, given by 1/√r. However, the derivation of (2.49) hinges on the assumed smoothness of Ẇn(a, kz), which is roughly equivalent to treating the vibrating area of the cylinder as having finite extent, making the sound pressure amplitude decay of 1/r less of a surprise (Williams, 1999).
2.5.4 Piston in a cylindrical baffle

Figure 2.5: Geometry of a rectangular piston of length 2L and width 2αa on the surface of an infinite cylindrical baffle of radius a.

Since the model of a vibrating piston mounted on the surface of an acoustically rigid cylinder is used in Chapter 5, we give a short overview of how to obtain its far-field radiation pattern.

The piston is modeled as a rectangle folded along the circumference of an infinite cylindrical baffle, as shown in Figure 2.5. The cylindrical baffle has radius a, and the piston has length 2L and width 2aα. The velocity of the piston is denoted by b, while the velocity on the rest of the cylindrical surface is zero due to the baffle's rigidity.

Taking the Fourier series expansion with respect to the angle φ and the Fourier transform with respect to the spatial coordinate z of the radial velocity ẇ(a, φ, z), one obtains

\[ \dot{W}_n(a,k_z,\omega) = \frac{b}{2\pi} \int_{-\alpha}^{\alpha} e^{-in\phi}\, d\phi \int_{-L}^{L} e^{ik_z z}\, dz\,, \tag{2.50} \]

where it is arbitrarily, but without loss of generality, assumed that the piston is centered at φ = 0.

The solution of (2.50) contains a product of two sinc functions:

\[ \dot{W}_n(a,k_z,\omega) = \frac{4b\alpha L}{2\pi}\, \mathrm{sinc}(n\alpha)\, \mathrm{sinc}(k_z L)\,. \tag{2.51} \]

Substituting (2.51) into the far-field approximation of Rayleigh's I integral (2.49), and using kz = k cos θ, one obtains the radiated sound pressure in the far field (Williams, 1999):

\[ P(r,\theta,\phi,\omega) \approx \frac{\rho_0 c}{2\pi^2}\,\frac{e^{ikr}}{r} \sum_{n=-N}^{N} (-i)^n\, e^{in\phi}\, \frac{4b\alpha L\, \mathrm{sinc}(n\alpha)\, \mathrm{sinc}(k_z L)}{\sin\theta\; H_n'(ka\sin\theta)}\,. \tag{2.52} \]

A more detailed analysis of the radiation pattern in the audible frequency range is given in Chapter 5. Here we show approximations of the radiation patterns at very low and very high frequencies, which are the two extreme cases in terms of the piston's ability to reproduce sound directionally.
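The far-field pattern (2.52) can be evaluated directly with scipy's Hankel-function derivative h1vp, as in the sketch below. The piston dimensions, the truncation order N, and the evaluation frequency are assumptions made for illustration, not values from the thesis.

```python
import numpy as np
from scipy.special import h1vp

def piston_pattern(phi, theta, f, a=0.1, L=0.05, alpha=0.2, b=1.0,
                   rho0=1.2, c=343.0):
    """Far-field pattern (2.52) of a piston in a rigid cylindrical baffle,
    up to the spherical spreading factor exp(ikr)/r. np.sinc is the
    normalized sinc, hence the division by pi."""
    k = 2 * np.pi * f / c
    kz = k * np.cos(theta)
    N = int(np.ceil(k * a)) + 10                  # series truncation
    p = 0j
    for n in range(-N, N + 1):
        num = 4 * b * alpha * L * np.sinc(n * alpha / np.pi) * \
              np.sinc(kz * L / np.pi)
        den = np.sin(theta) * h1vp(n, k * a * np.sin(theta))
        p += (-1j) ** n * np.exp(1j * n * phi) * num / den
    return rho0 * c / (2 * np.pi ** 2) * p

# Horizontal-plane (theta = 90 deg) pattern samples at 2 kHz
for phi in (0.0, np.pi / 2, np.pi):
    print(phi, abs(piston_pattern(phi, np.pi / 2, 2000.0)))
```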
Low-frequency radiation pattern

At very low frequencies, the radiation pattern of a piston in a rigid cylindrical baffle can be analyzed using the small-argument behavior of Hn′(x), given by (Williams, 1999)

\[ H_n'(x) \sim \frac{i\,n!}{\pi\,\epsilon_n}\left(\frac{2}{x}\right)^{n+1}, \qquad \epsilon_n = \begin{cases} 1\,, & n = 0\\ 2\,, & n \geq 1 \end{cases}. \]

It turns out that when ka → 0, the term for n = 0 dominates the sum in (2.52), giving an approximate low-frequency far-field radiation pattern (Williams, 1999)

\[ P(r,\theta,\phi,\omega) \approx -\frac{i\rho_0 c k}{4\pi}\, Q\, \frac{e^{ikr}}{r}\,, \tag{2.53} \]

where Q = 4αaLb defines the volume flow.

From (2.53), it is apparent that at low frequencies, the radiation pattern of a vibrating piston in a rigid cylindrical baffle resembles that of a point source.
High-frequency radiation pattern

The high-frequency radiation pattern of a piston in a cylindrical baffle can be analyzed by considering the large-argument behavior of Hn′(ka sin θ), which has the form (Williams, 1999)

\[ H_n'(ka\sin\theta) \sim (-i)^n \sqrt{\frac{2}{\pi k a \sin\theta}}\; e^{i\pi/4}\, e^{ika\sin\theta}\,. \tag{2.54} \]

Using the previous approximation in (2.52) and marginalizing the effect of the term sinc(nα), one gets the following approximation:

\[ P(r,\theta,\phi,\omega) \sim \frac{e^{ikr}}{r} \sqrt{\frac{\pi k a}{2\sin\theta}}\; e^{-ika\sin\theta}\, \mathrm{sinc}(kL\cos\theta) \sum_{n=-\infty}^{\infty} e^{in\phi}\,. \tag{2.55} \]

The last summation in (2.55) can be represented as the Fourier series expansion of a stream of Diracs:

\[ \sum_{n=-\infty}^{\infty} e^{in\phi} = 2\pi \sum_{m=-\infty}^{\infty} \delta(\phi - 2\pi m)\,. \]

Thus, the high-frequency radiation pattern of a piston in a cylindrical baffle collapses to a Dirac function pointing in the azimuthal direction defined by the piston's center, indicating single-direction radiation at high frequencies.
2.5.5 Scattering from rigid cylinders

The interaction of radiated sound with objects in the acoustic medium is characterized by the changes to the sound field caused by the object, usually modeled through the so-called scattered sound field. Analyzing sound scattering is of high importance in underwater acoustics, but here it is analyzed for its importance in the microphone array design problems described in Chapter 4.

The usual way of analyzing scattering by an object is by considering a sound field consisting of a single plane wave. To simplify the analysis a little, let the plane wave with unit magnitude arrive from the angle φ = 0, with wave fronts parallel to the z axis. Due to the particular value of the wave vector k, with kz = 0, kx = k, and ky = 0, the incoming wave field has the form

\[ P_i(r,\phi,z,\omega) = e^{ikr\cos\phi}\,. \tag{2.56} \]

The incoming wave field admits the Jacobi-Anger expansion given by (Abramowitz and Stegun, 1976)

\[ P_i(r,\phi,z,\omega) = \sum_{n=-\infty}^{\infty} i^n J_n(kr)\, e^{in\phi}\,. \tag{2.57} \]

The sound field Pt(r, φ, z, ω) that results from the scattering of the incoming wave field from a rigid cylinder can be represented as a sum of the incoming sound field Pi(r, φ, z, ω) and the scattered sound field Ps(r, φ, z, ω):

\[ P_t(r,\phi,z,\omega) = P_i(r,\phi,z,\omega) + P_s(r,\phi,z,\omega)\,. \tag{2.58} \]

For an infinite rigid cylindrical scatterer of radius a, the radial component of the particle velocity vector needs to vanish on the cylindrical surface r = a, i.e.,

\[ \dot{W}_i(a,\phi,z,\omega) + \dot{W}_s(a,\phi,z,\omega) = 0\,. \tag{2.59} \]

Equivalently, using Euler's equation (2.3), the boundary condition (2.59) can be expressed as

\[ \frac{\partial}{\partial r}\left( P_i(r,\phi,z,\omega) + P_s(r,\phi,z,\omega) \right)\Big|_{r=a} = 0\,. \tag{2.60} \]

The scattered sound field Ps(r, φ, z, ω) can be modeled as a superposition of outgoing waves only, and can thus be represented using (2.39). After substituting in (2.60) the two models (2.57) and (2.39) for Pi(r, φ, z, ω) and Ps(r, φ, z, ω), and solving, one obtains the following expression describing the total sound field:

\[ P_t(r,\phi,z,\omega) = \sum_{n=-\infty}^{\infty} i^n \left( J_n(kr) - \frac{J_n'(ka)}{H_n'(ka)}\, H_n(kr) \right) e^{in\phi}\,. \tag{2.61} \]
Spherical waves
Figure 2.6: Relations between spherical and Cartesian coordinates.
Analyzing acoustic problems in the spherical coordinate system is often advantageous,
as many finite-size sound radiators or scatterers can be well modeled with spheres.
Also, it can often be more practical to build sound measurement and reproduction
systems of spherical or approximately spherical shape.
The relationship between spherical and Cartesian coordinates, illustrated in Figure 2.6, is given by
$$x = r \sin\theta \cos\phi\,, \quad y = r \sin\theta \sin\phi\,, \quad z = r \cos\theta\,.$$
The general forms of Euler's and Helmholtz equations are given by (2.3) and (2.5),
respectively. As before, their coordinate-system-dependence is related to the gradient
and Laplace operators, respectively, whose forms in spherical coordinates are given
by (Morse and Ingard, 1968)
$$\nabla = \frac{\partial}{\partial r}\, e_r + \frac{1}{r} \frac{\partial}{\partial \theta}\, e_\theta + \frac{1}{r \sin\theta} \frac{\partial}{\partial \phi}\, e_\phi\,, \qquad (2.62)$$
$$\nabla^2 = \frac{1}{r^2} \frac{\partial}{\partial r} \left( r^2 \frac{\partial}{\partial r} \right) + \frac{1}{r^2 \sin\theta} \frac{\partial}{\partial \theta} \left( \sin\theta\, \frac{\partial}{\partial \theta} \right) + \frac{1}{r^2 \sin^2\theta} \frac{\partial^2}{\partial \phi^2}\,, \qquad (2.63)$$
with er , eφ , and eθ being the unit vectors which point in the positive coordinate
directions.
The general steady-state solution of the wave equation is obtained through separation of variables, and has the form (Williams, 1999)
$$P(r,\theta,\phi,\omega) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \left( A_{mn}(\omega)\, h_n^{(1)}(kr) + B_{mn}(\omega)\, h_n^{(2)}(kr) \right) Y_n^m(\theta,\phi)\,, \qquad (2.64)$$
where $h_n^{(1)}(\cdot)$ and $h_n^{(2)}(\cdot)$ are the spherical Hankel functions of the first and second kind, respectively. The former corresponds to the outgoing, while the latter to the incoming spherical waves relative to the origin. The angular functions $Y_n^m(\theta,\phi)$ are
called the spherical harmonics, and are defined as
$$Y_n^m(\theta,\phi) \triangleq \sqrt{\frac{(2n+1)}{4\pi} \frac{(n-m)!}{(n+m)!}}\; P_n^m(\cos\theta)\, e^{im\phi}\,, \qquad (2.65)$$
where Pnm (cos θ) denote the associated Legendre functions (see Arfken et al., 1985).
Equivalently, one can express the standing-wave-type general steady-state solution of the wave equation in terms of the spherical Bessel and spherical Neumann
functions, jn (· ) and yn (· ), respectively. The solution of the standing-wave type has
the form (Williams, 1999)
$$P(r,\theta,\phi,\omega) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \left( C_{mn}(\omega)\, j_n(kr) + D_{mn}(\omega)\, y_n(kr) \right) Y_n^m(\theta,\phi)\,. \qquad (2.66)$$
The expansion parameters Amn (ω), Bmn (ω), Cmn (ω), Dmn (ω), which in both
solution types depend on the specified boundary conditions, satisfy the following
relationship:
$$C_{mn}(\omega) = \tfrac{1}{2} \left( A_{mn}(\omega) + B_{mn}(\omega) \right)\,, \qquad D_{mn}(\omega) = \tfrac{i}{2} \left( A_{mn}(\omega) - B_{mn}(\omega) \right)\,.$$

2.6.1 Boundary value problems
The exterior boundary value problem is closely related to radiation from compact
bodies. It gives a solution to the sound pressure field from a source (or a number of
sources) enclosed by the sphere of a given radius a that is centered at the origin, as
shown in Figure 2.7(b). The solution of the exterior boundary value problem is of the
traveling-wave-type, given by (2.64), where the incoming wave part vanishes:
$$P(r,\theta,\phi,\omega) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} A_{mn}(\omega)\, h_n^{(1)}(kr)\, Y_n^m(\theta,\phi)\,. \qquad (2.67)$$
Figure 2.7: Boundary value problems in spherical coordinates. (a) Interior boundary
value problem, where all the sources si are outside the spherical boundary surface
defined by r = b. (b) Exterior boundary value problem, where all the sources si are
inside the spherical boundary surface defined by r = a.
It should be noted that the solution (2.67) is valid only for r ≥ a.
One important property of the sound field representation given by (2.67) is the
fact that knowing the coefficients Amn (ω) on a separable surface, such as the sphere
of radius a, gives the full description of the sound field outside the sphere of radius
r = a. For the sphere of radius r = a, the coefficients Amn (ω) can be determined
from the knowledge of the sound pressure field P (a, θ, φ, ω) on the sphere. Using the
orthonormality of the spherical harmonics, they are obtained through
$$A_{mn}(\omega) = \frac{1}{h_n^{(1)}(ka)} \int_0^{2\pi} \int_0^{\pi} P(a,\theta,\phi,\omega)\, Y_n^m(\theta,\phi)^*\, \sin\theta\, d\theta\, d\phi\,. \qquad (2.68)$$
In the interior boundary value problem, one is interested in the sound field formed
by sound sources outside the sphere of radius b, as illustrated in Figure 2.7(a). The
solution of the interior boundary value problem is of the standing-wave type, given
by (2.66). The solution needs to be finite at the origin, so only the terms containing
the spherical Bessel function jn (· ) can be non-zero. Thus, the solution is of the form
$$P(r,\theta,\phi,\omega) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} C_{mn}(\omega)\, j_n(kr)\, Y_n^m(\theta,\phi)\,. \qquad (2.69)$$
2.6.2 Spherical wave spectrum

Sound pressure spherical wave spectrum
The spherical wave spectrum of the sound pressure denotes the spherical harmonics expansion of a sound field on a sphere of a given radius r. It is given by
$$P_{mn}(r,\omega) = \int_0^{2\pi} \int_0^{\pi} P(r,\theta,\phi,\omega)\, Y_n^m(\theta,\phi)^*\, \sin\theta\, d\theta\, d\phi\,. \qquad (2.70)$$
The inverse spherical harmonic transform, which gives the sound pressure field
from the sound pressure spherical wave spectrum, is given by
$$P(r,\theta,\phi,\omega) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} P_{mn}(r,\omega)\, Y_n^m(\theta,\phi)\,. \qquad (2.71)$$
Also, the spherical wave spectra at two radii, r0 and r, are related by
$$P_{mn}(r,\omega) = \frac{h_n^{(1)}(kr)}{h_n^{(1)}(kr_0)}\, P_{mn}(r_0,\omega)\,. \qquad (2.72)$$
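As an illustration, the spectral analysis (2.70) and the propagation step (2.72) translate into a few lines of numerical code. The following is a minimal Python sketch (not part of the original text), assuming a Gauss-Legendre quadrature grid on the sphere and SciPy's spherical Bessel functions; it verifies the propagation on a monopole field, whose spectrum contains only the $(m,n) = (0,0)$ term.

```python
import numpy as np
from scipy.special import sph_harm, spherical_jn, spherical_yn

def h1(n, x):
    # Spherical Hankel function of the first kind, h_n^(1) = j_n + i*y_n.
    return spherical_jn(n, x) + 1j * spherical_yn(n, x)

def spectrum(p, theta, phi, w, m, n):
    # Quadrature version of (2.70); scipy's sph_harm takes the azimuth
    # angle first and the polar angle second.
    return np.sum(w * p * np.conj(sph_harm(m, n, phi, theta)))

# Gauss-Legendre grid in cos(theta), uniform grid in phi (assumed sampling).
Nt, Np = 24, 48
x, wx = np.polynomial.legendre.leggauss(Nt)
th, ph = np.meshgrid(np.arccos(x), 2 * np.pi * np.arange(Np) / Np, indexing="ij")
w = np.outer(wx, np.full(Np, 2 * np.pi / Np))

# Test field on the sphere r = r0: an outgoing monopole e^{ikr}/r at the origin.
k, r0, r = 2.0, 1.0, 3.0
p0 = np.full(th.size, np.exp(1j * k * r0) / r0)

P00 = spectrum(p0, th.ravel(), ph.ravel(), w.ravel(), 0, 0)
P00_r = h1(0, k * r) / h1(0, k * r0) * P00        # propagation step (2.72)
print(P00_r, np.sqrt(4 * np.pi) * np.exp(1j * k * r) / r)   # the two agree
```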
Velocity spherical wave spectrum
One can define the spherical wave spectrum of the particle velocity vector in the
same way the sound pressure spherical wave spectrum was defined in (2.70). The
relationship between the spherical wave spectra of the sound pressure P (r, θ, φ, ω) and
the radial component of the particle velocity vector Ẇ (r, θ, φ, ω) is given by (Williams,
1999)
$$\dot{W}_{mn}(r,\omega) = \frac{1}{i \rho_0 c}\, \frac{h_n'(kr)}{h_n(kr_0)}\, P_{mn}(r_0,\omega)\,. \qquad (2.73)$$
In (2.73), $h_n(\cdot)$ was used instead of $h_n^{(1)}(\cdot)$ for notational simplicity.
Axisymmetric source
An axisymmetric source on a sphere is a useful concept that is later used for analyzing the radiation pattern of a vibrating piston mounted on a spherical body. The
axisymmetric source is modeled with an azimuth-independent surface velocity
Ẇ (a, θ, φ, ω) = Ẇ (a, θ, ω) .
(2.74)
In terms of the spherical wave spectrum, this makes the coefficients Amn (ω) in
(2.67) vanish for $m \neq 0$. Using the definition of the spherical harmonics given by
(2.65), and the relationship between the spherical wave spectra of sound pressure and
velocity, given by (2.73), the radiated sound pressure can be expressed by
$$P(r,\theta,\omega) = i \rho_0 c \sum_{n=0}^{\infty} \frac{2n+1}{2}\, \frac{h_n(kr)}{h_n'(ka)}\, P_n(\cos\theta) \int_0^{\pi} \dot{W}(a,\theta',\omega)\, P_n(\cos\theta')\, \sin\theta'\, d\theta'\,. \qquad (2.75)$$
Extracting the part which corresponds to the expansion of the surface velocity $\dot{W}(a,\theta,\omega)$ in the Legendre polynomial basis, given by
$$\dot{W}_n(a,\omega) = \frac{2n+1}{2} \int_0^{\pi} \dot{W}(a,\theta',\omega)\, P_n(\cos\theta')\, \sin\theta'\, d\theta'\,, \qquad (2.76)$$
and using the orthogonality of Legendre polynomials, one obtains
$$P(r,\theta,\omega) = i \rho_0 c \sum_{n=0}^{\infty} \dot{W}_n(a,\omega)\, \frac{h_n(kr)}{h_n'(ka)}\, P_n(\cos\theta)\,. \qquad (2.77)$$
Circular piston in a spherical baffle
Figure 2.8: Geometry of a circular piston of radius $r_p = a\alpha$ on the surface of a spherical baffle of radius a.
A circular piston mounted on the pole of a rigid sphere, shown in Figure 2.8, is
an example of an axisymmetric source that can model a compact piston loudspeaker.
Denoting by b the velocity of the piston, and noting that the baffle is acoustically
rigid, the velocity distribution on the surface of the sphere of radius r = a is given by
$$\dot{W}(a,\theta,\omega) = \begin{cases} b & 0 \leq \theta \leq \alpha \\ 0 & \alpha < \theta \leq \pi \end{cases}\,. \qquad (2.78)$$
Computing the Legendre polynomial expansion of Ẇ (a, θ, ω) using (2.76) and the
recurrence formula for the Legendre polynomials (Arfken et al., 1985)
$$(2n+1)\, P_n(x) = \frac{dP_{n+1}}{dx} - \frac{dP_{n-1}}{dx}\,, \qquad (2.79)$$
and substituting it into (2.77), results in the radiated pressure field (Williams, 1999)
$$P(r,\theta,\omega) = \frac{i \rho_0 c\, b}{2} \sum_{n=0}^{\infty} \left[ P_{n-1}(\cos\alpha) - P_{n+1}(\cos\alpha) \right] \frac{h_n(kr)}{h_n'(ka)}\, P_n(\cos\theta)\,. \qquad (2.80)$$
2.6.3 Scattering from rigid spheres
The procedure used for obtaining the sound field formed by scattering from an infinite
rigid cylinder can be followed for the case of a rigid spherical scatterer.
Let the plane wave arrive from the direction defined by angles (ϑ, ϕ). The incoming
wave field admits the spherical harmonics expansion of the form (Williams, 1999)
$$P_i(r,\theta,\phi,\omega) = 4\pi \sum_{n=0}^{\infty} i^n j_n(kr) \sum_{m=-n}^{n} Y_n^m(\theta,\phi)\, Y_n^m(\vartheta,\varphi)^*\,. \qquad (2.81)$$
The total sound field Pt (r, θ, φ, ω) is a sum of the incoming sound field Pi (r, θ, φ, ω),
and the scattered sound field Ps (r, θ, φ, ω):
Pt (r, θ, φ, ω) = Pi (r, θ, φ, ω) + Ps (r, θ, φ, ω) .
(2.82)
Let a be the radius of the spherical scatterer. Due to the nature of the boundary,
the radial component of the particle velocity vector needs to vanish on the spherical
surface r = a, i.e.,
Ẇi (a, θ, φ, ω) + Ẇs (a, θ, φ, ω) = 0 .
(2.83)
Equivalently, using Euler’s equation (2.3), the boundary condition (2.83) can be
expressed as
$$\frac{\partial}{\partial r} \left( P_i(r,\theta,\phi,\omega) + P_s(r,\theta,\phi,\omega) \right) \Big|_{r=a} = 0\,. \qquad (2.84)$$
Due to its nature, the scattered sound field Ps (r, θ, φ, ω) can be modeled as a
superposition of outgoing waves only. It can thus be represented using (2.67). After
substituting the two models for Pi (r, θ, φ, ω) and Ps (r, θ, φ, ω), (2.81) and (2.67), and
solving (2.84), one obtains the following expression describing the total sound field:
$$P_t(r,\theta,\phi,\omega) = 4\pi \sum_{n=0}^{\infty} i^n \left( j_n(kr) - \frac{j_n'(ka)}{h_n'(ka)}\, h_n(kr) \right) \sum_{m=-n}^{n} Y_n^m(\theta,\phi)\, Y_n^m(\vartheta,\varphi)^*\,. \qquad (2.85)$$
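As a numerical illustration (not in the original text), the sketch below evaluates (2.85) for the special case of a plane wave arriving from $\vartheta = 0$, where the spherical harmonics addition theorem collapses the inner sum to $(2n+1)/(4\pi)\, P_n(\cos\theta)$; the infinite series is truncated at an assumed order nmax.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, eval_legendre

def h1(n, x, derivative=False):
    # Spherical Hankel function of the first kind (and its derivative).
    return (spherical_jn(n, x, derivative=derivative)
            + 1j * spherical_yn(n, x, derivative=derivative))

def total_field(k, a, r, theta, nmax=40):
    # (2.85) specialized to incidence from theta_in = 0:
    # P_t = sum_n (2n+1) i^n [j_n(kr) - j_n'(ka)/h_n'(ka) h_n(kr)] P_n(cos th).
    n = np.arange(nmax + 1)
    radial = (spherical_jn(n, k * r)
              - spherical_jn(n, k * a, derivative=True)
              / h1(n, k * a, derivative=True) * h1(n, k * r))
    return np.sum((2 * n + 1) * 1j ** n * radial * eval_legendre(n, np.cos(theta)))

# Pressure magnitude on the sphere's surface at the pole facing the incidence.
print(abs(total_field(k=20.0, a=0.05, r=0.05, theta=0.0)))
```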
2.7 Room acoustics
Sound propagation in rooms is a result of the interaction between the active sound
sources, room geometry, and the properties of walls, floor, ceilings, and any other
objects that occupy the room. In general, this interaction is difficult to model precisely, and even more difficult to solve analytically. However, there are models of
sound propagation in rooms which possess acceptable accuracy in targeted frequency
ranges.
In general, one can roughly identify three frequency ranges with different characteristic behavior of room acoustics:
• At low frequencies, which spread roughly up to the Schroeder frequency fS
(defined later), the sound propagation in rooms is best described by the wave
theory of room acoustics.
• At medium frequencies, above the Schroeder frequency fS and up to the high
frequencies where room dimensions are much larger than the sound wavelength
(roughly 4fS ), the statistical model of room acoustics is commonly used.
• At very high frequencies, where sound wavelengths are vanishingly small compared to the room dimensions, the model of ray acoustics or geometrical acoustics is the most appropriate.
2.7.1 Wave theory of room acoustics
Sound propagation in rooms is characterized by standing wave sound motion. The
analysis of a sound field in a room in terms of the room’s normal modes is most
appropriate when the sound wavelengths are of the order of magnitude of the room’s
dimensions.
The homogeneous Helmholtz equation
The starting point when analyzing the modal behavior of a room is the homogeneous
Helmholtz equation, given by
$$\nabla^2 P_\omega(r) + k^2 P_\omega(r) = 0\,, \qquad (2.86)$$
with $k = \omega/c$.
Additionally, the solution of the acoustic wave equation in a room needs to satisfy
the boundary conditions on the wall surfaces. These are characterized by the wall
impedance, defined by
$$Z(r_s,\omega) = \frac{P(r_s,\omega)}{\dot{W}_n(r_s,\omega)}\,, \qquad (2.87)$$
where $\dot{W}_n(r_s,\omega)$ is the outward-pointing normal component of the particle velocity vector at the surface point $r_s$.
Using the definition of the wall impedance, given by (2.87), and Euler’s equation
(2.3), the boundary condition can be expressed as
$$Z(r_s,\omega)\, \frac{\partial}{\partial n(r_s)} P(r_s,\omega) + i \omega \rho_0\, P(r_s,\omega) = 0\,, \qquad (2.88)$$
where $\frac{\partial}{\partial n(r_s)} P(r_s,\omega)$ is the directional derivative of $P(r_s,\omega)$ along the outward-pointing surface normal at the point $r_s$.
Green’s function of a room
It has been shown (e.g., see Morse and Ingard, 1968) that satisfying both the homogeneous Helmholtz equation (2.86) and the wall boundary conditions (2.88) is possible
only for a discrete set of values of k, denoted by Kn and called the room eigenvalues. Each eigenvalue Kn yields a solution Ψn (r, ω) called the room eigenfunction or
normal mode. These eigenfunctions are orthogonal, and they satisfy
$$\iiint_V \Psi_m(r,\omega)^*\, \Psi_n(r,\omega)\, dV = \begin{cases} V \Lambda_n & \text{for } n = m \\ 0 & \text{for } n \neq m \end{cases}\,, \qquad (2.89)$$
where the integration is done over the entire volume V of the room.
By knowing the eigenfunctions of a room, one can express Green’s function of a
room, which is the solution of the inhomogeneous Helmholtz equation (2.7), as a series
of normal modes
$$G_\omega(r|r') = \sum_n \frac{\Psi_n(r,\omega)\, \Psi_n^*(r',\omega)}{V \Lambda_n \left( K_n^2 - k^2 \right)}\,. \qquad (2.90)$$
In general, the eigenvalues Kn of a room are complex quantities, which can be
expressed in the form
$$K_n = \frac{\omega_n}{c} + i\, \frac{\delta_n}{c}\,, \qquad (2.91)$$
where $\delta_n$ represents a damping constant. Replacing (2.91) into (2.90) and assuming $\delta_n \ll \omega_n$, one obtains (Morse and Ingard, 1968; Kuttruff, 2000)
$$G_\omega(r|r') = c^2 \sum_n \frac{\Psi_n(r,\omega)\, \Psi_n^*(r',\omega)}{V \Lambda_n \left( \omega^2 - \omega_n^2 - 2i \delta_n \omega_n \right)}\,. \qquad (2.92)$$
From (2.90), one can see that as the frequency $f = \frac{\omega}{2\pi}$ approaches the nth resonant frequency $f_n = \frac{\omega_n}{2\pi}$, the sound field in a room is dominated by the nth resonant mode, whose amplitude is inversely proportional to the damping constant $\delta_n$.
Room impulse response
The time-dependent Green's function of a room $g(r,t|r',t')$, also known as the room impulse response (RIR) or acoustic transfer function from $r'$ to $r$, is obtained by applying the inverse Fourier transform to (2.92).

The RIR is often expressed more succinctly as $g_{r|r'}(t)$, where the initial time $t'$ implicitly takes the value of zero. More often than not, the locations of the source and destination, $r'$ and $r$, respectively, are known from the context, and the RIR is denoted simply by $g(t)$.
The RIR between source and destination points can be viewed as an acoustic channel. If a point source located at $r'$ emits the signal $s(t)$, the signal observed at the point $r$ is given by
$$s_r(t) = \int_{-\infty}^{\infty} s(\tau)\, g(t-\tau)\, d\tau\,, \qquad (2.93)$$
which follows from (2.15).
Normal modes in a perfectly rigid rectangular room
The preceding analysis of the sound motion and the general solution of the wave
equation in rooms give an insight into the resonant, standing-wave nature of the
sound in rooms. Any additional details of the room acoustics might be blurred by the difficulty of precisely modeling or measuring the reflective properties of the walls and the room geometry, which are needed in order to obtain its exact eigenfunction-based Green's function.
Additional knowledge about wave phenomena in rooms can be obtained by analyzing a simple, rectangular room model with perfectly rigid walls; this particular case provides a closed-form solution. Even though rooms in practice are not perfectly rectangular, and even less so with perfectly reflecting walls, the properties of rooms met in practice follow the same trends.

Figure 2.9: Rectangular room of dimensions $(L_x, L_y, L_z)$.
The model of a rectangular room of size (Lx , Ly , Lz ) is shown in Figure 2.9. The
eigenfunctions that satisfy both the homogeneous Helmholtz equation (2.86) and rigid
wall boundary conditions
$$\frac{\partial}{\partial n(r_s)} P(r_s,\omega) = 0 \qquad (2.94)$$
are obtained by separation of variables and have the form
$$\Psi_{\mathbf{m}}(r,\omega) = \cos(k_{x,m_x} x)\, \cos(k_{y,m_y} y)\, \cos(k_{z,m_z} z)\,. \qquad (2.95)$$
The three-dimensional index m = (mx , my , mz ) has integer components, and the
eigenvalues Km of the wave equation have a three-dimensional form
$$K_{\mathbf{m}} = \left[ k_{x,m_x}\ \ k_{y,m_y}\ \ k_{z,m_z} \right]^T = \pi \left[ \frac{m_x}{L_x}\ \ \frac{m_y}{L_y}\ \ \frac{m_z}{L_z} \right]^T\,. \qquad (2.96)$$
The eigenfrequencies fm of the room are real-valued, and are related to the room
eigenvalues Km by
$$f_{\mathbf{m}} = \frac{c}{2\pi} \left\| K_{\mathbf{m}} \right\| = \frac{c}{2} \sqrt{ \left( \frac{m_x}{L_x} \right)^2 + \left( \frac{m_y}{L_y} \right)^2 + \left( \frac{m_z}{L_z} \right)^2 }\,. \qquad (2.97)$$
Room modal density
One parameter of particular interest for the analysis of the steady-state solution of
the wave equation in rooms is the number of room modes as a function of frequency
Nm (f ). This quantity is determined by noting that the room eigenfrequencies, given
by (2.97), form a 3D lattice generated by the basis
$$F = \left[ \frac{c}{2L_x}\ \ \frac{c}{2L_y}\ \ \frac{c}{2L_z} \right]^T\,. \qquad (2.98)$$
The number of positive eigenfrequencies below the frequency f equals the number
of grid points in the first octant which are inside the sphere of radius f centered at
the origin. With a small error, due to eigenfrequencies on coordinate surfaces and axes not being accounted for correctly, this number can be approximated with
$$N_m(f) \approx \frac{4\pi}{3}\, V\, \frac{f^3}{c^3}\,. \qquad (2.99)$$
Additionally, it is interesting to know the modal density on the frequency axis,
denoting the number of modes per unit frequency. The modal density is obtained by
differentiating (2.99), and it is given by
$$\frac{dN_m(f)}{df} = 4\pi\, V\, \frac{f^2}{c^3}\,. \qquad (2.100)$$
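The counting argument above is easy to check numerically. The following minimal sketch (with assumed room dimensions and speed of sound) enumerates the eigenfrequencies (2.97) over the integer lattice and compares their count below f with the approximation (2.99):

```python
import numpy as np

c = 343.0                       # assumed speed of sound [m/s]
Lx, Ly, Lz = 6.0, 5.0, 2.5      # assumed room dimensions [m]
f = 300.0                       # counting limit [Hz]

# Enumerate the lattice of eigenfrequencies (2.97) up to the limit f.
ranges = [np.arange(int(2 * L * f / c) + 2) for L in (Lx, Ly, Lz)]
mx, my, mz = np.meshgrid(*ranges, indexing="ij")
fm = (c / 2) * np.sqrt((mx / Lx) ** 2 + (my / Ly) ** 2 + (mz / Lz) ** 2)
exact = np.count_nonzero((fm > 0) & (fm <= f))

# Volume-term approximation (2.99); it undercounts slightly because modes on
# the coordinate planes and axes of the octant are not correctly weighted.
approx = 4 * np.pi / 3 * (Lx * Ly * Lz) * (f / c) ** 3
print(exact, round(approx))
```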
Steady-state room acoustics
So far, it was shown that the steady-state solution of the wave equation in a room is a
superposition of infinitely many room modes. Each mode Ψn (r, ω) becomes dominant
as the frequency ω approaches the real part ωn of the corresponding room eigenvalue
Kn , pushing the room frequency response to a local maximum. Under the assumption
of a small damping constant $\delta_n \ll \omega_n$, each modal term in (2.92) can be approximated by a transfer function of a resonant system, yielding (Kuttruff, 2000)
$$G_\omega(r|r') = c^2 \sum_n \frac{\Psi_n(r,\omega)\, \Psi_n^*(r',\omega)}{V \Lambda_n \left( \omega^2 - \omega_n^2 - 2i \delta_n \omega \right)}\,. \qquad (2.101)$$
In words, the steady-state solution to the room wave equation can be viewed as a
combination of resonances, each resonance having a 3 dB half-width5 of (Kuttruff,
2000)
$$\Delta f_n = \frac{\delta_n}{\pi}\,. \qquad (2.102)$$
The damping constants $\delta_n$ in most rooms are in the range $[1, 20]\ \mathrm{s}^{-1}$. Hence, the half-widths of the room resonances are of the order of 1 Hz.
If one contrasts the half-width of resonances with the modal density given by
(2.100), or its inverse, representing the average spacing between two resonances, the
following can be observed:
• At low frequencies, resonances are well separated, with distances between successive resonant frequencies exceeding their half-bandwidth.
• As the frequency increases, the half-width of single room resonances becomes
larger than the average distance between successive resonances, causing resonances to overlap. The resulting steady-state acoustic behavior of a room is
characterized by a strong interaction of multiple, densely-spaced resonances.
5 As the name suggests, the 3 dB half-width of a resonance is a half of the width of the frequency
range where its main lobe is above 50% of its maximum power.
In order to distinguish between the two frequency regions (the low-frequency region with clear separation of room resonances and the high-frequency region characterized by the interaction of densely-spaced room resonances), Schroeder defined a
limiting frequency fs , later named the Schroeder frequency, as the frequency where
on average three modal frequencies fall within a single resonance half-width. The
Schroeder frequency is given by (Kuttruff, 2000)
$$f_S \approx \frac{5500}{\sqrt{V \langle \delta_n \rangle}}\ \mathrm{Hz} \approx 2000 \sqrt{\frac{RT_{60}}{V}}\ \mathrm{Hz}\,, \qquad (2.103)$$
where $\langle \delta_n \rangle$ is the average value of the damping constant $\delta_n$, $V$ is the room's volume, and $RT_{60} = \frac{6.91}{\langle \delta_n \rangle}$ is the so-called reverberation time of the room.
In large halls, the Schroeder frequency is usually below 50 Hz (Kuttruff, 2000),
rendering the wave theory approach practically useless. On the other hand, in a
small living room of dimensions $6 \times 5 \times 2.5$ m and reverberation time $RT_{60} = 0.5$ s, the Schroeder frequency is around 163 Hz. According to (2.99), the same room has around 32 resonant frequencies below the Schroeder frequency, which dominate its low-frequency behavior.
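These figures are easily reproduced; a quick check of (2.103) and (2.99) for the living-room example (assuming c = 343 m/s):

```python
import numpy as np

V, RT60, c = 6 * 5 * 2.5, 0.5, 343.0       # living-room example from the text
fS = 2000 * np.sqrt(RT60 / V)              # Schroeder frequency (2.103)
N = 4 * np.pi / 3 * V * (fS / c) ** 3      # mode count below fS, from (2.99)
print(round(fS), round(N))                 # ~163 Hz and a few tens of modes
```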
2.7.2 Statistical room acoustics
At any frequency higher than the Schroeder frequency, the sound pressure field in a
room is not affected by one dominant room mode; instead, it is influenced by many
overlapping modal resonances.
Since every mode of a room results from a different interaction of sound waves with the room, it can be assumed that the modes are independent, with random phases, at any frequency. Consequently, the sound pressure field at any point in the room has real and imaginary parts that can be modeled as centered Gaussian random variables, and a magnitude which follows a Rayleigh distribution. Furthermore, the
frequency characteristic of the sound field Pr (ω) at any point r in a room can be
modeled as a Gaussian random process for ω ≥ 2πfS . The same can be said for the
spatial dependence of the sound pressure field Pω (r) in the said frequency range.
If in addition to modal overlap, one assumes that no active sound sources or room
boundaries are close to points of observation and the energy density flow does not
exhibit a preferred direction, then one talks about a diffuse sound field. In a diffuse
sound field, the sound propagation at any point is purely isotropic.
Spatial coherence function
One characteristic of a diffuse complex sound field is its spatial coherence function
Φω (r), which represents the correlation coefficient of the sound pressure between two
points separated by the vector r. The spatial coherence function is given by (Cook
et al., 1955)
$$\Phi_\omega(r) = \frac{E\left[ P(r_0,\omega)\, P^*(r_0+r,\omega) \right]}{E\left[ |P(r_0,\omega)|^2 \right]} = \frac{\sin(k \|r\|)}{k \|r\|}\,, \qquad (2.104)$$
where E[· ] denotes the expectation operator.
Considering the spatial coherence function of a diffuse sound field, one can see that its absolute value drops off as a function of the distance $\|r\|$. Being Gaussian, the real and imaginary parts of the sound field in two points at a given frequency $\omega$ become roughly independent when the value of the absolute spatial coherence drops below 0.2. The corresponding critical distance, called the reach of causality, is given by (Kuttruff, 2000)
$$r_{\mathrm{corr}} \approx \frac{\lambda}{\pi}\,. \qquad (2.105)$$
Furthermore, one can analyze the magnitude of the sound pressure field for a
given frequency ω as a function of the spatial location, Pω (r), and observe a random
pattern characterized by an interchange of maxima and minima. The average distance
of adjacent maxima of a diffuse sound field is given by (Kuttruff, 2000)
$$\langle r_{\max} \rangle \approx 0.79\, \lambda\,. \qquad (2.106)$$
Frequency coherence function
Similarly to the spatial coherence function of a diffuse sound field, one can define its
frequency coherence function Φr (ω) as the correlation coefficient of the sound pressure
in a point r between two frequencies separated by ∆ω (Kuttruff, 2000):
$$\Phi_r(\Delta\omega) = \frac{E\left[ P(r,\omega)\, P^*(r,\omega+\Delta\omega) \right]}{E\left[ |P(r,\omega)|^2 \right]} = \frac{\langle \delta_n \rangle^2}{\langle \delta_n \rangle^2 + (\pi \Delta\omega)^2}\,, \qquad (2.107)$$
where $\langle \delta_n \rangle$ is the average damping constant. It should be noted that the frequency coherence function $\Phi_r(\Delta\omega)$ is independent of location, as long as the sound field is diffuse.
From the definition of the frequency coherence function, one can derive the critical
frequency shift, defined as the frequency difference ∆fcorr for which the sound pressure
fields at frequencies $f$ and $f + \Delta f_{\mathrm{corr}}$ become approximately independent. The critical
frequency shift is given by (Kuttruff, 2000)
$$\Delta f_{\mathrm{corr}} \approx 0.64\, \langle \delta_n \rangle\,. \qquad (2.108)$$

2.7.3 Geometrical acoustics
It was already mentioned that above the Schroeder frequency, defined by (2.103), the density of modes becomes very high, so that the modal description of room acoustics becomes of little use. At high frequencies, where the acoustic wavelength becomes negligible relative to the room dimensions, one can apply geometrical, or ray, acoustics, similar to the methods used in geometrical optics.
The main concept of geometrical acoustics is the abstraction of a sound ray, which represents a portion of a spherical wave that can be visualized as a beam of vanishing aperture. The intensity of sound carried by a sound ray follows the $\sim 1/r^2$ decay characteristic of a spherical wave.
Geometrical acoustics makes additional simplifications related to the interaction
of sound rays and walls, and also the mutual interaction of sound rays. In particular,
diffraction effects are neglected, and sound rays propagate in straight line segments.
Also, the interference between sound rays is not considered, making their superposition purely additive in intensity or energy sense (Kuttruff, 2000).
The concept of specular reflections, used in geometrical acoustics, can be modeled
by an abstraction of the so-called image sources. Although the concept of image
sources is of limited use in the general sense, the assumptions of geometrical acoustics,
stated above, make it sufficiently accurate in practice (Kuttruff, 2000).
Figure 2.10: Image source model in a 3D rectangular room of dimensions $(L_x, L_y, L_z)$.
Similarly to the analysis of modal density, one can analyze the temporal distribution of sound reflections modeled with image sources using the model of a rectangular
room, shown in Figure 2.9. For a point source placed at the location rs inside the
room, the image sources can be obtained by recursively mirroring the sources around
the room walls, starting from the original source position rs . The set of image source
points in space forms a regular pattern that can be described by a 3D lattice, shown
in Figure 2.10. It should be noted that each image source can account for directivity
of the original sound source, as well as the energy losses due to absorption coefficients
of the walls.
Each sound ray reaching the destination after a number of reflections can be seen
as coming from one of the image sources. It is characterized by its delay, energy,
and direction. If absorption or reflection characteristics of walls are independent of
frequency, then according to the previously described geometrical acoustics model,
the room impulse response is given by
$$g(t) = \sum_n A_n\, \delta(t - t_n)\,, \qquad (2.109)$$
where An is an amplitude factor accounting for energy losses, and tn is the delay of
the nth sound ray.
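To make the echogram model concrete, the following is a minimal Python sketch of the image-source construction for a rectangular room, under the simplifying assumptions of a uniform, frequency-independent wall reflection coefficient beta and a 1/r spherical-spreading amplitude per ray; it returns the delays tn and amplitudes An of (2.109).

```python
import numpy as np
from itertools import product

def image_source_echogram(L, src, rcv, beta=0.8, order=3, c=343.0):
    # Image positions are (1 - 2p) * src + 2 n L, with p in {0,1}^3 selecting
    # the mirroring parity and n indexing periodic room replicas; the exponent
    # |n - p| + |n| counts the wall reflections a given ray has undergone.
    L, src, rcv = (np.asarray(v, dtype=float) for v in (L, src, rcv))
    delays, amps = [], []
    for n in product(range(-order, order + 1), repeat=3):
        for p in product((0, 1), repeat=3):
            n_, p_ = np.array(n), np.array(p)
            img = (1 - 2 * p_) * src + 2 * n_ * L
            d = np.linalg.norm(img - rcv)
            delays.append(d / c)
            amps.append(beta ** np.sum(np.abs(n_ - p_) + np.abs(n_)) / d)
    return np.array(delays), np.array(amps)

t, A = image_source_echogram(L=(6.0, 5.0, 2.5), src=(1.0, 2.0, 1.2),
                             rcv=(4.0, 3.0, 1.5))
print(t.min(), A.max())   # direct-sound delay and its amplitude
```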
A temporal and energy distribution of a room impulse response is commonly
depicted using a reflection diagram or echogram, such as the one in Figure 2.11.
Figure 2.11: Reflection diagram of a room impulse response, with a distinct direct sound and two reflection types: early reflections, up to 80 ms after the direct sound, and densely grouped reflections, called the room reverberation, that follow the early reflections.
Based on a reflection diagram, one can distinguish three parts of a room impulse
response:
• Direct sound, which represents the sound reaching the observation point from the
source without going through any interaction with the room boundaries.
• Early reflections, which denote the first few, sparsely-spaced strong reflections,
usually from the closest image sources and within 80 ms after the direct sound.
The early reflections have a great influence on sound spaciousness in rooms.
• Reverberation, which denotes the late part of the room response, characterized
by a dense distribution of low-energy reflections. The late reflections determine
the subjective characteristic known as the listener envelopment (Toole, 2008).
Reflection density
The average temporal density of reflections in a room can be obtained in the same
way the frequency density of room modes was obtained. By using the image source
model, shown in Figure 2.10, one can estimate the number of reflections up to the
time instant t by counting the number of image sources in the sphere of radius r = ct:
$$N_r(t) = \frac{4\pi}{3}\, \frac{c^3 t^3}{V}\,. \qquad (2.110)$$
Taking the temporal derivative of (2.110), one arrives at the average temporal density of reflections, given by
$$\frac{dN_r(t)}{dt} = 4\pi\, \frac{c^3 t^2}{V}\,. \qquad (2.111)$$
2.7.4 Reverberation time
The early works on architectural acoustics by Sabine (Kuttruff, 2000) led to the definition of the reverberation time, which is an important measure of the evolution of sound energy in rooms. The reverberation time quantifies the rate of energy decay in a room due to air absorption and wall reflection losses.
In order to derive the reverberation time, assume that due to the air absorption, sound energy density decays exponentially with distance r according to $e^{-mr}$, where m is an attenuation constant. Furthermore, let each reflection from a wall decrease the sound intensity carried by a sound wave by $1 - \alpha$, where $\alpha$ is the absorption coefficient of the wall. The evolution of sound energy density is described by
$$w(t) = w_0\, e^{-mct}\, (1-\alpha)^{\bar{n} t}\,, \qquad (2.112)$$
where $\bar{n}$ is the average number of reflections per unit of time. In case of diffuse sound propagation in a room, the average number of reflections per unit time can be approximated by (Kuttruff, 2000)
$$\bar{n} \approx \frac{cS}{4V}\,, \qquad (2.113)$$
where S is the total area of the walls.
Computing the reverberation time as the time it takes the energy density w(t) to
drop by 60 dB of its initial value $w_0$, one obtains Eyring's reverberation formula, given by (Kuttruff, 2000)
$$RT_{60} = 0.161\, \frac{V}{4mV - S \ln(1-\alpha)}\,. \qquad (2.114)$$
Sabine's reverberation formula is similar to Eyring's formula given in (2.114), and it has the form (Kuttruff, 2000)
$$RT_{60} = 0.161\, \frac{V}{4mV + S \bar{\alpha}}\,, \qquad (2.115)$$
where $\bar{\alpha}$ is the average absorption coefficient.
The reverberation time, as observed by Sabine, depends on the volume of a room
and the absorption properties of air and the room walls. Since both the air and
walls are better absorbers at higher frequencies, the reverberation time is frequency
dependent and tends to be longer at low frequencies. This is particularly noticeable
with low-frequency room modes, where the reverberation time RT60 takes on values
much larger than the average reverberation time taken over all frequencies.
The reverberation time can be measured by exciting the room with a stationary
white noise signal from a single source, and abruptly switching it off. The time it
takes for the measured energy to decay to $-60$ dB of its steady-state value defines the reverberation time $RT_{60}$. The energy decay can be described by the energy decay curve $EDC(t)$ (Kuttruff, 2000), which can be either measured or computed from the room impulse response $g(t)$ using the following formula:
$$EDC(t) = 10 \log_{10} \left( \frac{\int_t^{\infty} g^2(\tau)\, d\tau}{\int_0^{\infty} g^2(\tau)\, d\tau} \right)\,. \qquad (2.116)$$
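In discrete time, the integrals in (2.116) become a backward cumulative sum over the sampled impulse response (Schroeder integration), and RT60 can be estimated from the slope of the decay. A minimal sketch, using a synthetic exponentially decaying noise RIR with a known RT60 of 0.5 s as input:

```python
import numpy as np

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
g = np.random.default_rng(0).standard_normal(t.size) * np.exp(-6.91 * t / 0.5)

# Energy decay curve (2.116) via backward (Schroeder) integration.
edc = 10 * np.log10(np.cumsum(g[::-1] ** 2)[::-1] / np.sum(g ** 2))

# RT60 from the -5 dB to -35 dB decay slope, extrapolated to -60 dB.
i5, i35 = np.searchsorted(-edc, [5.0, 35.0])
slope = (edc[i35] - edc[i5]) / (t[i35] - t[i5])   # in dB per second
print(-60.0 / slope)                               # close to the 0.5 s used above
```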
Figure 2.12 illustrates the energy decay curve and the definition of the reverberation time.
Figure 2.12: Energy decay curve in a room and the definition of the reverberation time RT60 as the time it takes the energy density to decrease by 60 dB after an abrupt interruption of the sound source.
2.7.5 Critical distance
It was already mentioned that the sound field in a room can be decomposed into the
direct and reverberant component, Pd (r, ω) and Pr (r, ω), respectively.
The direct sound conveys all the properties of free-field sound propagation from a
point source. The energy density Wd (r, ω) of the direct sound field is related to the
distance d between the source rs and the observation point r. Assuming the source
emits sound waves with constant power P (ω), the energy density is given by (Morse
and Ingard, 1968; Kuttruff, 2000)
$$W_d(d,\omega) = \frac{P(\omega)}{4\pi c\, d^2}\,. \qquad (2.117)$$
One can also account for a possible directivity of the source, denoted by the gain D,
giving the energy density
$$W_d(d,\omega) = \frac{P(\omega)\, D}{4\pi c\, d^2}\,. \qquad (2.118)$$
Note that for an omnidirectional point source, D = 1.
The reverberant sound is commonly modeled as perfectly diffuse. The energy
density of a diffuse sound is obtained from the law of sound decay in a room, and for
a source radiating at constant power $P(\omega)$, it is given by (Kuttruff, 2000)
$$W_r(d,\omega) = \frac{4\, P(\omega)}{c A}\,, \qquad (2.119)$$
where A is the so-called equivalent absorption area of the room. Note that the energy
density in a diffuse sound field is isotropic.
Figure 2.13: Distance-dependence of energy densities of direct and reverberant sound fields, $W_d(d)$ and $W_r(d)$, respectively, and the critical distance $d_c$ where the two are equal.

The critical distance or diffuse-field distance defines the distance from the source at which the energy densities of the direct and reverberant sound fields become equal, as illustrated in Figure 2.13. It is obtained by combining (2.118) and (2.119), and is given by (Kuttruff, 2000)
$$d_c = \sqrt{\frac{D A}{16 \pi}}\ [\mathrm{m}] \approx 0.1 \sqrt{\frac{D V}{\pi\, RT_{60}}}\ [\mathrm{m}]\,. \qquad (2.120)$$
The critical distance defines the reach of the direct sound, as up to the distance dc
from the source, the sound field is dominated by the direct sound component.
It should be mentioned that the above expression for the critical distance has its limitations, since it is developed using a reverberation model that assumes low wall absorption coefficients. In practice, this is often not the case, and (2.120) often
underestimates the true critical distance value.
In rooms where speech communication takes place, such as classrooms, office
spaces, lecture halls etc., the speech intelligibility is highly dependent on the distance from the source. Investigations have shown that speech intelligibility decreases
as the receiver moves away from the source, and that starting at the critical distance
and moving further, it remains roughly constant (Peutz, 1971).
2.8 Acoustic beamforming
In this section, we present the fundamentals of spatial filtering or beamforming (e.g.,
see Van Veen and Buckley, 1988), which is used throughout Chapters 3, 4, and 5.
In essence, beamforming is a signal processing technique used to control the spatial
aspect of wave radiation or acquisition performed by an array of transducers.6 Thus,
a beamforming problem has the following three components:
6 In acoustic applications, transducers can either be microphones or loudspeakers.
• A medium where the physical field of interest propagates, which usually refers
to an acoustic or electromagnetic channel. The field propagation in the medium
is characterized by Green’s function Gω (r|r 0 ).
• An array of transducers used for acquiring or generating the physical field.
• A transducer signal pre- or post-processor used for controlling the spatial aspects
of physical field generation or acquisition, respectively.
The system comprising the last two blocks, i.e., the array of transducers and the
attached signal processor, is denoted as a beamformer.
Intrinsically related to beamforming is a property of a transducer (or beamformer)
called the directional response, which quantifies the combined effect of the transducer
(or beamformer) and the propagation medium.
Assuming a transducer is centered at the origin with its axis aligned with the x-axis, the directional response d(r, ω) captures the propagation characteristic from the
point r all the way to the transducer’s output. Hence, it combines the propagation
characteristic of the medium, captured by Green’s function, and the spatial properties
of the transduction process.
Note that one is usually interested in far-field characteristics, where the source is located at infinity, $r = \infty$, and effectively radiates plane waves. In the far-field case, the directional response is represented more simply by $d(\theta,\phi,\omega)$, and if
one is interested in the directional response only in the xy-plane, it takes the form
d(φ, ω). As an example, an ideal pressure microphone located at the origin has the
directional response d(θ, φ, ω) = 1, while a dipole microphone located at the origin
has the directional response d(θ, φ, ω) = sin θ cos φ.
When designing a beamformer, one usually defines a desired directional response
d(r, ω) on an enclosing control circle or sphere, depending on the problem geometry.
The circle or sphere can be of infinite radius, in case of plane-wave incidence, or a
finite radius rs , if point sources are considered.7 Unless stated otherwise, the desired
directional response is frequency-independent, and is denoted by d(θ, φ) in the far-field
case and by d(r) in the near-field case.
Consider a transducer array with M transducers positioned at rm and characterized by their directional responses dm (r, ω), m = 1, . . . , M . Note that, due to
transducers’ translations and rotations, the response from each transducer to the
control surface, Gm (r, ω), gets modified. As an example, consider a beamformer
design in the xy-plane, and a transducer located at rm = (rm , φm ) with a far-field directional response dm (φ). The “propagation-transduction” channel Gm (ϕ, ω) between
the direction ϕ and the output of transducer m then takes the form
Gm (ϕ, ω) = dm (ϕ − φm , ω) e−ikrm cos(ϕ−φm ) ,
(2.121)
with $k = \omega/c$.
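As an example (with assumed array parameters), the sketch below assembles the N × M matrix of channels $G_m(\varphi_n, \omega)$ from (2.121) for a uniform circular array of omnidirectional transducers, i.e., $d_m = 1$, sampled at N far-field control directions; this is the matrix G(ω) that appears in (2.128) below.

```python
import numpy as np

def channel_matrix(omega, M=8, R=0.05, N=72, c=343.0):
    # G[n, m] = exp(-i k R cos(varphi_n - phi_m)), i.e. (2.121) with d_m = 1,
    # for transducers on a circle of radius R in the xy-plane.
    k = omega / c
    phi_m = 2 * np.pi * np.arange(M) / M       # transducer angles
    varphi = 2 * np.pi * np.arange(N) / N      # control directions
    return np.exp(-1j * k * R * np.cos(varphi[:, None] - phi_m[None, :]))

G = channel_matrix(2 * np.pi * 1000.0)
print(G.shape)   # (72, 8): N control directions by M transducers
```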
An acoustic beamformer design is defined as a problem of finding a linear combination of transducers’ responses Gm (r, ω) that best approximates the desired directional
response d(r, ω) on the control surface at every frequency ω. In optimization theory
7 The finitely-distant source does not have to be a point source, but one needs to be able to obtain
its response on a circle or sphere enclosing the designed array.
parlance, this is a semi-infinite programming problem, which is difficult to solve.
Hence, it is usually discretized by considering a finite number Nf of frequencies ωn
below the sampling frequency, and a finite number N of control points on the control
surface.
2.8.1 Beamformer filter design

Figure 2.14: Illustration of microphone (a) and loudspeaker (b) array beamformer design problems.
The two beamforming applications dealt with in this thesis are designs of microphone and loudspeaker arrays, illustrated in Figure 2.14. Leaving out the information about the radius in case of finitely-distant control points, let (ϑn , ϕn ) denote the
spherical angular coordinates of control point n. Furthermore, denote by Gnm (ω) the
channel between the control point (or direction) n and the output of transducer m,
i.e.
Gnm (ω) = Gm (ϑn , ϕn , ω) .
The process of obtaining the set of channels Gnm (ω) for transducer m can be
viewed as a discretization or sampling of its modified directional response.8 Its discretized version is then given by the vector
$$V_m(\omega) = \left[ G_{1m}(\omega)\ \ G_{2m}(\omega)\ \ \ldots\ \ G_{Nm}(\omega) \right]^T\,. \qquad (2.122)$$
In a similar manner, an acoustic beamformer filter design uses a discretized version
of the desired directional response
D(ω) = [d(θ1 , φ1 ) d(θ2 , φ2 ) . . . d(θN , φN )]T .
(2.123)
Denote by Xm (ω) the input signal of transducer m, and let signals from all transducers be aggregated into a vector
X(ω) = [X1 (ω) X2 (ω) . . . XM (ω)]T .
(2.124)
The task of the transducer array is to produce transducer pre- or post-filters
H(ω) = [H1 (ω) H2 (ω) . . . HM (ω)]T ,
(2.125)
so that the “virtual transducer”, whose signal is given by
Y (ω) = X T (ω) H(ω) ,
(2.126)
has a response that approximates the desired one in the control points.
The frequency response of the array from control point n is defined by the following
linear combination of frequency responses from control point n to the outputs of all
the transducers:
Yn (ω) = [Gn1 (ω) Gn2 (ω) . . . GnM (ω)]T H(ω) .
(2.127)
Thus, the directional response of the array discretized in the control points is given
by
$$Y(\omega) = \left[ Y_1(\omega)\ \ Y_2(\omega)\ \ \ldots\ \ Y_N(\omega) \right]^T = G(\omega)\, H(\omega)\,, \qquad (2.128)$$
where G(ω) is the N × M matrix containing frequency responses of the acoustics
channels between the control points and transducers.
Given the description of the system setup, the task of designing transducer array
filters can be formulated as a constrained optimization problem. The objective function is commonly an error norm between the desired and obtained frequency responses
in the control points. It can involve all of the control points, or a subset thereof—such
as when the main lobe of the directional response is of high importance, while the
side lobes are controlled by constraints.
The constraints can be manifold. They can relate to the array's response Y(ω) or to the filter gains H(ω).
8 When responses are obtained from points at a finite distance, the transducer’s near-field directional response gets discretized.
The directional response constraints can be of the equality type, where the directional response at a set of control points needs to exactly match the desired one, be
it only in magnitude or both in phase and magnitude. One can also utilize inequality
constraints for the directional response. For instance, the strength of the response
in certain control points can be limited in order to achieve the so-called side-lobe
suppression.
In the transducer design problems we present here, the focus is on achieving a desired response at all control points. Therefore, the directional response constraints are
not used. However, we will use a constraint on the directional response in Chapter 5
that gives guarantees on the response in the beamformer’s look direction.
Filter gains are usually constrained by limiting their l2 -norm, which is related to
the so-called white noise gain (Cox et al., 1987). In addition, one can limit the gain
of each filter separately. The purpose of such constraints is limiting the beamformer’s
sensitivity to random errors that can come from transducers’ self noise, miscalibration,
placement errors, or computational noise.
Complex-gain optimized transducer arrays
Let the desired response d(θ, φ) be specified both in terms of amplitude and phase,
or as a vector of possibly complex gains. As the notation suggests, it is assumed that
the desired response is frequency-independent. Denote by
D = [d(θ1 , φ1 ) d(θ2 , φ2 ) . . . d(θN , φN )]T
the value of the desired directional response in N control points or directions.
Assume that the maximum allowed $l_2$-norm of the filter vector $H(\omega)$ is $H_{\max}$ at all frequencies. The transducer array design problem can then be stated as the following frequency-domain optimization problem:
$$\begin{aligned} \text{minimize} \quad & \| G(\omega)\, H(\omega) - D \|_x \\ \text{subject to} \quad & \| H(\omega) \|_2 \leq H_{\max}\,, \end{aligned} \qquad (2.129)$$
where $x \in \{2, \infty\}$ is the minimized error norm (i.e., Euclidean or min-max). As previously mentioned, this semi-infinite optimization problem is solved at $N_f$ frequencies $\omega_n$.
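For x = 2, (2.129) is a norm-constrained least-squares problem. The following minimal sketch (not the thesis's implementation) enforces the constraint through Tikhonov regularization, using the standard ridge-regression fact that the solution norm decreases monotonically with the regularization weight, so the weight can be found by bisection; the x = ∞ case would call for a general convex solver instead.

```python
import numpy as np

def design_filters(G, D, Hmax, iters=60):
    # Solve min ||G H - D||_2  s.t.  ||H||_2 <= Hmax, bisecting on the
    # Tikhonov weight mu until the norm constraint is (just) satisfied.
    GhG, GhD = G.conj().T @ G, G.conj().T @ D
    I = np.eye(G.shape[1])
    H = np.linalg.solve(GhG + 1e-12 * I, GhD)
    if np.linalg.norm(H) <= Hmax:
        return H                               # constraint inactive
    lo, hi = 0.0, 1.0
    while np.linalg.norm(np.linalg.solve(GhG + hi * I, GhD)) > Hmax:
        hi *= 2.0
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        H = np.linalg.solve(GhG + mu * I, GhD)
        lo, hi = (mu, hi) if np.linalg.norm(H) > Hmax else (lo, mu)
    return H

# Example: a cardioid-like desired response at 1 kHz, reusing channel_matrix
# from the earlier sketch.
G = channel_matrix(2 * np.pi * 1000.0)
phi = 2 * np.pi * np.arange(G.shape[0]) / G.shape[0]
H = design_filters(G, 0.5 * (1 + np.cos(phi)), Hmax=10.0)
```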
To obtain discrete-time finite impulse response (FIR) filters hi [n] of length Nh ,
one can use one of the following two options:
1. Separate frequency optimization
Solve (2.129) for a uniform grid of $N_f = \frac{N_h}{2} + 1$ normalized frequencies9 in the range $[0, \pi]$, and find FIR filters which best approximate the obtained spectra, either by an inverse DFT (Kolundžija et al., 2009c) or by some frequency-domain filter design procedure (Oppenheim and Schafer, 1989; Berchin, 2007).
2. Direct time-domain FIR filter computation (Lebret and Boyd, 1997; Yan
and Ma, 2005)
9 It is assumed that the computed filters are real and have a conjugate-symmetric spectrum.
Denote by vector $h_m$ the $N_h$ coefficients of the impulse response $h_m[i]$ of the filter used by transducer m. Let the vector h contain all impulse responses $h_m$:
$$h = \left[ h_1^T\ \ h_2^T\ \ \ldots\ \ h_M^T \right]^T\,. \qquad (2.130)$$
The vector of filters' frequency responses for a given normalized frequency ω is then given by
$$H(\omega) = V(\omega)\, h\,, \qquad (2.131)$$
where the matrix $V(\omega)$ has entries defined by
$$V(\omega) = I_{M \times M} \otimes v^T(\omega) = \begin{bmatrix} v^T(\omega) & 0_{N_h \times 1}^T & \cdots & 0_{N_h \times 1}^T \\ 0_{N_h \times 1}^T & v^T(\omega) & \cdots & 0_{N_h \times 1}^T \\ \vdots & \vdots & \ddots & \vdots \\ 0_{N_h \times 1}^T & 0_{N_h \times 1}^T & \cdots & v^T(\omega) \end{bmatrix}\,, \qquad (2.132)$$
with
$$v(\omega) = \left[ 1\ \ e^{-i\omega}\ \ \ldots\ \ e^{-i(N_h-1)\omega} \right]^T\,, \qquad 0_{N_h \times 1} = [\underbrace{0\ \ 0\ \ \ldots\ \ 0}_{N_h}]^T\,. \qquad (2.133)$$
To obtain the filter coefficients at a single normalized frequency ω, one needs to replace (2.131) in (2.129), giving rise to the following formulation:
$$\begin{aligned} \text{minimize} \quad & \| F(\omega)\, h - D \|_x \\ \text{subject to} \quad & \| V(\omega)\, h \|_2 \leq H_{\max}\,, \end{aligned} \qquad (2.134)$$
where $F(\omega) = G(\omega)\, V(\omega)$.
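The structure of (2.131)-(2.133) is a single Kronecker product, as the short sketch below illustrates; feeding it a stacked coefficient vector whose first filter is a unit impulse yields H(ω) = [1, 0, ..., 0]^T at every frequency.

```python
import numpy as np

def V_matrix(omega, M, Nh):
    # V(omega) = I_{MxM} kron v(omega)^T, cf. (2.132), with v(omega) of (2.133).
    v = np.exp(-1j * omega * np.arange(Nh))
    return np.kron(np.eye(M), v)               # shape (M, M * Nh)

M, Nh = 4, 16
h = np.zeros(M * Nh)
h[0] = 1.0                                      # filter of transducer 1: unit impulse
print(V_matrix(0.3, M, Nh) @ h)                 # H(0.3) = [1, 0, 0, 0]
```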
In order to solve the optimization problem at $N_f$ normalized frequencies10 $\omega_n$, $n = 1, \ldots, N_f$, one needs to stack matrices in order to optimize for all frequencies $\omega_n$ jointly:
$$F_A = \begin{bmatrix} F(\omega_1) \\ F(\omega_2) \\ \vdots \\ F(\omega_{N_f}) \end{bmatrix}\,, \qquad D_A = 1_{N_f \times 1} \otimes D\,. \qquad (2.135)$$
The transducer array design is then stated as follows:
$$\begin{aligned} \text{minimize} \quad & \| F_A\, h - D_A \|_x \\ \text{subject to} \quad & \| V(\omega_l)\, h \|_2 \leq H_{\max}\,, \quad l = 1, \ldots, N_f \end{aligned} \qquad (2.136)$$
Both separate frequency optimization and direct FIR computation can be solved
using interior point methods (Boyd and Vandenberghe, 2004), which are efficient
algorithms for solving convex programs. The separate frequency optimization has
lower complexity, since it deals with solutions of Nh times lower dimension. However,
one needs to be careful when choosing filter lengths, since it can in general give
frequency responses Hi (ω) which cannot be well approximated by FIR filters of short
length Nh .
10 In this case, the number of evaluation frequencies $N_f$ does not have to be equal to the filter length $N_h$.
Magnitude-optimized transducer arrays
In some cases, it is of interest to match a desired directional response in a magnitude sense, without worrying about phase. The previously described optimization
procedures optimize the difference between two vectors of complex gains—one of the
transducer array and the other of the desired response—implicitly taking the phase
into the optimization criterion.
In order to optimize the transducer array’s directional response in the magnitude
sense, the objective function needs to be changed. In case of a single frequency ω, the
optimization problem takes the form:
minimize
subject to
k | G(ω) H(ω) | − | D | kx
kH(ω)k2 ≤ Hmax ,
(2.137)
The objective function in (2.137) is not convex and has multiple local minima in
general. Hence, conventional tools of convex optimization (Boyd and Vandenberghe,
2004) cannot be used. Instead, one can use the so-called local solutions, which iteratively improve on an initial solution of the array filter coefficients. The quality of the
final solution relative to the optimum is then dependent on the initial solution.
We present here a variation of a local optimization algorithm by Kassakian (2006),
but a similar procedure was proposed by Wang et al. (2003). The algorithm in
(Kassakian, 2006) was given for the problem called magnitude least squares (MLS).
We present Algorithm 2.1, which has a more general form that allows one to minimize
an arbitrary convex norm of the magnitude error, and also include convex constraints.
Algorithm 2.1 Minimize the error norm of the directional response's magnitude (adapted from (Kassakian, 2006)).
1. Choose the solution tolerance ε
2. Choose the initial solution H(ω)
3. repeat
4.   E ← ‖ |G(ω) H(ω)| − |D| ‖_x
5.   Compute D̂(ω) such that, ∀j ∈ {1, . . . , N},
     |D̂_j(ω)| = |D_j| and ∠D̂_j(ω) = ∠(G(ω) H(ω))_j
6.   Solve the following quadratic program:
     minimize ‖ G(ω) H(ω) − D̂(ω) ‖_x
     subject to ‖ H(ω) ‖_2 ≤ H_max
7.   E′ ← ‖ |G(ω) H(ω)| − |D| ‖_x
8. until |E′ − E| < ε

In Algorithm 2.1, x stands for any convex norm, including the most widely used
l2 - and l∞ -norms.
Since $\big|\, |x| - |y| \,\big| \leq |x - y|$, with equality only when the complex numbers x and y have the same argument (phase, in this context), step 6 can only decrease the objective function. Furthermore, since the objective function is non-negative, Algorithm 2.1 provides a solution that lies in a local minimum of the objective function.
Algorithm 2.1 adds an additional, outer iteration loop to the complex-gain optimized beamformer computation. The resulting increase in complexity might render
the direct FIR filter computation solution prohibitively complex, especially for longer
FIR filters. Thus, if complexity is a concern, it is more suitable to use the separate
frequency optimization.
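A minimal sketch of Algorithm 2.1 for the l2-norm, reusing the norm-constrained least-squares routine design_filters from the earlier sketch as the inner quadratic program of step 6:

```python
import numpy as np

def magnitude_ls(G, Dmag, Hmax, eps=1e-8, max_iter=200):
    # Alternate between rotating the target phases onto the current response
    # (step 5) and a norm-constrained least-squares solve (step 6).
    H = design_filters(G, Dmag.astype(complex), Hmax)   # initial solution
    E = np.linalg.norm(np.abs(G @ H) - Dmag)
    for _ in range(max_iter):
        Dhat = Dmag * np.exp(1j * np.angle(G @ H))      # step 5
        H = design_filters(G, Dhat, Hmax)               # step 6
        E_new = np.linalg.norm(np.abs(G @ H) - Dmag)
        if abs(E_new - E) < eps:                        # stopping rule, step 8
            break
        E = E_new
    return H
```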
Chapter 3

Microphone Arrays For Directional Sound Capture

3.1 Introduction

3.1.1 Background
The history of directional microphones goes back to the first half of the twentieth
century, with the works of Olson (Olson, 1946, 1967, 1980) figuring most prominently.
Olson and his coworkers at RCA Laboratories were seeking a solution to replace
pressure-sensitive, omnidirectional1 microphones for sound pickup in motion picture
studios. The problem with pressure microphones was due to the requirement of non-intrusive recording, where the microphone would be placed several meters away from
the actors. As discussed in Section 2.7, sound pickup at comparable distances may
go way past the critical distance of a room, making the recorded sound less natural
and notably decreasing speech intelligibility due to the dominance of reflected and
reverberant sound.
Along with some sophisticated constructions for directional sound acquisition that did not gain widespread usage, the design that stood out for its simplicity,
adequate form factor, and consistent directional performance over a wide range of
frequencies was the pressure gradient or velocity microphone, and variations thereof
known as the unidirectional or cardioid microphone (Olson, 1946, 1967, 1980). Cardioid microphones enjoyed great popularity in studio recording for their suppression
of sounds in the back half-space, and they are the most widely used directional microphones to date (Olson, 1980; Elko, 2000). Around the same time as Olson, Blumlein
started experimenting with what later became popular under the name two channel stereo systems, as a way to reproduce sound beyond the anchored reproduction
positions defined by the used loudspeakers (Blumlein, 1931). The law of intensity
stereo, expressed through the law of sines or the law of tangents (Pulkki and Karjalainen, 2001), needed an appropriate recording setup which would automatically, in
1 Omnidirectional refers to the fact that the microphone picks up sound in a direction-independent
fashion.
the recording process, mix the incoming sound such that it gets faithfully reproduced over a two-channel stereo system. As a solution, Blumlein came up with the so-called Blumlein XY pair, a technique widely used even today, that consists of two matched velocity microphones pointing towards ±45° relative to the front. The Blumlein XY pair is illustrated in Figure 3.1. As velocity microphones with bidirectional or figure-of-eight directional response, Blumlein used a pair of closely-spaced pressure microphone capsules, which will be described in Section 3.2.

Figure 3.1: Blumlein XY stereo microphone pair consisting of velocity microphones with dipole directional responses pointing towards ±45°. The x axis points towards the front.
Apart from spot and surround recording, gradient microphones are used for particle velocity and sound intensity measurements under the name of p-p probes (e.g.,
see Fahy, 1977; Raangs et al., 2003; de Bree et al., 2007; Merimaa, 2002).
The more recent research on directional microphones expands into the realm of
microphone arrays. What has previously been achieved through careful acoustical
design can, to a certain extent, be achieved and even extended using multiple microphones and array signal processing techniques. The Blumlein pair can be considered one
of the earliest and simplest microphone arrays. Olson (1946) described various microphone array configurations, including any-order pressure gradient microphones, combinations thereof called unidirectional microphones, and unidirectional microphones
that combine gradient microphones and delay elements. Although he denoted all these
different microphone types as gradient microphones, a more recent naming convention classifies as gradient only the microphone arrays that combine pressure elements,
while microphone arrays that also employ delay elements are termed differential microphone arrays (Elko, 2000). Differential microphone arrays found applications in
hands-free communication (e.g., see Elko, 1996) and hearing aid devices (e.g., see
Preves et al., 1998; Geluk and de Klerk, 2001).
Microphone arrays capture sound in a directional fashion using beamforming,
which can roughly be classified into two main categories.
The first category is a classical adaptive beamforming application of acquiring
a desired signal, while trying to suppress any noise sources and interferences (Cox
et al., 1987). The adaptation is not restricted to undesired sources, but holds for
the desired source as well, in the sense that the desired source might change location
and the microphone array needs to be able to track it. As the name suggests, adaptive beamforming is a signal-dependent way of capturing sound, and its performance
depends on a number of factors, such as the ability to locate the desired source, statistical properties of different sources and mutual dependence between active sound
sources. Also, adaptive beamforming is usually performed in the frequency domain,
and in general achieves a frequency-dependent directional capture of sound.
The other category of microphone arrays, which is treated in detail in Chapter 4,
focuses on decomposing the captured sound field in a way that facilitates its analysis.
This type of decomposition is also related to designing multichannel sound reproduction strategies, such as for instance ambisonic decoding (Gerzon, 1980a, 1973; Cooper
and Shiga, 1972), which is based on matching sound field’s orthogonal harmonic components in a single point. These strategies are discussed in more detail in Chapter
6.
Unless stated otherwise, the analyses of microphone arrays consider far-field conditions in the audible frequency range. As it was previously mentioned, far-field
conditions correspond to the simplifying assumption where sound sources emit plane
waves. Also, the problems of sound field capture and analysis, unlike the radiation
problems, are usually concerned with direction of arrival of the incoming sound waves.
Thus, wave vector k used in this chapter points toward the direction of arrival of sound
waves, and as a consequence, a plane wave has a slightly different form, given by
$$p(r,t) = P_0\, e^{-i(k^T r + \omega t)}\,,$$
where P0 is its complex amplitude, and ω its frequency.
3.1.2 Chapter outline
Section 3.2 presents an analysis of a plane-wave sound pressure field viewed as both a
spatial and temporal phenomenon, i.e., as a multivariate function of spatial location
and time. The presented analysis exposes the operations of taking gradients and
directional derivatives of a sound pressure field as combinations of its spatial and
temporal derivatives. It also gives a clear interpretation of gradient and differential
microphone arrays: the former as devices used for measuring only spatial derivatives,
and the latter as devices used for measuring spatio-temporal derivatives of the sound
pressure field. In essence, it shows the equivalence of the two microphone array types.
Our framework also allows for easy design of differential microphone arrays given a
desired directional response.
In Section 3.3, we show how a discretized problem of designing a microphone array
beamformer can come to the same end. In other words, we show that following the
steps of
• Assembling a microphone array
• Obtaining directional responses of each microphone in a number of directions
relative to a single, central reference point
• Computing microphone post-filters that optimally synthesize the desired directional response in the control directions
can be used to design directional microphones. Since this design procedure can account for non-ideal characteristics of different microphones through measurements,
and can incorporate additional constraints such as maximum filter gains, it is more
flexible and practical than the conventional theoretical approaches.
Conclusions are given in Section 3.4.
3.2 Differential microphone arrays

3.2.1 Spatial derivatives of a far-field sound pressure field
Assume that a single far-field source emits a simple complex sinusoid with temporal
frequency ω. Furthermore, let the formed sound field be represented by a plane wave
with wave vector $k$,2 given by
$$p(r,t) = P_0\, e^{-i(k^T r + \omega t)}\,, \qquad (3.1)$$
where $r = [x\ y\ z]^T$ and $k = \omega/c$.
Let n = [nx ny nz ]T be an arbitrary unit-norm vector. The spatial derivative
of the sound pressure field p(r, t) along the direction n quantifies its directional rate
of change along n. It is given by the projection of the sound pressure field’s spatial
gradient onto the vector n:
$$\frac{\partial}{\partial n}\, p(r,t) = n^T \nabla p(r,t) = -i\, p(r,t)\, n^T k = -ik \cos\alpha\; p(r,t)\,, \qquad (3.2)$$
Iterating the operation of taking a directional derivative along n leads to the expression for the mth-order spatial derivative along n, which reads
$$\frac{\partial^m}{\partial n^m}\, p(r,t) = (-ik)^m (\cos\alpha)^m\, p(r,t)\,. \qquad (3.3)$$
In expression (3.3), frequency- and angle-dependent terms are clearly separated.
The frequency-dependent term (−ik)m indicates a high-pass magnitude frequency
characteristic irrespective of angle α, which is specific to an mth-order differentiator.
Frequency characteristics for different orders m are shown in Figure 3.2(a). The
directional characteristic, determined by angle α, goes from omnidirectional for m = 0,
to highly-directional high-order cosine terms, as shown in Figure 3.2(b).
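The separable structure of (3.3) makes both characteristics easy to tabulate numerically; a minimal sketch with an assumed frequency range and speed of sound:

```python
import numpy as np

c = 343.0
f = np.logspace(1, 4, 256)                  # 10 Hz to 10 kHz
alpha = np.radians(np.arange(360))
for m in (0, 1, 2):
    gain_db = 20 * m * np.log10(2 * np.pi * f / c)   # |(-ik)^m|: high-pass slope
    pattern = np.abs(np.cos(alpha)) ** m             # |cos(alpha)|^m: polar pattern
    print(m, round(gain_db[-1], 1), round(pattern[90], 3))  # null at 90 deg for m >= 1
```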
Figure 3.2: Frequency (a) and directional (b) characteristics of spatial derivatives of different order m for a plane-wave sound field.

2 We mentioned in the introduction to this chapter that the wave vector k points towards the direction of arrival of sound waves.

3.2.2 Spatio-temporal derivatives of a far-field sound pressure field

Consider again a single-frequency plane wave with wave vector
$$k = [k \cos\phi \sin\theta\ \ k \sin\phi \sin\theta\ \ k \cos\theta]^T\,,$$
which gives rise to the sound pressure field given by (3.1). Being a function of three
spatial coordinates, x, y, and z, and a temporal coordinate t, the total gradient of
the sound pressure field (3.1) is given by
∇p(x, y, z, t) = [∂p/∂x   ∂p/∂y   ∂p/∂z   ∂p/∂t]^T
= −ik [cos φ sin θ   sin φ sin θ   cos θ   c]^T p(x, y, z, t) ,   (3.4)
with k = ω/c.
First-order spatio-temporal derivative
To compute a spatio-temporal derivative, one needs to project the gradient, given
by (3.4), onto a vector with both spatial and temporal components.
Let us define a somewhat unintuitive notion of spatio-temporal direction with the following unit vector:
u = [ρ_u cos φ_u sin θ_u   ρ_u sin φ_u sin θ_u   ρ_u cos θ_u   u_t]^T ,   (3.5)
where ρ_u ∈ [0, 1], θ_u ∈ [0, π], and φ_u ∈ [0, 2π] define the spatial coordinates, and u_t ∈ [−1, 1] the temporal coordinate of the vector u. Note that having a unit norm implies ρ_u² + u_t² = 1. Also, the ratio ρ_u/u_t gives the relation between the spatial and temporal parts of the vector u.
The spatio-temporal derivative of a plane-wave sound field along the direction
given by the vector u then reads
∂p(x, y, z, t)/∂u = u^T ∇p(x, y, z, t) = −ik (ρ_u cos α + c u_t) p(x, y, z, t) ,   (3.6)
where α is the angle between the spatial directions defined by (θ, φ) and (θu , φu ).
As with the spatial derivative, there is a clearly separated high-pass frequency term −ik that is characteristic of differentiators.
The directional characteristic is a combination of the first-order bidirectional characteristic ρ_u cos α and the omnidirectional characteristic c u_t. The overall directional response is determined by the relative contributions of the two derivatives, spatial and temporal, given by the ratio ρ_u/u_t. In the two extreme cases, ρ_u = 0 and ρ_u = 1, the directional characteristic is omnidirectional and bidirectional, respectively. Other interesting combinations, such as the cardioid, the sub-cardioid, and the hyper- and super-cardioids, are defined in Table 3.1 and shown in Figure 3.3.
Response type     ρ_u/u_t
Cardioid          c
Sub-cardioid      (0, c)
Hyper-cardioid    3c
Super-cardioid    (3 − √3)c/(√3 − 1)

Table 3.1: Some well-known first-order directional characteristics expressed through the ratio ρ_u/u_t of the spatio-temporal derivative.
Figure 3.3: Directional characteristics of first-order spatio-temporal derivatives of a plane-wave sound field for different ratios ρ_u/u_t, as given in Table 3.1.
Higher-order spatio-temporal derivatives
Iterating the operation of taking the spatio-temporal derivative of the sound pressure field along the same spatio-temporal direction u m times gives its mth-order spatio-temporal derivative. It has the following form:
∂^m p(x, y, z, t)/∂u^m = (−ik)^m (ρ_u cos α + c u_t)^m p(x, y, z, t) .   (3.7)

Response type     ρ_u1/u_t1     ρ_u2/u_t2     ∆α = α1 − α2
Cardioid          c             c             0
Hyper-cardioid    (√6 − 1)c     (√6 + 1)c     π
Super-cardioid    ((4 − √7 + √(8 − 3√7))/(√7 − 2 − √(8 − 3√7)))c     ((4 − √7 − √(8 − 3√7))/(√7 − 2 + √(8 − 3√7)))c     0

Table 3.2: Some well-known second-order polar patterns expressed through the ratios ρ_u/u_t and angle differences ∆α of the spatio-temporal gradient.
Generalizing further, derivatives along a single direction u can be replaced by spatio-temporal derivatives along multiple directions. Consider an m-tuple of spatio-temporal directions defined by
U = (u_1, . . . , u_m) ,   (3.8)
where each vector u_i from the m-tuple is given by
u_i = [ρ_ui cos φ_ui sin θ_ui   ρ_ui sin φ_ui sin θ_ui   ρ_ui cos θ_ui   u_ti]^T .   (3.9)
The mixed spatio-temporal derivative of a plane-wave sound pressure field along the directions given by U has the form
∂^m p(x, y, z, t)/(∂u_1 · · · ∂u_m) = (−ik)^m p(x, y, z, t) ∏_{i=1}^{m} (ρ_ui cos α_i + c u_ti) ,   (3.10)
where α_i is the angle between the directions (θ, φ) and (θ_ui, φ_ui).
Like the spatial derivatives, the spatio-temporal derivatives have the high-pass frequency characteristic (−ik)^m of an mth-order differentiator, shown in Figure 3.2(a).
The directional characteristic is proportional to a linear combination of spatial gradients of different orders, which is seen by expanding the product ∏_{i=1}^{m} (ρ_ui cos α_i + c u_ti) in (3.10). The same can be said for the term (ρ_u cos α + c u_t)^m in (3.7), which is a special case of (3.10).
As with the first order, the shape of the directional characteristic of a high-order spatio-temporal derivative of a plane-wave sound field is determined by the choice of the vectors u_i, i.e., the parameters ρ_ui, θ_ui, φ_ui, and u_ti. Some well-known second-order directional patterns, resulting from different choices of the ratios ρ_u1/u_t1 and ρ_u2/u_t2 and angle differences ∆α = α1 − α2, are given in Table 3.2 and shown in Figure 3.4.
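A short numerical sketch can make the product form of (3.10) concrete. The code below is illustrative only; each first-order factor is normalized by ρ_ui, so that ratio_i stands for ρ_ui/u_ti, and the patterns evaluated are the cardioid and hyper-cardioid entries of Table 3.2:

import numpy as np

c = 343.0
alpha = np.linspace(0.0, 2.0 * np.pi, 721)     # plane-wave arrival angle

def second_order_pattern(ratio1, ratio2, dalpha):
    # product of two first-order factors from (3.10), each normalized by
    # rho_ui; ratio_i stands for rho_ui / u_ti, dalpha for alpha_1 - alpha_2
    f1 = np.cos(alpha) + c / ratio1
    f2 = np.cos(alpha - dalpha) + c / ratio2
    p = np.abs(f1 * f2)
    return p / p.max()

cardioid = second_order_pattern(c, c, 0.0)
hyper = second_order_pattern((np.sqrt(6) - 1) * c, (np.sqrt(6) + 1) * c, np.pi)
print("second-order cardioid at 180 deg:", cardioid[360])          # rear null
print("hyper-cardioid rear level:", 20 * np.log10(hyper[360] / hyper[0]), "dB")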
3.2.3 Differential microphone arrays

The previously presented theoretical analysis of spatio-temporal derivatives of a plane-wave sound field serves as a basis for designing gradient and differential microphone arrays. Practical differential microphone arrays are based on approximating derivatives with finite differences.
Figure 3.4: Directional characteristics of second-order spatio-temporal derivatives of a plane-wave sound field for different ratios ρ_u/u_t and angle differences ∆α, as given in Table 3.2.
They combine values of the sound pressure field at closely spaced points in space and time,3 either acoustically (pressure at the two faces of a diaphragm, where the acoustic paths to the two faces have different lengths) or electronically (pressure at different microphones of a microphone array combined with delay elements).
In the remainder of this section, we present a few practical differential microphone
array realizations which are based on the previous spatio-temporal gradient analysis.
First-order differential microphone arrays: cardioid, hyper-cardioid, and
super-cardioid
Figure 3.5: First-order differential microphone realization using two pressure microphones and a delay element.
The first-order directional characteristics of a sound field's spatio-temporal derivatives can be approximated using a finite difference of the sound pressure field in space and time. The concept is illustrated in Figure 3.5, where the signals from two closely spaced microphones, at a distance d from each other, are combined with a delay element. The illustrated device is the simplest first-order differential microphone array.
3 Points are separated by a distance much shorter than the wavelength, and a time much shorter
than the period of a plane wave.
The response to a plane wave (3.1) of the first-order differential microphone array from Figure 3.5 is given by
p_d(t) = −2i sin((k/2)(d cos α + c t_d)) p(r, t − t_d/2) ,   (3.11)
where k is the wave number, d the inter-microphone distance, t_d the used delay, and r the position of the microphone array's center (the mid-point between the two microphones). At low frequencies, (3.11) can be approximated by
p_d(t) ≈ −ik (d cos α + c t_d) p(r, t − t_d/2) .   (3.12)
From (3.12), one can see that the ratio d/t_d determines the directional response of a practical differential microphone array in the same way the ratio ρ_u/u_t determines the directional characteristic of the spatio-temporal derivative of a plane wave, given in (3.6).
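The exact response (3.11) can be evaluated directly to expose the frequency dependence discussed below. A minimal Python sketch, assuming the cardioid setting d = 2 cm and t_d = d/c used in Figure 3.6:

import numpy as np

c, d = 343.0, 0.02                 # speed of sound [m/s], mic spacing [m]
td = d / c                         # delay giving a cardioid response
alpha = np.deg2rad(np.arange(360))

def array_response(f):
    # exact first-order differential-array response, per (3.11)
    k = 2.0 * np.pi * f / c
    return np.abs(-2j * np.sin(0.5 * k * (d * np.cos(alpha) + c * td)))

for f in (300.0, 3000.0, 7000.0):
    r = array_response(f)
    # an ideal cardioid has a side-to-front ratio of -6 dB; above the
    # aliasing frequency the shape deviates (visible at 7 kHz)
    print(f"f = {f:5.0f} Hz: side/front = {20 * np.log10(r[90] / r[0]):5.1f} dB")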
Figure 3.6: Directional responses at different frequencies (f = 300, 3000, and 7000 Hz) of first-order differential microphone arrays realized as shown in Figure 3.5, with d = 2 cm and t_d = d/c (cardioid), t_d = d(√3 − 1)/(c(3 − √3)) (super-cardioid), and t_d = d/(3c) (hyper-cardioid).
Figure 3.6 shows directional responses of the practical cardioid, super-cardioid,
and hyper-cardioid microphones realized with the microphone combination shown in
Figure 3.5, with d = 2 cm.
From Figure 3.6, it can be seen that the shape of directional responses of first-order
microphone arrays is frequency-dependent, and that it corresponds to the desired responses, shown in Figure 3.3, only at low frequencies. Above the aliasing frequency,4
the directional responses deviate from the desired ones, as can be observed in Figure 3.6 for the frequency f = 7000 Hz.
Second-order differential microphone arrays
In this part, it is shown how clover-leaf directional responses sin 2α and cos 2α can
be obtained in two different ways based on the analysis from the beginning of this
section.
4 The aliasing frequency of a first-order differential microphone array is dependent on the inter-microphone distance d and the used delay t_d.
Clover-leaf response sin 2α: quadrupole microphone array. The clover-leaf directional response sin 2α can be represented as a product of the directional responses of two of the plane-wave sound field's spatial derivatives: the spatial derivative along the x-axis, which has a directional response cos α, and the spatial derivative along the y-axis, which has a directional response sin α (or cos(α − π/2)). As such, the directional response sin 2α can be realized as a cascade of two spatial derivative approximations: first along the x-axis, and then along the y-axis (or vice versa).
Figure 3.7: Quadrupole microphone array used for obtaining a clover-leaf directional
response of the form sin 2α.
Figure 3.7 illustrates a configuration of four pressure microphones used to approximate the previously described cascade of spatial derivatives of a sound field.
Figure 3.8 shows the directional responses at various frequencies of the quadrupole
microphone array shown in Figure 3.7, when the inter-microphone distance d = 2 cm
is used.
Figure 3.8: Directional responses at various frequencies of the quadrupole microphone array shown in Figure 3.7, with inter-microphone distance d = 2 cm.
Clover-leaf response cos 2α: three-microphone line array. The clover-leaf directional response of the form cos 2α can be represented as
cos 2α = 2 cos²α − 1 ,   (3.13)
or equivalently, as
cos 2α = (√2 cos α − 1)(√2 cos α + 1) ,   (3.14)
which is a product of the directional characteristics of two first-order spatio-temporal derivatives of a plane-wave sound pressure field. Consequently, the response cos 2α can be obtained by a combination of two spatio-temporal derivatives, one with ρ_u/u_t = −√2 c and the other with ρ_u/u_t = √2 c, or equivalently, two spatio-temporal finite differences, the first with d/t_d = −√2 c and the second with d/t_d = √2 c (or vice versa), as shown in Figure 3.9.
Figure 3.9: A line-array with three microphones used to obtain the clover-leaf directional response cos 2α.
Figure 3.10: Directional responses at various frequencies (f = 300, 3000, and 7000 Hz) of the microphone array shown in Figure 3.9, with inter-microphone distance d = 2 cm and inter-microphone delay t_d = d/(√2 c).
Figure 3.10 shows the directional responses at different frequencies of the microphone array shown in Figure 3.9, with the inter-microphone distance d = 2 cm and delay t_d = d/(√2 c).
Like the first-order differential microphone arrays, the second-order differential
microphone arrays have a frequency-dependent directional response. It is a good
approximation of the desired response only below the aliasing frequency. This can be
observed in Figure 3.10, which shows how the shape of the directional response of the
microphone array from Figure 3.9 deforms at the frequency f = 7000 Hz.
Note that the directional response of the form sin 2α can also be obtained by rotating the microphone array from Figure 3.9 by 45°.
3.3 Directional microphone arrays as acoustic beamformers

With the advent of powerful computational devices and advances in numerical mathematics, discretizing continuous problems has made it possible to solve, at least approximately, problems previously considered hard. The same holds for the problem of microphone array design.
We have already seen how beamforming is used for obtaining a desired directional
response with an array of transducers, each having its own directional response, possibly different from the others.
Suppose that one is given an array of microphones with the task of capturing sound with a desired directional response in the xy-plane. Using the presented spatio-temporal gradient analysis for differential microphone arrays, and assuming the directional response's symmetry relative to the array's main axis,5 one would need to factor the trigonometric polynomial that describes the desired directional response,
d(α) = P(cos α) = Σ_{n=0}^{N} a_n (cos α)^n ,   (3.15)
and build a differential microphone array cascade that approximates the desired response. In addition to building such a cascade of microphones and delay elements, one needs to post-equalize its high-pass frequency characteristic.
Instead of the mentioned set of steps, one could choose to assemble M microphones into a desired topology,6 compute or measure responses from each microphone toward N control points on a large circle enclosing the array,7 and solve the following array design problem:
minimize   ‖G(ω) H(ω) − D‖₂
subject to   |H(ω)| ⪯ H_max 1_{M×1} ,   (3.16)
where ⪯ denotes an element-wise inequality. The problem can be expressed as a quadratically constrained quadratic program and solved using an interior point method (Lebret, 1996). In (3.16), G(ω) is an N × M matrix with frequency responses between the microphones and the control points, D an N × 1 vector containing the desired directional response discretized in the control points, H_max the maximum allowed filter gain, and H(ω) an M × 1 vector with microphone array beamformer post-filters. Provided that the number of control points is sufficiently large, the obtained microphone post-filters optimally synthesize d(α) given the constraints.
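As an illustration, the problem (3.16) at a single frequency can be solved with an off-the-shelf convex solver. The sketch below uses the cvxpy package with an assumed free-field steering model for a three-microphone line array; the spacing, frequency, and desired response are example values, not a prescription from the text:

import numpy as np
import cvxpy as cp

c, d, f = 343.0, 0.02, 1000.0       # assumed speed of sound, spacing, frequency
k = 2.0 * np.pi * f / c
x_mics = np.array([-d, 0.0, d])     # M = 3 microphones on the x-axis
phi = np.linspace(0.0, 2.0 * np.pi, 7, endpoint=False)   # N = 7 control directions

# free-field plane-wave responses (wave vector pointing toward the arrival
# direction, matching the document's sign convention)
G = np.exp(-1j * k * np.outer(np.cos(phi), x_mics))
D = np.cos(phi)                     # desired dipole response d2(phi)
H_max = 10.0 ** (20.0 / 20.0)       # 20 dB maximum filter gain

H = cp.Variable(3, complex=True)
problem = cp.Problem(cp.Minimize(cp.norm(G @ H - D, 2)),
                     [cp.abs(H) <= H_max])
problem.solve()
print("residual:", problem.value)
print("|H|:", np.abs(H.value))

In a practical design, G(ω) would contain measured responses, and the problem would be solved on a grid of frequencies.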
In this section, we show how the latter approach for computing a set of beamformer filters can be used to obtain various directional responses with a simple linear
microphone array consisting of three uniformly-spaced pressure microphones, shown
in Figure 3.11.
5 Symmetry relative to the array's axis makes the directional response an even function.
6 For instance, microphones can form a linear or rectangular grid, or be distributed on a circle.
7 The number of control points should be larger than the angular bandwidth of d(α) and also of the directional responses of the used microphones.
Figure 3.11: Directional sound acquisition with a linear microphone array through
beamforming.
Figure 3.12: Directional responses (at f = 300, 3000, and 7000 Hz) of a three-element line microphone array from Figure 3.11 obtained with an optimized beamformer.
The microphone array from Figure 3.11 is used to synthesize the following four directional responses:
d_1(φ) = 1
d_2(φ) = cos φ
d_3(φ) = cos(2φ)
d_4(φ) = (cos φ)² .   (3.17)
Each directional response is specified relative to the location of the middle microphone
(microphone 2). Beamformer filters are computed using (3.16), with H_max = 20 dB
and N = 7 uniformly-spaced azimuthal control directions.
The resulting directional responses at different frequencies are shown in Figure 3.12, while the same directional responses normalized by their maximum values are shown in Figure 3.13.
Figure 3.13: Directional responses (at f = 300, 3000, and 7000 Hz) of a three-element line microphone array from Figure 3.11 obtained with an optimized beamformer. Directional responses are normalized such that the maximum value corresponds to 0 dB.
As one would expect, the best way to obtain the omnidirectional response d1 (φ)
in the center of the array is by taking the signal from the middle microphone, which
is what the beamformer optimization procedure ends up doing. This is verified in
Figure 3.14(a). The obtained directional response corresponds to the desired one at
all frequencies.
The dipole response d2 (φ) in the center of the array can be obtained by combining the outer two pressure microphones. Inspecting the frequency responses of the
microphone post-filters H1 (ω), H2 (ω), and H3 (ω) in Figure 3.14(b), one can see that
the beamformer design effectively does this. The high-frequency directional response
degenerates due to the high-order aliased harmonic terms, and the aliasing frequency
is determined by the distance 2d between microphones 1 and 3.
The second-order circular harmonic response d_3(φ) requires combining all three microphones, as with the differential microphone array described in Section 3.2.3. This can be seen in Figure 3.14(c). Figure 3.13 shows that the obtained directional response has the correct shape at all inspected frequencies. However, the directional response at low frequencies is highly attenuated due to the maximum filter gain constraint, as can be observed in Figure 3.12.
Finally, the directional response d_4(φ) is matched well in shape at high frequencies. Since it can be decomposed into a sum of an omnidirectional and a second-order term,
(cos φ)² = (1 + cos(2φ))/2 ,   (3.18)
the low-frequency directional response obtained with (3.16) can only match the omnidirectional part due to the maximum-gain constraint, which is apparent in Figure 3.12. From Figure 3.14(d), one can see that all three microphones are used, and that the gain constraint is active up to slightly above 1.5 kHz.
3.3.1 Discussion

In the examples shown for a simple three-microphone line array, it is evident that one can approach the problem of directional sound acquisition in the discrete-space or discrete-direction domain and obtain optimal solutions with a convex optimization solver, e.g., CVX (Grant et al., 2011). The advantage of stating the problem in a discrete form lies in the added flexibility. For instance, one can add physical constraints, such as a limit on the filters' gains to prevent high noise sensitivity, and obtain the optimal solution under these constraints. Additionally, the discrete problem requires knowledge of the microphones' directional responses at only a few points, which can be obtained with an anechoic measurement procedure. In this way, one avoids sensitivity to the microphones' imperfect characteristics and miscalibration.
3.4 Conclusions

This chapter presented an analysis of a plane-wave sound pressure field as a multivariate function of spatial location and time. In the light of this analysis, gradient and differential microphone arrays emerge as devices for approximating the sound pressure field's spatio-temporal derivatives. They therefore conceptually realize the same functionality, with the former being a special case of the latter.
The presented analysis framework enables not only analyzing the response of a given gradient or differential microphone array, but also designing differential microphone arrays. The appropriate adjustment of the microphone array's parameters, such as the array orientation and shape, the inter-microphone distances, and the microphone signal delays, enables meeting the desired response requirements.
We also showed how the traditional directional microphone array design can be
stated as a space- and frequency-discrete beamformer design problem and optimally
solved using efficient numerical procedures. The beamformer design problem offers
additional flexibility, such as relying on measured rather than theoretical microphone
directional responses, or constraining the gains of microphone post-filters in order to
prevent high noise sensitivity.
Figure 3.14: Beamformer filters used to synthesize an omnidirectional response d_1(φ) = 1 (a), dipole response d_2(φ) = cos φ (b), clover-leaf response d_3(φ) = cos(2φ) (c), and squared-cosine response d_4(φ) = (cos φ)² (d). Each panel shows the magnitudes of the three microphone post-filters H_1(ω), H_2(ω), and H_3(ω).
Chapter 4
Microphone Arrays For Sound Field Capture

4.1 Introduction

4.1.1 Background
This chapter treats microphone arrays which have received substantial attention in
recent years, with a wide range of applications commonly termed sound field capture.
One obvious application of sound field capture is recording for high-fidelity spatial
sound reproduction, and the microphone arrays built for this purpose are commonly
denoted as sound field microphones. Loosely speaking, the Blumlein XY microphone
pair is a predecessor of sound field microphones, since it has the property that the
combination of the two microphones’ directional responses—the so-called microphone
encoding functions (Poletti, 1996)—encode the captured sources in a way that naturally translates to the two-channel intensity stereo (Lipshitz, 1986). The first more
comprehensive microphone array for sound field capture was the Soundfield microphone (Gerzon, 1975; Farrar, 1979), commonly associated with Ambisonics and the
efforts to decouple the recording and reproduction stages in the production of spatial
sound (Furness, 1990). Sound field microphones of higher order have been proposed
more recently by various authors (e.g., see Cotterell, 2002; Abhayapala and Ward,
2002; Meyer and Elko, 2002; Bertet et al., 2006). However, due to their limited capabilities as wide-band devices, higher-order sound field microphones are not widely
used in live recording practice.
Another application of microphone arrays for sound field capture is the analysis of
auditory scenes (Teutsch and Kellermann, 2005, 2006) and acoustic spaces (Rafaely
et al., 2007a) in terms of directions and number of active or passive1 sound sources.
The decomposition of a sound field in terms of circular or spherical harmonic components, discussed in Section 4.2, serves as the front-end of the aforementioned analysis.
It turns the sound localization problem into a formulation equivalent to the harmonic
1 For instance, wall reflections are considered to be passive sound sources.
spectral analysis (Stoica and Moses, 1997), which is then solved using some of the
well-known spectral estimation methods.
Finally, sound field microphones can be used for the directional sound capture discussed in Chapter 3. In essence, once the captured sound field is decomposed into a
set of circular or spherical harmonic components, one can combine these components
with the so-called modal beamforming (Meyer and Elko, 2002) to obtain an arbitrary
directional response of the order defined by the fidelity of the microphone array.
4.1.2 Chapter outline
In Section 4.2, we present ways to decompose two-dimensional (“horizontal”) and three-dimensional sound fields in terms of circular and spherical harmonic components, respectively. As it turns out, these decompositions possess a straightforward relation with directional sound capture, and give a way to record a sound field using differential, circular, or spherical microphone arrays, with an angular resolution controlled by the complexity of the used microphone array.
Section 4.3 shows ways to capture harmonic components of a horizontal sound
field using microphone arrays built from gradient microphone capsules.
Section 4.4 presents circular microphone arrays, both with and without a baffle,
as devices for sampling a sound field on a circle and thereby capturing its circular
harmonic components.
Similarly, Section 4.5 presents unbaffled and baffled spherical microphone arrays,
which are used for obtaining spherical harmonic components of a three-dimensional
sound field.
Section 4.6 gives an analysis that focuses on the array-processing aspect of microphone arrays for sound field capture, where the responses of different microphones are combined to optimally achieve a desired sound field capture. We show that the advantages offered by the array-design approach, underlined in Chapter 3, hold for sound field microphone arrays, and we show a design of Soundfield microphone non-coincidence correction filters that corroborates this observation.
Conclusions are given in Section 4.7.
4.2 Wave field decomposition
The task of recording a sound field is to capture a representation of it informative enough to allow for analyzing its spatial characteristics, such as the number and distribution of active sound sources. It should also give rise to a decoding procedure that would, based on the used reproduction setup, produce the signals to be fed to the loudspeakers, such that the captured sound field is reproduced with high accuracy over a desired listening area.
The microphone setup most appropriate for capturing a sound field is dependent on
the particular spatial acoustic scenario. For instance, if sound sources are distributed
towards arbitrary directions in 3D space, spherical microphone array topologies seem
most suited. For planar problems, where sound sources are coplanar, circular arrays
provide the most intuitive option.
In Chapter 2, a common pattern appeared in all the sound radiation problems, be they in different coordinate systems, in free field, or in rooms. Each geometry gave rise to a
set of eigenfunctions of the wave equation. The particular solutions, resulting from
the radiation of active sound sources in a considered geometry, were expressed as a
superposition of the mentioned eigenfunctions.
Without knowledge of the problem geometry and the distribution of sound sources, nothing can be said about the sound field in a region of interest unless one covers the entire region with microphones and acquires the sound pressure field p(r, t).
Knowledge of the geometry goes a step further, allowing for sampling the sound field
more economically while retaining sufficient information for its perfect reconstruction.
This usually requires microphone apertures of continuous type, which to date have
not been built in practice. In order to be practical, one has to use an array of discrete
microphone elements. The topology of the array is dictated by the geometry of the
acoustic problem, while the sound field acquisition fidelity is dependent on the number
of used microphones, number of active sources, and the frequency range of interest.
4.2.1 Horizontal sound field decomposition
In the problems of sound capture and reproduction, it is very common to focus only on the horizontal plane. This is characteristic of the first stereo systems (Blumlein, 1931; Snow, 1955), Quadraphonics (Bauer, 1974), some matrix surround systems (e.g., see ITU-775, 1994), planar Ambisonics (Furness, 1990), and Wave Field Synthesis (Berkhout et al., 1993). The same can be said of the sound recording techniques used for providing content for the mentioned multichannel systems.
In many cases, the assumption that a single-plane analysis of sound suffices is not far from reality. Whether listeners are in the open air or in a room, most sound sources usually lie within a small height difference of their ears. Thus, the main sound localization task one faces in everyday life happens roughly in what is referred to as the horizontal plane, where the human localization ability is indeed most effective (Blauert, 1997).
In the following analyses, the sources are assumed to be sufficiently far away from
the analyzed region for the far-field conditions to hold, i.e., the sources can be modeled
as plane-wave radiators. Since the sound field is analyzed in the horizontal, xy-plane,
it can safely be assumed that propagation happens perpendicularly to the z-axis, and that the z-component of the wave vectors vanishes (k_z = 0).
Angular spectrum
The angular spectrum, described in Section 2.4.2, gives a plane wave description of a
sound field in a plane, and is a viable tool for analyzing a horizontal sound field. Let
P (kx , ky , ω) denote the angular spectrum in the xy-plane.2 The sound pressure field
expressed through the angular spectrum, given by (2.30), simplifies in this case to
P(x, y, ω) = (1/4π²) ∫_{−∞}^{∞} ∫_{−∞}^{∞} P(k_x, k_y, ω) e^{−i(k_x x + k_y y)} dk_x dk_y .   (4.1)
Note the sign change in the complex exponential due to the direction reversal of the wave vector k, such that it points toward the direction of sound arrival.
2 Note that the z-coordinate has been omitted, since a horizontal sound field is independent of it.
Let the angular spectrum be expressed in polar coordinates,
k_x = k_r cos ϕ
k_y = k_r sin ϕ ,
with k_r² = k_x² + k_y². If the same is done with the sound pressure field, using
x = r cos φ
y = r sin φ ,
(4.1) becomes
P(r, φ, ω) = (1/4π²) ∫₀^{2π} ∫₀^{∞} P(k_r, ϕ, ω) e^{−i k_r r cos(φ−ϕ)} k_r dk_r dϕ .   (4.2)
Furthermore, due to the one-to-one relation between the frequency ω and the radial wave number k_r, the angular spectrum P(k_r, ϕ, ω) takes a non-zero value only when k_r = ω/c. This can be represented by
P(k_r, ϕ, ω) = 2π δ(k_r − ω/c) P(ϕ, ω) .   (4.3)
We call P(ϕ, ω) the horizontal angular spectrum. Finally, the exponential term e^{−i k_r r cos(φ−ϕ)} admits the Jacobi-Anger expansion (Abramowitz and Stegun, 1976)
e^{−i k_r r cos(φ−ϕ)} = Σ_{n=−∞}^{∞} (−i)^n J_n(k_r r) e^{in(φ−ϕ)} .   (4.4)
Substituting (4.3) and (4.4) into (4.2) leads to
P(r, φ, ω) = (ω/c) Σ_{n=−∞}^{∞} (−i)^n P_n(ω) J_n(k_r r) e^{inφ} ,   (4.5)
where P_n(ω) is the nth Fourier series, or circular harmonic,3 coefficient of the 2π-periodic horizontal angular spectrum P(φ, ω), given by
P_n(ω) = (1/2π) ∫₀^{2π} P(ϕ, ω) e^{−inϕ} dϕ ,   (4.6)
with
P(φ, ω) = Σ_{n=−∞}^{∞} P_n(ω) e^{inφ} .   (4.7)
Let us first come back to the notion of a directional microphone. In Section 2.8, it
was mentioned that the directional response of a microphone d(θ, φ) determines with
which (complex) gain the microphone reacts to a plane wave coming from a given
direction (θ, φ). If a directional microphone captures a sound field composed of a
3 The complex exponential functions e^{inφ} are also called circular harmonics.
continuum of plane waves, it actually integrates the angular spectrum weighted by its
directional response. For a horizontal sound field, the signal captured by a directional
microphone can thus be expressed by
S(ω) = ∫₀^{2π} P(ϕ, ω) d(ϕ) dϕ ,   (4.8)
where d(ϕ) is the directional response of the microphone in the horizontal plane.
One can now make the following two observations about the horizontal sound field
expressed in (4.5):
• The Fourier series coefficients Pn (ω) of the angular spectrum are a complete
representation of a horizontal sound field.
• A comparison between (4.6) and (4.8) reveals that for obtaining a full description of a horizontal sound field, it suffices to have coincident recordings from
directional microphones whose directional responses are of the form e^{−inφ}, for
n ∈ Z. An equivalent and physically more meaningful combination of directional
responses involves cosines and sines, cos(nφ) and sin(nφ), with n ∈ Z+ ∪ {0}. In
that case, the Fourier series expansion of the horizontal angular spectrum takes
the form
P(φ, ω) = A_0(ω) + Σ_{n=1}^{∞} A_n(ω) cos(nφ) + Σ_{n=1}^{∞} B_n(ω) sin(nφ) ,   (4.9)
with
A_n(ω) = (ε_n/π) ∫₀^{2π} P(ϕ, ω) cos(nϕ) dϕ
B_n(ω) = (1/π) ∫₀^{2π} P(ϕ, ω) sin(nϕ) dϕ
ε_n = 1/2 for n = 0, and ε_n = 1 for n > 0 .   (4.10)
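A small numerical sketch (with an arbitrarily chosen example angular spectrum) illustrates how the coefficients in (4.10) are computed by quadrature and how (4.9) reconstructs P(φ, ω):

import numpy as np

phi = np.linspace(0.0, 2.0 * np.pi, 1024, endpoint=False)
dphi = phi[1] - phi[0]
P = 1.0 + np.cos(phi - 1.0)          # assumed example angular spectrum

N = 4
eps_n = lambda n: 0.5 if n == 0 else 1.0    # the factor from (4.10)
A = [eps_n(n) / np.pi * np.sum(P * np.cos(n * phi)) * dphi for n in range(N + 1)]
B = [1.0 / np.pi * np.sum(P * np.sin(n * phi)) * dphi for n in range(1, N + 1)]

# reconstruction per (4.9); for this example A1 = cos(1), B1 = sin(1)
P_rec = A[0] + sum(A[n] * np.cos(n * phi) for n in range(1, N + 1)) \
             + sum(B[n - 1] * np.sin(n * phi) for n in range(1, N + 1))
print("A1, B1:", A[1], B[0])
print("max reconstruction error:", np.abs(P - P_rec).max())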
Helical wave spectrum
Under the scenario characteristic of a horizontal sound field, cylindrical waves arise as
a natural analysis framework. In particular, the interior value problem in cylindrical
coordinates, described in Section 2.5.1, corresponds exactly to the horizontal sound
field of far-field sound sources.
Recall equation (2.38), which expresses the sound field inside an infinite source-free cylinder of radius b whose axis coincides with the z-axis:4
P(r, φ, z, ω) = (1/2π) Σ_{n=−∞}^{∞} e^{inφ} ∫_{−∞}^{∞} C_n(k_z, ω) e^{−i k_z z} J_n(k_r r) dk_z ,   (4.11)
4 Again, due to the direction reversal of the wave vector k, the signs inside the complex exponentials are changed.
A horizontal sound field is identical in any plane normal to the z-axis. This
makes the term inside the integral in (4.11) non-zero only when kz = 0, which can be
represented by
C_n(k_z, ω) = 2π δ(k_z) C_n(ω) .   (4.12)
Replacing (4.12) into (4.11) yields the following cylindrical wave expansion:
P(r, φ, ω) = Σ_{n=−∞}^{∞} C_n(ω) J_n(k_r r) e^{inφ}   (4.13)
= Σ_{n=−∞}^{∞} P_n(r, ω) e^{inφ} ,   (4.14)
with k_r = ω/c and P_n(r, ω) = C_n(ω) J_n(k_r r). Similarly to the helical wave spectrum defined for the exterior boundary value problem in Section 2.5.2, we denote P_n(r, ω) as the horizontal helical wave spectrum.
Note the separation between the spatial variables r and φ in (4.13). The expansion coefficients C_n(ω) are independent of the two spatial coordinates, and provide a sufficient description of a horizontal sound field.
There is a strong similarity between the representation obtained from the angular spectrum, expressed in (4.5), and the horizontal helical wave spectrum (4.13). Both express a circular harmonic expansion, and the two sets of coefficients, C_n(ω) and P_n(ω), are related through
C_n(ω) = (−i)^n (ω/c) P_n(ω) .   (4.15)
Capturing the coefficients C_n(ω) requires a continuous circular aperture of an arbitrary radius b, where each point on the aperture acts as an omnidirectional pressure-sensing element. Using the orthogonality of the circular harmonics e^{inφ}, the measurement amounts to applying a circular harmonic weighting function and summing the contributions along the aperture:
C_n(ω) = (1/(2π J_n(k_r b))) ∫₀^{2π} P(b, ϕ, ω) e^{−inϕ} dϕ .   (4.16)
Note that there is a problem with the microphone equalization factor 1/J_n(k_r b): the Bessel functions J_n(·) have oscillatory behavior around zero, meaning that for some frequencies the expansion coefficients C_n(ω) are unobtainable, or can only be obtained with high sensitivity to noise. This problem, as described later in the chapter, can be circumvented by using directional apertures (i.e., with a cardioid directional response) or by mounting microphones on a rigid cylindrical baffle. Another way of addressing the problem is to use a combination of concentric microphone apertures (e.g., two concentric apertures).
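The extraction (4.16) and the role of the equalization factor 1/J_n(k_r b) can be demonstrated numerically. In the sketch below, the aperture radius, frequency, and arrival angle are assumed example values; for a unit-magnitude plane wave, the recovered coefficients should equal C_n(ω) = (−i)^n e^{−inϕ_src}:

import numpy as np
from scipy.special import jv

c, b = 343.0, 0.1                    # assumed speed of sound, aperture radius [m]
f = 2000.0
kr = 2.0 * np.pi * f / c
phi_src = np.pi / 3                  # assumed plane-wave arrival azimuth

phi = np.linspace(0.0, 2.0 * np.pi, 2048, endpoint=False)
P = np.exp(-1j * kr * b * np.cos(phi_src - phi))   # pressure on the circle, per (4.4)

for n in range(4):
    integral = np.sum(P * np.exp(-1j * n * phi)) * (phi[1] - phi[0])
    Cn = integral / (2.0 * np.pi * jv(n, kr * b))  # equalization by 1/Jn(kr*b)
    expected = (-1j) ** n * np.exp(-1j * n * phi_src)
    # when Jn(kr*b) approaches one of its zeros, this division amplifies noise
    print(n, np.round(Cn, 4), np.round(expected, 4), f"Jn(kr*b) = {jv(n, kr * b):+.3f}")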
4.2.2 Three-dimensional sound field decomposition
In the most general case, nothing can be assumed about locations of sound sources,
apart from the fact that they enclose the volume where the sound field is analyzed,
i.e., the listening volume. These circumstances do not provide any simplifications,
such as those in the case of a horizontal sound field. From the perspective of sound
field capture, the most appropriate analysis framework involves spherical waves, and
more specifically, the interior value problem in spherical coordinates, described in
Section 2.6.1.
Recall that the sound field inside a source-free sphere of radius b centered at the
origin is given by
P(r, θ, φ, ω) = Σ_{n=0}^{∞} Σ_{m=−n}^{n} C_mn(ω) j_n(kr) Y_n^m(θ, φ) ,   (4.17)
with k = ω/c and r ≤ b.
In order to come up with the full description of the sound field inside the sphere,
it suffices to obtain the expansion coefficients Cmn (ω). Similarly to the acquisition
of the helical wave spectrum of a horizontal sound field, the spherical spectrum of a
three-dimensional sound field can be captured with a continuous, spherical aperture
of radius b, where each point on the aperture acts as a sound pressure sensor. Using
the orthogonality of the spherical harmonics Y_n^m(θ, φ),
∫₀^π ∫₀^{2π} Y_n^m(θ, φ) Y_{n′}^{m′*}(θ, φ) sin θ dφ dθ = 1 for n = n′, m = m′, and 0 otherwise ,   (4.18)
the expansion coefficients C_mn(ω) can be obtained by applying the conjugated spherical harmonic weighting functions Y_n^{m*}(θ, φ) and summing the contributions around the aperture:
C_mn(ω) = (1/j_n(ωb/c)) ∫₀^π ∫₀^{2π} P(b, θ, φ, ω) Y_n^{m*}(θ, φ) sin θ dφ dθ .   (4.19)
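The orthogonality relation (4.18) is easy to check numerically, e.g., with scipy's spherical harmonic routine (note its argument order: azimuth before polar angle); the quadrature below is a coarse Riemann sum used only for illustration:

import numpy as np
from scipy.special import sph_harm

theta = np.linspace(0.0, np.pi, 361)        # polar angle
phi = np.linspace(0.0, 2.0 * np.pi, 721)    # azimuth
T, F = np.meshgrid(theta, phi, indexing="ij")
dt, df = theta[1] - theta[0], phi[1] - phi[0]

def inner(n1, m1, n2, m2):
    # left-hand side of (4.18); sph_harm's arguments are (m, n, azimuth, polar)
    Y1 = sph_harm(m1, n1, F, T)
    Y2 = sph_harm(m2, n2, F, T)
    return np.sum(Y1 * np.conj(Y2) * np.sin(T)) * dt * df

print(abs(inner(2, 1, 2, 1)))   # close to 1: same harmonic
print(abs(inner(2, 1, 3, 1)))   # close to 0: different order n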
Note that the expansion coefficients C_mn(ω), obtained by (4.19), can be used to extrapolate the sound field outside the sphere of radius b, as long as the volume where the sound field is extrapolated is free from sound sources.
Similarly to the case of circular apertures, the expansion coefficients C_mn(ω) obtained from spherical apertures with (4.19) suffer from ill-conditioning at frequencies where the spherical Bessel function j_n(·) takes zero or very small values. This problem is circumvented by using directional apertures or mounting an aperture on a rigid baffle, as described later in this chapter. Another way to avoid the ill-conditioning involves using two concentric spherical apertures (Rafaely et al., 2007a).
4.3 Measuring a horizontal sound field with gradient microphone arrays
Conceptually, a gradient microphone of order n has a directional response proportional to (cos φ)n , where φ is the angle of arrival of an incoming wave relative to
the microphone’s look direction. As mentioned in Chapter 3, gradient microphone
arrays are a particular case of differential microphone arrays, described in Section 3.2.
Their directional response approximates the desired (cos φ)n up to a given aliasing
frequency, which depends on the inter-microphone spacing.
In the previous section, it was shown how one can record a horizontal sound field
with coincident microphones whose directional responses are sin(nφ) and cos(nφ). In
this section, we show how gradient microphones of different orders can be used to
achieve the same goal, i.e., to acquire the Fourier series coefficients An (ω) and Bn (ω)
of the horizontal angular spectrum P (φ, ω).
For gradient microphones of orders zero and one, the directivity patterns are equal
(and hence equivalent) to the circular harmonics of orders zero and one. For orders
higher than one, one can use the definition of Chebyshev polynomials of the first
kind (Abramowitz and Stegun, 1976),
T_n(cos θ) = cos(nθ) ,   (4.20)
for obtaining the representation of circular harmonics in terms of gradient microphones’ directional responses (cos θ)^m. This observation is formalized in the following
two propositions:
Proposition 1. The directivity pattern of the form cos(nθ) can be obtained as a linear combination of directivity patterns (cos θ)^m of different orders m, where m ≤ n.
Proof. The proof follows directly from the definition of the Chebyshev polynomial of
the first kind, given in (4.20).
Proposition 2. The directivity pattern of the form sin(nθ) can be obtained as a linear combination of the directivity patterns (cos(θ − π/2n))^m or (cos(θ + 3π/2n))^m of different orders m, where m ≤ n.
Proof. Using the identity
sin θ = cos(θ − π/2) ,   (4.21)
sin(nθ) can be expressed as a cosine:
sin(nθ) = cos(nθ − π/2) = cos(n(θ − π/2n)) .   (4.22)
Equivalently, sin(nθ) can be expressed as
sin(nθ) = cos(nθ + 3π/2) = cos(n(θ + 3π/2n)) .   (4.23)
Applying Proposition 1 to the right side of (4.22) or (4.23) for the angle (θ − π/2n) or (θ + 3π/2n), respectively, gives
sin(nθ) = T_n(cos(θ − π/2n))   (4.24)
and
sin(nθ) = T_n(cos(θ + 3π/2n)) ,   (4.25)
which completes the proof.
It should be noted that (cos(θ − π/2n))^m and (cos(θ + 3π/2n))^m are directivity patterns of gradient microphones of order m whose main axis is rotated relative to the x-axis by π/2n and −3π/2n, respectively.
By now, it should be clear how the circular harmonic coefficients An (ω) and Bn (ω)
can be obtained by the use of gradient microphones:
• The circular harmonic coefficient An (ω) can be obtained by linearly combining
the outputs of gradient microphones of orders up to and including n, whose axes
lie along the x-axis. The contribution of each gradient microphone is obtained
from the definition of the Chebyshev polynomial of the first kind, given in (4.20).
• The circular harmonic coefficient B_n(ω) can be obtained by linearly combining the outputs of gradient microphones of orders up to and including n, whose axes lie along the line that goes through the origin (the microphone's center) and forms the angle π/2n or the angle −3π/2n with the x-axis. The contribution of each gradient microphone is obtained by applying the expression (4.24) or (4.25), respectively.
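The weights of these linear combinations are precisely the monomial coefficients of the Chebyshev polynomials, which can be obtained programmatically; the sketch below (using numpy's Chebyshev utilities) reproduces, for n = 3, the identity cos 3θ = 4(cos θ)³ − 3 cos θ implied by (4.20):

import numpy as np
from numpy.polynomial import chebyshev

n = 3
mono = chebyshev.cheb2poly([0.0] * n + [1.0])   # monomial coefficients of T_3
print(mono)                                     # [ 0. -3.  0.  4.]

# hence cos(3*theta) = -3*cos(theta) + 4*(cos(theta))**3; the mth entry is the
# weight of the order-m gradient pattern (cos(theta))^m
theta = 0.7
lhs = np.cos(n * theta)
rhs = sum(cm * np.cos(theta) ** m for m, cm in enumerate(mono))
print(lhs, rhs)                                 # equal up to rounding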
4.3.1 Gradient-based horizontal sound field microphones
In order to show how gradient microphone arrays can be utilized for realizing higher-order horizontal sound field microphones, three horizontal sound field configurations (two of the second and one of the third order) are described in detail here. First-order horizontal sound field microphone arrays (e.g., see Elko and Pong, 1997; Merimaa, 2002; Pulkki and Faller, 2006; Kolundžija, 2007) are not discussed here.
In the following, we denote by m_i(t) and M_i(ω) the time- and frequency-domain representations of the signal captured by the microphone with index i.
(a) Second-order horizontal sound field microphone array with omnidirectional microphones
Figure 4.1: Second-order horizontal sound field microphone array consisting of five
pressure microphone elements.
The configuration shown in Figure 4.1 can be used for capturing the Fourier coefficients A_n(ω) and B_n(ω) of orders up to two at the point O. These signals are captured in the following way:
• The sound pressure signal A_0(ω), or the zero-order harmonic, is taken from microphone 1,
A_0(ω) = M_1(ω) .   (4.26)
• The first-order Fourier coefficient A_1(ω), which corresponds to the circular harmonic cos θ, is obtained by the following combination of signals from microphones 2 and 4:
A_1(ω) = (M_2(ω) − M_4(ω)) H_1(ω) ,   (4.27)
where H_1(ω) is a filter used for equalizing the high-pass frequency characteristic of a first-order gradient microphone array with an inter-microphone distance of 2d.
• The first-order Fourier coefficient B_1(ω), which corresponds to the circular harmonic sin θ, is obtained by combining signals from microphones 2, 3, 4 and 5. Namely, combining the signals from microphones 3 and 5 in the way given in (4.27) gives the signal
C_1(ω) = (M_5(ω) − M_3(ω)) H_1(ω) ,   (4.28)
whose directional response in the working frequency range is of the form e_c1(θ) = cos(θ − π/4). Furthermore, for obtaining the desired response of the form sin θ, the signals A_1(ω) and C_1(ω) need to be combined in the following way:
B_1(ω) = √2 C_1(ω) − A_1(ω) .   (4.29)
Note that the first-order Fourier series coefficients A_1(ω) and B_1(ω) can also be obtained by combining microphone pairs (1, 2) and (1, 5). This approach, described in more detail in (Kolundžija, 2007), would allow increasing the aliasing frequency (due to the shorter inter-microphone distance) at the cost of lower low-frequency sensitivity5 and response deviations at high frequencies due to non-coincidence.
• The second-order Fourier coefficients A_2(ω) and B_2(ω), which correspond to the circular harmonics cos 2θ and sin 2θ, are obtained using the identities
cos 2θ = 2 (cos θ)² − 1
sin 2θ = 2 (cos(θ − π/4))² − 1 ,   (4.30)
which relate the circular harmonics cos 2θ and sin 2θ to the directivity patterns of a pressure microphone, e_0(θ) = 1, and of second-order gradient microphones, e_g2,1(θ) = (cos θ)² and e_g2,2(θ) = (cos(θ − π/4))², respectively. The response e_0(θ) is obtained from microphone 1, the response e_g2,1(θ) by combining microphones 1, 2 and 4, and the response e_g2,2(θ) by combining microphones 1, 5 and 3. The desired signals are given by
A_2(ω) = 2 [M_2(ω) − 2M_1(ω) + M_4(ω)] H_2(ω) − M_1(ω)
B_2(ω) = 2 [M_5(ω) − 2M_1(ω) + M_3(ω)] H_2(ω) − M_1(ω) ,   (4.31)
where H_2(ω) is an equalization filter for correcting the high-pass frequency characteristic of a second-order gradient microphone array with an inter-microphone spacing of d.
(b) Third-order horizontal sound field microphone array with omnidirectional microphones
The configuration shown in Figure 4.2 can be used for capturing the Fourier series coefficients of the horizontal angular spectrum, A_n(ω) and B_n(ω), of orders up to three at the point O. These signals are captured as follows:
5 Lower sensitivity translates to higher sensitivity to noise, or a lower signal-to-noise ratio.
Figure 4.2: Third-order horizontal sound field microphone configuration with 11
pressure microphone capsules.
• Fourier coefficients A0 (ω), A1 (ω), A2 (ω) and B2 (ω) can be captured in the same
way as described for the second-order sound field microphone.
• The first-order Fourier coefficient B1 (ω) is captured more simply by combining
signals from microphones 6 and 7, in the same way as the signal A1 (ω) in (4.27):
B_1(ω) = (M_7(ω) − M_6(ω)) H_1(ω) .   (4.32)
• The third-order Fourier coefficients A_3(ω) and B_3(ω) are obtained using the identities
cos 3θ = 4 (cos θ)³ − 3 cos θ
sin 3θ = 4 (cos(θ + π/2))³ + 3 sin θ ,   (4.33)
which relate the circular harmonics cos 3θ and sin 3θ to the directivity patterns of first-order (e_g1,1(θ) = cos θ and e_g1,2(θ) = sin θ) and third-order (e_g3,1(θ) = (cos θ)³ and e_g3,2(θ) = (cos(θ + π/2))³) gradient microphones. The response e_g3,1(θ) is obtained by combining microphones 2, 4, 8 and 10, and the response e_g3,2(θ) by combining microphones 6, 7, 9 and 11; the ways to obtain the responses e_g1,1(θ) and e_g1,2(θ) are given in (4.27) and (4.32). The desired signals are then given by
A_3(ω) = 4 (M_8(ω) − 3M_2(ω) + 3M_4(ω) − M_{10}(ω)) H_3(ω) − 3 (M_2(ω) − M_4(ω)) H_1(ω)
B_3(ω) = 4 (M_9(ω) − 3M_6(ω) + 3M_7(ω) − M_{11}(ω)) H_3(ω) + 3 (M_7(ω) − M_6(ω)) H_1(ω) ,   (4.34)
where H3 (ω) is an equalization filter for correcting the high-pass frequency
characteristic of a third-order gradient microphone array with inter-microphone
spacing of d.
(c) Second-order horizontal sound field microphone array with omnidirectional and bidirectional microphones
Figure 4.3: Second-order horizontal sound field microphone array consisting of one
omnidirectional and four bidirectional microphone elements.
Figure 4.3 shows a configuration which is similar to the one shown in Figure 4.1,
but which uses one pressure microphone capsule in the center O and four bidirectional
(with a figure-of-eight directional response) capsules around the center. Capturing
the Fourier series coefficients An (ω) and Bn (ω) of orders up to two is done in a way
similar to the previous horizontal sound field microphones:
• The pressure signal A_0(ω), or the zero-order harmonic, is taken from microphone 1,
A_0(ω) = M_1(ω) .   (4.35)
• The first-order Fourier coefficient A_1(ω) is obtained by combining signals from microphones 2 and 4 as follows:
A_1(ω) = (M_2(ω) + M_4(ω)) G_1(ω) ,   (4.36)
where G_1(ω) is an equalization filter for correcting the frequency characteristic of the first-order gradient approximation obtained by averaging the responses of two bidirectional microphones spaced at the distance 2d.
• The first-order Fourier coefficient B_1(ω) is obtained by combining signals from microphones 2, 3, 4 and 5. Combining the signals from microphones 3 and 5 in the way given in (4.36) gives the signal
C_1(ω) = (M_5(ω) + M_3(ω)) G_1(ω) ,   (4.37)
whose directivity pattern is of the form e_c1(θ) = cos(θ − π/4). Furthermore, the signals A_1(ω) and C_1(ω) are combined in the following way:
B_1(ω) = √2 C_1(ω) − A_1(ω) .   (4.38)
• The second-order Fourier coefficients A_2(ω) and B_2(ω) are obtained using the identities (4.30). The response e_0(θ) is obtained from microphone 1, the response e_g2,1(θ) by combining microphones 2 and 4, and the response e_g2,2(θ) by combining microphones 5 and 3. The desired signals are given by
A_2(ω) = 2 (M_2(ω) − M_4(ω)) G_2(ω) − M_1(ω)
B_2(ω) = 2 (M_5(ω) − M_3(ω)) G_2(ω) − M_1(ω) ,   (4.39)
where G_2(ω) is an equalization filter used for correcting the frequency characteristic of a second-order gradient microphone array built from two bidirectional microphones spaced at the distance 2d.
The second-order configuration from Figure 4.3 provides better signal-to-noise
ratio than the one shown in Figure 4.1, even though they both use the same number
of microphone capsules.
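A numerical check of the extraction equations (4.26)-(4.31) for the five-omni configuration of Figure 4.1 is sketched below; the microphone layout (mics 2/4 at ±d on the x-axis, mics 5/3 at ±d along the 45° diagonal) and the simple low-frequency equalizers H_1 and H_2 are assumptions consistent with the text:

import numpy as np

c, d, f = 343.0, 0.01, 500.0           # assumed geometry and (low) frequency
k = 2.0 * np.pi * f / c
phi_s = 1.2                            # assumed plane-wave arrival azimuth

# assumed layout per Figure 4.1: mic 1 at the center O, mics 2/4 at +/-d on
# the x-axis, mics 5/3 at +/-d along the 45-degree diagonal
pos = {1: (0.0, 0.0), 2: (d, 0.0), 4: (-d, 0.0),
       5: (d / np.sqrt(2), d / np.sqrt(2)), 3: (-d / np.sqrt(2), -d / np.sqrt(2))}
M = {i: np.exp(-1j * k * (x * np.cos(phi_s) + y * np.sin(phi_s)))
     for i, (x, y) in pos.items()}

H1 = 1.0 / (-2j * k * d)               # equalizes M2 - M4 ~ -2j*k*d*cos(phi_s)
H2 = -1.0 / (k * d) ** 2               # equalizes the second difference in (4.31)

A1 = (M[2] - M[4]) * H1                             # (4.27)
C1 = (M[5] - M[3]) * H1                             # (4.28)
B1 = np.sqrt(2.0) * C1 - A1                         # (4.29)
A2 = 2.0 * (M[2] - 2 * M[1] + M[4]) * H2 - M[1]     # (4.31)

print(np.real(A1), np.cos(phi_s))      # ~ cos(phi_s)
print(np.real(B1), np.sin(phi_s))      # ~ sin(phi_s)
print(np.real(A2), np.cos(2 * phi_s))  # ~ cos(2*phi_s)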
4.4 Circular microphone arrays
In Section 4.2, we briefly described how a continuous circular microphone aperture of radius a can be used for capturing a sufficient representation of a horizontal sound field, i.e., the coefficients C_n(ω) of the helical wave spectrum (4.13). In practice, obtaining a continuous distribution of sound pressure on a circle is not possible, and a finite number of microphones is used instead.
As mentioned earlier, microphones are modeled as devices for directionally capturing sound waves at a single point. Thus, a circular array of microphones models the sampling of a continuous circular aperture. Additionally, if the used microphones are directional, then the circular microphone array models the sampling of a continuous circular aperture with the same directional response.6
The sampling operation is usually associated with band-limited signals, or signals which can be made band-limited by analog filtering prior to sampling. The sound field of a single plane wave, both in free field (4.4) and in the presence of a rigid cylinder (2.61), contains an infinite number of circular harmonics. Hence, it cannot be considered band-limited in the angular domain; moreover, there are no acoustic filters that would make it such.7
When a signal that is not band-limited needs to be sampled, one can still speak of an effective angular bandwidth. The following definition formulates the effective bandwidth of periodic functions that is used in the rest of this section.
Definition 1. A 2π-periodic function f(φ) ∈ L²([−π, π]), with the Fourier series expansion
f(φ) = Σ_{n=−∞}^{∞} F_n e^{inφ} ,   (4.40)
has ε-effective angular bandwidth N_eff if and only if
( Σ_{l=N_eff+1}^{∞} (|F_−l|² + |F_l|²) ) / ( Σ_{m=−∞}^{∞} |F_m|² ) ≤ ε .   (4.41)
6 It should be stressed that this only holds if all the microphones have the same directional response and they point outward in the radial direction.
7 Strictly speaking, one can assume that some averaging takes place across the membrane of a practical microphone, which effectively serves as a low-pass filter. However, there is no good model for this effect, and it cannot be used as a reliable way of band-limiting a sound field in the angular domain.
As an example, assume that a horizontal sound field has a −20dB-effective angular
bandwidth Neff on a circle of radius r. When this sound field is uniformly sampled
on the given circle with at least 2Neff samples, the aliasing error is below −20dB.
For deciding on a sampling scheme, or the number of microphones in the present
context, one needs to decide on the fidelity of sound field acquisition and the amount
of aliasing error incurred by sampling. In this case, fidelity denotes the number of
circular harmonics of a horizontal sound field.
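Definition 1 can be applied directly to a plane wave observed on a circle of radius a, whose Fourier coefficients F_n = (−i)^n J_n(k_r a) follow from the Jacobi-Anger expansion (4.4). A minimal sketch, assuming ε is specified as a −20 dB power ratio:

import numpy as np
from scipy.special import jv

def n_eff(kr_a, eps=10.0 ** (-20.0 / 10.0), n_max=200):
    # epsilon-effective angular bandwidth per Definition 1, for the
    # plane-wave coefficients Fn = (-i)^n * Jn(kr_a); |Fn|^2 = Jn(kr_a)^2
    n = np.arange(-n_max, n_max + 1)
    F2 = jv(n, kr_a) ** 2
    total = F2.sum()
    for N in range(n_max):
        if F2[np.abs(n) > N].sum() / total <= eps:
            return N
    return n_max

for kra in (1.0, 5.0, 10.0, 20.0):
    print(kra, n_eff(kra))             # grows roughly linearly with kr*a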
4.4.1 Continuous circular microphone apertures
Before we present three different circular microphone array types, we introduce the more general notion of a continuous circular microphone aperture. In particular, we consider three different types of circular microphone apertures:
• The unbaffled omnidirectional circular microphone aperture, where any point along the aperture acts as an omnidirectional pressure microphone.
• The unbaffled first-order circular microphone aperture, whose points act as first-order microphones facing radially outward.
• The baffled omnidirectional circular microphone aperture, which is mounted on the surface of an infinite rigid cylindrical baffle. Any point along the aperture acts as an omnidirectional pressure microphone.
(a) Unbaffled omnidirectional circular microphone aperture
Figure 4.4: Unbaffled continuous circular microphone aperture
Let a continuous omnidirectional circular microphone aperture of radius a be centered at the origin, as shown in Figure 4.4. Furthermore, let a horizontal sound field
be composed of a single plane wave with unit magnitude and frequency ω, coming
from the direction defined by azimuth ϕ,
P(r, φ, ω) = e^{−i k_r r cos(ϕ−φ)} ,   (4.42)
with k_r = ω/c.
The sound pressure along the aperture is given by the Jacobi-Anger expansion (4.4), and has the form
P(a, φ, ω) = Σ_{n=−∞}^{∞} (−i)^n J_n(k_r a) e^{in(ϕ−φ)} .   (4.43)
The sound pressure field on the aperture is composed of countably many circular harmonics e^{in(ϕ−φ)}. The circular harmonic coefficients are given by the horizontal helical wave spectrum
P_n(a, ω) = (−i)^n J_n(k_r a) .   (4.44)
Figure 4.5 shows the dependence of circular harmonic coefficients of different orders
on the value of kr a.
Figure 4.5: Magnitude of the circular harmonic coefficients P_n(a, ω), for n = 0, . . . , 4, of a single plane wave along a circle of radius a, as a function of k_r a.
From Figure 4.5, it is apparent that the sound pressure field exhibits more rapid angular changes along larger apertures and at higher frequencies. In other words, low-order harmonics dominate at low frequencies (for small values of the product k_r a), while higher-order harmonics reach prominence as the frequency increases.
At low frequencies, the gain of each circular harmonic coefficient P_n(a, ω) grows at the rate of 6n dB/oct. Additionally, the high-frequency behavior of the circular harmonic coefficients is described by the large-argument behavior of Bessel functions, given by (Abramowitz and Stegun, 1976)
J_n(x) ∼ √(2/(πx)) cos(x − nπ/2 − π/4) .   (4.45)
Hence, at high frequencies, the magnitudes of the circular harmonic coefficients exhibit ripples, whose envelope decays as ∼ 1/√(k_r a).
It is also of interest to analyze the effective angular bandwidth of the horizontal helical wave spectrum at different frequencies. Figure 4.6 shows the −20dB-effective angular bandwidth8 of the sound pressure field (4.42) for different values of k_r a.
8 Note that the effective angular bandwidth N_eff is integer-valued.
Figure 4.6: −20dB-effective angular bandwidth of a plane-wave sound field on a circle of radius a centered at the origin.
From Figure 4.6, it can be observed that the effective angular bandwidth increases linearly with frequency. At low frequencies, most of the power is contained within the first few circular harmonics, and as the frequency or the circle radius increases, one needs more circular harmonic components to be able to represent the captured sound field faithfully.
(b) Unbaffled first-order circular microphone aperture
Consider the same plane-wave sound field given by (4.42). Let a circular microphone aperture of radius a centered at the origin have a first-order, cardioid-type directional response. This implies that an infinitesimal element of the aperture, located at (a, φ), has a directional response d(ϕ) = α + (1 − α) cos(ϕ − φ), and that the signal along the aperture is given by
K(a, φ, ω) = [α + (1 − α) cos(ϕ − φ)] e^{−i k_r a cos(ϕ−φ)}
= [α + (1 − α) i ∂/∂(k_r a)] e^{−i k_r a cos(ϕ−φ)} .   (4.46)
Substituting (4.4) in (4.46) gives
K(a, φ, ω) = Σ_{n=−∞}^{∞} (−i)^n [α J_n(k_r a) + (1 − α) i J_n′(k_r a)] e^{in(ϕ−φ)} .   (4.47)
As in the case of the omnidirectional circular microphone aperture, the signal captured along the first-order one is composed of infinitely many circular harmonics. Figure 4.7 shows the gains of the circular harmonic coefficients K_n(a, ω) = (−i)^n (α J_n(k_r a) + (1 − α) i J_n′(k_r a)).
From Figure 4.7, it can be observed that the circular harmonic coefficients K_n(a, ω) do not have ripples at high frequencies. This circumvents the ill-conditioning of the equalization needed for obtaining the cylindrical wave expansion coefficients C_n(ω). Additionally, both the zero- and first-order circular harmonics have constant low-frequency gains, avoiding noise sensitivity when extracting these two circular harmonics at any frequency.
Figure 4.7: Magnitude of the circular harmonic coefficients K_n(a, ω), for n = 0, . . . , 4, of a single plane wave captured with a cardioid circular microphone aperture of radius a and parameter α = 0.5.
The −20dB-effective angular bandwidth is very similar to that of the omnidirectional circular microphone aperture.
(c) Baffled omnidirectional circular microphone aperture
Figure 4.8: Continuous circular microphone aperture mounted on a rigid cylindrical baffle.
Sound scattering from rigid cylinders was presented in Section 2.5.5. Recall that
scattering was analyzed for a single-plane-wave radiation with kz = 0, which matches
the description of a horizontal sound field. If the plane wave arrives from angle ϕ,
the sound field in the presence of a rigid cylindrical scatterer of radius a is given by
(2.61). It then follows that the sound pressure field observed on an omnidirectional circular microphone aperture mounted on the surface of an infinite rigid cylinder of radius a, shown in Figure 4.8, is given by9

P(a, θ, φ, ω) = Σ_{n=−∞}^{∞} (−i)^n [Jn(ka) − (Jn′(ka)/Hn′(ka)) Hn(ka)] e^(in(ϕ−φ)) .   (4.48)
Compared to the free-field case (4.44), the circular harmonic coefficients

Pn(a, ω) = (−i)^n [Jn(ka) − (Jn′(ka)/Hn′(ka)) Hn(ka)]

contain an additional term (Jn′(ka)/Hn′(ka)) Hn(ka), which denotes the contribution of the scattered sound field. Figure 4.9 illustrates the magnitude of the circular harmonic coefficients Pn(a, ω) for different values of the wave-number–radius product kr a.
[Plot: curves for n = 0, 1, 2, 3, 4; magnitude in dB versus kr a.]
Figure 4.9: Magnitude of the circular harmonic coefficients Pn (a, ω) of a single plane
wave on the surface of a rigid cylindrical baffle.
From Figure 4.9, it is apparent that the circular harmonic coefficients do not oscillate at high frequencies, but roughly follow the Bessel function envelope ∼ 1/√(kr a).
As in the case of a cardioid aperture, these functions circumvent the ill-conditioning
associated with equalizing Bessel functions when extracting the cylindrical wave expansion coefficients Cn (ω). This makes baffled circular apertures more convenient for
capturing a horizontal sound field.
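For illustration, the following sketch (again assuming scipy; the helper name is hypothetical) evaluates the baffled circular harmonic coefficients of (4.48); their magnitudes follow the smooth, ripple-free envelope shown in Figure 4.9:

import numpy as np
from scipy.special import jv, jvp, hankel1, h1vp

def baffled_coeff(n, ka):
    # Free-field term minus scattering term; the latter removes the
    # high-frequency ripples of J_n(ka), cf. (4.48)
    return jv(n, ka) - jvp(n, ka) / h1vp(n, ka) * hankel1(n, ka)

ka = np.logspace(-1, 1, 5)  # k_r a from 0.1 to 10
for n in range(3):
    mags = 20 * np.log10(np.abs(baffled_coeff(n, ka)))
    print(f"n = {n}:", np.round(mags, 1), "dB")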
Figure 4.10 shows the difference between the −20 dB-effective angular bandwidths
of baffled and unbaffled circular omnidirectional microphone apertures,
ΔNeff(kr a) = Neff^baffled(kr a) − Neff^unbaffled(kr a) .
From Figure 4.10, it can be seen that the effective angular bandwidth of the sound
pressure field along the baffled aperture is slightly increased at all frequencies. It also
supports the argument that a baffle increases the effective radius of the microphone
aperture (Teutsch and Kellermann, 2006).
9 Note the sign change in the term e^(in(ϕ−φ)), which is a consequence of changing the direction of the wave vector k.
Figure 4.10: The difference ∆Neff (kr a) between the −20dB-effective angular bandwidth of a plane-wave sound field on a circular omnidirectional microphone aperture
of radius a with and without a rigid cylindrical baffle.
4.4.2 Sampling circular microphone apertures
The three previously described continuous circular microphone apertures serve as a
basis for analyzing the following three types of microphone arrays:
• Circular omnidirectional microphone arrays in free field.
• Circular cardioid microphone arrays in free field.
• Circular omnidirectional microphone arrays mounted on a rigid cylindrical baffle.
In essence, these microphone arrays are sampled versions of the corresponding continuous circular microphone apertures.
Let a continuous circular microphone aperture of any presented type have radius
a and be centered at the origin. Generally put, the microphone aperture captures
the horizontal sound field P (a, φ, ω) in a direction-dependent fashion. Denote by
X(a, φ, ω) the signal observed along the aperture.10
Let a microphone array of N uniformly-spaced microphones sample the circular
aperture. The sampling operation can be modeled as multiplication between the
signal X(a, φ, ω) observed along the aperture, and the sampling function (Poletti,
2000)
ΔN(φ) = (2π/N) Σ_{m=−∞}^{∞} δ(φ − (2π/N) m) = Σ_{m=−∞}^{∞} e^(imNφ) .   (4.49)
10 In the three analyzed cases, X(a, φ, ω) can take the form (4.42), (4.46), or (4.48).
The circular harmonic coefficients Xn^s(a, ω) obtained by a sampled circular aperture are then given by

Xn^s(a, ω) = (1/2π) ∫_0^{2π} X(a, φ, ω) ΔN(φ) e^(−inφ) dφ
           = (1/2π) ∫_0^{2π} X(a, φ, ω) (2π/N) Σ_{m=−∞}^{∞} δ(φ − (2π/N) m) e^(−inφ) dφ
           = (1/N) Σ_{m=0}^{N−1} X(a, 2πm/N, ω) e^(−i 2πmn/N) .   (4.50)
The expression for the nth sampled circular harmonic coefficient Xns (a, ω) corresponds
to taking the Discrete Fourier Transform (DFT) of the uniformly-spaced samples of
the sound pressure field around a circle of radius a. The circular microphone arrays
are sometimes referred to as DFT arrays (Poletti, 2000).
Equivalently, one can use the exponential sum in (4.49) to obtain the following expression for the sampled circular harmonic coefficient Xn^s(r, ω):

Xn^s(r, ω) = (1/2π) ∫_0^{2π} X(r, φ, ω) Σ_{m=−∞}^{∞} e^(imNφ) e^(−inφ) dφ
           = Σ_{m=−∞}^{∞} X_{n−mN}(r, ω) .   (4.51)
In (4.51), apart from the desired circular harmonic coefficient Xn(r, ω) obtained for m = 0, the sampled circular harmonic coefficient Xn^s(r, ω) contains the aliasing terms X_{n−mN}(r, ω). Thus, in order to keep the power of the aliasing terms low, one needs to choose a large enough number of microphones or decrease the radius of the aperture. There is, however, a caveat to the latter: the smaller the aperture, the weaker the magnitude of the circular harmonic coefficients, and the more susceptible to noise they are.
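The following sketch (Python with scipy; all parameter values are illustrative) demonstrates (4.50) and (4.51): a single plane wave is sampled at N points on a circle, the coefficients Xn^s are obtained with a DFT, and comparing them with the true Fourier coefficients of the field, (−i)^n Jn(kr a) e^(−inϕ), exposes the small aliasing terms:

import numpy as np
from scipy.special import jv

N = 8                       # number of microphones (example value)
a, f, c = 0.05, 3000.0, 343.0
kra = 2 * np.pi * f / c * a
phi = 2 * np.pi * np.arange(N) / N        # uniform sampling angles, cf. (4.49)
varphi = 0.3                              # plane-wave arrival angle (example)

x = np.exp(-1j * kra * np.cos(varphi - phi))   # field samples, cf. (4.42)
Xs = np.fft.fft(x) / N                         # X_n^s for n = 0, 1, ..., N-1, cf. (4.50)

for n in range(4):
    true = (-1j) ** n * jv(n, kra) * np.exp(-1j * n * varphi)
    print(f"n={n}: sampled {Xs[n]:+.4f}, true {true:+.4f}")
# The differences are the aliasing terms X_{n-mN} of (4.51).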
A DFT microphone array can measure a finite number of circular harmonic coefficients, with maximum order

M = N/2 − 1 for N = 2l , and M = (N − 1)/2 for N = 2l + 1 .   (4.52)

For example, an array of N = 7 microphones can measure circular harmonics up to order M = 3.
4.5 Spherical microphone arrays
Section 4.2 presented a 3D sound field analysis framework based on the interior boundary value problem in spherical coordinates. It was briefly mentioned how continuous
spherical microphone apertures could be used for capturing a complete representation
of a 3D sound field of far-field sound sources.
Similarly to Section 4.4 on circular microphone arrays, this section gives an analysis of 3D sound field capture with continuous spherical apertures, including free-field
and baffled omnidirectional apertures, and a free-field cardioid aperture. The spherical microphone arrays can be seen as setups used for sampling the corresponding
continuous spherical apertures.
Before moving on to the analysis of spherical microphone apertures, we introduce the notion of effective bandwidth in the spherical harmonic domain. Similarly to the case of a horizontal sound field, the motivation for defining the effective bandwidth follows from the fact that a far-field source has countably many spherical harmonic components, as seen in (2.81). However, for a limited range of frequencies, it turns out that most of the sound field's power is contained in a finite number of spherical harmonics, often only a few, making it nearly band-limited in the spherical harmonic domain.
Definition 2. A function f(θ, φ) on a unit sphere, with the spherical harmonic expansion

f(θ, φ) = Σ_{n=0}^{∞} Σ_{m=−n}^{n} Fmn Yn^m(θ, φ) ,   (4.53)

has ε-effective spherical harmonic bandwidth Neff if and only if

( Σ_{l=Neff+1}^{∞} Σ_{m=−l}^{l} |Fml|² ) / ( Σ_{l=0}^{∞} Σ_{m=−l}^{l} |Fml|² ) ≤ ε .   (4.54)
Similarly to the effective bandwidth on a circle, one can associate the effective bandwidth on a sphere with the power of the aliasing error introduced by sampling. In other words, if a function f(θ, φ) has ε-effective spherical harmonic bandwidth Neff, then the power of the error due to aliasing when uniformly sampling the unit sphere with at least M = (Neff + 1)² samples11 is bounded by ε. The decision on the number of sampling points should be guided by the number of spherical harmonics one seeks to capture, and the tolerable amount of aliasing error.
4.5.1 Continuous spherical microphone apertures
Similarly to Section 4.4, this section focuses on three types of spherical microphone
arrays, namely with omnidirectional and cardioid microphones in free field, and with
omnidirectional microphones mounted on a rigid sphere. For analyzing these microphone arrays, it is instructive to consider continuous generalizations, i.e., continuous
spherical microphone apertures:
• Unbaffled omnidirectional spherical microphone aperture, where any point on
the aperture acts as an omnidirectional pressure microphone.
• Unbaffled first-order spherical microphone aperture, whose points act as first-order microphones facing radially outward.
• Baffled omnidirectional spherical microphone aperture, which is mounted around
the surface of a perfectly rigid spherical baffle. Any point on the aperture acts
as an omnidirectional pressure microphone.
Figure 4.11: Continuous spherical microphone aperture
(a) Unbaffled omnidirectional spherical microphone aperture
Let a continuous omnidirectional spherical microphone aperture of radius a be centered at the origin, as shown in Figure 4.11. Furthermore, let the captured sound field
be composed of a single plane wave with unit magnitude and frequency ω, coming
from the direction defined by (ϑ, ϕ). The sound pressure field is given by
P(r, θ, φ, ω) = e^(−i kr r (sin θ sin ϑ cos(ϕ−φ) + cos θ cos ϑ)) ,   (4.55)

with kr = ω/c.
We have seen in Section 2.6 that the sound pressure on the spherical aperture admits the spherical harmonic expansion (2.81) of the form

P(a, θ, φ, ω) = 4π Σ_{n=0}^{∞} (−i)^n jn(kr a) Σ_{m=−n}^{n} Yn^m(θ, φ) Yn^m(ϑ, ϕ)* .   (4.56)
In words, the plane-wave sound pressure field on a sphere is composed of countably
many spherical harmonics Ynm (θ, φ), whose strength is given by the spherical harmonic
coefficients
Cmn(a, ω) = 4π (−i)^n jn(kr a) Yn^m(ϑ, ϕ)* .   (4.57)
In order to analyze the effective bandwidth on a sphere of a sound field due to a single plane wave, one can use the angular power spectrum. The angular power spectrum Sn(kr a) quantifies the aggregate power of the spherical harmonics of degree n, and is given by

Sn(kr a) = (1/4π) Σ_{m=−n}^{n} |Cmn(a, ω)|² .   (4.58)
11 It is well known that uniformly sampling a sphere is possible only with specific topologies defined by the so-called platonic solids. However, there are schemes which come very close to uniform sampling for an arbitrary number of points (Sloane et al.).
Using Unsöld's theorem (Unsöld, 1927; Arfken et al., 1985)

Σ_{m=−n}^{n} Yn^m(θ, φ) Yn^m(θ, φ)* = (2n + 1)/(4π)   (4.59)

in (4.58), the angular power spectrum evaluates to

Sn(kr a) = (2n + 1) jn(kr a)² .   (4.60)
Figure 4.12 shows the angular power spectrum for different values of kr a.
[Plot: curves for n = 0, 1, 2, 3, 4; magnitude in dB versus kr a.]
Figure 4.12: Angular power spectrum Sn (kr a) of a plane wave sound field on a
sphere of radius a at different frequencies.
From Figure 4.12, it is apparent that the sound pressure field exhibits more rapid
changes along larger spherical apertures and at higher frequencies. In other words,
low-order harmonics dominate at low frequencies (for small values of the product kr a),
while higher-order spherical harmonics become equally powerful as the frequency increases. In addition, the low-frequency gain of each angular power spectral coefficient
Sn (a, ω) grows at the rate of 6n dB/oct.
Figure 4.13 shows the −20dB-effective spherical harmonic bandwidth of the sound pressure field (4.55) for different values of kr a. From Figure 4.13, it can be observed that the effective spherical harmonic bandwidth increases linearly with frequency, similarly to circular microphone apertures. Furthermore, at low frequencies most of the power is contained within the first few spherical harmonics, and the number of important spherical harmonics increases as the frequency increases.
(b) Unbaffled first-order spherical microphone aperture
Consider the same plane-wave sound field given by (4.55). Let now a free-field spherical aperture of radius a and centered at the origin have a first-order, cardioid-type
directional response. It implies that an infinitesimal element of the aperture, located
at (a, θ, φ), has a directional response
d(ϑ, ϕ) = α + (1 − α)(sin ϑ sin θ cos(ϕ − φ) + cos ϑ cos θ) ,
(4.61)
Figure 4.13: −20dB-effective spherical harmonic bandwidth of a plane-wave sound
field on a sphere of radius a centered at the origin.
and similarly to the first-order circular aperture, the signal captured on the aperture is given by

K(a, θ, φ, ω) = [α + (1 − α) i ∂/∂(kr a)] e^(−i kr a (sin ϑ sin θ cos(ϕ−φ) + cos ϑ cos θ)) .   (4.62)
Using the spherical harmonic expansion of a plane wave (4.56) in (4.62) gives

K(a, θ, φ, ω) = 4π Σ_{n=0}^{∞} (−i)^n [α jn(kr a) + (1 − α) i jn′(kr a)] Σ_{m=−n}^{n} Yn^m(θ, φ) Yn^m(ϑ, ϕ)* .   (4.63)
As in the case of an omnidirectional spherical microphone aperture, the signal
captured on the first-order one is composed of infinitely many spherical harmonics.
The angular power spectrum (4.58), given by

Sn(kr a) = 4(2n + 1) |α jn(kr a) + (1 − α) i jn′(kr a)|² ,

is shown in Figure 4.14.
From Figure 4.14, it can be observed that the angular power spectral coefficients
Sn (kr a) do not exhibit high-frequency ripples. This circumvents the ill-conditioning
problems when extracting spherical harmonic components of a wave field, which is
characteristic of unbaffled omnidirectional spherical microphone apertures. Additionally, both the zero- and first-degree spherical harmonics have constant low-frequency
gains, thus avoiding noise sensitivity when extracting these spherical harmonics.
The −20dB-effective bandwidth is very similar to that of the omnidirectional
spherical microphone aperture.
(c) Baffled omnidirectional spherical microphone aperture
Sound scattering from rigid spheres was analyzed in Section 2.6.3, and the analysis
was given for an incoming sound field composed of a single plane wave arriving from
direction (ϑ, ϕ).
[Plot: curves for n = 0, 1, 2, 3, 4; magnitude in dB versus kr a.]
Figure 4.14: Angular power spectrum Sn (kr a) of a plane wave sound field captured
by a cardioid spherical microphone aperture of radius a and parameter α = 0.5 at
different frequencies.
The sound field around a rigid spherical scatterer of radius a is given by (2.85).
Thus, if an omnidirectional spherical microphone aperture is mounted on the surface
of a rigid sphere, the sound pressure observed on the aperture is given by
P(a, θ, φ, ω) = 4π Σ_{n=0}^{∞} (−i)^n [jn(kr a) − (jn′(kr a)/hn′(kr a)) hn(kr a)] Σ_{m=−n}^{n} Yn^m(θ, φ) Yn^m(ϑ, ϕ)* .   (4.64)
Compared to the case of an unbaffled spherical microphone aperture, the spherical harmonic coefficients of the sound pressure field on the baffled omnidirectional aperture, given by

Cmn(a, ω) = 4π (−i)^n [jn(kr a) − (jn′(kr a)/hn′(kr a)) hn(kr a)] Yn^m(ϑ, ϕ)* ,

contain an additional term (jn′(kr a)/hn′(kr a)) hn(kr a), which denotes the contribution of the scattered sound field.
The angular power spectrum (4.58) of the sound field resulting from scattering from a rigid spherical baffle is given by

Sn(kr a) = (2n + 1) |jn(kr a) − (jn′(kr a)/hn′(kr a)) hn(kr a)|² ,

and it is shown in Figure 4.15.
From Figure 4.15, it is apparent that the angular power spectral coefficients
Sn (kr a) do not oscillate at high frequencies. Viewed as filters, these functions have
better behaved inverses, making baffled apertures more convenient for capturing the
spherical harmonics of a 3D sound field.
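A compact numerical comparison of the three spherical apertures follows (a sketch assuming scipy; the spherical Hankel helper is defined inline since scipy exposes only the spherical jn and yn; the factor 4 in the cardioid case follows the expression given in the text):

import numpy as np
from scipy.special import spherical_jn, spherical_yn

def sph_hankel1(n, x, derivative=False):
    # h_n = j_n + i y_n (spherical Hankel function of the first kind)
    return spherical_jn(n, x, derivative) + 1j * spherical_yn(n, x, derivative)

def S_free(n, x):                       # cf. (4.60)
    return (2 * n + 1) * spherical_jn(n, x) ** 2

def S_cardioid(n, x, alpha=0.5):
    b = alpha * spherical_jn(n, x) + (1 - alpha) * 1j * spherical_jn(n, x, True)
    return 4 * (2 * n + 1) * np.abs(b) ** 2

def S_baffled(n, x):
    b = spherical_jn(n, x) \
        - spherical_jn(n, x, True) / sph_hankel1(n, x, True) * sph_hankel1(n, x)
    return (2 * n + 1) * np.abs(b) ** 2

x = 5.0  # k_r a
for n in range(5):
    print(n, f"free {10*np.log10(S_free(n, x)):6.1f} dB,",
          f"cardioid {10*np.log10(S_cardioid(n, x)):6.1f} dB,",
          f"baffled {10*np.log10(S_baffled(n, x)):6.1f} dB")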
Figure 4.16 shows the difference between the −20 dB-effective spherical harmonic
bandwidths of baffled and unbaffled spherical omnidirectional microphone apertures,
ΔNeff(kr a) = Neff^baffled(kr a) − Neff^unbaffled(kr a) .
[Plot: curves for n = 0, 1, 2, 3, 4; magnitude in dB versus kr a.]
Figure 4.15: Angular power spectrum Sn (kr a) of a plane wave sound field on a rigid
spherical baffle of radius a at different frequencies.
Figure 4.16: The difference ∆Neff (kr a) between the −20dB-effective spherical harmonic bandwidth of a plane-wave sound field on a spherical omnidirectional microphone aperture of radius a with and without a rigid spherical baffle.
From Figure 4.16, it can be seen that the effective spherical harmonic bandwidth of
the sound pressure field on the baffled aperture is slightly increased at all frequencies.
This fact also follows from the known observation that a spherical baffle provides a
virtual enlargement of the microphone aperture.
4.5.2 Sampling spherical microphone apertures
The analysis of the effects of sampling on a sphere, characteristic of spherical microphone arrays, is much more involved than in the case of the uniform sampling on a circle used with circular apertures. This is partly due to the non-existence of general uniform sphere sampling strategies, other than the five special cases of the so-called platonic solids.
There are a number of different sphere sampling schemes, which differ by the number of sampling points and implementation complexity. Here we only briefly
present three sampling schemes, known under the names of equiangle sampling, Gaussian sampling, and spherical t-design sampling (Rafaely, 2005).
All the presented sampling strategies assume that the sampled function is band-limited in the spherical harmonic domain. Denoting by N the spherical harmonic bandwidth of a function f(θ, φ), this implies that Fmn = 0 for n > N.
(a) Equiangle sampling
The equiangle sampling, described in (Driscoll and Healy, 1994), requires M = 4(N + 1)² samples on a sphere. It is thus within a constant factor of the theoretical minimum of (N + 1)² samples. Both angles, θ and φ, are sampled uniformly in 2(N + 1) points:

θj = jπ/(2(N + 1)) ,    j = 0, … , 2N + 1
φl = 2lπ/(2(N + 1)) ,   l = 0, … , 2N + 1 .   (4.65)
The discrete approximation of the spherical harmonic transform with equiangle sampling takes the form

Fmn = Σ_{j=0}^{2N+1} Σ_{l=0}^{2N+1} αj f(θj, φl) Yn^m(θj, φl)* ,   (4.66)

where the weights αj depend on the θ-coordinate of a sampling point; the values of αj are provided in (Driscoll and Healy, 1994).
The downside of this sampling scheme is the excess number of required samples.
However, it gives rise to efficient processing algorithms (Driscoll and Healy, 1994).
(b) Gaussian sampling
Gaussian sampling is slightly more efficient, but still suboptimal; it requires M = 2(N + 1)² samples. The angle φ is sampled uniformly in 2(N + 1) points, while the angle θ is sampled non-uniformly in N + 1 points, which are extracted from the zeros of the Legendre polynomial P_{N+1}(cos θ). More precisely, if {ν0, … , νN} are the zeros of the Legendre polynomial P_{N+1}(x), the Gaussian sampling points are defined by

θj = cos⁻¹ νj ,      j = 0, … , N
φl = lπ/(N + 1) ,    l = 0, … , 2N + 1 .   (4.67)
The approximation of the spherical harmonic transform for Gaussian sampling takes the form

Fmn = Σ_{j=0}^{N} Σ_{l=0}^{2N+1} αj f(θj, φl) Yn^m(θj, φl)* ,   (4.68)

where the weights αj depend on the θ-coordinate of a sampling point; they can be taken from tables given in (Arfken et al., 1985).
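A minimal sketch of Gaussian sampling follows (assuming numpy/scipy; the weight normalization shown here is one standard Gauss–Legendre choice and may differ from the tabulated αj by constant factors). It generates the nodes (4.67) and verifies that the discrete transform (4.68) recovers the coefficient of a band-limited test function:

import numpy as np
from numpy.polynomial.legendre import leggauss
from scipy.special import sph_harm  # note scipy's argument order: (m, n, azimuth, polar)

N = 3                                         # spherical harmonic bandwidth
nu, w = leggauss(N + 1)                       # zeros/weights of P_{N+1}, cf. (4.67)
theta = np.arccos(nu)                         # N+1 polar angles
phi = np.pi * np.arange(2 * N + 2) / (N + 1)  # 2(N+1) uniform azimuths

# Band-limited test function: a single harmonic Y_2^1
f = sph_harm(1, 2, phi[None, :], theta[:, None])

def F(m, n):
    # Discrete SHT, cf. (4.68); the azimuthal sum is a DFT, so every sample
    # additionally carries the uniform weight 2*pi / (2(N+1)) = pi/(N+1)
    Y = sph_harm(m, n, phi[None, :], theta[:, None])
    return np.sum(w[:, None] * f * np.conj(Y)) * np.pi / (N + 1)

print(abs(F(1, 2)))   # ~1 (the coefficient that was put in)
print(abs(F(0, 1)))   # ~0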
(c) Spherical t-design sampling
Perfectly uniform sampling of a sphere is possible only for M = {4, 6, 8, 12, 20}. It was
already mentioned that for sampling a function with spherical harmonic bandwidth
of N , one needs at least M = (N + 1)2 samples.
Hardin and Sloane (1996) used the notion of a spherical t-design, which denotes a set of M points {(θ1, φ1), … , (θM, φM)} on a unit sphere such that the identity

(1/4π) ∫_0^π ∫_0^{2π} f(cos θ, φ) sin θ dφ dθ = (1/M) Σ_{l=1}^{M} f(θl, φl)   (4.69)
holds for any polynomial f of degree not greater than t. Using numerical optimization,
they were able to obtain sets of points which satisfy (4.69) with high accuracy for
different M , and those are tabulated in (Sloane et al.).
For functions of spherical harmonic bandwidth N, spherical t-design sampling schemes with ∼1.5(N + 1)² points provide a good approximation of the sampling condition (Rafaely, 2005)

Σ_{j=1}^{M} αj Yn′^m′(θj, φj) Yn^m(θj, φj)* = δ_{n−n′} δ_{m−m′} .   (4.70)
This is still not close to the theoretical optimum, but in terms of sampling efficiency,
it is the best of the three techniques mentioned here.
4.6 Sound field microphones as acoustic beamformers
In the preceding sections, it was shown how microphone arrays are designed starting from an analytical solution of the wave equation for different geometries. Each
microphone array design used microphone elements with perfect omnidirectional, bidirectional, or cardioid directional responses at all frequencies.
Baffled circular and spherical microphones have some commonalities. On the one hand, a baffle provides better conditioning of the circular and spherical harmonics by eliminating the ripples at high frequencies, and provides a virtual aperture enlargement by increasing the low-frequency gain of the said harmonic components. On the other hand, a baffle casts an acoustic shadow, increasingly so as the frequency of the incoming sound waves increases. The baffle-microphone combination can thus be seen as a microphone with a frequency-dependent directional response, whose directivity increases with frequency. As an example, Figure 4.17 shows directional responses of a baffled pressure microphone at different frequencies.
In the previous sections, baffled microphone array design was considered as a sampling problem: the function being sampled on a circle or sphere is a sound field whose properties are changed by the presence of a baffle. However, one could equally view the problem of baffled microphone array design as that of achieving the same design goals with a set of microphones with frequency-dependent directional responses. Moreover, this perspective can in many cases be more practical, as one never has ideal pressure or directional microphones, and even less a perfectly rigid and ideally shaped object to serve as a baffle.
[Plot: polar responses, panels (a) and (b), at f = 150 Hz, 1500 Hz, and 15000 Hz.]
Figure 4.17: Directional response of a microphone mounted on (a) cylindrical and
(b) spherical baffle at different frequencies.
In this section, we focus on the latter perspective, and show a microphone array
design approach that relies on fewer assumptions about the used microphones or
enclosures. The design relies on obtaining responses of the array elements on a set
of points (possibly far-field) enclosing the array, which can be a circle or sphere,
depending on the array geometry and the spatial characteristic of the captured sound
field. These responses can be obtained using a theoretical model, some of which have
already been shown, or through anechoic measurements.
We show two examples of designing microphone array filters by optimally synthesizing a given directional response on a set of control points. The first example
involves designing microphone array filters for a circular microphone array mounted
on a perfectly rigid infinite cylindrical baffle, and is optimized for directional acquisition of a horizontal sound field. The second example presents a way to design the
so-called Soundfield microphone non-coincidence compensation filters.
There are a number of reasons to use array techniques for designing microphone
arrays. The theory presented in Sections 4.4 and 4.5 gives closed-form solutions in
some very specific, idealized cases, but when one builds a microphone array, it is not
necessarily well modelled with any of them. In other words, one can usually have
only a general idea about the microphone array and the way it interacts with a sound
field, such as for instance an idea about the effective circular or spherical harmonic
bandwidth of the measured sound field at different frequencies.
Additionally, even when there is a good model of the system, an imperfect manufacturing process can render the model less accurate.
Therefore, an array design which takes into account the knowledge about the effective circular bandwidth of the measured sound field, together with a set of acoustic channel responses from control points to microphones,12 is more flexible and performs comparably to or better than the model-based theoretical designs described earlier in this chapter.
12 The responses are preferably obtained through anechoic measurements in order to circumvent
model inaccuracies.
4.6.1 Filter design for a circular microphone array mounted on a rigid cylinder
Consider an array of M = 7 pressure microphones assembled in a uniform circular
topology at the surface of an infinite, perfectly rigid cylindrical baffle. The microphone
array is designed to capture and analyze a horizontal sound field, and thus needs to
be able to extract its circular harmonic components.
As a rule of thumb (e.g., see Van Trees, 2002), which can also be verified in Figure 4.6, the effective bandwidth of a circular aperture is approximately given by

Neff(r, ω) ≈ (ω/c) r .   (4.71)
This implies that the given microphone array can effectively capture circular harmonic
components of the third and lower orders up to frequency fmax ≈ 3.3 kHz.
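As a quick sanity check (assuming the speed of sound c = 343 m/s and the array radius a = 5 cm quoted in the caption of Figure 4.18), setting Neff = 3 in (4.71) gives

fmax = Neff c / (2π a) = 3 · 343 / (2π · 0.05) ≈ 3.3 kHz ,

in agreement with the value above.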
Considering any of the expressions for a sound field on a circular aperture,13 one
can notice that the function is symmetric with respect to angles (ϑ, ϕ) and (θ, φ)
of plane wave incidence and analysis point, respectively. Taking a different perspective and focusing on microphones’ directional response, where the variable angles are
(ϑ, ϕ), the function stays the same and all the properties of a sound field on a circular aperture, such as the effective angular bandwidth, are unchanged. Therefore,
the sampling of microphone directional responses, or the decision on the number of
control points, needs to take the same effective bandwidth Neff into account. In light
of the fact that some aliasing is inevitable, the robustness of the microphone array
filter design is improved by oversampling. Hence, in this example we use N = 13
uniformly spaced directions in the angular interval [0, 2π].
Figure 4.18 shows the directional responses corresponding to circular harmonics cos(nφ) of orders n = 0, 1, 2, 3, obtained by solving the following array optimization problem:

minimize    ‖ |G(ω) H(ω)| − |D| ‖2
subject to  |H(ω)| ≤ Hmax 1_{M×1} ,   (4.72)

with Hmax = 20 dB, where the inequality is element-wise. In (4.72), G(ω) is an N × M matrix of the microphones' directional responses in the control directions, H(ω) is the vector of microphone post-filters, and D is the desired directional response. Note that (4.72) is solved using Algorithm 2.1.
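Algorithm 2.1 is not restated here, but the flavor of the magnitude-only design (4.72) can be sketched with a common alternating scheme: fix the phase of the current response, solve a regularized least-squares problem, and project the filter gains onto the constraint. The code below (Python/numpy; all names and the random test matrix are illustrative stand-ins, not the thesis's implementation) is such a sketch:

import numpy as np

def magnitude_ls(G, D_mag, hmax_db=20.0, iters=50, reg=1e-6):
    """G: N x M complex channel matrix, D_mag: length-N desired magnitudes."""
    hmax = 10 ** (hmax_db / 20.0)
    M = G.shape[1]
    H = np.linalg.lstsq(G, D_mag.astype(complex), rcond=None)[0]
    for _ in range(iters):
        phase = np.exp(1j * np.angle(G @ H))      # phase of the current response
        target = D_mag * phase                    # magnitude target with that phase
        A = G.conj().T @ G + reg * np.eye(M)      # Tikhonov-regularized normal eqs.
        H = np.linalg.solve(A, G.conj().T @ target)
        mag = np.abs(H)                           # project onto the gain constraint
        H = H * np.minimum(1.0, hmax / np.maximum(mag, 1e-12))
    return H

# Toy usage with a random channel matrix (placeholder for measured responses)
rng = np.random.default_rng(0)
G = rng.standard_normal((13, 7)) + 1j * rng.standard_normal((13, 7))
D_mag = np.abs(np.cos(3 * 2 * np.pi * np.arange(13) / 13))   # |cos(3 phi)| targets
H = magnitude_ls(G, D_mag)
print(np.round(np.abs(G @ H), 3))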
Apart from the highest-order harmonic components at low frequencies, the synthesized directional responses correspond well to the desired responses. At low frequencies, the highest-order harmonics cannot be well approximated due to the maximum-gain constraints; had these constraints been less stringent, the approximation would have been better.
Table 4.1 shows the relative magnitude errors of directional responses obtained by optimization and with a DFT circular array. From Table 4.1, one can observe that the error of the directional response obtained through optimization is comparable to the error obtained when using a theoretical (i.e., optimal) DFT circular microphone array. Thus, in addition to being more flexible, the optimization-based microphone array filter synthesis achieves high accuracy.
13 The same can be said for a spherical aperture.
[Plot: polar responses for d(φ) = 1, cos φ, cos 2φ, and cos 3φ, at f = 500 Hz, 1500 Hz, and 3000 Hz.]
Figure 4.18: Circular harmonic responses of different orders n of a circular microphone array of M = 7 pressure microphones mounted on a rigid cylindrical baffle of radius a = 5 cm.
(a) Optimization procedure (4.72):

f         d(φ) = 1      d(φ) = cos φ   d(φ) = cos(2φ)   d(φ) = cos(3φ)
500 Hz    −169.30 dB    −134.78 dB     −109.44 dB       −9.40 dB
1500 Hz   −86.10 dB     −77.41 dB      −57.19 dB        −28.44 dB
3000 Hz   −41.57 dB     −32.77 dB      −28.35 dB        −16.61 dB

(b) Theoretical DFT circular microphone array:

f         d(φ) = 1      d(φ) = cos φ   d(φ) = cos(2φ)   d(φ) = cos(3φ)
500 Hz    −170.59 dB    −134.78 dB     −121.58 dB       −38.94 dB
1500 Hz   −87.05 dB     −77.86 dB      −58.02 dB        −27.09 dB
3000 Hz   −42.35 dB     −33.85 dB      −30.16 dB        −13.41 dB

Table 4.1: Directional response relative magnitude error for a baffled circular microphone array obtained through the optimization procedure (4.72) (a) and a theoretical DFT circular microphone array (b).
Figure 4.19: Arrangement of cardioid capsules in Soundfield microphone (image
source: (Farrar, 1979)).
4.6.2 Soundfield microphone non-coincidence correction filter design
The Soundfield microphone, shown in Figure 4.19, is a device built from four outward-pointing cardioid capsules arranged at the tips of a regular tetrahedron. The directional response d(θ, φ) = α + (1 − α) sin θ cos φ of the used cardioid capsules can have various values α ∈ (0, 1), but those described in (Farrar, 1979) have a sub-cardioid response with α = 2/3.
The coordinates of the four microphone capsules are given by

rLF = (d/√3) [1, 1, 1]ᵀ
rRB = (d/√3) [−1, −1, 1]ᵀ
rLB = (d/√3) [−1, 1, −1]ᵀ
rRF = (d/√3) [1, −1, −1]ᵀ ,

where, according to (Gerzon, 1975), d = 1.47 cm.
The signals from the four capsules, SLF(ω), SRB(ω), SLB(ω), and SRF(ω), are often denoted A-format. However, the Soundfield microphone usually provides four signals W(ω), X(ω), Y(ω), and Z(ω), known under the name of B-format, and characterized
by the following directional responses:

dW(θ, φ) = 1
dX(θ, φ) = √2 sin θ cos φ
dY(θ, φ) = √2 sin θ sin φ
dZ(θ, φ) = √2 cos θ .
The usual way of converting the four capsule signals to B-format is through a linear combination represented by a matrix-signal product

[ W(ω) ]          [  1/α         1/α         1/α         1/α       ] [ SLF(ω) ]
[ X(ω) ]  = (1/4) [  √6/(1−α)   −√6/(1−α)   −√6/(1−α)    √6/(1−α)  ] [ SRB(ω) ]   (4.73)
[ Y(ω) ]          [  √6/(1−α)   −√6/(1−α)    √6/(1−α)   −√6/(1−α)  ] [ SLB(ω) ]
[ Z(ω) ]          [  √6/(1−α)    √6/(1−α)   −√6/(1−α)   −√6/(1−α)  ] [ SRF(ω) ]

with α = 2/3, followed by the so-called B-format non-coincidence correction filters (see Gerzon, 1975; Faller and Kolundžija, 2009).
Instead of following the conventional approach, we start the presentation of non-coincidence filters by noting that the Soundfield microphone is an exact model of a sampled cardioid spherical microphone aperture described in Section 4.5.1. Furthermore, since a regular tetrahedron is a platonic solid, the sampling performed by the Soundfield microphone is uniform.
It follows immediately from (4.63) that in order to obtain the signal W(ω), the four capsule signals need to be pre-filtered by

HW^s(ω) = 1 / [4(α j0(kr d) + i (1 − α) j0′(kr d))] ,   (4.74)

with kr = ω/c.
For obtaining the signal X(ω),14 the four capsule signals need to be weighted by the directional response dX(θ, φ) evaluated in their direction15 and additionally pre-filtered with

HX^s(ω) = i / [4(α j1(kr d) + i (1 − α) j1′(kr d))] .   (4.75)
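The two theoretical filters are straightforward to evaluate numerically; a minimal sketch (assuming scipy; the values of α and d are those quoted above, and the function names are illustrative):

import numpy as np
from scipy.special import spherical_jn

alpha, d, c = 2.0 / 3.0, 0.0147, 343.0

def HW_s(f):
    x = 2 * np.pi * f / c * d
    return 1.0 / (4 * (alpha * spherical_jn(0, x)
                       + 1j * (1 - alpha) * spherical_jn(0, x, derivative=True)))

def HX_s(f):
    x = 2 * np.pi * f / c * d
    return 1j / (4 * (alpha * spherical_jn(1, x)
                      + 1j * (1 - alpha) * spherical_jn(1, x, derivative=True)))

for f in [100.0, 1000.0, 3710.0, 10000.0]:
    print(f"{f:7.0f} Hz: |HW| = {20*np.log10(abs(HW_s(f))):6.2f} dB, "
          f"|HX| = {20*np.log10(abs(HX_s(f))):6.2f} dB")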
We also present an alternative solution for obtaining the Soundfield capsule filters, which is based on microphone array optimization. As in the circular array case, we note that the signal of a spherical cardioid microphone aperture (4.62) is symmetric in the arguments (ϑ, ϕ) and (θ, φ). This implies that the directional response of a single microphone on the aperture has the same angular dependence as (4.62), and the analysis of the effective spherical harmonic bandwidth from Section 4.5.1 applies equally to the microphones' directional responses. Therefore, it follows from Figure 4.13^16 that
14 Signals Y(ω) and Z(ω) are obtained in a similar way to X(ω).
15 For instance, for obtaining X(ω), the signal SLF(ω) is weighted by d(θLF, φLF), where (θLF, φLF) are the angular spherical coordinates of the vector rLF.
16 The effective bandwidth of a cardioid spherical aperture is very similar to that of an omnidirectional spherical aperture.
the effective bandwidth of the microphones' directional responses is approximately given by

Neff = kr d ,   (4.76)

with kr = ω/c. The same observation is found in the literature on spherical microphones (e.g., see Rafaely, 2005). Since only the zero- and first-order harmonics are sought, setting Neff = 1 in (4.76) gives the corresponding aliasing frequency fmax = c/(2πd) ≈ 3.71 kHz.
For recovering spherical harmonic directional responses of orders zero and one,
the minimum number of control directions covering a full sphere is four, but we use
N = 12 instead. This uniform sphere sampling with an icosahedron is a slightly oversampled scheme that makes the optimization procedure more robust to aliasing errors.
The microphone pre-filters HW(ω) and HX(ω) are obtained by solving the following optimization problem at different frequencies:

minimize  ‖G(ω) H(ω) − D‖2 ,   (4.77)

where G(ω) is a matrix containing the frequency responses of the acoustic channels between microphones and control directions, H(ω) is the vector of microphone filters being computed, and D is a vector with values of the desired directional response, dW(θ, φ) or dX(θ, φ), evaluated in the control directions.
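A minimal sketch of (4.77) at a single frequency follows (Python/numpy; the capsule responses are modeled from (4.62), and the 12 control directions are generated randomly here as an illustrative stand-in for the icosahedron vertices):

import numpy as np

alpha, d, c, f = 2.0/3.0, 0.0147, 343.0, 1000.0
x = 2 * np.pi * f / c * d

# Capsule orientations (unit vectors), cf. the coordinates given above
caps = np.array([[1, 1, 1], [-1, -1, 1], [-1, 1, -1], [1, -1, -1]]) / np.sqrt(3)

rng = np.random.default_rng(1)
dirs = rng.standard_normal((12, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

cosT = dirs @ caps.T                           # incidence-capsule angle cosines
G = (alpha + (1 - alpha) * cosT) * np.exp(-1j * x * cosT)   # modeled channels

D_W = np.ones(12, dtype=complex)               # d_W = 1
D_X = np.sqrt(2) * dirs[:, 0].astype(complex)  # d_X = sqrt(2) sin(th) cos(ph)

H_W = np.linalg.lstsq(G, D_W, rcond=None)[0]   # capsule filters for W
H_X = np.linalg.lstsq(G, D_X, rcond=None)[0]
print(np.round(np.abs(G @ H_W - D_W), 4))      # small residuals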
Figure 4.20 compares the theoretically derived filters HW^s(ω) and HX^s(ω) with the filters HW(ω) and HX(ω) obtained through optimization.
[Plot: |HW(f)| and |HX(f)|, magnitude in dB versus frequency.]
Figure 4.20: Soundfield microphone non-coincidence compensation filters obtained
from theory and by microphone array optimization on a full sphere.
From Figure 4.20, one can observe good correspondence between the microphone filters HW(ω) and HW^s(ω) up to high frequencies, while the characteristics of HX(ω) and HX^s(ω) bifurcate above the aliasing frequency fmax. This can be explained by a higher influence of aliasing errors above fmax on the array optimization procedure.
It is also interesting to observe the obtained directional responses in the horizontal
plane in all four cases, which are shown in Figure 4.21.
[Plot: polar responses of HW(ω) and HX(ω) (optimized) and HW^s(ω) and HX^s(ω) (theory) at f = 300 Hz, 3000 Hz, and 10000 Hz.]
Figure 4.21: Directional responses in the horizontal plane of the Soundfield microphone B-format signals W(ω) and X(ω) obtained by optimization on a full sphere (top) and from theory (bottom).
From Figure 4.21, one can observe that the obtained directional responses in all four cases are good approximations of the desired responses, even well above the aliasing frequency fmax. The only difference is in the gain of the signals X(ω) and X^s(ω), where the former is excessively attenuated, and the latter excessively amplified in the horizontal plane.
This surprising consistency in the directional responses well above the aliasing
frequency is a consequence of a smart design. In other words, the particular capsule
configuration used by Soundfield microphone takes advantage of symmetries that
make many higher-order spherical harmonic terms vanish in the horizontal plane. A
careful analysis reveals that this consistency is lost in tilted planes, as shown in (Faller
and Kolundžija, 2009).
To take full advantage of the mentioned symmetries, one can use a microphone
array optimization procedure which uses control directions only in the horizontal
plane. Figure 4.22 shows the filters HW^h(ω) and HX^h(ω) obtained by solving (4.77) with N = 13 uniformly-spaced control angles covering a full circle in the horizontal plane.
[Plot: |HW(f)| and |HX(f)|, magnitude in dB versus frequency.]
Figure 4.22: Soundfield microphone non-coincidence compensation filters obtained
from theory and by microphone array optimization in the horizontal plane.
From Figure 4.22, it can be observed that the optimized filters HW^h(ω) and HX^h(ω) are more similar to the theoretical filters HW^s(ω) and HX^s(ω) than those optimized on a full sphere.
Figure 4.23 shows the directional responses of the signals W^h(ω) and X^h(ω) obtained with the filters HW^h(ω) and HX^h(ω), respectively.
From Figure 4.23, one can see that the optimized microphone filters give rise to
good approximations of the desired responses, both in shape and magnitude, at a
wide range of frequencies.
4.7 Conclusions
This chapter gave a wide treatment of microphone arrays for capturing horizontal and 3D sound fields, which included gradient, circular, and spherical microphone arrays.
We presented orthogonal decompositions of horizontal sound fields, which included the Fourier series expansion of the angular spectrum and the horizontal helical wave spectrum. In the case of general 3D sound fields, the spherical harmonic decomposition was described. All three representations give rise to strategies for capturing sufficient representations of source-free sound fields using compact microphone apertures.
[Plot: polar responses of HW^h(ω) and HX^h(ω) at f = 300 Hz, 3000 Hz, and 10000 Hz.]
Figure 4.23: Directional responses in the horizontal plane of the Soundfield microphone B-format signals W^h(ω) and X^h(ω) obtained by optimization in the horizontal plane.
Microphone arrays were presented as sampling devices able to capture a low-resolution representation of a horizontal or 3D sound field. We saw that both horizontal and 3D sound fields have an effective angular or spherical harmonic bandwidth that increases linearly with frequency. Hence, the low-resolution representation of sound fields obtained with microphone arrays is sufficiently accurate up to a certain aliasing frequency, which depends on the array's size and the number of microphones.
Finally, we presented optimization techniques as a framework for designing microphone arrays for sound field capture whose directional responses correspond to angular orthogonal functions, such as circular or spherical harmonics. This approach, unlike those that follow from theoretical analyses of various idealized circular or spherical apertures and their discrete equivalents, requires only general knowledge about the directional responses of the used microphones. The microphone filters are obtained from their directional responses in a discrete set of directions, using a constrained optimization procedure whose objective function quantifies the error between the synthesized and desired directional responses. In addition to flexibility, this technique is able to efficiently obtain highly precise solutions that are similar to, and in some specific cases even better than, those provided by theory.
Chapter 5

Baffled Loudspeaker Array For Spatial Sound Reproduction
5.1 Introduction
In this chapter, we deal with a sound radiation problem. In particular, we describe a
design of a loudspeaker array which reproduces sound in a directional manner over a
wide range of frequencies. The design combines two principles discussed in Chapter 2.
The first one is sound radiation from a vibrating piston mounted on the surface of a
scattering object, which was touched upon in Sections 2.5.4 and 2.6.2. The system
“radiator-scatterer”, whose radiation pattern becomes increasingly directional with
frequency, can be viewed as a physical frequency-dependent beamformer. The other
principle is a beamformer design, where a number of sound radiators are controlled
with pre-filters in order to achieve a desired directional sound reproduction.
When designing microphone arrays as described in Chapter 4, the goal was to capture a sound field with directional responses corresponding to orthogonal components
of the sound field, such as circular or spherical harmonics. Thus, when using array
techniques, one had clearly defined desired directional responses which needed to be
optimally synthesized using microphones’ post-filters. Unlike microphone arrays, the
loudspeaker arrays described in this section are not required to have that sort of flexibility. As will be explicitly stated, they need to be able to steer sound to a number
of directions—but not a continuum thereof—and do it in a directional fashion that
is consistent over a wide range of frequencies. Additionally, the ultimate receiver of
the reproduced sound, the human listener, is taken into account by incorporating the logarithmic sensitivity of hearing into the directional response error function. The
design also relies on the fact that listeners are not highly sensitive to small spectral
variations in the frequency response of electro-acoustic systems (Bücklein, 1981).
Here is a list of design goals which our procedure aims to address.
• High directivity
The main goal of our loudspeaker array design is highly directional reproduction
of sound. Additionally, high directivity should be kept over a wide frequency
range, such that the desired spatial effects are as frequency-invariant as possible.
• Steering capability
Not only are we interested in being able to reproduce sound directionally, but
also to do so towards a number of different directions. This property is highly
desirable for public address (PA) systems or surround reproduction in rooms,
and suggests the use of uniformly spaced circular loudspeaker arrays.
• Compact size
Although not a primary goal, having a compact loudspeaker array with only a
few loudspeakers is advantageous for a number of reasons, both from the users’
and designers’ perspective (e.g., saving listening room space and cost).
• Measurement-based design
Sound scattering and propagation properties on symmetric geometries are well
analyzed and understood, but models are often not sufficient for practical system
design. There is a number of reasons for discrepancies between a model and a
real system. These include manufacturing inaccuracies, model simplifications,
equipment miscalibration etc. In order to avoid the mentioned sources of errors,
we rely on measurements of the loudspeaker array in a number of control points
such that those inaccuracies are accounted for.
5.1.1 Background
In various sound reproduction scenarios, it is preferable to have a way of reproducing
sound in a directional manner, and doing it consistently over a wide range of frequencies. These scenarios include various public address (PA) systems, both indoor and
outdoor, and compact multichannel audio reproduction systems, such as sound boxes
and sound bars (e.g., Yamaha, 2011; Sonic Emotion, 2011).
There are various solutions for directional sound reproduction. These include flat,
large-membrane loudspeaker panels, ultrasonic flat loudspeakers, and loudspeaker
arrays.
The flat loudspeaker panels (Holophonics, 2011) use large vibrating plates in order
to generate flat wavefronts and exhibit a plane-wave-like reproduction. Their directional reproduction performance is impressive, but the limited quality of reproduced
sound and the lack of dynamic range render them hardly usable for most applications.
Ultrasonic loudspeakers (e.g., Yoneyama et al., 1983; Nakashima et al., 2006;
Holosonics, 2011; Sennheiser, 2011) use an array of ultrasound transducers which
can be combined with a beamformer in order to form a narrow front-facing radiation
beam. The audio signal is modulated to the ultrasonic frequencies, reproduced with
the transducer array, and rendered back to the audible spectrum by a demodulation
that happens as a result of non-linearities in the air. Similar to the flat loudspeaker
panels, ultrasonic loudspeakers have an outstanding directionality, but at a cost of a
very limited dynamic range and poor reproduction quality.
Finally, loudspeaker arrays use a larger number of loudspeakers and classical
beamforming (Ward et al., 1995; Van der Wal et al., 1996). In order to achieve similar
directional reproduction in a wide range of frequencies, the loudspeaker beamformer
combines different loudspeakers from the array in order to keep the inter-loudspeaker
distance inversely proportional to frequency. Loudspeaker arrays can achieve a good
and consistent directional reproduction of sound in a wide range of frequencies, but
they require a large number of loudspeakers, are large in size, and cannot maintain
desired directional reproduction up to very high audible frequencies due to the limitations determined by the minimum inter-loudspeaker distance.
5.1.2 Chapter outline
Section 5.2 analyzes the theoretical model of sound radiation from a loudspeaker
mounted on a rigid cylinder and motivates the baffled loudspeaker array design. Section 5.3 describes different aspects of designing optimized beamforming filters for
directional sound reproduction. Design of a beamformer using a simulated baffled
loudspeaker model is presented in Section 5.4. A prototype loudspeaker array and
its performance are described in Section 5.5. Section 5.6 describes how a directional
loudspeaker array can be used for reproducing spatial sound. Conclusions are given
in Section 5.7.
5.2 Acoustical design
We have seen the boundary value problems in cylindrical and spherical coordinates
in Sections 2.5 and 2.6, respectively, and how they were used to derive sound fields
radiated by pressure or velocity distributions on cylindrical or spherical boundaries.
Of particular interest in this chapter is the radiation pattern of a vibrating piston in
a cylindrical baffle, presented in Section 2.5.4. A vibrating piston in a cylindrical baffle serves as a model of the baffled piston loudspeaker that is the basis of our design.
It was shown in Section 2.5.4 that the directional response (or radiation pattern)
of a baffled vibrating piston changes from being omnidirectional at low frequencies
to being highly directional at high frequencies. From the perspective of our design
goals, this is certainly not desired, as we seek to achieve a directional response that
is invariant over a wide range of audible frequencies. Thus, the use of a single loudspeaker does not suffice, and our design uses a loudspeaker array in a way described
in Section 5.3.
In this part, we show that one can identify two different trends in the radiation pattern of a baffled piston over audible frequencies. Namely, at low frequencies, the directional response of a baffled piston changes rapidly from omnidirectional to highly directional. This happens from the lowest audible frequencies up to a few kHz, depending on the size of the baffle and piston. As the frequency increases further, the directional response exhibits slow changes at a macroscopic scale. By this we mean that the radiation pattern has roughly the same shape, with only small changes in the front, around the look direction,1 and relatively larger changes in the highly attenuated back directions.

1 The look direction is defined by the radial line connecting the baffle's center and the piston.
5.2.1 Baffled loudspeaker model
To support the claim about “bi-modality” of the baffled loudspeaker’s directional
response, we show an example involving a model of a baffled piston loudspeaker.
The model, which was presented in more detail in Section 2.5.4, involves an infinite
cylindrical baffle of radius a and a vibrating piston of length 2L and circumferential
width 2αa. Figure 5.1 illustrates the geometry of the model.
Figure 5.1: Model of a piston loudspeaker mounted on an infinite rigid cylindrical
baffle.
This simplifying baffled loudspeaker model is convenient to illustrate the concept,
as it possesses a closed-form analytical solution which readily serves for analysis. As
will be stressed later, our design does not rely on this particular model, but on a more
general insight it provides. For convenience, we repeat the expression (2.52) for the
sound pressure radiated in the far field by a vibrating piston mounted on an infinite
cylindrical baffle (Williams, 1999):
P(r, θ, φ, ω) ≈ (ρ0 c e^(ikr) / (2π² r)) Σ_{n=−N}^{N} (−i)^n e^(inφ) [4 b α L sinc(nα) sinc(kz L)] / [sin θ Hn′(ka sin θ)] ,   (5.1)

where r, θ, and φ are the standard spherical coordinates depicted in Figure 5.1, ρ0 is the density of air, c the speed of sound propagation, b the piston's velocity, k = ω/c the wave number, and Hn′(x) the first derivative of the Hankel function of the first kind. In the following analysis, the piston will have sides of equal length, i.e., L = aα.
Although the far-field response is not easy to predict from (5.1), it can be inferred
that:
• High-frequency directivity grows inversely with piston’s size 2aα for a fixed
baffle radius a.
• High-frequency directivity grows with the baffle radius a, when the piston size
is kept constant.
To confirm the above claims, we illustrate normalized frequency-dependent directional responses of the considered baffled loudspeaker model. Figure 5.2 illustrates
the influence of the piston's size, whereas Figure 5.3 shows the influence of the baffle's radius. Note that the polar diagrams are somewhat unusual, as they show normalized directional responses clipped at a threshold value of −15 dB. The reason is to reveal the dependence of the high-frequency directivity on the two considered system parameters, the piston's and the baffle's size, where the directivity is related to the directional response's −15 dB level. It was already mentioned that the directional response below the chosen threshold, although more varying, is low enough to be considered less relevant for the foreseen applications.
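The responses in Figures 5.2 and 5.3 can be reproduced from (5.1); below is a minimal sketch for the horizontal plane θ = π/2, where kz = 0 (assuming scipy; the r-dependent prefactor and the piston velocity b are dropped, since only the normalized pattern matters, and the function name and default values are illustrative):

import numpy as np
from scipy.special import h1vp   # derivative of the Hankel function H_n^(1)

def directional_response(phi, f, a=0.08, L=0.02, c=343.0, N=60):
    k = 2 * np.pi * f / c
    alpha = L / a                      # piston covers 2*alpha rad, with L = a*alpha
    n = np.arange(-N, N + 1)
    # np.sinc(z) = sin(pi z)/(pi z), so sin(u)/u is np.sinc(u/pi)
    coeff = (-1j) ** n * 4 * alpha * L * np.sinc(n * alpha / np.pi) / h1vp(n, k * a)
    p = np.exp(1j * np.outer(phi, n)) @ coeff      # sum over circular harmonics
    return np.abs(p) / np.abs(p).max()             # normalized response

phi = np.linspace(0, np.pi, 181)
for f in [500.0, 5000.0, 15000.0]:
    r = directional_response(phi, f)
    print(f"{f:7.0f} Hz: response at 90 deg = {20*np.log10(r[90]):6.1f} dB")
# The side attenuation grows with frequency, as in Figures 5.2 and 5.3.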
[Plot: directional response maps for L = 0.01 m, 0.02 m, 0.03 m, 0.04 m; level in dB versus frequency and angle φ.]
Figure 5.2: Frequency-dependent normalized directional responses of a cylindrically
baffled vibrating piston, clipped at −15 dB. The piston side half-lengths aα (equal to
L) are varied, while the baffle radius is kept constant at a = 0.08 m.
From Figures 5.2 and 5.3, one can see that the directivity of a cylindrically baffled piston is roughly inversely proportional to the piston's dimension aα and directly proportional to the baffle's radius a. Furthermore, as a guide for choosing the dimensions of a piston and a baffle, one can use the observation that the −15 dB threshold appears around the angle φT = ±75° off the piston's axis when the piston covers the circumferential angle of 2α = 0.75 rad.
Note, however, that the pressure in (5.1) is approximately proportional to the piston's area at low frequencies. Thus, even though smaller pistons can be used to increase the loudspeaker's directivity, their size cannot be made too small, as it may violate design goals for achievable sound pressure levels (SPL) at low frequencies.
Figures 5.2 and 5.3 also confirm the claim of a roughly bi-modal behavior of the baffled loudspeaker's directional responses.
[Plot: directional response maps for a = 0.08 m, 0.16 m, 0.32 m, 0.64 m; level in dB versus frequency and angle φ.]
Figure 5.3: Frequency-dependent normalized directional responses of a cylindrically
baffled vibrating piston, clipped at −15 dB. The piston size lengths are kept constant,
aα = L = 2 cm, while the radius a is varied.
Low frequencies, up to a few kHz,2 are characterized by a sharp transition from an omnidirectional to a unidirectional pattern. High frequencies, starting from a few kHz, are characterized by directional responses that have little variation with frequency.3
Figure 5.4 provides a different presentation of the same data. For each angle, it
shows the variation of the directional response’s magnitude in the frequency range
5 − 20 kHz. From Figure 5.4, one can see that the directional response in the front
directions does not vary significantly at high frequencies. The more prominent variations happen in the rear, highly attenuated directions, making these variations less
relevant from the perspective of our design goals.
5.3 Beamformer design
From the previous section, we have seen that an adequately sized baffled loudspeaker design can provide the desired directional sound radiation at high frequencies. In order to maintain that behavior down to low frequencies, one needs to use beamforming with multiple loudspeakers.
2 The "border frequency" between the two modes depends on the baffle's and piston's dimensions.
3 Responses at rear angles are more varying with frequency, but they are highly attenuated such that their variability is of little significance for the considered applications.
Figure 5.4: Variation of the directional response of a cylindrically baffled vibrating
piston in the frequency range 5 − 20 kHz, with aα = L = 1.5 cm and a = 8 cm.
Light-gray area shows 25 − 75 percentiles, dark-gray area shows 1 − 99 percentiles, and
solid line shows the mean directional response.
Although the sound scattering from a cylindrical baffle is used as a starting point
for the loudspeaker array design and analysis, the beamformer design procedure does not rely
explicitly on the cylindrical geometry. Similarly, it does not require a very precise
placement of control points (i.e., forcing them to lie exactly on a circle centered at
the array’s center), which is a usual requirement of modal beamforming techniques.
We use a beamformer design that is fully based on the loudspeaker array measurement. It relies on the previously described bi-modal nature of the directional response
of a baffled loudspeaker, and on the observation that a single (front) loudspeaker can
drive the high frequencies alone. Additionally, the beamformer design relies on the
estimated effective angular band-limitedness of the directional response for choosing
the number of control points in order to limit the errors due to aliasing. For deciding
on the number of control points, one can use the analysis of the effective angular
bandwidth (see Definition 1 on page 73) of a loudspeaker model at different frequencies. As an example, Figure 5.5 shows the −20dB-effective angular bandwidth of a
baffled piston with dimensions similar to our loudspeaker array prototype described
later in Section 5.5. Based on Figure 5.5, designing a beamformer up to the frequency
f0 = 5 kHz requires more than 16 control points.
Figure 5.5: −20dB-effective angular bandwidth of a piston in a cylindrical baffle.
The piston’s dimensions are L = aα = 1.5 cm, while the baffle’s radius is a = 8 cm.
5.3.1 Filter design procedure
Figure 5.6: Circular loudspeaker array configuration and control points on a reference
circle used for designing beamforming filters.
We use a baffled circular loudspeaker array with L = 6 loudspeakers, illustrated in
Figure 5.6. The loudspeaker array’s look direction4 coincides with the look direction
of one loudspeaker, which we denote the main loudspeaker. Without loss of generality,
we assign index 1 to the main loudspeaker.
Each loudspeaker’s response is measured in M control points covering a circle
of radius r centered at the array’s center. Measurements are done in free-field or
anechoic conditions. Additionally, without sacrificing generality, we assign index 1 to
the control point lying in the loudspeaker array’s look direction.5
Denote by G(ω) the acoustic channel matrix, where the entry Gij (ω) denotes a
filter representing the transmission path from the j-th loudspeaker to the i-th measurement point. Note that each column Gi(ω) of the matrix G(ω) contains the directional
response of the i-th loudspeaker at frequency ω. Further, let the vector H(ω) contain
loudspeaker filters Hi (ω).
Motivated by the previous observations, we select the directional response G1 (ω0 )
of the main loudspeaker at some high frequency ω0 to be the desired directional
response of the array. Additionally, while keeping the shape of the directional response
(polar pattern) unchanged, we scale it with a frequency-dependent factor
C(ω) = |G11(ω)| / |G11(ω0)| ,
where G11 (ω) and G11 (ω0 ) are the on-axis responses of the main loudspeaker at
frequencies ω and ω0 , respectively. This makes the on-axis desired response of the
4 We will use on-axis and look direction interchangeably throughout the rest of the chapter.
5 The control point with index 1 is also in the look direction of the loudspeaker with index 1.
5.3 Beamformer design
107
entire array equal to the on-axis response of the main loudspeaker.6 The frequencydependent desired response is thus given by
D(ω) = C(ω) G1 (ω0 ) .
(5.2)
Above the frequency ω0 , we suppress beamforming and use only the main loudspeaker.
It was shown in Section 2.8 that a beamformer design could be expressed in the
frequency domain as a constrained optimization problem. The goal is to compute a
vector H(ω) of loudspeaker filter complex gains, where the function being optimized
is the error norm
E(ω) = ‖G(ω) H(ω) − D(ω)‖    (5.3)
between the desired and obtained directional response in the control points, D(ω)
and G(ω) H(ω), respectively. Before we present the design procedure in detail, let
us motivate the particular choices in the beamformer design formulation used for the
problem at hand.
Weighted error
The human auditory system is sensitive to relative changes of sound quantities, and
frequency response irregularities are no exception (Bücklein, 1981). Thus, in a scenario where a desired directional response is highly angle-varying, it is not reasonable
to sum the beamformer response errors at control points equally. It is rather judicious
to penalize the absolute error more at points where the desired response level is low,
and less where it is high. Thus, we minimize the relative error in the control
points by dividing the error in each control point by the desired gain in that point,
Ew(ω) = diag(|D(ω)|)⁻¹ (G(ω) H(ω) − D(ω)) ,    (5.4)
where diag(x) of a vector x denotes a square diagonal matrix whose main diagonal
contains the elements of x, and |x| denotes a vector containing absolute values of the
elements of x.
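As an aside, the weighted error (5.4) is straightforward to evaluate numerically. The following minimal numpy sketch computes it at a single frequency; the function name and array shapes are our assumptions, not code from the thesis:

    import numpy as np

    def weighted_error(G, H, D):
        """Relative beamforming error of Eq. (5.4) at one frequency.

        G : (M, L) complex channel matrix (control points x loudspeakers)
        H : (L,)   complex loudspeaker filter gains
        D : (M,)   complex desired response at the M control points
        """
        # Dividing by |D| penalizes absolute errors more at control points
        # where the desired response level is low.
        return (G @ H - D) / np.abs(D)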
Robust design
We try to avoid driving any of the loudspeakers with large gain at any frequency.
There are two reasons for such a design decision. First, driving loudspeakers with
large gains to better match the desired pattern effectively decreases the dynamic
range of the loudspeaker array. Second, as described in Section 2.8, the l2 norm of
the beamformer filters’ impulse response is related to the beamformer’s white noise
gain (Cox et al., 1987)

WNG = 1 / ‖H(ω)‖₂² ,    (5.5)
which quantifies its robustness to errors. More specifically, the larger the white noise
gain is, the less sensitive to random (measurement or placement) errors is the designed
beamformer.
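As a hedged one-line illustration of (5.5), again with an assumed function name:

    import numpy as np

    def white_noise_gain(H):
        """White noise gain of Eq. (5.5): WNG = 1 / ||H(omega)||_2^2.
        Larger values indicate more robustness to random errors."""
        return 1.0 / np.sum(np.abs(H) ** 2)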
6 The goal of the described procedure is not to equalize the on-axis frequency response of the main
loudspeaker. This can be done separately, using conventional loudspeaker equalization approaches.
Favoring front loudspeaker
We have shown in Section 5.2 that the high-frequency directional response of a baffled
loudspeaker does not vary substantially with frequency. Thus, if the beamformer’s
desired response is set to the directional response of the front loudspeaker at some fixed
high frequency ω0 , then it is to be expected that as the frequency approaches ω0 , the
beamformer tends to mostly use the main loudspeaker. As a consequence, the response
of the beamformer will be dominated by the response of the main loudspeaker, in
terms of both magnitude and phase.
Additionally, our goal is reproducing sound with high directivity, and phase errors
in the synthesized directional response are of little relevance. We saw in Section 2.8
that one can express the beamformer design problem where the error between the
magnitudes of the synthesized and desired response is considered. We also presented
Algorithm 2.1, which is an iterative procedure for solving this problem, but without
guarantees of convergence to the global optimum.
The crucial step in any iterative procedure is the initial solution. The initial
solution selection for Algorithm 2.1 relies on the high-frequency dominance of the
main loudspeaker, and consists of aligning the phase of the desired response to the
phase of the main loudspeaker (Kolundžija et al., 2010a, 2011b),
D̃(ω) = diag(|G1(ω)|)⁻¹ diag(G1(ω)) |D(ω)| .    (5.6)
At low frequencies, phase differences between the responses of the array's loudspeakers are small, and the impact of the phase correction on the synthesized directional response is negligible. However, the phase alignment improves the desired response synthesis at high frequencies (Kolundžija et al., 2010a, 2011b). It also enables a smooth transition from using all loudspeakers to using only the main loudspeaker, which is highly desirable when designing practical finite impulse response (FIR) filters.
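A minimal sketch of the phase alignment (5.6), assuming G1 and D are the length-M complex vectors defined above:

    import numpy as np

    def phase_align_desired(D, G1):
        """Initial desired response of Eq. (5.6): keep the magnitudes |D|
        and adopt the phases of the main loudspeaker's response G1."""
        return np.abs(D) * G1 / np.abs(G1)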
Algorithm for computing the beamformer filters
Putting the above design decisions together, the beamformer design problem is solved
using Algorithm 5.1 for different frequencies ω.
In Algorithm 5.1, x ∈ {2, ∞} is the minimized relative error norm (Euclidean or min-max), G′(ω) is the matrix obtained by removing the first row of the matrix G(ω), D̃′(ω) and D̂′(ω) are vectors obtained by removing the first element of the vectors D̃(ω) and D̂(ω), respectively, R1ᵀ(ω) is the first row of the matrix G(ω), and τ is a small constant used for controlling deviations of the on-axis frequency characteristic.
If the on-axis frequency response of the beamformer needs to match the desired one
exactly, the parameter τ is to be set to zero.
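To make the procedure concrete, the following sketch implements Algorithm 5.1 for one frequency bin and the Euclidean norm (x = 2), using the cvxpy modeling package. The variable names, the reuse of |D̃| in place of |D| (they are equal by construction of (5.6)), and the stopping logic are our assumptions, not the thesis implementation:

    import cvxpy as cp
    import numpy as np

    def magnitude_beamformer(G, D_tilde, H_max, tau, tol=1e-4, max_iter=20):
        """Sketch of Algorithm 5.1 for one frequency, Euclidean norm (x = 2).

        G       : (M, L) complex channel matrix; row 0 is the on-axis point
        D_tilde : (M,)   phase-aligned desired response from Eq. (5.6)
        """
        M, L = G.shape
        G0, R1 = G[1:, :], G[0, :]        # off-axis rows / on-axis row
        W = 1.0 / np.abs(D_tilde[1:])     # relative-error weights
        D_cur = D_tilde.copy()
        H_val, E_prev = None, np.inf
        for _ in range(max_iter):
            H = cp.Variable(L, complex=True)
            cost = cp.norm(cp.multiply(W, G0 @ H - D_cur[1:]), 2)
            cons = [cp.abs(H) <= H_max,   # gain cap, related to the WNG
                    cp.abs(R1 @ H - D_cur[0]) <= tau * np.abs(D_cur[0])]
            cp.Problem(cp.Minimize(cost), cons).solve()
            H_val = H.value
            # Steps 4/7: magnitude-only error (|D_tilde| equals |D| here)
            E = np.linalg.norm((np.abs(G @ H_val) - np.abs(D_tilde))
                               / np.abs(D_tilde))
            if abs(E_prev - E) < tol:
                break
            E_prev = E
            # Step 5: keep the desired magnitudes, adopt synthesized phases
            D_cur = np.abs(D_tilde) * np.exp(1j * np.angle(G @ H_val))
        return H_val

Each quadratic program above becomes a second-order cone program once cvxpy expands the complex variables into real ones.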
5.4 Simulations
In order to assess the wide-band performance of the described beamformer, we simulated a model of a six-element loudspeaker array mounted on an infinite cylindrical baffle. The vibrating pistons, modeling loudspeaker membranes, were uniformly
spaced on the baffle’s circumference. The radius of the baffle was a = 8 cm, and the
sides of pistons were 2L = 2aα = 3 cm.
Algorithm 5.1 Minimize directional response relative magnitude error norm with WNG constraints

1. Choose the solution tolerance ε
2. Compute the initial solution H(ω) by solving the following quadratic program
       minimize    ‖diag(|D̃′(ω)|)⁻¹ (G′(ω) H(ω) − D̃′(ω))‖x
       subject to  |Hj(ω)| ≤ Hmax , j = 1, . . . , L
                   |R1ᵀ(ω) H(ω) − D̃1| ≤ τ |D̃1|
3. repeat
4.     E ← ‖diag(|D(ω)|)⁻¹ (|G(ω) H(ω)| − |D|)‖x
5.     Compute D̂(ω) such that ∀j ∈ {1, . . . , N }: |D̂j(ω)| = |Dj| and ∠D̂j(ω) = ∠(G(ω) H(ω))j
6.     Solve the following quadratic program
       minimize    ‖diag(|D̃′(ω)|)⁻¹ (G′(ω) H(ω) − D̂′(ω))‖x
       subject to  |Hi(ω)| ≤ Hmax , i = 1, . . . , L
                   |R1ᵀ(ω) H(ω) − D̂1| ≤ τ |D̂1|
7.     E′ ← ‖diag(|D(ω)|)⁻¹ (|G(ω) H(ω)| − |D|)‖x
8. until |E′ − E| < ε

The desired directional response was taken to be the response of the main loudspeaker alone at frequency f0 = 5 kHz, which is shown in Figure 5.7.
The control points used for computing the loudspeaker filters were placed uniformly on a circle of radius r = 3 m centered at the array’s center. The number of
reference points, M = 19, was chosen slightly higher than the −20dB-effective angular bandwidth of the loudspeakers' directional responses (see Figure 5.5) below the
frequency f0 .
Figure 5.7 compares directional response of the beamformer to the desired response
at a number of frequencies ranging from 300 Hz to 12 kHz. We can observe that there
is a good match between the two at all frequencies, in the sense that deviations do
not exceed a few decibels.
To better illustrate the consistency of the obtained directional response over frequencies, Figure 5.8 shows frequency responses at various angles on the reference
circle. We can see that there are no large deviations in the frequency response at most angles. More prominent variations of the frequency response happen at highly attenuated rear angles but, as mentioned earlier, variations at very low sound levels are not detrimental in the foreseen applications.
Figure 5.9 shows the frequency responses of the computed beamformer filters. As
expected from the beamformer design procedure, the front loudspeaker drives the
high frequencies independently, and the other loudspeakers effectively help at low frequencies. Furthermore, we can see a desirable smooth transition between array beamforming and acoustical beamforming (provided by the baffle).
[Figure 5.7 panels: polar plots at f = 300 Hz, 1.5 kHz, 5 kHz, and 12 kHz, each comparing the beamformer response with the desired response.]
Figure 5.7: Loudspeaker array directional responses at various frequencies, on the
reference circle of radius r = 3 m.
Figure 5.8: Loudspeaker array frequency responses at various angles on the reference
circle of radius r = 3 m.
Figure 5.9: Loudspeaker array beamformer filters’ frequency responses.
5.5 Experiments
In addition to simulations, we tested the proposed approach to directional loudspeaker
design in practice. We assembled a cylindrical array of six small Logitech Z4 loudspeakers, as illustrated in Figure 5.10.
Figure 5.10: Prototype loudspeaker array consisting of six Logitech Z4 loudspeakers
arranged as a cylinder having a radius r ≈ 8 cm.
As stated earlier, our beamformer design procedure is entirely based on measured
loudspeaker responses, and not on a model. The measurement of the assembled
loudspeaker array was made in an anechoic chamber. The loudspeaker array was
fixed on a turntable and measured with a fixed omnidirectional microphone r = 2 m
away from its center. The measurements of all loudspeakers were made at 13 turntable
rotation steps of 2π/13 radians, which is equivalent to measuring the array uniformly at
13 equidistant points on a circle of radius r = 2 m.
Measurements were done using swept sines, covering the frequency range from
300 Hz to 12 kHz. Due to limited access to measurement resources, the measurements
were performed only once, without averaging multiple responses.
Figure 5.11 shows the frequency responses of one loudspeaker from the array at
control points covering the angles in the range [0, π]. Figure 5.12 shows the loudspeaker's directional responses at different frequencies.
Figure 5.11: Measured frequency responses of one loudspeaker of the prototype loudspeaker array. The diagram contains frequency responses in seven control points on
the measurement circle of radius r = 2 m.
[Figure 5.12 panels: polar plots at f = 500 Hz, 1.5 kHz, 3 kHz, and 6 kHz.]
Figure 5.12: Measured directional responses of one loudspeaker of the prototype
loudspeaker array at different frequencies. The diagram contains directional responses
sampled in 13 control points on the measurement circle of radius r = 2 m.
As expected, the directional
response at high frequencies above 4 kHz is not highly varying with frequency. Also,
the low-frequency directivity, below 2.5 kHz, increases with frequency.
In the frequency range between 2.5 kHz and 4 kHz, the loudspeaker becomes less directive. At frequencies around 3 kHz, the on-axis response is weaker than the responses at angles up to 90◦ off the look direction. Additionally, the responses towards side and rear directions are much less attenuated than at low and high frequencies. This mid-frequency behavior is the only case where the behavior of a practical loudspeaker substantially deviates from the simplified theoretical model analyzed in Section 5.2.
To compute the beamformer filters, we specified as desired the directional response
of the front loudspeaker at frequency f0 = 5 kHz. The frequency f0 = 5 kHz is also
the frequency above which only the front loudspeaker is used for sound reproduction.
We allowed the on-axis frequency response of the array to deviate by τ = 1 dB from
the on-axis frequency response of the front loudspeaker.
The frequency responses of the beamformer at the control points belonging to
the first two quadrants are shown in Figure 5.13.7 The frequency responses at different control points—except for the rear, highly attenuated directions—do not vary
substantially at low frequencies up to 2 kHz, and at high frequencies, above 4 kHz.
However, we were not able to achieve the desired directional response in the frequency
range 2 − 4 kHz. Frequency responses at side directions have a high peak in this frequency range, which can manifest itself as an audible coloration (Bücklein, 1981).
Figure 5.14 illustrates the described frequency-dependent behavior using polar plots.
Figure 5.13: Normalized beamformer frequency responses (0 dB represents the desired on-axis frequency response) in seven control points on the measurement circle of
radius r = 2 m.
Apart from the mid-frequency anomalies of some of the rear directions, it could be
said that the loudspeaker array achieves good wide-band directivity. For its foreseen
practical applications (described in the following section), the achieved performance
is quite satisfactory.
Figure 5.15 shows the frequency responses of the beamformer filters. As expected,
the frequencies above f0 = 5 kHz are reproduced only by the front loudspeaker. Below
f0 = 5 kHz, all loudspeakers are active, and the frequency responses exhibit smooth
transitions between beamforming and using only the main loudspeaker. This smoothness enables implementing the beamformer with short FIR filters. Conversion of the
filters’ frequency responses to FIR filters is thoroughly described in Section 6.2.4.
7 The responses in the other two quadrants look very similar.
[Figure 5.14 panels: polar plots at f = 500 Hz, 1.5 kHz, 3 kHz, and 10 kHz, each showing the response with the beamformer, without the beamformer, and the desired response.]
Figure 5.14: Beamformer directional responses in 13 control points on the measurement circle of radius r = 2 m.
Figure 5.15: Beamformer filters’ frequency responses shown for four loudspeakers of
the prototype loudspeaker array.
5.6 Applications
As briefly mentioned in the introduction, one can foresee a number of uses for a
loudspeaker system having high broadband directivity.
One possible application is mitigation of adverse effects of a listening room on the
reproduced sound. Although the listening room contributes to the naturalness and
gives an important sense of space, its low-frequency modes can severely impair the
reproduced sound. Bad interaction between a loudspeaker system and the listening room has a detrimental effect on the intelligibility of the reproduced material, be it speech or vocals. We have seen in Section 2.7 that the critical distance of a room, which can be considered an indicator of a room's detrimental effects on speech intelligibility, is directly
proportional to the directivity of a source. Thus, by using a directional loudspeaker
directed toward the audience, one can expect a reduction of the unwanted coloration
and lengthy reverberation tail, and a consequent increase of speech intelligibility.
Another application where the described loudspeaker array can be useful is targeted sound reproduction, as in public address systems, advertisement displays, or in
exhibition spaces, where it can help reduce “sound pollution”.
Last, but not least, the loudspeaker array with high broadband directivity and
steering capability can be used for reproducing surround (e.g., stereo or 5.1) contents
in rooms. More specifically, using the capability to steer the reproduced sound, it is
possible to “project” channels towards different walls in order to evoke auditory events
outside the loudspeaker array, widen the auditory scene, and generate ambience.
Figure 5.16: Examples of 5.1 surround reproduction with a six-element circular
loudspeaker array. (a) Projecting front channels towards the front wall and surround
channels towards side walls. (b) Similar to (a), but the center channel gets projected
towards the listener in order to position the dialog or vocal at the loudspeaker array.
Figure 5.16 illustrates two examples of how the loudspeaker array described in Section 5.5 can be used for reproducing 5.1 surround content. In the example shown
in Figure 5.16(a), the front channels get projected towards the front wall in order
to widen the frontal auditory scene, while the two surround channels get projected
towards the side walls to create ambience and extend the auditory scene towards the
sides. Figure 5.16(b) illustrates a slightly modified reproduction strategy, where the
center channel gets projected towards the listener in order to anchor the dialogue or
vocals to the loudspeaker array.
We have done informal listening tests on various 5.1 contents reproduced using
our six-loudspeaker prototype. With both of the previously described strategies, we
were able to generate spatial effects from both the front and side walls of the listening
room. Furthermore, the loudspeaker array did not suffer from noticeable timbral
artifacts in any direction.
5.7 Conclusions
The goal of the work described in this chapter was to design a compact loudspeaker
array having wide-band high directivity, with the ability to steer the sound to a
number of different directions.
As a solution, we proposed a beamformer for a circular loudspeaker array which
relies on two principles. One is the directivity increase with frequency of a loudspeaker when mounted on a rigid cylindrical baffle. Since the baffle makes the high-frequency directional response of a loudspeaker approximately frequency-invariant, a
single loudspeaker is sufficient for reproducing high frequencies. The other principle is
magnitude-optimized beamforming, which uses all loudspeakers at low frequencies in
order to synthesize the directional response of a single loudspeaker at high frequencies.
The effectiveness of the proposed approach was verified through simulations using
a model of a baffled piston loudspeaker, and also with a prototype cylindrical loudspeaker array. Informal listening tests showed promising results in applications such
as mitigating adverse room effects and surround sound playback in rooms.
Chapter 6

Reproducing Sound Fields Using MIMO Acoustic Channel Inversion
6.1 Introduction
In this chapter, we present an approach denoted as Sound Field Reconstruction (SFR)
(Kolundžija et al., 2009b,c). In essence, SFR is designed to optimally reproduce a
desired sound field in a given listening area for a given finite setup, while keeping
loudspeaker driving signals well behaved in order to respect physical constraints.
SFR has three important aspects:
• Design of a control point grid covering the listening area
• Selection of active loudspeakers and a subset of control points based on the position
of the reproduced source and the geometry of the reproduction setup
• Computation of loudspeaker filters using multiple-input multiple-output (MIMO)
channel inversion.
The grid of control points inside the listening area used in SFR is designed following the equivalence of sound reproduction in a continuous and sampled spatial
domain, proved in the next section. Furthermore, SFR uses only those loudspeakers
that contribute the most to the sound field reproduction, as this strategy is already
known to mitigate high-frequency spatial aliasing problems characteristic of loudspeaker arrays (Verheijen, 1997; Corteel, 2006; Corteel et al., 2008). The active loudspeakers are selected based on geometry, similar to (Verheijen, 1997). Also, in order
to avoid over-fitting, the control points where desired sound field evolution cannot
be locally matched are discarded following similar geometrical considerations. Finally, SFR uses a variant of MIMO channel pseudo-inversion with truncated singular
value decomposition (SVD). This technique allows graceful degradation of sound field
reproduction performance when the MIMO channel matrix is ill-conditioned, while
keeping loudspeaker filters within practical physical system constraints. Being both
setup and listening area optimized, SFR is able to achieve higher sound field reproduction accuracy than Wave Field Synthesis (Berkhout, 1988), as will be shown with
simulations.
6.1.1 Background
The first spatial sound reproduction systems date back to the work of Blumlein (1931)
on stereo systems in the first half of the last century. The successful two-channel stereo
principle—still used widely today—was extended to the four-channel quadraphonic
system (Scheiber, 1971) with the aim of providing full-circle spatial reproduction, but
it was quickly abandoned, possibly due to unsatisfactory front localization, technical
issues, and format incompatibilities.
Surround systems using a higher number of channels, such as 5.1 (Allen, 1991;
ITU-775, 1994) and 7.1, are based on the observation that accurate localization of
the sound coming from the front is more important, and they use more loudspeakers in
the front of the listener for improved frontal localization. Additionally, loudspeakers
on the side and behind the listening position are used for providing ambience and
side/rear localization.
All previously mentioned surround sound systems create sound fields with correct spatial attributes only within a narrow listening area called the “sweet spot”.
The problem of extending the listening area was addressed by two notable surround
sound systems: Ambisonics (see Gerzon, 1973, 1980b) and Wave Field Synthesis (see
Berkhout, 1988; Berkhout et al., 1992, 1993). Both approaches are based on an attempt to reproduce a desired sound field in an extended listening area.
The theoretical foundations of Ambisonics were laid down in the 1970s (see Cooper
and Shiga, 1972; Gerzon, 1973, 1980b), primarily for circular and spherical loudspeaker arrangements. At the heart of the ambisonic reproduction technique is the
sound field mode matching in one, central listening spot. In this particular case, mode
matching implies matching orthogonal components, such as cylindrical or spherical
harmonics, of desired and reproduced sound fields. The early ambisonic systems
suffered from a limited sweet-spot size, particularly at medium and high frequencies (Bamford and Vanderkooy, 1995), due to the use of modes of low order, an
insufficient number of loudspeakers, and far-field loudspeaker models. Later, Daniel
et al. (2003) provided extensions to the initial works on Ambisonics, considering higher
order modes and modeling loudspeakers as point sources to more accurately account
for propagation effects. Near-field higher-order Ambisonics was shown to have comparable performance to WFS for enclosing loudspeaker configurations (Daniel et al.,
2003), but for practical systems, it lacks recording support in the form of a wide-band
high-order sound field microphone.
Wave Field Synthesis (WFS) systems, on the other hand, are based on the Helmholtz
integral equation (HIE), presented in Section 2.3. Recall that HIE shows how a desired
sound field in a closed source-free (listening) domain can be reproduced by a continuous distribution of secondary monopole and dipole sources on the domain boundary.
In the initial works (e.g., see Berkhout, 1988; Berkhout et al., 1993), WFS was derived starting from Rayleigh’s I and II integrals. Recall that for Rayleigh’s I and II
integrals, the listening domain is a half-space and secondary sources—monopole in
the former and dipole in the latter—are distributed on the bounding plane. Since
6.1 Introduction
119
the reproduction in the horizontal plane is far more important than in the vertical plane (Blauert, 1997), the creators of WFS focused on linear loudspeaker setups that approximate the performance of planar source distributions. Using the stationary phase approximation, they were able to derive the so-called 2½-dimensional WFS,
which is able to approximately reproduce a desired sound field in the listening plane,
while reproducing it exactly on a reference listening line. The initial WFS concept
was extended by Start (1996) to include curved loudspeaker distributions, and Verheijen (1997) and de Vries (1996) to reproduce arbitrarily directive sources and use
directional loudspeakers, respectively.
More recently, Ahrens and Spors (2010) proposed an approach called Spectral Division Method (SDM), which formulates the sound field reproduction using planar and
linear loudspeaker distributions as a spatio-temporal spectral inversion. They have
shown that in some particular cases, such as the reproduction of plane waves using
monopole sources, one is able to obtain a correct closed-form solution for loudspeaker
driving signals.
However powerful they are as theoretical tools, the mentioned approaches for sound field
reproduction need to cope with limitations imposed by systems used in practice.
Namely, they need to be applied to discrete loudspeaker distributions of limited spatial support, varying directivity and possibly multi-path propagation, while listening
domains are of finite size. Some of these issues have been addressed by Verheijen
(1997), who proposed the use of geometry-based loudspeaker subset selection and
spatial tapering towards the loudspeaker array edges, to mitigate the impairments
due to spatial aliasing and diffraction, respectively. Corteel (2006, 2007) used a similar loudspeaker selection method, while Spors (2007) proposed a slightly different
approach based on the direction of sound intensity vectors.
More recent approaches for computing loudspeaker filters for sound field reproduction use a discretization of the listening area and numerically solve a discrete
optimization problem. This is partly due to unsatisfactory performance of analytical solutions in practical problems, and partly to avoid restricting the reproduction
setup (e.g., having calibrated loudspeakers of prescribed directivity, free-field conditions etc.). One of the earliest numerical approaches, by Kirkeby and Nelson (1993),
addresses reproduction of plane waves based on pseudo-inversion of a multichannel
acoustic propagation matrix. Similar to (Kirkeby and Nelson, 1993), but applied
to WFS for the purpose of room equalization and directive source reproduction, are
the works (Corteel, 2006, 2007), where the control points are arranged on four listening lines. Gauthier and Berry (2007) use only four control points arranged as a
quadrupole to compute loudspeaker filters that optimize a cost function consisting of
the reproduction error and deviation from the initial WFS driving signals.
6.1.2 Chapter outline
Section 6.2 describes theoretical and practical aspects of SFR. Section 6.3 presents
extensive evaluation of SFR and its comparison with WFS. Practical considerations
for realizing SFR systems are discussed in Section 6.4. Conclusions are presented in
Section 6.5.
6.2 Sound Field Reconstruction
As mentioned in the introduction, Sound Field Reconstruction (SFR) is a spatial
sound reproduction approach which is based on the spectral properties of the plenacoustic function shown by Ajdler et al. (2006). More particularly, it is based on the
essential spatial band-limitedness1 of the sound field that emanates from temporally
band-limited sound sources. This section provides a description of sampling and interpolation of the plenacoustic function, and shows how these can be used for sound
field reproduction with arbitrary reproduction setups. It also describes practical extensions that help to improve the sound field reproduction with SFR in specific finite
domains, and briefly presents the design of discrete-time loudspeaker filters for SFR.
6.2.1 Plenacoustic sampling and interpolation
In the most general sense, the plenacoustic function p(r, t) describes a sound field
in space and time, irrespective of the sources evoking it. In a particular case where
the sound field is evoked by a point source located at r 0 emitting a Dirac pulse, the
plenacoustic function equals the time-dependent Green’s function g(r, t|r 0 ),2 i.e., the
spatio-temporal impulse response of the acoustical medium from point r 0 to point r.
The changes of the plenacoustic function in space at a temporal frequency ω cannot happen at an arbitrary rate, but are limited by ω according to the relation (Ajdler et al., 2006)

k² ≤ ω²/c² ,    (6.1)
where k is the spatial frequency and c is the speed of sound propagation.3
Based on the observation (6.1), one can define a minimum spatial sampling frequency for a sound field of limited temporal bandwidth. If the maximum temporal
frequency of the sources evoking the sound field is equal to ωm , then a spatial sampling frequency of ks = 2ωm /c is sufficient for representing the sound field. This
observation extends to a large extent to finite spatial segments (e.g., finite-length
lines and finite-area rectangles), as shown in (Ajdler et al., 2006).
The possibility of sampling a sound field has an implication that is useful in the
context of SFR. Namely, it suggests that correct reproduction of a sound field on a grid
of points can guarantee correctness of reproduction between the grid points. Without
loss of generality, we show this result for the xy-plane in Theorem 1. However, we
first give a lemma which simplifies proving said theorem.
Lemma 1. If two functions f(x, y, t) and h(x, y, t), both band-limited in the spatio-temporal frequency domain with maximum temporal frequency ωm and maximum spatial frequency km = ωm/c (k² = kx² + ky² ≤ km²), are identical on a 2D grid

(n∆x, m∆y) ,    n, m ∈ Z ,

where

∆x ≤ π/km ,    ∆y ≤ π/km ,

then they are identical everywhere.

1 Essentially band-limited function in this context refers to a function which has most of its energy in a finitely-supported spectral region, while the energy outside of that region decays exponentially.
2 The time-dependent Green's function has the form g(r, t|r0, t0), but we implicitly assume that the starting time is t0 = 0 and omit it for brevity.
3 The given limit on the spatial variations of sound pressure is not entirely correct, but it is to a large extent when sources are not inside of or very close to the considered spatial domain.
Proof. The proof follows from the fact that a band-limited function (e.g., a sound
field) is uniquely defined by its samples on a grid satisfying the Nyquist criterion.
Since the functions f (r, t) and h(r, t) are both band-limited with the same spectral
support, and have identical values on a sampling grid satisfying the Nyquist criterion,
they must be identical everywhere.
Theorem 1. Consider two sets of sound sources, S1 = {s11(t), ..., s1k(t)} and S2 = {s21(t), ..., s2l(t)}, where each source is band-limited with the maximum temporal frequency ωm. Denote by rij the location of source sij(t). Assume that the spectral support of each Green's function g(r, t|rij), evaluated in the xy-plane, is confined to the double cone

k² = kx² + ky² ≤ ω²/c² .

The sound field in the xy-plane evoked by source sij(t) is given by

pij(x, y, t) = ∫ g(x, y, τ|rij) sij(t − τ) dτ .

Further, let P1(x, y, t) and P2(x, y, t) be the superposed sound fields of the sources in S1 and S2, respectively, given by

Pi(x, y, t) = Σj pij(x, y, t) .

The two sound fields, P1(x, y, t) and P2(x, y, t), are identical in the entire xy-plane if they are identical on a grid given by

(n∆x, m∆y) ,    n, m ∈ Z ,    (6.2)

with

∆x ≤ πc/ωm ,    ∆y ≤ πc/ωm .
Proof. The sound field of each source is band-limited in time and space, with maximum temporal and spatial frequencies ωm and km = ωm /c, respectively. Consequently, the sound fields P1 (x, y, t) and P2 (x, y, t), being superpositions of functions
band-limited in space and time, are also band-limited with the same maximum frequencies. Their equality follows from Lemma 1.
Note also that even if the spatio-temporal spectrum of Green’s function is not
confined to the double cone defined by k² ≤ ω²/c², its propagating part is.4 Therefore,
it follows that the propagating parts of two sound fields are equal if they are equal
on the grid given by (6.2).
4 As mentioned previously, the propagating part contains essentially the entire energy of a sound
field.
Based on this observation, the problem of reproducing the sound field that emanates from temporally band-limited sources with maximum temporal frequency ωm
is equivalent to reproducing the sound field on a grid of control points sampled at or
above the Nyquist spatial sampling frequency ks = 2ωm /c. In the case of practical sound field reproduction with an array of loudspeakers, the listening area, and
thus also the control grid, are finite. Consequently, sound field reproduction can be
expressed as a MIMO problem.
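As a worked numeric illustration of the required grid density (the numbers below are assumptions, not from the text):

    # With c = 343 m/s and sources band-limited to fm = 2 kHz
    # (omega_m = 2*pi*fm), the grid spacing must satisfy
    # dx <= pi*c/omega_m = c/(2*fm), i.e. roughly 8.6 cm.
    c, fm = 343.0, 2000.0
    dx_max = c / (2.0 * fm)   # ~0.086 m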
6.2.2 Sound Field Reconstruction using MIMO channel inversion
MIMO channel inversion is a standard problem that reappears in many multichannel
sound applications, such as multi-point room equalization, sound field reproduction,
and beamforming (see Kirkeby and Nelson, 1993; Kirkeby et al., 1998; Corteel, 2006).
For the sake of completeness, we will present the MIMO channel inversion problem
and the particular solution used in SFR.
The problem of MIMO channel inversion in the context of sound field reproduction
is illustrated in Figure 6.1. The reproduction setup includes an array of L loudspeakers
and a grid of M control points covering the listening area, illustrated in Figure 6.1(a).
In addition, as shown in Figure 6.1(b), there is a desired acoustic scene that contains
N sound sources that would evoke the desired sound field in the listening area.
Positions of loudspeakers, control points, and desired sources are known. The
transfer function Aij (ω) denotes the sound propagation channel between the jth desired source and ith control point. Similarly, Gik (ω) denotes the sound propagation
channel between the kth loudspeaker and the ith control point. Both Aij (ω) and
Gik(ω) are known for all desired source-control point and loudspeaker-control point pairs, respectively, either through a theoretical model or through measurement.
The goal of the MIMO channel inversion in the context of SFR is the reproduction
of the desired sound scene in M control points, i.e., computation of the loudspeaker
driving signals that evoke the same signals at the control points as the original sound
scene.
[Figure 6.1 panels: (a) reproduction setup with the loudspeaker array and control points covering the listening area; (b) desired sound scene with the desired sources and the same control points.]
Figure 6.1: Multichannel inversion problem overview.
Note that the problem of multichannel inversion can be represented as a superposition of N independent sub-problems, each involving a single desired source. The
loudspeaker signals can then be obtained by summing the contributions for each
single-source sub-problem. Thus, without loss of generality, the following MIMO
channel inversion analysis is presented only for the first desired source.
Denote by S1 (ω), Xj (ω), and Yk (ω) the Fourier transforms of signals of the desired
source, the output of the jth loudspeaker, and the sound pressure at the kth control
point, respectively. Furthermore, denote by Dl (ω) the signal at the lth control point
in the desired sound scene containing only the first desired source.
The signals Di (ω) are determined by the effects of the sound propagation paths
from the desired source to the control points, and are described by the following
product:
D(ω) = A(ω) S1(ω) ,    (6.3)

where

D(ω) = [D1(ω) D2(ω) . . . DM(ω)]ᵀ
A(ω) = [A11(ω) A21(ω) . . . AM1(ω)]ᵀ .
The signals produced by the loudspeakers at the control points are determined by
the sound propagation effects on the loudspeaker signals, and are given by
Y(ω) = G(ω) X(ω) ,    (6.4)

where

Y(ω) = [Y1(ω) Y2(ω) . . . YM(ω)]ᵀ

       ⎡ G11(ω)  G12(ω)  · · ·  G1L(ω) ⎤
G(ω) = ⎢ G21(ω)  G22(ω)  · · ·  G2L(ω) ⎥
       ⎢    ⋮        ⋮      ⋱       ⋮   ⎥
       ⎣ GM1(ω)  GM2(ω)  · · ·  GML(ω) ⎦

X(ω) = [X1(ω) X2(ω) . . . XL(ω)]ᵀ .
The task of the multichannel inversion is to compute the signals Xj (ω) using the
desired signal S1 (ω), i.e.,
X(ω) = H1(ω) S1(ω) ,    (6.5)

where

H1(ω) = [H11(ω) H21(ω) . . . HL1(ω)]ᵀ ,
such that the difference (error) between the vector Y (ω) and vector D(ω), corrected
by a constant delay ∆ accounting for the propagation time differences or the modeling
delay, is minimized. The multichannel inversion problem is illustrated in Figure 6.2.
The solution which minimizes the error power, i.e., the mean squared error (MSE)
solution, is given by (e.g., see Nelson and Elliott, 1992)

H1(ω) = e−iω∆ G+(ω) A(ω) .    (6.6)

Since it uses a pseudo-inverse G+(ω) of the transfer matrix G(ω), finding the pseudo-inverse of the matrix G(ω) becomes the central problem of MIMO channel inversion.
Figure 6.2: Block diagram illustrating the MIMO channel inversion problem.
The classical full-rank pseudo-inverse expression is given by

G+(ω) = (GH(ω) G(ω))⁻¹ GH(ω) ,    (6.7)
where the matrix GH (ω) is the conjugate-transpose of the matrix G(ω). At low frequencies, where the condition number of the matrix G(ω) is large (making it effectively
low-rank), (6.7) gives filters with gains beyond the physical limitations of practical
loudspeakers. The regularized pseudo-inversion used in (Kirkeby et al., 1998; Corteel,
2006) is also of limited use, as it does not allow easy control of the trade-off between
the reproduction accuracy and maximum filter gains.
Like a number of MIMO inversion solutions in acoustics (e.g., see Hannemann and
Donohue, 2008), we use a pseudo-inversion method based on the truncated singular
value decomposition (SVD), which prunes singular values that are below a defined
threshold (see Golub and Kahan, 1965). In particular, if
G(ω) = U(ω) Σ(ω) VH(ω)    (6.8)

is the SVD of the matrix G(ω), then the pseudo-inverse of the matrix G(ω) is given by

G+(ω) = V(ω) Σ+(ω) UH(ω) ,    (6.9)
where the matrix Σ+ (ω) is obtained from Σ(ω) by first setting to zero the singular
values whose absolute values are below a defined threshold ε, replacing the other
singular values by their reciprocal, and taking the matrix transpose in the end (Golub
and Kahan, 1965). The threshold ε can be adapted to the matrix G(ω), i.e., it can
be set to a fraction of the largest singular value of G(ω).5
At high frequencies, where all singular values of the matrix G(ω) are larger than
the threshold, this procedure gives the result identical to (6.7). However, at low
frequencies, it gives near-optimal solutions while keeping the loudspeaker filter gains
within practical limits. A more detailed treatment of this MIMO channel inversion
problem is given in (Kolundžija et al., 2009b).
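As a minimal numerical sketch of (6.6), (6.8) and (6.9) (not the thesis code; the 20 dB relative threshold mirrors footnote 5, and all names are assumptions):

    import numpy as np

    def svd_pinv(G, rel_threshold_db=-20.0):
        """Truncated-SVD pseudo-inverse of Eqs. (6.8)-(6.9): singular values
        more than |rel_threshold_db| below the largest one are dropped
        instead of inverted, which bounds the filter gains."""
        U, s, Vh = np.linalg.svd(G, full_matrices=False)
        eps = s[0] * 10.0 ** (rel_threshold_db / 20.0)
        s_inv = np.where(s >= eps, 1.0 / np.maximum(s, eps), 0.0)
        return (Vh.conj().T * s_inv) @ U.conj().T

    def sfr_filters(G, A, omega, delta):
        """MSE filters of Eq. (6.6): H1 = exp(-i*omega*delta) * G+ A."""
        return np.exp(-1j * omega * delta) * (svd_pinv(G) @ A)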
6.2.3 Practical extensions of Sound Field Reconstruction
Filter correction through power normalization on the reference line
All sound field reproduction approaches give loudspeaker driving signals that do not
provide correct sound field reproduction accuracy above a certain aliasing frequency.
5 In the simulations presented in Section 6.3, the threshold ε was 20 dB below the largest singular value of G(ω).
Although they come from the same physical limitations as in approaches such as WFS, which are inherent to the geometry of the used loudspeaker array and the location of the reproduced source, the high-frequency problems of SFR can be explained
from another perspective. Namely, at high frequencies, where the constructive interference of sound fields of different sources can not be achieved, the least mean squared
error solution is biased towards highly attenuating all signals, such that the reconstruction error approaches the desired signal.6
A way of avoiding the aforementioned problems—although not providing the correct reproduction in the wide listening area—is normalizing the filters’ gains at all frequencies such that on a grid of control points, the average power of the reproduced field
is equal to the average power of the desired field. In particular, if A1 (ω), . . . , AM (ω)
and Y1 (ω), . . . , YM (ω) are respectively the amplitudes of the desired and the reproduced sound field at frequency ω in M control points, then each loudspeaker filter is
corrected by
H̃i(ω) = cf(ω) Hi(ω) ,    (6.10)
where cf (ω) is a correction factor given by
cf(ω) = √( Σᵢ Ai²(ω) / Σᵢ Yi²(ω) ) ,    (6.11)

where the sums run over the control points i = 1, . . . , M .
Loudspeaker subset selection
While it might seem beneficial to use all loudspeakers for reproduction with SFR, there
are many cases where using only a subset of loudspeakers can give better reproduction
provided the optimization is done for a specific finite listening area. This observation
for WFS was given by Verheijen (1997), and was later used by various authors (see
Corteel, 2006, 2007; Corteel et al., 2008), where it was shown how based on the
location of the primary (reproduced) source and the listening area, one can select
a sub-array of loudspeakers which physically contribute the most to the sound field
reproduction.
There is a plausible explanation for such a selection. Considering the case where
an impulsive sound arrives from the primary source, one expects that at all locations in the listening area, the received sound is of similar duration and consequently
without significant spectral impairments. However, using all loudspeakers makes a
combination of the impulse responses—due to different delays—more spread in time
and more varying across different positions than in the case when only a subset of
loudspeakers is used, causing both temporal and spectral deviations.
The selection procedure considers only those loudspeakers that are inside—extended
by a predefined selection margin—the cone defined by the primary source and the
boundaries of the listening area, as shown in Figure 6.3. The rationale behind such a
choice is twofold: first, it uses the loudspeakers whose contribution is largest when all
loudspeakers are used, preserving most of the reconstruction accuracy, and second,
6 This is a known phenomenon in Wiener filtering, where at low SNRs, the gain of the Wiener
filter approaches zero.
Figure 6.3: Illustration of loudspeaker selection based on the primary source position.
The visible loudspeakers are inside of the angle subtended by the listening area to the
primary source. Loudspeakers within a selection margin of the visible loudspeakers are
also selected.
the active loudspeakers have the lowest delay spreads due to differences in propagation
distance and their position relative to the sound wavefront.
Figure 6.4 illustrates how reproduction accuracy of SFR does not change notably
when only a subset of six (out of 18) loudspeakers is used for reproducing a sinusoid
at frequency f = 500 Hz. The selected loudspeakers lie in the minimal cone centered
at the position of the primary source that contains the listening area. Additionally,
loudspeakers outside the minimal cone but within a selection margin can also be
selected.
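One possible 2D realization of this geometric selection is sketched below; the angular-sector test and the way the margin is applied are our assumptions (the thesis only fixes the principle illustrated in Figure 6.3):

    import numpy as np

    def select_loudspeakers(spk_xy, src_xy, corners_xy, margin=0.3):
        """2D sketch of the geometric selection of Figure 6.3.

        spk_xy     : (L, 2) loudspeaker positions [m]
        src_xy     : (2,)   primary source position [m]
        corners_xy : (K, 2) corners of the listening area [m]
        margin     : widening of the selection along the array [m]
        """
        # Angles of the rays from the source (assumes the sector does not
        # straddle the +-pi discontinuity of arctan2).
        ang = lambda p: np.arctan2(p[..., 1] - src_xy[1],
                                   p[..., 0] - src_xy[0])
        lo, hi = np.min(ang(corners_xy)), np.max(ang(corners_xy))
        visible = (ang(spk_xy) >= lo) & (ang(spk_xy) <= hi)
        # Also keep loudspeakers within `margin` metres of a visible one.
        d = np.linalg.norm(spk_xy[:, None, :] - spk_xy[None, :, :], axis=-1)
        selected = visible | (d[:, visible] <= margin).any(axis=1)
        return np.flatnonzero(selected)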
Control points selection
One also needs to be careful with control point selection, since due to physical limitations set by the loudspeaker array and primary source locations, in some parts of the
listening area it is impossible to reproduce the evolution of the desired sound field.
Thus, it is physically justified to place control points at locations where the sound
wave fronts from the primary source and loudspeakers roughly move towards the same
direction.
In the case of primary point sources, control points form a subset of the reference
grid which lies inside of a cone defined by the primary source and the active loudspeakers, as shown in Figure 6.5(a). For plane wave sources, control points form a subset
of the reference grid which lies inside of a stripe defined by the active loudspeakers
and the plane wave propagation direction, as shown in Figure 6.5(b).
It should also be noted that the selected control points should not lie near the
loudspeaker array or the primary source in order to avoid the solution’s sensitivity to
evanescent (non-propagating) waves. Evanescent waves (Williams, 1999) are a local
phenomenon which does not persist with increased distance. Thus, taking sound
Figure 6.4: Comparison of SFR with the entire loudspeaker array from Figure 6.9
and SFR with only a sub-array of six selected loudspeakers used to reproduce a point
source with frequency f = 500 Hz located at rm = (3 m, 1 m). The used loudspeakers
are marked with squares. (a) Snapshot of the desired sound field; (b) snapshot of
the sound field reproduced using all loudspeakers; (c) snapshot of the sound field reproduced with the selected loudspeaker sub-array; (d) magnitude response of the three
sound fields on the reference line at frequency f = 500 Hz.
propagation functions that contain significant evanescent wave energy amounts to
model over-fitting and compromises the sound field reproduction accuracy in a larger
listening area.
6.2.4 Designing discrete-time filters for Sound Field Reconstruction
The frequency-domain SFR filter design procedure uses a non-linear step of discarding
small singular values of the system matrix G(ω). The resulting frequency response
and the distribution of singular values at different frequencies are shown in Figure 6.6. Apparently, SFR filters H̃k(ω) have abrupt changes around frequencies where singular values cross the predefined threshold ε. As a consequence, the filters H̃k(ω) have long impulse responses, which is the main obstacle for designing practical, short discrete-time SFR filters.
However, it turns out that the filters h̃k(t), despite being piecewise smooth functions with a few discontinuities, are well localized in time and most of their energy is concentrated around one main pulse, as shown in Figure 6.7. Therefore, shortening h̃k(t) does not severely affect the reproduction accuracy, and enables designing efficient discrete-time filters as combinations of a pure delay δNk[n] and a short FIR filter hk[n] (Kolundžija et al., 2009c).
Figure 6.5: Illustration of control points selection based on positions of the used
loudspeakers and position or direction of the primary source. The selected control
points lie inside of the cone or stripe defined by the loudspeaker positions and primary
source position or direction, respectively. (a) Point source reproduction; (b) plane
source reproduction.
Figure 6.6: (a) Singular values of the loudspeaker propagation matrix G(ω) at different frequencies ω; (b) magnitude response of SFR filters Hk (ω) obtained from the
SFR frequency-domain filter calculation procedure.
Figure 6.7: Conceptual illustration of discrete-time SFR filter design. (1) removing
delay dk ; (2) frequency sampling using IDFT; (3) shortening of the filter H̃k (ω), given
in (6.10).
The SFR discrete-time filter design procedure, illustrated in Figure 6.7, uses the
following three steps:
• Delay removal: The main peak of filters h̃k (t) can have a long delay for sources
far away from the loudspeaker array. In order to avoid using excessively long
filters that can accommodate a wide range of different delays, the filters’ delays
are extracted and realized separately. The delay dk of the main peak of the
SFR filter h̃k (t) is extracted considering source-loudspeaker distance and using
regression of the phase characteristics of the filter H̃k (ω) (Kolundžija et al.,
2009c).
• Frequency sampling: At the same time, the problem of frequency sampling of
filters H̃k (ω) needs to be solved. In other words, it is necessary to choose the
length NT of the inverse discrete Fourier transform (IDFT) used to obtain the
discrete-time impulse response h̃k [n]. NT needs to be large enough to give a low
time-domain aliasing error. In the setup we used for evaluations, described in
Reproducing Sound Fields Using Acoustic MIMO Inversion
130
the next section, NT = 2048 turned out to be long enough for avoiding notable
aliasing artifacts at sampling frequency fs = 48 kHz.
• Impulse response windowing and delaying: In the end, h̃k [n] is shortened with a
tapering window w[n] of length NF (NF < NT ) and delayed by NF /2 in order
to make it causal. A sketch of all three steps is given below.
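The following sketch strings the three steps together for one loudspeaker filter; the exact taper and the assumed position of the main pulse are our choices, not the thesis implementation:

    import numpy as np

    def sfr_fir(H_tilde, d_k, NT=2048, NF=512):
        """Sketch of the three-step discrete-time SFR filter design.

        H_tilde : (NT//2 + 1,) one-sided frequency response of the filter
        d_k     : main-peak delay in samples, realized separately
        """
        k = np.arange(NT // 2 + 1)
        # (1) delay removal: undo exp(-j*2*pi*k*d_k/NT) before sampling
        H0 = H_tilde * np.exp(2j * np.pi * k * d_k / NT)
        # (2) frequency sampling: length-NT inverse DFT (time-domain
        #     aliasing is negligible if NT is large enough)
        h = np.fft.irfft(H0, n=NT)
        # (3) windowing and delaying: centre the main pulse at NF/2,
        #     truncate to NF samples and taper to keep the filter causal
        return np.roll(h, NF // 2)[:NF] * np.hanning(NF)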
Figure 6.8 shows an SFR filter of length NF = 512 samples obtained by the
described procedure with IDFT length NT = 2048, for the sampling frequency fs =
48 kHz.
Figure 6.8: SFR filter of length NF = 512 samples obtained from frequency response
H̃k (ω) using a DFT of length NT = 2048.
6.3 Evaluation
SFR was evaluated with simulations of the sound reproduction setup shown in Figure 6.9. The performance of SFR was compared with two different variants of WFS:
• WFS I : Basic WFS, as proposed in the initial works on WFS (Berkhout, 1988;
Berkhout et al., 1992).
• WFS II : WFS that uses the loudspeaker selection procedure described in Section 6.2.3, variants of which were proposed by Verheijen (1997) and Corteel
(2006); Corteel et al. (2008).
In both variants of WFS, we used a double-sided frequency-independent half-cosine
tapering window to mitigate the edge effects. The length of the taper on each end
was 15 % of the loudspeaker array length. WFS filters were computed starting from
the loudspeaker driving function formulas found in WFS literature (Berkhout et al.,
1993; Verheijen, 1997). Additionally, a common correction filter cf (ω) was computed
and applied to all active loudspeakers in order to achieve the desired average power
on the points on the reference line. This procedure was described in Section 6.2.3.
The reproduction setup, which is shown in Figure 6.9, consists of 18 loudspeakers
spaced at 15 cm. Loudspeakers are modeled as point sources emitting spherical waves.
The reproduced primary sources, on the other hand, are modeled as both point and
plane sources, emitting spherical and plane waves, respectively.
Figure 6.9: A sound field reproduction setup using a linear loudspeaker array of 18
loudspeakers spaced at ∆l = 15 cm. The listening area is a square of size 4 m, 2 m
in front of the loudspeaker array.
Two different sets of simulations were performed. The first set gives insight into
the reproduction accuracy of the tested approaches both in the frequency and time
domain through sound field snapshots. The second set of simulations gives a more
thorough quantitative performance analysis of the tested approaches. It does so by
exhaustively analyzing magnitude frequency responses and group delay errors for a
large number of reproduced sources and a large number of listening positions.
6.3.1 Sound field snapshot analysis
Sinusoidal sources
The first simulation analyzes the spatial accuracy of reproduction of a sinusoidal
(single-frequency) point source. It compares snapshots of the desired sound field and
sound fields reproduced with the three tested approaches. Figures 6.10 and 6.11
show comparisons for the reproduction of a sinusoidal point source at frequencies
f1 = 500 Hz and f2 = 2 kHz, respectively.
Low-frequency reproduction, as can be observed in Figure 6.10, is accurate with
all three simulated approaches. However, as the frequency increases, aliasing artifacts
begin to appear. Figure 6.11 shows the difference in the aliasing artifacts between the
three approaches. While WFS I has visible aliasing artifacts along the entire listening
area (visible as zero responses along multiple directions), SFR and WFS II have only
a few directional nulls at the periphery of the listening area, and thus preserve spatial
reproduction accuracy up to higher frequencies.
Figure 6.10: Comparison of WFS and SFR in reproducing a point source with frequency f1 = 500 Hz located at rm = (3 m, 1 m). The used loudspeakers are marked
with squares. Sound field snapshots: (a) desired, (b) WFS I, (c) WFS II, and (d)
SFR.
Low-pass filtered pulse train
The second simulation shows differences between WFS and SFR from a different perspective. Namely, while the first simulation focused on spatial reproduction accuracy
as a function of frequency, this simulation focuses on the spatial reproduction accuracy
over a wide range of frequencies simultaneously.
The reproduced primary source is a plane source at angle φ = 180◦ emitting a
train of low-pass filtered pulses p(t) spaced in time by Tp = 4 ms. The shape of a
single pulse is shown in Figure 6.12.
Figure 6.13 shows snapshots of the desired sound field and the sound fields reproduced with SFR and the two variants of WFS. Note that in this scenario loudspeaker
selection has no effect and as a consequence all three approaches use all loudspeakers
and the two variants of WFS are identical. From the snapshots of the reproduced
fields, it is apparent that the shape of sound wave fronts is accurately reproduced
across the listening area with both WFS and SFR. However, observing the amplitude
of the emitted pulses across the listening area, one can see that with WFS the amplitude
notably decreases towards the sides. SFR, on the other hand, does not suffer from
this problem.
Figure 6.14, which shows magnitude frequency responses in the center and at
both ends of the listening line (located four meters in front of the loudspeaker array),
corroborates the previous visual observation from the sound field snapshots. In particular, it shows how SFR's low-frequency magnitude response at both ends of the listening line is flatter and less attenuated relative to the desired characteristics when compared to WFS.
Figure 6.11: Comparison of WFS and SFR in reproducing a point source with frequency f2 = 2 kHz located at rm = (3 m, 1 m). The used loudspeakers are marked
with squares. Sound field snapshots: (a) desired, (b) WFS I, (c) WFS II, and (d)
SFR.
Figure 6.12: A low-pass pulse with a cut-off frequency fc = 3 kHz used for constructing a pulse train.
At this point, some preliminary conclusions can be drawn about the advantages
of SFR over WFS. Compared to WFS I, SFR provides more graceful degradation
of reproduction accuracy across the listening area as the frequency increases. This
effectively means that compared to WFS I, SFR increases the aliasing frequency margin and enlarges the effective listening area. Furthermore, loudspeaker subset selection in WFS II helps decrease the aliasing artifacts as the frequency increases but, as will be shown next, this improvement comes at the cost of increased average magnitude spectral deviations across the listening space.
Figure 6.13: Comparison of WFS and SFR in reproducing a plane source located at
an angle φ = π that emits a train of low-pass pulses with a period of Tp = 4 ms. The
used loudspeakers are marked with squares. Sound field snapshots: (a) desired, (b)
WFS I, (c) WFS II, and (d) SFR.
6.3.2 Impulse response analysis
In order to remove the influence of particular source and listening positions and make
a more general observation about the performance of WFS and SFR, we performed
a number of simulations involving multiple primary source and listening positions.
The simulated primary sources—30 altogether—were divided into three different categories, with each category containing ten sources (see Figure 6.15):
• Type I : Focused, frontal point sources located inside a triangle whose vertices coincide with the two outer loudspeakers and the point C(6 m, 2 m), regularly spaced along the x axis and with y coordinates chosen uniformly at random within the triangle boundaries.
• Type II : Point sources located closely behind the loudspeaker array. In the
simulations, these sources were regularly spaced along the x axis between x = 0
and x = 4 m, and their y coordinates were chosen uniformly at random between
y = 0 and y = 4 m.
Figure 6.14: Normalized magnitude frequency responses of WFS and SFR in the
control points (a) RS (8 m, 0), (b) RC (8 m, 2 m), and (c) RE (8 m, 4 m) relative to
a plane wave source located at an angle φ = π.
• Type III : Point sources far away from the loudspeaker array, which were modeled
as sources emitting plane waves. In the performed simulations, these plane wave
sources were positioned at ten regularly-spaced directions between 165◦ and
180◦ .
The three categories of simulated primary sources are illustrated in Figure 6.15.
The impulse responses for each simulated primary source were computed on a
finite rectangular grid of listening points spaced at 1 m along the x axis and 50 cm
along the y axis, shown in Figure 6.15.
For each primary source category, we formed aggregated plots containing statistics
of normalized magnitude frequency responses and group delay errors for all listening
points and all primary sources in the category.
The normalized magnitude frequency responses are given by
$$Y_n(f) = \frac{Y(f)}{Y_d(f)}\,, \qquad (6.12)$$
where Y (f ) is the reproduced field’s magnitude response in a listening point and Yd (f )
is the desired magnitude response in that point.
The group delay error $e_\tau(f)$ is given by the difference between the group delay $\tau_g(f)$ of the reproduced impulse response in a listening point and the group delay $\tau_g^d(f)$ of the desired impulse response in that point:
$$e_\tau(f) = \tau_g(f) - \tau_g^d(f)\,. \qquad (6.13)$$
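A minimal sketch of both metrics, assuming the reproduced and desired impulse responses y and yd at one listening point are available as arrays sampled at fs; the FFT length and the finite-difference group delay estimate are implementation choices, not prescribed by the text:

```python
import numpy as np

def normalized_magnitude_db(y, yd, nfft=8192):
    """Y_n(f) = Y(f) / Y_d(f) from (6.12), in dB."""
    Y = np.abs(np.fft.rfft(y, nfft))
    Yd = np.abs(np.fft.rfft(yd, nfft))
    return 20 * np.log10(Y / Yd)

def group_delay(y, fs, nfft=8192):
    """tau_g(f) = -d(phase)/d(omega), estimated from the unwrapped phase."""
    phase = np.unwrap(np.angle(np.fft.rfft(y, nfft)))
    d_omega = 2 * np.pi * fs / nfft          # bin spacing [rad/s]
    return -np.gradient(phase) / d_omega     # group delay [s]

# Group delay error (6.13) at one point; aggregating e_tau over all listening
# points with np.percentile(..., [5, 25, 50, 75, 95], axis=0) yields the
# statistical plots of Figures 6.16-6.21.
# e_tau = group_delay(y, fs) - group_delay(yd, fs)
```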
Figure 6.15: Sound field reproduction setup showing three categories of simulated
primary sources: Type I, Type II, and Type III, and a grid of listening points covering
the listening area, where responses are computed.
The plots contain the 5–95 percentiles, the 25–75 percentiles, and the median of the said quantities across the audible frequency range.
The aggregated statistical plots of magnitude and group delays provide insight
not only into how accurate the tested approaches are on average, but they also show
to what extent reproduction accuracy varies across space.
Statistical magnitude frequency response plots
Figures 6.16, 6.17, and 6.18 show the previously described magnitude frequency response statistical plots for primary sources of Type I, II, and III, respectively. It
can be seen that with SFR, the 25 − 75 percentiles of magnitude frequency responses
are within 2 dB of the desired responses up to around 4 kHz for all three primary
source categories. The median of SFR’s normalized magnitude response lies at 0 dB
across low frequencies, as opposed to the median magnitude responses of the two
WFS approaches, which vary around 0 dB. Although 5 − 95 percentiles exhibit variations around the median up to around 10 dB, meaning that for some source-listening
position pairs the reproduced impulse response differs significantly from the desired
one, they are notably smaller than the variations of the corresponding percentiles of
magnitude frequency responses of the two WFS approaches. Above the frequency of
4 kHz, the three approaches perform similarly due to spatial aliasing.
It should be noted that WFS II exhibits more spectral magnitude artifacts in the
extended listening area compared to WFS I. It can thus be said that the previously
observed improvement of aliasing performance, apparent in Figure 6.11, comes at the
cost of reducing the listening area size. This observation was also reported by Corteel
et al. (2008).
Figure 6.16: Normalized magnitude frequency responses of WFS I, WFS II, and
SFR for focused point sources (Type I) on a grid of listening points. Light-gray area
shows 25 − 75 percentiles, dark-gray area shows 5 − 95 percentiles, and solid line shows
the median.
Statistical group delay error plots
Figures 6.19, 6.20, and 6.21 show the group delay error statistical plots for primary
sources of Type I, II, and III, respectively.
From Figures 6.19 and 6.21, it can be observed that the two WFS approaches
have virtually the same group delay performance for focused (Type I) and plane
(Type III) sources. This is not surprising, as for most of the simulated focused and
plane wave sources, both WFS approaches use all loudspeakers. For focused and
plane sources, SFR’s group delay performance is on average better or comparable to
both WFS approaches, as can be observed from 25 − 75 group delay error percentiles.
SFR’s group delay error is more variable in the extreme cases in the frequency range
500 − 2000 Hz, which is apparent from observing 5 − 95 percentiles. Note, however,
that the group delay errors in the frequency range 500–2000 Hz are below the group delay discrimination threshold of 2 ms, as found by Flanagan et al. (2005) (see also Blauert and Laws, 1978). Therefore, a slightly higher group delay variance of SFR should not cause notable perceptual artifacts.
Figure 6.20 shows that the group delay performance of WFS II is superior to WFS
I when reproducing point sources behind the loudspeaker array (Type II sources).
SFR, on the other hand, has group delay errors which are similar to WFS II: on
Figure 6.17: Normalized magnitude frequency responses of WFS I, WFS II, and SFR
for point sources behind the loudspeaker array (Type II) on a grid of listening points.
Light-gray area shows 25 − 75 percentiles, dark-gray area shows 5 − 95 percentiles, and
solid line shows the median.
average, SFR is slightly better, but also slightly more variable in the frequency range
500 − 2000 Hz.
6.3.3 Discussion
From the frequency-domain analysis of the impulse responses on a grid of listening points, three observations can be made.
• It is apparent that the low-frequency response of SFR exhibits small spectral deviations up to almost 4 kHz for all categories of simulated primary sources and all listening points. Both WFS variants, on the other hand, suffer from higher coloration artifacts across space in the low-frequency range.
• It can be seen that more notable spectral deviations across space start at higher
frequencies with SFR when compared to both WFS approaches.
• The average group delay performance of SFR is slightly better than that of the two WFS variants, but slightly more variable in the low-frequency range 500–2000 Hz. Nevertheless, the range of group delay error variations in this frequency range is below the group delay discrimination threshold of 2 ms (Flanagan et al., 2005).
The presented extensive comparisons confirm the previous observation that SFR provides an effective extension of the listening area with correct sound field reproduction.
Figure 6.18: Normalized magnitude frequency responses of WFS I, WFS II, and
SFR for plane wave sources at directions α ∈ [165◦ , 180◦ ] (Type III) on a grid of
listening points. Light-gray area shows 25−75 percentiles, dark-gray area shows 5−95
percentiles, and solid line shows the median.
Additionally, SFR raises the reproduction aliasing frequency when compared with both variants of WFS, as described in Section 6.2.3.
6.4 Practical considerations
6.4.1 Computational complexity
The presented approach to reproducing physical sound fields has an appealing performance in terms of reproduction accuracy. However, it comes at the cost of increased complexity that stems from solving a MIMO inversion problem in the frequency domain. For each virtual source, the reproduction system needs to perform an SVD of the M × L matrix G(ω) at N_F/2 + 1 frequencies (real-valued filters have conjugate-symmetric spectra). Since the number of control points M is usually larger than the number of loudspeakers L, the complexity of obtaining SFR filters for one virtual source is given by Θ(M² L N_F).
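A minimal sketch of this per-frequency inversion step, assuming the measured M × L channel matrices G[k] and the desired field d[k] are available per bin; the Tikhonov-style damping shown here is an illustrative safeguard, not part of the complexity argument:

```python
import numpy as np

def sfr_filters_bin(G, d, reg=1e-3):
    """Regularized SVD pseudoinverse for one M x L frequency bin: h = G^+ d."""
    U, s, Vh = np.linalg.svd(G, full_matrices=False)
    s_inv = s / (s**2 + reg)                 # damped inverse singular values
    return Vh.conj().T @ (s_inv * (U.conj().T @ d))

# One SVD per bin, NF/2 + 1 bins in total:
# H = [sfr_filters_bin(G[k], d[k]) for k in range(NF // 2 + 1)]
```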
High computational complexity makes real-time calculation of loudspeaker filters
with SFR, like with most numerical approaches, difficult. Instead, one can produce
a database of reproduction filters offline, which can then be read in real time for
sound field rendering purposes. This would entail dividing the reproduction zone
Figure 6.19: Group delay errors eτ (f ) = τg (f ) − τgd (f ) of WFS I, WFS II, and SFR
for focused point sources (Type I) on a grid of listening points. Light-gray area shows
25 − 75 percentiles, dark-gray area shows 5 − 95 percentiles, and solid line shows the
median.
with a polygonal mesh and pre-computing the filters corresponding to every element
of the mesh. In the simplest case of a uniform rectangular mesh, the SFR filters for
a single virtual source can be obtained in constant time based on its position. If the
rectangular mesh is non-uniform, the filters are obtained in the time it takes to locate
the rectangle that contains the virtual source. This complexity is Θ(log Nx ), where
Nx is the number of rectangles along the dimension that contains more rectangle
“stripes”. For a general mesh, the use of space partitioning data structures, such as
kd-trees (Bentley, 1975), makes the complexity of obtaining SFR filters Θ(log NM ),
where NM is the mesh size.
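A minimal sketch of the rendering-time lookup for a general mesh, using a kd-tree over hypothetical mesh points at which the filters were precomputed (the mesh, filter shapes, and query position below are placeholders for illustration):

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical offline database: one set of L x Nh filters per mesh point
mesh_points = np.random.rand(1000, 2) * 4.0      # mesh over a 4 m x 4 m zone
filter_db = np.random.randn(1000, 16, 64)        # placeholder filter data

tree = cKDTree(mesh_points)                      # built once, offline

def filters_for_source(position):
    """Theta(log N_M) nearest-mesh-point query at rendering time."""
    _, idx = tree.query(position)
    return filter_db[idx]

h = filters_for_source((3.0, 1.0))               # virtual source at (3 m, 1 m)
```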
The previously mentioned filter pre-computation methods allow for real-time sound
field rendering, and have already been proposed and used in practical multichannel
sound field reproduction systems.
6.4.2 Performing system measurements
It has already been stressed that SFR does not put limiting constraints on the reproduction setup. It works irrespective of loudspeaker or desired source directivity, loudspeaker calibration, or sound propagation characteristics, as long as one is able to obtain the MIMO acoustic channel involving a dense grid of control points. In a practical reproduction system, this requirement might be too strict, as it is hard to imagine that one would measure loudspeaker responses on a fine grid, especially in larger venues.
Figure 6.20: Group delay errors eτ (f ) = τg (f ) − τgd (f ) of WFS I, WFS II, and SFR
for point sources behind the loudspeaker array (Type II) on a grid of listening points.
Light-gray area shows 25 − 75 percentiles, dark-gray area shows 5 − 95 percentiles, and
solid line shows the median.
Instead, one could compromise by at least measuring the system on a contour
in the reproduction plane that encloses the listening area. By doing so, one trades
off some reproduction accuracy for practicality. Simulation experiments involving
enclosing contours instead of covering grids, not presented in this chapter for the sake
of space, show that SFR does not suffer from a noticeable performance loss with the
said simplification.
6.5 Conclusions
We described Sound Field Reconstruction—a technique for reproducing sound fields
in an extended listening area using an array of loudspeakers. SFR is based on a
numerical optimization procedure for MIMO channel inversion. The control points
covering the listening area, used by the MIMO channel inversion procedure, are spaced
below the Nyquist criterion to avoid aliasing. Additionally, SFR uses geometry-based
loudspeaker and control points selection to mitigate artifacts due to aliasing and
over-fitting.
SFR is a flexible sound field reproduction approach applicable to loudspeaker arrays with different topologies and directivities. It also enables reproducing directive sources, and does not restrict applications to free-field or anechoic sound propagation conditions.
Figure 6.21: Group delay errors eτ (f ) = τg (f ) − τgd (f ) of WFS I, WFS II, and
SFR for plane wave sources at directions α ∈ [165◦ , 180◦ ] (Type III) on a grid of
listening points. Light-gray area shows 25−75 percentiles, dark-gray area shows 5−95
percentiles, and solid line shows the median.
We showed that, compared to Wave Field Synthesis, which is the state-of-the-art technique for sound field reproduction using loudspeaker arrays, SFR achieves better average reproduction accuracy in an extended listening area and preserves reproduction accuracy up to higher frequencies.
Chapter 7
Multichannel Room Equalization Considering Psychoacoustics

7.1 Introduction
Knowledge about the interaction of sound with architectural structures dates back to the very distant past, and it has been used in the design of many venues, including concert halls, churches, opera houses, and theatres. However, only the last century has seen the development of electro-acoustic systems for audio recording and playback, which have revolutionized the way people consume audio content. Loudspeaker systems have become omnipresent—used in listening spaces that range from the above-mentioned venues to living rooms and cars.
The combination of a sound reproduction system with the room in which sound
reproduction takes place can in some cases have undesired effects. This can be due
to various reasons, such as an interior design that has aesthetic or some other criteria
as a priority, but can also be due to the characteristics of the audio content being
played. For instance, it is well known that the room acoustic properties considered
favorable for classical music, such as long reverberation time, can be detrimental for
playback of speech.
The acoustical treatment of a room, which is used in big halls to cater to the
acoustical requirements of a specific event, offers little flexibility. On the other hand,
the development of signal processing, especially digital, has given a flexible way to
enhance the acoustic properties of a room, or to mitigate some of the detrimental
effects it has on the reproduced sound.
We have seen in Section 2.7 that low-frequency room acoustics is dominated by
usually clearly separated resonances or room modes. We have also seen that the
density of room resonances increases with the square of frequency (see (2.100) on page
29), and that starting from the Schroeder frequency, room modes overlap and combine
in a complex, location-dependent way that is best modeled by a statistical theory of
room acoustics. The Schroeder frequency is dependent on the room’s geometry. For
concert halls, it is on the lower end of audible frequencies; in listening rooms, which
we focus on in this chapter, it is roughly between 100 Hz and 200 Hz; and it can go
up to several hundred Hertz in cars.
The low-frequency room resonances below the Schroeder frequency are thus characteristic of the entire room’s listening area, and they define the room’s bass performance. For the playback of music, strong resonances affect the timbre in an audible
way, while for speech, the long reverberation tails associated with strong resonances
blur the syllables and decrease the intelligibility. Thus, both in music and speech
reproduction, it is important to reduce the effect of excessive resonances in order to
improve the listening experience.
Above the Schroeder frequency, the high spatial variation of the resonant structure of room impulse responses (RIRs) makes resonance control a highly location-dependent effort. This is the reason why some systems for wide-area room equalization do not correct the room beyond the Schroeder frequency, and why we focus in this chapter only on equalizing low frequencies.
The first works on room equalization go back to the 1960s, and they were in the spirit of controlling room modes. Namely, Boner and Boner (1965) proposed the use of equalization to attenuate the resonances in sound systems. Groh (1974) analyzed the low-frequency modal behavior of a room and performed equalization by finding an adequate placement for a loudspeaker within the room. Similar, albeit more systematic, approaches to optimizing the placement and number of low-frequency loudspeakers have been investigated more recently by a number of authors (see Celestinos, 2006; Welti and Devantier, 2006). There has also been a fair number of recent works which focus on correcting the low-frequency modal behavior of a room using infinite impulse response (IIR) filters (see Mäkivirta et al., 2003; Welti and Devantier, 2006; Wilson et al., 2003).
In this chapter, we present an approach for multiple-loudspeaker low-frequency room correction in an extended listening area, based on convex optimization. Our approach resembles some optimization-based multiple-point RIR equalization approaches (e.g., see Elliott and Nelson, 1989). However, it is more general, since it allows one to systematically incorporate physical and psychoacoustical aspects relevant to RIR correction through convex constraints. We focus on the problem of equalizing the response of one loudspeaker with the help of the remaining loudspeakers. We argue that the psychoacoustical phenomena of particular interest for room equalization are temporal masking and the precedence effect, and we show how they can be incorporated through convex constraints on the time-domain profile of the equalization filters' impulse responses. Excessively driving loudspeakers at some frequencies, characteristic of efforts to correct deep notches, is prevented by limiting the maximum gain of equalization filters over frequency.
7.1.1 Chapter outline
Section 7.2 presents the room equalization problem and gives a detailed description
of our multiple-loudspeaker room equalization approach. In Section 7.3, we show
the effectiveness of our approach with a simulation of a five-channel surround setup,
where one loudspeaker is equalized with the help of the remaining four. Conclusions
are given in Section 7.4.
7.2 Proposed room equalization
7.2.1 Problem description
Figure 7.1: Equalization of a five-channel loudspeaker setup in a room of dimensions (6 m, 4 m, 2.5 m). In the illustration, the response of loudspeaker S1 is equalized in the listening area around the central control point C1.
Consider a listening room with a multichannel loudspeaker setup consisting of L loudspeakers S1, . . . , SL. An example setup, which is considered later in Section 7.3, is shown in Figure 7.1. Note that we focus on equalizing the response of one loudspeaker, called the main loudspeaker, with the help of the remaining ones, for this task called the auxiliary loudspeakers. Without loss of generality, we assign index 1 to the main loudspeaker.
As the first step, one needs to measure RIRs of all loudspeakers in N control
points C1 , . . . , CN which cover the listening area where RIRs are being equalized (gray
rectangular area in Figure 7.1). The placement of control points can be systematic
or random, and as few as N = 4 control points can capture with high accuracy the
3D sound energy in a room, as reported in (Pedersen, 2007). We denote by Gij (ω)
the frequency response of the RIR between loudspeaker j and control point i, and by
G(ω) the N × L matrix containing the frequency responses Gij (ω).
One also needs to decide on the length Nh of the equalization filters hi[n]. Note that working with highly downsampled signals allows using relatively short filters, and multirate filtering (e.g., see Crochiere and Rabiner, 1983) offers savings in computational complexity. Nh can be up to the order of the length of the RIRs, which is still short in the downsampled domain corresponding to a highly reduced sampling frequency f′S. Let the vector
$$h_i = \left[h_i[0] \;\dots\; h_i[N_h-1]\right]^T \qquad (7.1)$$
contain the samples of the equalization filter of loudspeaker i, and the vector
$$h = \left[h_1^T \;\dots\; h_L^T\right]^T \qquad (7.2)$$
contain the samples of all loudspeaker filters stacked together. Note that vector h is
what we are trying to obtain.
Since our design procedure considers the equalized RIRs in the frequency domain, we discretize the frequency axis into Nf uniformly spaced normalized frequencies ω0, . . . , ωNf−1, where ω0 = 0 and ωNf−1 = π corresponds to the Nyquist frequency f′S/2. The frequency spacing needs to be of the order of the room resonances' bandwidth, which can go down to around 1 Hz (Kuttruff, 2000).
A vector
$$H(\omega_i) = \left[H_1(\omega_i) \;\dots\; H_L(\omega_i)\right]^T, \qquad (7.3)$$
containing the frequency responses of the equalization filters, is obtained by the following product:
$$H(\omega_i) = V(\omega_i)\, h\,, \qquad (7.4)$$
where
$$V(\omega_i) = \left(I_{L\times L} \otimes v(\omega_i)\right)^T = \begin{bmatrix} v(\omega_i) & 0_{N_h\times 1} & \cdots & 0_{N_h\times 1} \\ 0_{N_h\times 1} & v(\omega_i) & \cdots & 0_{N_h\times 1} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{N_h\times 1} & 0_{N_h\times 1} & \cdots & v(\omega_i) \end{bmatrix}^T, \qquad v(\omega_i) = \left[1 \;\; e^{j\omega_i} \;\dots\; e^{j(N_h-1)\omega_i}\right]^T, \qquad (7.5)$$
$I_{L\times L}$ is an L × L identity matrix, and ⊗ denotes the Kronecker product.
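A minimal numeric sketch of (7.4)–(7.5), assuming L = 5 loudspeakers and Nh = 16 taps; the final assertion confirms that each row of V(ωi) evaluates the DTFT of the corresponding filter:

```python
import numpy as np

L, Nh = 5, 16
h = np.random.randn(L * Nh)                  # stacked filter coefficients

def V(omega):
    """L x (L*Nh) evaluation matrix such that H(omega) = V(omega) @ h."""
    v = np.exp(1j * omega * np.arange(Nh))   # [1, e^{jw}, ..., e^{j(Nh-1)w}]
    return np.kron(np.eye(L), v)             # block structure of (7.5)

omega = 0.3
H = V(omega) @ h                             # per-loudspeaker responses, (L,)

# Row 0 reproduces the DTFT of the first filter h_1
assert np.isclose(H[0], np.sum(h[:Nh] * np.exp(1j * omega * np.arange(Nh))))
```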
The frequency responses at the normalized frequency ωi of the equalized RIRs in the control points are given by
$$Y(\omega_i) = \left[Y_1(\omega_i) \;\dots\; Y_N(\omega_i)\right]^T = G(\omega_i)\, H(\omega_i)\,. \qquad (7.6)$$
In order to jointly consider the equalized frequency responses in all control points C1, . . . , CN and at all frequencies ω0, . . . , ωNf−1, the vectors Y(ωi) are stacked into one long vector defined by
$$Y = \begin{bmatrix} Y(\omega_0) \\ \vdots \\ Y(\omega_{N_f-1}) \end{bmatrix} = \begin{bmatrix} G(\omega_0)\, V(\omega_0) \\ \vdots \\ G(\omega_{N_f-1})\, V(\omega_{N_f-1}) \end{bmatrix} h\,.$$
Essentially, the goal of equalization is to make the equalized RIRs as close as
possible to the desired responses, discussed next, in all control points at all considered
frequencies.
7.2.2 Desired response calculation
One of the main challenges when designing room equalizers is specifying the desired
frequency response D(ω) the equalized system needs to achieve, and there is no wide
consensus on this issue. On the other hand, it has long been suggested that an equalization procedure should not undo the effect of a room and make the reproduced
sound artificially anechoic, but it should sensibly correct the room’s undesired features, usually associated with strong low-frequency resonances (e.g., see Genereux,
1992).
As briefly mentioned at the beginning of this section, the listening area is sampled with control points in order to capture the essential properties of the sound field developed in the room. At the same time, sampling multiple points allows one to avoid position-dependent anomalies, such as deep frequency-response notches associated with nodes of some of the room modes.
It was shown in (Pedersen, 2007) that the root mean square (RMS) value of the
room magnitude frequency response taken over several measurement points gives a
stable estimate of the room’s spectral power profile. Hence, for defining the desired
frequency characteristic, we combine the mentioned spatial power averaging with magnitude frequency response smoothing in fractional octave bands. More specifically,
the desired response D(ωi) is obtained as follows:
$$D(\omega_i) = \sqrt{\frac{1}{N}\sum_{m=1}^{N}\left(\tilde{G}_{m1}(\omega_i)\right)^2}\,, \qquad (7.7)$$
where $\tilde{G}_{m1}(\omega)$ is the fractional-octave (e.g., 1/3-octave) smoothed magnitude characteristic of the RIR between the main loudspeaker and control point m (fractional-octave smoothing is thoroughly described in Hatziantoniou and Mourjopoulos, 2000).
Since the same desired frequency response is taken for each control point, the vector of desired responses at frequency ωi is given by $D(\omega_i) = D(\omega_i)\, 1_{N\times 1}$, where the column vector $1_{N\times 1}$ contains N ones. In order to compare the equalized with the desired frequency responses in the control points, the vectors D(ωi) need to be stacked together into a column vector $D = \left[D^T(\omega_0) \;\dots\; D^T(\omega_{N_f-1})\right]^T$.
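A minimal sketch of the desired-response computation, assuming the magnitudes are given on a discrete frequency grid; the smoothing routine below is an illustrative stand-in for the method of Hatziantoniou and Mourjopoulos (2000), not a reimplementation of it:

```python
import numpy as np

def third_octave_smooth(mag, freqs):
    """Average |G| over a 1/3-octave window around each bin (illustrative)."""
    out = np.empty_like(mag)
    for i, f in enumerate(freqs):
        band = (freqs >= f * 2**(-1 / 6)) & (freqs <= f * 2**(1 / 6))
        out[i] = mag[band].mean()
    return out

def desired_response(G_mag, freqs):
    """(7.7): spatial RMS over the N control points, after smoothing."""
    smoothed = np.array([third_octave_smooth(g, freqs) for g in G_mag])
    return np.sqrt(np.mean(smoothed**2, axis=0))

# G_mag: N x Nf array of |G_m1(w_i)| for the main loudspeaker
```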
7.2.3 Choice of a cost function
The goal of an optimization procedure is to minimize a cost function, which in the case of our equalization filter design is a function of the difference between the equalized and desired frequency responses, Y and D, respectively, taken over all control points.
But before defining the cost function, we first define a resonance detection vector
$$R = \left[R(\omega_0) \;\dots\; R(\omega_{N_f-1})\right]^T \qquad (7.8)$$
by
$$R(\omega_i) = \left| \max_m \left( |G_{m1}(\omega_i)| \right) - D(\omega_i) \right| + \epsilon\,, \qquad (7.9)$$
where ε is a predefined minimum weight that can be given to a spectral magnitude error. Note that R is designed to peak at resonant frequencies.
For the cost function, we choose a weighted magnitude error that penalizes more the errors at resonant frequencies; it is given by
$$J = \left\| W \left( |Y| - |D| \right) \right\|_2\,, \qquad (7.10)$$
where $W = \mathrm{diag}(R) \otimes I_{N\times N}$, and diag(·) makes a diagonal matrix from a vector, with the vector's entries on the main diagonal.
7.2.4 Equalization filter constraints
Here we present some psychoacoustical and physical considerations that are used to constrain the computed filters in order to avoid the location sensitivity characteristic of some other room equalization approaches.
Temporal-masking constraints
Temporal masking is a phenomenon whereby a sound stimulus renders inaudible the sounds which precede it (backward masking) and follow it (forward masking) (Moore, 1989).
Figure 7.2: Illustration of temporal masking for a single pulse.
Backward masking, which is not entirely understood, varies greatly between trained
and untrained listeners (Moore, 1989; Zwicker and Fastl, 1999), and can go up to
20 ms for the latter. Raab (1961) investigated backward masking of clicks by clicks,
and determined that it vanishes above 15–20 ms. Thus, we decide to model backward
masking by a curve of the form
$$m_b(t) = b_1 (-t)^{b_2}$$
in the interval (−∞, −1 ms). The parameters b1 and b2 are determined such that
$$m_b(t)\big|_{t=-1\,\mathrm{ms}} = 0\ \mathrm{dB}\,, \qquad m_b(t)\big|_{t=-15\,\mathrm{ms}} = -60\ \mathrm{dB}\,.$$
Forward masking is the better understood of the two phenomena, and it is highly
dependent on the relationship between the masking and masked signal. Moore (1989)
explained forward masking as a phenomenon that starts as simultaneous masking, and
then decays as an exponential function up to 100–200 ms. Fielder (2003) proposed
to simplify the findings of several forward masking investigations involving different
types of masking and masked signals (Raab, 1961; Zwicker and Fastl, 1999; Jesteadt
et al., 1982), and model forward masking by a combination of simultaneous masking
that extends up to 4 ms, followed by a curve
$$m_f(t) = f_1\, t^{f_2}$$
in the interval (4 ms, ∞), whose parameters f1 and f2 are defined by
$$m_f(t)\big|_{t=4\,\mathrm{ms}} = 0\ \mathrm{dB}\,, \qquad m_f(t)\big|_{t=200\,\mathrm{ms}} = -60\ \mathrm{dB}\,.$$
Combining backward and forward masking curves, mb (t) and mf (t), respectively,
and assuming simultaneous masking (Moore, 1989; Zwicker and Fastl, 1999) takes
place in the interval [−1 ms, 4 ms], we obtain the masking curve m(t) illustrated in
Figure 7.2.
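The four parameters follow directly from the dB conditions above; a minimal sketch, assuming time in seconds and linear amplitude relative to the masker level:

```python
import numpy as np

def power_law_fit(t1, t2, level1_db, level2_db):
    """Fit a * t^b through two (time, level-in-dB) points."""
    b = (level2_db - level1_db) / (20 * np.log10(t2 / t1))
    a = 10**(level1_db / 20) / t1**b
    return a, b

b1, b2 = power_law_fit(1e-3, 15e-3, 0, -60)    # backward: m_b(t) = b1 (-t)^b2
f1, f2 = power_law_fit(4e-3, 200e-3, 0, -60)   # forward:  m_f(t) = f1 t^f2

def masking_curve(t):
    """m(t): backward, simultaneous ([-1 ms, 4 ms]), and forward regions."""
    t = np.asarray(t, dtype=float)
    return np.piecewise(
        t,
        [t < -1e-3, (t >= -1e-3) & (t <= 4e-3), t > 4e-3],
        [lambda x: b1 * (-x)**b2, 1.0, lambda x: f1 * x**f2])
```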
Fielder (2003) also proposed considering temporal masking as a criterion for time-domain distortions of equalization systems. In light of the arguments about the limitations of using long equalization filters or filters that exhibit pre-echos, and considering temporal masking, we can expect that the use of short equalization filters has a good chance of avoiding sensitivity to location changes. In other words, we argue that short and well-localized ("spiky") filters, whose amplitude profiles fit into the temporal masking threshold curve m(t) from Figure 7.2, are good solutions for wide-area equalization. Our argument is corroborated by the fact that if a sharp transient is emitted by a loudspeaker, a listener close to the loudspeaker will not hear distortion, thanks to temporal masking.
Figure 7.3: Illustration of the temporal-masking constraint for the RIR equalization filter of the main loudspeaker. The filter coefficients can take on values only in the shaded area.
The temporal-masking constraint is defined as a maximum-amplitude limit on a filter's impulse response. If m[n] is a sampled version of the temporal masking threshold curve m(t), then the equalization filter hi[n] is constrained with
$$|h_i[n]| \le m[n]\,. \qquad (7.11)$$
Since we do not use a delay for the main loudspeaker, its amplitude profile is
constrained by the sampled version of the causal part of m(t). Figure 7.3 illustrates
the temporal masking constraint (7.11) for the main loudspeaker, where the shaded
area defines where the samples of h1 [n] can take on values.
Note also that by using the vector representation
$$h_1 = \left[h_1[0] \;\dots\; h_1[N_h-1]\right]^T, \qquad m_1 = \left[m[0] \;\dots\; m[N_h-1]\right]^T,$$
the main loudspeaker temporal masking constraint (7.11) can be written as the following convex constraint:
$$|h_1| \preceq m_1\,, \qquad (7.12)$$
where ⪯ denotes component-wise ≤.
Auxiliary loudspeaker filter constraints
The precedence effect (Wallach et al., 1949) is a phenomenon wherein a sound coming
to a listener from different directions and with different delays is perceived in the
direction of the earliest arrival. More precisely, inside the interval of 1 ms following the
first arrival, multiple wavefronts can combine in what is called summing localization,
where the sound is perceived in the direction defined by the relative strengths of the
incoming wavefronts. Following the summing localization interval, the precedence
effect is active, and delayed sounds do not affect localization until the delay reaches the echo threshold (they can, however, affect spatial perception, e.g., enlarge the perceived width of a source). Beyond the echo threshold, delayed sounds are perceived as echos,
which are auditory events well separated in time and direction (Blauert, 1997). The
echo threshold is dependent on the type of audio stimulus, namely, for speech it is
around 50 ms, while for music it goes up to around 100 ms (Blauert, 1997).
Motivated by the findings described above, we formulate two additional constraints for the auxiliary loudspeakers:
• To prevent the sounds from the auxiliary loudspeakers from being perceived as echoes, we use a combination of delay and gain relative to the main loudspeaker's filter. The delay and gain should be such that the auxiliary loudspeaker signals are below the echo threshold.
• To prevent the signal from being perceived away from the main loudspeaker, the delay needs to be at least about 1 ms over the whole listening area, such that the precedence effect is active and the sound is perceived at the main loudspeaker.
The described constraints for the auxiliary loudspeakers are realized by the following modification of the temporal-masking constraint (7.11):
$$|h_i[n]| \le a_i\, m[n - n_i]\,, \qquad (7.13)$$
where ai is a positive attenuation factor, and ni is a lag corresponding to a delay
ti , which we set between five and ten milliseconds. The auxiliary loudspeaker filter
constraints are illustrated in Figure 7.4, where the shaded area defines where the
samples of hi [n] can take on values.
Figure 7.4: Illustration of the temporal masking constraint for the RIR equalization filters of the auxiliary loudspeakers. The filter coefficients are allowed to have values only in the shaded area.
Using the vector notation
$$h_i = \left[h_i[0] \;\dots\; h_i[N_h-1]\right]^T, \qquad m_i = a_i \left[m[-n_i] \;\dots\; m[N_h-1-n_i]\right]^T,$$
the auxiliary loudspeaker filter constraint (7.13) becomes
$$|h_i| \preceq m_i\,. \qquad (7.14)$$
Like the temporal-masking constraint (7.12), the auxiliary loudspeaker filter constraint (7.14) is also convex. Combining (7.12) and (7.14), we obtain a single convex constraint of the form
$$|h| \preceq m\,, \qquad (7.15)$$
where $m = [m_1^T \dots m_L^T]^T$.
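Continuing the masking-curve sketch above, the bound vectors m1 and mi are just sampled, shifted, and scaled copies of m(t); the rate f′S = 400 Hz and the values ai = 0.25, ni corresponding to 10 ms anticipate the simulation settings of Section 7.3:

```python
import numpy as np

fs, Nh, L = 400, 16, 5                  # downsampled rate, taps, loudspeakers
n = np.arange(Nh)

# Main loudspeaker (no delay): m_1[n] = m(n / fs), causal part of m(t)
m1 = masking_curve(n / fs)

# Auxiliary loudspeakers: m_i[n] = a_i * m((n - n_i) / fs)
a_i, n_i = 0.25, int(10e-3 * fs)
m_aux = a_i * masking_curve((n - n_i) / fs)

# The stacked bound vector m of (7.15)
m = np.concatenate([m1] + [m_aux] * (L - 1))
```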
Maximum-gain constraints
An additional insurance against location sensitivity is putting a limit on the equalization filters' gains over frequency. Namely, this avoids excessively driving a loudspeaker in order to correct a deep notch, which is usually highly position-dependent. The gains of the equalization filters are limited using the following set of convex constraints:
$$\left| V(\omega_i)\, h \right| \preceq H^{\max}\,, \qquad \forall \omega_i\,, \qquad (7.16)$$
where $H^{\max} = [H_1^{\max} \dots H_L^{\max}]^T$ is a vector with the maximum gain for each loudspeaker. The maximum-gain constraint is illustrated in Figure 7.5, where the shaded area defines where the magnitude spectrum |Hi(ω)| can take on values.
Figure 7.5: Illustration of the maximum-gain constraint for a RIR equalization filter. The magnitude frequency response of the RIR equalization filter is allowed to have values only in the shaded area.
7.2.5 Filter computation procedure
The cost function (7.10) is not convex, preventing the use of conventional tools of
convex optimization (Boyd and Vandenberghe, 2004) and making the problem difficult to solve. However, we have already seen a similar problem when we discussed
magnitude-optimized transducer arrays in Section 2.8, and also in Section 5.3 when
we discussed beamformer design for a directional loudspeaker array. In both cases,
the solution was based on the iterative, local optimization algorithm by Kassakian
(2006). In this particular case we do the same, and propose Algorithm 7.1, very
similar to Algorithms 2.1 and 5.1.
Algorithm 7.1 Solving a constrained, weighted magnitude least squares problem.
1. Choose the solution tolerance ε
2. Choose the initial solution h
3. repeat
4.    J ← ‖W(|Y| − |D|)‖₂
5.    Compute D̂ such that, ∀ωj ∈ {ω0, . . . , ωNf−1},
         |D̂(ωj)| = |D(ωj)|  and  ∠D̂(ωj) = ∠(G(ωj) H(ωj))
6.    Solve the following convex program:
         minimize   ‖W(Y − D̂)‖₂
         subject to |h| ⪯ m
                    |V(ωj) h| ⪯ H^max,  ∀ωj ∈ {ω0, . . . , ωNf−1}
7.    J′ ← ‖W(|Y| − |D|)‖₂
8. until |J′ − J| < ε
8. until |J 0 − J| < Using the fact that the main loudspeaker has a dominant role—which is also
formulated through constraints—we choose the initial solution h = [hT1 . . . hTL ]T
defined by
$$h_1 = \big[\,1 \;\; \underbrace{0 \;\cdots\; 0}_{N_h - 1}\,\big]^T, \qquad h_i = 0_{N_h \times 1}\,, \quad i > 1\,. \qquad (7.17)$$
Using the same argument as in Section 2.8, it is easily shown that Algorithm 7.1 converges to a local minimum. However, convergence to the global minimum is not guaranteed, making the choice of the initial solution a crucial step. From our experience, the initial solution (7.17) provides consistently better results than the complex least squares solution (see Section 2.8 for details).
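For reference, the convex subproblem of step 6 maps directly onto a disciplined convex programming tool. The bibliography references CVX for Matlab; the following sketch uses the analogous Python package cvxpy and assumes the stacked complex matrix A (with rows G(ωj)V(ωj)), the weight vector w, and the bounds m and Hmax have been assembled as above:

```python
import numpy as np
import cvxpy as cp

def algorithm_7_1_step(A, D_abs, w, m, V_list, H_max, h_prev):
    """One pass of steps 4-7: fix the phase of D-hat at the current h, then
    solve the weighted complex least-squares problem under (7.15)-(7.16)."""
    D_hat = D_abs * np.exp(1j * np.angle(A @ h_prev))           # step 5

    h = cp.Variable(A.shape[1])
    constraints = [cp.abs(h) <= m]                              # (7.15)
    constraints += [cp.abs(Vj @ h) <= H_max for Vj in V_list]   # (7.16)

    problem = cp.Problem(
        cp.Minimize(cp.norm(cp.multiply(w, A @ h - D_hat))), constraints)
    problem.solve()
    return h.value

# Outer loop (steps 3-8): iterate, updating h_prev, until |J' - J| < eps.
```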
7.3 Simulations
In order to verify the effectiveness of our room equalization strategy, we performed a
simulation of a five-channel full-range loudspeaker setup, shown in Figure 7.1, using
an image source model (Allen and Berkley, 1979) implementation from Habets (2006).
We give two examples. The first example, denoted MIMO equalization, involves the described multiple-loudspeaker equalization of the front loudspeaker S1 (center channel; note that our numbering differs from the usual channel numbering in five-channel surround), with the help of the remaining four. In the second example, denoted
SIMO equalization, the front loudspeaker S1 is equalized by a single equalization
filter, without the help of the remaining ones. Note that SIMO equalization can be
viewed as a special case of MIMO equalization, where the auxiliary loudspeaker filter
constraints (7.13) are set such that ai = 0.
In both examples, equalization is done up to the frequency fmax = 200 Hz, which enables filtering in the downsampled domain corresponding to the sampling frequency f′S = 400 Hz. The maximum-gain constraints are set to H_i^max = 3 dB, and the used frequency spacing is 1 Hz. Also, the equalization filters are computed relative to the control points C1, . . . , C5 (see Figure 7.1).
7.3.1 MIMO equalization
In this part, we present a full multichannel equalization of the front loudspeaker S1 ,
where all equalization filters hi [n] have lengths of Nh = 16 samples. For auxiliary
loudspeakers’ filters, we use a delay ni that corresponds to 10 ms, with attenuation
ai = 0.25.
The resulting equalization filters, both in the time and frequency domain, together
with the amplitude constraints, are shown in Figure 7.6.
Figure 7.7 shows the variations of RIR frequency characteristics—before and after
equalization—on a rectangular grid of control points, spaced at 25 cm, which cover
the listening area (see Figure 7.1). From Figure 7.7, it can be seen that after equalization, strong resonances get attenuated, as desired. Also, above 80 Hz, the average
magnitude frequency characteristic of the equalized RIRs inside the listening area
gets significantly improved, exhibiting a smoother behavior and approximating the
desired magnitude characteristic more closely.
Figure 7.6: Time-domain (a) and frequency-domain (b) characteristics of loudspeaker equalization filters hi [n]. Thin dashed lines mark the amplitude constraints
for the main and auxiliary loudspeakers (a) and the maximum-gain constraint (b).
7.3.2 SIMO equalization
In this part, we present a single-channel equalization of the front loudspeaker S1 with
an equalization filter whose impulse response has Nh = 64 samples.
The impulse response of the resulting equalization filter h1 [n], together with the
amplitude constraint, is shown in Figure 7.8.
Panels (a) and (b) in Figure 7.9 show the values of the resonance-detection vector R and the frequency characteristic H1(ω) of the equalization filter h1[n], respectively. The former shows that a larger weight is given to errors where strong resonances are present, and justifies the notches that H1(ω) has at those frequencies.
Panels (c) and (d) in Figure 7.9 show variations of RIR frequency characteristics—
before and after equalization, respectively—on a rectangular grid of control points,
spaced at 25 cm, which cover the listening area (see Figure 7.1). From panels (c)
and (d) in Figure 7.9, it can be seen that after equalization, strong resonances get only slightly attenuated. However, due to its short length and the temporal-masking constraint, the filter tends to attenuate all frequencies, bringing the equalized RIRs down by around 2 dB on average, which may not be desired.
7.3.3 Discussion
By comparing Figure 7.6(b) and Figure 7.9(b), we clearly see the differences in the
ways MIMO and SIMO equalization attenuate room modes. Magnitude responses of
Figure 7.7: Mean (solid), 25-75 (light-gray), and 3-97 (dark-gray) percentiles of the
magnitude frequency responses on a rectangular grid of control points spaced at 25 cm,
shown in Figure 7.1, before (a) and after (b) equalizing loudspeaker S1 . The desired
frequency characteristic D(ω) is shown with a dashed line.
Figure 7.8: Impulse response of the single-channel equalization filter h1[n]. The thin dashed line marks the amplitude constraint for the main loudspeaker.
MIMO equalization filters, shown in Figure 7.6(b), do not reflect the RIR resonance patterns. This indicates that they combine in a mode-cancelling manner, and it also explains why short filters suffice for a good equalization performance. On the other hand, the magnitude response of the SIMO equalization filter, shown in Figure 7.9(b), reflects the RIR resonance pattern more closely, revealing magnitude response notches at the room resonant frequencies. That is the reason why one needs longer SIMO equalization filters.
Figure 7.9: (a) Resonance-detection vector R(ω). (b) Magnitude frequency response
H1 (ω) of the single-channel equalization filter h1 [n]. Mean (solid), 25-75 (light-gray),
and 3-97 (dark-gray) percentiles of the magnitude frequency responses on a rectangular
grid of control points spaced at 25 cm, shown in Figure 7.1, before (c) and after (d)
equalizing loudspeaker S1 . The desired frequency characteristic D(ω) is shown with a
dashed line in panels (c) and (d).
Resonances at very low frequencies, below 80 Hz, are easier to equalize, since they are more widely separated and require lower-Q correction filters. This is apparent in the SIMO equalization case, where resonances are attenuated by a combination of notches at the corresponding frequencies, with an effectiveness similar to the MIMO equalization case. The resonances at higher frequencies are more densely spaced and require very-high-Q correction filters, making single-loudspeaker equalization difficult. On the other hand, the mode-cancellation approach with multiple loudspeakers offers a way to attenuate these resonances without excessive gains or impractically long filters. This explains the superior performance of MIMO equalization above 80 Hz.
7.4 Conclusions
We presented an approach for low-frequency multiple-loudspeaker RIR equalization based on convex optimization. We showed a way to formulate typical psychoacoustical and physical RIR equalization design constraints in terms of convex constraints on the equalization filters, allowing one to find optimal solutions in a systematic fashion. We have also shown the effectiveness of our approach at equalizing a simulated five-channel loudspeaker system in an extended listening area, using short filters and making sure that the equalization system does not cause undesired audible echoes and localization biases.
The proposed approach, using the temporal-masking and maximum-gain constraints, can also be applied to single-loudspeaker room equalization. Similarly to the multiple-loudspeaker equalization, this approach was verified on a simulated five-channel loudspeaker system. It turns out that even with a single channel, our approach offers some improvements over no equalization by being effective at attenuating—although subtly—the dominant resonances in a RIR.
Chapter 8
Conclusion
In this thesis, we have focused on spatial sound capture and reproduction. In each
problem that we encountered along the way, there was a recurring pattern. Namely,
at the start we used spatio-temporal properties of the sound field in a considered
spatial domain in order to carry out a suitable spatial discretization. The spatial
discretization of an otherwise continuous problem consists of measuring responses of
transducers (microphones or loudspeakers) in the used transducer array. From then
on, we sought a suitable numerical optimization procedure for optimally solving the
considered acoustic problem, while respecting particular physical and psychoacoustical constraints.
We have seen that the proposed general approach is flexible, since unlike most
methods derived from acoustic theory, it does not require ideal and calibrated transducers, or free-field propagation conditions. More specifically, any discrepancies between the acoustic environment and the used transducers on one hand, and their
idealized models on the other hand, are accounted for through measurements.
A striking fact about the proposed approach is that it uses the same optimization
tools for a wide variety of problems. Namely, in all the applications presented in this
thesis, which include the design of directional and sound field microphones, directional
loudspeaker array, sound field reproduction, and room equalization, we used spatial
discretization of transducers’ responses; after specifying a desired system behavior
and constraints, we used the same optimization procedure for obtaining a solution.
Even in idealized cases, with calibrated transducers having ideal characteristics,
our approach achieves equally good (such as in microphone array design) or better
(sound field reproduction) results.
We have also shown how problem-specific physical and psychophysical observations can be included in the formulation of the used optimization problem. In the problem of directional loudspeaker array design, we limited loudspeaker filter gains, but we also discarded phase information from the cost function in order to achieve a more directional reproduction. For room equalization, we similarly discarded the phase from the cost function and limited loudspeaker gains, but we also constrained the filters' amplitude profiles in order to avoid localization bias and temporal distortions in the form of pre- and post-echos.
Future work
The possible applications of wide-band directional sound reproduction have been mentioned in Chapter 5, but we have only done informal listening without investigating
them in a systematic fashion. A further investigation into multichannel reproduction with a steerable directional loudspeaker array in a room could result in a more
systematic strategy for reproducing ambience and lateral direct sounds. The use of
such a loudspeaker array for room equalization can also be assessed, since it has long
been shown by Jacob (1985) that medium- and high-directivity loudspeakers notably
improve speech intelligibility in highly reverberant rooms.
We have seen that our approach to reproducing sound fields (SFR) is based on inverting the MIMO acoustic channel from loudspeakers to control points. Loudspeaker
characteristics and possible room effects are taken into account implicitly. However,
we have not assessed those scenarios, and leave them for future work.
Finally, an extension to the proposed constrained optimization framework for
multiple-loudspeaker room equalization could be sought. For instance, instead of
using a temporal-masking constraint, the cost function could include temporal distortion quantified with the help of temporal masking. This would enlarge the feasible
set of equalization filters’ impulse responses, and consequently increase their ability
to more effectively control room resonances. Also, the cost function could easily be
extended to use frequency-domain averaging, opening the door to full-band equalization.
Bibliography
T.D. Abhayapala and D.B. Ward. Theory and design of high order sound field microphones using spherical microphone array. IEEE International Conference on
Acoustics, Speech, and Signal Processing, 2002.
M. Abramowitz and I.A. Stegun. Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables. Dover New York, 1976.
J. Ahrens and S. Spors. Sound Field Reproduction Using Planar and Linear Arrays
of Loudspeakers. IEEE Transactions on Audio, Speech, and Language Processing,
18(8):2038–2050, Nov. 2010.
T. Ajdler, L. Sbaiz, and M. Vetterli. The Plenacoustic Function and its Sampling.
IEEE Trans. Sig. Proc, 54(10):3790–3804, 2006.
T. Ajdler, C. Faller, L. Sbaiz, and M. Vetterli. Sound field analysis along a circle and
its applications to hrtf interpolation. J. Audio Eng. Soc, 56(3):156–175, 2008.
I. Allen. Matching the sound to the picture. In AES 9th International Conference,
1991.
J.B. Allen and D.A. Berkley. Image method for efficiently simulating small-room
acoustics. J. Acoust. Soc. Am, 65(4):943–950, 1979.
G.B. Arfken, H.J. Weber, and H. Weber. Mathematical Methods for Physicists. Academic press New York, 1985.
J.S. Bamford and J. Vanderkooy. Ambisonic sound for us. Preprint 99th Conv. Aud.
Eng. Soc., 1995.
B.B. Bauer. Quadraphonic reproducing system, 1974. US Patent 3,813,494.
J.L. Bentley. Multidimensional binary search trees used for associative searching.
Communications of the ACM, 18(9):509–517, 1975.
G. Berchin. Precise filter design [dsp tips & tricks]. IEEE Sig. Proc. Mag, 24(1):
137–139, 2007.
A.J. Berkhout. A holographic approach to acoustic control. J. Audio Eng. Soc, 36
(12):977–995, 1988.
A.J. Berkhout, D. de Vries, and P. Vogel. Wave front synthesis: a new direction in
electroacoustics. Preprint 93th Conv. Aud. Eng. Soc., 1992.
A.J. Berkhout, D. de Vries, and P. Vogel. Acoustic control by wave field synthesis. J.
Acoust. Soc. Am, 93(5):2764–2778, May 1993.
S. Bertet, J. Daniel, and S. Moreau. 3D sound field recording with higher order ambisonics-objective measurements and validation of spherical microphone.
Preprint 120th Conv. Aud. Eng. Soc., 2006.
J. Blauert. Spatial Hearing: The Psychophysics of Human Sound Localization. The
MIT Press, Cambridge, Massachusetts, USA, revised edition, 1997.
J. Blauert and P. Laws. Group delay distortions in electroacoustical systems. J.
Acoust. Soc. Am, 63(5):1478–1483, 1978.
A. Blumlein. Improvements in and relating to sound transmission, sound recording and sound reproduction systems. British Patent Specification 394325, 1931.
Reprinted in Stereophonic Techniques, Aud. Eng. Soc, New York, 1986.
C.P. Boner and C.R. Boner. A procedure for controlling room-ring modes and feedback modes in sound systems with narrow-band filters. J. Audio Eng. Soc, 13(4):
297–299, 1965.
S.P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press,
2004.
R. Bücklein. The audibility of frequency response irregularities. J. Audio Eng. Soc,
29(3):126–131, 1981.
A. Celestinos. Low frequency sound field enhancement system for rectangular rooms
using multiple loudspeakers. PhD thesis, Aalborg University, 2006.
R.K. Cook, R.V. Waterhouse, R.D. Berendt, S. Edelman, and M.C. Thompson. Measurement of correlation coefficients in reverberant sound fields. J. Acoust. Soc. Am,
27:1072–1077, 1955.
D.H. Cooper and T. Shiga. Discrete-matrix multichannel stereo. J. Audio Eng. Soc,
20(5):346–360, 1972.
E. Corteel. Equalization in an extended area using multichannel inversion and wave
field synthesis. J. Audio Eng. Soc, 54(12):1140–1161, 2006.
E. Corteel. Synthesis of directional sources using wave field synthesis, possibilities,
and limitations. EURASIP Journal on Applied Signal Processing, 2007(1):188–188,
2007. ISSN 1110-8657.
E. Corteel, R. Pellegrini, and C. Kuhn-Rahloff. Wave Field Synthesis with increased
aliasing frequency. Preprint 124th Conv. Aud. Eng. Soc., 2008.
P.S. Cotterell. On the theory of second-order soundfield microphone. PhD thesis,
University of Reading, 2002.
H. Cox, R. Zeskind, and M. Owen. Robust adaptive beamforming. IEEE Transactions
on Acoustics, Speech and Signal Processing, 35(10):1365–1376, 1987.
R.E. Crochiere and L.R. Rabiner. Multirate Digital Signal Processing. Prentice Hall,
1983.
J. Daniel, R. Nicol, and S. Moreau. Further investigations of high order ambisonics
and wavefield synthesis for holophonic sound imaging. Preprint 114th Conv. Aud.
Eng. Soc., 2003.
H.E. de Bree, M. Iwaki, K. Ono, T. Sugimoto, and W. Woszczyk. Anechoic measurements of particle-velocity probes compared to pressure gradient and pressure
microphones. Preprint 122nd Conv. Aud. Eng. Soc., 2007.
D. de Vries. Sound reinforcement by wavefield synthesis: adaptation of the synthesis
operator to the loudspeaker directivity characteristics. J. Audio Eng. Soc, 44(12):
1120–1131, 1996.
J.R. Driscoll and D.M. Healy. Computing fourier transforms and convolutions on the
2-sphere. Advances in Applied Mathematics, 15(2):202–250, 1994.
G.W. Elko. Superdirectional microphone arrays. In Acoustic Signal Processing for
Telecommunication, pages 181–238. Kluwer Academic Publishers, 2000.
G.W. Elko. Microphone array systems for hands-free telecommunication. Speech
communication, 20(3–4):229–240, Dec. 1996.
G.W. Elko and A.T.N. Pong. A steerable and variable first-order differential microphone array. IEEE International Conference on Acoustics, Speech, and Signal
Processing, 1997.
S.J. Elliott and P.A. Nelson. Multiple-point equalization in a room using adaptive
digital filters. J. Audio Eng. Soc, 37(11):899–907, 1989.
F.J. Fahy. Measurement of acoustic intensity using the cross-spectral density of two
microphone signals. J. Acoust. Soc. Am, 62(4):1057–1059, 1977.
C. Faller and M. Kolundžija. Design and limitations of non-coincidence correction
filters for soundfield microphones. Preprint 126th Conv. Aud. Eng. Soc., 2009.
K. Farrar. Soundfield microphone. Wireless World, 1979.
L.D. Fielder. Analysis of traditional and reverberation-reducing methods of room
equalization. J. Audio Eng. Soc, 51(1/2):3–26, 2003.
S. Flanagan, B.C.J. Moore, and M.A. Stone. Discrimination of group delay in clicklike
signals presented via headphones and loudspeakers. J. Audio Eng. Soc, 53(7-8):
593–611, 2005.
R.K. Furness. Ambisonics–an overview. In Proceedings of the 8th International Conference of the Audio Engineering Society, pages 181–189, 1990.
P.A. Gauthier and A. Berry. Adaptive wave field synthesis for sound field reproduction: Theory, experiments, and future perspectives. J. Audio Eng. Soc, 55(12):
1107, 2007.
R.J. Geluk and L. de Klerk. Microphone exhibiting frequency-dependent directivity.
EU Patent Application 01201501.2, Apr. 2001.
R. Genereux. Adaptive filters for loudspeakers and rooms. Preprint 93rd Conv. Aud.
Eng. Soc, 1992.
M.A. Gerzon. Practical periphony: The reproduction of full-sphere sound. Preprint
65th Conv. Aud. Eng. Soc., 1980a.
M.A. Gerzon. Periphony: With-height sound reproduction. J. Audio Eng. Soc, 21
(1):2–10, 1973.
M.A. Gerzon. The design of precisely coincident microphone arrays for stereo and
surround sound. Preprint 50th Conv. Aud. Eng. Soc., 1975.
M.A. Gerzon. Practical Periphony: The Reproduction of full-sphere sound. Preprint
65th Conv. Aud. Eng. Soc., 1980b.
G. Golub and W. Kahan. Calculating the singular values and pseudo-inverse of a
matrix. Journal of the Society for Industrial and Applied Mathematics, Series B:
Numerical Analysis, 2(2):205–224, 1965.
M. Grant, S. Boyd, and Y. Ye. CVX: Matlab software for disciplined convex programming. http://cvxr.com/cvx/, 2011.
A.R. Groh. High-fidelity sound system equalization by analysis of standing waves. J.
Audio Eng. Soc., 22(10):795–799, 1974.
E.A.P. Habets. Room impulse response generator, 2006.
J. Hannemann and K.D. Donohue. Virtual sound source rendering using a multipoleexpansion and method-of-moments approach. J. Audio Eng. Soc, 56(6):473, 2008.
R.H. Hardin and N.J.A. Sloane. Mclaren’s improved snub cube and other new spherical designs in three dimensions. Discrete and Computational Geometry, 15:429–441,
1996.
P.D. Hatziantoniou and J.N. Mourjopoulos. Generalized fractional-octave smoothing
of audio and acoustic responses. J. Audio Eng. Soc, 48(4):259–280, 2000.
Holophonics. Sound Shower. http://www.panphonics.com/directional-speakers.html, 2011.
Holosonics. Audio Spotlight. http://www.holosonics.com/products.html, 2011.
ITU-775. Multichannel stereophonic sound system with and without accompanying
picture. Rec. BS.775.1, ITU, Geneva, 1994.
K.D. Jacob. Subjective and predictive measures of speech intelligibility-the role of
loudspeaker directivity. J. Audio Eng. Soc, 33(12):950–955, 1985.
W. Jesteadt, S.P. Bacon, and J.R. Lehman. Forward masking as a function of frequency, masker level, and signal delay. J. Acoust. Soc. Am, 71:950–962, 1982.
D.H. Johnson and D.E. Dudgeon. Array Signal Processing. Prentice Hall, 1993.
P.W. Kassakian. Convex approximation and optimization with applications in magnitude filter design and radiation pattern synthesis. PhD thesis, University of California, Berkeley, 2006.
O. Kirkeby and P.A. Nelson. Reproduction of plane wave sound fields. J. Acoust.
Soc. Am, 94:2992, 1993.
O. Kirkeby, P.A. Nelson, H. Hamada, and F. Orduna-Bustamante. Fast deconvolution
of multichannel systems using regularization. IEEE Transactions on Speech and
Audio Processing, 6(2):189–194, Mar 1998.
M. Kolundžija. Microphone processing for sound field measurement. Master’s thesis,
EPFL, 2007.
M. Kolundžija, C. Faller, and M. Vetterli. Spatio-temporal gradient analysis of differential microphone arrays. Preprint 126th Conv. Aud. Eng. Soc., 2009a.
M. Kolundžija, C. Faller, and M. Vetterli. Sound field reconstruction: An improved approach for wave field synthesis. Preprint 126th Conv. Aud. Eng. Soc., 2009b.
M. Kolundžija, C. Faller, and M. Vetterli. Designing practical filters for sound field reconstruction. Preprint 127th Conv. Aud. Eng. Soc., 2009c.
M. Kolundžija, C. Faller, and M. Vetterli. Baffled circular loudspeaker array with broadband high directivity. IEEE International Conference on Acoustics, Speech, and Signal Processing, March 2010a.
M. Kolundžija, C. Faller, and M. Vetterli. Sound field recording by measuring gradients. Preprint 128th Conv. Aud. Eng. Soc., 2010b.
M. Kolundžija, C. Faller, and M. Vetterli. Spatiotemporal gradient analysis of differential microphone arrays. J. Audio Eng. Soc, 59(1/2):20–28, 2011a.
M. Kolundžija, C. Faller, and M. Vetterli. Design of a compact cylindrical loudspeaker
array for spatial sound reproduction. Preprint 130th Conv. Aud. Eng. Soc., 2011b.
M. Kolundžija, C. Faller, and M. Vetterli. Reproducing sound fields using MIMO acoustic channel inversion. Accepted to J. Audio Eng. Soc, Nov. 2011c.
H. Kuttruff. Room Acoustics. Taylor & Francis, 2000.
H. Lebret. Optimal beamforming via interior point methods. Journal of VLSI Signal
Processing, 14(1):29–41, 1996.
H. Lebret and S. Boyd. Antenna array pattern synthesis via convex optimization.
IEEE Trans. Sig. Proc, 45(3):526–532, 1997.
S.P. Lipshitz. Stereo microphone techniques: Are the purists wrong? J. Audio Eng.
Soc, 34(9):716–744, 1986.
A. Mäkivirta, P. Antsalo, M. Karjalainen, and V. Välimäki. Modal equalization of
loudspeaker-room responses at low frequencies. J. Audio Eng. Soc, 51(5):324–343,
2003.
J. Merimaa. Applications of a 3-D microphone array. Preprint 112th Conv. Aud. Eng.
Soc., 2002.
J. Meyer and G. Elko. A highly scalable spherical microphone array based on an
orthonormal decomposition of the soundfield. IEEE International Conference on
Acoustics, Speech, and Signal Processing, 2002.
B.C.J. Moore. An Introduction to the Psychology of Hearing. Academic Press, 1989.
P.M. Morse and K.U. Ingard. Theoretical Acoustics. Princeton University Press, 1968.
Y. Nakashima, T. Yoshimura, N. Naka, and T. Ohya. Prototype of mobile super directional loudspeaker. NTT DoCoMo Technical Journal, 8(1):25–32, 2006.
P.A. Nelson and S.J. Elliott. Active Control of Sound. Academic Press, 1992.
H.F. Olson. Gradient microphones. J. Acoust. Soc. Am, 17:192–198, 1946.
H.F. Olson. Directional microphones. J. Audio Eng. Soc, 15(4):420–430, 1967.
H.F. Olson. The quest for directional microphones at RCA. J. Audio Eng. Soc, 28:
776–786, 1980.
A.V. Oppenheim and R.W. Schafer. Discrete-Time Signal Processing. Prentice Hall
(Englewood Cliffs, NJ), 1989.
J.A. Pedersen. Sampling the energy in a 3-D sound field. Preprint 130th Conv. Aud.
Eng. Soc, 2007.
V.M.A. Peutz. Articulation loss of consonants as a criterion for speech transmission
in a room. J. Audio Eng. Soc, 19(11):915–919, 1971.
M.A. Poletti. A unified theory of horizontal holographic sound systems. J. Audio
Eng. Soc, 48(12):1155–1182, 2000.
M.A. Poletti. The design of encoding functions for stereophonic and polyphonic sound
systems. J. Audio Eng. Soc, 44(11):948–963, 1996.
D.A. Preves, T.S. Peterson, and M.A. Bren. In-the-ear hearing aid with directional
microphone system, May 1998. US Patent 5,757,933.
V. Pulkki and C. Faller. Directional audio coding: Filterbank and STFT-based design. Preprint 120th Conv. Aud. Eng. Soc., 2006.
V. Pulkki and M. Karjalainen. Localization of amplitude-panned virtual sources I: Stereophonic panning. J. Audio Eng. Soc, 49(9):739–752, 2001.
D.H. Raab. Forward and backward masking between acoustic clicks. J. Acoust. Soc.
Am, 33:137–139, 1961.
R. Raangs, W.F. Druyvesteyn, and H.E. De Bree. A low-cost intensity probe. J.
Audio Eng. Soc, 51(5):344–357, 2003.
B. Rafaely. Analysis and design of spherical microphone arrays. IEEE Transactions
on Speech and Audio Processing, 13(1):135–143, 2005.
B. Rafaely, I. Balmages, and L. Eger. High-resolution plane-wave decomposition in
an auditorium using a dual-radius scanning spherical microphone array. J. Acoust.
Soc. Am, 122:2661–2668, 2007a.
B. Rafaely, B. Weiss, and E. Bachmat. Spatial aliasing in spherical microphone arrays.
IEEE Trans. Sig. Proc, 55(3):1003–1010, 2007b.
P. Scheiber. Four channels and compatibility. J. Audio Eng. Soc, 19(4):267–279, 1971.
Sennheiser. Audiobeam. http://www.sennheiser.com, 2011.
N.J.A. Sloane, R.H. Hardin, and W.D. Smith. Tables of spherical codes. Published
electronically at http://www.research.att.com/~njas/packings.
W. Snow. Basic principles of stereophonic sound. IRE Transactions on Audio, 3(2):
42–53, 1955.
Sonic Emotion. 3D Sound. http://www.sonicemotion.com/se/ch/heartheworldin3D.html, 2011.
S. Spors. Extension of an analytic secondary source selection criterion for wave field synthesis. Preprint 123rd Conv. Aud. Eng. Soc., 2007.
E.W. Start. Application of curved arrays in wave field synthesis. Preprint 100th Conv.
Aud. Eng. Soc., 1996.
P. Stoica and R.L. Moses. Introduction to Spectral Analysis. Prentice Hall, Upper
Saddle River, New Jersey, 1997.
H. Teutsch and W. Kellermann. EB-ESPRIT: 2D localization of multiple wideband
acoustic sources using eigen-beams. IEEE International Conference on Acoustics,
Speech, and Signal Processing, 2005.
H. Teutsch and W. Kellermann. Acoustic source detection and localization based on
wavefield decomposition using circular microphone arrays. J. Acoust. Soc. Am, 120:
2724–2736, 2006.
F.E. Toole. Sound Reproduction: Loudspeakers and Rooms. Focal Press, 2008.
A. Unsöld. Beiträge zur quantenmechanik der atome. Annalen der Physik, 387(3):
355–393, 1927.
M. Van der Wal, E.W. Start, and D. de Vries. Design of logarithmically spaced
constant-directivity transducer arrays. J. Audio Eng. Soc, 44:497–507, 1996.
H.L. Van Trees. Optimum Array Processing (Detection, Estimation, and Modulation
Theory, Part IV). New York: John Wiley & Sons, Inc, 2002.
B.D. Van Veen and K.M. Buckley. Beamforming: A versatile approach to spatial
filtering. IEEE ASSP Magazine, 5(2):4–24, 1988.
E.N.G. Verheijen. Sound Reproduction by Wave Field Synthesis. PhD thesis, Delft
University of Technology, 1997.
H. Wallach, E.B. Newman, and M.R. Rosenzweig. The precedence effect in sound
localization. The American Journal of Psychology, 62(3):315–336, 1949.
F. Wang, V. Balakrishnan, P.Y. Zhou, J.J. Chen, R. Yang, and C. Frank. Optimal
array pattern synthesis using semidefinite programming. IEEE Transactions on
Signal Processing, 51(5):1172–1183, 2003.
D.B. Ward, R.A. Kennedy, and R.C. Williamson. Theory and design of broadband
sensor arrays with frequency invariant far-field beam patterns. J. Acoust. Soc. Am,
97(2):1023–1034, 1995.
T. Welti and A. Devantier. Low-frequency optimization using multiple subwoofers.
J. Audio Eng. Soc, 54(5):347–364, 2006.
E.G. Williams. Fourier Acoustics. Academic Press, 1999.
R.J. Wilson, M.D. Capp, and J.R. Stuart. The loudspeaker-room interface: Controlling excitation of room modes. In Proc. of the 23rd AES Conf., 2003.
Yamaha. Sound Bar / Digital Sound Projector. http://www.yamaha.com, 2011.
S. Yan and Y. Ma. Design of FIR beamformer with frequency invariant patterns via jointly optimizing spatial and frequency responses. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
M. Yoneyama, J. Fujimoto, Y. Kawamo, and S. Sasabe. The audio spotlight: An
application of nonlinear interaction of sound waves to a new type of loudspeaker
design. J. Acoust. Soc. Am, 73(5):1532–1536, 1983.
E. Zwicker and H. Fastl. Psychoacoustics: Facts and Models, volume 22. Springer
Verlag, 1999.
Curriculum Vitæ
Mihailo Kolundžija
Audiovisual Communications Laboratory (LCAV)
Swiss Federal Institute of Technology (EPFL)
1015 Lausanne, Switzerland
Personal
Date of birth: March 28, 1981
Nationality: Serbian
Civil status: Single
Education
2007–2011    PhD candidate, School of Computer and Communication Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
2005–2007    MSc in Communication Systems, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
1999–2004    Dipl. Ing. in Electrical Engineering and Computer Science, Faculty of Technical Sciences, Novi Sad, Serbia
Professional experience
04/2007–present    Research and teaching assistant, Audiovisual Communications Laboratory (LCAV), Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
07/2010–09/2010    Software engineer intern, Google Inc., Mountain View, USA
03/2004–08/2004    Research intern, Micronas GmbH, Freiburg, Germany
07/2002–09/2002    Research intern, Montanuniversität Leoben, Leoben, Austria
Publications
Journal papers
1. M. Kolundžija, C. Faller, and M. Vetterli. Reproducing Sound Fields Using MIMO Acoustic Channel Inversion. Journal of the Audio Engineering Society. Nov. 2011.
2. M. Kolundžija, C. Faller, and M. Vetterli. Spatio-Temporal Gradient Analysis of Differential Microphone Arrays. Journal of the Audio Engineering Society. Feb. 2011.
Conference papers
1. M. Kolundžija, C. Faller, and M. Vetterli. Design of a Compact Cylindrical Loudspeaker Array for Spatial Sound Reproduction. AES 130th Convention. May 2011.
2. M. Kolundžija, C. Faller, and M. Vetterli. Sound Field Recording by Measuring Gradients. AES 128th Convention. May 2010.
3. M. Kolundžija, C. Faller, and M. Vetterli. Baffled Circular Loudspeaker Array With Broadband High Directivity. IEEE International Conference on Acoustics, Speech, and Signal Processing. Mar. 2010.
4. M. Kolundžija, C. Faller, and M. Vetterli. Designing Practical Filters For Sound Field Reconstruction. AES 127th Convention. Oct. 2009.
5. M. Kolundžija, C. Faller, and M. Vetterli. Sound Field Reconstruction: An Improved Approach For Wave Field Synthesis. AES 126th Convention. May 2009.
6. M. Kolundžija, C. Faller, and M. Vetterli. Spatio-Temporal Gradient Analysis of Differential Microphone Arrays. AES 126th Convention. May 2009.
7. C. Faller and M. Kolundžija. Design and Limitations of Non-Coincidence Correction Filters for Soundfield Microphones. AES 126th Convention. May 2009.
Awards and honors
2011           130th AES Convention Technical Student Paper Award
2007           Landry prize of the EPFL for a master's thesis work
2003 & 2005    Mileva Marić Einstein prize of the University of Novi Sad, given to the best student in computer science and engineering
2004           Best student award of the Faculty of Technical Sciences
2003           Royal Norwegian Embassy in Belgrade award, given to the top 500 Serbian students
Languages
English (fluent), German (good), French (good), Czech (fair), Italian (basic), Serbian
(native)