SoundLoc_1_2012

Transcription

10/11/12
Sound localization psychophysics
Eric Young
A good reference:
B.C.J. Moore An Introduction to the Psychology of Hearing Chapter 7,
Space Perception. Elsevier, Amsterdam, pp. 233-267 (2004).
Sound localization: what is it good for?
Where’s the bird?
What are the cues for sound
localization? There are two
binaural cues:
ITD: interaural differences in
time of arrival of the sound
ILD: interaural differences in the
loudness (sound level) of the
sound
Note that ITD can be separated
into two cues: 1) the ITD of the
envelope of the sound (ITD
above) and 2) the ITD of the
details of the waveform (the fine
structure, IPD at right). ITD and
IPD are numerically
approximately equal. However, IPD cues are much
stronger perceptually.
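As a rough illustration of how a fine-structure delay shows up as a phase cue: for a pure tone, a time delay corresponds to a phase shift of 2πf·ITD, and that phase is only known modulo one cycle. The sketch below assumes a pure tone; the function name ipd_from_itd is hypothetical and not from the slides.

```python
import math

def ipd_from_itd(itd_s, freq_hz):
    """Interaural phase difference (radians) for a pure tone, wrapped to [-pi, pi)."""
    phase = 2.0 * math.pi * freq_hz * itd_s
    return (phase + math.pi) % (2.0 * math.pi) - math.pi

# A 500 Hz tone: delays one full period (2 ms) apart give the same IPD.
print(ipd_from_itd(0.5e-3, 500.0))   # about pi/2
print(ipd_from_itd(2.5e-3, 500.0))   # also about pi/2
```

The wrap-around is why a phase cue alone cannot distinguish delays that differ by a whole period of the tone.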
Interaural time differences are primarily a cue for azimuth and can be predicted
approximately from a simple geometric head-shadow model.
data
model
Woodworth, 1962
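A minimal sketch of the geometric prediction, using the usual spherical-head form of the Woodworth model, ITD = (a/c)(θ + sin θ), for a distant source at azimuth θ between 0˚ and 90˚. The head radius (8.75 cm) and speed of sound (343 m/s) are assumed typical values, not numbers taken from the slide.

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """ITD in seconds for a distant source and a rigid spherical head.

    Valid for azimuths from 0 (straight ahead) to 90 degrees (opposite one ear).
    head_radius_m and speed_of_sound are assumed typical values.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

for az in (0, 15, 30, 45, 60, 90):
    print(f"{az:3d} deg -> {woodworth_itd(az) * 1e6:6.1f} us")
```

With these assumed values the largest ITD, at 90˚, comes out a little above 0.65 ms.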
There is a small dependence of ITD on frequency
human
cat
(ms)
Dashed lines are
predictions of the
Woodworth model
(?? For cat)
(Hz)
Kuhn, 1977 and Roth et al. 1980
ITDs are primarily a cue for azimuth in that they vary little with elevation.
Middlebrooks and Green, 1990
ILDs are also mainly a cue for azimuth, although at higher frequencies, they provide
additional information.
Middlebrooks and Green, 1990
How are different cues integrated? ITDs are used at low frequencies and ILDs at high frequencies.
Blue curves show the cues measured at the ears for a speaker at the MAA. Green curves show the
minimum detectable ITD and ILD, based on headphone experiments.
ITD matches
ILD matches
spectral
cues here?
IPD!
Mills, 1972
What is the interaction of cues when they don’t correspond? Are ITDs and ILDs
perceptually equivalent? Subjects can adjust the ITD of a sound
(ordinate) so as to center the image of a sound presented with a certain ILD
(abscissa), which gives a time-intensity trading ratio.
Note ITDs are more
effective at low
frequencies
Harris, 1960
But that doesn't mean that ITD and ILD
produce equivalent percepts. Here
subjects discriminated a 500 Hz tone
with 0 ITD from a similar tone with
one of five ITDs as the ILD of the
second tone varied. The
discriminability (d') is plotted versus
the ILD.
Perfect trading would have resulted in
d' minima near 0. Note that as the ITD
increases, the "best" ILD (the minimum
of the curve) gives a stimulus with
higher and higher discriminability.
So ITD and ILD are not perceptually
equivalent, despite the trading ratio
experiment.
Hafter and Carrier, 1972
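For readers unfamiliar with d', here is a minimal sketch of how a discriminability index is computed from response rates. It assumes a simple yes/no design with a hit rate and a false-alarm rate; the procedure actually used by Hafter and Carrier may differ, and the rates below are made up for illustration.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """d' = z(hit rate) - z(false-alarm rate), assuming equal-variance Gaussians."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

print(d_prime(0.50, 0.50))   # 0.0: the two stimuli cannot be told apart
print(d_prime(0.84, 0.16))   # roughly 2: easily discriminated
```

In these terms, perfect trading would mean that some ILD brings d' back to about 0 for every ITD tested.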
What are the cues for elevation and front/back? Because the head is approximately
symmetrical, locations along a cone of confusion all produce roughly the same binaural cues ITD
and ILD. Thus ITD and ILD provide little information about elevation and there is a confusion
about front vs back.
The ambiguity is resolved by spectral cues produced by the external ear. The amplitude of sound
at the eardrum is modified by reflections (interference patterns) in the pinna. The pattern of
modification, plotted below, varies with the direction of the sound source.
30˚
0˚
-15˚
Two sound
paths through
the pinna
Shaw
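The two-path picture can be sketched as a direct sound plus one delayed reflection. The two cancel when the reflection arrives half a cycle late, so the first notch falls at 1/(2·delay). This is a simplified sketch, not Shaw's model: the delay (62.5 µs) and reflection gain (0.8) are hypothetical values chosen only to place a notch at 8 kHz.

```python
import math

def two_path_gain_db(freq_hz, delay_s, reflection_gain=0.8):
    """Gain (dB) of a direct path plus one delayed, attenuated reflection."""
    phase = 2.0 * math.pi * freq_hz * delay_s
    re = 1.0 + reflection_gain * math.cos(phase)
    im = -reflection_gain * math.sin(phase)
    return 20.0 * math.log10(math.hypot(re, im))

def first_notch_hz(delay_s):
    """Lowest frequency at which the reflection cancels the direct sound."""
    return 1.0 / (2.0 * delay_s)

delay = 62.5e-6  # hypothetical extra path delay in seconds
print(first_notch_hz(delay))              # 8000.0
print(two_path_gain_db(8000.0, delay))    # deep notch, about -14 dB
print(two_path_gain_db(16000.0, delay))   # reinforcement, about +5 dB
```

If the extra path length changes with source direction, the notch frequency moves with it, which is the kind of direction-dependent pattern the slide describes.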
The notch in HRTFs can be predicted by a parabolic reflector model, in which the
reflector represents the posterior wall of the concha.
Directional gains at
various frequencies
Transfer functions (HRTFs) are
simpler than in real ears, but
capture the general features of
the notches.
Evidence that pinna acoustics are
important for location in elevation:
occluding the cavities of the pinnae
decreases performance in a sound
source elevation task
Gardner and Gardner, 1973
In order to use spectral cues accurately, the stimuli must be broadband. With narrowband stimuli (1/6-octave noise bands), the
percept of elevation depends on the frequency content
of the stimulus, and not on its source direction. For this subject, 6 kHz
noise sounds as if it comes from above and 8 kHz noise as if it comes from
below.
Middlebrooks, 1992
The place pointed to seems to correspond to peaks in the HRTF. The auditory system
does its best with inadequate information.
12 kHz
az, el
10 kHz
8 kHz
HRTFs at the places
pointed to for narrowband
stimuli centered at 6, 8,
10, and 12 kHz (the
stars).
Fitting HRTF to spectrum:
12 kHz sound presented
from -40, +40 is localized
at -145,-17, where the
HRTF better matches the
spectrum.
actual
position
6 kHz
The subject's response can
be predicted from the
spatial correlation of the
stimulus spectrum and the
HRTF. Contours of
correlation are shown.
subject's
response
Middlebrooks, 1992
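The correlation idea can be sketched as follows: correlate the stimulus spectrum with the HRTF for each candidate direction and pick the direction with the highest correlation. The HRTF gains below are invented toy numbers, not measured data, and the three direction labels are hypothetical; the sketch shows only the matching step.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def predict_direction(stimulus_db, hrtf_db_by_direction):
    """Return the direction whose HRTF best matches the stimulus spectrum."""
    scores = {d: pearson(stimulus_db, h) for d, h in hrtf_db_by_direction.items()}
    return max(scores, key=scores.get), scores

# Toy gains (dB) at 4, 6, 8, 10, 12 kHz: invented for illustration only.
toy_hrtfs = {
    "above": [0, 8, 2, -4, -6],    # peak near 6 kHz
    "front": [2, 0, -2, 6, 1],     # peak near 10 kHz
    "below": [-2, -3, 9, 1, -5],   # peak near 8 kHz
}
narrowband_8k = [-20, -20, 0, -20, -20]  # energy only in the 8 kHz band

best, scores = predict_direction(narrowband_8k, toy_hrtfs)
print(best)  # below
```

With these toy numbers the 8 kHz band is assigned to the direction whose HRTF peaks at 8 kHz, regardless of where the sound actually came from.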
Cue trading revisited: broadband noise is presented over headphones that simulate
virtual-space by incorporating HRTFs. The result is good localization with all cues
present.
However, when ITD cues
are set to 0 or to -45˚ or
90˚ (??), it is clear that the
ITD cues dominate the
others. That is, the
localization in azimuth
follows the ITD cues and
the elevation performance
is degraded.
This result occurs only if
the stimulus contains low-frequency energy.
Wightman and Kistler, 1992
How might localization be
represented in the brain?
The P(τ) model.
Assume a neural display
with ITD on one axis and
BF (or CF) on the other.
For a 0.5 kHz tone with an
ITD of 0.5 ms, there are
extra peaks one period of
0.5 kHz away (±2 ms).
These are eliminated with a
centering function, e.g. the
number of neurons with
each ITD,
giving a good reading of
the actual ITD.
Stern, Bernstein, and Trahiotis
0.5 kHz, 0.5 ms
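A minimal sketch of the tone case, assuming the display at one BF is a cosine-shaped cross-correlation over internal delay. The peaks repeat every period, and a centering function that favors small delays picks out the one nearest zero. The exponential shape and 1 ms width of the centering function are hypothetical choices, not the form used by Stern, Bernstein, and Trahiotis.

```python
import math

def cross_correlation(tau_ms, freq_khz, itd_ms):
    """Normalized interaural cross-correlation of a pure tone at internal delay tau."""
    return math.cos(2.0 * math.pi * freq_khz * (tau_ms - itd_ms))

def centrality(tau_ms, width_ms=1.0):
    """Hypothetical centering function: relative number of units tuned to each delay."""
    return math.exp(-abs(tau_ms) / width_ms)

def correlation_peaks(freq_khz, itd_ms, taus_ms):
    """Internal delays at which the cross-correlation has a local maximum."""
    c = [cross_correlation(t, freq_khz, itd_ms) for t in taus_ms]
    return [taus_ms[i] for i in range(1, len(c) - 1)
            if c[i] > c[i - 1] and c[i] > c[i + 1]]

def estimate_itd(freq_khz, itd_ms, taus_ms):
    """Choose the correlation peak that the centering function weights most."""
    return max(correlation_peaks(freq_khz, itd_ms, taus_ms), key=centrality)

taus = [round(-4.0 + 0.05 * i, 2) for i in range(161)]  # -4 to +4 ms
peaks = correlation_peaks(0.5, 0.5, taus)
best = estimate_itd(0.5, 0.5, taus)
print(peaks)  # [-3.5, -1.5, 0.5, 2.5]
print(best)   # 0.5
```

The peaks are all equally tall for a tone, so without the centering weight nothing would distinguish the true ITD from the peaks 2 ms away.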
A challenge to the model:
broadband noise (actually 500
Hz BW centered at 500 Hz)
with ITD = -1.5 ms. The P(τ)
function gives ambiguous
cues that vary across
frequency.
noise, -1.5 ms
Subject’s localization (ignore
curves other than filled
squares)
Narrow band noise gives the same answer as a tone,
due to the centering function.
But, at full bandwidth, the noise is perceived as
having a negative ITD . . .
Stern, Bernstein, and Trahiotis
noise, -1.5 ms
. . . even though centered
weighting doesn't give the right
answer in that case.
Stern, Bernstein, and Trahiotis
The answer is a second
level of cross-BF
coincidence analysis,
straightness weighting
(black lines and dots). This
amplifies P(τ) where the
ITD cue is the same across
frequency.
Now the estimated ITD is
closer to correct.
Stern, Bernstein, and Trahiotis
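The idea behind straightness weighting can be sketched with a simple stand-in: average the per-channel cross-correlation across BF at each internal delay, and pick the delay where the channels agree best. This across-BF average is a simplified proxy, not the weighting actually used by Stern, Bernstein, and Trahiotis; the channel spacing is also an assumption.

```python
import math

def channel_correlation(tau_ms, cf_khz, itd_ms):
    """Cosine-shaped cross-correlation in one frequency channel."""
    return math.cos(2.0 * math.pi * cf_khz * (tau_ms - itd_ms))

def across_cf_agreement(tau_ms, cfs_khz, itd_ms):
    """Mean correlation across channels: high only where peaks line up across BF."""
    return sum(channel_correlation(tau_ms, cf, itd_ms) for cf in cfs_khz) / len(cfs_khz)

cfs = [0.25 + 0.05 * i for i in range(11)]              # 250 to 750 Hz (assumed spacing)
taus = [round(-4.0 + 0.05 * i, 2) for i in range(161)]  # -4 to +4 ms
itd = -1.5

# Centering alone, using only the 500 Hz channel: take the peak nearest zero delay.
single = [channel_correlation(t, 0.5, itd) for t in taus]
peaks_500 = [taus[i] for i in range(1, len(taus) - 1)
             if single[i] > single[i - 1] and single[i] > single[i + 1]]
centered_guess = min(peaks_500, key=abs)

# Across-BF agreement: only the true ITD gives a peak at the same delay in every channel.
straight_guess = max(taus, key=lambda t: across_cf_agreement(t, cfs, itd))

print(centered_guess)   # 0.5  (wrong side)
print(straight_guess)   # -1.5 (the actual ITD)
```

The peak positions in each channel are at the ITD plus whole periods of that channel's frequency, so only the peak at the true ITD stays at the same delay across BF.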
Comparison of the model to data.
Stern, Bernstein, and Trahiotis
Binaural unmasking: Using localization to reduce
interference or masking: the cocktail party effect.
Binaural masking level differences
are a part of the explanation.
The results are from tone detection
experiments in a noise masker. The
relative phases in the two ears of
the tone (S) and noise (N) are given
by the subscripts.
Noise with a different interaural
phase (different location) is less
effective in masking a tone (by up
to 15 dB!). The effect is strongest at low
frequencies, but continues at high
frequencies (ordinate is N0S0 /
NπS0). This corresponds roughly to
the strength of phase-locking in the
auditory nerve.
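One way to see why a phase difference between tone and noise helps is an idealized cancellation sketch, in the spirit of equalization-cancellation models (a model not named in these slides). The noise is identical at the two ears and the tone is inverted at one ear, so subtracting the ear signals removes the noise and doubles the tone. The sample rate, tone level, and noise statistics are arbitrary assumptions.

```python
import math
import random

def rms_db(x):
    """RMS level of a sequence in dB re 1."""
    return 20.0 * math.log10(math.sqrt(sum(v * v for v in x) / len(x)))

random.seed(1)
fs = 16000   # assumed sample rate, Hz
n = 3200     # 0.2 s of signal
noise = [random.gauss(0.0, 1.0) for _ in range(n)]
tone = [0.1 * math.sin(2.0 * math.pi * 500.0 * i / fs) for i in range(n)]

# Noise in phase at the two ears, tone out of phase (N0S-pi).
left = [nz + s for nz, s in zip(noise, tone)]
right = [nz - s for nz, s in zip(noise, tone)]

snr_one_ear_db = rms_db(tone) - rms_db(noise)   # tone well below the noise
difference = [l - r for l, r in zip(left, right)]
residual_noise = [d - 2.0 * s for d, s in zip(difference, tone)]

print(round(snr_one_ear_db, 1))
print(max(abs(v) for v in residual_noise))      # essentially zero
```

The cancellation here is perfect only because the sketch has no internal noise; the measured benefit in listeners is limited, up to about 15 dB as noted above. If tone and noise had the same interaural phase, the subtraction would remove both.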
An important deficit in hearing
impairment is the loss of
binaural unmasking. Note that the speech reception
thresholds (SRT, the signal/
noise ratio at threshold for
speech intelligibility) are
worse in impaired listeners for
various noise conditions.
Bronkhorst and Plomp, 1989
Using localization to suppress echoes: the precedence effect
With reverberation,
the first sound that
arrives (black Xs) is
more accurate than
subsequent sound
(gray dots).
Direction to which
ITDs point for a 580
Hz tone at three
directions. The first
few ms of
information are more
accurate in the cases
with reverberation.
Data from an MSO
model with 4 ms
sequential analysis
bins.
Shinn-Cunningham et al. 2003
Precedence decreases the information about location for the second of two
stimuli, presumed to be an echo.