ELEC-C5340 Fundamentals of Spatial Hearing and Binaural Techniques
Transcription
ELEC-C5340 Fundamentals of Spatial Hearing and Binaural Techniques
Ville Pulkki
Department of Signal Processing and Acoustics
Aalto University, Finland
October 12, 2015

This lecture
- How humans hear the spatial properties of sound
- The changes the head causes to sound (HRTF / HRIR)
- Binaural techniques for headphones

Introduction

Where and what?
- Localization of sources
- Beam-forming towards different directions
[Figure: a listener surrounded by sound sources: squawk, chirp, brr, quack, toot, wind sound, clip clop, rustle]

Eye vs ear
- Eye: the lens projects light onto the retina. The retina has a great number of light-sensitive cells, so retinal cells are per se sensitive to direction.
- Eye: very high spatial accuracy, limited range of wavelengths (400-800 nm)
- Ear: the cochlea is sensitive to a vast range of wavelengths (2 cm - 20 m)
- Ear: the cochlea has no direct sensitivity to the direction of sound
- All human directional hearing is based on signal analysis in the brain

This chapter
- Basic concepts
- Head-related acoustics
- Localization cues
- Localization accuracy
- Perception of direction in the presence of echoes
- Ability to listen selectively to a specific direction
- Distance perception

Coordinate system
- Azimuth-elevation / median plane
[Figure: coordinate system with azimuth angle ϕ in the horizontal plane measured from the front (x axis), elevation angle δ from the horizontal plane, and distance r; the median plane divides left from right, the frontal plane front from back]

Coordinate system 2
[Figure: the cone of confusion around the interaural axis, described by the angles ϕcc and δcc and radius r]

Basic concepts
- Listening conditions:
  - Monaural hearing (hearing processes and cues that need the signal from only one ear)
  - Binaural hearing (differences between the ear signals also have an effect)
- Spatial sound reproduction methods:
  - Monotic (signal fed to only one ear)
  - Diotic (same signal fed to both ears)
  - Dichotic (different signals fed to the ears)

Dichotic listening with headphones
[Figure: headphone listening setups with a) signal delays ph1, ph2 and b) signal attenuators a1, a2 in the two channels]
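To make the dichotic setup concrete, here is a minimal sketch of generating a dichotic headphone stimulus with an adjustable interaural delay and attenuation, as used in the lateralization experiments discussed later. The function name, sample rate, and default parameter values are illustrative assumptions, not part of the lecture material:

```python
import numpy as np

def dichotic_stimulus(x, fs, itd_s=0.0003, ild_db=6.0):
    """Dichotic headphone signal: delay and attenuate one channel.

    itd_s: interaural time difference in seconds (right channel delayed)
    ild_db: interaural level difference in dB (right channel attenuated)
    The default values are arbitrary illustrations.
    """
    d = int(round(itd_s * fs))        # delay in whole samples
    g = 10.0 ** (-ild_db / 20.0)      # attenuation as a linear gain
    left = np.concatenate([x, np.zeros(d)])
    right = np.concatenate([np.zeros(d), x]) * g  # later and weaker
    return np.stack([left, right], axis=-1)       # (samples, 2) array

fs = 48000
t = np.arange(fs // 2) / fs                       # 0.5 s of signal
tone = np.sin(2 * np.pi * 500 * t)                # 500 Hz test tone
stereo = dichotic_stimulus(tone, fs)              # lateralized toward the left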
Head-related acoustics
- Let us consider only the free field first
- How does the incoming sound change as it hits the listener?
- What is the transfer function from the source to the ear canal?
- With source signal $X$ and left-ear transfer function $H_\mathrm{left}$:
  $Y_\mathrm{left} = H_\mathrm{left} X$, or in the time domain $y_\mathrm{left}(t) = h_\mathrm{left}(t) * x(t)$ (see the convolution sketch below)

Head-related acoustics measurements
- Transfer function from the source to the ear canal
- Place a microphone in the ear canal, or use dummy heads
- Head-related transfer function (HRTF), head-related impulse response (HRIR)
- Depends heavily on the direction of the source

Head-related transfer function (HRTF)
Measured with a real subject for three directions
[Figure: left- and right-ear HRTF magnitude responses (0 to -40 dB, 100 Hz - 10 kHz) for a) ϕ = 0, δ = 0, b) ϕ = 60, δ = 0, c) ϕ = 0, δ = 60]

Head-related impulse response (HRIR)
- HRIR == HRTF in the time domain; quite often HRIRs are also called HRTFs
[Figure: left- and right-ear HRIRs (0-2 ms) for a) ϕ = 0, δ = 0, b) ϕ = 60, δ = 0, c) ϕ = 0, δ = 60]

HRTF in the horizontal plane
[Figure: left-ear horizontal-plane HRTF magnitude (0 to -25 dB) as a function of azimuth (-179 to 153 degrees) and frequency (0.1-12.8 kHz)]

HRIR in the horizontal plane
[Figure: left-ear horizontal-plane HRIR as a function of azimuth (-179 to 153 degrees) and time (0-2 ms)]

HRTF in the median plane
[Figure: left-ear median-plane HRTF magnitude (0 to -25 dB) as a function of elevation_cc (-45 to 225 degrees) and frequency (0.4-12.8 kHz)]

HRIR in the median plane
[Figure: left-ear median-plane HRIR as a function of elevation_cc (-45 to 225 degrees) and time (0-3.5 ms)]
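Returning to the filtering relation $y_\mathrm{left}(t) = h_\mathrm{left}(t) * x(t)$ from the Head-related acoustics slide: it translates directly into one convolution per ear. A minimal sketch with toy placeholder HRIRs (a real implementation would load measured HRIRs, typically 100-150 taps, for the desired direction; the sample rate is an assumption):

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48000                          # assumed sample rate of the HRIRs

# Toy placeholder HRIRs; measured ones encode direction-dependent
# delay, head shadowing, and pinna/torso effects.
hrir_left = np.zeros(128)
hrir_left[20] = 1.0                 # left ear: shorter path, full level
hrir_right = np.zeros(128)
hrir_right[34] = 0.5                # right ear: longer path, shadowed

x = np.random.randn(fs)             # 1 s of noise as the source signal

# y_ear(t) = h_ear(t) * x(t): one convolution per ear
y_left = fftconvolve(x, hrir_left)
y_right = fftconvolve(x, hrir_right)
binaural = np.stack([y_left, y_right], axis=-1)  # headphone signal
```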
Localization cues
- HRTFs impose some spatial information on the ear canal signals
- What information do we dig out from those?

Binaural localization cues
- Interaural time difference, ITD
- Interaural level difference, ILD
Monaural localization cues: spectral cues
Dynamic cues

Interaural time difference (ITD)
- Sound arrives a bit earlier at one ear than at the other
- ITD varies between -0.7 ms and +0.7 ms
- JND of ITD is of the order of 20 µs
- Computed from the equation $\mathrm{ITD} = \frac{D}{2c}\,(\varphi + \sin\varphi)$, where $D$ is the head diameter, $c$ the speed of sound, and $\varphi$ the azimuth angle (a numeric sketch follows after the Dynamic cues slide)
[Figure: time difference ITD (0-600 µs) vs. azimuth angle ϕ (0-90 degrees), measured values compared with the equation; Δs denotes the extra path length around the head]

Interaural time difference (ITD)
- We are sensitive to
  - phase differences at low frequencies (~200 - ~1600 Hz): time/phase delay between the carriers
  - time differences between envelopes at higher frequencies (> ~1600 Hz): time delay between the envelopes
[Figure: left- and right-ear signals at low and high frequencies, with the ITD marked between carriers and between envelopes]

Interaural time difference
HRTFs measured from a human; the ITD of a noise source in the free field computed with an auditory model
[Figure: ITD (-1 to 1 ms) as a function of azimuth (-90 to 90 degrees) and frequency (0.2-18.2 kHz)]

Interaural level difference (ILD)
- The head shadows the incoming sound
- The effect depends on
  - wavelength: no shadowing at low frequencies
  - distance: larger ILD at very short distances (< 1 m)
- ILD varies between -20 dB and 20 dB (a sketch of reading ILD off an HRTF pair follows after the Dynamic cues slide)
- JND is of the order of 1 dB for sources near the median plane

Interaural level difference
HRTFs measured from a human; the ILD of a sound source in the free field computed with an auditory model
[Figure: ILD (-30 to 30 phon) as a function of azimuth (-90 to 90 degrees) and frequency (0.2-18.2 kHz)]

Basic lateralization results
Subjects report the position of the internalized auditory source on the interaural axis
[Figure: perceived lateralization as a function of a) interaural delay, -1500 to 1500 µs (Toole and Sayers, 1965), and b) interaural level difference, -12 to 12 dB (Sayers, 1964)]

Additional cues
- Binaural cues (ITD and ILD) in principle resolve only the cone of confusion on which the source lies
- Other cues are used to resolve the direction inside the cone:
  - monaural spectral cues
  - dynamic cues (the effect of head movements on the cues)
[Figure: a sound source on the cone of confusion, described by the angle θcc]

Monaural spectral cues
- Effect of pinna diffraction
- Effect of reflections/diffraction from the torso

Monaural spectral cues
[Figure: median-plane HRTF magnitudes (dB) of subjects 012, 017, 018 and 019 as functions of elevation_cc (-45 to 225 degrees) and frequency (0.8-12.8 kHz), showing clear inter-subject differences]

Monaural spectral cues
- The source has to have a relatively flat spectrum
- A relatively reliable cue for most natural sounds
- Narrow-band sounds are hard to localize (a mosquito in a dimmed room)
- Humans adapt relatively fast to new HRTFs, especially if visual cues are available
- The ears also grow throughout life, so adaptation is constant

Dynamic cues
- When the head rotates, the ITD, ILD and spectral cues change
- A powerful though coarse cue
- Resolves front/back/inside-the-head locations efficiently, but not fine details
[Figure: under head rotation, ITD & ILD stay constant for some source directions, while for others they change a lot, or change a lot in the opposite direction]
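As a numeric check of the ITD formula from the slide above, the sketch below evaluates $\mathrm{ITD} = \frac{D}{2c}(\varphi + \sin\varphi)$. The head diameter of 0.18 m and speed of sound of 343 m/s are assumed typical values, not figures from the lecture:

```python
import numpy as np

def woodworth_itd(azimuth_deg, head_diameter=0.18, c=343.0):
    """Spherical-head ITD model: ITD = (D / 2c) * (phi + sin(phi))."""
    phi = np.deg2rad(azimuth_deg)
    return head_diameter / (2.0 * c) * (phi + np.sin(phi))

for az in (0, 30, 60, 90):
    print(f"{az:2d} deg -> {woodworth_itd(az) * 1e6:4.0f} us")
# 90 deg yields about 675 us, consistent with the +-0.7 ms range above
```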
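Similarly, a rough per-frequency ILD can be read off an HRTF pair as the magnitude difference in dB. This is only a sketch: the ILD curves on the slides are computed with an auditory model (and plotted in phon), whereas the ratio below is a plain spectral comparison:

```python
import numpy as np

def ild_spectrum_db(hrir_left, hrir_right, n_fft=512):
    """Per-frequency level difference (dB) between two HRIRs."""
    mag_l = np.abs(np.fft.rfft(hrir_left, n_fft))
    mag_r = np.abs(np.fft.rfft(hrir_right, n_fft))
    eps = 1e-12                       # guard against division by zero
    return 20.0 * np.log10((mag_l + eps) / (mag_r + eps))
```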
Interaction between spatial hearing and vision
- Both senses try to find out "where" the source is
- If the visual cue is clear, vision dominates
  - Ventriloquism: within about a 30° area, the visual image captures the auditory image
- If the visual cue is blurry or not prominent, the auditory cue wins
- In some cases two events occur: a visual event and an auditory event

Accuracy of localization
- How well do we localize point-like sources?
- What is the accuracy of directional hearing? Azimuth / elevation / 3D
- Accuracy of the perception of the spatial distribution of wide sound sources

Binaural techniques
- Ear canal signals are the main input to hearing; why not replicate only them?
- Recording/reproduction/synthesis of ear canal signals
- Challenges: dynamic cues (head movements), tactile perception

Binaural recording, headphone playback
- Careful microphone and headphone equalization
- Binaural cues and the auditory spectrum are reproduced as they were in the recording
- In some cases this is an appealing solution
- Applications: personalized recording, academic use, noise measurements, augmented reality audio

Binaural recording
- Challenges:
  - headphone equalization is problematic
  - listener head movements do not change the binaural cues correspondingly
  - inside-head localization
  - front-to-back confusions
  - vision conflicts with audition
  - works best only with recordings made with your own head

Binaural synthesis, headphones
- Convolve monophonic sound tracks with measured [individual] HRTFs
- Auditory objects can be positioned in 3D virtual space
- Inside-head localization, front-back confusions:
  - need for individual HRTFs
  - head tracking may be used to resolve this
- Virtual reality, gaming, aviation
- Playback of surround audio content over multiple virtual loudspeakers
[Figure: binaural synthesis illustration, (c) Bill Gardner]

Practical implementation of HRTF synthesis
- HRIRs are relatively long FIRs (around 100-150 taps): computational constraints
- HRIRs have zeros at the start: these can be replaced with a simple delay
- How to find the number of zeros at the start?
  - Thresholding
  - Cross-correlation between HRIRs, which gives the time delay
- How to implement the HRIR coming after the zeros? (see the sketch after this slide)
  - Minimum-phase version of the remaining FIR filter (with fewer taps)
  - IIR filter with a few taps
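The steps on the last slide can be sketched as follows: thresholding to find the leading zeros, cross-correlation for the interaural delay, and a minimum-phase reconstruction of the remainder via the real cepstrum (the standard homomorphic construction). A minimal sketch under assumptions; the 5% threshold ratio, FFT size, and truncation length are illustrative choices, not values from the lecture:

```python
import numpy as np

def onset_delay(hrir, threshold_ratio=0.05):
    """Number of leading 'zero' samples, found by thresholding:
    index of the first sample whose magnitude exceeds
    threshold_ratio * max|hrir|."""
    return int(np.argmax(np.abs(hrir) > threshold_ratio * np.max(np.abs(hrir))))

def interaural_delay_xcorr(hrir_left, hrir_right, fs=48000):
    """Interaural delay in seconds as the lag maximizing the
    cross-correlation. Positive lag: the left HRIR lags behind."""
    xc = np.correlate(hrir_left, hrir_right, mode='full')
    return (int(np.argmax(xc)) - (len(hrir_right) - 1)) / fs

def min_phase_fir(h, n_fft=1024):
    """Minimum-phase FIR with approximately the magnitude response of h,
    via the real cepstrum (homomorphic construction)."""
    H = np.maximum(np.abs(np.fft.fft(h, n_fft)), 1e-8)  # avoid log(0)
    cep = np.fft.ifft(np.log(H)).real                   # real cepstrum
    w = np.zeros(n_fft)                                 # min-phase window:
    w[0] = 1.0; w[n_fft // 2] = 1.0
    w[1:n_fft // 2] = 2.0                               # double causal quefrencies
    return np.fft.ifft(np.exp(np.fft.fft(cep * w))).real[:len(h)]

# Toy usage: a decaying onset after 24 zero samples stands in for an HRIR
hrir = np.zeros(128)
hrir[24:44] = np.exp(-np.arange(20) / 4.0)
d = onset_delay(hrir)                       # -> 24, realized as a delay line
h_short = min_phase_fir(hrir[d:])[:32]      # shortened minimum-phase remainder
```

Because a minimum-phase filter concentrates its energy at the start, truncating it to a few tens of taps loses far less than truncating the original mixed-phase HRIR, which is why the slide suggests this decomposition.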