ACOUSTIC IMPULSE DETECTION ALGORITHMS
FOR APPLICATION IN GUNSHOT LOCALISATION
by
J.F. VAN DER MERWE
Submitted in partial fulfilment of the requirements for the degree
MAGISTER TECHNOLOGIAE: ELECTRICAL
ENGINEERING
in the
Department of Electrical Engineering
FACULTY OF ENGINEERING & THE BUILT ENVIRONMENT
TSHWANE UNIVERSITY OF TECHNOLOGY
Supervisor: Dr. J.A. Jordaan
November 2012
DECLARATION BY CANDIDATE
“I hereby declare that the dissertation submitted for the degree M.Tech: Electrical
Engineering, at Tshwane University of Technology, is my own work and has not previously
been submitted to any other institution of higher education. I further declare that all sources
cited or quoted are indicated and acknowledged by means of a comprehensive list of
references”.
J.F. van der Merwe
Copyright© Tshwane University of Technology 2012
DEDICATION
I dedicate this work to the preservation of all wildlife…
ACKNOWLEDGEMENTS
I want to thank the Creator, without whom none of this would have been possible, for the opportunity that was given to me.
I also wish to thank Dr. Jaco Jordaan for all his insight, effort and patience in helping me with this project.
Furthermore, I would like to thank F’SATI and Mr. André Hattingh for helping me to obtain the environmental data used in this project, and also for the opportunity to do so.
Lastly, I wish to thank my family for all their support and love during this time.
ABSTRACT
In a society that becomes more gun-driven each day, and with a sharp decline in endangered wildlife species, a need has arisen for a system that can identify gunfire events in a natural environment and pinpoint their position. This dissertation researches, simulates and compares different impulse detection algorithms for application in gunshot localisation. The areas of research include Generalised Cross Correlation (GCC), Least Squares (LS) training algorithms and training algorithms based on a Reproducing Kernel Hilbert Space (RKHS) approach. Lastly, Support Vector Machines (SVM) are used to train a network to recognise gunshot impulses. The gunshot sound may be corrupted by greatly amplified noise or by nearby sounds such as speech.
Gunshot sounds were recorded using a star-topology array of three microphones connected to a low-pass filter and amplifier. Shots from large- and medium-calibre guns were recorded at different distances from the microphone array. These recordings were used to create large- and medium-calibre templates, which were then used to train a system to recognise gunshots at different distances and in different sound environments.
The GCC and SVM algorithms proved to be the most accurate across all the distances tested. The GCC algorithm executed much faster than the SVM algorithm, but given more sound templates and equipment with more processing power, the SVM algorithm might be more accurate at even greater distances.
The output of this research can be used to create an anti-poaching system, especially for
endangered species like elephants and rhinos.
EKSERP (SUMMARY)
In a community that becomes more gun-driven each day, and with a large decline in endangered wildlife species, a gap has arisen for a system that can identify gunshots and locate their position within a natural environment. This dissertation researches, simulates and compares different impulse detection algorithms for application in gunshot localisation. The field of research includes Generalised Cross Correlation (GCC) and Least Squares (LS) algorithms for trainable networks, as well as algorithms for a Reproducing Kernel Hilbert Space (RKHS) approach. Finally, the dissertation also incorporates Support Vector Machines (SVM) for trainable networks in gunshot impulse recognition. The gunshot sound may be contaminated by greatly amplified noise or by nearby sounds such as speech.
Gunshot sounds were recorded using a star-topology structure of three microphones connected to a low-pass filter and amplifier. Gunshot sounds of large- and medium-calibre guns were recorded at different distances from the microphone system. The sounds were then used to create templates of large- and medium-calibre guns, which were used to train a system to perform gunshot sound recognition at different distances and in different sound environments.
The GCC and SVM algorithms were the most accurate at all the different distances. The GCC executed much faster than the SVM algorithm, but with more sound templates and equipment with a higher processing speed, the SVM algorithm should be even more accurate at greater distances.
The output of this research project can be used to create an anti-poaching system, especially for endangered species such as elephants and rhinos.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
GLOSSARY
1 INTRODUCTION
1.1 PROBLEM STATEMENT
1.2 DELIMITATIONS
1.3 BENEFITS OF STUDY
1.4 CONTRIBUTIONS OF STUDY
1.5 FUNCTIONAL BREAKDOWN OF SYSTEM
1.6 DISSERTATION LAYOUT
2 LITERATURE REVIEW
2.1 THE MULTI-BILLION DOLLAR INDUSTRY OF POACHING
2.2 GUNSHOT DETECTION THEORETICAL OVERVIEW
2.2.1 ACOUSTIC SENSING
2.2.2 OPTICAL SENSING
2.3 GUNSHOT DETECTION AND LOCALISATION APPLICATIONS AND STRATEGIES
2.3.1 GUNSHOT DETECTION IN VIDEO AND FILM
2.3.2 MUZZLE BLAST DETECTION AND LOCALISATION USING A JOINT TACTICAL RADIO SYSTEM
2.3.3 MUZZLE BLAST AND SHOCKWAVE DETECTION USING LIGHTNING PROTOCOL
2.4 CONCLUSION
3 MATHEMATICAL MODELLING REVIEW
3.1 SYSTEM IDENTIFICATION
3.2 ADAPTIVE FILTERS
3.2.1 NOISE CANCELLATION
3.3 TIME DELAY ESTIMATION AND IMPULSE DETECTION USING GENERALISED CORRELATION
3.3.1 TDE USING GENERALISED CORRELATION
3.3.2 PULSE DETECTION USING GENERALISED CROSS CORRELATION
3.4 LEAST SQUARES
3.4.1 LEAST SQUARES SIDELOBE MINIMISATION
3.5 REPRODUCING KERNEL HILBERT SPACES
3.5.1 NON-LINEAR TEMPLATE MATCHING FRAMEWORK
3.5.2 TEST INPUT-OUTPUT PAIRS
3.5.3 REPRODUCING KERNEL TYPES
3.5.4 MINIMUM NORM TEMPLATES
3.6 SUPPORT VECTOR MACHINES
3.6.1 LINEAR SUPPORT VECTOR MACHINES
3.6.2 NON-LINEAR SUPPORT VECTOR MACHINES
3.6.3 LEAST SQUARE SUPPORT VECTOR MACHINES
3.7 CONCLUSION
4 EXPERIMENTAL SETUP
4.1 SOUND RECORDING EQUIPMENT SETUP
4.2 LABVIEW EXPERIMENTAL PREPARATION
4.3 REAL ENVIRONMENT DATA GATHERING
4.4 CONCLUSION
5 RESULTS
5.1 OVERVIEW OF DATA RECORDING AND PRE-PROCESSING PROCEDURES
5.2 FREQUENCY SPECTRUM ANALYSIS
5.2.1 MAUSER POWER SPECTRUM ESTIMATE
5.2.2 PISTOL POWER SPECTRUM ESTIMATE
5.2.3 POWER SPECTRUM ESTIMATE CONCLUSION
5.3 GENERAL CROSS CORRELATION
5.3.1 TEMPLATE GENERATION
5.3.2 ANALYSIS
5.3.2.1 Cross Correlation of Pistol sound 500m away from array
5.3.2.2 Cross Correlation of Mauser sound 500m away from array
5.3.2.3 Cross Correlation of Pistol sound 1000m away from array
5.3.2.4 Cross Correlation of Mauser sound 1000m away from array
5.3.2.5 Cross Correlation of Mauser sound 1500m away from array
5.3.2.6 Cross Correlation of Mauser sound 1700m away from array
5.4 TRAINABLE TEMPLATE MATCHING ALGORITHMS
5.4.1 TEMPLATE MATCHING WITH LEAST SQUARES
5.4.2 RKHS USING A SECOND ORDER POLYNOMIAL KERNEL
5.4.2.1 The Mauser Template used for RKHS
5.4.2.2 The Pistol Template used for RKHS
5.4.2.3 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 500m
5.4.2.4 Pistol impulse detection using a 2nd order polynomial RKHS kernel at 500m
5.4.2.5 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 1000m
5.4.2.6 Pistol impulse detection using a 2nd order polynomial RKHS kernel at 1000m
5.4.2.7 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 1500m
5.4.2.8 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 1700m
5.4.3 SUPPORT VECTOR MACHINES
5.4.3.1 The Mauser Template used for SVM network
5.4.3.2 The Pistol Template used for SVM
5.4.3.3 Impulse detection using a 2nd order polynomial SVM kernel and a Pistol Training set for recordings 500m away
5.4.3.4 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 500m away
5.4.3.5 Impulse detection using a 2nd order polynomial SVM kernel and a Pistol Training set for recordings 1000m away
5.4.3.6 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 1000m away
5.4.3.7 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 1500m away
5.4.3.8 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 1700m away
5.5 DETECTION ALGORITHM ACCURACY COMPARISON
5.6 EXECUTION TIME COMPARISON BETWEEN THE DIFFERENT IMPULSE DETECTION ALGORITHMS
5.7 CONCLUSION
6 CONCLUSION
6.1 THE ACCURACY OF THE ALGORITHMS
6.2 COMPLEXITY MEASURED IN ALGORITHM PROCESSING TIME
6.3 OVERALL PERFORMANCE OF THE IMPULSE DETECTION ALGORITHMS
6.4 FUTURE RESEARCH
6.4.1 ANTI-POACHING APPLICATION AND STRATEGY
6.4.1.1 Low-end gunshot detection modules
6.4.1.2 High-end gunshot detection modules
6.4.1.3 Habitat Protection Strategy
7 BIBLIOGRAPHY
APPENDIX A
A.1 LABVIEW EXPERIMENTAL PREPARATION
LIST OF FIGURES
Figure 1.1: Functional block diagram of the gunshot detector
Figure 3.1: Input X(S) changed by function H(S) to produce output Y(S)
Figure 3.2: Block diagram of an adaptive filter as a noise canceller
Figure 3.3: Maximum-margin hyperplane and margins for an SVM
Figure 3.4: Mapping of non-linear input space to higher dimensional feature space
Figure 4.1: Setup of recording equipment
Figure 4.2: Top view of different positions of the gunshots fired relative to the microphone array
Figure 4.3: Side view of gunshot positions showing the surface curvature of the game farm
Figure 5.1: Example of a 3 channel recording of gunshots fired 1000m away
Figure 5.2: Power spectrum estimate of a Mauser gunshot at 500m away from microphones
Figure 5.3: Power spectrum estimate of a Mauser gunshot at 1000m away from microphones
Figure 5.4: Power spectrum estimate of a Mauser gunshot at 1500m away from microphones
Figure 5.5: Power spectrum estimate of a Mauser gunshot at 1700m away from microphones
Figure 5.6: PSD of audio stream with no recorded gunshot
Figure 5.7: Power spectrum estimate of a Pistol gunshot at 500m and 1000m away from microphones
Figure 5.8: Template of Pistol gunshot
Figure 5.9: Template for Mauser Gunshot
Figure 5.10: Cross correlation of pistol template with pistol gunshot fired 500m away
Figure 5.11: Cross correlation of Mauser template with Mauser gunshot fired 500m away
Figure 5.12: Cross correlation of Pistol template with Mauser and pistol gunshots fired 500m away
Figure 5.13: Cross correlation of pistol template with pistol gunshot fired 1000m away
Figure 5.14: Cross correlation of Mauser template with Mauser gunshot fired 1000m away
Figure 5.15: Cross correlation of Pistol template with Mauser and pistol gunshots fired 1000m away
Figure 5.16: Cross correlation of Mauser template with Mauser gunshot fired 1500m away
Figure 5.17: Cross correlation of Pistol template with Mauser gunshot fired 1500m away
Figure 5.18: Cross correlation of Mauser template with Mauser gunshot fired 1700m away
Figure 5.19: LS Mauser waveform template
Figure 5.20: Output of Least Squares algorithm of Mauser gunshot at 500m away
Figure 5.21: Output of Least Squares algorithm of Mauser gunshot at 1000m away
Figure 5.22: Output of Least Squares algorithm of Mauser gunshot at 1500m away
Figure 5.23: Mauser template used in RKHS training sequence, smoothed by interpolation
Figure 5.24: Pistol template used in RKHS training sequence, smoothed by interpolation
Figure 5.25: Mauser gunshot at 500m and output using a 2nd order polynomial RKHS kernel
Figure 5.26: Pistol gunshots at 500m and output using a 2nd order polynomial RKHS kernel
Figure 5.27: Mauser gunshot at 1000m and output using a 2nd order polynomial RKHS kernel
Figure 5.28: Pistol gunshot at 1000m and output using a 2nd order polynomial RKHS kernel
Figure 5.29: Mauser gunshot at 1500m and output using a 2nd order polynomial RKHS kernel
Figure 5.30: Mauser gunshot at 1700m and output using a 2nd order polynomial RKHS kernel
Figure 5.31: Mauser template used in SVM network training sequence, smoothed by interpolation
Figure 5.32: Pistol template used in SVM network training sequence
Figure 5.33: Pistol and Mauser gunshots at 500m and output of 2nd order polynomial SVM kernel using a pistol template
Figure 5.34: Pistol and Mauser gunshots at 500m and output of 3rd order polynomial SVM kernel using a Mauser template
Figure 5.35: Pistol and Mauser gunshots at 1000m and output of 2nd order polynomial SVM kernel using a pistol template
Figure 5.36: Pistol and Mauser gunshots at 1000m and output of 3rd order polynomial SVM kernel using a Mauser template
Figure 5.37: Mauser gunshot at 1500m and output of 3rd order polynomial SVM kernel using a Mauser template
Figure 5.38: Mauser gunshot at 1700m and output of 3rd order polynomial SVM kernel using a Mauser template
Figure 5.39: Comparison of execution time, milliseconds per 5000 samples, between the different impulse detection algorithms
Figure 6.1: Illustration of a habitat protection strategy
Figure A.1: Labview program that mixes incoming channels with noise at a good signal-to-noise ratio
Figure A.2: Labview program that shows the correlation graphs and angle calculation for an optimal signal-to-noise ratio
Figure A.3: Where the noise and gunshot signal peaks are equal, the gunshot impulses start to become buried in the noise
Figure A.4: Correlation peaks start to disappear
Figure A.5: Gunshot signal peaks are buried in the noise; noise values are greater than the impulse values
Figure A.6: The correlation calculation becomes unstable, giving wrong values for the angle
LIST OF TABLES
Table 2.1: Yearly statistics of rhino poaching in South Africa
Table 5.1: Comparison of detection algorithms using Pistol templates for shots fired 500m away
Table 5.2: Comparison of detection algorithms using Mauser templates for shots fired 500m away
Table 5.3: Comparison of detection algorithms using Pistol templates for shots fired 1000m away
Table 5.4: Comparison of detection algorithms using Mauser templates for shots fired 1000m away
Table 5.5: Comparison of detection algorithms using Mauser templates for shots fired 1500m away
Table 5.6: Comparison of detection algorithms using Mauser templates for shots fired 1700m away
Table 5.7: Execution time of the detection algorithms in µs/sample
GLOSSARY
A/D      Analog-to-Digital
ADC      Analog-to-Digital Converter
AGDLS    Acoustic Gunshot Detection and Localisation System
AoA      Angle of Arrival
CITES    Convention on International Trade in Endangered Species
D/A      Digital-to-Analog
dB       Decibel
DMA      Direct Memory Access
DSP      Digital Signal Processor
EEG      Electroencephalography
F’SATI   French South African Institute of Technology
FU       Functional Unit
GCC      Generalised Cross Correlation
HMS      Habitat Management System
Hz       Hertz
k        Kilo
LPF      Low Pass Filter
LS       Least Squares
NI       National Instruments
PC       Personal Computer
PCI      Peripheral Component Interconnect
PCM      Pulse Code Modulation
PSD      Power Spectral Density
RKHS     Reproducing Kernel Hilbert Space
SI       International System of Units
SSE      Sum of Squared Error
SVM      Support Vector Machine
TDE      Time Delay Estimation
USB      Universal Serial Bus
V        Volt
1. INTRODUCTION
South Africa has a massive wildlife industry, serving both the preservation of endangered species and the commercial hunting of game. A system that can identify when and where a gun is fired is therefore needed. This work proposes a gunshot detection and localisation system that could be incorporated into a larger system to help protect and manage large habitats.
An Acoustic Gunshot Detection and Localisation System (AGDLS) would be able to run on a Habitat Management System’s (HMS) Radio Frequency (RF) network. The HMS, which can be designed to protect large habitats, would consist of many communication nodes that log events and broadcast them to a central computer. An AGDLS would be one of these event-gathering modules.
Gunshot localisation is based on the principle of time delay estimation, using a minimum of three modules stationed at different coordinates, each containing an array of three microphones. Each module calculates the angle from which the gunshot originated, relative to the position of that module.
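For a single microphone pair, the step from a measured delay to an angle can be sketched as follows. This minimal example assumes a far-field source (a planar wavefront across the array), a pair spaced d metres apart and a speed of sound of 343 m/s, so that sin θ = cτ/d with θ measured from the broadside of the pair. The delay τ is taken from the peak of a plain cross-correlation rather than a weighted GCC, and the function names are hypothetical, not taken from this dissertation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def estimate_delay(x, y, fs):
    """Delay (s) of channel y relative to channel x, from the cross-correlation peak."""
    corr = np.correlate(y, x, mode="full")
    lag = int(np.argmax(corr)) - (len(x) - 1)  # lag in samples; positive if y lags x
    return lag / fs


def angle_of_arrival(delay, spacing, c=SPEED_OF_SOUND):
    """Far-field bearing (degrees) from the broadside of a two-microphone pair."""
    ratio = np.clip(c * delay / spacing, -1.0, 1.0)  # guard against |c*tau| > d due to noise
    return float(np.degrees(np.arcsin(ratio)))


if __name__ == "__main__":
    fs = 48_000                                   # assumed sample rate (Hz)
    rng = np.random.default_rng(0)
    x = np.zeros(2000)
    x[500:564] = rng.standard_normal(64) * np.exp(-np.arange(64) / 16)  # synthetic impulse
    y = np.roll(x, 20)                            # second microphone hears it 20 samples later
    tau = estimate_delay(x, y, fs)
    print(tau, angle_of_arrival(tau, spacing=0.5))
```

With three microphones, the same calculation could be repeated for each pair and the pairwise estimates combined into one bearing for the module.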
Non-gunshot sounds and noise are also present in the signal alongside the gunshot sound, and if the shot is too far away, the gunshot impulse cannot be accurately extracted from the signal. This research seeks computationally efficient ways to identify and extract gunshot impulses from such signals.
Areas of study include Generalised Cross Correlation (GCC), sidelobe minimisation using Least Squares (LS) techniques, and training algorithms based on a Reproducing Kernel Hilbert Space (RKHS) approach. Support Vector Machines (SVM) are also used to train a network to recognise gunshot impulses. Combining these individual research areas may yield better solutions than any one of them alone.
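The basic idea behind using a template to find a gunshot in a recording can be illustrated with normalised cross-correlation: slide the template along the signal and flag offsets where the correlation coefficient reaches a threshold. This is a minimal sketch of plain template matching only, not of the GCC weighting, LS sidelobe minimisation, RKHS or SVM methods studied here; the synthetic template, the threshold of 0.7 and the function names are hypothetical.

```python
import numpy as np


def normalised_xcorr(signal, template):
    """Pearson correlation between the template and every window of the signal."""
    m = len(template)
    t = (template - template.mean()) / (template.std() * m)  # zero-mean, scaled template
    scores = np.empty(len(signal) - m + 1)
    for k in range(len(scores)):
        window = signal[k:k + m]
        sd = window.std()
        scores[k] = 0.0 if sd == 0 else np.dot(t, window - window.mean()) / sd
    return scores  # values lie in [-1, 1]; 1 means a perfect (scaled) match


def detect_impulses(signal, template, threshold=0.7):
    """Offsets whose normalised correlation with the template reaches the threshold."""
    scores = normalised_xcorr(signal, template)
    return np.flatnonzero(scores >= threshold), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.standard_normal(64) * np.exp(-np.arange(64) / 32)  # hypothetical template
    stream = 0.05 * rng.standard_normal(1000)                         # background noise
    stream[300:364] += 0.8 * template                                 # buried "gunshot"
    hits, scores = detect_impulses(stream, template)
    print(hits, int(np.argmax(scores)))
```

Normalising by each window's standard deviation makes the score independent of the impulse's loudness, which helps when one template must match shots fired at different distances. Offsets adjacent to a true match may also exceed the threshold, so a practical detector would typically keep only the local maximum.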
The different algorithms are compared with one another on attributes such as detection accuracy at different signal-to-noise ratios and computation time, that is, the number of instructions the algorithm needs to complete.
The work is both practical and theoretical. Experiments are carried out with the different methods, and the best attributes of the methods are combined to give an optimal solution. Sound recording data were gathered with a data acquisition card using Labview*. The data are processed in both the Labview and Matlab† environments.
* National Instruments, http://www.ni.com
† Mathworks, http://www.mathworks.com
1.1 PROBLEM STATEMENT
The dwindling numbers of rhinos in particular in South Africa over the past four years, and of other endangered species such as the elephant, have created a need for methods that can track and possibly catch poachers and deter them from killing these animals. Trading in poached animal body parts has become a multi-billion dollar industry. A gunshot detection system is needed that can protect large natural habitats with high efficiency and accuracy. Because some areas of such habitats are remote, the energy efficiency of the system is also a major design consideration.
Acoustic gunshot localisation is based on time delay estimation or angle-of-arrival information. If the time delays between the gunshot impulses in the three audio channels of a module can be determined, the module can calculate an angle, and with three or more modules the location of the gunshot can be triangulated. Before the gunshot can be located, however, it must first be correctly detected.
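Once each module has produced a bearing, the triangulation step can be sketched as a least-squares intersection of bearing lines. This is one common formulation, not necessarily the one used in this dissertation; it assumes a flat two-dimensional map with bearings measured counter-clockwise from the x-axis, treats each bearing as an infinite line, and the function name is hypothetical. Two non-parallel bearings already fix a point; with three or more modules the system is overdetermined, and least squares spreads the bearing errors across all modules.

```python
import numpy as np


def triangulate(positions, bearings_deg):
    """Least-squares intersection of bearing lines from several modules.

    positions    : (N, 2) module coordinates in metres (x = east, y = north)
    bearings_deg : N bearings in degrees, counter-clockwise from the x-axis
    """
    p = np.asarray(positions, dtype=float)
    theta = np.radians(np.asarray(bearings_deg, dtype=float))
    # The line through p_i with direction (cos t, sin t) has normal n_i = (-sin t, cos t),
    # so every point s on it satisfies n_i . s = n_i . p_i.
    normals = np.column_stack((-np.sin(theta), np.cos(theta)))
    rhs = np.sum(normals * p, axis=1)
    source, *_ = np.linalg.lstsq(normals, rhs, rcond=None)
    return source


if __name__ == "__main__":
    modules = [(0.0, 0.0), (1000.0, 0.0), (500.0, 1000.0)]  # hypothetical module layout
    shooter = (400.0, 300.0)
    bearings = [np.degrees(np.arctan2(shooter[1] - my, shooter[0] - mx)) for mx, my in modules]
    print(triangulate(modules, bearings))
```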
Over large distances the recorded gunshot impulse can be so weak that it is indistinguishable from the noise and reverberant clutter, so noise cancellation or impulse detection mechanisms must also be incorporated. The development of the system therefore rests on two key factors, namely gunshot detection and time delay estimation.
1.2 DELIMITATIONS
This study focuses primarily on different gunshot detection algorithms and not
specifically on gunshot localisation.
1.3 BENEFITS OF STUDY
The knowledge gained from this study might be used to create effective anti-poaching strategies, especially in remote, high-risk poaching areas of nature reserves, where the energy efficiency and accuracy of a protection system are of major importance. The trade-off between the complexity of a detection algorithm (which relates to its hardware implementation) and its accuracy is therefore an important factor to research.
1.4 CONTRIBUTIONS OF STUDY
This study addresses different methods of detecting a gunshot impulse, especially in environments where large animals such as elephants and rhinos may be poached. Templates of large- and medium-calibre guns were created and used to train a system to recognise gunshots at different distances and in different sound environments. The following methods are specifically compared and discussed: generalised cross correlation, template matching using Least Squares and RKHS with different kernels, and Support Vector Machines. The methods are compared in terms of their accuracy and their complexity, where complexity is measured by speed of execution.
1.5 FUNCTIONAL BREAKDOWN OF SYSTEM
Figure 1.1 shows a functional block diagram of a gunshot detection and localisation system, broken down into smaller functional units.
[Figure: FU 1 Gunshot → FU 2 Acoustic Sound Receiver → FU 3 Noise & Low Pass Filter → FU 4 Signal Amplifier → FU 5 Analog to Digital Converter → FU 6 Gunshot Detector → FU 7 Time delay estimator → FU 8 Gunshot Triangulation → FU 9 Calculate Coordinates, grouped into "System Input" and "DSP Implementation" stages]
Figure 1.1: Functional block diagram of the gunshot detector.
FU 1: This unit shows the input to the system, which is the sound of a gunshot.
FU 2: The acoustic sound receiver is the sensor array that should receive the incoming
sounds. Sensitive microphones with a high signal-to-noise ratio should be used to
implement this unit. This unit converts sound wave energy to an electrical signal for
processing.
FU 3: The electrical signal received from FU 2 is fed to a Low Pass Filter (LPF) with a passband of 0 kHz to 2.5 kHz. This filter also serves as a noise filter. It should be implemented before the signal is amplified, because otherwise the excess noise will be amplified as well.
FU 4: The small signal from the filter should be amplified to a level that spans the full input range that an ADC (Analog-to-Digital Converter) can quantise.
FU 5: The analog signal should be converted to a digital format for processing by a
Digital Signal Processor (DSP). An ADC can be used for this task.
FU 6: The digital words will then be fed to a DSP for processing of the digitised audio
signal. Because the system will be constantly analysing the incoming sounds, the Direct
Memory Access (DMA) capabilities of a DSP can be utilised for real-time monitoring.
This unit will analyse the incoming sounds to verify whether it is a gunshot sound or not.
FU 7: Once the gunshot detection algorithm has verified that the analysed sound is a gunshot, the time delays between the different sensors are calculated. Since the positions of all the sensors are known, the direction from which the gunshot sound originated can be determined.
FU 8: Taking into account the time delays, the speed of sound in air and the positions of the sensors, the position of the gunshot can be triangulated.
FU 9: Once the relative position has been calculated, the coordinates in latitude and longitude can be determined.
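FU 7 and FU 8 turn time delays into a direction. The following minimal sketch shows this step for a single microphone pair. It assumes a far-field source (a plane wavefront), a pair spacing `mic_spacing_m` and a speed of sound of 343 m/s. All of these are illustrative assumptions rather than parameters of this study. Under a plane wavefront, the extra path length c·τ equals d·sin θ, so the bearing is the arcsine of that ratio.

```python
import math

def bearing_from_delay(delay_s, mic_spacing_m, c=343.0):
    """Far-field angle of arrival, in degrees from broadside, for one microphone pair.

    Assumes a plane wavefront, so the extra path length c * delay equals
    mic_spacing * sin(theta). c = 343 m/s is an assumed speed of sound.
    """
    ratio = c * delay_s / mic_spacing_m
    if abs(ratio) > 1.0:
        raise ValueError("delay is too large for this microphone spacing")
    return math.degrees(math.asin(ratio))

# A 1 m pair and a delay of 0.5 m / 343 m/s (about 1.46 ms) gives 30 degrees.
print(round(bearing_from_delay(0.5 / 343.0, 1.0), 3))
```

Intersecting the bearings from two or more modules then gives the triangulated position described in FU 8.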
1.6 DISSERTATION LAYOUT
Chapter 2 discusses the background on the need for a system that can protect large natural habitats. It also covers the characteristics, applications and strategies of gunshot detection and localisation. A mathematical modelling review is given in chapter 3. It includes some mathematical background on acoustic gunshot detection and time delay estimation algorithms. Signal detection is also discussed, as well as some template matching and machine learning algorithms such as Reproducing Kernel Hilbert Spaces and Support Vector Machines.
Chapter 4 discusses the experimental layout and how some preliminary data was obtained. It also describes how the final data was recorded in a real outdoor game farm environment. This data was used to compare the performance of the different detection algorithms.
Chapter 5 gives the method of implementation and the output of the different algorithms, with graphs showing the respective outputs relative to the input recordings. Chapter 5 also compares the performance of the different algorithms.
Chapter 6 concludes the dissertation, summarises the results obtained in chapter 5 and gives some recommendations for future research. The References chapter lists all the articles and other works used in this dissertation.
2 LITERATURE REVIEW
Chapter 2 reviews the need for a gunshot detection system and then progresses to gunshot
detection techniques found in current literature.
2.1 THE MULTI-BILLION DOLLAR INDUSTRY OF POACHING
In a society that becomes more gun driven every day, a solution is needed that keeps track of all shots fired. In the USA, gunshot detection is already incorporated in some police anti-crime strategies. This is especially the case in densely populated areas, where gun crime is reaching staggering figures (Green, et al., 1999).
In Africa, on the other hand, and especially in South Africa with its wealth of natural resources, a mechanism is needed that can help to curb the poaching of endangered species. From 1 January 2000 to 30 April 2002, Zambia's elephant population decreased by 1000. Over the same period, Kenya and India reported seizures of 5953 kg of illegal ivory. This is despite the CITES‡ ban on ivory, which is still smuggled globally (Roberts, 2002).
The U.S. Department of State estimates that the black-market trade in illegal ivory and other wildlife and wildlife products generates between 10 and 20 billion dollars per year (Raffensperger, 2008). The number of seizures of more than a ton of ivory increased to 32 between 1998 and 2006. This compares with 17 such seizures between 1989 and 1997 (Milliken, Burn and Sangalakula, 2007), indicating a rise in this lucrative trade.
‡ Convention on International Trade in Endangered Species
But elephants are not the only animals whose numbers have succumbed to this multi-billion dollar industry. Poachers killed 448 rhinos in South Africa during 2011, 252 of them in the Kruger National Park (KNP) (WWF South Africa, 2012).
Table 2.1 shows the number of rhinos poached per year in South Africa from 2007 to 2012. It also shows the alarming rate at which rhino poaching has increased over this period (SavingRhinos.org, 2012). The number of rhinos killed illegally in 2012 was more than double the number killed in 2010 (SavingRhinos.org, 2013).
Table 2.1: Yearly statistics of rhino poaching in South Africa
| Year | Rhinos killed in R.S.A. |
|------|-------------------------|
| 2007 | 13                      |
| 2008 | 83                      |
| 2009 | 122                     |
| 2010 | 333                     |
| 2011 | 448                     |
| 2012 | 668                     |
According to the World Wildlife Fund (WWF), the recent upsurge in rhino poaching has been tied to an increased demand for rhino horn in Asia, and in particular in Vietnam. In the Asian market, rhino horn carries prestige as a luxury item and a post-party cleanser, and it has also been touted as a cure for cancer. However, according to traditional Chinese medicine experts, rhino horn has no proven cancer-treating properties, nor any use as an aphrodisiac (WWF South Africa, 2012).
Demand for powdered rhino horn on the street markets of Vietnam and China drove the price as high as US$50 000 per kilogram in 2011 (Environment News Service, 2011). The price has since increased to an estimated US$65 000 per kilogram in 2012 (The Register, 2012).
The African black rhinoceros remains on the Red List of the International Union for Conservation of Nature (IUCN) as critically endangered. By the end of 2010 only about 4800 were left in existence, a 97% decline in population since 1960 (IUCN, 2011). The Vietnamese subspecies of the Javan rhinoceros (Rhinoceros sondaicus annamiticus) was declared extinct by the WWF at the end of October 2011. The likely cause of death of the last remaining Vietnam rhino was poaching (WWF, 2011).
2.2 GUNSHOT DETECTION THEORETICAL OVERVIEW
Gunshot detection systems in current literature and on the commercial and military
market are primarily based on two sensing techniques, acoustic and optical. Gunshot
detection based on the acoustic characteristic (muzzle blast or shock wave) of gunfire uses
microphones (Maher, 2007), while electro-optical or optical detectors are employed to
detect the muzzle flash or the bullet’s path in optical schemes (Zhang et al., 2009).
2.2.1 ACOUSTIC SENSING
Systems that use only acoustic techniques to detect a weapon's discharge utilise the muzzle blast and/or the shock wave of the gunshot. The hot, expanding gases of the explosion in the weapon's chamber create a muzzle blast that emerges from the barrel of the gun. For most firearms, the sound level is highest in the direction the barrel is pointing (Maher, 2007).
The second source of acoustic information, the shock wave, is present when the bullet
travels at a speed higher than the speed of sound. The acoustic shock wave propagates
away from the bullet’s path at the speed of sound and expands as a cone behind the bullet
(Maher, 2007).
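The cone angle mentioned above follows from the standard Mach relation sin θ = c/v. Here θ is the half-angle of the cone, c is the speed of sound and v is the bullet speed. The following short sketch evaluates this relation. The 343 m/s speed of sound and any bullet speeds passed in are assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (dry air at about 20 degrees C)

def mach_cone_half_angle(bullet_speed, c=SPEED_OF_SOUND):
    """Half-angle of the shock-wave cone in degrees, from sin(theta) = c / v."""
    if bullet_speed <= c:
        raise ValueError("no shock wave: the bullet is not supersonic")
    return math.degrees(math.asin(c / bullet_speed))
```

For a hypothetical bullet speed of 850 m/s, `mach_cone_half_angle(850.0)` gives about 24 degrees. A faster bullet produces a narrower cone.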
According to Maher (2007), another sound source that can be used to detect the presence of a gunshot is the mechanical action of a firearm. This includes the sounds of the trigger and hammer mechanism, the positioning of new ammunition by the gun's loading system, and the ejection of spent cartridges. These mechanical sounds are generally much quieter than the muzzle blast or the projectile shock wave, so the microphones need to be much closer to the gun to pick them up.
According to Maher (2007), acoustic vibration may also be carried through the ground or other solid surfaces. Gunshots cause detectable vibratory signals that propagate through the ground many tens of meters from the source. Sound generally propagates through rock and soil at least 5 times faster than through air. Maher (2007) suggests that calculations can be made to correlate the surface vibratory motion with the subsequent arrival of the airborne sound.
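One such calculation is sketched below. If the ground-borne wave arrives first and the airborne sound follows, the gap between the two arrivals is Δt = d/c_air − d/c_ground, so the source distance is d = Δt / (1/c_air − 1/c_ground). The speeds used are assumptions: 343 m/s in air, and a ground speed of exactly 5 times that, the lower bound quoted above. Real soil and rock speeds vary widely.

```python
def arrival_gap(distance_m, c_air=343.0, c_ground=5 * 343.0):
    """Time (s) by which the ground-borne wave precedes the airborne sound."""
    return distance_m / c_air - distance_m / c_ground

def distance_from_gap(gap_s, c_air=343.0, c_ground=5 * 343.0):
    """Invert arrival_gap: d = gap / (1/c_air - 1/c_ground)."""
    return gap_s / (1.0 / c_air - 1.0 / c_ground)
```

Under these assumptions, a source 50 m away gives a gap of about 0.117 s, and `distance_from_gap` maps that gap back to 50 m.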
A challenge with acoustically based systems is separating (deconvolving) the gunshot from the reflected sound and reverberant clutter. Another challenge is distinguishing between gunfire and non-gunfire sounds (Maher, 2007).
Purely acoustic detection systems react more slowly than their optical counterparts, because they rely on sound waves propagating at approximately 330 m/s (Zhang et al., 2009). For example, the sound of a gunshot takes about 3 seconds to reach a sensor 1 km from its origin (1000 m ÷ 330 m/s ≈ 3.0 s).
2.2.2 OPTICAL SENSING
Systems that employ optical or electro-optical techniques for gunfire detection and localisation detect one or both of two cues: the muzzle flash of a weapon being fired, and the heat caused by the friction of the bullet as it moves through the air. These systems require a clear line of sight to the weapon being fired or to the projectile while it is in motion. Muzzle flashes can be defeated by specialised flash suppressors (Zhang et al., 2009).
Optical detection systems are used successfully in military environments, where response time is critical and lives are at stake (Defense Update, 2008). Multiple optical sensors are usually needed for 360 degree detection coverage. An optimal system would incorporate both acoustic and optical sensing techniques, enabling it to detect and locate gunfire with greater precision (Pauli et al., 2004).
2.3 GUNSHOT DETECTION AND LOCALISATION APPLICATIONS AND STRATEGIES
Gunshot detection and localisation are mostly used in military environments and as a tool to reduce crime in populated areas. They have also found uses in the video and film industry, where they can help protect sensitive groups in a community.
2.3.1 GUNSHOT DETECTION IN THE VIDEO AND FILM INDUSTRY
Pikrakis, Giannakopoulos & Theodoridis (2008) showed that gunshot detection in a movie's audio stream can be treated as a maximisation task. They solved it by means of dynamic programming and Bayesian Networks (BN). Their method seeks the sequence of segments, and the respective class labels (gunshot versus all other audio types in the stream), that maximises the product of the posterior class-label probabilities given the segments' data. The required posterior probabilities are estimated by combining the soft classification decisions of a set of Bayesian Network combiners.
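The maximisation described above can be sketched with dynamic programming over segment boundaries. This is not the authors' implementation. `posterior` is a hypothetical callable that stands in for the Bayesian Network combiners. Frames are abstract indices, and `max_len`, the longest allowed segment, is an assumption. The product of posteriors is maximised by maximising the sum of their logarithms.

```python
import math

def best_segmentation(n_frames, posterior, labels=("gunshot", "other"), max_len=10):
    """Segmentation and labelling of frames 0..n_frames-1 that maximises the
    product of per-segment posteriors.

    posterior(start, end, label) -> P(label | frames[start:end]) is a
    hypothetical stand-in for the classifier; it must be > 0 for at least
    one segment ending at every frame.
    """
    best = [-math.inf] * (n_frames + 1)   # best log-score for frames[0:end]
    best[0] = 0.0
    back = [None] * (n_frames + 1)        # (start, label) of the last segment
    for end in range(1, n_frames + 1):
        for start in range(max(0, end - max_len), end):
            for label in labels:
                p = posterior(start, end, label)
                if p <= 0.0:
                    continue
                score = best[start] + math.log(p)
                if score > best[end]:
                    best[end] = score
                    back[end] = (start, label)
    # Backtrack from the last frame to recover the chosen segments.
    segments, end = [], n_frames
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    return segments[::-1]
```

Because every extra segment multiplies in another probability no greater than 1, the search naturally prefers few, homogeneous segments.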
Pikrakis, Giannakopoulos & Theodoridis (2008) conclude that, when measured on an event basis, this method correctly detected almost 80% of the gunshot data, with a 20% false alarm rate. Ten percent of gunshots were not detected by this method.
Chen, Abdallah and Wolf (2006) propose another approach for detecting a gunshot event in video and film, which uses both the audio and visual aspects of a gunshot scene. The sound and video are separated at a preprocessing stage, and a gunshot sound model based on a 4-state continuous Hidden Markov Model (HMM) is then built. This model is trained with the sounds of different gun types and with non-gunshot sounds. A model of human emotion is also built, drawing on audio features (i.e. speech patterns) and video features (different facial expressions). A Support Vector Machine (SVM) classifier is trained to determine the emotion of the scene. Lastly, a purely visual model is built and trained with different human activities.
Chen, Abdallah and Wolf (2006) combine the three models to classify the scene into four categories, namely gunshot, normal, threatening and wounded victim.
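To make the HMM part concrete, the following sketch shows how a 4-state continuous HMM scores an observation sequence with the scaled forward algorithm. It also shows how comparing a gunshot model against a background model yields a decision. Everything here is hypothetical. The features are a one-dimensional per-frame value (for example, log energy), the emissions are single Gaussians, and all parameters are invented rather than trained. The features and training used by Chen, Abdallah and Wolf are not reproduced.

```python
import numpy as np

def gaussian_pdf(x, means, variances):
    return np.exp(-0.5 * (x - means) ** 2 / variances) / np.sqrt(2.0 * np.pi * variances)

def log_likelihood(obs, start, trans, means, variances):
    """Scaled forward algorithm: log p(obs | HMM) with 1-D Gaussian emissions."""
    log_lik = 0.0
    alpha = start * gaussian_pdf(obs[0], means, variances)
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ trans) * gaussian_pdf(obs[t], means, variances)
        scale = alpha.sum()          # rescale to avoid numerical underflow
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik

# Hypothetical 4-state left-to-right gunshot model: quiet, onset, decay, tail.
GUNSHOT = dict(
    start=np.array([1.0, 0.0, 0.0, 0.0]),
    trans=np.array([[0.5, 0.5, 0.0, 0.0],
                    [0.0, 0.5, 0.5, 0.0],
                    [0.0, 0.0, 0.5, 0.5],
                    [0.0, 0.0, 0.0, 1.0]]),
    means=np.array([0.0, 6.0, 3.0, 1.0]),
    variances=np.ones(4),
)
# Hypothetical 4-state ergodic background (non-gunshot) model.
BACKGROUND = dict(
    start=np.full(4, 0.25),
    trans=np.full((4, 4), 0.25),
    means=np.array([0.0, 0.5, -0.5, 0.0]),
    variances=np.ones(4),
)

def classify(obs):
    """Pick the model with the higher likelihood for the feature sequence."""
    g = log_likelihood(obs, **GUNSHOT)
    b = log_likelihood(obs, **BACKGROUND)
    return "gunshot" if g > b else "non-gunshot"
```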
2.3.2 MUZZLE BLAST DETECTION AND LOCALISATION USING A JOINT TACTICAL RADIO SYSTEM
Gunshot detection using a JTRS (Joint Tactical Radio System) radio is proposed by
Smith, Buscemi, and Xu (2010). In this strategy, each radio acts as a sensor node to
determine and share muzzle blast time of arrival information in order to determine a
shooter’s location. A rake-correlation filter loop is used as a detection algorithm to
accurately pinpoint the arrival of the muzzle blast of a gunshot at a single microphone.
The location algorithm (realised by an extended time-invariant Kalman filter) uses the
position and time of arrival information, gathered from multiple sources to determine the
shooter’s location.
Smith, Buscemi, and Xu (2010) used a combination of a correlation filter and rake
receiver to detect a gunshot. The correlation filter helped to resolve the signal from
uncorrelated surrounding noise, while the rake receiver helped to eliminate much of the
multipath present in the signal.
The Kalman filter is very adaptable according to Smith, Buscemi, and Xu (2010), and can
incorporate additional measurements or sensors such as terrain information to eliminate
large vertical errors. It would also be able to incorporate Angle of Arrival (AoA)
information from additional gunshot systems into the network. Their results show that
with only time of arrival information, gunshot location accuracy is dictated by radio
positioning and orientation.
Their research is based on a single-microphone setup per radio. Instead of fixed node positions, it uses GPS data to calculate AoA information. In the absence of a fixed microphone array, the system is therefore forced to work with less information. Each sensor alone cannot calculate AoA information and therefore cannot operate autonomously.
The research of Smith, Buscemi, and Xu (2010) demonstrated the ability of a multiple-radio system to identify the location of a gunshot. It also showed that such a system can incorporate stand-alone systems to achieve better accuracy than either system on its own. They propose that the detection algorithm could be implemented on the SRW (Soldier Radio Waveform) and the localisation algorithm on the WNW (Wideband Networking Waveform) of a JTRS.
2.3.3 MUZZLE BLAST AND SHOCKWAVE DETECTION USING LIGHTNING PROTOCOL
Wang (2009) proposes gunshot detection implementing the Lightning Protocol. In this scheme, low-end (low-power, low-computational-complexity) nodes in a wireless sensor network detect the muzzle blast. These nodes in turn wake up hibernating high-end nodes before the shock wave generated by the supersonic bullet reaches them. The high-end nodes, located much further away and on both sides of the bullet's trajectory, must detect the shock wave front and its propagation direction. Because they are already awake, these high-end nodes are also able to "catch" the trailing muzzle blast when it reaches them.
According to Wang (2009), it would be much more energy efficient for high-end nodes, which are capable of complex processing functions, to be awake only when a gunshot event occurs. For comparison, the BBN Boomerang II tactical anti-sniper system has a power consumption of 25 W when fully turned on, whereas a MICA§ mote consumes 27 mW when fully turned on. When only RF listening is active on the Boomerang system (microphone array and localisation modules switched off), it consumes only 12 mW. The low-end wireless sensor network is therefore used to detect and localise the muzzle blast and then to wake up the relevant high-end nodes.
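Wang's energy argument can be quantified with a simple duty-cycle average. The following sketch uses the 25 W (fully on) and 12 mW (RF listening only) figures quoted above. The fraction of time a node is fully awake is a hypothetical parameter.

```python
def average_power_w(active_fraction, p_active_w=25.0, p_listen_w=0.012):
    """Average power of a node that is fully on for a fraction of the time and
    otherwise only RF listening (figures quoted above for the Boomerang system)."""
    return active_fraction * p_active_w + (1.0 - active_fraction) * p_listen_w

print(average_power_w(1.0))    # always fully on
print(average_power_w(0.01))   # hypothetical: fully on 1% of the time
```

With a hypothetical 1% active fraction, the average drops from 25 W to about 0.26 W, roughly a hundredfold saving.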
2.4 CONCLUSION
This chapter reviewed the increasing illegal trade in endangered animal parts and explained why it creates a need for a system that can protect large habitats. The chapter also discussed some of the current methods, applications and strategies for gunshot detection and localisation. Combining the different strategies discussed above might yield an optimal solution for protecting nature reserves.
§
www.polastre.com/papers/hotchips-2004-mote-table.pdf (Accessed 25 July 2012)
3 MATHEMATICAL MODELLING REVIEW
This chapter reviews the different mathematical methods found in the literature that can be used to implement gunshot detection and localisation. A brief review of system identification, adaptive filters and time delay estimation is given first. The mathematical methods used in the gunshot detection algorithms that produced the results in chapter 5 are then discussed.
3.1 SYSTEM IDENTIFICATION
Systems can be described as some input changed or altered to produce a desired output
(Ifeachor & Jervis, 2002). For electronic systems the same principle applies, where an
input can be altered by a function to produce a desired output as shown in Figure 3.1:
[Figure: X(S) → H(S) → Y(S)]
Figure 3.1: Input X(S) changed by function H(S) to produce output Y(S)
H(S) can be defined as the transfer function of the system that changes the input X(S) into the desired output Y(S). Convolution, amongst other things, describes how the input interacts with a system to produce an output (Ifeachor & Jervis, 2002). Since convolution in the time domain corresponds to multiplication in the Laplace domain, equation (3.1) describes the output as the product of X(S) and H(S):
$$ Y(S) = X(S)\,H(S) \tag{3.1} $$
Equation (3.1) gives the relation between the input x(t) of a system and its output y(t), expressed in the Laplace domain. The term system identification refers to determining h(t), the impulse response, when it is unknown (Ifeachor & Jervis, 2002). If the impulse response and the output of the system are known, the procedure to obtain the input is known as deconvolution.
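A discrete-time illustration of convolution and non-blind deconvolution is given below. With h known, the input can be recovered from the output. NumPy's `polydiv` performs polynomial division, which is the exact inverse of `np.convolve` for coefficient sequences. The signals are hypothetical values chosen only for illustration.

```python
import numpy as np

h = np.array([1.0, 0.5, 0.25])            # hypothetical impulse response h[k]
x = np.array([1.0, -0.5, 0.2, 0.0, 0.3])  # hypothetical input x[k]

y = np.convolve(x, h)                     # system output: y = x * h

# Deconvolution with a known h: dividing the output "polynomial" by h recovers x.
x_rec, remainder = np.polydiv(y, h)
print(x_rec)
```

With noise added to y, this direct division can amplify the noise considerably. In practice, regularised or adaptive approaches are then preferred.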
Blind deconvolution can be used for system identification. Basic blind deconvolution is the process of determining the input from the output signal when the impulse response of the system is unknown, hence the term "blind" (Ifeachor & Jervis, 2002). As shown in Figure 3.1, the unknown source signal x(t) is passed through a system with impulse response h(t), so the measurable output is the convolution of x(t) and h(t).
Sometimes little is known about the impulse response, or about the temporal characteristics and statistics of the source signal. In that case, blind digital deconvolution can be used to recover the source signal, distorted by a linear system, from observations of the system's response only. In vector notation, the model of the linear input-output system is given by equation (3.2). In this equation, s(t) denotes the source and x(t) the observed output:
$$ x(t) = \mathbf{h}^{T}\mathbf{s}(t) + N(t) \tag{3.2} $$
In equation (3.2), s(t) is the input sample vector, i.e. $\mathbf{s}(t) = [s(t),\ s(t-1),\ s(t-2),\ \ldots,\ s(t-k+1)]^{T}$, where k is the number of entries in h. N(t) is zero-mean additive noise that originates from many simultaneous sources or effects. These include measurement errors, additive external disturbances, and sampling and round-off errors.
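The vector model in equation (3.2) can be checked numerically. Stacking the k most recent source samples into s(t) and taking the inner product with h reproduces a causal convolution, with optional zero-mean noise N(t) added. The sketch below assumes zero source samples before t = 0, and uses Gaussian noise as an example distribution.

```python
import numpy as np

def observe(s, h, noise_std=0.0, rng=None):
    """x(t) = h^T s(t) + N(t), with s(t) = [s(t), s(t-1), ..., s(t-k+1)]^T.

    Assumes s(t) = 0 for t < 0; N(t) is zero-mean Gaussian noise (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    s = np.asarray(s, dtype=float)
    h = np.asarray(h, dtype=float)
    k = len(h)
    padded = np.concatenate([np.zeros(k - 1), s])
    x = np.empty(len(s))
    for t in range(len(s)):
        s_vec = padded[t:t + k][::-1]     # most recent sample first
        x[t] = h @ s_vec
    return x + noise_std * rng.standard_normal(len(s))
```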
3.2 ADAPTIVE FILTERS
Signal detection plays an integral part in modern digital systems. It is the process of recovering a wanted or specific signal from a mixture of signals, such as noise and other unwanted signals. Methods of signal detection include adaptive filter techniques and various pattern matching methods used in conjunction with cancellation functions.
Adaptive filters can be used in a wide range of applications. These range from echo cancellation in cell phones to filtering ocular artefacts from the human EEG in biomedical engineering (Ifeachor & Jervis, 2002). With the increasing processing power of digital signal processing chips, more applications are using adaptive filters.
An adaptive filter is in essence a digital filter with self-adjusting coefficients that adapt automatically to changes in its input signal. It can learn from its environment and adjust its output accordingly, converging to a final solution (Ifeachor & Jervis, 2002).
3.2.1 NOISE CANCELLATION
An adaptive filter consists of two parts: a digital filter with adjustable filter weights, and an adaptive algorithm that adjusts those weights. Both y_k and x_k (from Figure 3.2) are applied simultaneously to the adaptive filter (Ifeachor & Jervis, 2002). The signal y_k is the recorded signal, containing both the gunshot impulse and the noise.
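[Figure: primary input y_k = s_k + n_k (signal + noise) enters a summing junction; reference input x_k (noise) → digital filter → noise estimate ñ_k, which is subtracted at the summing junction; the output ê_k = ŝ_k (signal estimate) also drives the adaptive algorithm that adjusts the digital filter]
Figure 3.2: Block diagram of an adaptive filter as a noise canceller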
The signal x_k is a measure of the contaminating signal, in this case noise that is correlated in some way with n_k. The signal x_k is fed into the digital filter to produce an estimate ñ_k of the noise, which is subtracted from the noise-corrupted signal y_k. The reference noise x_k must remain correlated with the unwanted noise accompanying the desired signal in y_k. As long as it does, the adaptive filter adjusts its coefficients to reduce the difference between n_k and ñ_k. This removes the noise and results in a cleaner signal ê_k. In this application, the error signal converges to the input data signal rather than to zero (Matlab, 2002).
3.3 TIME DELAY ESTIMATION AND IMPULSE DETECTION USING GENERALISED CORRELATION
Time Delay Estimation (TDE) is a long-established research field. It is one of the principles used in radar to detect objects at large distances from the radar. In this configuration, TDE determines how far the received pulse (the reflection from the object) is displaced in time from the transmitted pulse. Knowing the properties of the signal used and of the medium it travels through, one can then determine the distance of the object from the radar array.
3.3.1 TDE USING GENERALISED CORRELATION
The same principle can be applied to acoustic gunshot detection and localisation. To determine the angle from which the gun was fired, relative to the position of the microphone array, the time delay (time difference) between the gunshot impulses in the microphone channels must be calculated. Conventional time delay estimation uses generalised cross correlation and autocorrelation (Hertz, 1986).
In signal processing, cross correlation measures the degree of interdependence or similarity between two signals (Ifeachor & Jervis, 2002). In this application, the position of a signal's autocorrelation peak serves as its reference point in time. The position of the peak of its cross correlation with another signal is measured against this reference. The maximum of a correlation function indicates the lag at which the two signals correlate the most.
Given three signals x1[k], x2[k] and x3[k], the cross correlation of the first two signals is obtained from equation (3.3) (Ifeachor & Jervis, 2002),
$$ r_{12}[\tau] = \frac{1}{N}\sum_{k=0}^{N-1} x_1[k]\,x_2[k+\tau] \tag{3.3} $$
where N is the number of samples in the current data window. The cross correlations of the second and third signals, and of the first and third signals, are obtained in the same way. For the second correlation, x1[k] and x2[k] in equation (3.3) are replaced with x2[k] and x3[k]; for the third, they are replaced with x1[k] and x3[k]. The next step is to compute the autocorrelation of each of the three signals, which for x1[k] is
$$ r_{11}[\tau] = \frac{1}{N}\sum_{k=0}^{N-1} x_1[k]\,x_1[k+\tau] \tag{3.4} $$
The same calculation is performed for x2[k] and x3[k] with equation (3.4) (Ifeachor & Jervis, 2002). Once all six correlation maxima are known, the delays between the three signals are obtained by subtracting the lag indices of the maxima. Because the signals are digital, the delay is measured in samples. Knowing the sample rate, it can be converted to a time. Let s_d be the sample delay, and let k11max and k12max be the lags at which r11[τ] and r12[τ], respectively, have their maxima. Then
$$ s_d = k_{11\max} - k_{12\max} \tag{3.5} $$
and the time delay in seconds is
$$ T_d = \frac{s_d}{F_s} \tag{3.6} $$
where F_s is the sampling frequency of the signal in Hz.
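Equations (3.3)-(3.6) translate directly into a few lines of NumPy. The following sketch assumes two real-valued signals of equal length N in the same data window, with samples outside the window treated as zero. NumPy's `correlate` in "full" mode returns every lag from −(N−1) to N−1. Only the ordering of its arguments must match the definition in (3.3).

```python
import numpy as np

def correlation(xa, xb):
    """r_ab[tau] = (1/N) * sum_k xa[k] * xb[k + tau], for tau = -(N-1) .. N-1.

    Both signals must have the same length N; samples outside the window are
    treated as zero.
    """
    n = len(xa)
    r = np.correlate(xb, xa, mode="full") / n
    lags = np.arange(-(n - 1), n)
    return lags, r

def time_delay(x1, x2, fs):
    """Time delay between x1 and x2 following equations (3.3)-(3.6)."""
    lags, r11 = correlation(x1, x1)          # autocorrelation, (3.4)
    _, r12 = correlation(x1, x2)             # cross correlation, (3.3)
    k11_max = lags[np.argmax(r11)]           # always lag 0 for an autocorrelation
    k12_max = lags[np.argmax(r12)]
    s_d = k11_max - k12_max                  # sample delay, (3.5)
    return s_d / fs                          # delay in seconds, (3.6)
```

The autocorrelation peak always lies at zero lag, so (3.5) reduces to minus the lag of the cross-correlation peak. With this sign convention, a negative T_d means that x2 lags x1. The same function applies unchanged to the pairs (x2, x3) and (x1, x3).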
3.3.2 PULSE DETECTION USING GENERALISED CROSS CORRELATION
Generalised Cross Correlation (GCC) can also be used to detect a specific (wanted) pulse in a signal. The first step is to create a template of the pulse to be detected (Ifeachor & Jervis, 2002). Next, equation (3.3) is applied with x1[k] as the template pulse and x2[k] as the signal. The maximum of r12[τ] then occurs at the lag where the template pulse and the signal correlate the most. The position of this maximum therefore indicates where the wanted pulse is located in the signal.
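The template search described above can be sketched as follows. The template is slid along the signal, and the lag with the largest correlation is taken as the pulse position. The template shape and noise level in any usage are hypothetical.

```python
import numpy as np

def locate_template(template, sig):
    """Index in sig where the template pulse best matches (cross-correlation peak).

    Uses the raw, unnormalised correlation of equation (3.3). Loud non-gunshot
    sounds can also give large values, so a threshold or normalisation would
    likely be needed on real recordings.
    """
    r = np.correlate(sig, template, mode="valid")   # r[i] = sum_n sig[n+i]*template[n]
    return int(np.argmax(r))
```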
3.4 LEAST SQUARES
The term Least Squares (LS) describes an approach to solving over-determined or
inexactly specified systems of equations in an approximate sense. Instead of solving the
equations exactly, we seek only to minimise the sum of the squares of the residuals. A
residual is the difference between an observed value and the fitted value provided by a
model (Moler, 2008).
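As a minimal illustration of the least squares idea, the following sketch fits a straight line to an over-determined set of points. Among all candidate lines, it returns the one that minimises the sum of squared residuals. The data used with it are hypothetical.

```python
import numpy as np

def fit_line(t, y):
    """Least-squares fit of y ≈ m*t + c; returns ((m, c), residual sum of squares)."""
    A = np.column_stack([t, np.ones_like(t)])      # one equation per data point
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef                       # observed minus fitted values
    return coef, float(residuals @ residuals)
```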
In radar systems that use pulse compression, a matched filter is applied to the received pulse to maximise the signal-to-noise ratio. The transmitted waveforms are chosen to have an autocorrelation function with a narrow peak at zero time shift and values as low as possible at all other time shifts. These low values are called sidelobes. Sidelobes have the undesirable effect of masking smaller objects that are close to larger objects (Cilliers and Smit, 2007).
This sidelobe minimisation technique is extended here to gunshot detection. Sound from the microphones is matched with pre-existing gunshot sound templates, using the least squares algorithm proposed by Cilliers and Smit (2007), to detect a valid gunshot.
3.4.1 LEAST SQUARES SIDELOBE MINIMISATION
The output b of the detection filter, which is the convolution sequence of the gunshot sound from the microphones and the detection filter, can be written in matrix form as
$$ \mathbf{b} = \mathbf{A}_F\,\mathbf{x} \tag{3.7} $$
where
$$ \mathbf{b} = [b_1\ b_2\ \cdots\ b_{2N-1}]^{T} \tag{3.8} $$
$$ \mathbf{x} = [x_1\ x_2\ \cdots\ x_N]^{T} \tag{3.9} $$
and
$$ \mathbf{A}_F = \begin{bmatrix} a_1 & 0 & \cdots & 0 \\ a_2 & a_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_N & a_{N-1} & \cdots & a_1 \\ 0 & a_N & \cdots & a_2 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_N \end{bmatrix} \tag{3.10} $$
In equation (3.10), A_F is the full convolution matrix, and ^T denotes the transpose of a vector or matrix. The formulation in equations (3.7) to (3.10) leads to the following expression for the sum of squares of the convolution sequence (Cilliers and Smit, 2007):
$$ \mathbf{b}^{H}\mathbf{b} = |b_1|^2 + |b_2|^2 + \cdots + |b_{2N-1}|^2 \tag{3.11} $$
The complex conjugate transpose is denoted by ^H. According to Cilliers and Smit (2007), the gunshot pulse's sidelobe measure cost function can now be formulated by defining a new matrix A. This matrix is similar to A_F, except that the rows of A_F which produce the gunshot peak are removed, so that b = Ax now contains only the sidelobe samples. The sidelobe measure cost function to be minimised can therefore be written as
$$ f(\mathbf{x}) = \mathbf{b}^{H}\mathbf{b} = \mathbf{x}^{H}\mathbf{A}^{H}\mathbf{A}\,\mathbf{x} = \mathbf{x}^{H}\mathbf{C}\,\mathbf{x} \tag{3.12} $$
where
$$ \mathbf{C} = \mathbf{A}^{H}\mathbf{A} \tag{3.13} $$
Using the method of Lagrange multipliers, a solution for x can be found that minimises the sidelobe measure cost function. This solution also satisfies the constraint that a gunshot pulse peak with amplitude b_peak must be produced (Cilliers and Smit, 2007). This constraint can be written as
$$ \mathbf{a}\,\mathbf{x} = b_{peak} \tag{3.14} $$
where
$$ \mathbf{a} = [a_N\ a_{N-1}\ \cdots\ a_1] \tag{3.15} $$
This leads to the constraint function
$$ g(\mathbf{x}) = \mathbf{a}\,\mathbf{x} - b_{peak} \tag{3.16} $$
No symmetry constraints are placed on the filter response, and no constraint is placed on the samples adjacent to the peak sample. Leaving these adjacent samples unconstrained allows the optimisation process to force energy from the sidelobes into them. According to Cilliers and Smit (2007), this gives the optimisation algorithm the freedom to widen the gunshot pulse peak in order to achieve a lower and flatter sidelobe response. A system of simultaneous equations arises from the complex Lagrangian
$$ \frac{d}{d\mathbf{x}}\bigl(f(\mathbf{x})\bigr) + \frac{d}{d\mathbf{x}}\bigl(\mathrm{Re}\{\lambda\, g(\mathbf{x})\}\bigr) = 0 \tag{3.17} $$
and the constraint given in equation (3.16). This extended system of simultaneous equations can now be solved to obtain the value of x that minimises the sidelobe measure. The closed-form solution for x is given by equation (3.18):
$$ \mathbf{x} = \frac{b_{peak}\,\mathbf{C}^{-1}\mathbf{a}^{H}}{\mathbf{a}\,\mathbf{C}^{-1}\mathbf{a}^{H}} \tag{3.18} $$
This solution for x produces a mismatched receive filter for the gunshot template pulse
{an} that minimises the gunshot pulse sidelobes in the least-squares sense (Cilliers and
Smit, 2007).
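The following sketch builds A_F, removes the peak rows to form A, and evaluates the closed-form solution (3.18) for a template {a_n}. The text states that the samples adjacent to the peak are left unconstrained. Based on that, this sketch assumes that the peak row and one row on each side are removed from the cost (the `guard` parameter is a hypothetical name for this choice).

```python
import numpy as np

def convolution_matrix(a):
    """Full convolution matrix A_F of size (2N-1) x N, so A_F @ x == np.convolve(a, x)."""
    a = np.asarray(a)
    a = a.astype(np.result_type(a.dtype, np.float64))   # ints -> float, complex kept
    n = len(a)
    af = np.zeros((2 * n - 1, n), dtype=a.dtype)
    for j in range(n):
        af[j:j + n, j] = a        # column j holds the template shifted down by j rows
    return af

def ls_sidelobe_filter(a, b_peak=1.0, guard=1):
    """Mismatched filter x of equation (3.18) for the template a.

    guard: samples on each side of the peak left out of the cost function
    (assumed to be 1, i.e. the two adjacent samples are unconstrained).
    """
    af = convolution_matrix(a)
    n = af.shape[1]
    peak = n - 1                              # row producing the peak, cf. (3.15)
    keep = [i for i in range(2 * n - 1) if abs(i - peak) > guard]
    A = af[keep]                              # A_F with the peak rows removed
    C = A.conj().T @ A                        # (3.13)
    a_row = af[peak]                          # [a_N ... a_1]
    c_inv_aH = np.linalg.solve(C, a_row.conj())      # C^{-1} a^H without explicit inverse
    return b_peak * c_inv_aH / (a_row @ c_inv_aH)    # (3.18)
```

The resulting filter produces the required peak and, by construction, has no more sidelobe energy than a matched filter scaled to the same peak.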
3.5 REPRODUCING KERNEL HILBERT SPACES
In recent years kernel-based algorithms have become the state-of-the-art methods for
many machine learning problems. The common feature of these methods is that they are
based on an optimisation problem over a Reproducing Kernel Hilbert Space (RKHS)
(Steinwart, Hush and Scovel, 2006).
Let X be a non-empty set and let 𝕂 denote the field ℝ or ℂ. Then a function k : X × X → 𝕂 is called a kernel on X if there exists a 𝕂-Hilbert space H and a map Φ : X → H such that, for all x, x′ ∈ X,
$$ k(x, x') = \langle \Phi(x'), \Phi(x) \rangle \tag{3.19} $$
Φ is called a feature map and H a feature space of k. Now let H be a Hilbert function space over X, i.e. a Hilbert space consisting of functions mapping from X into 𝕂. The space H is called an RKHS over X if, for all x ∈ X, the Dirac functional δ_x : H → 𝕂 defined by δ_x(f) := f(x), f ∈ H, is continuous (Steinwart, Hush & Scovel, 2006). Furthermore, a function k : X × X → 𝕂 is called a reproducing kernel of H if k(·, x) ∈ H for all x ∈ X and the reproducing property
$$ f(x) = \langle f, k(\cdot, x) \rangle \tag{3.20} $$
holds for all f ∈ H and all x ∈ X, where ⟨·,·⟩ denotes the inner product.
3.5.1 NON-LINEAR TEMPLATE MATCHING FRAMEWORK
Van Wyk, van Wyk, and Noel (2004) consider an input-output mapping F̃ that has to be identified (here, one that recognises a gunshot) and that belongs to an RKHS H_n. Suppose we are given a set of test input-output pairs
$$ \{(\mathbf{x}_i \in \mathbb{R}^N,\ y_i)\}_{i=1}^{m} \tag{3.21} $$
where the x_i, i = 1, …, m, are linearly independent elements of ℝ^N. They propose that the problem then has a unique minimum norm solution expressed by
$$ \tilde{F}(\mathbf{x}) = \sum_{i=1}^{m} C_i\, K(\mathbf{x}_i, \mathbf{x}) \tag{3.22} $$
where K(x_i, ·) is a reproducing kernel of the Hilbert space H_n. The coefficients C_i in equation (3.22) are given by
$$ \mathbf{C} = \mathbf{G}^{-1}\mathbf{y} \tag{3.23} $$
where
$$ \mathbf{C} = (C_1, \ldots, C_m)^{T}, \qquad \mathbf{y} = (y_1, \ldots, y_m)^{T} \tag{3.24} $$
and the Gram matrix G is given by
$$ \mathbf{G} = (G_{ij}) \tag{3.25} $$
where
$$ G_{ij} = K(\mathbf{x}_i, \mathbf{x}_j), \qquad i, j = 1, \ldots, m. \tag{3.26} $$
3.5.2 TEST INPUT-OUTPUT PAIRS
For the Desirable Gunshot Templates (DGT), i.e. valid gunshot sounds, the y_i in equation (3.21) are usually set to some positive value, for instance γ. The remaining y_i, for the Undesirable Gunshot Templates (UGT), i.e. non-gunshot sounds, are set to α, where α is normally 0 (Van Wyk, van Wyk, and Noel, 2004).
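Equations (3.22)-(3.26), together with the target values above, can be sketched as follows. First, the Gram matrix is built from the training templates. Next, the system G C = y is solved (solving the system rather than explicitly inverting G, which gives the same C more stably). Finally, a new input is scored with (3.22). The kernels are the linear and polynomial kernels listed in the next subsection. The templates and the choice γ = 1, α = 0 in any usage are hypothetical.

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z                      # linear kernel, x^T z

def polynomial_kernel(x, z, d=2):
    return (1.0 + x @ z) ** d         # polynomial kernel, (1 + x^T z)^d

def train_rkhs(X, y, kernel):
    """Coefficients C = G^{-1} y for training templates X (one per row)."""
    m = len(X)
    G = np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])  # (3.25)-(3.26)
    return np.linalg.solve(G, y)      # (3.23)

def rkhs_output(x, X, C, kernel):
    """F~(x) = sum_i C_i K(x_i, x), equation (3.22)."""
    return sum(C[i] * kernel(X[i], x) for i in range(len(X)))
```

By construction, the trained network reproduces γ on every DGT and α on every UGT. An unseen recording would then be classified by thresholding its output between the two values.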
3.5.3 REPRODUCING KERNEL TYPES
According to Van Wyk, van Wyk, and Noel (2004), possible kernel types include the linear kernel,

$$K(\mathbf{x}, \mathbf{z}) = \mathbf{x}^T \mathbf{z}, \tag{3.27}$$

and the polynomial kernel,

$$K(\mathbf{x}, \mathbf{z}) = (1 + \mathbf{x}^T \mathbf{z})^d, \qquad d \geq 1. \tag{3.28}$$

3.5.4 MINIMUM NORM TEMPLATES
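A Minimum Norm Template (MNT) can be inferred once the interpolation coefficients have been obtained. According to Van Wyk, van Wyk, and Noel (2004), if a linear kernel is used the MNT has the form

$$\tilde{\mathbf{x}} = \sum_{i=1}^{m} C_i \mathbf{x}_i, \tag{3.29}$$

and $K(\tilde{\mathbf{x}}, \cdot)$ satisfies

$$K(\tilde{\mathbf{x}}, \mathbf{x}_i) = \begin{cases} \gamma & \text{for } i = 1, \ldots, k \\ \alpha & \text{for } i > k \end{cases}, \tag{3.30}$$

where the first $k$ training vectors are the DGT.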
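For a second order polynomial kernel it can be shown that

$$\tilde{\mathbf{x}} = \sum_{i=1}^{m} C_i \tilde{\mathbf{x}}_i, \tag{3.31}$$

where $\tilde{\mathbf{x}}_i = [[1\ \mathbf{x}_i^T] \otimes [1\ \mathbf{x}_i^T]]^T$ and $\otimes$ denotes the Kronecker tensor product. Similar to the linear kernel case,

$$\tilde{\mathbf{x}}^T \tilde{\mathbf{x}}_i = \begin{cases} \gamma & \text{for } i = 1, \ldots, k \\ \alpha & \text{for } i > k \end{cases}. \tag{3.32}$$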
Thus, for the matching process, the MNT is first calculated offline using a gunshot training template. The interpolation coefficients needed to construct the MNT are obtained by inverting the Gram matrix. Once the RKHS network has been trained on the training data, the template $\tilde{\mathbf{x}}$ of equation (3.31) is used to calculate the $i$th output of the RKHS network as $\tilde{\mathbf{x}}^T \tilde{\mathbf{x}}_i$ (Van Wyk, van Wyk, and Noel, 2004).
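The offline/online split described above can be illustrated with a second order polynomial kernel. In the Python sketch below (an illustration under stated assumptions, not the code used in this project), the offline step computes the coefficients from the Gram matrix of the lifted vectors, whose entries equal $(1 + \mathbf{x}_i^T\mathbf{x}_j)^2$, and builds $\tilde{\mathbf{x}}$ as in equation (3.31). The online step slides a window of the same length as the training vectors along a signal and outputs the inner product of $\tilde{\mathbf{x}}$ with the lifted window. The sliding-window way of feeding the signal, the toy data and all function names are assumptions made for illustration.

```python
def kron(a, b):
    return [ai * bj for ai in a for bj in b]

def lift(x):
    """x~ = [[1 x^T] (x) [1 x^T]]^T, the lifted vector of equation (3.31)."""
    v = [1.0] + list(x)
    return kron(v, v)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for A c = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def minimum_norm_template(X, y):
    """Offline step: C = G^{-1} y for the 2nd order polynomial kernel, then eq. (3.31)."""
    L = [lift(x) for x in X]
    G = [[dot(li, lj) for lj in L] for li in L]      # equals (1 + x_i^T x_j)^2
    C = solve(G, y)
    return [sum(c * li[k] for c, li in zip(C, L)) for k in range(len(L[0]))]

def rkhs_output(stream, mnt, n):
    """Online step (assumed sliding window of n samples): x~^T lift(window)."""
    return [dot(mnt, lift(stream[i:i + n])) for i in range(len(stream) - n + 1)]

X = [[0.2, 1.0, -0.6], [0.5, 0.5, 0.5], [-0.3, 0.1, 0.2]]   # hypothetical training windows
y = [1.0, 0.0, 0.0]                                        # one DGT (gamma = 1), two UGT (alpha = 0)
mnt = minimum_norm_template(X, y)
stream = [0.0, 0.0, 0.2, 1.0, -0.6, 0.0, 0.0]              # contains X[0] starting at index 2
print(rkhs_output(stream, mnt, 3))
```

Since $\tilde{\mathbf{x}}^T \tilde{\mathbf{v}} = \sum_i C_i (1 + \mathbf{x}_i^T \mathbf{v})^2$ for any window $\mathbf{v}$, one inner product with the precomputed MNT gives the same output as evaluating the kernel expansion (3.22) term by term, which is why the MNT only has to be computed once, offline.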
3.6 SUPPORT VECTOR MACHINES
The Support Vector Machine (SVM) is a powerful methodology for solving problems in non-linear classification, density estimation and function estimation, and it has also led to many other developments in kernel-based learning methods. SVMs were introduced within the context of structural risk minimisation and statistical learning theory. The SVM methodology solves convex optimisation problems, usually by quadratic programming. Least Squares Support Vector Machines (LS-SVMs) are reformulations of standard SVMs that lead to solving linear Karush-Kuhn-Tucker (KKT) systems. LS-SVMs are closely related to Gaussian processes and regularisation networks, but also emphasise and exploit primal-dual interpretations (De Brabanter et al., 2011).
Originally developed for binary classification problems, the key concept of an SVM is the use of hyperplanes as decision boundaries that separate the data points of different classes. SVMs can be used for linear classification tasks and also for more complex non-linear classification problems. The idea behind SVMs is to map the original data points from the input space to a higher dimensional or infinite dimensional feature space in such a way that the classification problem becomes simpler in the feature space (Lutsa, et al., 2010).
3.6.1 LINEAR SUPPORT VECTOR MACHINES
Figure 3.3 shows a maximum-margin hyperplane (H3) and margins (two dotted lines, H1
and H2 parallel to H3) for an SVM trained with samples from two classes (Burges, 1998).
This example is a linearly separable case. Samples on the margin are called the support
vectors.
Figure 3.3: Maximum-margin hyperplane and margins for an SVM
The support vectors are circled in Figure 3.3. Consider a set of training data points of the form

$$\{\mathbf{x}_i, y_i\}, \quad 1 \leq i \leq l, \quad y_i \in \{-1, 1\}, \quad \mathbf{x}_i \in \Re^n, \tag{3.33}$$

where $y_i$ (either 1 or $-1$) indicates the class to which $\mathbf{x}_i$ belongs. The maximum-margin hyperplane divides the points having $y_i = 1$ from those having $y_i = -1$ (Burges, 1998). The points $\mathbf{x}$ which lie on the hyperplane satisfy

$$\mathbf{w} \cdot \mathbf{x} + b = 0. \tag{3.34}$$

The vector $\mathbf{w}$ is normal to the hyperplane, and $|b| / \|\mathbf{w}\|$ determines the offset of the hyperplane from the origin along $\mathbf{w}$ (Burges, 1998). We want to choose $\mathbf{w}$ and $b$ to maximise the distance between the parallel hyperplanes while still separating the data. These hyperplanes can be described by the equations
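$$\mathbf{w} \cdot \mathbf{x}_i + b \geq +1 \quad \text{for } y_i = +1, \tag{3.35}$$

$$\mathbf{w} \cdot \mathbf{x}_i + b \leq -1 \quad \text{for } y_i = -1. \tag{3.36}$$

These can then be combined into one set of inequalities

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \geq 0 \quad \forall i. \tag{3.37}$$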
If the training data is linearly separable, as in the case above, the pair of hyperplanes that gives the maximum margin can be found by minimising $\|\mathbf{w}\|^2$ subject to the constraints of equation (3.37) (Burges, 1998). The distance between the hyperplanes is then

$$\frac{2}{\|\mathbf{w}\|}. \tag{3.38}$$

Thus the primal form becomes

$$\min_{\mathbf{w}, b} \; \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \geq 0 \quad \forall i. \tag{3.39}$$
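The roles of constraint (3.37) and of the margin $2/\|\mathbf{w}\|$ can be checked numerically on a toy data set. The short Python sketch below (illustrative only; the data and candidate hyperplanes are made up) tests whether a candidate $(\mathbf{w}, b)$ is feasible and reports its margin. For these four points, $\mathbf{w} = (1, 0)$, $b = 0$ is the maximum-margin solution, with margin 2. Scaling $\mathbf{w}$ up keeps it feasible but shrinks the margin, while scaling it down widens the margin but violates (3.37). This is why minimising $\|\mathbf{w}\|^2$ subject to the constraints yields the maximum margin.

```python
import math

def feasible(w, b, X, y):
    """True if y_i (w . x_i + b) - 1 >= 0 holds for every training point (eq. 3.37)."""
    return all(yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b) - 1 >= 0
               for xi, yi in zip(X, y))

def margin(w):
    """Distance 2 / ||w|| between the hyperplanes H1 and H2 (eq. 3.38)."""
    return 2.0 / math.sqrt(sum(wk * wk for wk in w))

X = [(1.0, 0.0), (2.0, 1.0), (-1.0, 0.0), (-2.0, -1.0)]   # hypothetical toy data
y = [1, 1, -1, -1]
# Candidates: optimal, feasible but narrower, wider but infeasible.
for w in [(1.0, 0.0), (2.0, 0.0), (0.5, 0.0)]:
    print(w, feasible(w, 0.0, X, y), margin(w))
```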
The classification rule can be written in its dual form, which shows that the maximum-margin hyperplane is a function of the support vectors only. To obtain it, we introduce positive Lagrange multipliers $\alpha_i$, $1 \leq i \leq l$, one for each of the inequality constraints in equation (3.37) (Burges, 1998). The dual form of the SVM can then be shown to be the following optimisation problem:

$$\max_{\alpha} \; \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j \tag{3.40}$$

subject to

$$\alpha_i \geq 0, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0. \tag{3.41}$$
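For the smallest possible case, two points $\mathbf{x}_1 = (1, 0)$ with $y_1 = 1$ and $\mathbf{x}_2 = (-1, 0)$ with $y_2 = -1$, constraint (3.41) forces $\alpha_1 = \alpha_2 = a \geq 0$, and the dual objective (3.40) reduces to $2a - 2a^2$, which is maximised at $a = 1/2$. The Python sketch below is a toy illustration that uses a brute-force grid search instead of the quadratic programming solver a real SVM would use. It confirms this value and recovers the primal solution through $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ and $b = y_1 - \mathbf{w} \cdot \mathbf{x}_1$, which holds because $\mathbf{x}_1$ is a support vector.

```python
def dual_objective(alpha, X, y):
    """W(alpha) = sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j x_i . x_j, eq. (3.40)."""
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))
    quad = sum(ai * aj * yi * yj * dot(xi, xj)
               for ai, yi, xi in zip(alpha, y, X)
               for aj, yj, xj in zip(alpha, y, X))
    return sum(alpha) - 0.5 * quad

X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
# With two points, (3.41) forces alpha_1 = alpha_2 = a >= 0; scan a on a grid.
grid = [k / 100 for k in range(0, 201)]
a_best = max(grid, key=lambda a: dual_objective([a, a], X, y))
alpha = [a_best, a_best]
w = [sum(ai * yi * xi[k] for ai, yi, xi in zip(alpha, y, X)) for k in range(2)]
b = y[0] - sum(wk * xk for wk, xk in zip(w, X[0]))   # from y_1 (w . x_1 + b) = 1
print(a_best, w, b)
```

The result, $\mathbf{w} = (1, 0)$ and $b = 0$, gives a margin $2/\|\mathbf{w}\| = 2$, which equals the distance between the two points, as expected.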
3.6.2 NON-LINEAR SUPPORT VECTOR MACHINES
SVMs can be extended to non-linear cases by replacing the dot product with a non-linear kernel function. This allows the algorithm to fit the maximum-margin hyperplane in the transformed feature space. Figure 3.4 illustrates the mapping from the input space to a higher dimensional feature space (Lutsa, et al., 2010).
Figure 3.4: Mapping of non-linear input space to higher dimensional feature space
According to Lutsa, et al. (2010), non-linear SVM classifiers take the form

$$f(\mathbf{x}) = \operatorname{sign}\left( \sum_{i=1}^{\#SV} \alpha_i y_i K(\mathbf{x}, \mathbf{x}_i) + b \right), \tag{3.42}$$

where $\#SV$ represents the number of support vectors and $K(\cdot, \cdot)$ is the kernel function.
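According to Burges (1998) and Lutsa, et al. (2010), some kernel functions include the linear kernel

$$K(\mathbf{x}, \mathbf{z}) = \mathbf{x}^T \mathbf{z}, \tag{3.43}$$

the polynomial kernel of degree $d$

$$K(\mathbf{x}, \mathbf{z}) = (1 + \mathbf{x}^T \mathbf{z})^d, \qquad d \geq 1, \tag{3.44}$$

and the radial basis function kernel

$$K(\mathbf{x}, \mathbf{z}) = \exp\left(-\gamma \|\mathbf{x} - \mathbf{z}\|^2\right), \qquad \gamma \geq 0. \tag{3.45}$$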
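The kernels (3.43) to (3.45) and the decision function (3.42) translate directly into code. The Python sketch below is illustrative and not the implementation used in this project. The function names and default parameter values ($d = 2$, $\gamma = 1$) are assumptions, and a decision value of exactly zero is arbitrarily assigned to class +1. The usage example plugs in the solution of a two-point toy problem with support vectors $(1, 0)$ (class +1) and $(-1, 0)$ (class −1), for which $\alpha_1 = \alpha_2 = 0.5$ and $b = 0$.

```python
import math

def linear_kernel(x, z):
    """Eq. (3.43): K(x, z) = x^T z."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, d=2):
    """Eq. (3.44): K(x, z) = (1 + x^T z)^d, d >= 1."""
    return (1.0 + linear_kernel(x, z)) ** d

def rbf_kernel(x, z, gamma=1.0):
    """Eq. (3.45): K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_decision(x, support_vectors, alphas, labels, b, kernel):
    """Eq. (3.42): sign(sum_i alpha_i y_i K(x, x_i) + b); zero is mapped to +1."""
    s = sum(a * y * kernel(x, sv)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if s + b >= 0 else -1

sv, alphas, labels, b = [(1.0, 0.0), (-1.0, 0.0)], [0.5, 0.5], [1, -1], 0.0
print(svm_decision((3.0, 5.0), sv, alphas, labels, b, linear_kernel))    # class +1
print(svm_decision((-0.2, 4.0), sv, alphas, labels, b, linear_kernel))   # class -1
```

Swapping `linear_kernel` for `polynomial_kernel` or `rbf_kernel` changes only the similarity measure; the form of the decision rule stays the same, which is the point of the kernel substitution described above.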
3.6.3 LEAST SQUARES SUPPORT VECTOR MACHINES
Least Squares SVMs simplify the formulation by replacing the inequality constraints of the SVM with equality constraints. Suykens and Vandewalle (1999) proposed this modification of the SVM methodology, combining a least squares loss function with equality instead of inequality constraints. The LS-SVM solution is then obtained from a set of linear equations rather than by solving a quadratic programming problem, which significantly reduces the computational effort and complexity. The LS-SVM classifier solves the following optimisation problem (Lutsa, et al., 2010):
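$$\min_{\mathbf{w}, \mathbf{e}, b} \; \frac{1}{2}\mathbf{w}^T\mathbf{w} + \gamma \frac{1}{2} \sum_{i=1}^{N} e_i^2 \tag{3.46}$$

subject to

$$y_i\left(\mathbf{w}^T \varphi(\mathbf{x}_i) + b\right) = 1 - e_i, \qquad i = 1, \ldots, N, \tag{3.47}$$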
where $\mathbf{e} = [e_1\ e_2\ \ldots\ e_N]^T$ is a vector of error variables that tolerate misclassifications, and $\varphi(\cdot) : \Re^n \to \Re^{n_h}$ is a mapping from the input space into a high-dimensional feature space of dimension $n_h$. The vector $\mathbf{w}$ has the same dimension as $\varphi(\cdot)$, $\gamma$ is a positive regularisation constant and $b$ is a bias term. The primal problem is expressed in terms of the feature map and the dual problem in terms of the kernel function. According to Lutsa, et al. (2010), the resulting classifier in the dual space is similar to the standard SVM classifier and is given by

$$f(\mathbf{x}) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}, \mathbf{x}_i) + b \right), \tag{3.48}$$
where $K$ is the kernel function with $K(\mathbf{x}, \mathbf{x}_i) = \varphi(\mathbf{x})^T \varphi(\mathbf{x}_i)$ and the $\alpha_i$ are the Lagrange multipliers. The support values $\alpha_i$ are proportional to the errors of the corresponding training data points ($\alpha_i = \gamma e_i$). Consequently, every training data point is usually a support vector, and the sparseness property of the standard SVM is lost in the LS-SVM formulation. According to Lutsa, et al. (2010), high support values indicate a high contribution of the training data point to the decision boundary.
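Because (3.46) and (3.47) involve only equality constraints, the LS-SVM dual reduces to a single linear system. In the formulation of Suykens and Vandewalle (1999) this system is

$$\begin{bmatrix} 0 & \mathbf{y}^T \\ \mathbf{y} & \boldsymbol{\Omega} + \gamma^{-1}\mathbf{I} \end{bmatrix} \begin{bmatrix} b \\ \boldsymbol{\alpha} \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{1} \end{bmatrix}, \qquad \Omega_{ij} = y_i y_j K(\mathbf{x}_i, \mathbf{x}_j).$$

The Python sketch below builds and solves this system for a small, hypothetical, linearly separable data set with a linear kernel and $\gamma = 10$, then classifies with (3.48). It is a minimal illustration (dense elimination, no numerical safeguards), not the implementation used in this project; the function names are hypothetical.

```python
def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def solve(A, rhs):
    """Gauss-Jordan elimination with partial pivoting; pivoting matters here
    because the KKT matrix has a zero in its top-left corner."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def lssvm_train(X, y, gamma, kernel):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1] for b and alpha."""
    N = len(X)
    A = [[0.0] + [float(yj) for yj in y]]
    for i in range(N):
        row = [float(y[i])]
        for j in range(N):
            omega = y[i] * y[j] * kernel(X[i], X[j])    # Omega_ij = y_i y_j K(x_i, x_j)
            row.append(omega + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * N)
    return sol[0], sol[1:]                              # bias b, support values alpha

def lssvm_classify(x, X, y, alpha, b, kernel):
    """Eq. (3.48): sign(sum_i alpha_i y_i K(x, x_i) + b); zero is mapped to +1."""
    s = sum(a * yi * kernel(x, xi) for a, yi, xi in zip(alpha, y, X))
    return 1 if s + b >= 0 else -1

X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]   # hypothetical toy data
y = [1, 1, -1, -1]
b, alpha = lssvm_train(X, y, gamma=10.0, kernel=linear_kernel)
preds = [lssvm_classify(x, X, y, alpha, b, linear_kernel) for x in X]
print(b, alpha, preds)
```

By the symmetry of this toy data set, $b$ comes out as zero (up to rounding) and all training points are classified correctly. Unlike in a standard SVM, every $\alpha_i$ is non-zero and some are negative, which illustrates the loss of sparseness mentioned above.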
3.7 CONCLUSION
Chapter 3 discussed the mathematical models found in the literature that can be used to implement a gunshot detection and localisation system. It also reviewed the different detection algorithms (GCC, LS, RKHS and SVM) that were used in this project.
4 EXPERIMENTAL SETUP
Two sessions of gunshot sound recordings were undertaken. The first session was a
general gunshot sound recording at Swartkops military shooting range in Pretoria to
obtain preliminary gunshot data. The second session of gunshot sound recordings (real environmental data) took place on a game farm in the northern region of South Africa, near Mussina.
4.1 SOUND RECORDING EQUIPMENT SETUP
The three microphones were placed in a star topology, at 120° angles from one another. The cables connecting the microphones to the amplifier were 2 m long. Figure 4.1 shows the setup of the recording equipment used in both recording sessions.
Figure 4.1: Setup of recording equipment
The amplified signals were recorded with PCI A/D cards from National Instruments (NI) at different sampling frequencies, varying from 2 kHz to 40 kHz. The predominant sampling frequencies were 10 kHz and 20 kHz.
4.2 LABVIEW EXPERIMENTAL PREPARATION
Figures A.1 to A.6 in Appendix A show screenshots of a Labview program that implements a generalised cross correlation method for signal detection and direction finding. The data used was obtained in the first gunshot sound recording session (preliminary data). The program mixes uniform white noise with a recorded signal before it reaches the processing stage, which calculates the angle of the gunshot. The noise was added to test the detection accuracy of the GCC (Generalised Cross Correlation) algorithm under low signal-to-noise ratio conditions.
Figure A.1 shows input signals with a 1.45 V peak value mixed with a generated noise signal of 0.10 V peak-to-peak.
Figure A.2 shows the correlation graphs produced by the generalised cross correlation calculations, which give the time delay estimates from which the angles of the gunshot impulse are calculated. The maxima of the correlation graphs are clearly visible.
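The experiment above can be imitated in a few lines of Python. The sketch is illustrative only. It replaces the recorded gunshot with a hypothetical synthetic impulse (an exponentially damped sinusoid scaled to a 1.45 peak), mixes in uniform white noise of 0.10 peak-to-peak as in Figure A.1, and estimates the inter-channel delay from the peak of a plain, unweighted cross correlation. A full GCC would additionally apply a frequency weighting (for example PHAT), and converting the delay into an angle requires the array geometry; neither is reproduced here. All names and values are assumptions.

```python
import math
import random

def impulse(n_samples, peak=1.45):
    """Hypothetical synthetic 'gunshot': an exponentially damped sinusoid."""
    return [peak * math.exp(-n / 8.0) * math.sin(2 * math.pi * n / 10.0)
            for n in range(n_samples)]

def place(sig, start, length):
    """Embed sig in a zero signal of the given length, starting at sample start."""
    out = [0.0] * length
    for i, v in enumerate(sig):
        if 0 <= start + i < length:
            out[start + i] = v
    return out

def add_uniform_noise(x, p2p, rng):
    """Mix in uniform white noise with peak-to-peak amplitude p2p."""
    return [v + rng.uniform(-p2p / 2, p2p / 2) for v in x]

def xcorr_delay(a, b, max_lag):
    """Lag (in samples) maximising sum_n a[n] * b[n + lag]; positive if b lags a."""
    def r(lag):
        return sum(a[n] * b[n + lag] for n in range(len(a)) if 0 <= n + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=r)

rng = random.Random(1)
s = impulse(60)
mic1 = add_uniform_noise(place(s, 100, 400), 0.10, rng)
mic2 = add_uniform_noise(place(s, 117, 400), 0.10, rng)   # true delay: 17 samples
print(xcorr_delay(mic1, mic2, 50))
```

Lowering the impulse `peak` towards the conditions of Figures A.3 to A.6 tends to make the correlation maximum fall at a wrong lag, mirroring the erroneous angles observed in the Labview experiment.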
Figure A.3 shows the boundary case, where the peak value of the signal equals that of the noise and the signal starts to become buried in the noise. The peak value of the signal is 0.10 V and the noise is 0.10 V peak-to-peak.
As can be seen in Figure A.4, the calculated angles stay the same as in the previous example, but the correlation graphs start to falter and the maxima are less clearly visible.
Figure A.5 shows the gunshot impulses buried in the noise. The maximum value of the impulses is 0.04 V and that of the noise is 0.10 V.
Because the signal is now buried in the noise, the correlation graphs (Figure A.6) show that the maxima of the correlation calculations disappear and the angle values become erroneous.
4.3 REAL ENVIRONMENT DATA GATHERING
This section describes how the gunshot data was gathered on a game farm close to Mussina in South Africa. A game farm was chosen because the system developed in this project is intended to be deployed in a similar environment.
The same setup as described in Figure 4.1 was used for the recording of the gunshot data.
Gunshots were fired and recorded at different distances from the microphone array. The
distances between the gunfire and the microphone array were measured using a GPS
(Global Positioning System) navigation device from Garmin**.
** http://www.garmin.com [accessed 1 November 2008]
Figure 4.2: Top view of different positions of the gunshots fired relative to the microphone array
Figure 4.2 shows the positions from which the gunshots were fired relative to the microphone array: positions A, B, C and D are 500m, 1000m, 1500m and 1700m away from the microphone array respectively. Gunshots from a Mauser rifle (large calibre) and a Pistol (medium calibre) were mainly used in the recordings.
Figure 4.3: Side view of gunshot positions showing the surface curvature of the game farm
Figure 4.3 shows a side view of the gunshot positions on the game farm in Mussina. Position C, 1500m away from the microphone array, lies in a surface dip, so the direct path of the gunshot sound to the microphone array is obstructed.
4.4 CONCLUSION
Chapter 4 gave an overview of the sound recording equipment setup and of the experimental preparation done in Labview. It then discussed how the data for this project was recorded at different distances, with two types of guns, on a game farm close to Mussina in South Africa.
5 RESULTS
This chapter discusses the results obtained from the various signal detection algorithms. It starts with a brief description of how the data used in this chapter was obtained, after which power spectrum estimates of the Mauser and Pistol gunshots recorded at different distances are shown. The results obtained with the GCC method are then presented and discussed, followed by those of the trainable template matching algorithms. A brief tabulated summary then compares the accuracy of the four algorithms (GCC, LS, RKHS and SVM). Lastly, the execution time of each algorithm is estimated and compared in a table.
5.1 OVERVIEW OF DATA RECORDING AND PRE-PROCESSING PROCEDURES
The data used in this chapter was obtained as follows:
• Gunshots at various distances were recorded with microphones connected to a 3 channel amplifier through a low-pass filter (LPF).
• The amplified signals were then digitised with the NI PCI cards, primarily at sampling rates of 10 kHz and 20 kHz.
• The samples were saved to the hard drive of a PC in PCM format.
• Characteristics of each recorded gunshot event were saved to a spreadsheet file.
• The recordings were imported into Matlab and saved as a 3 dimensional array (1 dimension for each channel) in the Matlab workspace.
• The different recordings were resampled to 5 kHz to increase the execution speed of the detection algorithms. Most of the energy of a gunshot that is detectable at large distances from the microphones resides below 2 kHz (see sections 5.2.1 and 5.2.2).
• All the recordings (test signals) and templates (used for training) were normalised to a maximum value of 1. Hence no explicit SI units are stated on the y-axes of the graphs in sections 5.3 to 5.4; the y-axis values are per-unit values.
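The last two pre-processing steps, resampling to 5 kHz and normalisation, are sketched below in Python. This is an illustrative sketch with hypothetical function names, not the Matlab code used in this project. Two assumptions are made: the "maximum value" is taken as the maximum absolute sample value, and the rate reduction simply keeps every $q$-th sample. The latter is only acceptable because the analogue low-pass filter already limits the bandwidth to about 2.5 kHz (section 5.2.3); a general-purpose resampler such as Matlab's `resample` additionally applies its own anti-aliasing FIR filter.

```python
def normalise(x):
    """Scale so that the largest absolute sample equals 1 (per-unit values).
    Assumption: 'maximum' means maximum absolute value."""
    peak = max(abs(v) for v in x)
    return [v / peak for v in x] if peak > 0 else list(x)

def decimate_to(x, fs_in, fs_out=5000):
    """Keep every q-th sample, q = fs_in / fs_out (must be an integer).
    Only valid if x is already band-limited below fs_out / 2."""
    if fs_in % fs_out:
        raise ValueError("fs_in must be an integer multiple of fs_out")
    q = fs_in // fs_out
    return x[::q]

channel = [0.0, 0.2, -0.5, 0.25, 0.1, -0.05, 0.0, 0.4]   # hypothetical 20 kHz samples
y = normalise(decimate_to(channel, 20000))
print(y)   # -> [0.0, 1.0]
```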
Figure 5.1 shows an example of a 3 channel recording (from the game farm in Mussina)
of a gunshot fired 1000m away from the recording equipment, imported and plotted in
Matlab. This recording was made at a sampling rate of 20 kHz.
Figure 5.1: Example of a 3 channel recording of gunshots fired 1000m away
5.2 FREQUENCY SPECTRUM ANALYSIS
The following two sections show power spectrum estimates of the recorded gunshots, as shaped by the frequency response of the microphones and the low-pass filter used for the recordings.
5.2.1 MAUSER POWER SPECTRUM ESTIMATE
Figures 5.2 to 5.5 illustrate the frequency spectra of the Mauser gunshots. The further from the recording equipment the gun is fired, the less power there is at the higher frequencies. There is an estimated drop of 5 dB per 1000 Hz for every additional 500m between the Mauser and the microphones.
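As a rough worked reading of this rule of thumb (an interpretation that assumes the drop scales linearly with both frequency and distance, which the estimate itself does not establish), the additional attenuation of a Mauser gunshot at frequency $f$ when it is fired a further $\Delta d$ away is

$$\Delta P(f, \Delta d) \approx 5\ \text{dB} \times \frac{f}{1000\ \text{Hz}} \times \frac{\Delta d}{500\ \text{m}},$$

so that, for example, moving from 500m to 1500m ($\Delta d = 1000$ m) would lower the power at 2 kHz by roughly $5 \times 2 \times 2 = 20$ dB relative to the lowest frequencies.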
Figure 5.2: Power spectrum estimate of a Mauser gunshot at 500m away from microphones
Figure 5.3: Power spectrum estimate of a Mauser gunshot at 1000m away from microphones
Figure 5.4: Power spectrum estimate of a Mauser gunshot at 1500m away from microphones
Figure 5.5: Power spectrum estimate of a Mauser gunshot at 1700m away from microphones
The power spectral density (PSD) estimates in Figures 5.2 to 5.5 show an increase in power in the 7 kHz to 8 kHz band.
Figure 5.6 shows a power spectrum estimate of an audio stream without any gunshot
impulse in the signal.
Figure 5.6: PSD of audio stream with no recorded gunshot
Thus the increase in power in the 7 kHz to 8 kHz band does not originate from the gunshot impulse, but may rather be a characteristic of the microphone used.
5.2.2 PISTOL POWER SPECTRUM ESTIMATE
Figure 5.7 shows the frequency spectra of the Pistol gunshots. The further from the recording equipment the gun is fired, the less power there is at the higher frequencies. For the Pistol PSD, the drop is an estimated 8 dB to 10 dB per 1000 Hz for every additional 500m between the Pistol and the microphones.
Figure 5.7: Power spectrum estimate of a Pistol gunshot at 500m and 1000m away from microphones
5.2.3 POWER SPECTRUM ESTIMATE CONCLUSION
From the analysis of the power spectrum estimates it can be concluded that most of the power of the recorded signals resides in the lower part of the spectrum, between 0 kHz and 3 kHz. This result is expected, since the low-pass filter that was used was designed to pass only frequencies up to 2.5 kHz. By the Nyquist criterion, the lowest sampling frequency that can be used without losing information in the recordings is therefore 5 kHz, twice the filter's 2.5 kHz cut-off. This helps to make the computationally complex impulse detection algorithms faster, because they now operate on fewer samples.
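The sampling-rate argument can be written out explicitly. With the low-pass filter limiting the signal bandwidth to $f_{\max} = 2.5$ kHz, the Nyquist criterion gives

$$f_s \geq 2 f_{\max} = 2 \times 2.5\ \text{kHz} = 5\ \text{kHz},$$

which is the theoretical lower bound. Resampling a 20 kHz recording to 5 kHz reduces the number of samples by a factor of $20/5 = 4$; for example, one second of audio shrinks from 20 000 to 5 000 samples per channel.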
5.3 GENERALISED CROSS CORRELATION
The results obtained with the GCC algorithm are discussed in this section.
5.3.1 TEMPLATE GENERATION
Pre-constructed gunshot templates are needed to identify gunshot impulses in the audio stream of the recording microphones. The gunshot data used in the following sections was recorded in the second recording session on the game farm. The templates in Figure 5.8 and Figure 5.9 were constructed from recorded gunshot waveforms fired at the same distance from the microphone array, which were aligned (visually and by cross correlation), overlapped and averaged. The pistol template consists of samples taken from pistol gunshots fired 1000m away from the microphone array. Lastly, the templates were normalised to a maximum value of 1. Figure 5.8 shows the template created for the Pistol gunshot.
Figure 5.8: Template of Pistol gunshot
Figure 5.9 shows the template created for the Mauser gunshot. The template constructed
for the Mauser consists of samples taken from Mauser gunshots fired 1500m away from
the microphone array.
Figure 5.9: Template for Mauser Gunshot
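The cross-correlating overlap-and-average step can be sketched as follows. This Python sketch is an illustration under assumptions, not the procedure actually used: every waveform is aligned to the first one at the lag that maximises their cross correlation (the visual alignment step is not reproduced), the aligned waveforms are averaged sample by sample, and the result is scaled so that its largest absolute value is 1. The function names and the toy pulse are hypothetical.

```python
def best_lag(ref, x, max_lag):
    """Lag that maximises the cross correlation sum_n ref[n] * x[n + lag]."""
    def r(lag):
        return sum(ref[n] * x[n + lag] for n in range(len(ref)) if 0 <= n + lag < len(x))
    return max(range(-max_lag, max_lag + 1), key=r)

def shift(x, lag):
    """Return x with sample x[n + lag] moved to index n (zero-padded at the ends)."""
    return [x[n + lag] if 0 <= n + lag < len(x) else 0.0 for n in range(len(x))]

def build_template(waveforms, max_lag):
    """Align each waveform to the first, average sample by sample, normalise to max |.| = 1."""
    ref = waveforms[0]
    aligned = [shift(w, best_lag(ref, w, max_lag)) for w in waveforms]
    avg = [sum(col) / len(aligned) for col in zip(*aligned)]
    peak = max(abs(v) for v in avg)
    return [v / peak for v in avg]

pulse = [0.0, 0.0, 1.0, -0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]   # hypothetical gunshot shape
recordings = [pulse, shift(pulse, -2), shift(pulse, 1)]         # same shape, different onsets
print(build_template(recordings, max_lag=4))                    # reproduces pulse
```

Because the three toy recordings contain the same shape at different onsets, the template equals the original pulse; with real recordings the averaging also suppresses background noise that is uncorrelated between shots.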
5.3.2 ANALYSIS
This section shows the output of the cross correlation algorithm at various distances. The recorded signals are cross correlated with the templates shown in Figure 5.8 and Figure 5.9. All the sound recordings used for the cross correlation algorithm were normalised to a maximum value of 1 (by dividing by the maximum) before the algorithm was applied.
5.3.2.1 Cross Correlation of Pistol sound 500m away from array
Figure 5.10 (a) shows a recording of two pistol gunshots and speech waveforms. Figure
5.10 (b) shows the output of the cross correlation algorithm. This output was obtained by
cross correlating the recording with the pistol template shown in Figure 5.8, and then
squaring the outcome.
As can be seen from Figure 5.10 (b), two maximum peaks, marked by arrows, are obtained in the output. These peaks are in the same positions as the pistol gunshots, thus positively identifying the gunshots and their positions in the recording.
Figure 5.10: Cross correlation of pistol template with pistol gunshot fired 500m away
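The cross-correlate-and-square operation used throughout this section can be sketched as below. It is an illustrative Python sketch with a made-up template and recording, not the implementation used in this project. The template is slid over the recording without zero padding, so output index $n$ corresponds to the template starting at sample $n$, and squaring turns large negative correlations into positive peaks as well. It is a plain time-domain correlation without any GCC frequency weighting; the names are hypothetical.

```python
def normalise(x):
    peak = max(abs(v) for v in x)
    return [v / peak for v in x]

def xcorr_squared(recording, template):
    """Output (sum_k t[k] * x[n + k])^2 for every full overlap position n."""
    n_out = len(recording) - len(template) + 1
    return [sum(t * recording[n + k] for k, t in enumerate(template)) ** 2
            for n in range(n_out)]

template = normalise([0.0, 1.0, -0.6, 0.3, -0.1])        # hypothetical template
recording = normalise([0.05, -0.02, 0.0, 1.0, -0.6, 0.3, -0.1,
                       0.04, 0.3, 0.3, -0.2, 0.1])       # template embedded at sample 2
out = xcorr_squared(recording, template)
print(max(range(len(out)), key=out.__getitem__))          # -> 2
```

Here the template is embedded in the recording starting at sample 2, and the squared output peaks at that index, which is how the gunshot positions are read off Figures 5.10 to 5.18.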
5.3.2.2 Cross Correlation of Mauser sound 500m away from array
A recording of a Mauser gunshot mixed with speech waveforms is shown in Figure 5.11 (a). Figure 5.11 (b) shows the output of the cross correlation algorithm. This output was
obtained by cross correlating the recording with the Mauser template shown in Figure 5.9,
and then squaring the outcome.
As can be seen in Figure 5.11 (b), a maximum peak is obtained in the output. This peak in
the output is at the same position as the Mauser gunshot, thus positively identifying the
gunshot and its position in the recording.
Figure 5.11: Cross correlation of Mauser template with Mauser gunshot fired 500m away
Figure 5.12 (a) shows a recording of a Mauser gunshot and two pistol gunshots with speech
waveforms. The output of the cross correlation algorithm can be seen in Figure 5.12 (b).
The output was generated by cross correlating the recording with the pistol template
shown in Figure 5.8, and then squaring the outcome.
Figure 5.12 (b) reveals maximum peaks in the output at the same positions as the
gunshots’ waveform positions. This identifies the gunshots and their positions in the
recording. This means that the Pistol template from Figure 5.8 extracts both the Mauser
and pistol waveforms from the recorded signal at a distance of 500m.
Figure 5.12: Cross correlation of Pistol template with Mauser and pistol gunshots fired 500m away

5.3.2.3 Cross Correlation of Pistol sound 1000m away from array
A pistol waveform from a gunshot fired 1000m away from the microphone array and
speech sound waveforms (recorded close to the microphones) are shown in Figure 5.13
(a). The output of the cross correlation algorithm is shown in Figure 5.13 (b). The output
was produced by cross correlating the recording with the pistol template shown in Figure
5.8, and then squaring the outcome.
Figure 5.13 (b) shows that the speech waveform is sufficiently suppressed relative to the
enhanced peak of the gunshot waveform in the output. This peak in the output is in the
same position as the pistol gunshot, thus identifying the gunshot and its position in the
recording.
Figure 5.13: Cross correlation of pistol template with pistol gunshot fired 1000m away
5.3.2.4 Cross Correlation of Mauser sound 1000m away from array
A recording of a Mauser gunshot fired 1000m away from the microphone array can be
seen in Figure 5.14 (a). Recorded speech waveforms are also seen in Figure 5.14 (a).
Figure 5.14 (b) shows the output of the cross correlation algorithm. This output was
obtained by cross correlating the recording with the Mauser template shown in Figure 5.9,
and then squaring the outcome.
The output reveals a maximum peak as shown in Figure 5.14 (b). This peak in the output
is in the same position as the Mauser gunshot, thus positively identifying the gunshot and
its position in the recording.
Figure 5.14: Cross correlation of Mauser template with Mauser gunshot fired 1000m away
A recording of a Mauser gunshot (at 1000m) with speech waveforms can be seen in Figure 5.15 (a). The output shown in Figure 5.15 (b) was obtained by cross correlating the
recording with the pistol template shown in Figure 5.8, and then squaring the outcome.
As can be seen from Figure 5.15 (b), a maximum peak is obtained in the output. This peak
in the output is at the same position as the Mauser gunshot, thus positively identifying the
gunshot and its position in the recording. It shows that the Pistol template from Figure 5.8
extracts both the Mauser and pistol waveforms from the recorded signal at a distance of
1000m.
Figure 5.15: Cross correlation of Pistol template with Mauser and pistol gunshots fired 1000m away
5.3.2.5 Cross Correlation of Mauser sound 1500m away from array
Figure 5.16 (a) shows a waveform recording of a Mauser gunshot fired 1500m away from the microphone array, together with speech and car engine sound waveforms recorded near the microphones. Figure 5.16 (a) shows that the speech and noise are almost indistinguishable from the Mauser gunshot waveform. Figure 5.16 (b) shows the output of the cross correlation algorithm. This output was obtained by cross correlating the recording with the Mauser template shown in Figure 5.9, and then squaring the outcome.
As can be seen in Figure 5.16 (b), a maximum peak is obtained in the output. This peak in the output is in the same position as the Mauser gunshot, thus positively identifying the gunshot and its position in the recording.
Figure 5.16: Cross correlation of Mauser template with Mauser gunshot fired 1500m away
Figure 5.17 (a) shows a recording of a Mauser gunshot fired 1500m away from the
microphone array with speech sounds close to the microphone, while Figure 5.17 (b)
shows the output of the cross correlation algorithm. This output was obtained by cross
correlating the recording with the pistol template shown in Figure 5.8, and then squaring
the outcome.
Figure 5.17 (b) shows that a maximum peak is obtained in the output, but that this peak
does not coincide with the position of the Mauser gunshot’s waveform. It shows that the
pistol template from Figure 5.8 does not extract the Mauser waveform from the recorded
signal at a distance of 1500m.
Thus the pistol template from Figure 5.8 can extract gunshot waveforms of more types of guns, but only at shorter distances.
Figure 5.17: Cross correlation of Pistol template with Mauser gunshot fired 1500m away
5.3.2.6 Cross Correlation of Mauser sound 1700m away from array
Figure 5.18 (a) shows a recording of a Mauser gunshot fired 1700m away from the
microphone array with speech sounds recorded close to the microphones. The output of
the cross correlation algorithm is shown in Figure 5.18 (b). This output was obtained by
cross correlating the recording with the Mauser template shown in Figure 5.9, and then
squaring the outcome.
As can be seen from Figure 5.18 (b), a maximum peak is obtained in the output. This peak
in the output is at the same position as the Mauser gunshot, thus positively identifying the
gunshot and its position in the recording.
Figure 5.18: Cross correlation of Mauser template with Mauser gunshot fired 1700m away
5.4 TRAINABLE TEMPLATE MATCHING ALGORITHMS
The algorithms discussed in this section (Least Squares, RKHS and SVM) need sets of data with which networks can be trained to recognise a pattern in the data stream. Gunshot waveform templates with different numbers of samples were experimented with; the more samples there are in the training set, the longer the various algorithms take to obtain an answer. Figure 5.19 shows the Mauser template waveform used for the LS algorithm.
Figure 5.19: LS Mauser waveform template
5.4.1 TEMPLATE MATCHING WITH LEAST SQUARES
To recognise a Mauser gunshot, a Mauser gunshot template was created by averaging 6 time-aligned gunshot waveforms. The audio template was also normalised to a maximum of 1. The following figures were obtained with the LS-based algorithm using the audio template waveform shown in Figure 5.19. As can be seen in Figure 5.20, the output of the algorithm shows a maximum value where the gunshot impulse was found, for a gunshot fired 500m away from the recording equipment.
Figure 5.20: Output of Least Squares algorithm of Mauser gunshot at 500m away
The LS algorithm was also tested on a Mauser gunshot fired 1000m away from the recording microphones. Figure 5.21 shows that this gunshot was also detected in the output.
Figure 5.21: Output of Least Squares algorithm of Mauser gunshot at 1000m away
Figure 5.22 shows the output of the LS algorithm for the gunshot recording where the gun was fired 1500m away from the recording microphones. Figure 5.22 also shows that the gunshot is not detected and that the recorded speech waveforms produce higher peaks in the output than the gunshot.
Figure 5.22: Output of Least Squares algorithm of Mauser gunshot at 1500m away
The LS algorithm gave reasonably good results only with the Mauser gunshot data; the results obtained when it was applied to the pistol gunshot data were not conclusive.
5.4.2 RKHS USING A SECOND ORDER POLYNOMIAL KERNEL
The outputs of the RKHS algorithm using a second order polynomial kernel are described
in the following section. Higher order polynomial kernels took extremely long to process
and the linear kernel did not produce any conclusive results.
5.4.2.1 The Mauser Template used for RKHS
Figure 5.23 shows the Mauser template created as the training set for this algorithm. Of the training sets tried, this one gave the best Mauser impulse detection results at the different distances tested (i.e. 500m, 1000m, 1500m and 1700m).
The template in Figure 5.23 was created by averaging Mauser waveforms from gunshots fired 1500m away from the microphone array. Every 4th sample of the averaged waveform was then retained and the values in between were interpolated, so that the original number of samples in the waveform was kept. This interpolation smooths the template waveform, and more accurate results were obtained with the RKHS algorithm using the smoothed waveform. The idea of interpolating the templates to smooth the waveforms stemmed from work done by Viola and Walker (2005). Lastly, the audio template was normalised to a maximum of 1.
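The subsample-and-interpolate smoothing can be sketched as follows. The interpolation method is not stated above, so this Python sketch assumes linear interpolation between every 4th retained sample. It also retains the final sample so that the output keeps the original number of samples without extrapolating. It is illustrative only, and the function name is hypothetical; normalisation to a maximum of 1 would follow as a separate step.

```python
import bisect

def smooth_by_subsampling(x, step=4):
    """Keep every `step`-th sample (plus the last one) and linearly interpolate
    between them, returning a waveform with the original number of samples.
    Assumption: linear interpolation."""
    n = len(x)
    if n < 2:
        return list(x)
    knots = list(range(0, n, step))
    if knots[-1] != n - 1:
        knots.append(n - 1)               # keep the last sample: no extrapolation needed
    out = []
    for i in range(n):
        # index of the retained sample at or before i (clamped to the last interval)
        k = min(bisect.bisect_right(knots, i) - 1, len(knots) - 2)
        a, b = knots[k], knots[k + 1]
        t = (i - a) / (b - a)
        out.append((1 - t) * x[a] + t * x[b])
    return out

template = [3.0, -1.0, 2.0, 5.0, 0.5, 7.0]   # hypothetical averaged waveform
print(smooth_by_subsampling(template))
```

Retained samples pass through unchanged, while fluctuations between them (such as the −1.0 and 5.0 above) are replaced by straight-line segments, which is the smoothing effect described in the text.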
Figure 5.23: Mauser template used in RKHS training sequence, smoothed by interpolation
5.4.2.2 The Pistol Template used for RKHS
Figure 5.24 shows the pistol template created as the pistol training set for this algorithm. Of the training sets tried, this one gave the best pistol impulse detection results at the different distances tested (i.e. 500m and 1000m).
The template in Figure 5.24 was created by averaging pistol waveforms from gunshots
fired 1000m away from the microphone array. The audio template of the pistol was also
normalised to a maximum of 1.
Figure 5.24: Pistol template used in RKHS training sequence, smoothed by interpolation
5.4.2.3 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 500m
Figure 5.25 (a) shows a waveform from a Mauser gunshot fired 500m away from the
microphone array with speech sound waveforms. Figure 5.25 (b) shows the output of the
RKHS algorithm using a 2nd order polynomial kernel. This output was obtained by
creating a training set from the Mauser template shown in Figure 5.23, then feeding the
signal shown in Figure 5.25 (a) into the RKHS network.
Figure 5.25 (b) shows a maximum peak and a large undershoot compared to the rest of the
output. This peak in the output is in the same position as the Mauser gunshot, thus
identifying the gunshot and its position in the recording.
Figure 5.25: Mauser gunshot at 500m and output using a 2nd order polynomial RKHS kernel
5.4.2.4 Pistol impulse detection using a 2nd order polynomial RKHS kernel at 500m
Two pistol waveforms from gunshots fired 500m away from the microphone array can be
seen in Figure 5.26 (a). The recorded waveform in Figure 5.26 (a) also contains speech
sound waveforms recorded close to the microphone array. Figure 5.26 (b) shows the
output of the RKHS algorithm using a 2nd order polynomial kernel. This output was
obtained by creating a training set from the pistol template shown in Figure 5.24 and then
feeding the signal shown in Figure 5.26 (a) into the RKHS network.
Figure 5.26 (b) shows that peaks caused by the speech waveform are present in the output alongside the peaks of the pistol waveforms. With a detection threshold of 105, 4 gunshots would be detected, although only 2 shots were fired. In this case (pistol at 500m), the RKHS algorithm does not suppress the speech waveform peaks sufficiently in the output.
Figure 5.26: Pistol gunshots at 500m and output using a 2nd order polynomial RKHS kernel
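The threshold argument above can be made concrete with a small detection helper. In this hypothetical Python sketch, every output sample above the threshold counts as a detection, and samples above the threshold that lie within `min_gap` samples of the previous one are merged into the same event. The text does not specify such a grouping rule, so `min_gap` and the example output values are assumptions.

```python
def detect_events(output, threshold, min_gap):
    """Start indices of detected events.

    A sample above `threshold` starts a new event only if no other sample above
    the threshold occurred within the previous `min_gap` samples."""
    events = []
    last_above = None
    for n, v in enumerate(output):
        if v > threshold:
            if last_above is None or n - last_above > min_gap:
                events.append(n)
            last_above = n
    return events

out = [0, 10, 120, 130, 20, 0, 110, 0, 0, 0, 200, 0]   # hypothetical RKHS output values
print(detect_events(out, 105, 2))   # -> [2, 6, 10]
print(detect_events(out, 125, 2))   # -> [3, 10]
```

With this example output, a threshold of 105 yields three events, whereas 125 leaves only the two largest peaks. Raising the threshold trades missed detections for fewer false alarms, which is the trade-off seen in the 500m and 1000m pistol cases.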
5.4.2.5 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 1000m
Figure 5.27 (a) shows a waveform from a Mauser gunshot fired 1000m away from the
microphone array with speech sound waveforms recorded close to the microphone array.
Figure 5.27 (b) shows the output of the RKHS algorithm using a 2nd order polynomial
kernel. This output was obtained by creating a training set from the Mauser template
shown in Figure 5.23, then feeding the signal shown in Figure 5.27 (a) into the RKHS
network.
Figure 5.27 (b) shows a maximum peak and a large undershoot compared to the rest of the
output. This peak in the output is in the same position as the Mauser gunshot, thus
identifying the gunshot and its position in the recording.
Figure 5.27: Mauser gunshot at 1000m and output using a 2nd order polynomial RKHS kernel ((a) recording, Mauser gunshot marked; (b) RKHS output)
5.4.2.6 Pistol impulse detection using a 2nd order polynomial RKHS kernel at 1000m
Figure 5.28 (a) shows a pistol waveform from a gunshot fired 1000m away from the
microphone array. Figure 5.28 (a) also contains speech sound waveforms. Figure 5.28 (b)
shows the output of the RKHS algorithm using a 2nd order polynomial kernel. This output
was obtained by creating a training set from the pistol template shown in Figure 5.24 and then feeding the signal shown in Figure 5.28 (a) into the RKHS network.
Figure 5.28 (b) shows that the maximum peaks from the speech waveforms are also present in the output, together with the peak of the pistol waveform. With a detection threshold of 125, however, only the pistol's impulse is detected. In this case (pistol at 1000m), the output of the RKHS algorithm therefore does identify the pistol impulse and its position in the recording.
Figure 5.28: Pistol gunshot at 1000m and output using a 2nd order polynomial RKHS kernel ((a) recording, pistol gunshot marked; (b) RKHS output)
5.4.2.7 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 1500m
In Figure 5.29 (a), the waveform from a Mauser gunshot fired 1500m away from the microphone array can be seen, together with speech sound waveforms. Figure 5.29 (b) shows the output of the RKHS algorithm using a 2nd order polynomial kernel. This output was obtained by creating a training set from the Mauser template shown in Figure 5.23, then feeding the signal shown in Figure 5.29 (a) into the RKHS network.
Figure 5.29 (b) shows a large undershoot compared to the rest of the output. This large negative peak in the output is at the same position as the Mauser gunshot, thus positively identifying the gunshot and its position in the recording. In this case, it is evident that the speech and environmental noise are suppressed in the output by the RKHS algorithm.
Figure 5.29: Mauser gunshot at 1500m and output using a 2nd order polynomial RKHS kernel ((a) recording, Mauser gunshot marked; (b) RKHS output)
5.4.2.8 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 1700m
A waveform from a Mauser gunshot fired 1700m away from the microphone array, together with speech sound waveforms, is shown in Figure 5.30 (a). The output of the RKHS algorithm using a 2nd order polynomial kernel is shown in Figure 5.30 (b). This output was obtained by creating a training set from the Mauser template shown in Figure 5.23, then feeding the signal shown in Figure 5.30 (a) into the RKHS network.
Figure 5.30 (b) shows a large undershoot compared to the rest of the output. This large negative peak in the output is at the same position as the Mauser gunshot, thus positively identifying the gunshot and its position in the recording. Relative to the detected Mauser impulse, the speech and environmental noise are suppressed in the output of the RKHS algorithm (compared to the input signal).
Figure 5.30: Mauser gunshot at 1700m and output using a 2nd order polynomial RKHS kernel ((a) recording; (b) RKHS output)
5.4.3 SUPPORT VECTOR MACHINES
The outputs of the Least Squares Support Vector Machine (LS-SVM) algorithm developed at the Katholieke Universiteit Leuven†† in Belgium are described in the following sections. For simplicity, the term SVM is used in the following sections to refer to this least squares implementation. Results from the second and third order polynomial kernels are shown, because they gave much better results than the linear kernel. A regularisation constant gamma (γ) of 1 also gave the best results.
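To indicate what training an LS-SVM involves, the sketch below solves the LS-SVM classifier dual system of Suykens and Vandewalle (1999) with a polynomial kernel K(x, z) = (x·z + t)^d and γ = 1. It is a minimal NumPy illustration, not the LS-SVMlab code used for the results in this chapter; the kernel offset t = 1, the ±1 label convention and the function names are assumptions.

```python
import numpy as np

def poly_kernel(X, Z, degree=2, t=1.0):
    """Polynomial kernel K(x, z) = (x . z + t)^degree for row-wise samples.

    The offset t = 1 is an assumed default.
    """
    return (X @ Z.T + t) ** degree

def lssvm_train(X, y, gamma=1.0, degree=2, t=1.0):
    """Solve the LS-SVM classifier dual system (Suykens & Vandewalle, 1999):

        [ 0   y^T               ] [ b     ]   [ 0 ]
        [ y   Omega + I / gamma ] [ alpha ] = [ 1 ]

    with Omega_kl = y_k * y_l * K(x_k, x_l) and labels y in {-1, +1}.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    omega = np.outer(y, y) * poly_kernel(X, X, degree, t)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, support values alpha

def lssvm_predict(X_train, y, b, alpha, X_test, degree=2, t=1.0):
    """Classify rows of X_test as sign(sum_k alpha_k y_k K(x, x_k) + b)."""
    y = np.asarray(y, dtype=float)
    scores = poly_kernel(X_test, X_train, degree, t) @ (alpha * y) + b
    return np.where(scores >= 0, 1, -1)
```

For example, with the one-dimensional points 1 and 2 labelled +1, -1 and -2 labelled -1, d = 2 and γ = 1, the system gives b = 0 and α = (3/7, -1/7, 3/7, -1/7), which can be checked by hand.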
5.4.3.1 The Mauser Template used for the SVM network
Figure 5.31 shows the Mauser template created for use as the training set for the SVM algorithm. This template gave the best Mauser impulse detection results with the SVM network at the different recorded distances (i.e. 500m, 1000m, 1500m and 1700m).
Figure 5.31: Mauser template used in SVM network training sequence, smoothed by interpolation
†† LS-SVMlab available at http://www.esat.kuleuven.be/sista/lssvmlab/ (Accessed 3 October 2012)
The template in Figure 5.31 was created by averaging Mauser waveforms from gunshots fired 1500m away from the microphone array. The template was then normalised to a maximum amplitude of 1.
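The template construction described above (averaging, then normalisation) can be sketched as follows. The sketch assumes the individual gunshot waveforms have already been cut to equal length and time-aligned, and it interprets "normalised to a maximum of 1" as scaling by the largest absolute amplitude; both are assumptions, and `build_template` is an illustrative name.

```python
import numpy as np

def build_template(waveforms):
    """Average equal-length, pre-aligned waveforms and scale the peak to 1.

    Assumption: each row of `waveforms` is one gunshot, already cut and
    time-aligned; no alignment is performed here.
    """
    W = np.asarray(waveforms, dtype=float)
    template = W.mean(axis=0)
    peak = np.max(np.abs(template))  # assumed meaning of "maximum of 1"
    if peak == 0:
        raise ValueError("template is all zeros and cannot be normalised")
    return template / peak
```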
5.4.3.2 The Pistol Template used for the SVM network
Figure 5.32 shows the pistol template created for use as the pistol training set for the SVM algorithm. This training set gave the best pistol impulse detection results at the different tested distances (i.e. 500m and 1000m). The template in Figure 5.32 was created by averaging pistol waveforms from gunshots fired 1000m away from the microphone array, and was then normalised to a maximum amplitude of 1.
Figure 5.32: Pistol template used in SVM network training sequence
5.4.3.3 Impulse detection using a 2nd order polynomial SVM kernel and a Pistol Training set for recordings 500m away
In Figure 5.33 (a), two pistol waveforms and a Mauser waveform from gunshots fired 500m away from the microphone array can be seen. Figure 5.33 (a) also contains speech sound waveforms. Figure 5.33 (b) shows the output of the SVM algorithm using a 2nd order polynomial kernel. This output was obtained by creating a training set from the pistol template shown in Figure 5.32 and then feeding the signal shown in Figure 5.33 (a) into the SVM network.
The output in Figure 5.33 (b) reveals three peaks at the maximum value of 1, indicating gunshot detections at those positions in the recording. These peaks are at the same positions as the Mauser and pistol gunshot waveforms in the test signal (Figure 5.33 (a)). This indicates that this pistol training set (template) detects both the Mauser and pistol impulses with an SVM network using a 2nd order polynomial kernel.
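How the test signal is "fed into the SVM network" is not detailed in this section. One plausible reading, assumed here, is that the signal is cut into overlapping windows of the template length and each window is scored by the trained network, giving one output value per window position. The hypothetical helpers below only do the framing and scoring; `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def frame_signal(signal, window_len, hop=1):
    """Cut a 1-D signal into overlapping windows (assumed framing scheme).

    Row i of the result holds samples [i*hop, i*hop + window_len).
    """
    x = np.asarray(signal, dtype=float)
    return sliding_window_view(x, window_len)[::hop]

def sliding_scores(signal, window_len, score_fn, hop=1):
    """Apply a scorer to every window of the signal.

    `score_fn` is any callable mapping an (n_windows, window_len) array to
    n_windows scores; the result is aligned with each window's start sample.
    """
    frames = frame_signal(signal, window_len, hop)
    return np.asarray(score_fn(frames), dtype=float)
```

For instance, `score_fn` could wrap the decision function of a classifier trained on template-length windows, so that a peak in the returned scores marks the window position of a detected impulse.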
Figure 5.33: Pistol and Mauser gunshots at 500m and output of 2nd order polynomial SVM kernel using a pistol template ((a) recording, two pistol gunshots and a Mauser gunshot marked; (b) SVM output)
5.4.3.4 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 500m away
Figure 5.34 (a) shows two pistol waveforms and a Mauser waveform from gunshots fired 500m away from the microphone array. Figure 5.34 (a) also contains speech sound waveforms. Figure 5.34 (b) shows the output of the SVM algorithm using a 3rd order polynomial kernel. This output was obtained by creating a training set from the Mauser template shown in Figure 5.31 and then feeding the signal shown in Figure 5.34 (a) into the SVM network.
The output in Figure 5.34 (b) reveals a single maximum peak with an amplitude of 1, indicating a gunshot detection at that position in the recording. This peak is at the same position as the Mauser waveform in the test signal shown in Figure 5.34 (a). This indicates that this Mauser training set (template) detects only the Mauser impulse with an SVM network using a 3rd order polynomial kernel.
Figure 5.34: Pistol and Mauser gunshots at 500m and output of 3rd order polynomial SVM kernel using a Mauser template ((a) recording, two pistol gunshots and a Mauser gunshot marked; (b) SVM output)
5.4.3.5 Impulse detection using a 2nd order polynomial SVM kernel and a Pistol Training set for recordings 1000m away
Figure 5.35 (a) shows a pistol waveform and a Mauser waveform from gunshots fired 1000m away from the microphone array. The recording also contains speech sound waveforms. Figure 5.35 (b) shows the output of the SVM algorithm using a 2nd order polynomial kernel. This output was obtained by creating a training set from the pistol template shown in Figure 5.32 and then feeding the signal shown in Figure 5.35 (a) into the SVM network.
The output in Figure 5.35 (b) reveals two maximum peaks with an amplitude of 1, indicating gunshot detections at those positions in the recording. These peaks are at the same positions as the Mauser and pistol gunshot waveforms in the test signal (Figure 5.35 (a)). This indicates that this pistol training set (template) detects both the Mauser and pistol impulses (from gunfire 1000m away) with an SVM network using a 2nd order polynomial kernel.
Figure 5.35: Pistol and Mauser gunshots at 1000m and output of 2nd order polynomial SVM kernel using a pistol template ((a) recording, pistol and Mauser gunshots marked; (b) SVM output)
5.4.3.6 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 1000m away
Figure 5.36 (a) shows a pistol waveform and a Mauser waveform from gunshots fired 1000m away from the microphone array. Figure 5.36 (a) also contains speech sound and environmental noise waveforms that are local to the microphone array. Figure 5.36 (b) shows the output of the SVM algorithm using a 3rd order polynomial kernel. This output was obtained by creating a training set from the Mauser template shown in Figure 5.31 and then feeding the signal shown in Figure 5.36 (a) into the SVM network.
The output in Figure 5.36 (b) reveals a maximum peak with an amplitude of 1, indicating a gunshot detection at that position in the recording. This peak is at the same position as the Mauser waveform in the test signal (Figure 5.36 (a)). This indicates that this Mauser training set (template) detects only the Mauser impulse with an SVM network using a 3rd order polynomial kernel.
Figure 5.36: Pistol and Mauser gunshots at 1000m and output of 3rd order polynomial SVM kernel using a Mauser template ((a) recording, Mauser and pistol gunshots marked; (b) SVM output)
5.4.3.7 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 1500m away
Figure 5.37 (a) shows a waveform from a Mauser gunshot fired 1500m away from the microphone array. Figure 5.37 (a) also contains speech sound waveforms that are local to the microphone array, and shows that the speech and noise are almost indistinguishable from the Mauser gunshot waveform. Figure 5.37 (b) shows the output of the SVM network using a 3rd order polynomial kernel. This output was obtained by creating a training set from the Mauser template shown in Figure 5.31 and then feeding the signal shown in Figure 5.37 (a) into the SVM network.
The output (Figure 5.37 (b)) reveals a maximum peak at 1, indicating a gunshot detection at that position in the recording. This peak is at the same position as the Mauser waveform in the test signal (Figure 5.37 (a)). This indicates that this Mauser training set (template) detects the Mauser impulse with an SVM network using a 3rd order polynomial kernel at a distance of 1500m from the microphone array.
Figure 5.37: Mauser gunshot at 1500m and output of 3rd order polynomial SVM kernel using a Mauser template ((a) recording, Mauser gunshot marked; (b) SVM output)
5.4.3.8 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 1700m away
Figure 5.38 (a) shows a Mauser waveform from a gunshot fired 1700m away from the microphone array. Figure 5.38 (a) also contains speech sound and environmental noise waveforms that are local to the microphone array. Figure 5.38 (b) shows the output of the SVM algorithm using a 3rd order polynomial kernel. This output was obtained by creating a training set from the Mauser template shown in Figure 5.31 and then feeding the signal shown in Figure 5.38 (a) into the SVM network.
The output reveals a maximum peak at 1, indicating a gunshot detection at that position in the recording. This peak is at the same position as the Mauser waveform in the test signal (Figure 5.38 (a)). This indicates that this Mauser training set (template) detects the Mauser impulse with an SVM network using a 3rd order polynomial kernel at a distance of 1700m from the microphone array.
Figure 5.38: Mauser gunshot at 1700m and output of 3rd order polynomial SVM kernel using a Mauser template ((a) recording; (b) SVM output)
5.5 DETECTION ALGORITHM ACCURACY COMPARISON
This section compares the accuracy of the GCC, LS, RKHS and SVM algorithms used for impulse detection. In Tables 5.1 to 5.6, a value of 1 indicates a positive gunshot detection, while a value of 0 indicates a non-detection.
Table 5.1 shows that the GCC and SVM algorithms are the most accurate using pistol templates for shots fired 500m away from the recording device. At this distance, using pistol templates, the GCC and SVM algorithms detect both the pistol and the Mauser as gunshot sounds. Although the LS and RKHS algorithms detect the pistol gunshot, they do not detect the Mauser, and they make false detections by indicating the speech and car door sounds as gunshot sounds.
Table 5.1: Comparison of detection algorithms using Pistol templates for shots fired 500m away
Pistol Templates 500m
Sound       GCC   LS    RKHS  SVM
Speech      0     1     1     0
Car door    0     1     1     0
Car engine  0     0     0     0
Mauser      1     0     0     1
Pistol      1     1     1     1
In Table 5.2, the GCC, RKHS, LS and SVM algorithms all make a positive detection of the Mauser gunshot sound using Mauser templates, for shots fired 500m away. The LS algorithm also makes a false detection, indicating the speech sound as a gunshot.
Table 5.2: Comparison of detection algorithms using Mauser templates for shots fired 500m away
Mauser Templates 500m
Sound       GCC   LS    RKHS  SVM
Speech      0     1     0     0
Car door    0     0     0     0
Car engine  0     0     0     0
Mauser      1     1     1     1
Pistol      0     0     0     0
In Table 5.3, using pistol templates, all the listed algorithms detect the pistol and the Mauser sounds as gunshots, except the RKHS algorithm, which only detects the pistol shot. The LS algorithm also makes a false detection on the speech sound. The gunshots were fired 1000m away from the microphones.
Table 5.3: Comparison of detection algorithms using Pistol templates for shots fired 1000m away
Pistol Templates 1000m
Sound       GCC   LS    RKHS  SVM
Speech      0     1     0     0
Car door    0     0     0     0
Car engine  0     0     0     0
Mauser      1     1     0     1
Pistol      1     1     1     1
As the comparison of the four algorithms in Table 5.4 shows, all of the algorithms using Mauser templates detect the Mauser gunshot. The LS algorithm also makes a false gunshot detection on the speech sound. These results are for gunshots fired 1000m away from the microphones.
Table 5.4: Comparison of detection algorithms using Mauser templates for shots fired 1000m away
Mauser Templates 1000m
Sound       GCC   LS    RKHS  SVM
Speech      0     1     0     0
Car door    0     0     0     0
Car engine  0     0     0     0
Mauser      1     1     1     1
Pistol      0     0     0     0
As shown in Table 5.5, the LS algorithm makes no gunshot detections. The GCC, RKHS and SVM algorithms all make positive detections of the Mauser gunshot sound using Mauser templates on shots fired 1500m away from the microphones.
Table 5.5: Comparison of detection algorithms using Mauser templates for shots fired 1500m away
Mauser Templates 1500m
Sound       GCC   LS    RKHS  SVM
Speech      0     0     0     0
Car door    0     0     0     0
Car engine  0     0     0     0
Mauser      1     0     1     1
Pistol      0     0     0     0
Table 5.6 shows the comparison of the four algorithms on Mauser gunshots fired 1700m away from the microphones. All of the algorithms except LS make a positive gunshot detection using Mauser templates.
Table 5.6: Comparison of detection algorithms using Mauser templates for shots fired 1700m away
Mauser Templates 1700m
Sound       GCC   LS    RKHS  SVM
Speech      0     0     0     0
Car door    0     0     0     0
Car engine  0     0     0     0
Mauser      1     0     1     1
Pistol      0     0     0     0
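The binary entries of Tables 5.1 to 5.6 can also be tallied mechanically. The sketch below transcribes the tables and counts, per algorithm, the detections on non-gunshot sounds (speech, car door and car engine) and the Mauser detections obtained with Mauser templates. The data layout and function names are illustrative choices, not part of the original analysis.

```python
# Row order used for every algorithm's column in Tables 5.1 to 5.6.
SOUNDS = ["Speech", "Car door", "Car engine", "Mauser", "Pistol"]
NON_GUNSHOT = {"Speech", "Car door", "Car engine"}

# Transcribed from Tables 5.1 to 5.6 (1 = detected, 0 = not detected).
TABLES = {
    "Table 5.1 (Pistol templates, 500m)": {
        "GCC": [0, 0, 0, 1, 1], "LS": [1, 1, 0, 0, 1],
        "RKHS": [1, 1, 0, 0, 1], "SVM": [0, 0, 0, 1, 1]},
    "Table 5.2 (Mauser templates, 500m)": {
        "GCC": [0, 0, 0, 1, 0], "LS": [1, 0, 0, 1, 0],
        "RKHS": [0, 0, 0, 1, 0], "SVM": [0, 0, 0, 1, 0]},
    "Table 5.3 (Pistol templates, 1000m)": {
        "GCC": [0, 0, 0, 1, 1], "LS": [1, 0, 0, 1, 1],
        "RKHS": [0, 0, 0, 0, 1], "SVM": [0, 0, 0, 1, 1]},
    "Table 5.4 (Mauser templates, 1000m)": {
        "GCC": [0, 0, 0, 1, 0], "LS": [1, 0, 0, 1, 0],
        "RKHS": [0, 0, 0, 1, 0], "SVM": [0, 0, 0, 1, 0]},
    "Table 5.5 (Mauser templates, 1500m)": {
        "GCC": [0, 0, 0, 1, 0], "LS": [0, 0, 0, 0, 0],
        "RKHS": [0, 0, 0, 1, 0], "SVM": [0, 0, 0, 1, 0]},
    "Table 5.6 (Mauser templates, 1700m)": {
        "GCC": [0, 0, 0, 1, 0], "LS": [0, 0, 0, 0, 0],
        "RKHS": [0, 0, 0, 1, 0], "SVM": [0, 0, 0, 1, 0]},
}

def false_detections(tables):
    """Sum, per algorithm, the detections on non-gunshot sounds over all tables."""
    totals = {}
    for table in tables.values():
        for alg, row in table.items():
            hits = sum(v for s, v in zip(SOUNDS, row) if s in NON_GUNSHOT)
            totals[alg] = totals.get(alg, 0) + hits
    return totals

def mauser_template_hits(tables):
    """Count, per algorithm, Mauser detections in the Mauser-template tables."""
    mauser = SOUNDS.index("Mauser")
    totals = {}
    for name, table in tables.items():
        if "Mauser templates" in name:
            for alg, row in table.items():
                totals[alg] = totals.get(alg, 0) + row[mauser]
    return totals
```

The tally gives no false detections for GCC and SVM, five for LS and two for RKHS, and shows that LS misses the Mauser at 1500m and 1700m, in line with the discussion above.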
5.6 EXECUTION TIME COMPARISON BETWEEN THE DIFFERENT IMPULSE DETECTION ALGORITHMS
The following execution times were obtained in the Matlab 2007b environment on an Intel® Core 2 Duo™ T7500‡‡ 2.2 GHz processor with 1 GB of RAM (Random Access Memory).
The templates or training sets were on average 100 samples long, and each test signal was 5000 samples long. The training and test signals were sampled at 5 kHz, so each test signal spans one second of audio. The algorithms were trained before the prediction algorithms were applied, and the training time is not included in the execution time. Only the time taken to make a prediction on each 5000-sample test signal is used in the calculation.
‡‡ http://www.intel.com [Accessed 20 September 2011]
The Matlab function “cputime” was used to measure the time a specific algorithm takes to process 5000 samples. The Least Squares algorithm executed the fastest, at 1.562 ms per 5000 samples, followed by the Generalised Cross Correlation algorithm at an average of 3.120 ms per 5000 samples. The Support Vector Machine algorithm with 2nd and 3rd order polynomial kernels took 1078 ms to process 5000 samples. The RKHS algorithm with a 2nd order polynomial kernel took the longest to execute, averaging 4187 ms to process 5000 samples.
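The per-sample figures in Table 5.7 follow from dividing the time per 5000-sample block by 5000 (for example, 1078 ms / 5000 = 215.6 µs, rounded to 216 µs). The sketch below shows that conversion together with a timing helper based on Python's `time.process_time`, used here only as a rough analogue of Matlab's `cputime`; the helper names and the repeat count are assumptions.

```python
import time

def cpu_ms_per_call(fn, *args, repeats=10):
    """Average CPU time of fn(*args) in milliseconds.

    time.process_time() measures CPU time of the current process, similar
    in spirit to Matlab's cputime; repeating the call (an assumed count of
    10) smooths out the timer's coarse resolution.
    """
    start = time.process_time()
    for _ in range(repeats):
        fn(*args)
    return (time.process_time() - start) * 1000.0 / repeats

def us_per_sample(ms_per_block, block_len=5000):
    """Convert milliseconds per block of `block_len` samples to µs per sample."""
    return ms_per_block * 1000.0 / block_len
```

For example, `us_per_sample(4187)` gives 837.4 µs per sample for the RKHS algorithm.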
Figure 5.39 shows the execution times, in milliseconds per 5000 samples, of the different impulse detection algorithms.
Figure 5.39: Comparison of execution time, milliseconds per 5000 samples, between the different impulse detection algorithms (bar chart with a logarithmic time axis: LS 1.562 ms, GCC 3.120 ms, SVM 1078 ms, RKHS 4187 ms)
Table 5.7 shows the execution time of the different detection algorithms in µs/sample. The algorithms are listed in ascending order of execution time, from LS, the fastest, to RKHS, which takes the longest to execute.
Table 5.7: Execution time of the detection algorithms in µs/sample
Algorithm   µs/sample
LS          0.313
GCC         0.624
SVM         216
RKHS        837
5.7 CONCLUSION
This chapter presented and discussed the graphed Matlab results obtained from the various detection algorithms (GCC, LS, RKHS and SVM). It also gave tabulated summaries comparing the performance of the algorithms with one another.
6 CONCLUSION
This chapter discusses the results obtained in Chapter 5 and states the conclusions drawn from them. Recommendations are also made on future research and on a habitat protection strategy.
6.1 THE ACCURACY OF THE ALGORITHMS
The results in Chapter 5 show that the GCC and SVM impulse detection algorithms are the most accurate over all the tested distances (500m, 1000m, 1500m and 1700m). At the shorter ranges (500m and 1000m), the RKHS algorithm sometimes generates false detections on nearby sounds such as speech, but it becomes more accurate at the longer ranges. This might be attributed to the larger gunshot impulse waveform at shorter distances, which correlates too strongly with the waveform of the speech recorded close to the microphones. The LS algorithm performs the worst of all four. It does detect gunshots at short range (500m and 1000m), but it also creates false detections on speech waveforms. At longer ranges (1500m and 1700m), the LS detection algorithm fails completely.
6.2 COMPLEXITY MEASURED IN ALGORITHM PROCESSING TIME
The LS detection algorithm executes the fastest, at 313 ns per sample, followed by the GCC detection algorithm at 624 ns per sample. The RKHS detection algorithm takes the longest of all four algorithms, at 837 µs per sample, while the SVM algorithm, at 216 µs per sample, executes almost four times faster than the RKHS algorithm. The execution time of all the algorithms could be greatly reduced if they were implemented in a programming language such as C instead of Matlab.
6.3 OVERALL PERFORMANCE OF THE IMPULSE DETECTION ALGORITHMS
Overall, the GCC detection algorithm performed the best, considering both its short execution time (lower complexity) and its accuracy. The accuracy of the SVM detection algorithm matched that of the GCC over the 500m to 1700m range, but at a higher complexity (longer processing time). For gunshots fired at distances greater than 1700m, the SVM detection algorithm might be more accurate than the GCC if it could be implemented with more sound templates on equipment with higher processing power.
6.4 FUTURE RESEARCH
Future research might investigate the accuracy of the GCC and SVM impulse detection algorithms over larger distances, up to a range of 7km or more. A wider range of gun calibres might also be included in the study, as well as more types of sounds, for instance the mechanical action sounds of firearms and the sounds of helicopters and other vehicles, which might indicate a threat to the preservation of wildlife. Another area of further research could be to train multiple classifiers, each on the specific gunshots it detects best, and then to combine the outputs of the different detection algorithms. Different strategies for implementing gunshot detection with the aim of protecting large habitats might also be researched. Optical threat detection systems could also be incorporated into the conservation strategy.
6.4.1 ANTI-POACHING APPLICATION AND STRATEGY
By incorporating the research from Smith, Buscemi, and Xu (2010) and Wang (2009) and
also building on the conclusions reached in sections 6.1 to 6.3, further research might be
warranted for different gunshot detection and localisation strategies that can form part of a
Habitat Management System (HMS). This HMS should be able to protect and monitor
nature reserves such as the Kruger National Park, or high-risk poaching areas in a reserve.
6.4.1.1 Low-end gunshot detection modules
The GCC algorithm can be implemented on low-end gunshot detection modules whose operation might rely solely on solar energy or batteries. Because the GCC algorithm is less complex, it might be implemented on a low-power device. A detection node could be integrated into the communications radios of park rangers or patrol personnel, as suggested by Smith, Buscemi, and Xu (2010). The Lightning Protocol, as suggested by Wang (2009), could also be incorporated into the strategy to wake up high-end detection modules in hibernation. Apart from waking up high-end modules, the low-end modules would share muzzle blast and/or shock wave time-of-arrival information with the high-end nodes. The low-end modules can either be at fixed positions or be mobile units (communications radios) with an integrated GPS.
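Sharing time-of-arrival information requires estimating the delay between microphone channels. A common way to do this with generalised cross-correlation is the PHAT (phase transform) weighting sketched below; the weighting, the function name and its parameters are assumptions for illustration and may differ from the GCC variant evaluated in Chapter 5.

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs, max_delay=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` with GCC-PHAT.

    A positive result means `sig` arrives later than `ref`. `max_delay`
    (seconds) optionally limits the lag search, e.g. to the array aperture.
    """
    n = len(sig) + len(ref)  # zero-pad so the correlation is linear, not circular
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase information only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_delay is not None:
        max_shift = min(int(max_delay * fs), max_shift)
    # Reorder so that index i corresponds to lag (i - max_shift) samples.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

A module sampling at 5 kHz, as in Section 5.6, resolves delays only to 0.2 ms (one sample) unless the correlation peak is interpolated.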
6.4.1.2 High-end gunshot detection modules
Implementation of the SVM algorithm can occur on high-end detection and localisation modules. Because the SVM algorithm is more complex, and possibly more accurate at larger distances, it might be implemented on modules capable of a higher computational load. As these modules require more power to operate, they can be integrated into the patrol vehicles and placed at the rest camps of the reserve. The existing structures of cell phone base stations can also be utilised to house the high-end detection nodes. In remote areas where batteries or solar energy are the only power option, these modules can be put into hibernation until woken up by a low-end module, as suggested by Wang (2009).
Other functions of the high-end modules could include gunshot localisation, as well as additional measurements or sensors that provide terrain information for increased accuracy. The terrain information can be incorporated with a Kalman adaptive filter, as suggested by Smith, Buscemi, and Xu (2010). The high-end modules could also be used to relay the gunshot event data to a centralised node or control room at the rest camps.
6.4.1.3 Habitat Protection Strategy
Figure 6.1 illustrates a strategy to protect a nature reserve. High-end and low-end gunshot detection modules are shown dispersed across the nature reserve. Low-end gunshot detection modules are also incorporated into the communications radios of the reserve's personnel. High-end modules are furthermore incorporated into cell phone base station structures and into the reserve's patrol vehicles. High-end modules in remote areas can operate on solar power. Figure 6.1 also shows optical threat detectors at locations with higher elevations. In addition, the whole reserve is protected by a perimeter intrusion detection system.
Figure 6.1: Illustration of a habitat protection strategy (legend: cell phone base station, high-end gunshot detection module, reserve patrol vehicle, low-end gunshot detection module, patrol personnel, perimeter intrusion detector, optical threat detectors, rest camp and central control room)
7 BIBLIOGRAPHY
Burges, C.J.C., 1998. ‘A Tutorial on Support Vector Machines for Pattern Recognition’, Data Mining and Knowledge Discovery, vol. 2, pp. 121–167, 1998.
Chen, C., Abdallah, A. and Wolf, W., 2006. ‘Audiovisual Gunshot Event Recognition’, in
Proc IEEE International Conference on Systems, Man and Cybernetics, (SMC '06),
Taipei, Taiwan, pp. 4807-4812, October 2006.
Cilliers, J.E. and Smit, J.C., 2007. ‘Pulse compression sidelobe reduction by minimization
of Lp norms’, IEEE Transactions on Aerospace and Electronic Systems, 2007.
De Brabanter, K., Karsmakers, P., Ojeda, F., Alzate, C., De Brabanter, J., Pelckmans, K.,
De Moor, B., Vandewalle, J. and Suykens, J.A.K., 2011. ‘LS-SVMlab Toolbox User’s
Guide, version 1.8’. [online] Available at: <http://www.esat.kuleuven.be/sista/lssvmlab/>
[Accessed 3 July 2012].
Defense Update, 2008. ‘Sniper Location & Gunshot Detection Systems’. [online] Available at: <http://defense-update.com/features/2008/november/231108_sniper_detection.html> [Accessed 3 October 2012].
Environment News Service, 2011. ‘Powdered Rhino Horn as Pricey as Street Cocaine’.
[online] Available at: <http://www.ens-newswire.com/ens/feb2011/2011-02-15-02.html>
[Accessed 3 July 2012].
Green, M.L., Watkins, C., Rogan, D. and Frank, J., 1999. ‘Random Gunfire Problems and
Gunshot Detection Systems’, National Institute of Justice, December 1999.
Hertz, D., 1986. ‘Time delay estimation by combining efficient algorithms and generalized cross-correlation methods’, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-34, no. 1, February 1986.
Ifeachor, E.C. and Jervis, B.W., 2002. Digital Signal Processing: A Practical Approach. 2nd ed., Pearson Education Limited, England.
IUCN, 2011. ‘The IUCN Red List of Threatened Species, Diceros bicornis’. [online]
Available at: <http://www.iucnredlist.org/apps/redlist/details/6557/0> [Accessed 3 July
2012].
Luts, J., Ojeda, F., Van de Plas, R., De Moor, B., Van Huffel, S. and Suykens, J.A.K., 2010. ‘A tutorial on support vector machine-based methods for classification problems in chemometrics’, Analytica Chimica Acta, vol. 665, pp. 129–145, 2010.
Maher, R.C., 2007. ‘Acoustical Characterization of Gunshots’, IEEE Workshop on Signal Processing Applications for Public Security and Forensics, April 2007.
Matlab, 2002. ‘Adaptive Filters in the Filter Design Toolbox’. [online] Available at:
<http://ee.tamu.edu/matlab-help/toolbox/filterdesign/adaptive.html> [Accessed 27 June
2007].
Milliken, T., Burn, R.W. and Sangalakula, L., 2007. ‘The Elephant Trade Information
System (ETIS) and the Illicit Trade in Ivory’, A report to the 14th meeting of the
Conference of the Parties to CITES.
Moler, C., 2008. ‘Least Squares’, Numerical Computing with MATLAB. [online]
Available at: <www.mathworks.com/moler/leastsquares.pdf> [Accessed 27 September
2012].
Pauli, M., Seisler, W., Price, J., Williams, A., Maraviglia, C., Evans, R., Moroz, S., Ertem, M.C., Heidhausen, E. and Burchick, D.A., 2004. ‘Infrared Detection and Geolocation of Gunfire and Ordnance Events from Ground and Air Platforms’. [online] Available at: <www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA460225> [Accessed 27 September 2012].
Pikrakis, A., Giannakopoulos, T. and Theodoridis, S., 2008. ‘Gunshot detection in audio
streams from movies by means of dynamic programming and Bayesian networks’, in
Proc IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP 2008), Las Vegas, NV, pp. 21-24, March 2008.
Raffensperger, L., 2008. ‘Illegal Animal Trade Finances War in Africa’. [online]
Available at: <http://earthtrends.wri.org/updates/node/291> [Accessed 9 June 2009].
Roberts, A.M., 2002. ‘Elephants still under the gun’, Animal Welfare Institute Quarterly,
vol. 51, nr. 3, 2002.
SavingRhinos.org, 2012. ‘South Africa: 251 Rhinos Killed in 172 Days’. [online] Available at: <http://www.rhinoconservation.org/2012/06/21/south-africa-251-rhinos-killed-in-172-days/> [Accessed 3 July 2012].
SavingRhinos.org, 2013. ‘South Africa: 668 Rhinos Killed in 2012’. [online] Available at:
<http://www.rhinoconservation.org/2013/01/10/south-africa-668-rhinos-killed-in-2012/>
[Accessed 26 February 2013].
Smith, M., Buscemi, S. and Xu, D.J., 2010. ‘Gunshot Detection System for JTRS Radios’, The 2010 Military Communications Conference, pp. 266–271, October 2010.
Steinwart, I., Hush, D. and Scovel, C., 2006. ‘An explicit description of the reproducing
kernel Hilbert spaces of Gaussian RBF kernels’, Modeling, Algorithms and Informatics
Group, CCS-3 Los Alamos National Laboratory, February 6, 2006.
Suykens, J.A.K. and Vandewalle, J., 1999. ‘Least squares support vector machine classifiers’,
Neural Processing Letters, vol. 9, issue 3, pp. 293–300, 1999.
The Register, 2012. ‘Rhino horn price spike drives record poaching’. [online] Available
at: <http://www.theregister.co.uk/2012/01/03/quacks_and_crims_take_rhinos> [Accessed
3 July 2012].
Van Wyk, B.J., van Wyk, M.A. and Noel, G., 2004. ‘Lecture Notes in Computer
Science’, LNCS: Structural, Syntactic, and Statistical Pattern Recognition, vol. 3138, pp.
831-839, 2004.
Viola, F. and Walker, W.F., 2005. ‘A Spline-Based Algorithm for Continuous Time-Delay Estimation Using Sampled Data’, IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 52, no. 1, January 2005.
Wang, Q., 2009. ‘Applying Lightning Protocol to Gunshot Localization’, Department of
Computer Science, University of Illinois at Urbana-Champaign. [online] Available at:
<http://www-rtsl.cs.uiuc.edu/papers/lightningingunshotlocalization.pdf> [Accessed 3 July
2012].
WWF, 2011. ‘Javan rhinos extinct in Vietnam’. [online] Available at: <http://www.worldwildlife.org/who/media/press/2011/WWFPresitem24582.html> [Accessed 3 July 2012].
WWF South Africa, 2012. ‘Rhino poaching deaths continue to increase in South Africa’. [online] Available at: <http://www.wwf.org.za/?5203/rhino2011> [Accessed 3 July 2012].
Zhang, Y., Li, X., Jin, Y. and Amin, M.G., 2009. ‘Distributed Radar Network for Real-Time
Tracking of Bullet Trajectory’, Wireless Sensing and Processing IV, Proc. of SPIE Vol.
7349, 2009.
APPENDIX A
A.1 LABVIEW EXPERIMENTAL PREPARATION
Figure A.1: LabVIEW program that mixes the incoming channels with noise at a good signal-to-noise ratio
Figure A.2: LabVIEW program that shows the correlation graphs and angle calculation for an optimal signal-to-noise ratio
Figure A.3: Where the noise and gunshot signal peaks are of the same magnitude, the gunshot impulses start to get buried in the noise
Figure A.4: Correlation peaks start to disappear
Figure A.5: Gunshot signal peaks are buried in the noise; the noise values are greater than the impulse values
Figure A.6: The correlation calculation becomes unstable giving wrong values for the angle