acoustic impulse detection algorithms for application in gunshot
ACOUSTIC IMPULSE DETECTION ALGORITHMS FOR APPLICATION IN GUNSHOT LOCALISATION

by

J.F. VAN DER MERWE

Submitted in partial fulfilment of the requirements for the degree MAGISTER TECHNOLOGIAE: ELECTRICAL ENGINEERING in the Department of Electrical Engineering, FACULTY OF ENGINEERING & THE BUILT ENVIRONMENT, TSHWANE UNIVERSITY OF TECHNOLOGY

Supervisor: Dr. J.A. Jordaan

November 2012

DECLARATION BY CANDIDATE

"I hereby declare that the dissertation submitted for the degree M.Tech: Electrical Engineering, at Tshwane University of Technology, is my own work and has not previously been submitted to any other institution of higher education. I further declare that all sources cited or quoted are indicated and acknowledged by means of a comprehensive list of references."

J.F. van der Merwe

Copyright © Tshwane University of Technology 2012

DEDICATION

I dedicate this to the preservation of all wildlife…

ACKNOWLEDGEMENTS

I want to thank the Creator for the opportunity that was given to me, without whom none of this would have been possible. I also wish to thank Dr. Jaco Jordaan for all his insight, effort and patience in helping me with this project. Furthermore, I would like to thank F'SATI and Mr. André Hattingh for helping me to obtain the environmental data used to compile this project, and for the opportunity to do so. Lastly, I wish to thank my family for all their support and love during this time.

ABSTRACT

In a society that becomes more gun driven each day, and with a great decline in endangered wildlife species, a need has arisen for a system that can identify gunfire events and pinpoint their position within a natural environment. This dissertation researches, simulates and compares different impulse detection algorithms for the application of gunshot localisation.
The area of research includes Generalised Cross Correlation (GCC), Least Squares (LS) training algorithms, as well as training algorithms using a Reproducing Kernel Hilbert Space (RKHS) approach. Lastly, it also incorporates Support Vector Machines (SVM) for training a network to recognise gunshot impulses. The gunshot sound can be corrupted by greatly amplified noise or by nearby sounds such as speech. Gunshot sounds were recorded using a star topology array of 3 microphones, connected to a low pass filter and amplifier. Gunshot sounds of large and medium calibre guns were recorded at different distances from the microphone setup. These recordings were used to create templates of large and medium calibre guns, which were used to train a system to recognise a gunshot at different distances and in different sound environments. The GCC and the SVM algorithms proved to be the most accurate over all the different distances. The GCC algorithm executed much faster than the SVM algorithm, but given more sound templates and equipment with higher processing power, the SVM algorithm might be more accurate at even larger distances. The output of this research can be used to create an anti-poaching system, especially for endangered species like elephants and rhinos.

EKSERP (translated from Afrikaans)

In a society that becomes more gun driven each day, and with a great decline in endangered wildlife species, a need has arisen for a system that can identify gunshots and locate their position within a natural environment. This dissertation researches, simulates and compares different impulse detection algorithms for the application of gunshot localisation. The area of research includes Generalised Cross Correlation (GCC), Least Squares (LS) algorithms for trainable networks, as well as algorithms using a Reproducing Kernel Hilbert Space (RKHS) approach. Lastly, the dissertation also incorporates Support Vector Machines (SVM) for trainable networks in gunshot impulse recognition. The gunshot sound can be corrupted by greatly amplified noise or by nearby sounds such as speech. Gunshot sounds were recorded using a star topology array of 3 microphones connected to a low pass filter and amplifier. Gunshot sounds of large and medium calibre guns were recorded at different distances from the microphone system. The sounds were then used to create templates of large and medium calibre guns, which were used to train a system to recognise gunshot sounds at different distances and in different sound environments. The GCC and the SVM algorithms were the most accurate over all the different distances. The GCC algorithm executed much faster than the SVM algorithm, but with more sound templates and equipment with a higher processing speed, the SVM algorithm should be even more accurate at larger distances. The output of this research project can be used to create an anti-poaching system, especially for endangered species such as elephants and rhinos.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
GLOSSARY
1 INTRODUCTION
  1.1 PROBLEM STATEMENT
  1.2 DELIMITATIONS
  1.3 BENEFITS OF STUDY
  1.4 CONTRIBUTIONS OF STUDY
  1.5 FUNCTIONAL BREAKDOWN OF SYSTEM
  1.6 DISSERTATION LAYOUT
2 LITERATURE REVIEW
  2.1 THE MULTI-BILLION DOLLAR INDUSTRY OF POACHING
  2.2 GUNSHOT DETECTION THEORETICAL OVERVIEW
    2.2.1 ACOUSTIC SENSING
    2.2.2 OPTICAL SENSING
  2.3 GUNSHOT DETECTION AND LOCALISATION APPLICATIONS AND STRATEGIES
    2.3.1 GUNSHOT DETECTION IN VIDEO AND FILM
    2.3.2 MUZZLE BLAST DETECTION AND LOCALISATION USING A JOINT TACTICAL RADIO SYSTEM
    2.3.3 MUZZLE BLAST AND SHOCKWAVE DETECTION USING LIGHTNING PROTOCOL
  2.4 CONCLUSION
3 MATHEMATICAL MODELLING REVIEW
  3.1 SYSTEM IDENTIFICATION
  3.2 ADAPTIVE FILTERS
    3.2.1 NOISE CANCELLATION
  3.3 TIME DELAY ESTIMATION AND IMPULSE DETECTION USING GENERALISED CORRELATION
    3.3.1 TDE USING GENERALISED CORRELATION
    3.3.2 PULSE DETECTION USING GENERALISED CROSS CORRELATION
  3.4 LEAST SQUARES
    3.4.1 LEAST SQUARES SIDELOBE MINIMISATION
  3.5 REPRODUCING KERNEL HILBERT SPACES
    3.5.1 NON-LINEAR TEMPLATE MATCHING FRAMEWORK
    3.5.2 TEST INPUT-OUTPUT PAIRS
    3.5.3 REPRODUCING KERNEL TYPES
    3.5.4 MINIMUM NORM TEMPLATES
  3.6 SUPPORT VECTOR MACHINES
    3.6.1 LINEAR SUPPORT VECTOR MACHINES
    3.6.2 NON-LINEAR SUPPORT VECTOR MACHINES
    3.6.3 LEAST SQUARE SUPPORT VECTOR MACHINES
  3.7 CONCLUSION
4 EXPERIMENTAL SETUP
  4.1 SOUND RECORDING EQUIPMENT SETUP
  4.2 LABVIEW EXPERIMENTAL PREPARATION
  4.3 REAL ENVIRONMENT DATA GATHERING
  4.4 CONCLUSION
5 RESULTS
  5.1 OVERVIEW OF DATA RECORDING AND PRE-PROCESSING PROCEDURES
  5.2 FREQUENCY SPECTRUM ANALYSIS
    5.2.1 MAUSER POWER SPECTRUM ESTIMATE
    5.2.2 PISTOL POWER SPECTRUM ESTIMATE
    5.2.3 POWER SPECTRUM ESTIMATE CONCLUSION
  5.3 GENERAL CROSS CORRELATION
    5.3.1 TEMPLATE GENERATION
    5.3.2 ANALYSIS
      5.3.2.1 Cross Correlation of Pistol sound 500m away from array
      5.3.2.2 Cross Correlation of Mauser sound 500m away from array
      5.3.2.3 Cross Correlation of Pistol sound 1000m away from array
      5.3.2.4 Cross Correlation of Mauser sound 1000m away from array
      5.3.2.5 Cross Correlation of Mauser sound 1500m away from array
      5.3.2.6 Cross Correlation of Mauser sound 1700m away from array
  5.4 TRAINABLE TEMPLATE MATCHING ALGORITHMS
    5.4.1 TEMPLATE MATCHING WITH LEAST SQUARES
    5.4.2 RKHS USING A SECOND ORDER POLYNOMIAL KERNEL
      5.4.2.1 The Mauser Template used for RKHS
      5.4.2.2 The Pistol Template used for RKHS
      5.4.2.3 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 500m
      5.4.2.4 Pistol impulse detection using a 2nd order polynomial RKHS kernel at 500m
      5.4.2.5 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 1000m
      5.4.2.6 Pistol impulse detection using a 2nd order polynomial RKHS kernel at 1000m
      5.4.2.7 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 1500m
      5.4.2.8 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 1700m
    5.4.3 SUPPORT VECTOR MACHINES
      5.4.3.1 The Mauser Template used for SVM network
      5.4.3.2 The Pistol Template used for SVM
      5.4.3.3 Impulse detection using a 2nd order polynomial SVM kernel and a Pistol Training set for recordings 500m away
      5.4.3.4 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 500m away
      5.4.3.5 Impulse detection using a 2nd order polynomial SVM kernel and a Pistol Training set for recordings 1000m away
      5.4.3.6 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 1000m away
      5.4.3.7 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 1500m away
      5.4.3.8 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 1700m away
  5.5 DETECTION ALGORITHM ACCURACY COMPARISON
  5.6 EXECUTION TIME COMPARISON BETWEEN THE DIFFERENT IMPULSE DETECTION ALGORITHMS
  5.7 CONCLUSION
6 CONCLUSION
  6.1 THE ACCURACY OF THE ALGORITHMS
  6.2 COMPLEXITY MEASURED IN ALGORITHM PROCESSING TIME
  6.3 OVERALL PERFORMANCE OF THE IMPULSE DETECTION ALGORITHMS
  6.4 FUTURE RESEARCH
    6.4.1 ANTI-POACHING APPLICATION AND STRATEGY
      6.4.1.1 Low-end gunshot detection modules
      6.4.1.2 High-end gunshot detection modules
      6.4.1.3 Habitat Protection Strategy
7 BIBLIOGRAPHY
APPENDIX A
  A.1 LABVIEW EXPERIMENTAL PREPARATION

LIST OF FIGURES

Figure 1.1: Functional block diagram of the gunshot detector.
Figure 3.1: Input X(S) changed by function H(S) to produce output Y(S)
Figure 3.2: Block diagram of an adaptive filter as a noise canceller
Figure 3.3: Maximum-margin hyperplane and margins for an SVM
Figure 3.4: Mapping of non-linear input space to higher dimensional feature space
Figure 4.1: Setup of recording equipment
Figure 4.2: Top view of different positions of the gunshots fired relative to the microphone array
Figure 4.3: Side view of gunshot positions showing the surface curvature of the game farm
Figure 5.1: Example of a 3 channel recording of gunshots fired 1000m away
Figure 5.2: Power spectrum estimate of a Mauser gunshot at 500m away from microphones
Figure 5.3: Power spectrum estimate of a Mauser gunshot at 1000m away from microphones
Figure 5.4: Power spectrum estimate of a Mauser gunshot at 1500m away from microphones
Figure 5.5: Power spectrum estimate of a Mauser gunshot at 1700m away from microphones
Figure 5.6: PSD of audio stream with no recorded gunshot
Figure 5.7: Power spectrum estimate of a Pistol gunshot at 500m and 1000m away from microphones
Figure 5.8: Template of Pistol gunshot
Figure 5.9: Template for Mauser Gunshot
Figure 5.10: Cross correlation of pistol template with pistol gunshot fired 500m away
Figure 5.11: Cross correlation of Mauser template with Mauser gunshot fired 500m away
Figure 5.12: Cross correlation of Pistol template with Mauser and pistol gunshots fired 500m away
Figure 5.13: Cross correlation of pistol template with pistol gunshot fired 1000m away
Figure 5.14: Cross correlation of Mauser template with Mauser gunshot fired 1000m away
Figure 5.15: Cross correlation of Pistol template with Mauser and pistol gunshots fired 1000m away
Figure 5.16: Cross correlation of Mauser template with Mauser gunshot fired 1500m away
Figure 5.17: Cross correlation of Pistol template with Mauser gunshot fired 1500m away
Figure 5.18: Cross correlation of Mauser template with Mauser gunshot fired 1700m away
Figure 5.19: LS Mauser waveform template
Figure 5.20: Output of Least Squares algorithm of Mauser gunshot at 500m away
Figure 5.21: Output of Least Squares algorithm of Mauser gunshot at 1000m away
Figure 5.22: Output of Least Squares algorithm of Mauser gunshot at 1500m away
Figure 5.23: Mauser template used in RKHS training sequence, smoothed by interpolation
Figure 5.24: Pistol template used in RKHS training sequence, smoothed by interpolation
Figure 5.25: Mauser gunshot at 500m and output using a 2nd order polynomial RKHS kernel
Figure 5.26: Pistol gunshots at 500m and output using a 2nd order polynomial RKHS kernel
Figure 5.27: Mauser gunshot at 1000m and output using a 2nd order polynomial RKHS kernel
Figure 5.28: Pistol gunshot at 1000m and output using a 2nd order polynomial RKHS kernel
Figure 5.29: Mauser gunshot at 1500m and output using a 2nd order polynomial RKHS kernel
Figure 5.30: Mauser gunshot at 1700m and output using a 2nd order polynomial RKHS kernel
Figure 5.31: Mauser template used in SVM network training sequence, smoothed by interpolation
Figure 5.32: Pistol template used in SVM network training sequence
Figure 5.33: Pistol and Mauser gunshots at 500m and output of 2nd order polynomial SVM kernel using a pistol template
Figure 5.34: Pistol and Mauser gunshots at 500m and output of 3rd order polynomial SVM kernel using a Mauser template
Figure 5.35: Pistol and Mauser gunshots at 1000m and output of 2nd order polynomial SVM kernel using a pistol template
Figure 5.36: Pistol and Mauser gunshots at 1000m and output of 3rd order polynomial SVM kernel using a Mauser template
Figure 5.37: Mauser gunshot at 1500m and output of 3rd order polynomial SVM kernel using a Mauser template
Figure 5.38: Mauser gunshot at 1700m and output of 3rd order polynomial SVM kernel using a Mauser template
Figure 5.39: Comparison of execution time, milliseconds per 5000 samples, between the different impulse detection algorithms
Figure 6.1: Illustration of a habitat protection strategy
Figure A.1: Labview program that mixes incoming channels with noise in a good signal-to-noise ratio
Figure A.2: Labview program that shows the correlation graphs and angle calculation for an optimal signal-to-noise ratio
Figure A.3: Shows where the noise and gunshot signal peaks are the same, gunshot impulses start to get buried in the noise
Figure A.4: Correlation peaks start to disappear
Figure A.5: Gunshot signal peaks are buried in the noise, noise values are greater than the impulse values
Figure A.6: The correlation calculation becomes unstable giving wrong values for the angle

LIST OF TABLES

Table 2.1: Yearly statistics of rhino poaching in South Africa
Table 5.1: Comparison of detection algorithms using Pistol templates for shots fired 500m away
Table 5.2: Comparison of detection algorithms using Mauser templates for shots fired 500m away
Table 5.3: Comparison of detection algorithms using Pistol templates for shots fired 1000m away
Table 5.4: Comparison of detection algorithms using Mauser templates for shots fired 1000m away
Table 5.5: Comparison of detection algorithms using Mauser templates for shots fired 1500m away
Table 5.6: Comparison of detection algorithms using Mauser templates for shots fired 1700m away
Table 5.7: Execution time of the detection algorithms in µs/sample
GLOSSARY

A/D      Analog-to-Digital
ADC      Analog-to-Digital Converter
AGDLS    Acoustic Gunshot Detection and Localisation System
AoA      Angle of Arrival
CITES    Convention on International Trade in Endangered Species
D/A      Digital-to-Analog
dB       Decibel
DMA      Direct Memory Access
DSP      Digital Signal Processor
EEG      Electroencephalography
F'SATI   French South African Institute of Technology
FU       Functional Unit
GCC      Generalised Cross Correlation
HMS      Habitat Management System
Hz       Hertz
k        Kilo
LPF      Low Pass Filter
LS       Least Squares
NI       National Instruments
PC       Personal Computer
PCI      Peripheral Component Interconnect
PCM      Pulse Code Modulation
PSD      Power Spectral Density
RKHS     Reproducing Kernel Hilbert Space
SI       International System of Units
SSE      Sum of Squared Error
SVM      Support Vector Machine
TDE      Time Delay Estimation
USB      Universal Serial Bus
V        Volt

1. INTRODUCTION

South Africa is a country with a massive wildlife industry, both for the preservation of endangered species and for the commercial hunting of game. A need therefore arises for a system that can identify when and where a gun is fired. A gunshot detection and localisation system is proposed that could be incorporated into a larger system to help protect and manage large habitats.

An Acoustic Gunshot Detection and Localisation System (AGDLS) would be able to run on the Radio Frequency (RF) network of a Habitat Management System (HMS). The HMS, which can be designed to protect large habitats, would consist of many communication nodes that log events and broadcast them to a central computer. An AGDLS would be one of these event-gathering modules.

Gunshot localisation is based on the principle of time delay estimation, using a minimum of three modules stationed at different coordinates, each containing an array of 3 microphones. Each module calculates the angle from which the gunshot originated relative to the position of the module.
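
The time delay principle described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: it uses a single pair of microphones rather than the 3-microphone star array, assumes a far-field (plane wave) source, and the sampling rate, microphone spacing and speed of sound are illustrative assumptions. The function names `estimate_delay_samples` and `angle_of_arrival` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C


def estimate_delay_samples(ref, sig, max_lag):
    """Return the lag (in samples) at which sig best matches ref.

    A positive lag means the impulse arrives later in sig than in ref.
    Plain time-domain cross correlation, searched over -max_lag..max_lag.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):
                score += r * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def angle_of_arrival(lag, fs, spacing):
    """Far-field angle in degrees, measured from the broadside of the pair."""
    ratio = SPEED_OF_SOUND * (lag / fs) / spacing
    ratio = max(-1.0, min(1.0, ratio))  # clamp against noise and rounding
    return math.degrees(math.asin(ratio))


# Toy data: the same impulse reaches microphone B 3 samples after microphone A.
pulse = [0.0, 1.0, -0.6, 0.3, -0.1]
mic_a = [0.0] * 20 + pulse + [0.0] * 20
mic_b = [0.0] * 23 + pulse + [0.0] * 17

lag = estimate_delay_samples(mic_a, mic_b, max_lag=10)
angle = angle_of_arrival(lag, fs=5000.0, spacing=0.5)  # assumed 5 kHz, 0.5 m
print(lag, round(angle, 1))
```

With these assumed values the estimated lag is 3 samples, which corresponds to an angle of roughly 24 degrees. A single pair cannot tell which side of its axis the source is on; a third microphone would be needed to resolve that ambiguity.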
Non-gunshot sounds and noise are also present in the signal together with the gunshot sound, and if the shot is too far away, the gunshot impulse cannot be accurately extracted from the signal. This research attempts to find computationally efficient ways to identify and extract gunshot impulses from signals.

Areas of study include Generalised Cross Correlation (GCC), sidelobe minimisation utilising Least Squares (LS) techniques, as well as training algorithms using a Reproducing Kernel Hilbert Space (RKHS) approach. It also incorporates Support Vector Machines (SVM) to train a network to recognise gunshot impulses. By combining these individual research areas, more optimal solutions are obtainable.

The different algorithms (methods) are compared to one another. The attributes used for comparison include detection accuracy at different signal-to-noise ratios and the number of instructions the algorithm needs to complete (computation time).

The work is both practical and theoretical. Experiments are carried out with the different methods, and the best attributes of the different methods are combined to give the optimal solution. Sound recording data were gathered with a Labview* data acquisition card. The data are processed in both the Labview and Matlab† environments.

* National Instruments, http://www.ni.com
† Mathworks, http://www.mathworks.com

1.1 PROBLEM STATEMENT

The dwindling numbers of endangered species in South Africa over the past 4 years, especially rhinos but also others such as the elephant, have created a need for methods that will track and perhaps catch poachers and discourage them from killing these animals. Trading in poached animal body parts has become a multi-billion dollar industry. A gunshot detection system is needed that is able to protect large natural habitats with high efficiency and accuracy. Because some areas of a large habitat are remote, the energy efficiency of the system is also a major design consideration.
Acoustic gunshot localisation is based on the principle of time delay estimation or angle of arrival information. If the time delays between the gunshot impulses in the 3 audio channels of a module can be determined, the module can calculate the angle of arrival, and with 3 or more such modules the location of the gunshot can be triangulated. Before the gunshot can be located, however, it must first be correctly detected. Over large distances the recorded gunshot impulse can be so small that it is not distinguishable from the noise and reverberant clutter, so noise cancelling or impulse detection mechanisms must also be incorporated. The development of the system therefore consists of two key factors, namely gunshot detection and time delay estimation.

1.2 DELIMITATIONS

This study focuses primarily on different gunshot detection algorithms and not specifically on gunshot localisation.

1.3 BENEFITS OF STUDY

The knowledge gained from this study might be used to create effective anti-poaching strategies, especially in remote and high risk poaching areas of nature reserves, where the energy efficiency and accuracy of a protection system are of major importance. The complexity of a detection algorithm (which can be related to hardware implementation) versus its accuracy is therefore an important factor to research.

1.4 CONTRIBUTIONS OF STUDY

This study addresses different methods of detecting a gunshot impulse, especially in an environment where poaching of large animals like elephants and rhinos can occur. Templates of large and medium calibre guns were created and used for training a system to recognise gunshots at different distances and in different sound environments. The following methods are compared and discussed: generalised cross correlation, template matching using Least Squares and RKHS with different kernels, and Support Vector Machines.
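
As a rough illustration of template-based detection, the simplest of the ideas listed above, the sketch below slides a short template over a recording and reports the best normalised correlation score. It is a simplified stand-in, not the GCC, LS, RKHS or SVM formulations compared in this dissertation. The template shape, the sinusoidal background, the 0.9 threshold and the function names are all assumptions chosen for the example.

```python
import math


def normalised_correlation(template, signal):
    """Slide template over signal; return one score in [-1, 1] per offset."""
    t_norm = math.sqrt(sum(t * t for t in template))
    scores = []
    for start in range(len(signal) - len(template) + 1):
        window = signal[start:start + len(template)]
        w_norm = math.sqrt(sum(w * w for w in window))
        if t_norm == 0.0 or w_norm == 0.0:
            scores.append(0.0)  # no energy, so no meaningful match
            continue
        dot = sum(t * w for t, w in zip(template, window))
        scores.append(dot / (t_norm * w_norm))
    return scores


def detect_impulse(template, signal, threshold=0.9):
    """Return (offset, score) of the strongest match, or None if below threshold."""
    scores = normalised_correlation(template, signal)
    if not scores:
        return None
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < threshold:
        return None
    return best, scores[best]


# Hypothetical impulse shape and a slowly varying background (both assumed).
template = [1.0, -0.8, 0.5, -0.2]
background = [0.02 * math.sin(0.3 * n) for n in range(60)]

# Recording: the background plus an attenuated copy of the template at sample 25.
recording = list(background)
for k, t in enumerate(template):
    recording[25 + k] += 0.5 * t

found = detect_impulse(template, recording)    # expected near offset 25
missed = detect_impulse(template, background)  # background only: no detection
print(found, missed)
```

Normalising by the window energy makes the score independent of how strongly the impulse has been attenuated, which matters when the same template is applied to shots at different distances. The cost is that very weak windows can score highly by chance, so a practical detector would likely combine the score with an energy check.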
The comparison between the methods covers the complexity, which can be measured in terms of the speed of execution, and the accuracy of the different algorithms.

1.5 FUNCTIONAL BREAKDOWN OF SYSTEM

Figure 1.1 shows a functional block diagram of a gunshot detection and localisation system parsed into smaller functional units.

[Figure 1.1: Functional block diagram of the gunshot detector. Blocks: FU 1 Gunshot; FU 2 Acoustic Sound Receiver; FU 3 Noise & Low Pass Filter; FU 4 Signal Amplifier; FU 5 Analog to Digital Converter; FU 6 Gunshot Detector; FU 7 Time delay estimator; FU 8 Gunshot Triangulation; FU 9 Calculate Coordinates. Group labels: System Input, DSP Implementation.]

FU 1: This unit shows the input to the system, which is the sound of a gunshot.

FU 2: The acoustic sound receiver is the sensor array that should receive the incoming sounds. Sensitive microphones with a high signal-to-noise ratio should be used to implement this unit. This unit converts sound wave energy to an electrical signal for processing.

FU 3: The electrical signal received from FU 2 is fed to a Low Pass Filter (LPF) with a pass band in the 0 kHz to 2.5 kHz range. This also serves as a noise filter, which should be implemented before amplification of the signal; otherwise excessive noise will be amplified as well.

FU 4: The small signal received from the filter should be amplified to levels that use the full range of what an ADC (Analog-to-Digital Converter) can quantise.

FU 5: The analog signal should be converted to a digital format for processing by a Digital Signal Processor (DSP). An ADC can be used for this task.

FU 6: The digital words are then fed to a DSP for processing of the digitised audio signal. Because the system will be constantly analysing the incoming sounds, the Direct Memory Access (DMA) capabilities of a DSP can be utilised for real-time monitoring. This unit analyses the incoming sounds to verify whether or not a sound is a gunshot.
FU 7: Once a gunshot detection algorithm has verified that the analysed sound is a gunshot, the time delay between the different sensors is calculated. Since the position of all the sensors is known, the direction from which the gunshot sound originated can be determined.

FU 8: Taking the time delays, the speed of sound through air and the positions of the sensors into consideration, the position of the gunshot sound can be triangulated.

FU 9: Once the relative position has been calculated, the coordinates in latitude and longitude can be calculated.

1.6 DISSERTATION LAYOUT

Chapter 2 gives background on the need for a system that can protect large natural habitats, and discusses the characteristics, applications and strategies for gunshot detection and localisation.

A mathematical modelling review is given in chapter 3. It includes some mathematical background on acoustic gunshot detection and time delay estimation algorithms. Signal detection is also discussed, as well as some template matching and machine learning algorithms like Reproducing Kernel Hilbert Spaces and Support Vector Machines.

Chapter 4 discusses the experimental layout and how some preliminary data were obtained. It also describes how the final data were recorded in a real outdoor game farm environment. These data were used to compare the performance of the different detection algorithms.

Chapter 5 gives the method of implementation and the output of the different algorithms, with graphs showing the respective outputs relative to the input recordings. It also compares the performance of the different algorithms.

Chapter 6 gives a conclusion and summarises the results obtained in chapter 5. Some recommendations on future research are also given.

The References chapter lists all the articles and work of authors that were used in this dissertation.
2 LITERATURE REVIEW

Chapter 2 reviews the need for a gunshot detection system and then progresses to gunshot detection techniques found in current literature.

2.1 THE MULTI-BILLION DOLLAR INDUSTRY OF POACHING

In a society that gets more gun driven every day, a solution is needed that keeps track of all the shots fired. In the USA, gunshot detection is already incorporated in some of the police's anti-crime strategies, especially in densely populated areas where crimes with guns are reaching staggering figures (Green, et al., 1999). In Africa on the other hand, and especially South Africa with its wealth of natural resources, a mechanism is needed that can help to curb the poaching of endangered species.

From 1 January 2000 to 30 April 2002, Zambia's population of elephants decreased by 1000. For the same period Kenya and India reported 5953 kg of illegal ivory seized. This is despite the CITES‡ ban on ivory, which is still smuggled globally (Roberts, 2002). The U.S. Department of State estimates that black-market trade in illegal ivory and other wildlife and wildlife products generates between 10 and 20 billion dollars per year (Raffensperger, 2008). The number of seizures of more than a ton of ivory increased to 32 between 1998 and 2006, compared to the 17 seizures reported between 1989 and 1997 (Milliken, Burn and Sangalakula, 2007), indicating a rise in this lucrative business trend.

‡ Convention on International Trade in Endangered Species

But elephants are not the only animals whose population numbers have succumbed to this multi-billion dollar industry. Poachers killed 448 rhinos in South Africa during 2011, with 252 of them killed in the Kruger National Park (KNP) (WWF South Africa, 2012). Table 2.1 shows the number of rhinos poached per year in South Africa during the past 6 years. It also shows the alarming rate at which the poaching of rhinos has increased over the past 6 years (SavingRhinos.org, 2012).
The number of rhinos killed illegally in 2012 was more than double the number killed in 2010 (SavingRhinos.org, 2013).

Table 2.1: Yearly statistics of rhino poaching in South Africa

Year    Rhinos killed in R.S.A.
2007    13
2008    83
2009    122
2010    333
2011    448
2012    668

The recent upsurge in rhino poaching has been tied to an increased demand for rhino horn in Asia, and in particular Vietnam, according to the World Wildlife Fund (WWF). In the Asian market rhino horn carries prestige as a luxury item and as a post-partying cleanser, and has also been flaunted as a cure for cancer. But rhino horn has no proven cancer treating properties or uses as an aphrodisiac, according to traditional Chinese medicine experts (WWF South Africa, 2012).

The street market value of powdered rhino horn in Vietnam and China was driven as high as US $50 000 per kilogram in 2011 (Environment News Service, 2011). The price per kilogram of rhino horn has since increased to an estimated $65 000 in 2012 (The Register, 2012).

The African Black Rhinoceros remains on the red list (critically endangered) of the International Union for Conservation of Nature (IUCN). By the end of 2010, there were only about 4800 left in existence, which is a 97% decline in population since 1960 (IUCN, 2011). The Rhinoceros Sondaicus Annamiticus subspecies of the Javan Rhinoceros (Vietnam rhino) was declared extinct by the WWF at the end of October 2011. The likely cause of death of the last remaining Vietnam rhino was poaching (WWF, 2011).

2.2 GUNSHOT DETECTION THEORETICAL OVERVIEW

Gunshot detection systems in current literature and on the commercial and military market are primarily based on two sensing techniques, acoustic and optical. Gunshot detection based on the acoustic characteristics (muzzle blast or shock wave) of gunfire uses microphones (Maher, 2007), while optical schemes employ electro-optical or optical detectors to detect the muzzle flash or the bullet's path (Zhang et al., 2009).
2.2.1 ACOUSTIC SENSING

Systems that use only acoustic techniques to detect a weapon's discharge utilise the muzzle blast and/or the shock wave characteristic of the gunshot. The hot expanding gases of the explosion in the weapon's chamber create a muzzle blast that emerges from the barrel of the gun. For most firearms, the sound level is the highest in the direction the barrel is pointing (Maher, 2007).

The second source of acoustic information, the shock wave, is present when the bullet travels at a speed higher than the speed of sound. The acoustic shock wave propagates away from the bullet's path at the speed of sound and expands as a cone behind the bullet (Maher, 2007).

Another sound source that can be used to detect the presence of a gunshot, according to Maher (2007), is the mechanical action of a firearm. This includes the sounds of the trigger and hammer mechanism, the positioning of new ammunition by the gun's loading system and the ejection of spent cartridges. The mechanical action sounds are generally much quieter than the muzzle blast or the projectile shock wave, so the microphones need to be much closer to the gun to pick up these sounds.

Acoustic vibration may also be carried through the ground or other solid surfaces according to Maher (2007). The sounds of gunshots cause detectable vibratory signals that propagate through the ground many tens of meters from the source. Sound propagation in rock and soil is generally at least 5 times faster than the speed of sound in air. Maher (2007) suggests that calculations can be made to correlate the surface vibratory motion with the subsequent arrival of the airborne sound.

A challenge with acoustically based systems is deconvolving the gunshot from the reflected sound and the reverberant clutter, and also distinguishing between gunfire and non-gunfire sounds (Maher, 2007).
Purely acoustic detection systems react more slowly than their optical counterparts because they rely on the propagation of sound waves at approximately 330 m/s (Zhang et al., 2009). The sound from a gunshot will therefore take about 3 seconds to reach a sensor 1 km from its origin.

2.2.2 OPTICAL SENSING

Systems that employ optical or electro-optical techniques for gunfire detection and localisation detect either the muzzle flash of a bullet being fired or the heat caused by the friction of the bullet as it moves through the air, or incorporate both of these strategies. These systems require a clear line of sight to the weapon being fired or to the projectile while it is in motion. Muzzle flash detection can be defeated by specialised flash suppressors (Zhang et al., 2009).

Optical detection systems are used successfully in military environments where response time is of critical and life threatening importance (Defense Update, 2008). Usually multiple optical sensors must be used for a 360 degree detection capability.

An optimal system would incorporate both acoustic and optical sensing techniques, which would enable it to detect and calculate the location of gunfire with greater precision (Pauli et al., 2004).

2.3 GUNSHOT DETECTION AND LOCALISATION APPLICATIONS AND STRATEGIES

Gunshot detection and localisation are mostly used in military environments and as a tool to reduce crime in populated areas. They have also found uses in video and film, where protection of sensitive groups in a community is necessary.

2.3.1 GUNSHOT DETECTION IN THE VIDEO AND FILM INDUSTRY

Pikrakis, Giannakopoulos & Theodoridis (2008) showed that gunshot detection from a movie's audio stream could be treated as a maximisation task, where the solution was obtained by means of dynamic programming and Bayesian Networks (BN).
Pikrakis, Giannakopoulos & Theodoridis (2008) describe a method that seeks the sequence of segments and respective class labels (gunshots versus all other audio types in the stream) that maximises the product of posterior class label probabilities, given the segments' data. The required posterior probabilities are estimated by combining soft classification decisions from a set of Bayesian Network combiners.

Pikrakis, Giannakopoulos & Theodoridis (2008) conclude that almost 80% of gunshot data was correctly detected by this method, with a 20% false alarm rate, if the measurement was event-based. Ten percent of gunshots were not detected with this method.

Another approach for detecting a gunshot event in video and film, by Chen, Abdallah and Wolf (2006), uses both the audio and visual aspects of a gunshot scene. The sound and video are separated at a preprocessing stage, and a gunshot sound model is built based on a 4 state continuous Hidden Markov Model (HMM). The gunshot sound model is then trained with sounds of different gun types and with non-gunshot sounds. A model for human emotion is also built, drawing on audio features, i.e. speech patterns, and video features incorporating different facial expressions. A Support Vector Machine (SVM) classifier is trained to determine the emotion of the scene. Lastly a purely visual model is built that is trained with different human activities. Chen, Abdallah and Wolf (2006) combine the three models to classify the scene into four categories, namely gunshot, normal, threatening and wounded victim.

2.3.2 MUZZLE BLAST DETECTION AND LOCALISATION USING A JOINT TACTICAL RADIO SYSTEM

Gunshot detection using a JTRS (Joint Tactical Radio System) radio is proposed by Smith, Buscemi, and Xu (2010). In this strategy, each radio acts as a sensor node that determines and shares muzzle blast time of arrival information in order to determine a shooter's location.
A rake-correlation filter loop is used as a detection algorithm to accurately pinpoint the arrival of the muzzle blast of a gunshot at a single microphone. The location algorithm (realised by an extended time-invariant Kalman filter) uses the position and time of arrival information gathered from multiple sources to determine the shooter's location.

Smith, Buscemi, and Xu (2010) used a combination of a correlation filter and a rake receiver to detect a gunshot. The correlation filter helped to resolve the signal from uncorrelated surrounding noise, while the rake receiver helped to eliminate much of the multipath present in the signal.

The Kalman filter is very adaptable according to Smith, Buscemi, and Xu (2010), and can incorporate additional measurements or sensor data, such as terrain information, to eliminate large vertical errors. It would also be able to incorporate Angle of Arrival (AoA) information from additional gunshot systems into the network. Their results show that with only time of arrival information, gunshot location accuracy is dictated by radio positioning and orientation.

Their research is based on a single microphone setup per radio that does not use fixed positions for node placement but rather uses GPS data to calculate AoA information. In the absence of a fixed microphone array the system is thus forced to work with less information. A single sensor cannot calculate AoA information on its own and can therefore not operate autonomously.

The research of Smith, Buscemi, and Xu (2010) demonstrated the ability of a multiple radio system to identify the location of a gunshot, and also the ability of such a system to incorporate stand-alone systems to improve the accuracy over that of either system operating independently. Their work proposes that the detection algorithm could be implemented on the SRW (Soldier Radio Waveform) and the localisation algorithm on the WNW (Wideband Networking Waveform) of a JTRS.
2.3.3 MUZZLE BLAST AND SHOCKWAVE DETECTION USING LIGHTNING PROTOCOL

Gunshot detection implementing Lightning Protocol is proposed by Wang (2009). The muzzle blast is detected by low-end (low power, low computational complexity) nodes on a wireless sensor network, which in turn wake up hibernating high-end nodes before the shock wave generated by the supersonic bullet reaches them. High-end nodes, located much further away and on both sides of the bullet's trajectory, must detect the shock wave front and its propagation direction. These high-end nodes would also be awake in time to "catch" the trailing muzzle blast when it reaches them.

According to Wang (2009), it would be much more energy efficient for high-end nodes, which are capable of complex processing functions, to be awake only when a gunshot event occurs. Wang compares the BBN Boomerang II tactical anti-sniper system, which has a 25 W power consumption when fully turned on, with a MICA§ mote, which has a power consumption of 27 mW when fully turned on. When only RF listening is active on the Boomerang system it will only consume 12 mW (microphone array and localisation modules switched off). The low-end wireless sensor network is thus used to detect and localise the muzzle blast and then wake up the relevant high-end nodes.

2.4 CONCLUSION

This chapter reviewed the increasing illegal trade in endangered animal parts and why this might create a need for a system that can protect large habitats. The chapter also discussed some of the current methods, applications and strategies for gunshot detection and localisation. Combining the different aforementioned strategies might yield an optimal solution for protecting nature reserves.
§ www.polastre.com/papers/hotchips-2004-mote-table.pdf (Accessed 25 July 2012)

3 MATHEMATICAL MODELLING REVIEW

This chapter gives a review of the different mathematical methods found in literature that can be used in the implementation of gunshot detection and localisation. A brief review of system identification, adaptive filters and time delay estimation is given. Then the mathematical methods for gunshot detection that were used in the algorithms to obtain the results in chapter 5 are discussed.

3.1 SYSTEM IDENTIFICATION

A system can be described as some input that is changed or altered to produce a desired output (Ifeachor & Jervis, 2002). The same principle applies to electronic systems, where an input can be altered by a function to produce a desired output, as shown in Figure 3.1.

[Figure 3.1: Input X(S) changed by function H(S) to produce output Y(S)]

H(S) can be defined as the transfer function of the system that changes the input X(S) to the desired output Y(S). Convolution, amongst other things, describes how the input interacts with a system to produce an output (Ifeachor & Jervis, 2002). Equation (3.1) describes the output in terms of the multiplication of X(S) with H(S) in the Laplace domain:

$$Y(S) = X(S) \times H(S) \tag{3.1}$$

Equation (3.1) gives the relation between the input to the system, x(t), and its output, y(t), in the Laplace domain. The term system identification refers to the determination of h(t) (the impulse response) when it is unknown (Ifeachor & Jervis, 2002). If the impulse response and the output of the system are known, the procedure to obtain the input is known as deconvolution. For system identification, blind deconvolution can be used. Basic blind deconvolution is the process of determining the input from the output signal when the impulse response of the system is unknown, thus making it "blind" (Ifeachor & Jervis, 2002).
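The relation in equation (3.1) can be illustrated with a minimal Python sketch. It assumes a discrete-time stand-in for the Laplace-domain relation: the discrete Fourier transform replaces the Laplace transform, and the arrays `x` and `h` are made-up values, not data from this study. The sketch checks that convolving an input with an impulse response matches multiplying their transforms, and then recovers the input by ordinary (non-blind) deconvolution with a known `h`.

```python
import numpy as np

# Hypothetical example values; not data from this study.
x = np.array([0.0, 1.0, 0.5, -0.25, 0.0])   # input sequence x[k]
h = np.array([1.0, 0.6, 0.2])               # impulse response h[k]

# Time domain: the output is the convolution of input and impulse response.
y_time = np.convolve(x, h)

# Transform domain: Y = X * H (element-wise), after zero-padding to full length.
n = len(x) + len(h) - 1
y_freq = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

# Deconvolution with a known h: divide in the transform domain.
x_rec = np.fft.irfft(np.fft.rfft(y_time, n) / np.fft.rfft(h, n), n)[: len(x)]

print(np.allclose(y_time, y_freq))   # both routes give the same output
print(np.allclose(x_rec, x))         # the input is recovered when h is known
```

The division step only works here because the chosen `h` has no zeros at the transform frequencies; with noisy recordings or an unknown `h`, this direct approach is not sufficient, which is where blind deconvolution comes in.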
As shown in Figure 3.2, the required unknown source signal x(t) is passed through a system with impulse response h(t), so the measurable output is the convolution of x(t) and h(t). When little is known about the impulse response and about the temporal characteristics and statistics of the source signal, blind digital deconvolution can be used to recover the source signal, distorted by a linear system, from observations of the system's response only. In vector notation, the model of the linear input-output system is given by equation (3.2):

$$x(t) = \mathbf{h}^T \mathbf{s}(t) + N(t). \tag{3.2}$$

In equation (3.2), $\mathbf{s}(t)$ is the input sample vector, i.e. $\mathbf{s}(t) = [s(t);\ s(t-1);\ s(t-2);\ \dots;\ s(t-k+1)]^T$, where k is the number of entries in $\mathbf{h}$. N(t) is zero-mean additive noise that originates from many simultaneous sources or effects; these can be measurement errors, additive external disturbances, and sampling and round-off errors.

3.2 ADAPTIVE FILTERS

Signal detection plays an integral part in digital systems these days. It is the process of recovering a wanted or specific signal from a mixture of signals, such as noise or other unwanted signals. Methods of signal detection include adaptive filter techniques and various pattern matching methods used in conjunction with cancellation functions.

Adaptive filters can be used in a wide range of applications, from echo cancellation in cell phones to the filtering of ocular artefacts from the human EEG in the biomedical engineering field (Ifeachor & Jervis, 2002). With the increasing processing power of digital signal processing chips, more applications are using adaptive filters.

An adaptive filter is in essence a digital filter with self-adjusting coefficients. It adapts automatically to changes in its input signal. It has the capability to learn from an environment and then to adjust its output accordingly, converging to a final solution (Ifeachor & Jervis, 2002).
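The idea of self-adjusting coefficients can be sketched in a few lines of Python. The text does not name a specific adaptive algorithm, so the sketch below assumes the least mean squares (LMS) update, a common choice, used as a noise canceller. The function name `lms_noise_canceller`, the parameters `n_taps` and `mu`, and all signals are hypothetical and synthetic.

```python
import numpy as np

def lms_noise_canceller(primary, reference, n_taps=8, mu=0.01):
    """Subtract an adaptively filtered reference from the primary input (LMS)."""
    w = np.zeros(n_taps)                   # self-adjusting filter coefficients
    out = np.zeros(len(primary))
    for k in range(n_taps - 1, len(primary)):
        x_vec = reference[k - n_taps + 1 : k + 1][::-1]   # newest sample first
        noise_est = w @ x_vec              # filter output: estimate of the noise
        e = primary[k] - noise_est         # error = estimate of the clean signal
        w += 2 * mu * e * x_vec            # LMS coefficient update
        out[k] = e
    return out

rng = np.random.default_rng(0)
n = 5000
signal = np.zeros(n)
signal[4000:4040] = np.hanning(40)         # synthetic impulse, stands in for a gunshot
reference = rng.standard_normal(n)         # separately measured noise
noise = np.convolve(reference, [0.8, -0.4, 0.2])[:n]   # noise correlated with reference
primary = signal + noise                   # recorded signal: impulse plus noise

cleaned = lms_noise_canceller(primary, reference)

# Noise power in a stretch without the impulse, before and after cancelling.
before = np.mean(primary[3000:3900] ** 2)
after = np.mean(cleaned[3000:3900] ** 2)
print(after < before)
```

The step size `mu` trades convergence speed against stability; the value used here is only a guess that suits this synthetic example.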
3.2.1 NOISE CANCELLATION

An adaptive filter consists of two parts: the digital filter with adjustable filter weights, and an adaptive algorithm that is used to adjust the filter weights. Both $y_k$ and $x_k$ (from Figure 3.2) are applied simultaneously to the adaptive filter (Ifeachor & Jervis, 2002). The signal $y_k$ is the recorded signal containing both the gunshot impulse and the noise.

[Figure 3.2: Block diagram of an adaptive filter as a noise canceller. Inputs: $y_k = s_k + n_k$ (signal + noise) and $x_k$ (noise). Blocks: digital filter producing $\tilde{n}_k$ (noise estimate), summing node, adaptive algorithm. Output: $\hat{e}_k = \hat{s}_k$ (signal estimate).]

The signal $x_k$ is a measure of the contaminating signal, in this case the noise, and is correlated in some way with $n_k$. The signal $x_k$ is fed into the digital filter to produce an estimate of the noise, $\tilde{n}_k$, which can be subtracted from the signal corrupted with noise.

As long as the input noise $x_k$ remains correlated with the unwanted noise accompanying the desired signal in $y_k$, the adaptive filter adjusts its coefficients to reduce the difference between $n_k$ and $\tilde{n}_k$, thus removing the noise and resulting in a cleaner signal in $\hat{e}_k$. In this application, the error signal converges to the input data signal, rather than converging to zero (Matlab, 2002).

3.3 TIME DELAY ESTIMATION AND IMPULSE DETECTION USING GENERALISED CORRELATION

Time Delay Estimation (TDE) is a research field that has been in existence for a long time. It is one of the principles used in radar to detect objects at large distances from the radar. In this configuration, time delay estimation is used to determine how far the received pulse (the reflection from the object) is removed in time from the transmitted pulse. Knowing the properties of the signal used and of the medium it is sent through, one can then determine the distance between the object and the radar array.

3.3.1 TDE USING GENERALISED CORRELATION

The same principle can be applied to acoustic shot detection and localisation.
To determine the angle at which the gun was fired, relative to the position of the microphone array, the time delay or time difference between the gunshot impulses in the microphone channels must be calculated. Conventional time delay estimation uses generalised cross and auto correlation (Hertz, 1986).

Cross correlation in signal processing defines the degree of interdependence or similarity between two signals (Ifeachor & Jervis, 2002). In this application, the position of the auto correlation peak of a signal defines where the signal is in time, relative to the position of the peak of its cross correlation with another signal. The maximum of the correlation function indicates where the two signals correlate the most.

If there are three signals, i.e. $x_1[k]$, $x_2[k]$ and $x_3[k]$, the cross correlation of the first two signals can be obtained with equation (3.3) (Ifeachor & Jervis, 2002),

$$r_{12} = \frac{1}{N} \sum_{k=0}^{N-1} x_1[k]\, x_2[k+\tau], \tag{3.3}$$

where N is the number of samples in the current data window. The cross correlation of the last two signals, and of the first and last signals, can be obtained in the same way, by replacing $x_1[k]$ and $x_2[k]$ in equation (3.3) with $x_2[k]$ and $x_3[k]$ for the second correlation, and with $x_1[k]$ and $x_3[k]$ respectively for the third correlation.

The next step is to do an autocorrelation on all three signals. Taking $x_1[k]$ as an example, the auto correlation is given by

$$r_{11} = \frac{1}{N} \sum_{k=0}^{N-1} x_1[k]\, x_1[k+\tau]. \tag{3.4}$$

The same must be calculated for $x_2[k]$ and $x_3[k]$ with equation (3.4) (Ifeachor & Jervis, 2002). Once all six maxima from the correlations are available, the delays between the three signals can be obtained by subtracting the sample indices of the maxima. Because the signals are digital, the delay is measured in samples; knowing the sample rate, the delay in milliseconds can then be obtained.
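The procedure above can be sketched in Python for two channels. This is a minimal illustration under stated assumptions: the sampling frequency `fs`, the delay `true_delay` and the synthetic pulse are invented for the example, and NumPy's `np.correlate` is used to evaluate the sums in equations (3.3) and (3.4). The delay is taken as the difference between the index of the autocorrelation peak and the index of the cross correlation peak, and is converted to time by dividing by the sampling frequency.

```python
import numpy as np

fs = 10_000                      # assumed sampling frequency in Hz
true_delay = 25                  # assumed delay of channel 2 in samples
n = 512                          # samples in the data window

# Synthetic impulse: a short decaying oscillation standing in for a gunshot.
t = np.arange(40)
pulse = np.exp(-t / 8.0) * np.sin(2 * np.pi * t / 6.0)

x1 = np.zeros(n)
x2 = np.zeros(n)
x1[100:140] = pulse
x2[100 + true_delay:140 + true_delay] = pulse   # same pulse, arriving later

# np.correlate(a, v, "full")[i] sums a[k + tau] * v[k], with tau = i - (len(v) - 1).
r11 = np.correlate(x1, x1, mode="full") / n     # autocorrelation, equation (3.4)
r12 = np.correlate(x2, x1, mode="full") / n     # cross correlation, equation (3.3)

k11max = int(np.argmax(r11))     # index of the autocorrelation peak
k12max = int(np.argmax(r12))     # index of the cross correlation peak

s_d = k11max - k12max            # sample delay
t_d = s_d / fs                   # time delay in seconds

print(s_d, t_d * 1e3)            # delay in samples and in milliseconds
```

With this ordering of the subtraction, a negative result means that the impulse reaches the second channel later than the first; only the magnitude and a consistent sign convention matter for the later angle calculation.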
Let $s_d$ be the sample delay, and let $k_{11\max}$ and $k_{12\max}$ be the time instants where $r_{11}[k]$ and $r_{12}[k]$ respectively have their maxima. Then

$$s_d = k_{11\max} - k_{12\max}. \tag{3.5}$$

The time delay can then be calculated as

$$T_d = \frac{s_d}{F_s}, \tag{3.6}$$

where $F_s$ is the sampling frequency of the signal.

3.3.2 PULSE DETECTION USING GENERALISED CROSS CORRELATION

Generalised Cross Correlation (GCC) can also be used to detect a specific or wanted pulse in a signal. The first step is to create a template of the pulse to be detected in the signal (Ifeachor & Jervis, 2002). Then, using equation (3.3) with $x_1[k]$ as the template pulse and $x_2[k]$ as the signal, the maximum of $r_{12}[k]$ occurs where the template pulse and the signal correlate the most. The position of the maximum of $r_{12}[k]$ is then where the wanted pulse is located in the signal.

3.4 LEAST SQUARES

The term Least Squares (LS) describes an approach to solving over-determined or inexactly specified systems of equations in an approximate sense. Instead of solving the equations exactly, we seek only to minimise the sum of the squares of the residuals. A residual is the difference between an observed value and the fitted value provided by a model (Moler, 2008).

In radar environments that incorporate pulse compression, a matched filter is employed on the received pulse to maximise the signal-to-noise ratio. The transmitted waveforms are chosen to have an auto-correlation function with a narrow peak at zero time shift and values as low as possible at all other times. These low values are called sidelobes. Sidelobes have the undesirable effect of masking smaller objects if they are in the same proximity as larger objects (Cilliers and Smit, 2007).
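The template-based detection of section 3.3.2 and the sidelobe behaviour just described can be seen together in a small Python sketch. Everything in it is an assumption made for illustration: the template is a synthetic decaying oscillation, the noise is white Gaussian with a fixed seed, and `true_pos` is an arbitrary position. The template is slid along the signal, the position of the correlation maximum is taken as the detected pulse position, and the largest correlation value away from the peak is reported as the largest sidelobe.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic template pulse (stands in for a recorded gunshot template).
t = np.arange(40)
template = np.exp(-t / 8.0) * np.sin(2 * np.pi * t / 6.0)

# Noisy signal with the pulse hidden at a known position.
true_pos = 300
signal = 0.05 * rng.standard_normal(1000)
signal[true_pos:true_pos + len(template)] += template

# r[i] = sum over k of template[k] * signal[k + i]
r = np.correlate(signal, template, mode="valid")

detected_pos = int(np.argmax(r))       # where template and signal correlate most
peak = r[detected_pos]

# Everything except the peak sample itself counts as sidelobe here.
mask = np.ones(len(r), dtype=bool)
mask[detected_pos] = False
largest_sidelobe = np.max(np.abs(r[mask]))

print(detected_pos)
print(peak / largest_sidelobe)         # peak-to-sidelobe ratio
```

For an oscillating template like this one the ratio is modest, because the template also correlates fairly well with shifted copies of itself. That is the behaviour a sidelobe minimisation filter tries to suppress.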
A technique of sidelobe minimisation will be extended to gunshot detection: sound from the microphones will be matched with pre-existing gunshot sound templates, using a least squares algorithm proposed by Cilliers and Smit (2007), to detect a valid gunshot.

3.4.1 LEAST SQUARES SIDELOBE MINIMISATION

The output $\mathbf{b}$ of the detection filter, which is the convolution sequence of the gunshot sound from the microphones and the detection filter, can be written in matrix form as

$$\mathbf{b} = \mathbf{A}_F \mathbf{x}, \tag{3.7}$$

where

$$\mathbf{b} = [b_1 \;\; b_2 \;\; \dots \;\; b_{2N-1}]^T, \tag{3.8}$$

$$\mathbf{x} = [x_1 \;\; x_2 \;\; \dots \;\; x_N]^T, \tag{3.9}$$

and

$$\mathbf{A}_F = \begin{bmatrix} a_1 & 0 & \cdots & 0 \\ a_2 & a_1 & \ddots & \vdots \\ \vdots & a_2 & \ddots & 0 \\ a_N & \vdots & \ddots & a_1 \\ 0 & a_N & \ddots & a_2 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & a_N \end{bmatrix}. \tag{3.10}$$

In equation (3.10), $\mathbf{A}_F$ is the full convolution matrix, and $^T$ denotes the transpose of a vector or matrix. The formulation in equations (3.7) to (3.10) leads to the following expression for the sum-of-squares of the convolution sequence (Cilliers and Smit, 2007):

$$\mathbf{b}^H \mathbf{b} = \lVert b_1 \rVert^2 + \lVert b_2 \rVert^2 + \dots + \lVert b_{2N-1} \rVert^2. \tag{3.11}$$

The complex conjugate transpose is denoted by $^H$. According to Cilliers and Smit (2007), the sidelobe measure cost function of the gunshot pulse can now be formulated by defining a new matrix, $\mathbf{A}$, which is the same as $\mathbf{A}_F$ except that the rows of $\mathbf{A}_F$ that produce the gunshot peak are removed. With $\mathbf{b} = \mathbf{A}\mathbf{x}$ now containing only the sidelobe samples, the sidelobe measure cost function to be minimised can be written as

$$f(\mathbf{x}) = \mathbf{b}^H \mathbf{b} = \mathbf{x}^H \mathbf{A}^H \mathbf{A} \mathbf{x} = \mathbf{x}^H \mathbf{C} \mathbf{x}, \tag{3.12}$$

where

$$\mathbf{C} = \mathbf{A}^H \mathbf{A}. \tag{3.13}$$

Using the method of Lagrange multipliers, a solution for $\mathbf{x}$ can be found that minimises the sidelobe measure cost function while satisfying the constraint that a gunshot pulse peak with amplitude $b_{peak}$ must be produced (Cilliers and Smit, 2007). This constraint can be written as

$$\mathbf{a}\mathbf{x} = b_{peak}, \tag{3.14}$$

where

$$\mathbf{a} = [a_N \;\; a_{N-1} \;\; \dots \;\; a_1]. \tag{3.15}$$

This leads to the constraint function

$$g(\mathbf{x}) = \mathbf{a}\mathbf{x} - b_{peak}. \tag{3.16}$$

No symmetry constraints are placed on the filter response, and no constraint is placed on the samples adjacent to the peak sample.
The samples adjacent to the peak allow the optimisation process to force energy from the sidelobes into these two samples. According to Cilliers and Smit (2007), this gives the optimisation algorithm the freedom to widen the gunshot pulse peak to achieve a lower and flatter sidelobe response.

A system of simultaneous equations arises from the complex Lagrangian,

$$\frac{d}{d\mathbf{x}}\big(f(\mathbf{x})\big) + \frac{d}{d\mathbf{x}}\big(\mathrm{Re}\{\lambda g(\mathbf{x})\}\big) = 0, \tag{3.17}$$

and the constraint given in equation (3.16). This extended system of simultaneous equations can now be solved to obtain the value of $\mathbf{x}$ that minimises the sidelobe measure. The closed form solution for $\mathbf{x}$ is given by equation (3.18):

$$\mathbf{x} = \frac{b_{peak}\, \mathbf{C}^{-1} \mathbf{a}^H}{\mathbf{a}\, \mathbf{C}^{-1} \mathbf{a}^H}. \tag{3.18}$$

This solution for $\mathbf{x}$ produces a mismatched receive filter for the gunshot template pulse $\{a_n\}$ that minimises the gunshot pulse sidelobes in the least-squares sense (Cilliers and Smit, 2007).

3.5 REPRODUCING KERNEL HILBERT SPACES

In recent years kernel-based algorithms have become the state-of-the-art methods for many machine learning problems. The common feature of these methods is that they are based on an optimisation problem over a Reproducing Kernel Hilbert Space (RKHS) (Steinwart, Hush and Scovel, 2006).

Let $X$ be a non-empty set. A function $k : X \times X \to \mathbb{K}$ is called a kernel on $X$ if there exists a $\mathbb{K}$-Hilbert space $H$ and a map $\Phi : X \to H$ such that for all $x, x' \in X$ we have

$$k(x, x') = \langle \Phi(x'), \Phi(x) \rangle. \tag{3.19}$$

$\Phi$ is a feature map and $H$ is a feature space of $k$. Now let $H$ be a Hilbert function space over $X$, that is, a Hilbert space which consists of functions mapping from $X$ into $\mathbb{K}$. The space $H$ is called an RKHS over $X$ if for all $x \in X$ the Dirac functional $\delta_x : H \to \mathbb{K}$, defined by $\delta_x(f) := f(x),\ f \in H$, is continuous (Steinwart, Hush & Scovel, 2006). A function $k : X \times X \to \mathbb{K}$ is called a reproducing kernel of $H$ if we have $k(\cdot, x) \in H$ for all $x \in X$ and the reproducing property (where $\langle \cdot, \cdot \rangle$ denotes the inner product),

$$f(x) = \langle f, k(\cdot, x) \rangle, \tag{3.20}$$

holds for all $f \in H$ and all $x \in X$.
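The kernel definition in equation (3.19) can be checked numerically with a short Python sketch. As an illustrative choice that is not fixed by the definition itself, the sketch uses a second-order polynomial kernel together with an explicit feature map built from a Kronecker product; the names `feature_map` and `kernel` and the two test vectors are hypothetical. Evaluating the kernel directly and taking the inner product of the mapped vectors should give the same number.

```python
import numpy as np

def feature_map(x):
    """Phi(x) = [1, x] kron [1, x]: an explicit feature map (illustrative choice)."""
    u = np.concatenate(([1.0], x))
    return np.kron(u, u)

def kernel(x, z):
    """k(x, z) = (1 + x.z)^2: a second-order polynomial kernel."""
    return (1.0 + x @ z) ** 2

# Two arbitrary example vectors; not data from this study.
x = np.array([0.5, -1.0, 2.0])
z = np.array([1.5, 0.25, -0.5])

lhs = kernel(x, z)                       # kernel evaluated in the input space
rhs = feature_map(z) @ feature_map(x)    # inner product in the feature space

print(lhs, rhs)                          # the two values agree
```

The kernel evaluation works on vectors of length 3, while the feature map produces vectors of length 16. Working with the kernel avoids building the larger vectors, which is the practical appeal of kernel methods.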
3.5.1 NON-LINEAR TEMPLATE MATCHING FRAMEWORK

Van Wyk, van Wyk, and Noel (2004) propose that if an input-output signal $\tilde{F}$ containing a gunshot that has to be identified belongs to an RKHS $H_n$, and provided that we are given a set of test input-output pairs

$$\{(\mathbf{x}_i \in \Re^N,\ y_i)\}_{i=1}^{m}, \tag{3.21}$$

where $\mathbf{x}_i$, $i = 1, \dots, m$, are linearly independent elements of $\Re^N$, the problem has a unique minimum norm solution expressed by

$$\tilde{F}(\mathbf{x}) = \sum_{i=1}^{m} C_i K(\mathbf{x}_i, \mathbf{x}), \tag{3.22}$$

where $K(\mathbf{x}_i, \cdot)$ is a reproducing kernel of the Hilbert space $H_n$. The coefficients $C_i$ in equation (3.22) are given by equation (3.23),

$$\mathbf{C} = \mathbf{G}^{-1} \mathbf{y}, \tag{3.23}$$

where

$$\mathbf{C} = (C_1, \dots, C_m)^T, \quad \mathbf{y} = (y_1, \dots, y_m)^T, \tag{3.24}$$

and the Gram matrix $\mathbf{G}$ is given by

$$\mathbf{G} = (G_{ij}), \tag{3.25}$$

where

$$G_{ij} = K(\mathbf{x}_i, \mathbf{x}_j), \quad i, j = 1, \dots, m. \tag{3.26}$$

3.5.2 TEST INPUT-OUTPUT PAIRS

For the Desirable Gunshot Templates (DGT) (valid gunshot sounds), the $y_i$ in equation (3.21) are usually chosen equal to some positive value, for instance $\gamma$. The rest of the $y_i$ values, for the Undesirable Gunshot Templates (UGT) (not gunshot sounds), would be set to $\alpha$, where $\alpha$ would normally be 0 (Van Wyk, van Wyk, and Noel, 2004).

3.5.3 REPRODUCING KERNEL TYPES

According to Van Wyk, van Wyk, and Noel (2004), kernel types could include the following kernels: the linear kernel,

$$K(\mathbf{x}, \mathbf{z}) = \mathbf{x}^T \mathbf{z}, \tag{3.27}$$

and the polynomial kernel,

$$K(\mathbf{x}, \mathbf{z}) = (1 + \mathbf{x}^T \mathbf{z})^d, \quad d \geq 1. \tag{3.28}$$

3.5.4 MINIMUM NORM TEMPLATES

A Minimum Norm Template (MNT) can be inferred once the interpolation coefficients are obtained. According to Van Wyk, van Wyk, and Noel (2004), if a linear kernel is used the MNT has the form

$$\tilde{\mathbf{x}} = \sum_{i=1}^{m} C_i \mathbf{x}_i, \tag{3.29}$$

and $K(\tilde{\mathbf{x}}, \cdot)$ will satisfy

$$K(\tilde{\mathbf{x}}, \mathbf{x}_i) = \begin{cases} \gamma & \text{for } i = 1, \dots, k \\ \alpha & \text{for } i > k \end{cases}. \tag{3.30}$$

For a second order polynomial kernel it can be shown that

$$\tilde{\mathbf{x}} = \sum_{i=1}^{m} C_i \tilde{\mathbf{x}}_i, \tag{3.31}$$

where $\tilde{\mathbf{x}}_i = [[1 \;\; \mathbf{x}_i^T] \otimes [1 \;\; \mathbf{x}_i^T]]^T$ and $\otimes$ denotes the Kronecker tensor product. As for the linear kernel,

$$K(\tilde{\mathbf{x}}^T, \tilde{\mathbf{x}}_i) = \begin{cases} \gamma & \text{for } i = 1, \dots, k \\ \alpha & \text{for } i > k \end{cases}. \tag{3.32}$$

Thus for the matching process the MNT should first be calculated offline using a gunshot training template. The interpolation coefficients needed to construct the MNT are obtained by inverting the Gram matrix. Once the RKHS network has been trained on the training data, the template $\tilde{\mathbf{x}}$ in equation (3.31) is used to calculate the ith output of the RKHS network as $\tilde{\mathbf{x}}^T \tilde{\mathbf{x}}_i$ (Van Wyk, van Wyk, and Noel, 2004).

3.6 SUPPORT VECTOR MACHINES

The Support Vector Machine (SVM) is a powerful methodology for solving problems in nonlinear classification, density estimation and function estimation, and has also led to many other developments in kernel based learning methods. SVMs were introduced within the context of structural risk minimisation and statistical learning theory. The SVM methodology solves convex optimisation problems, usually with quadratic programming. Least Squares Support Vector Machines (LS-SVMs) are reformulations of standard SVMs that lead to solving linear Karush-Kuhn-Tucker (KKT) systems. LS-SVMs are closely related to Gaussian processes and regularisation networks but also accentuate and exploit primal-dual interpretations (De Brabanter et al., 2011).

Originally developed for binary classification problems, the key concept of an SVM is the use of hyperplanes to define decision boundaries that separate the data points of different classes. SVMs can be used for linear classification tasks, and also for more complex nonlinear classification problems. The concept behind SVMs is to map the original data points from the input space to a higher dimensional or infinite dimensional feature space, in such a way that the classification problem becomes simpler in the feature space (Lutsa, et al., 2010).

3.6.1 LINEAR SUPPORT VECTOR MACHINES

Figure 3.3 shows a maximum-margin hyperplane (H3) and margins (two dotted lines, H1 and H2, parallel to H3) for an SVM trained with samples from two classes (Burges, 1998).
This example is a linearly separable case. Samples on the margin are called the support vectors.

Figure 3.3: Maximum-margin hyperplane and margins for an SVM (axes $x_1$ and $x_2$; hyperplanes H1, H2 and H3; normal vector $\mathbf{w}$; offset $-b/\|\mathbf{w}\|$; margin)

The support vectors are circled in Figure 3.3. Given a set of training data points of the form

\[ \{\mathbf{x}_i, y_i\}, \quad 1 \leq i \leq l, \quad y_i \in \{-1, 1\}, \quad \mathbf{x}_i \in \Re^{n}, \tag{3.33} \]

the $y_i$ (where $y_i$ is either 1 or −1) indicates the class to which $\mathbf{x}_i$ belongs. The maximum-margin hyperplane divides the points having $y_i = 1$ from those having $y_i = -1$ (Burges, 1998). The points $\mathbf{x}$ which lie on the hyperplane satisfy

\[ \mathbf{w} \cdot \mathbf{x} + b = 0. \tag{3.34} \]

The vector $\mathbf{w}$ is a normal vector perpendicular to the hyperplane, and $|b| / \|\mathbf{w}\|$ determines the offset of the hyperplane from the origin along $\mathbf{w}$ (Burges, 1998). We want to choose $\mathbf{w}$ and $b$ to maximise the distance between the parallel hyperplanes, while still separating the data. These hyperplanes can be described by the equations

\[ \mathbf{w} \cdot \mathbf{x}_i + b \geq 1 \quad \text{for } y_i = 1, \tag{3.35} \]

\[ \mathbf{w} \cdot \mathbf{x}_i + b \leq -1 \quad \text{for } y_i = -1. \tag{3.36} \]

These can then be combined into one set of inequalities

\[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \geq 0 \quad \forall i. \tag{3.37} \]

If the training data is linearly separable, as in the previously mentioned case, then we can find the pair of hyperplanes which gives the maximum margin by minimising $\|\mathbf{w}\|^{2}$ subject to the constraints of equation (3.37) (Burges, 1998). The distance between the hyperplanes is then

\[ \frac{2}{\|\mathbf{w}\|}. \tag{3.38} \]

Thus the primal form becomes

\[ \text{minimise (in } \mathbf{w}, b\text{)} \quad \frac{1}{2}\|\mathbf{w}\|^{2} \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \geq 0 \quad \forall i. \tag{3.39} \]

The classification rule can be written in its unconstrained dual form, which shows that the maximum margin hyperplane is only a function of the support vectors. Thus, we introduce positive Lagrange multipliers $\alpha_i$, $1 \leq i \leq l$, one for each of the inequality constraints in equation (3.37) (Burges, 1998).
The dual form of the SVM can be shown to be the following optimisation problem:

\[ \text{maximise (in } \alpha_i\text{)} \quad \sum_{i=1}^{l} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, \mathbf{x}_i \cdot \mathbf{x}_j \tag{3.40} \]

\[ \text{subject to} \quad \alpha_i \geq 0, \quad \sum_{i=1}^{l} \alpha_i y_i = 0. \tag{3.41} \]

3.6.2 NON-LINEAR SUPPORT VECTOR MACHINES

SVMs can be extended to non-linear cases using kernel spaces, by replacing the dot product with a non-linear kernel function. This allows the algorithm to fit the maximum-margin hyperplane in the transformed feature space. Figure 3.4 illustrates the mapping from input space to higher dimensional feature space (Lutsa, et al., 2010).

Figure 3.4: Mapping of non-linear input space to higher dimensional feature space

According to Lutsa, et al. (2010) non-linear SVM classifiers take the form

\[ f(\mathbf{x}) = \mathrm{sign}\left( \sum_{i=1}^{\#SV} \alpha_i y_i K(\mathbf{x}, \mathbf{x}_i) + b \right), \tag{3.42} \]

where #SV represents the number of support vectors and $K(\cdot, \cdot)$ is the kernel function. According to Burges (1998) and Lutsa, et al. (2010) some kernel functions include

\[ \text{linear:} \quad K(\mathbf{x}, \mathbf{z}) = \mathbf{x}^{T}\mathbf{z}, \tag{3.43} \]

\[ \text{polynomial (of degree } d\text{):} \quad K(\mathbf{x}, \mathbf{z}) = (1 + \mathbf{x}^{T}\mathbf{z})^{d}, \quad d \geq 1, \tag{3.44} \]

\[ \text{and a radial basis function:} \quad K(\mathbf{x}, \mathbf{z}) = \exp(-\gamma \|\mathbf{x} - \mathbf{z}\|^{2}) \quad \text{for } \gamma \geq 0. \tag{3.45} \]

3.6.3 LEAST SQUARE SUPPORT VECTOR MACHINES

Least Squares SVMs simplify the formulation by replacing the inequality constraint in SVMs with an equality constraint. Suykens and Vandewalle (1999) proposed to modify the SVM methodology by introducing a least squares loss function and equality instead of inequality constraints. The LS-SVM solution is obtained from a set of linear equations, rather than by solving a quadratic programming problem. The LS-SVM methodology significantly reduces the computational effort and complexity. The LS-SVM classifier optimises the following problem (Lutsa, et al., 2010)

\[ \text{minimise (in } \mathbf{w}, \mathbf{e}, b\text{)} \quad \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + \gamma\, \frac{1}{2} \sum_{i=1}^{N} e_i^{2}, \tag{3.46} \]

\[ \text{subject to} \quad y_i(\mathbf{w}^{T}\varphi(\mathbf{x}_i) + b) = 1 - e_i, \quad i = 1, \ldots, N, \tag{3.47} \]

where $\mathbf{e} = [e_1\ e_2\ \ldots\ e_N]^{T}$ is a vector of error variables to tolerate misclassifications, and $\varphi(\cdot) : \Re^{n} \to \Re^{n_h}$ is a mapping from the input space into a high-dimensional feature space of dimension $n_h$. The vector $\mathbf{w}$ is of the same dimension as $\varphi$, $\gamma$ is a positive regularisation constant and $b$ is a bias term. The primal problem is expressed in terms of the feature map and the dual problem in terms of the kernel function. The resulting classifier in the dual space is similar to the standard SVM classifier according to Lutsa, et al. (2010) and is given by

\[ f(\mathbf{x}) = \mathrm{sign}\left( \sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}, \mathbf{x}_i) + b \right), \tag{3.48} \]

where $K(\mathbf{x}, \mathbf{x}_i) = \varphi(\mathbf{x})^{T}\varphi(\mathbf{x}_i)$ is the kernel function and the $\alpha_i$ are the Lagrange multipliers. The errors of the corresponding training data points are proportional to the support values $\alpha_i$. This usually implies that every training data point is a support vector and that no sparseness property remains in the LS-SVM formulation. According to Lutsa, et al. (2010), high support values indicate a high contribution of the training data point to the decision boundary.

3.7 CONCLUSION

Chapter 3 discussed the mathematical modelling found in literature that can be used for the implementation of a gunshot detection and localisation system. It also reviewed the different detection algorithms (GCC, LS, RKHS and SVM) that were used in this project.

4 EXPERIMENTAL SETUP

Two sessions of gunshot sound recordings were undertaken. The first session was a general gunshot sound recording at Swartkops military shooting range in Pretoria to obtain preliminary gunshot data. The second session of gunshot sound recordings (real environmental data) took place on a game farm in the northern region of South Africa near Mussina.

4.1 SOUND RECORDING EQUIPMENT SETUP

The 3 microphones were placed in a star topology at a 120 degree angle from one another. The cables connecting the microphones to the amplifier were 2 metres in length.
Figure 4.1 shows the setup of the recording equipment used in both of the recording sessions.

Figure 4.1: Setup of recording equipment (microphones Mic1, Mic2 and Mic3 spaced 120° apart, a north marker (N), the incoming gunshot wave front, and the amplifier (Amp) feeding the ADC + PC)

The amplified signals were recorded with PCI A/D cards from National Instruments (NI) at different sampling frequencies, varying from 2 kHz to 40 kHz. The predominant sampling frequencies were 10 kHz and 20 kHz.

4.2 LABVIEW EXPERIMENTAL PREPARATION

Figures A.1 to A.6 in Appendix A show screenshots of a Labview program that implements a generalised cross correlation method for signal detection and direction finding. The data used was obtained during the first gunshot sound recording session (preliminary data). The program mixes uniform white noise with a recorded signal before it reaches the processing stage, which calculates the angle of the gunshot. The noise was added to test the detection accuracy of the GCC (Generalised Cross Correlation) algorithm in low signal-to-noise ratio conditions.

Figure A.1 shows input signals with a 1.45 V peak value mixed with a generated noise signal of 0.10 V peak-to-peak. As seen in the correlation graphs of Figure A.2, the generalised correlation calculations give the time delay estimates, from which the angles of the gunshot impulse are calculated. The maxima of the correlation graphs are clearly visible.

Figure A.3 shows the boundary where the noise and the peak values of the signal are the same and the signal starts to get buried in the noise. The peak value of the signal is 0.10 V and the noise is 0.10 V peak-to-peak.

As can be seen in Figure A.4, the calculated angles stay the same as in the previous example, but the correlation graphs start to falter and the maxima are not as clearly visible.

Figure A.5 shows the gunshot impulses buried in the noise.
The maximum value of the impulses is 0.04 V and that of the noise is 0.10 V. Because the signal is now buried in the noise, the correlation graphs (Figure A.6) show that the maxima of the correlation calculations disappear and the angle values become erroneous.

4.3 REAL ENVIRONMENT DATA GATHERING

This section describes how the gunshot data was gathered on a game farm close to Mussina in South Africa. A game farm was chosen because the project is intended to be deployed in a similar environment. The same setup as shown in Figure 4.1 was used for the recording of the gunshot data. Gunshots were fired and recorded at different distances from the microphone array. The distances between the gunfire and the microphone array were measured using a GPS (Global Positioning System) navigation device from Garmin**.

** http://www.garmin.com [accessed 1 November 2008]

Figure 4.2: Top view of different positions of the gunshots fired relative to the microphone array (positions of gunfire: A 500m, B 1000m, C 1500m, D 1700m)

Figure 4.2 shows the different positions of the gunshots fired relative to the microphone array, where position A is 500m and position D is 1700m away from the microphone array. Gunshots from a Mauser rifle (large calibre) and a Pistol (medium calibre) were mainly used in the recordings.

Figure 4.3: Side view of gunshot positions showing the surface curvature of the game farm (A: gunshots at 500m, B: gunshots at 1000m, C: gunshots at 1500m, D: gunshots at 1700m)

Figure 4.3 shows the side view of the gunshot positions on the game farm in Mussina. Position C, which is 1500m away from the microphone array, lies in a surface dip. Thus the direct path of the gunshot sound to the microphone array is obscured.
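The noise experiment of section 4.2 can be illustrated with a small time delay estimate between two microphone channels. This is only a sketch under stated assumptions: it uses plain, unweighted cross correlation (the simplest member of the GCC family) rather than the Labview implementation, and the pulse shape, the 7-sample delay and all function names are hypothetical. The 1.45 V peak, the 0.10 V peak-to-peak uniform noise and the 10 kHz sampling rate are the values quoted in the text.

```python
import random


def cross_correlate(a, b, max_lag):
    """Return {lag: sum over n of a[n + lag] * b[n]} for |lag| <= max_lag."""
    corr = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n, b_n in enumerate(b):
            m = n + lag
            if 0 <= m < len(a):
                total += a[m] * b_n
        corr[lag] = total
    return corr


def estimate_delay(a, b, max_lag):
    """Lag in samples at which a best matches b; positive means a lags b."""
    corr = cross_correlate(a, b, max_lag)
    return max(corr, key=corr.get)


random.seed(0)                         # reproducible noise
fs = 10_000                            # sampling frequency in Hz
pulse = [1.45, -1.0, 0.6, -0.3, 0.1]   # hypothetical impulse with a 1.45 V peak
true_delay = 7                         # hypothetical delay between microphones, in samples
noise_pp = 0.10                        # uniform white noise, 0.10 V peak-to-peak


def noisy_channel(onset, length=400):
    """One microphone channel: uniform noise plus the pulse starting at `onset`."""
    channel = [random.uniform(-noise_pp / 2, noise_pp / 2) for _ in range(length)]
    for k, p in enumerate(pulse):
        channel[onset + k] += p
    return channel


mic1 = noisy_channel(100)
mic2 = noisy_channel(100 + true_delay)

lag = estimate_delay(mic2, mic1, max_lag=20)   # delay of mic2 relative to mic1
delay_seconds = lag / fs
```

With three microphones, the pairwise delays obtained in this way are the quantities from which an angle of arrival would be computed. As the pulse amplitude is reduced towards the noise level, the correlation maximum becomes less distinct, which is the behaviour described for Figures A.4 and A.6.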
4.4 CONCLUSION

Chapter 4 gave an overview of the sound recording equipment setup used for the recordings and of the experimental preparation that was done in Labview. The chapter then discussed how the data for this project was recorded, at different distances with 2 types of guns, on a game farm close to Mussina in South Africa.

5 RESULTS

This chapter discusses the results obtained from the various signal detection algorithms. It starts with a brief description of how the data used in this chapter was obtained. Power spectrum estimates of the Mauser and Pistol gunshots recorded at different distances are then shown. The chapter continues by showing and discussing the results obtained from the GCC method, followed by the trainable template matching section. A brief tabled summary then compares the accuracy of the 4 algorithms (GCC, LS, RKHS and SVM). Lastly the execution time of each algorithm is estimated and compared in a table.

5.1 OVERVIEW OF DATA RECORDING AND PRE-PROCESSING PROCEDURES

The data used in this chapter was obtained by the following method:
• Gunshots at various distances were recorded with microphones connected to a 3 channel amplifier through a LPF.
• The amplified signals were then digitised with the NI PCI cards, primarily at sampling rates of 10 kHz and 20 kHz.
• The samples were saved to the hard drive of a PC in PCM format.
• Characteristics of each recorded gunshot event were saved to a spreadsheet file.
• The recordings were imported into Matlab and saved as a 3 dimensional array (1 dimension for each channel) in the Matlab workspace.
• The different recordings were resampled to 5 kHz to increase the speed of execution of the different detection algorithms. Most of the energy of a gunshot that is detectable at large distances away from the microphones resides below 2 kHz (the reader is referred to sections 5.2.1 and 5.2.2).
• All the recordings (test signals) and templates (used for training) were normalised to a maximum value of 1. Thus there are no explicit SI units stated on the y-axis of the graphs of sections 5.3 to 5.4. The y-axis units are therefore per-unit values.

Figure 5.1 shows an example of a 3 channel recording (from the game farm in Mussina) of a gunshot fired 1000m away from the recording equipment, imported and plotted in Matlab. This recording was made at a sampling rate of 20 kHz.

Figure 5.1: Example of a 3 channel recording of gunshots fired 1000m away

5.2 FREQUENCY SPECTRUM ANALYSIS

The following two sections show the frequency response of the microphones and low pass filter used for the gunshot recordings.

5.2.1 MAUSER POWER SPECTRUM ESTIMATE

Figure 5.2 to Figure 5.5 illustrate the frequency spectra of the Mauser gunshots. The further away from the recording equipment the guns are fired, the less power there is in the higher frequencies. There is an estimated 5 dB drop per 1000 Hz for every 500m the Mauser is fired further away.

Figure 5.2: Power spectrum estimate of a Mauser gunshot at 500m away from microphones

Figure 5.3: Power spectrum estimate of a Mauser gunshot at 1000m away from microphones

Figure 5.4: Power spectrum estimate of a Mauser gunshot at 1500m away from microphones

Figure 5.5: Power spectrum estimate of a Mauser gunshot at 1700m away from microphones

The power spectrum density (PSD) estimates in Figures 5.2 to 5.5 show an increase in power in the 7 kHz to 8 kHz band.

Figure 5.6 shows a power spectrum estimate of an audio stream without any gunshot impulse in the signal.
Figure 5.6: PSD of audio stream with no recorded gunshot

Thus the increase in power in the 7 kHz to 8 kHz band is not from the gunshot impulse, but might rather be a characteristic of the microphone used.

5.2.2 PISTOL POWER SPECTRUM ESTIMATE

Figure 5.7 shows the frequency spectra of the Pistol gunshots. The further away from the recording equipment the gun is fired, the less power there is in the higher frequencies. In the case of the Pistol PSD, there is an estimated 8 dB to 10 dB drop per 1000 Hz for every 500m the Pistol is fired further away.

Figure 5.7: Power spectrum estimate of a Pistol gunshot at 500m and 1000m away from microphones

5.2.3 POWER SPECTRUM ESTIMATE CONCLUSION

From the analysis of the power spectrum estimates it can be concluded that all of the power of the recorded signals resides in the lower part of the spectrum, between 0 kHz and 3 kHz. This result is expected, since the low pass filter that was used was designed to pass only frequencies up to 2.5 kHz. By the Nyquist criterion the sampling frequency must be at least twice the highest frequency present, so the lowest sampling frequency that can be used without losing any information in the signals or recordings is 5 kHz. This is useful for making complex impulse detection algorithms faster, because fewer samples are needed to calculate a result.

5.3 GENERAL CROSS CORRELATION

The results from the GCC algorithm are discussed in the following section.

5.3.1 TEMPLATE GENERATION

Pre-constructed templates of gunshots are needed to identify gunshot impulses in the audio stream of the recording microphones. The gunshot data used in the following sections was recorded in the second recording session on the game farm. The pre-constructed templates in Figure 5.8 and Figure 5.9 were constructed by taking gunshot waveforms recorded at the same distance from the fired gun, and then combining them using visual and cross correlating overlap-and-averaging methods. The template constructed for the pistol consists of samples taken from pistol gunshots fired 1000m away from the microphone array.
Lastly the templates were normalised to a maximum value of 1. Figure 5.8 shows the template created for the Pistol gunshot.

Figure 5.8: Template of Pistol gunshot

Figure 5.9 shows the template created for the Mauser gunshot. The template constructed for the Mauser consists of samples taken from Mauser gunshots fired 1500m away from the microphone array.

Figure 5.9: Template for Mauser Gunshot

5.3.2 ANALYSIS

This section shows the output of the cross correlation algorithm at various distances. The recorded signals are cross correlated with the templates shown in Figure 5.8 and Figure 5.9. All the sound recordings used for the cross correlation algorithm were normalised to a maximum value of 1 (by dividing by the maximum) before applying the algorithm.

5.3.2.1 Cross Correlation of Pistol sound 500m away from array

Figure 5.10 (a) shows a recording of two pistol gunshots and speech waveforms. Figure 5.10 (b) shows the output of the cross correlation algorithm. This output was obtained by cross correlating the recording with the pistol template shown in Figure 5.8, and then squaring the outcome. As can be seen from Figure 5.10 (b), two maximum peaks are obtained in the output, as indicated by the arrows in Figure 5.10. These peaks in the output are in the same positions as the pistol gunshots, thus positively identifying the gunshots and their positions in the recording.

Figure 5.10: Cross correlation of pistol template with pistol gunshot fired 500m away (annotations: "2 Pistol Gunshots" in (a), "GCC maxima" in (b))

5.3.2.2 Cross Correlation of Mauser sound 500m away from array

A recording of a Mauser gunshot mixed with speech waveforms is shown in Figure 5.11 (a). Figure 5.11 (b) shows the output of the cross correlation algorithm. This output was obtained by cross correlating the recording with the Mauser template shown in Figure 5.9, and then squaring the outcome. As can be seen in Figure 5.11 (b), a maximum peak is obtained in the output.
This peak in the output is at the same position as the Mauser gunshot, thus positively identifying the gunshot and its position in the recording.

Figure 5.11: Cross correlation of Mauser template with Mauser gunshot fired 500m away (annotation: "Mauser gunshot" in (a))

Figure 5.12 (a) shows a recording of Mauser and Pistol gunshots with speech waveforms. The output of the cross correlation algorithm can be seen in Figure 5.12 (b). The output was generated by cross correlating the recording with the pistol template shown in Figure 5.8, and then squaring the outcome. Figure 5.12 (b) reveals maximum peaks in the output at the same positions as the gunshot waveforms. This identifies the gunshots and their positions in the recording. It means that the Pistol template from Figure 5.8 extracts both the Mauser and the pistol waveforms from the recorded signal at a distance of 500m.

Figure 5.12: Cross correlation of Pistol template with Mauser and pistol gunshots fired 500m away (annotations: "2 Pistol Gunshots" and "Mauser Gunshot" in (a))

5.3.2.3 Cross Correlation of Pistol sound 1000m away from array

A pistol waveform from a gunshot fired 1000m away from the microphone array and speech sound waveforms (recorded close to the microphones) are shown in Figure 5.13 (a). The output of the cross correlation algorithm is shown in Figure 5.13 (b). The output was produced by cross correlating the recording with the pistol template shown in Figure 5.8, and then squaring the outcome. Figure 5.13 (b) shows that the speech waveform is sufficiently suppressed relative to the enhanced peak of the gunshot waveform in the output. This peak in the output is in the same position as the pistol gunshot, thus identifying the gunshot and its position in the recording.
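The detection step used throughout section 5.3.2 (normalise, cross correlate with a template, square the result, locate the maximum) can be sketched as follows. The template samples, the sinusoidal stand-in for speech and the function names are hypothetical and serve only to make the example self-contained; the dissertation's own processing was done in Matlab on recorded data.

```python
import math


def normalise(signal):
    """Scale a signal so that its largest absolute sample equals 1."""
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]


def squared_cross_correlation(recording, template):
    """Slide the template along the recording and square each correlation value."""
    m = len(template)
    output = []
    for start in range(len(recording) - m + 1):
        c = sum(recording[start + k] * template[k] for k in range(m))
        output.append(c * c)
    return output


# Hypothetical short gunshot template, normalised to a maximum of 1.
template = normalise([0.0, 0.9, -1.0, 0.5, -0.2, 0.05])

# Synthetic recording: a slow oscillation standing in for speech, plus one "gunshot".
recording = [0.0] * 200
for n in range(20, 80):
    recording[n] += 0.4 * math.sin(2 * math.pi * n / 30)
shot_start = 140
for k, t in enumerate(template):
    recording[shot_start + k] += 0.8 * t
recording = normalise(recording)

output = squared_cross_correlation(recording, template)
peak_index = max(range(len(output)), key=output.__getitem__)  # position of the detection
```

Squaring makes every output value non-negative, so a single threshold on the output separates a strong match from the weaker response to the slowly varying interference.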
Figure 5.13: Cross correlation of pistol template with pistol gunshot fired 1000m away (annotation: "Pistol Gunshot" in (a))

5.3.2.4 Cross Correlation of Mauser sound 1000m away from array

A recording of a Mauser gunshot fired 1000m away from the microphone array can be seen in Figure 5.14 (a). Recorded speech waveforms are also seen in Figure 5.14 (a). Figure 5.14 (b) shows the output of the cross correlation algorithm. This output was obtained by cross correlating the recording with the Mauser template shown in Figure 5.9, and then squaring the outcome. The output reveals a maximum peak as shown in Figure 5.14 (b). This peak in the output is in the same position as the Mauser gunshot, thus positively identifying the gunshot and its position in the recording.

Figure 5.14: Cross correlation of Mauser template with Mauser gunshot fired 1000m away (annotation: "Mauser Gunshot" in (a))

A recording of a Mauser gunshot (at 1000m) with speech waveforms can be seen in Figure 5.15 (a). The output shown in Figure 5.15 (b) was obtained by cross correlating the recording with the pistol template shown in Figure 5.8, and then squaring the outcome. As can be seen from Figure 5.15 (b), a maximum peak is obtained in the output. This peak in the output is at the same position as the Mauser gunshot, thus positively identifying the gunshot and its position in the recording. It shows that the Pistol template from Figure 5.8 extracts both the Mauser and the pistol waveforms from the recorded signal at a distance of 1000m.

Figure 5.15: Cross correlation of Pistol template with Mauser and pistol gunshots fired 1000m away (annotation: "Mauser Gunshot" in (a))

5.3.2.5 Cross Correlation of Mauser sound 1500m away from array

Figure 5.16 (a) shows a waveform recording of a Mauser gunshot fired 1500m away from the microphone array, with speech sound and car engine sound waveforms recorded near the microphones. Figure 5.16 (a) shows that the speech and noise are almost indistinguishable from the Mauser gunshot waveform.
Figure 5.16 (b) shows the output of the cross correlation algorithm. This output was obtained by cross correlating the recording with the Mauser template shown in Figure 5.9, and then squaring the outcome. As can be seen in Figure 5.16 (b), a maximum peak is obtained in the output. This peak in the output is in the same position as the Mauser gunshot, thus positively identifying the gunshot and its position in the recording.

Figure 5.16: Cross correlation of Mauser template with Mauser gunshot fired 1500m away (annotation: "Mauser Gunshot" in (b))

Figure 5.17 (a) shows a recording of a Mauser gunshot fired 1500m away from the microphone array with speech sounds close to the microphone, while Figure 5.17 (b) shows the output of the cross correlation algorithm. This output was obtained by cross correlating the recording with the pistol template shown in Figure 5.8, and then squaring the outcome. Figure 5.17 (b) shows that a maximum peak is obtained in the output, but that this peak does not coincide with the position of the Mauser gunshot waveform. The pistol template from Figure 5.8 therefore does not extract the Mauser waveform from the recorded signal at a distance of 1500m. Thus the pistol template from Figure 5.8 is better suited to extracting gunshot waveforms of more types of guns, but at shorter distances.

Figure 5.17: Cross correlation of Pistol template with Mauser gunshot fired 1500m away (annotation: "Mauser Gunshot" in (b))

5.3.2.6 Cross Correlation of Mauser sound 1700m away from array

Figure 5.18 (a) shows a recording of a Mauser gunshot fired 1700m away from the microphone array with speech sounds recorded close to the microphones. The output of the cross correlation algorithm is shown in Figure 5.18 (b). This output was obtained by cross correlating the recording with the Mauser template shown in Figure 5.9, and then squaring the outcome. As can be seen from Figure 5.18 (b), a maximum peak is obtained in the output.
This peak in the output is at the same position as the Mauser gunshot, thus positively identifying the gunshot and its position in the recording.

Figure 5.18: Cross correlation of Mauser template with Mauser gunshot fired 1700m away (annotation: "Mauser Gunshot" in (a))

5.4 TRAINABLE TEMPLATE MATCHING ALGORITHMS

The algorithms discussed in this section (Least Squares, RKHS and SVM) need sets of data that can be used to train networks to recognise a pattern in the data stream. Gunshot waveform templates with different sample sizes were experimented with. The more samples there are in the training set, the longer the various algorithms take to produce an answer. Figure 5.19 shows the Mauser template waveform used for the LS algorithm.

Figure 5.19: LS Mauser waveform template

5.4.1 TEMPLATE MATCHING WITH LEAST SQUARES

To recognise a Mauser gunshot, a template of a Mauser gunshot was created by averaging 6 gunshot waveforms at the same time instance. The audio template was also normalised to a maximum of 1. The following figures were obtained using the audio template waveform shown in Figure 5.19 with the LS based algorithm. As can be seen in Figure 5.20, the output of the algorithm shows a maximum value where the gunshot impulse was found for a gunshot fired 500m away from the recording equipment.

Figure 5.20: Output of Least Squares algorithm of Mauser gunshot at 500m away

The LS algorithm was also tested on a gunshot from a Mauser that was fired 1000m away from the recording microphones. Figure 5.21 shows that the gunshot was also detected in the output.

Figure 5.21: Output of Least Squares algorithm of Mauser gunshot at 1000m away

Figure 5.22 shows the output of the LS algorithm for the gunshot recording where the gun was fired 1500m away from the recording microphones. Figure 5.22 also shows that the gunshot is not detected and that the recorded speech waveforms have higher peaks in the output than the gunshot.
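The template construction described at the start of section 5.4.1 (averaging several aligned gunshot waveforms and normalising the result to a maximum of 1) can be sketched as below. The six synthetic shots are hypothetical stand-ins for recorded waveforms, and the alignment is assumed to have been done beforehand; the function name is illustrative.

```python
import random


def average_template(waveforms):
    """Average equally long, pre-aligned waveforms sample by sample, then normalise."""
    length = len(waveforms[0])
    if any(len(w) != length for w in waveforms):
        raise ValueError("all waveforms must have the same length")
    mean = [sum(w[k] for w in waveforms) / len(waveforms) for k in range(length)]
    peak = max(abs(v) for v in mean)
    return [v / peak for v in mean]


random.seed(1)  # reproducible synthetic data

# Hypothetical underlying gunshot shape and six noisy, differently scaled copies of it.
base = [0.0, 0.6, -1.0, 0.7, -0.4, 0.2, -0.1, 0.0]
gains = (0.5, 0.6, 0.55, 0.7, 0.65, 0.6)
shots = [[g * b + random.gauss(0.0, 0.05) for b in base] for g in gains]

template = average_template(shots)
```

Averaging reduces the uncorrelated noise in the individual shots while the common pulse shape is retained, which is presumably why averaged rather than single-shot templates were used.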
Figure 5.22: Output of Least Squares algorithm of Mauser gunshot at 1500m away (annotation: "Mauser gunshot detection")

The LS algorithm gave reasonably good results only with the Mauser gunshot data. When it was applied to the pistol gunshot data, the results obtained were not conclusive.

5.4.2 RKHS USING A SECOND ORDER POLYNOMIAL KERNEL

The outputs of the RKHS algorithm using a second order polynomial kernel are described in the following section. Higher order polynomial kernels took extremely long to process and the linear kernel did not produce any conclusive results.

5.4.2.1 The Mauser Template used for RKHS

Figure 5.23 shows the Mauser template created to be used as the training set for this algorithm. The training set shown in Figure 5.23 gave the best results in detecting Mauser impulses at the different tested distances (i.e. 500m, 1000m, 1500m, and 1700m). The template in Figure 5.23 was created by averaging Mauser waveforms from gunshots fired 1500m away from the microphone array. Every 4th sample of the averaged waveform was then kept and the samples in between were replaced by interpolation, so that the original number of samples in the waveform was retained. The waveform of the template was thus smoothed by interpolating over the samples. More accurate results were obtained with the RKHS algorithm using the smoothed waveform. The idea of interpolating the templates to smooth out the waveforms stems from work done by Viola and Walker (2005). Lastly the audio template was normalised to a maximum of 1.

Figure 5.23: Mauser template used in RKHS training sequence, smoothed by interpolation

5.4.2.2 The Pistol Template used for RKHS

Figure 5.24 shows the pistol template created to be used as the pistol training set for this algorithm. The training set shown in Figure 5.24 gave the best results in detecting pistol impulses at the different tested distances (i.e. 500m and 1000m). The template in Figure 5.24 was created by averaging pistol waveforms from gunshots fired 1000m away from the microphone array. The audio template of the pistol was also normalised to a maximum of 1.

Figure 5.24: Pistol template used in RKHS training sequence, smoothed by interpolation

5.4.2.3 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 500m

Figure 5.25 (a) shows a waveform from a Mauser gunshot fired 500m away from the microphone array with speech sound waveforms. Figure 5.25 (b) shows the output of the RKHS algorithm using a 2nd order polynomial kernel. This output was obtained by creating a training set from the Mauser template shown in Figure 5.23, then feeding the signal shown in Figure 5.25 (a) into the RKHS network. Figure 5.25 (b) shows a maximum peak and a large undershoot compared to the rest of the output. This peak in the output is in the same position as the Mauser gunshot, thus identifying the gunshot and its position in the recording.

Figure 5.25: Mauser gunshot at 500m and output using a 2nd order polynomial RKHS kernel (annotation: "Mauser Gunshot" in (a))

5.4.2.4 Pistol impulse detection using a 2nd order polynomial RKHS kernel at 500m

Two pistol waveforms from gunshots fired 500m away from the microphone array can be seen in Figure 5.26 (a). The recorded waveform in Figure 5.26 (a) also contains speech sound waveforms recorded close to the microphone array. Figure 5.26 (b) shows the output of the RKHS algorithm using a 2nd order polynomial kernel. This output was obtained by creating a training set from the pistol template shown in Figure 5.24 and then feeding the signal shown in Figure 5.26 (a) into the RKHS network. Figure 5.26 (b) shows that the maximum peaks from the speech waveform are also present, together with the peaks of the pistol waveforms, in the output. If a detection threshold of 105 were used, then 4 gunshots would be detected, although only 2 shots were fired.
The RKHS algorithm in this case (pistol at 500m) does not suppress the speech waveform peaks enough in the output.

Figure 5.26: Pistol gunshots at 500m and output using a 2nd order polynomial RKHS kernel (annotation: "2 Pistol Gunshots" in (a))

5.4.2.5 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 1000m

Figure 5.27 (a) shows a waveform from a Mauser gunshot fired 1000m away from the microphone array with speech sound waveforms recorded close to the microphone array. Figure 5.27 (b) shows the output of the RKHS algorithm using a 2nd order polynomial kernel. This output was obtained by creating a training set from the Mauser template shown in Figure 5.23, then feeding the signal shown in Figure 5.27 (a) into the RKHS network. Figure 5.27 (b) shows a maximum peak and a large undershoot compared to the rest of the output. This peak in the output is in the same position as the Mauser gunshot, thus identifying the gunshot and its position in the recording.

Figure 5.27: Mauser gunshot at 1000m and output using a 2nd order polynomial RKHS kernel (annotation: "Mauser Gunshot" in (a))

5.4.2.6 Pistol impulse detection using a 2nd order polynomial RKHS kernel at 1000m

Figure 5.28 (a) shows a pistol waveform from a gunshot fired 1000m away from the microphone array. Figure 5.28 (a) also contains speech sound waveforms. Figure 5.28 (b) shows the output of the RKHS algorithm using a 2nd order polynomial kernel. This output was obtained by creating a training set from the pistol template shown in Figure 5.24 and then feeding the signal shown in Figure 5.28 (a) into the RKHS network. Figure 5.28 (b) shows that the maximum peaks from the speech waveform are also present, together with the peak of the pistol waveform, in the output. If a detection threshold of 125 were used, then only the pistol's impulse would be detected. The output of the RKHS algorithm in this case (pistol at 1000m) does identify the pistol impulse and its position in the recording.
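The training and evaluation of the RKHS network used in section 5.4.2 follow equations (3.22) to (3.26) and can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the dissertation's Matlab implementation: the four short training vectors are hypothetical, the targets use $\gamma = 1$ for desirable and $\alpha = 0$ for undesirable templates, and the linear system is solved with a small hand-written Gaussian elimination so that the example has no external dependencies.

```python
def poly_kernel(x, z, d=2):
    """Polynomial kernel K(x, z) = (1 + x.z)^d of equation (3.28)."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** d


def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    c_vec = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * c_vec[c] for c in range(r + 1, n))
        c_vec[r] = (M[r][n] - tail) / M[r][r]
    return c_vec


def train_rkhs(inputs, targets, d=2):
    """Interpolation coefficients C = G^{-1} y, with G the Gram matrix (3.23)-(3.26)."""
    gram = [[poly_kernel(xi, xj, d) for xj in inputs] for xi in inputs]
    return solve(gram, targets)


def rkhs_output(x, inputs, coeffs, d=2):
    """Network output F(x) = sum_i C_i K(x_i, x) of equation (3.22)."""
    return sum(c * poly_kernel(xi, x, d) for c, xi in zip(coeffs, inputs))


gamma, alpha = 1.0, 0.0              # targets for desirable / undesirable templates
train_x = [
    [0.9, -1.0, 0.5, -0.2],          # hypothetical gunshot-like window (DGT)
    [0.7, -0.8, 0.45, -0.1],         # hypothetical gunshot-like window (DGT)
    [0.3, 0.35, 0.2, 0.1],           # hypothetical non-gunshot window (UGT)
    [-0.2, 0.1, -0.3, 0.25],         # hypothetical non-gunshot window (UGT)
]
train_y = [gamma, gamma, alpha, alpha]

coeffs = train_rkhs(train_x, train_y)
outputs = [rkhs_output(x, train_x, coeffs) for x in train_x]
```

By construction the network reproduces the targets on the training vectors. In a detector, `rkhs_output` would be evaluated on successive windows of the recording and the result compared with a threshold, as is done with the thresholds quoted for Figures 5.26 and 5.28.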
Figure 5.28: Pistol gunshot at 1000m and output using a 2nd order polynomial RKHS kernel

5.4.2.7 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 1500m

Figure 5.29 (a) shows the waveform from a Mauser gunshot fired 1500m away from the microphone array, together with speech sound waveforms. Figure 5.29 (b) shows the output of the RKHS algorithm using a 2nd order polynomial kernel. This output was obtained by creating a training set from the Mauser template shown in Figure 5.23 and then feeding the signal shown in Figure 5.29 (a) into the RKHS network. Figure 5.29 (b) shows a large undershoot compared to the rest of the output. This large negative peak is in the same position as the Mauser gunshot, thus positively identifying the gunshot and its position in the recording. It is evident that in this case the speech and environmental noise are suppressed in the output by the RKHS algorithm.

Figure 5.29: Mauser gunshot at 1500m and output using a 2nd order polynomial RKHS kernel

5.4.2.8 Mauser impulse detection using a 2nd order polynomial RKHS kernel at 1700m

Figure 5.30 (a) shows a waveform from a Mauser gunshot fired 1700m away from the microphone array, together with speech sound waveforms. Figure 5.30 (b) shows the output of the RKHS algorithm using a 2nd order polynomial kernel. This output was obtained by creating a training set from the Mauser template shown in Figure 5.23 and then feeding the signal shown in Figure 5.30 (a) into the RKHS network. Figure 5.30 (b) shows a large undershoot compared to the rest of the output. This large negative peak is in the same position as the Mauser gunshot, thus positively identifying the gunshot and its position in the recording. Relative to the detected Mauser impulse, the speech and environmental noise are suppressed in the output of the RKHS algorithm compared to the input signal.
Figure 5.30: Mauser gunshot at 1700m and output using a 2nd order polynomial RKHS kernel

5.4.3 SUPPORT VECTOR MACHINES

The outputs of the Least Square version of a Support Vector Machine (LS-SVM) algorithm, developed by Katholieke Universiteit Leuven†† in Belgium, are described in the following section. For simplicity the term SVM is used in the following sections when referring to the Least Square implementation. Results from the second and third order polynomial kernels are shown, because they gave much better results than the linear kernel. Setting the regularisation constant gamma (γ) to a value of 1 also gave the best results.

†† LS-SVMLab available at http://www.esat.kuleuven.be/sista/lssvmlab/ (Accessed 3 October 2012)

5.4.3.1 The Mauser Template used for SVM network

Figure 5.31 shows the Mauser template created for use as the training set for the SVM algorithm. This template gave the best Mauser impulse detections with the SVM network over the different recorded distances (i.e. 500m, 1000m, 1500m and 1700m). The template in Figure 5.31 was created by averaging Mauser waveforms from gunshots fired 1500m away from the microphone array. Lastly the audio template was normalised to a maximum of 1.

Figure 5.31: Mauser template used in SVM network training sequence, smoothed by interpolation

5.4.3.2 The Pistol Template used for SVM

Figure 5.32 shows the pistol template created for use as the pistol training set for the SVM algorithm. This training set gave the best pistol impulse detections over the different tested distances (i.e. 500m and 1000m). The template in Figure 5.32 was created by averaging pistol waveforms from gunshots fired 1000m away from the microphone array, after which the audio template was normalised to a maximum of 1.
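The template construction described above (averaging recorded waveforms and normalising the result to a maximum of 1) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure used in this study: it assumes that the individual recordings have already been time-aligned and cut to equal length, it does not reproduce the interpolation-based smoothing mentioned in the figure captions, and the function name make_template is hypothetical.

```python
import numpy as np

def make_template(waveforms):
    """Average equally long, pre-aligned waveforms and scale the
    result so that its largest sample equals 1.
    (Hypothetical helper; alignment is assumed to be done already.)
    """
    stack = np.vstack([np.asarray(w, dtype=float) for w in waveforms])
    mean = stack.mean(axis=0)          # sample-wise average
    peak = mean.max()
    if peak <= 0:
        raise ValueError("averaged waveform has no positive peak")
    return mean / peak                 # maximum is now exactly 1

# Two short made-up waveforms standing in for recorded gunshots.
template = make_template([[0.0, 2.0, -1.0],
                          [0.0, 4.0, -3.0]])
```

Normalising by the largest sample is one reading of "a maximum of 1"; normalising by the largest absolute value would be an equally plausible alternative.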
Figure 5.32: Pistol template used in SVM network training sequence

5.4.3.3 Impulse detection using a 2nd order polynomial SVM kernel and a Pistol Training set for recordings 500m away

Figure 5.33 (a) shows two pistol waveforms and a Mauser waveform from gunshots fired 500m away from the microphone array, together with speech sound waveforms. Figure 5.33 (b) shows the output of the SVM algorithm using a 2nd order polynomial kernel. This output was obtained by creating a training set from the pistol template shown in Figure 5.32 and then feeding the signal shown in Figure 5.33 (a) into the SVM network. The output in Figure 5.33 (b) reveals 3 maximum peaks with an amplitude of 1, indicating gunshot detections at those positions in the recording. These peaks are at the same positions as the Mauser and pistol gunshot waveforms in the test signal (Figure 5.33 (a)). This indicates that the pistol training set (template) detects both the Mauser and pistol impulses with an SVM network using a 2nd order polynomial kernel.

Figure 5.33: Pistol and Mauser gunshots at 500m and output of 2nd order polynomial SVM kernel using a pistol template

5.4.3.4 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 500m away

Figure 5.34 (a) shows two pistol waveforms and a Mauser waveform from gunshots fired 500m away from the microphone array, together with speech sound waveforms. Figure 5.34 (b) shows the output of the SVM algorithm using a 3rd order polynomial kernel. This output was obtained by creating a training set from the Mauser template shown in Figure 5.31 and then feeding the signal shown in Figure 5.34 (a) into the SVM network. The output in Figure 5.34 (b) reveals a maximum peak with an amplitude of 1, indicating a gunshot detection at that position in the recording. This peak is at the same position as the Mauser waveform in the test signal shown in Figure 5.34 (a).
This indicates that the Mauser training set (template) detects only the Mauser impulse with an SVM network using a 3rd order polynomial kernel.

Figure 5.34: Pistol and Mauser gunshots at 500m and output of 3rd order polynomial SVM kernel using a Mauser template

5.4.3.5 Impulse detection using a 2nd order polynomial SVM kernel and a Pistol Training set for recordings 1000m away

Figure 5.35 (a) shows a pistol waveform and a Mauser waveform from gunshots fired 1000m away from the microphone array. The recording also contains speech sound waveforms. Figure 5.35 (b) shows the output of the SVM algorithm using a 2nd order polynomial kernel. This output was obtained by creating a training set from the pistol template shown in Figure 5.32 and then feeding the signal shown in Figure 5.35 (a) into the SVM network. The output in Figure 5.35 (b) reveals 2 maximum peaks with an amplitude of 1, indicating gunshot detections at those positions in the recording. These peaks are at the same positions as the Mauser and pistol gunshot waveforms in the test signal (Figure 5.35 (a)). This indicates that the pistol training set (template) detects both the Mauser and pistol impulses (from gunfire 1000m away) with an SVM network using a 2nd order polynomial kernel.

Figure 5.35: Pistol and Mauser gunshots at 1000m and output of 2nd order polynomial SVM kernel using a pistol template

5.4.3.6 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 1000m away

Figure 5.36 (a) shows a pistol waveform and a Mauser waveform from gunshots fired 1000m away from the microphone array. Figure 5.36 (a) also contains speech sound and environmental noise waveforms that are local to the microphone array. Figure 5.36 (b) shows the output of the SVM algorithm using a 3rd order polynomial kernel.
This output was obtained by creating a training set from the Mauser template shown in Figure 5.31 and then feeding the signal shown in Figure 5.36 (a) into the SVM network. The output in Figure 5.36 (b) reveals a maximum peak with an amplitude of 1, indicating a gunshot detection at that position in the recording. This peak is at the same position as the Mauser waveform in the test signal (Figure 5.36 (a)). This indicates that the Mauser training set (template) detects only the Mauser impulse with an SVM network using a 3rd order polynomial kernel.

Figure 5.36: Pistol and Mauser gunshots at 1000m and output of 3rd order polynomial SVM kernel using a Mauser template

5.4.3.7 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 1500m away

Figure 5.37 (a) shows a waveform from a Mauser gunshot fired 1500m away from the microphone array. Figure 5.37 (a) also contains speech sound waveforms that are local to the microphone array, and shows that the speech and noise are almost indistinguishable from the Mauser gunshot waveform. Figure 5.37 (b) shows the output of the SVM network using a 3rd order polynomial kernel. This output was obtained by creating a training set from the Mauser template shown in Figure 5.31 and then feeding the signal shown in Figure 5.37 (a) into the SVM network. The output (Figure 5.37 (b)) reveals a maximum peak with an amplitude of 1, indicating a gunshot detection at that position in the recording. This peak is at the same position as the Mauser waveform in the test signal (Figure 5.37 (a)). This indicates that the Mauser training set (template) detects the Mauser impulse with an SVM network using a 3rd order polynomial kernel at a distance of 1500m from the microphone array.
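For readers unfamiliar with the least squares formulation, the following Python sketch shows how an LS-SVM with a polynomial kernel and γ = 1 can be trained by solving a single linear system. It is an illustration only and makes several assumptions: it uses the function-estimation (regression) form of the LS-SVM, it uses the kernel form (x·z + c)^d with c = 1, and it is trained on made-up toy data. This section does not state how the training set was built from the gunshot template or which LS-SVMlab settings were used, so none of this should be read as the implementation used in this study. The function names are hypothetical.

```python
import numpy as np

def poly_kernel(A, B, degree=2, c=1.0):
    """Polynomial kernel (a.b + c)^degree between the rows of A and B.
    The offset c = 1 is an assumption, not taken from the text."""
    return (A @ B.T + c) ** degree

def lssvm_train(X, y, gamma=1.0, degree=2):
    """Solve the LS-SVM linear system
        [ 0   1^T         ] [b    ]   [0]
        [ 1   K + I/gamma ] [alpha] = [y]
    and return the bias b and the coefficients alpha."""
    n = X.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = poly_kernel(X, X, degree) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(M, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, X_new, degree=2):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b for each row of X_new."""
    return poly_kernel(X_new, X_train, degree) @ alpha + b

# Toy data (assumed): a parabola sampled at 8 points.
X = np.linspace(-1.0, 1.0, 8).reshape(-1, 1)
y = X[:, 0] ** 2
b, alpha = lssvm_train(X, y, gamma=1.0, degree=2)
fit = lssvm_predict(X, b, alpha, X, degree=2)
```

Because training reduces to one linear solve, the cost is dominated by the size of the training set; prediction requires a kernel evaluation against every training point, which is consistent with the comparatively long SVM execution times reported in section 5.6.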
Figure 5.37: Mauser gunshot at 1500m and output of 3rd order polynomial SVM kernel using a Mauser template

5.4.3.8 Impulse detection using a 3rd order polynomial SVM kernel and a Mauser Training set for recordings 1700m away

Figure 5.38 (a) shows a Mauser waveform from a gunshot fired 1700m away from the microphone array. Figure 5.38 (a) also contains speech sound and environmental noise waveforms that are local to the microphone array. Figure 5.38 (b) shows the output of the SVM algorithm using a 3rd order polynomial kernel. This output was obtained by creating a training set from the Mauser template shown in Figure 5.31 and then feeding the signal shown in Figure 5.38 (a) into the SVM network. The output reveals a maximum peak with an amplitude of 1, indicating a gunshot detection at that position in the recording. This peak is at the same position as the Mauser waveform in the test signal (Figure 5.38 (a)). This indicates that the Mauser training set (template) detects the Mauser impulse with an SVM network using a 3rd order polynomial kernel at a distance of 1700m from the microphone array.

Figure 5.38: Mauser gunshot at 1700m and output of 3rd order polynomial SVM kernel using a Mauser template

5.5 DETECTION ALGORITHM ACCURACY COMPARISON

This section compares the accuracy of the GCC, LS, RKHS and SVM algorithms used for impulse detection. In Tables 5.1 to 5.6 a value of 1 indicates a detected gunshot, while a value of 0 indicates a non-detection. Table 5.1 shows that the GCC and SVM algorithms are the most accurate using pistol templates for shots fired 500m away from the recording device. At this distance, using pistol templates, the GCC and SVM algorithms detect both the pistol and the Mauser as gunshot sounds. Although the LS and RKHS algorithms detect the pistol sound as a gunshot, they also make false detections by indicating the speech and car sounds as gunshot sounds.
Table 5.1: Comparison of detection algorithms using Pistol templates for shots fired 500m away

Pistol Templates, 500m
Algorithm  Speech  Car door  Car engine  Mauser  Pistol
GCC        0       0         0           1       1
LS         1       1         0           0       1
RKHS       1       1         0           0       1
SVM        0       0         0           1       1

In Table 5.2 the GCC, RKHS, LS and SVM algorithms all make a positive detection on the Mauser gunshot sound using Mauser templates, for shots fired 500m away. The LS algorithm also makes a false detection, indicating the speech sound as a gunshot.

Table 5.2: Comparison of detection algorithms using Mauser templates for shots fired 500m away

Mauser Templates, 500m
Algorithm  Speech  Car door  Car engine  Mauser  Pistol
GCC        0       0         0           1       0
LS         1       0         0           1       0
RKHS       0       0         0           1       0
SVM        0       0         0           1       0

In Table 5.3, using pistol templates, all the listed algorithms detect the pistol and the Mauser sounds as gunshots, except the RKHS algorithm, which only detects the pistol shot. The LS algorithm also makes a false detection on the speech sound. The gunshots were fired 1000m away from the microphones.

Table 5.3: Comparison of detection algorithms using Pistol templates for shots fired 1000m away

Pistol Templates, 1000m
Algorithm  Speech  Car door  Car engine  Mauser  Pistol
GCC        0       0         0           1       1
LS         1       0         0           1       1
RKHS       0       0         0           0       1
SVM        0       0         0           1       1

As can be seen from the comparison in Table 5.4, all 4 algorithms detect the Mauser gunshot using Mauser templates. The LS algorithm also makes a false gunshot detection on the speech sound. These results are for gunshots fired 1000m away from the microphones.

Table 5.4: Comparison of detection algorithms using Mauser templates for shots fired 1000m away

Mauser Templates, 1000m
Algorithm  Speech  Car door  Car engine  Mauser  Pistol
GCC        0       0         0           1       0
LS         1       0         0           1       0
RKHS       0       0         0           1       0
SVM        0       0         0           1       0

The LS algorithm makes no gunshot detections, as shown in Table 5.5. The GCC, RKHS and SVM algorithms all make positive detections on the Mauser gunshot sound using Mauser templates for shots fired 1500m away from the microphones.
Table 5.5: Comparison of detection algorithms using Mauser templates for shots fired 1500m away

Mauser Templates, 1500m
Algorithm  Speech  Car door  Car engine  Mauser  Pistol
GCC        0       0         0           1       0
LS         0       0         0           0       0
RKHS       0       0         0           1       0
SVM        0       0         0           1       0

Table 5.6 shows the comparison of the 4 listed algorithms on Mauser gunshots fired 1700m away from the microphones. All of the algorithms, except the LS algorithm, make a positive gunshot detection using Mauser templates.

Table 5.6: Comparison of detection algorithms using Mauser templates for shots fired 1700m away

Mauser Templates, 1700m
Algorithm  Speech  Car door  Car engine  Mauser  Pistol
GCC        0       0         0           1       0
LS         0       0         0           0       0
RKHS       0       0         0           1       0
SVM        0       0         0           1       0

5.6 EXECUTION TIME COMPARISON BETWEEN THE DIFFERENT IMPULSE DETECTION ALGORITHMS

The following execution times were obtained in the Matlab 2007b environment on an Intel® Core 2 Duo™ T7500‡‡ 2.2 GHz processor with 1 GB of RAM (Random Access Memory). The templates or training sets were on average 100 samples long and the test signals were 5000 samples long. The training and test signals were sampled at 5 kHz. The training of the different algorithms was done before the prediction algorithms were applied and is not included in the execution time. Only the time taken to make a prediction on 5000 samples of the test signal is used in the calculation.

‡‡ http://www.intel.com [Accessed 20 September 2011]

The Matlab function “cputime” was used to measure the time a specific algorithm takes to process 5000 samples. The Least Square algorithm executed the fastest at 1.562 ms per 5000 samples, followed by the Generalised Cross Correlation algorithm, which averaged 3.120 ms per 5000 samples. The Support Vector Machine algorithm with 2nd and 3rd order polynomial kernels took 1078 ms to process 5000 samples. The RKHS algorithm with a 2nd order polynomial kernel took the longest to execute, averaging 4187 ms per 5000 samples.
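The measurement and the conversion to a per-sample figure can be sketched in Python as follows. This is an illustrative analogue and not the Matlab code used in this study: time.process_time is used here as a stand-in for Matlab’s cputime, the helper names are hypothetical, and the RKHS value is taken as the 4187 ms shown in Figure 5.39, which agrees with the per-sample value in Table 5.7.

```python
import time

def us_per_sample(ms_per_block, block_len=5000):
    """Convert milliseconds per block into microseconds per sample."""
    return ms_per_block * 1000.0 / block_len

def cpu_time_ms(detector, block, repeats=10):
    """Average CPU time in ms for one call of `detector` on `block`.
    (Hypothetical helper; training time is deliberately excluded.)"""
    start = time.process_time()
    for _ in range(repeats):
        detector(block)
    return (time.process_time() - start) * 1000.0 / repeats

# Execution times reported in this section, in ms per 5000 samples.
REPORTED_MS = {"LS": 1.562, "GCC": 3.120, "SVM": 1078.0, "RKHS": 4187.0}
PER_SAMPLE_US = {name: us_per_sample(ms) for name, ms in REPORTED_MS.items()}

# Example measurement on a trivial stand-in detector (assumption).
elapsed_ms = cpu_time_ms(lambda block: sum(block), list(range(5000)))
```

At the stated sampling rate of 5 kHz, a block of 5000 samples represents 1 s of audio, so any algorithm that needs more than 1000 ms per block would not keep up with the incoming signal on the test machine.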
Figure 5.39 shows the execution times in milliseconds per 5000 samples for the different impulse detection algorithms.

Figure 5.39: Comparison of execution time, milliseconds per 5000 samples, between the different impulse detection algorithms (logarithmic time axis; LS 1.562 ms, GCC 3.120 ms, SVM 1078 ms, RKHS 4187 ms)

Table 5.7 shows the execution time of the different detection algorithms in µs/sample. The algorithms are listed in ascending order: LS is the fastest and RKHS takes the longest to execute.

Table 5.7: Execution time of the detection algorithms in µs/sample

Algorithm  µs/sample
LS         0.313
GCC        0.624
SVM        216
RKHS       837

5.7 CONCLUSION

Chapter 5 presented and discussed the graphed Matlab results obtained from the various detection algorithms (GCC, LS, RKHS and SVM). It also gave tabulated summaries of the performance of the algorithms compared with one another.

6 CONCLUSION

This chapter discusses the results obtained in chapter 5 and states the conclusions drawn from the discussion. Recommendations are also made on future research and a habitat protection strategy.

6.1 THE ACCURACY OF THE ALGORITHMS

The results in chapter 5 show that the GCC and SVM impulse detection algorithms are the most accurate over all the different distances (500m, 1000m, 1500m and 1700m). At the shorter ranges (500m and 1000m), the RKHS algorithm sometimes generates false detections on speech waveforms, but it becomes more accurate at the longer ranges. This might be because, at shorter distances, the larger impulse waveform of the gunshot correlates too strongly with the waveform of the speech that was recorded close to the microphones. The LS algorithm performs the worst of all 4. It does detect gunshots at short range (500m and 1000m), but it also creates false detections on speech waveforms. At longer ranges (1500m and 1700m) the LS detection algorithm fails completely.
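The accuracy statements above can be checked by tallying the entries of Tables 5.1 to 5.6. The Python sketch below is an illustrative summary rather than part of the original analysis: the detection flags are transcribed from the tables, read as one row per algorithm, while the split into "false alarms" (speech, car door, car engine) and "gunshots flagged" (Mauser, pistol) and the function name tally are assumptions introduced here.

```python
# Detection flags transcribed from Tables 5.1 to 5.6 (1 = detection).
# Column order: speech, car door, car engine, Mauser, pistol.
MAUSER_SHORT = {"GCC": [0, 0, 0, 1, 0], "LS": [1, 0, 0, 1, 0],
                "RKHS": [0, 0, 0, 1, 0], "SVM": [0, 0, 0, 1, 0]}
MAUSER_LONG = {"GCC": [0, 0, 0, 1, 0], "LS": [0, 0, 0, 0, 0],
               "RKHS": [0, 0, 0, 1, 0], "SVM": [0, 0, 0, 1, 0]}
TABLES = {
    "5.1 pistol templates, 500m": {
        "GCC": [0, 0, 0, 1, 1], "LS": [1, 1, 0, 0, 1],
        "RKHS": [1, 1, 0, 0, 1], "SVM": [0, 0, 0, 1, 1]},
    "5.2 Mauser templates, 500m": MAUSER_SHORT,
    "5.3 pistol templates, 1000m": {
        "GCC": [0, 0, 0, 1, 1], "LS": [1, 0, 0, 1, 1],
        "RKHS": [0, 0, 0, 0, 1], "SVM": [0, 0, 0, 1, 1]},
    "5.4 Mauser templates, 1000m": MAUSER_SHORT,
    "5.5 Mauser templates, 1500m": MAUSER_LONG,
    "5.6 Mauser templates, 1700m": MAUSER_LONG,
}

def tally(tables):
    """Count, per algorithm, detections on non-gunshot sounds
    (first three columns) and on gunshot sounds (last two columns)."""
    totals = {}
    for rows in tables.values():
        for algo, flags in rows.items():
            t = totals.setdefault(
                algo, {"false_alarms": 0, "gunshots_flagged": 0})
            t["false_alarms"] += sum(flags[:3])
            t["gunshots_flagged"] += sum(flags[3:])
    return totals

totals = tally(TABLES)
```

On this tally the GCC and SVM algorithms produce no false alarms and flag the most gunshots, while the LS algorithm produces the most false alarms, in line with the conclusions of this section.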
6.2 COMPLEXITY MEASURED IN ALGORITHM PROCESSING TIME

The LS detection algorithm executes the fastest at 313 ns per sample, followed by the GCC detection algorithm at 624 ns per sample. The RKHS detection algorithm takes the longest of all 4 algorithms at 837 µs per sample, while the SVM algorithm executes almost 4 times faster than the RKHS algorithm at 216 µs per sample. The execution time of all the algorithms could be greatly reduced by implementing them in a programming language like C instead of Matlab.

6.3 OVERALL PERFORMANCE OF THE IMPULSE DETECTION ALGORITHMS

Overall the GCC detection algorithm performed the best, judged on its short execution time (lower complexity) and its accuracy. The accuracy of the SVM detection algorithm is the same as that of the GCC algorithm over the measured range of 500m to 1700m, but at a higher complexity (longer processing time). For gunshots fired at distances larger than 1700m, the SVM detection algorithm might be more accurate than the GCC algorithm if it could be implemented with more sound templates on equipment with higher processing power.

6.4 FUTURE RESEARCH

Future research might investigate the accuracy of the GCC and SVM impulse detection algorithms over larger distances, up to a range of 7km or more. A wider range of gun calibres might also be included in the study, as well as more types of sounds, for instance the mechanical action sounds of firearms, helicopters and automobiles, which might be added as possible threats to the preservation of wild life. Another area of further research could be to train multiple classifiers, each for the specific gunshots it detects well, and then to combine the outputs of the different detection algorithms. Different strategies for implementing gunshot detection with the aim of protecting large habitats might also be researched. Optical threat detection systems could also be incorporated into the conservation strategy.
6.4.1 ANTI-POACHING APPLICATION AND STRATEGY

By incorporating the research from Smith, Buscemi, and Xu (2010) and Wang (2009), and building on the conclusions reached in sections 6.1 to 6.3, further research might be warranted into different gunshot detection and localisation strategies that can form part of a Habitat Management System (HMS). This HMS should be able to protect and monitor nature reserves like the Kruger National Park or high risk poaching areas in a reserve.

6.4.1.1 Low-end gunshot detection modules

The GCC algorithm can be implemented on low-end gunshot detection modules whose operation might rely solely on solar energy or batteries. Because the GCC algorithm is less complex, it might be implemented on a low-power device. A detection node could be integrated into the communications radios of park rangers or patrol personnel, as suggested by Smith, Buscemi, and Xu (2010). The Lightning Protocol, as suggested by Wang (2009), could also be incorporated into the strategy to wake up high-end detection modules in hibernation. Apart from waking up high-end modules, the low-end modules would share muzzle blast and/or shock wave time of arrival information with the high-end nodes. The low-end modules can either be at fixed positions or, on mobile units (communications radios), have an integrated GPS.

6.4.1.2 High-end gunshot detection modules

The SVM algorithm can be implemented on high-end detection and localisation modules. Because of the higher complexity of the SVM algorithm and its possibly higher accuracy at larger distances, modules that are capable of higher computational complexity might be used for the implementation. Because they require more power to operate, these modules can be integrated into the patrol vehicles and installed at the rest camps of the reserve. The existing structures of cell phone base stations can also be utilised to implement the high-end detection nodes.
In remote areas where batteries or solar energy are the only options for power, these modules can be put into hibernation until woken up by a low-end module, as suggested by Wang (2009). Other functionalities of the high-end modules can include the localisation of gunshots and additional measurements or sensors for terrain information to increase accuracy. The terrain information can be incorporated with a Kalman adaptive filter as suggested by Smith, Buscemi, and Xu (2010). The high-end modules could also be used to relay the gunshot event data to a centralised node or control room at the rest camps.

6.4.1.3 Habitat Protection Strategy

Figure 6.1 illustrates a strategy to protect a nature reserve. High-end and low-end gunshot detection modules are shown dispersed across the nature reserve. Low-end gunshot detection modules are also incorporated in the communications radios of the reserve’s personnel. High-end modules are furthermore incorporated into structures of cell phone base stations and into the patrol vehicles of the reserve. High-end modules in remote areas can operate from solar power. Figure 6.1 also shows optical threat detectors at locations with higher elevations. In addition, the whole reserve is protected by a perimeter intrusion detection system.

Figure 6.1: Illustration of a habitat protection strategy (legend: Cell phone Base Station; High-end Gunshot Detection Module; Reserve Patrol Vehicle; Low-end Gunshot Detection Module; Patrol Personnel; Perimeter Intrusion Detector; Optical Threat Detectors; Rest camp and Central Control Room)

7 BIBLIOGRAPHY

Burges, C.J.C., 1998. ‘A Tutorial on Support Vector Machines for Pattern Recognition’, Data Mining and Knowledge Discovery, 2, pp. 121–167.

Chen, C., Abdallah, A. and Wolf, W., 2006. ‘Audiovisual Gunshot Event Recognition’, in Proc. IEEE International Conference on Systems, Man and Cybernetics (SMC '06), Taipei, Taiwan, pp. 4807-4812, October 2006.

Cilliers, J.E. and Smit, J.C., 2007. ‘Pulse compression sidelobe reduction by minimization of Lp norms’, IEEE Transactions on Aerospace and Electronic Systems, 2007.

De Brabanter, K., Karsmakers, P., Ojeda, F., Alzate, C., De Brabanter, J., Pelckmans, K., De Moor, B., Vandewalle, J. and Suykens, J.A.K., 2011. ‘LS-SVMlab Toolbox User’s Guide, version 1.8’. [online] Available at: <http://www.esat.kuleuven.be/sista/lssvmlab/> [Accessed 3 July 2012].

Defense Update, 2008. ‘Sniper Location & Gunshot Detection Systems’. [online] Available at: <http://defense-update.com/features/2008/november/231108_sniper_detection.html> [Accessed 3 October 2012].

Environment News Service, 2011. ‘Powdered Rhino Horn as Pricey as Street Cocaine’. [online] Available at: <http://www.ens-newswire.com/ens/feb2011/2011-02-15-02.html> [Accessed 3 July 2012].

Green, M.L., Watkins, C., Rogan, D. and Frank, J., 1999. ‘Random Gunfire Problems and Gunshot Detection Systems’, National Institute of Justice, December 1999.

Hertz, D., 1986. ‘Time delay estimation by combining efficient algorithms and generalized cross-correlation methods’, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-34, no. 1, February 1986.

Ifeachor, E.C. and Jervis, B.W., 2002. Digital signal processing: a practical approach. 2nd ed., Pearson Education Limited, England.

IUCN, 2011. ‘The IUCN Red List of Threatened Species, Diceros bicornis’. [online] Available at: <http://www.iucnredlist.org/apps/redlist/details/6557/0> [Accessed 3 July 2012].

Luts, J., Ojeda, F., Van de Plas, R., De Moor, B., Van Huffel, S. and Suykens, J.A.K., 2010. ‘A tutorial on support vector machine-based methods for classification problems in chemometrics’, Analytica Chimica Acta, vol. 665, pp. 129–145.

Maher, R.C., 2007. ‘Acoustical Characterization of Gunshots’, IEEE Workshop on Signal Processing Applications for Public Security and Forensics, April 2007.

Matlab, 2002. ‘Adaptive Filters in the Filter Design Toolbox’. [online] Available at: <http://ee.tamu.edu/matlab-help/toolbox/filterdesign/adaptive.html> [Accessed 27 June 2007].

Milliken, T., Burn, R.W. and Sangalakula, L., 2007. ‘The Elephant Trade Information System (ETIS) and the Illicit Trade in Ivory’, A report to the 14th meeting of the Conference of the Parties to CITES.

Moler, C., 2008. ‘Least Squares’, Numerical Computing with MATLAB. [online] Available at: <www.mathworks.com/moler/leastsquares.pdf> [Accessed 27 September 2012].

Pauli, M., Seisler, W., Price, J., Williams, A., Maraviglia, C., Evans, R., Moroz, S., Ertem, M.C., Heidhausen, E. and Burchick, D.A., 2004. ‘Infrared Detection and Geolocation of Gunfire and Ordnance Events from Ground and Air Platforms’. [online] Available at: <www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA460225> [Accessed 27 September 2012].

Pikrakis, A., Giannakopoulos, T. and Theodoridis, S., 2008. ‘Gunshot detection in audio streams from movies by means of dynamic programming and Bayesian networks’, in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008), Las Vegas, NV, pp. 21-24, March 2008.

Raffensperger, L., 2008. ‘Illegal Animal Trade Finances War in Africa’. [online] Available at: <http://earthtrends.wri.org/updates/node/291> [Accessed 9 June 2009].

Roberts, A.M., 2002. ‘Elephants still under the gun’, Animal Welfare Institute Quarterly, vol. 51, no. 3, 2002.

SavingRhinos.org, 2012. ‘South Africa: 251 Rhinos Killed in 172 Days’. [online] Available at: <http://www.rhinoconservation.org/2012/06/21/south-africa-251-rhinos-killed-in-172-days/> [Accessed 3 July 2012].

SavingRhinos.org, 2013. ‘South Africa: 668 Rhinos Killed in 2012’. [online] Available at: <http://www.rhinoconservation.org/2013/01/10/south-africa-668-rhinos-killed-in-2012/> [Accessed 26 February 2013].

Smith, M., Buscemi, S. and Xu, D.J., 2010. ‘Gunshot Detection System for JTRS Radios’, The 2010 Military Communications Conference, pp. 266-271, October 2010.

Steinwart, I., Hush, D. and Scovel, C., 2006. ‘An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels’, Modeling, Algorithms and Informatics Group, CCS-3, Los Alamos National Laboratory, February 6, 2006.

Suykens, J.A.K. and Vandewalle, J., 1999. ‘Least squares support vector machine classifiers’, Neural Processing Letters, vol. 9, issue 3, pp. 293–300.

The Register, 2012. ‘Rhino horn price spike drives record poaching’. [online] Available at: <http://www.theregister.co.uk/2012/01/03/quacks_and_crims_take_rhinos> [Accessed 3 July 2012].

Van Wyk, B.J., van Wyk, M.A. and Noel, G., 2004. ‘Lecture Notes in Computer Science’, LNCS: Structural, Syntactic, and Statistical Pattern Recognition, vol. 3138, pp. 831-839, 2004.

Viola, F. and Walker, W.F., 2005. ‘A Spline-Based Algorithm for Continuous Time-Delay Estimation Using Sampled Data’, IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 52, no. 1, January 2005.

Wang, Q., 2009. ‘Applying Lightning Protocol to Gunshot Localization’, Department of Computer Science, University of Illinois at Urbana-Champaign. [online] Available at: <http://www-rtsl.cs.uiuc.edu/papers/lightningingunshotlocalization.pdf> [Accessed 3 July 2012].

WWF, 2011. ‘Javan rhinos extinct in Viëtnam’. [online] Available at: <http://www.worldwildlife.org/who/media/press/2011/WWFPresitem24582.html> [Accessed 3 July 2012].

WWF South Africa, 2012. ‘Rhino poaching deaths continue to increase in South Africa’. [online] Available at: <http://www.wwf.org.za/?5203/rhino2011> [Accessed 3 July 2012].

Zhang, Y., Li, X., Jin, Y. and Amin, M.G., 2009. ‘Distributed Radar Network for Real-Time Tracking of Bullet Trajectory’, Wireless Sensing and Processing IV, Proc. of SPIE Vol. 7349, 2009.
APPENDIX A

A.1 LABVIEW EXPERIMENTAL PREPARATION

Figure A.1: Labview program that mixes incoming channels with noise at a good signal-to-noise ratio

Figure A.2: Labview program that shows the correlation graphs and angle calculation for an optimal signal-to-noise ratio

Figure A.3: Where the noise and gunshot signal peaks are the same, gunshot impulses start to get buried in the noise

Figure A.4: Correlation peaks start to disappear

Figure A.5: Gunshot signal peaks are buried in the noise; noise values are greater than the impulse values

Figure A.6: The correlation calculation becomes unstable, giving wrong values for the angle