Audio Processor Engine
Real Time Audio Processor
Student: Asaf Bercovich
Supervisor: Dr. Ilana David

Contents
Abstract
1 Introduction
1.1 Motivation
1.2 Primary Goal
1.3 Secondary Goal
2 Background
2.1 Audio Channel Configuration
2.1.1 "2 Channel Stereo"
2.1.2 "5.1 Channel Surround"
2.2 Audio Signal
2.2.1 Audio Signal Definition
2.2.2 Audio Sample Definition
2.2.3 Audio Signal Example
2.2.4 Audio Signal Resolution
2.3 Complex Functions
2.4 Consumer Audio Specifications
2.5 Digital Audio Processor
2.5.1 Digital Audio Processor Definition
2.5.2 Linear Audio Processor
2.6 Audio Processor Representation
2.6.1 Block Diagram
2.6.2 Shock Response
2.6.3 Frequency Response
3 Common Software Architecture
3.1 Target Machine and Operating System
3.2 Development Environment
3.3 Programming Languages
3.4 Programming Frameworks
3.5 Exceptions
3.5.1 Exception Base Class
3.5.2 HRESULT Exception
3.6 "2 Channel Stereo" Signal Representation
3.6.1 Audio Signal Sample
3.6.2 Audio Signal
4 Audio Streaming Engine
4.1 Audio Streaming Engine Requirements
4.1.1 Audio Streaming
4.1.2 Generic Audio Processing
4.1.3 Modularity
4.1.4 Programming API for C++ and .NET
4.2 Streaming Engine Architecture Diagram
4.3 Audio Streaming Engine Classes
4.3.1 Audio Streamer Class – CStreamer
4.3.2 Input Device Abstract Class - IInputDevice
4.3.3 Input Device Factory Abstract Class – IInputDeviceFactory
4.3.4 Wave Input Class – CWaveInput
4.3.5 Output Device Abstract Class – IOutputDevice
4.3.6 Output Device Factory Abstract Class – IOutputDeviceFactory
4.3.7 XAudio2Output Class – CXAudio2Output
4.3.8 Audio Processor Abstract Class – IAudioProcessor
4.4 Audio Streamer Class – CStreamer
4.4.1 Audio Streaming Process Diagram
4.4.2 Audio Streamer Class Diagram
4.5 Input and Output Devices
4.5.1 Input Device Abstract Class - IInputDevice
4.5.2 Input Device Class Diagram
4.5.3 Output Device Abstract Class - IOutputDevice
4.5.4 Output Device Class Diagram
4.6 Wave Input Device Class - CWaveInput
4.7 XAudio2 Output Device Class – CXAudio2Output
4.8 Audio Processor Abstract Class – IAudioProcessor
5 Audio Processing Engine
5.1 Audio Processing Engine Requirements
5.1.1 Linear System
5.1.2 Programming by Shock Response
5.1.3 Programming by Frequency Response
5.1.4 Programming by Block Diagram
5.1.5 Real Time
5.1.6 Scale
5.1.7 C++ and .NET API
5.2 Audio Processing Engine Challenge
5.2.1 Block Diagram Simulation
5.2.2 Shock Response Simulation
5.3 Audio Processing Engine Architecture
5.3.1 Simulation by Polynomial Multiplication
5.3.2 Fast Polynomial Multiplication (Fast Convolution)
5.3.3 Block Diagram Topology Analyzer
5.4 Audio Processing Engine Architecture Diagram
5.5 Block Diagram Topology Analysis
5.5.1 Example
5.5.2 Block Diagram Concatenation
5.5.2 Operator Tree
5.5.3 Block Diagram Representation as Operator Tree
5.5.4 Operator Tree Optimization
5.5.5 Deriving Shock Response from an Optimized Operator Tree
5.5.6 Time Complexity of Block Diagram Analysis
5.6 Audio Processing Engine Classes
5.6.1 LTI System Class – CLTISystem
5.6.3 Topology Class – CTopology
5.6.4 Adder Class – CAdder
5.6.5 Multiplier Class – CMultiplier
5.6.6 Delay Class – CDelay
5.6.7 Splitter Class – CSplitter
5.6.8 Block Class - CBlock
5.6.9 Input Class – CIn
5.6.10 Output Class – COut
5.6.11 Low Pass Class – CLowPass
5.6.12 High Pass Class – CHighPass
5.6.13 Band Pass Class – CBandPass
5.6.14 Multi Band Class – CMultiBand
5.7 LTI System Class - CLTISystem
5.9 Block-Diagram Public Class Diagram
5.10 Block-Diagram Public Exceptions Class Diagram
5.10 Filter Public Class Diagram
5.11 Topology Analyzer Internal Classes
5.11.1 Topology Analyzer Class – CTopologyAnalyzer
5.11.2 Transfer Function Class – CTransferFunction
5.12 Multi Core Fast Fourier Transform (FFT)
5.12.1 Discrete Fourier Transform – DFT
5.12.2 Radix-4 "Decimation in Time"
5.12.3 Radix-4 Recursive FFT
5.12.4 Recursive Call Tree of Radix-4 FFT
5.12.5 Look-Up Tables Optimization
5.12.6 Radix-4 FFT Parallelism
5.12.7 FFT Class Diagram
6 Visual Audio Processor
6.1 Visual Audio Processor Requirements
6.1.1 Windows MDI (Multiple Document Interface) GUI
6.1.2 Block Diagram User Interface
6.1.3 Frequency Response User Interface
6.1.4 Audio Streamer User Interface
6.2 Visual Audio Processor Architecture
6.2.1 Document Form Base Class
6.2.2 Block Diagram Form Class
6.2.3 Frequency Response Form Class
6.2.4 Project Class
6.2.5 XML Serialization Class
6.2.6 Signal Flow Graph Builder Class
6.2.7 Audio Engine Form Class
7 Visual Studio Solution Structure
7.1 Visual Studio Projects
7.2 Audio Engine C++ Namespace
7.3 Audio Engine .NET Namespace
8 System Requirements
8.1 Audio Streaming Engine Requirements
8.2 Audio Processing Engine Requirements
8.3 Visual Audio Processor Requirements
9 Summary
9.1 Further Development
9.2 Thanks and Gratitude
10 References

Abstract

Audio processors are common in home entertainment products and professional audio equipment. They intentionally alter an auditory signal into a new signal for useful purposes such as filtering, enhancement and other effects. Most audio processors are dedicated to a single operation and cannot be programmed.

This project covers the design and development of a software system that implements an Audio Processing Engine which can be programmed with high flexibility using block diagrams, among other design methodologies. The Audio Processing Engine can be executed in real time to process an incoming signal, and it provides high performance.

Our Audio Processing Engine is a software system written in C++ that exposes an API for both C++ and .NET users. The internals of the system are coded in C++ to achieve high performance and to maximize the potential of software optimizations and parallel programming. The .NET API allows C#, J#, VB and even VB script users to use the system the same way it can be used directly from its original C++ API.

Furthermore, a Windows GUI is provided to expose the capabilities of the Audio Processing Engine and to offer easy visual design, composition, real-time execution and exploration of different audio processors.

[Figure: Windows GUI Application]

1 Introduction

Today, digital audio processors dominate the market and come in various software and hardware forms:
- A hardware audio processor chip.
- A hardware board which includes an audio processor.
- A complete hardware rack audio processor dedicated to echo effects.
- A software audio processor as part of Windows Media Player.
1.1 Motivation

Most processors available for home entertainment are limited to a specific purpose, and their operation cannot be programmed or changed.

1.2 Primary Goal

Design and development of a software system which implements an Audio Processing Engine that can be programmed in terms of Block Diagrams, among other design methodologies such as Frequency Response and System Shock Response. The Audio Processing Engine should be able to handle large block diagrams and simulate them in real time.

1.3 Secondary Goal

Design and development of a Windows MDI (Multiple Document Interface) GUI to expose the capabilities of the Audio Processing Engine and to offer easy visual design, composition and exploration of different audio processors.

2 Background

2.1 Audio Channel Configuration

Audio playback, for both personal and professional use, is characterized by a configuration defining the number of audio channels in the signal, the number of loudspeakers and their deployment around the audience.

2.1.1 "2 Channel Stereo"

The most common audio configuration is the "2 Channel Stereo":

[Figure: a CD audio player (playback device) driving a Left Audio Channel loudspeaker and a Right Audio Channel loudspeaker]

In this example, the output signal of the playback device carries 2 channels of auditory signals. Each loudspeaker reproduces exactly one channel.

2.1.2 "5.1 Channel Surround"

Another common deployment is the "5.1 Channel Surround":

[Figure: a DVD player (playback device) driving six loudspeakers: Center, Low Frequency Channel, Front Right, Front Left, Rear Right, Rear Left]

The "5.1 Channel Surround" carries 6 channels of auditory signals. Each is reproduced by the corresponding loudspeaker.

2.2 Audio Signal

In general there are many audio signal configurations in addition to the "2 Channel Stereo" and the "5.1 Channel Surround". We provide a general definition for the digital representation of any audio signal configuration.
2.2.1 Audio Signal Definition

Definition – Digital Audio Signal

A Digital Audio Signal is a vector function $x[n]$ in a vector space $V$ such that:

$$V = \{\, x[n] \mid x[n] = (a_1[n], a_2[n], \ldots, a_N[n]),\ a_i \text{ is a scalar function},\ n \in \mathbb{Z} \,\}$$

$x[n]$ is a vector of $N$ coefficients of a discrete time variable $n$, where each coefficient is a scalar function. Each coefficient represents the auditory signal played by the corresponding loudspeaker. Defining an audio signal as a vector function with $N$ coefficients allows representing any number of audio channels.

2.2.2 Audio Sample Definition

Let $x[n]$ be a vector function representing an audio signal. The value of $x[n]$ at a specified time $n = n_0$ is called a Sample or Audio Sample.

2.2.3 Audio Signal Example

$$x[n] = (\sin(2n),\ \cos(5n))$$

In this example $x[n]$ is a vector function of 2 coefficients, where the first coefficient is $a_1[n] = \sin(2n)$ and the second coefficient is $a_2[n] = \cos(5n)$.

[Figure: a CD audio player (playback device) driving a Left Audio Channel loudspeaker and a Right Audio Channel loudspeaker]

This vector function represents a Stereo signal in which $a_1[n]$ is the left auditory signal and $a_2[n]$ is the right auditory signal.

2.2.4 Audio Signal Resolution

Audio signal resolution measures two things: the density of samples per second, and the precision of the digital representation of each sample value. The more samples per second and the more precise the representation of each sample value, the better the digital audio signal can reflect delicate changes. High fidelity audio is characterized by high signal resolution.

2.3 Complex Functions

A Complex Function is a special case of a 2-coefficient vector function and is notated as follows:

$$x[n] = L[n] + iR[n]$$

A 2-channel audio configuration can therefore be represented as a complex function as well as a vector function. $L[n]$ is the real part, representing the auditory signal of the left channel. $R[n]$ is the imaginary part, representing the auditory signal of the right channel.
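The complex representation of section 2.3 can be illustrated with a minimal C++ sketch. It assumes a sample type built on `std::complex<float>`; the alias `StereoSample` and the helper `makeStereo` are hypothetical names used only for illustration and are not taken from the project.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical sample type: real part = left channel, imaginary part = right channel.
using StereoSample = std::complex<float>;

// Combine two per-channel signals L[n] and R[n] into x[n] = L[n] + i*R[n].
std::vector<StereoSample> makeStereo(const std::vector<float>& left,
                                     const std::vector<float>& right) {
    assert(left.size() == right.size() && "both channels need the same length");
    std::vector<StereoSample> signal;
    signal.reserve(left.size());
    for (std::size_t n = 0; n < left.size(); ++n) {
        signal.emplace_back(left[n], right[n]);
    }
    return signal;
}
```

Reading a channel back is then a matter of calling `real()` for the left value and `imag()` for the right value of a sample.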
2.4 Consumer Audio Specifications

Consumer audio is high fidelity audio exceeding the limits of the human ear, with the following specifications:
- Channel Configuration: "2 Channel Stereo" (2 channels).
- Signal Resolution: 44100 samples per second. Each sample is a 16-bit integer.

[Figure: 44100 Samples per Second]

This is the most common audio format. We are exposed to this resolution every day on Digital TV, Audio CD, portable players and other home entertainment devices.

2.5 Digital Audio Processor

Audio Processors are common in home entertainment products and professional audio equipment. They intentionally alter an auditory signal into a new signal for useful purposes such as filtering, enhancement and other effects.

[Figure: a CD audio player (playback device) feeding an Audio Processor, which drives a Left Audio Channel loudspeaker and a Right Audio Channel loudspeaker]

In the example above the Audio Processor accepts a "2 Channel Stereo" signal and generates an output of the same configuration type. In the general case an Audio Processor can accept one type of signal configuration and generate an output of a different signal configuration.

2.5.1 Digital Audio Processor Definition

Digital Audio Processors are entities which operate mathematically on the digital representation of a signal.

Definition - Digital Audio Processor

A Digital Audio Processor is a transformation $T : V \to K$ from a signal vector space $V$ to a signal vector space $K$. The transformation can also be illustrated as follows:

[Figure: input $x$ entering the Audio Processor $T$, which produces output $y$]

$x \in V$ is the input signal and $y \in K$ is the transformation result, the output signal of the Digital Audio Processor. The signal vector spaces $V$ and $K$ are of the type defined in section 2.2.1; we repeat their definitions here for completeness:

$$V = \{\, x[n] \mid x[n] = (a_1[n], a_2[n], \ldots, a_N[n]),\ a_i \text{ is a scalar function},\ n \in \mathbb{Z} \,\}$$

$$K = \{\, y[n] \mid y[n] = (b_1[n], b_2[n], \ldots, b_M[n]),\ b_i \text{ is a scalar function},\ n \in \mathbb{Z} \,\}$$

$x[n]$ and $y[n]$ are signals of $N$ and $M$ scalar coefficients respectively.
Each of them is a function of a discrete time variable $n$.

2.5.2 Linear Audio Processor

A linear audio processor $T$ is an Audio Processor as defined in section 2.5.1 for which the following two additional properties hold:
- Scaling: assume $x_1[n]$ is an input signal and its corresponding output is $y_1[n]$. If we define $x[n] = \alpha x_1[n]$ as the input of the system, then the output is the signal $y[n] = \alpha y_1[n]$.
- Additivity: assume $x_1[n]$ and $x_2[n]$ are two separate signals and their outputs are the signals $y_1[n]$ and $y_2[n]$ respectively. If we define $x[n] = x_1[n] + x_2[n]$ as the input of the system, then the output $y[n]$ is the sum of the separate outputs: $y[n] = y_1[n] + y_2[n]$.

2.6 Audio Processor Representation

Audio Processors are transformations from one vector space to another. Such transformations can be represented in more than one way. The following sections present different methods of representing linear signal transformations.

2.6.1 Block Diagram

Audio Processors can be represented as Block Diagrams.

[Figure: example block diagram connecting "in" to "out" through a delay block D and a multiplier of −2]

Diagram Block | Semantics | Block Type
D | Register / Delay by 1 sample | Unary operator
(multiplier symbol) | Multiplication by a constant | Unary operator
(adder symbol) | Summation | Binary operator
(splitter symbol) | Splitter | Special operator
in | Input Source | Special operator
out | Output Source | Special operator

In this example the audio processing procedure is represented as a Block Diagram. The block diagram above should be interpreted as follows:

$$y[n] = x[n] - 2x[n-1]$$

In general a block diagram can contain more types of operators. A block diagram that is limited to the operators above is a Linear Transformation, that is, a Linear Audio Processor (see section 2.5.2 on linearity).

2.6.2 Shock Response

A Shock Response (more commonly known as an impulse response) is a vector $h[n] \in V$, where $V$ is a signal vector space of the type defined in section 2.2.1. The output $y[n]$ of an audio processor defined by a Shock Response $h[n]$ and an input signal $x[n]$ is defined as follows:

$$y[n] = \sum_{m=-\infty}^{\infty} h[m]\, x[n-m]$$

This operation is known as "Convolution".
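The convolution sum can be evaluated directly when the Shock Response is finite and causal. The following C++ sketch is illustrative only: the `Sample` alias, the `convolve` function and the choice of a real-valued `h` applied identically to both channels are assumptions, not the project's implementation. Samples before the start of the input are treated as zero.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stereo sample: real part = left channel, imaginary part = right channel.
using Sample = std::complex<float>;

// Direct evaluation of y[n] = sum over m of h[m] * x[n - m],
// for a finite causal h (h[m] = 0 for m < 0 and m >= h.size()).
std::vector<Sample> convolve(const std::vector<float>& h,
                             const std::vector<Sample>& x) {
    assert(!h.empty() && "the shock response needs at least one tap");
    std::vector<Sample> y(x.size());
    for (std::size_t n = 0; n < x.size(); ++n) {
        Sample acc(0.0f, 0.0f);
        // Only taps with n - m >= 0 contribute; earlier input is taken as zero.
        for (std::size_t m = 0; m < h.size() && m <= n; ++m) {
            acc += h[m] * x[n - m];
        }
        y[n] = acc;
    }
    return y;
}
```

With the illustrative two-tap response `h = {0.5f, 0.5f}`, each output sample is the average of the current and the previous input sample. The direct form costs one multiply-add per tap per output sample, so long responses become expensive.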
Any Linear Time Invariant (LTI) processing system can be represented as a Convolution. That includes the Block Diagram and the Frequency Response representations. Consider section 2.6.1, where a specific block diagram defined the output as $y[n] = x[n] - 2x[n-1]$ (*). If we define $h[n] = (1, -2)$, that is $h[0] = 1$ and $h[1] = -2$, and substitute it into the Convolution operation, we obtain:

$$y[n] = \sum_{m=-\infty}^{\infty} h[m]\, x[n-m] = \sum_{m=0}^{1} h[m]\, x[n-m] = h[0]\, x[n] + h[1]\, x[n-1] = x[n] - 2x[n-1]$$

(*) A Block Diagram always defines $y[n]$ as a linear combination of terms $x[n-m]$, where $m$ is an integer with $m \ge 0$, and is therefore equivalent to a "Convolution" with a Shock Response vector.

2.6.3 Frequency Response

Audio Processors can be represented by their Frequency Response.

[Figure: frequency response chart of an audio processor acting as a frequency filter]

The chart above represents an audio processor acting as a frequency filter. It shows which frequencies pass through the processor and how they are amplified or attenuated.

The Frequency Response is a graph defined by a function $H(\omega)$ of a continuous variable $\omega$ such that $-\pi \le \omega \le \pi$. $\omega = -\pi$ stands for the lowest achievable frequency (−22050 Hz) and $\omega = \pi$ stands for the highest achievable frequency (22050 Hz). The horizontal axis reaches up to 22050 Hz, half the 44100 Hz sampling rate, which is the highest frequency supported by Consumer Audio (see section 2.4 for the Consumer Audio Specifications).

Frequency filtering is a linear operation and can therefore be represented as a Block Diagram and as a Shock Response as well. To convert the Frequency Response representation to a Shock Response vector, the following calculation is required:

$$h[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} H(\omega)\, e^{i\omega n}\, d\omega$$

This is a version of the Fourier Transform which converts from the "Frequency Domain" to a discrete "Time Domain". The Shock Response $h[n]$ is the "Time Domain" representation, whereas the Frequency Response $H(\omega)$ is the "Frequency Domain" representation.

3 Common Software Architecture

This chapter introduces the common software development environment, frameworks and fundamental data types used across the project.
3.1 Target Machine and Operating System

The project target machine is any x86/amd64 compatible machine running Windows 7.

3.2 Development Environment

Microsoft Visual Studio 2010 is used as the complete software development environment across the project.

3.3 Programming Languages

C++ is the main programming language. Most of the project code is written for performance. Parallel programming and software optimizations are used throughout the project, and native C++ code makes the most of them. .NET C# is used only as a complementary language, for the graphical user interface.

3.4 Programming Frameworks

The Standard Template Library (STL) is used extensively across the project for the following types: vectors, strings, lists, queues, maps and multimaps. .NET Framework 4.0 is used wherever .NET C# code is involved.

3.5 Exceptions

Exceptions are used extensively across the project. The following sections introduce the data types which represent the basic exceptions of the project.

3.5.1 Exception Base Class

The base class for exceptions is std::exception. All exceptions in our project inherit from this type.

3.5.2 HRESULT Exception

The HRESULT Exception represents an HRESULT failure returned by a WIN32 API function. All errors returned as an HRESULT failure are translated to the CHRException type, which represents the failure as an exception rather than an error code. The benefit of CHRException is that, once initialized with the specific error code returned by the Windows API, it automatically translates the error code to a readable std::string. The conversion from HRESULT to a readable string is done internally using FormatMessage (Windows API).

3.6 "2 Channel Stereo" Signal Representation

The following section introduces a data type for representing a "2 Channel Stereo" audio signal.

3.6.1 Audio Signal Sample

A single audio signal sample is a complex number represented by the type TComplex.
L is the real part of the complex number and represents the left audio channel value. R is the imaginary part of the complex number and represents the right audio channel value.

3.6.2 Audio Signal

A 2-channel audio configuration can be represented as a complex function. A discrete time complex function is a series of samples. We use TComplex * as a pointer to a contiguous block of the audio samples introduced in section 3.6.1. The series of samples pointed to by a TComplex * is a discrete time complex function and therefore represents a discrete time 2-channel audio signal. All audio signals in our project are of type TComplex *.

4 Audio Streaming Engine

The "Audio Streaming Engine" is our software solution for a playback process that integrates generic audio processing. It is coded entirely in C++. In addition to the C++ API it exposes a .NET API, so all .NET languages such as C#, J# and VB.NET can make use of the streaming engine.

4.1 Audio Streaming Engine Requirements

The following sections introduce the major requirements for the playback system, the Audio Streaming Engine.

4.1.1 Audio Streaming

Audio streaming is the process in which a piece of audio is played continuously on a PC or any other digital system. The streaming solution should be able to monitor and control the playback of consumer audio.

4.1.2 Generic Audio Processing

The system should allow the integration and application of an external audio processor as part of the audio streaming process.

4.1.3 Modularity

The system should be composed of logical components to benefit from modularity. The streaming engine should not be aware of tasks such as decoding the format of consumer audio files, nor of tasks such as encoding for the sound hardware or another output device.

4.1.4 Programming API for C++ and .NET

The system exposes both a C++ API and an equivalent .NET API.
4.2 Streaming Engine Architecture Diagram

[Diagram: a Wave File is read by the "Input Device" (IInputDevice); the "Audio Streamer" (CStreamer) passes the input signal through the "Audio Processor" (IAudioProcessor) and delivers the output signal to the "Output Device" (IOutputDevice), which plays it through DirectX XAudio2]

Type | Semantics
CStreamer | Concrete class representing the streaming engine.
IInputDevice | Abstract class representing the Input Device.
IOutputDevice | Abstract class representing the Output Device.
IAudioProcessor | Abstract class representing the Audio Processor.

4.3 Audio Streaming Engine Classes

The following sections summarize the major public classes which form the Audio Streaming Engine.

4.3.1 Audio Streamer Class – CStreamer

CStreamer is a concrete class implementing the audio streaming procedure. It allows continuous playback from an "Input Device" to an "Output Device".

4.3.2 Input Device Abstract Class - IInputDevice

IInputDevice is a pure abstract class representing an Input Device, the audio signal source. An Input Device can represent an Audio CD, a Wave file or even a live audio source from the Internet.

4.3.3 Input Device Factory Abstract Class – IInputDeviceFactory

IInputDeviceFactory is a pure abstract class representing a factory for an Input Device. The Audio Streamer class does not accept an Input Device directly but rather accepts the appropriate factory to instantiate the Input Device.

4.3.4 Wave Input Class – CWaveInput

CWaveInput is a concrete class representing a Wave file audio source (Microsoft RIFF media format, PCM, 44.1 kHz, 16-bit, Stereo). It implements the IInputDevice interface.

4.3.5 Output Device Abstract Class – IOutputDevice

IOutputDevice is a pure abstract class representing an Output Device. An Output Device can represent the sound hardware or even a Wave file as the final target for streaming.

4.3.6 Output Device Factory Abstract Class – IOutputDeviceFactory

IOutputDeviceFactory is a pure abstract class representing a factory for an Output Device.
The Audio Streamer class does not accept an Output Device directly but rather accepts the appropriate factory to instantiate the Output Device.

4.3.7 XAudio2Output Class - CXAudio2Output
CXAudio2Output is a concrete class representing the sound hardware. It uses the XAudio2 API to send audio data for playback on the sound hardware. It implements the IOutputDevice interface.

4.3.8 Audio Processor Abstract Class - IAudioProcessor
IAudioProcessor is a pure abstract class representing the Audio Processor to be applied during the streaming process. Any object implementing this interface can serve as an Audio Processor for the Audio Streaming Engine.

4.4 Audio Streamer Class - CStreamer
The CStreamer class is a concrete implementation of an audio streaming process. A streaming process involves negotiation between an Input Device and an Output Device. It is a special case of the well-known "Producer-Consumer Problem": the Output Device is the consumer and the Input Device is the producer. The primary goal of the Audio Streamer is to make sure that an Output Device connected to the streamer is never starved. The streamer accepts basic commands such as Play, Rewind, Pause, Stop and Seek. A generic audio processor can also be connected to the streamer so that it is applied to the audio signal during streaming.

Starvation is the case in which the Output Device has no more audio data to play and is waiting for an additional delivery of audio data. A starving Output Device that represents sound hardware is waiting for audio data; while it waits, no sound is generated and playback is interrupted. Starvation should never happen unless the Input Device fails to decode or extract the audio data from the media source. Such a failure may occur if a removable disk is removed or an Internet address is no longer available. In order to prevent an Output Device from starving, the Output Device notifies the audio streamer when it requires more audio data.
The Output Device issues the notification early enough, before it actually enters a starvation state. When the streamer receives the notification from the Output Device, it orders the Input Device to decode more audio data to be delivered to the Output Device. The streamer is implemented as a listening thread waiting on a queue of notifications that originate in the Output Device. When the queue is no longer empty, the listening thread consumes the event accumulated on the queue and processes it.

Thread procedure (Pseudo Code):
(1) Wait here as long as the Queue is empty.
(2) Dequeue an event from the Queue.
(3) Process the event.
(4) Go back to (1).

Implementing the streamer as a thread makes the Audio Streamer API a NON-BLOCKING API. It allows playback in the background while the program flow continues with the rest of the code. The streamer inherits (with protected inheritance) from CEventDrivenThread and so benefits from the automation of an event driven thread, which is exactly what an audio streaming process is.

4.4.1 Audio Streaming Process Diagram
[Diagram: CStreamer (CEventDrivenThread) with a Blocking Queue (FIFO) and an Event Processor. "Control" events and RequestFeed() events are enqueued; the Event Processor dequeues them. Audio Data flows from IInputDevice through IAudioProcessor, as (pX, nXLength) in and (ppY, pnYLength) out, to IOutputDevice.]

4.4.2 Audio Streamer Class Diagram
The audio streamer is a non-blocking API which allows control over the playback of an audio signal. It exposes basic commands such as Play, Rewind, Pause, Stop and Seek.

4.5 Input and Output Devices
Input and Output Devices are entities which represent the source and the target in a streaming process.

4.5.1 Input Device Abstract Class - IInputDevice
IInputDevice is an abstract class representing the Input Device. An Input Device in the streaming process is a decoder for a specific kind of audio data. The audio data can be a Wave file, an Audio CD or any other well-defined audio format, including a live source of audio from the Internet.
The decoded audio is delivered to the Audio Streamer in a digital representation that is valid for and accepted by the Audio Streamer. The Audio Streamer itself is not aware of the decoding task, nor does it know how to actually decode a Wave file or any other audio format. The Input Device represents the audio media source to the Audio Streamer.

4.5.2 Input Device Class Diagram
A concrete Input Device ships with the appropriate concrete factory. The Audio Streamer does not accept an Input Device dynamically but rather accepts the appropriate factory during initialization of the streamer. An Input Device implements basic methods such as Read and SetPosition, which allow the streamer to command the Input Device to read audio data or to set the position on the linear time line of the audio signal.

4.5.3 Output Device Abstract Class - IOutputDevice
IOutputDevice is an abstract class representing the Output Device. An Output Device in the streaming process is an encoder for a specific type of audio target. The audio target can be sound hardware or even a Wave file. The audio streamer delivers the signal in a digital representation that is valid for and accepted by the Output Device. The Audio Streamer itself is not aware of the encoding task, nor does it know how to actually handle sound hardware or a target Wave file. The Output Device is usually the representative of the sound hardware.

4.5.4 Output Device Class Diagram
A concrete Output Device ships with the appropriate factory. The Audio Streamer does not accept an Output Device dynamically but rather accepts the appropriate factory during initialization of the streamer. An Output Device implements basic methods such as Write, which allows the streamer to command the Output Device to write more audio data to its target.

4.6 Wave Input Device Class - CWaveInput
CWaveInput is a concrete implementation of IInputDevice.
It represents a Wave file decoder compatible with the audio streamer. It ships with an appropriate factory, CWaveInputFactory, which is a concrete implementation of IInputDeviceFactory.

4.7 XAudio2 Output Device Class - CXAudio2Output
CXAudio2Output is a concrete implementation of IOutputDevice. It represents the sound hardware as an Output Device for the audio streamer. It uses the Microsoft DirectX XAudio2 API to handle the sound hardware. It ships with an appropriate factory, CXAudio2OutputFactory, which is a concrete implementation of IOutputDeviceFactory.

CXAudio2Output may throw an exception of type CXAudio2OutputException. The exception may be thrown, for example, when no sound hardware is available. The Microsoft XAudio2 API is a COM interface. Since every COM component requires initialization of the Windows COM Library, a helper class CComLibrary is used internally by CXAudio2Output to automatically initialize the Windows COM Library and to release it when it is no longer needed.

XAudio2 is a relatively new audio API. It was originally an exclusive part of the Microsoft XBOX development framework. Once it was judged a successful API, easier to use than DirectSound, XAudio2 was officially made part of Windows DirectX. It replaces the deprecated DirectSound.

4.8 Audio Processor Abstract Class - IAudioProcessor
IAudioProcessor is a pure abstract class representing the Audio Processor to be applied during the streaming process. Any audio processor compatible with the Audio Streaming Engine must implement and conform to the IAudioProcessor interface. The pure virtual function Process accepts an input signal pointed to by pX, with the number of samples given by nXLength. The output of the audio processor is pointed to by *ppY and the number of output samples is pointed to by pnYLength.
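The Process contract described above can be sketched as follows. This is a minimal illustration, not the project's actual header: the alias TComplex = std::complex<float>, the void return type, the const and size types, and the CGain class are all assumptions introduced here for the example. Only the parameter names pX, nXLength, ppY and pnYLength come from the text.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Assumption: a sample is a complex number (real = left channel, imaginary = right channel).
using TComplex = std::complex<float>;

// Sketch of the interface; the exact parameter types and return value are assumed.
class IAudioProcessor {
public:
    virtual ~IAudioProcessor() = default;
    virtual void Process(const TComplex* pX, std::size_t nXLength,
                         TComplex** ppY, std::size_t* pnYLength) = 0;
};

// Hypothetical processor: scales both channels by a constant gain.
class CGain : public IAudioProcessor {
public:
    explicit CGain(float gain) : m_gain(gain) {}

    void Process(const TComplex* pX, std::size_t nXLength,
                 TComplex** ppY, std::size_t* pnYLength) override {
        m_buffer.assign(pX, pX + nXLength);        // copy the input block
        for (TComplex& s : m_buffer) s *= m_gain;  // scale left and right together
        *ppY = m_buffer.data();                    // output stays valid until the next call
        *pnYLength = m_buffer.size();
    }

private:
    float m_gain;
    std::vector<TComplex> m_buffer;  // output storage owned by the processor
};
```

In this sketch the processor owns the output buffer, which is one plausible reading of the pointer-to-pointer output parameter; the source does not state who owns the memory.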
Our Audio Processing Engine conforms to this interface as well.

5 Audio Processing Engine
"Audio Processing Engine" is our software system solution for a real time audio processor. The main class of the audio processor is CLTISystem, which is a Linear Time Invariant signal processing system. It is a concrete implementation of IAudioProcessor and can therefore be connected to the audio streamer (CStreamer) introduced in chapter 4. It is coded entirely in C++ to benefit from software optimizations and parallel programming. In addition to the C++ API it exposes a .NET API, so all .NET languages such as C#, J# and VB.NET can make use of the processing engine.

5.1 Audio Processing Engine Requirements
The following sections introduce the major requirements of the audio processing engine.

5.1.1 Linear System
The audio processing engine is a Linear Time Invariant signal processing system. It follows the definition of a linear audio processor introduced in section 2.5.2. We repeat the definition here for completeness. A linear audio processor T is an Audio Processor as defined in section 2.5.1 for which the following two additional properties hold:

Assume $x_1[n]$ is the input signal and its corresponding output is $y_1[n]$. If we define $x[n] = \alpha x_1[n]$ as the input of the system, where $\alpha$ is a constant, then the output is the signal $y[n]$ such that $y[n] = \alpha y_1[n]$.

Assume $x_1[n]$ and $x_2[n]$ are two separate signals and their outputs are the signals $y_1[n]$ and $y_2[n]$ respectively. If we define $x[n] = x_1[n] + x_2[n]$ as the input of the system, then the output of the system $y[n]$ is the sum of the separate outputs, such that $y[n] = y_1[n] + y_2[n]$.

5.1.2 Programming by Shock Response
The system can be programmed by Shock Response. See section 2.6.3. We repeat the definition here for completeness: A Shock Response is a vector $h[n] \in V$ where V is the signal vector space of the type defined in section 2.2.1.
The output of an audio processor defined by a Shock Response $h[n]$ and an input signal $x[n]$ is defined as follows:

$$y[n] = \sum_{m} h[m]\, x[n-m]$$

This operation is known as "Convolution".

5.1.3 Programming by Frequency Response
The system can be programmed by Frequency Response. See section 2.6.2. Audio Processors can be represented by their Frequency Response.

[Chart: Frequency Response of an audio processor.]

The chart represents an audio processor taking the role of a frequency filter. It shows which frequencies pass through the processor and how they are amplified or attenuated. The horizontal axis reaches up to 22050 Hz, which is the highest frequency supported by Consumer Audio (see section 2.4 for more about Consumer Audio Specifications). Frequency filtering is a linear operation and can therefore be represented in the form of a Shock Response as well.

5.1.4 Programming by Block Diagram
The system can be programmed by Block Diagrams.

[Block Diagram example with nodes: in, D, multiplication by -2, out.]

The Block Diagram is constrained by the following specifications: The diagram is a Connected Directed Acyclic Graph (C-DAG). The nodes of the diagram are taken from the collection defined in the following table:

Diagram Block    Semantics                                                                  Block Type
D                Register / Delay by 1 Sample                                               Unary Operator
(constant)       Multiplication by a constant                                               Unary Operator
(graphic)        Summation                                                                  Binary Operator
(graphic)        Splitter                                                                   Special Operator
in               Input Source                                                               Special Operator
out              Output Source                                                              Special Operator
H                A hierarchy node which represents a nested diagram / other design method   Unary Operator

See section 2.6.1 about representing an audio processor in the form of a Block Diagram.

5.1.5 Real Time
The system can be used to process consumer audio in real time.

5.1.6 Scale
The system should be able to handle real time audio processing with Block Diagrams containing tens of thousands of basic components.

5.1.7 C++ and .NET API
The system exposes both a C++ API and an equivalent .NET API.
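To make the Block Diagram semantics of section 5.1.4 concrete, the following sketch evaluates the small example diagram one sample at a time. It assumes the example computes out = in + (-2)·D(in), i.e. $y[n] = x[n] - 2x[n-1]$; the Delay struct, the Sample alias and the function name simulateExample are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Assumed sample type: real part = left channel, imaginary part = right channel.
using Sample = std::complex<double>;

// Hypothetical one-sample register: returns the previous input and stores the new one.
struct Delay {
    Sample state{0.0, 0.0};
    Sample step(Sample in) {
        Sample out = state;
        state = in;
        return out;
    }
};

// Evaluates out[n] = in[n] + (-2) * D(in[n]) sample by sample.
// One trailing zero sample is fed in so that the delayed tail is flushed.
std::vector<Sample> simulateExample(const std::vector<Sample>& x) {
    Delay d;
    std::vector<Sample> y;
    for (std::size_t n = 0; n <= x.size(); ++n) {
        Sample in = (n < x.size()) ? x[n] : Sample(0.0, 0.0);
        Sample delayed = d.step(in);                   // splitter feeds the delay node
        Sample scaled = Sample(-2.0, 0.0) * delayed;   // multiplier node
        y.push_back(in + scaled);                      // adder node
    }
    return y;
}
```

For the input (3, 4) this produces (3, -2, -8). Every node is visited once per sample, which is simple but becomes costly as the number of nodes grows.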
5.2 Audio Processing Engine Challenge
The Audio Processing Engine is a real time simulation of systems represented by Block Diagrams, Frequency Response and Shock Response. See section 2.6 for Audio Processor Representation and section 5.1 about Audio Processing Engine Requirements. The following sections introduce common simulation methods for applying real time audio processing represented by a Block Diagram.

[Block Diagram example with nodes: in, D, multiplication by -2, out.]

5.2.1 Block Diagram Simulation
The Block Diagram Simulation method means managing a complete repository of values for all diagram nodes and connections. It requires taking each audio sample from $x[n]$ and processing it through each and every diagram node and connection. Considering requirement 5.1.6 about tens of thousands of components in such a diagram and the Consumer Audio Specification from section 2.4 of 44100 samples per second, simulating one second of Consumer Audio requires billions of calculations on complex values per second. Even 10,000 components at 44,100 samples per second is about 441 million node evaluations per second, each a complex operation that costs several real multiplications and additions. A modern PC does not handle billions of complex multiplications and summations per second, not even close. The advantage of such a simulation is its simplicity and its ability to handle any Block Diagram, not just linear systems. The disadvantage is that it does not handle more than a few hundred components in a diagram at Consumer Audio resolution. Therefore such a simulation is ruled out for our Audio Processing Engine.

5.2.2 Shock Response Simulation
Another simulation method is to take the Block Diagram, analyze it to extract the values of the Shock Response vector $h[n]$ and then simulate the output $y[n]$ according to section 2.6.2:

$$y[n] = \sum_{m} h[m]\, x[n-m]$$

The disadvantage of this method is that it requires a preliminary analysis stage on the Block Diagram, which must take place before the actual simulation. In real time it handles only a Block Diagram which generates a Shock Response vector of a few hundred components.
The bigger the Block Diagram, the bigger the Shock Response vector. The cumulative number of complex multiplications and summations per second required to handle large Block Diagrams is in the billions. Therefore such a simulation is ruled out for our Audio Processing Engine.

5.3 Audio Processing Engine Architecture
The following sections introduce how the real time and capacity requirements introduced in section 5.1 are handled by this audio processor.

5.3.1 Simulation by Polynomial Multiplication
Consider the Block Diagram example from section 2.6.1.

[Block Diagram example with nodes: in, D, multiplication by -2, out.]

Assuming the input signal is $x[n] = (3, 4)$, the output $y[n]$ is

$$y[n] = x[n] - 2x[n-1] = (3, 4) - 2 \cdot (0, 3, 4) = (3, -2, -8) \qquad (*)$$

This calculation can be represented as a simple multiplication between two polynomials. One polynomial represents the input signal:

$$x[n] = (3, 4) \;\rightarrow\; 3 + 4x$$

The other polynomial represents the Block Diagram:

$$y[n] = 1 \cdot x[n] - 2 \cdot x[n-1] \;\rightarrow\; 1 - 2x$$

The output of the system can be obtained by multiplying the polynomials:

$$(3 + 4x)(1 - 2x) = 3 - 2x - 8x^2$$

Note the coefficients of the resulting polynomial: they are the same values obtained in (*). A Block Diagram can always be reduced to a polynomial multiplication in which one polynomial represents the input signal and the other represents the Block Diagram. Simulation by polynomial multiplication requires handling large polynomials in order to achieve the capacity requirement of the system.

5.3.2 Fast Polynomial Multiplication (Fast Convolution)
Efficient multiplication of two polynomials can be done in time complexity $O(N \log N)$ instead of the standard complexity of $O(N^2)$. Let $h[n]$ be the Shock Response of the system and $x[n]$ the input signal of the system. The output of the system $y[n]$ can be calculated using the calculation introduced in section 2.6.2:

$$y[n] = \sum_{m} h[m]\, x[n-m] \; ; \quad O(N^2)$$

This calculation is the same as the one performed in conventional polynomial multiplication.
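The equivalence between the convolution sum and conventional polynomial multiplication can be checked with a short sketch. The function name multiplyPolynomials is hypothetical, and real coefficients are used for brevity; the same loop would work with complex samples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Multiplies two polynomials given by their coefficients (lowest power first).
// This is the direct O(N^2) method: c[i + j] accumulates a[i] * b[j],
// which is exactly the convolution sum y[n] = sum over m of h[m] * x[n - m].
std::vector<double> multiplyPolynomials(const std::vector<double>& a,
                                        const std::vector<double>& b) {
    if (a.empty() || b.empty()) return {};
    std::vector<double> c(a.size() + b.size() - 1, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            c[i + j] += a[i] * b[j];
    return c;
}
```

Calling multiplyPolynomials({3, 4}, {1, -2}) yields the coefficients (3, -2, -8), matching the example of section 5.3.1. The two nested loops are what makes this method quadratic in the signal length.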
Instead of executing the calculation for $y[n]$ introduced in section 2.6.2, the output $y[n]$ can be obtained by the following method:

$$y[n] = \mathrm{DFT}^{-1}\big(\mathrm{DFT}(h[n]) \cdot \mathrm{DFT}(x[n])\big) \; ; \quad O(N \log N)$$

where $\mathrm{DFT}(x[n])$ and $\mathrm{DFT}^{-1}(v[k])$ are defined as follows:

$$\mathrm{DFT}(x[n])[k] = \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i k n / N}$$

$$\mathrm{DFT}^{-1}(v[k])[n] = \frac{1}{N} \sum_{k=0}^{N-1} v[k]\, e^{2\pi i k n / N}$$

$$k, n \in \mathbb{N}, \quad 0 \le k, n \le N-1$$

$x[n]$ and $v[k]$ are finite signals of N samples. Both $h[n]$ and $x[n]$ are padded with zeros to a common length N that is at least the length of the result, so that the product of the transforms corresponds to ordinary rather than circular convolution. DFT stands for Discrete Fourier Transform. The DFT can be executed in $O(N \log N)$ time using the FFT, which reduces the total time required for such a polynomial multiplication from $O(N^2)$ to $O(N \log N)$. See section 5.12 about FFT.

5.3.3 Block Diagram Topology Analyzer
In order to process the input signal by the Fast Polynomial Multiplication introduced in sections 5.3.1 and 5.3.2, the Block Diagram topology must be analyzed and converted to a series of coefficients representing the characteristic polynomial of the Block Diagram. Deriving the characteristic polynomial, or the Shock Response vector, of a Block Diagram requires an algorithm whose input is a Block Diagram and whose output is the coefficients of the Shock Response vector. See section 5.5 about Block Diagram analysis.

5.4 Audio Processing Engine Architecture Diagram
The following diagram shows how the Audio Processing Engine reduces each of its programming methods, whether a Block Diagram or a Frequency Response, to a Shock Response vector and then applies Fast Polynomial Multiplication to process the input signal with the appropriate Shock Response.

[Diagram: Audio Processing Engine. Block Diagram -> Topology Analyzer -> Shock Response vector; Frequency Response -> Shock Response vector; Input Signal -> Fast Convolution (FFT) -> Output Signal.]

5.5 Block Diagram Topology Analysis
This section introduces a method for reducing a Block Diagram to its corresponding characteristic polynomial or Shock Response vector.
This algorithm accepts a Block Diagram as introduced in section 5.1 as its input and generates the corresponding Shock Response vector as its output. The algorithm is divided into 3 main stages:
(1) Converting the Block Diagram to an Operator Tree.
(2) Refactoring the Operator Tree to obtain an Optimized Operator Tree.
(3) Deriving the Shock Response vector from the Optimized Operator Tree.

5.5.1 Example
Consider the following Block Diagram.

[Block Diagram with nodes: in, D, multiplication by 2, D, multiplication by 3, out.]

Given the above Block Diagram, the output of the algorithm is the Shock Response vector $h[n] = (1, 2, 3)$. See section 2.6.2 about Shock Response.

5.5.2 Block Diagram Concatenation
Consider the following Block Diagram.

[Block Diagram with nodes: in, D, D, multiplication by 2, multiplication by 4, D, multiplication by 3, out.]

Given the above Block Diagram, the output of the algorithm is the Shock Response vector $h[n] = (1, 6, 11, 12)$. See section 2.6.2 about Shock Response. The topology of the Block Diagram in this example is a case where two Block Diagrams are connected by concatenation. It is the topology introduced in section 5.5.1 concatenated with the topology introduced in section 2.6.1. The final Shock Response vector $h[n]$ can be obtained from two separate Shock Responses, one for each topology in the concatenation. Let $h_1[n] = (1, 2, 3)$ be the Shock Response of the first topology in the concatenation. Let $h_2[n] = (1, 4)$ be the Shock Response of the second topology in the concatenation. $h[n]$ can be obtained by Convolution:

$$h[n] = \sum_{m} h_1[m]\, h_2[n-m]$$

Since Convolution can be reduced to Fast Polynomial Multiplication, such concatenations can be analyzed in time complexity $O(N \log N)$, assuming the analysis of each of the topologies is bounded by that complexity as well. See section 5.3.2 about Fast Polynomial Multiplication.

5.5.2 Operator Tree
An Operator Tree is a directed tree with the following nodes.
Tree Node Graphic Symbol    Node Name                    Node Type
D                           "Delay Node"                 Operand (Leaf)
(graphic)                   "Multiplication Node"        Binary Operator
(graphic)                   "Summation Node"             Binary Operator
(the constant's value)      "Numerical Constant Node"    Operand (Leaf)

5.5.3 Block Diagram Representation as Operator Tree
The Block Diagram from section 5.5.1 is represented as the following Operator Tree.

[Operator Tree figure with the leaves 1, D, 2, 3, D.]

This operator tree should be interpreted as follows:

$$1 + D \cdot (2 + 3 \cdot D) \qquad (*)$$

If we treat D as an operand in a multiplication then we can expand the parentheses and rewrite (*) as follows:

$$1 + D \cdot (2 + 3 \cdot D) = 1 + 2D + 3D^2$$

which is a polynomial in the variable D. Furthermore, it is the characteristic polynomial of the Block Diagram in section 5.5.1, which is required in order to process an input signal with the Fast Polynomial Multiplication introduced in section 5.3.2.

The Block Diagram from section 5.5.2 is represented as the following Operator Tree.

[Operator Tree figure; the sub-tree for 1 + 4·D is surrounded by a dashed line and is pointed to 3 times.]

This operator tree should be interpreted as follows:

$$(1 + 4D) + D \cdot \big(2 \cdot (1 + 4D) + 3 \cdot D \cdot (1 + 4D)\big)$$

This interpretation is the characteristic polynomial, or the Shock Response vector, of the Block Diagram introduced in section 5.5.2. Note how the same sub-tree, surrounded by a dashed line, is pointed to 3 times and how $(1 + 4D)$ is repeated 3 times in the characteristic polynomial. A node which is pointed to more than once is the key to the optimization in the next section.

5.5.4 Operator Tree Optimization
Consider the following algebraic maneuver of "Common Term Extraction" applied to the above polynomial:

$$\big(1 + D \cdot (2 \cdot 1 + 3D \cdot 1)\big) \cdot (1 + 4D)$$

The following operator tree reflects the "Common Term Extraction" optimization:

[Operator Tree figure: a multiplication node whose operands are the refactored tree, with unit constants as place holders, and the extracted common term 1 + 4·D.]

The "Common Term Extraction" leaves unit constants as place holders in the original places of the common term. Such factoring generates an additional multiplication node which takes the refactored tree and the extracted common term as operands. This optimization reduces the original polynomial to a multiplication between two smaller polynomials:

$$\big(1 + D \cdot (2 \cdot 1 + 3D \cdot 1)\big) \cdot (1 + 4D)$$

An Optimized Operator Tree is an Operator Tree with no repetition of sub-trees.
This optimization is repeated until an Optimized Operator Tree is obtained.

5.5.5 Deriving Shock Response from an Optimized Operator Tree
An Optimized Operator Tree has a typical structure which describes a series of multiplications between polynomials.

[Figure: Polynomial A and Polynomial B are multiplied by Fast Convolution.]

Applying Fast Polynomial Multiplication (or Fast Convolution) repeatedly on the Optimized Operator Tree reduces the tree to a series of coefficients which form the Shock Response vector, that is, the coefficients of the characteristic polynomial of the original Block Diagram.

5.5.6 Time Complexity of Block Diagram Analysis
The algorithm is divided into 3 main stages:
(1) Converting the Block Diagram to an Operator Tree. Using DFS (Depth First Search) an Operator Tree can be generated in $O(|E| + |N|)$ time, where $|E|$ is the number of connections in the Block Diagram and $|N|$ is the number of nodes in the Block Diagram.
(2) Refactoring the Operator Tree to obtain an Optimized Operator Tree. A repeated sub-tree cannot repeat more than $|E|$ times (the total number of connections in the Block Diagram). Therefore replacing the original places of the sub-tree with the unit constant and inserting the artificial multiplier cannot exceed time complexity $O(|E|)$.
(3) Deriving the Shock Response vector from the Optimized Operator Tree. Derivation of the Shock Response vector is bounded by the time complexity of Fast Polynomial Multiplication, which is $O(|E| \log |E|)$.
The total complexity of Block Diagram analysis is bounded by $O(|N| + |E| \log |E|)$.

5.6 Audio Processing Engine Classes
The following sections summarize the major public classes which form the Audio Processing Engine.

5.6.1 LTI System Class - CLTISystem
CLTISystem is a concrete class implementing the IAudioProcessor abstract class (see section 4.8 about IAudioProcessor). CLTISystem is an abstraction of an LTI System which can be executed in real time to process input signals and generate output signals.
The LTI System accepts a Shock Response vector as its processing instruction. Any other design method, such as a Block Diagram or a Frequency Response, can be reduced to its corresponding Shock Response and then used by CLTISystem to apply the audio processing.

5.6.3 Topology Class - CTopology
CTopology is a concrete class representing operations on a Block Diagram topology, such as connecting and disconnecting nodes in a diagram. The Topology class is then used to analyze the Block Diagram and derive the corresponding Shock Response vector. See section 5.1.4 about Block Diagram and section 5.5 about Block Diagram Topology analysis. The derived Shock Response vector is then used by the LTI System (CLTISystem) to apply the audio processing in real time.

5.6.4 Adder Class - CAdder
CAdder is a concrete class representing an Adder node in a Block Diagram. See section 5.1.4 about Block Diagram. An Adder is a summation over 2 or more incoming audio signals.

5.6.5 Multiplier Class - CMultiplier
CMultiplier is a concrete class representing a Multiplier in a Block Diagram. See section 5.1.4 about Block Diagram. A Multiplier takes an incoming signal and amplifies or attenuates it by a constant complex factor. The Multiplier is a unary operator in a Block Diagram.

5.6.6 Delay Class - CDelay
CDelay is a concrete class representing a Delay in a Block Diagram. See section 5.1.4 about Block Diagram. A Delay takes an incoming signal and delays it by one sample or more, according to the initialization of the node. The Delay is a unary operator in a Block Diagram.

5.6.7 Splitter Class - CSplitter
CSplitter is a concrete class representing a Splitter in a Block Diagram. See section 5.1.4 about Block Diagram. A Splitter takes an incoming signal and generates 2 identical copies of the original signal.

5.6.8 Block Class - CBlock
CBlock is a concrete class representing an arbitrary nested design (a Block Diagram, a Frequency Response or any arbitrary Shock Response vector).
See section 5.1.4 about Block Diagram. This node allows nesting designs inside designs.

5.6.9 Input Class - CIn
CIn is a concrete class representing an input node in a Block Diagram. See section 5.1.4 about Block Diagram. Only one input is allowed per Block Diagram.

5.6.10 Output Class - COut
COut is a concrete class representing an output node in a Block Diagram. See section 5.1.4 about Block Diagram. Only one output is allowed per Block Diagram.

5.6.11 Low Pass Class - CLowPass
CLowPass is a concrete class representing the Shock Response of a Low Pass filter. The filter is initialized at construction with the required filter specifications and then generates the appropriate Shock Response vector. The Shock Response vector is then consumed by the LTI System class (CLTISystem) to apply the filter in real time.

5.6.12 High Pass Class - CHighPass
CHighPass is a concrete class representing the Shock Response of a High Pass filter. The filter is initialized at construction with the required filter specifications and then generates the appropriate Shock Response vector. The Shock Response vector is then consumed by the LTI System class (CLTISystem) to apply the filter in real time.

5.6.13 Band Pass Class - CBandPass
CBandPass is a concrete class representing the Shock Response of a Band Pass filter. The filter is initialized at construction with the required filter specifications and then generates the appropriate Shock Response vector. The Shock Response vector is then consumed by the LTI System class (CLTISystem) to apply the filter in real time.

5.6.14 Multi Band Class - CMultiBand
CMultiBand is a concrete class representing the Shock Response of a Multi Band Pass filter. The filter is initialized at construction with the required filter specifications and then generates the appropriate Shock Response vector. The Shock Response vector is then consumed by the LTI System class (CLTISystem) to apply the filter in real time.
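As an illustration of how a filter specification can be turned into a Shock Response vector, the sketch below uses the standard windowed-sinc design with a Hamming window. The source does not say which design method CLowPass uses, so this is a swapped-in textbook technique rather than the project's implementation; the function name lowPassShockResponse and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: Shock Response of a low pass filter (windowed-sinc, Hamming window).
// cutoffHz     - cut-off frequency in Hz
// sampleRateHz - sampling rate in Hz, e.g. 44100 for Consumer Audio
// taps         - number of coefficients; assumed odd and at least 3
std::vector<double> lowPassShockResponse(double cutoffHz, double sampleRateHz,
                                         std::size_t taps) {
    const double pi = 3.14159265358979323846;
    const double fc = cutoffHz / sampleRateHz;   // normalized cut-off (cycles per sample)
    const double mid = (taps - 1) / 2.0;         // index of the center coefficient
    std::vector<double> h(taps);
    double sum = 0.0;
    for (std::size_t n = 0; n < taps; ++n) {
        const double t = n - mid;
        // Ideal low pass response: sin(2*pi*fc*t) / (pi*t), with limit 2*fc at t = 0.
        const double sinc = (t == 0.0) ? 2.0 * fc
                                       : std::sin(2.0 * pi * fc * t) / (pi * t);
        // Hamming window tapers the truncated response.
        const double window = 0.54 - 0.46 * std::cos(2.0 * pi * n / (taps - 1));
        h[n] = sinc * window;
        sum += h[n];
    }
    for (double& v : h) v /= sum;   // normalize to unity gain at 0 Hz
    return h;
}
```

A vector produced this way could be handed to an LTI system as its Shock Response. A longer vector gives a sharper transition around the cut-off frequency at the cost of more computation per block.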
5.7 LTI System Class - CLTISystem
CLTISystem is a concrete class implementing the IAudioProcessor abstract class (see section 4.8 about IAudioProcessor). CLTISystem is an abstraction of an LTI System which can be executed in real time to process input signals and generate output signals. The LTI System accepts a Shock Response vector only. Any other design method, such as a Block Diagram or a Frequency Response, can be reduced to its corresponding Shock Response and then used by CLTISystem to apply the audio processing. The Process function is called by the user whenever a new signal needs to be processed. SetTransferFunction is called by the user whenever a new Shock Response vector is set on the system.

5.9 Block-Diagram Public Class Diagram
The following are the public classes which form the Block Diagram abstraction in our Audio Processing Engine.

5.10 Block-Diagram Public Exceptions Class Diagram
The following are the public exception classes of the Block Diagram abstraction in our Audio Processing Engine.

5.10 Filter Public Class Diagram
The following section introduces the audio filters class diagram. See section 5.6 about Audio Processing Engine classes.

5.11 Topology Analyzer Internal Classes
The following section introduces the internal classes implementing the Block Diagram analysis introduced in section 5.5.

5.11.1 Topology Analyzer Class - CTopologyAnalyzer
The following is an internal private class which implements the algorithm introduced in section 5.5.

5.11.2 Transfer Function Class - CTransferFunction
The following is an internal private class which represents a "Transfer Function", or a Shock Response, for a sub-tree in an Operator Tree.

5.12 Multi Core Fast Fourier Transform (FFT)
FFT is a primary functional unit in our Audio Processing Engine. It serves both the actual signal processing and the Block Diagram topology analysis.
The following section introduces a specific version of FFT known as non-recursive Radix-4, supported by Trigonometric and Bit-Reversal look-up tables. Fast Fourier Transform is a family of fast algorithms which execute the DFT in $O(N \log N)$ time instead of $O(N^2)$, where N is the number of samples transformed by the FFT.

5.12.1 Discrete Fourier Transform - DFT
The DFT is defined as follows:

$$\mathrm{DFT}_N(x)[k] = \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i k n / N}$$

where $x[n]$ is a valid sample for $n = 0, 1, 2, \ldots, N-1$. Calculating $\mathrm{DFT}_N$ for a specific $k = k_0$ requires N summations and multiplications of complex values. Calculating $\mathrm{DFT}_N$ for $k = 0, 1, 2, 3, \ldots, N-1$ therefore has time complexity $O(N^2)$. The next section introduces the Radix-4 "Decimation in Time" algebraic maneuver on the original DFT, which allows reducing the computational time complexity from $O(N^2)$ to $O(N \log N)$.

5.12.2 Radix-4 "Decimation in Time"
The following algebraic maneuvers show how to define the original DFT of N samples in terms of 4 smaller DFTs of N/4 samples each.

$$\mathrm{DFT}_N(x)[k] = \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i k n / N}$$

$$= \sum_{n=0}^{N/4-1} x[4n]\, e^{-2\pi i k (4n) / N} + \sum_{n=0}^{N/4-1} x[4n+1]\, e^{-2\pi i k (4n+1) / N} + \sum_{n=0}^{N/4-1} x[4n+2]\, e^{-2\pi i k (4n+2) / N} + \sum_{n=0}^{N/4-1} x[4n+3]\, e^{-2\pi i k (4n+3) / N}$$

$$= \sum_{n=0}^{N/4-1} x[4n]\, e^{-2\pi i k n / (N/4)} + e^{-2\pi i k / N} \sum_{n=0}^{N/4-1} x[4n+1]\, e^{-2\pi i k n / (N/4)} + e^{-4\pi i k / N} \sum_{n=0}^{N/4-1} x[4n+2]\, e^{-2\pi i k n / (N/4)} + e^{-6\pi i k / N} \sum_{n=0}^{N/4-1} x[4n+3]\, e^{-2\pi i k n / (N/4)}$$

$$\mathrm{DFT}_N(x)[k] = \mathrm{DFT}_{N/4}(x_1)[k] + e^{-2\pi i k / N}\, \mathrm{DFT}_{N/4}(x_2)[k] + e^{-4\pi i k / N}\, \mathrm{DFT}_{N/4}(x_3)[k] + e^{-6\pi i k / N}\, \mathrm{DFT}_{N/4}(x_4)[k] \qquad (*)$$

We obtained an alternative definition (*) of the DFT by means of smaller DFTs of N/4 samples which operate on the following signals: $x_1[n] = x[4n]$, $x_2[n] = x[4n+1]$, $x_3[n] = x[4n+2]$ and $x_4[n] = x[4n+3]$. Each smaller DFT is periodic in k with period N/4, so for $k \ge N/4$ its value is taken at $k \bmod N/4$. This result is known as Radix-4 "Decimation in Time". Result (*) is the recursive step of the recursive Radix-4 version. It is the starting point for unrolling the recursion to obtain a non-recursive procedure.
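The recursive step (*) can be checked numerically with the sketch below, which compares a straightforward recursive Radix-4 routine against the direct $O(N^2)$ DFT. It assumes the forward transform uses the negative exponent $e^{-2\pi i k n / N}$ and that N is a power of 4. The names naiveDft and radix4Fft are hypothetical, and this recursive form is only an illustration; it is not the optimized non-recursive, table-driven version described in this chapter.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Direct O(N^2) DFT, used here only as a reference.
std::vector<Complex> naiveDft(const std::vector<Complex>& x) {
    const double pi = 3.14159265358979323846;
    const std::size_t N = x.size();
    std::vector<Complex> X(N);
    for (std::size_t k = 0; k < N; ++k)
        for (std::size_t n = 0; n < N; ++n)
            X[k] += x[n] * std::polar(1.0, -2.0 * pi * double(k * n) / double(N));
    return X;
}

// Recursive Radix-4 "Decimation in Time"; N is assumed to be a power of 4.
std::vector<Complex> radix4Fft(const std::vector<Complex>& x) {
    const double pi = 3.14159265358979323846;
    const std::size_t N = x.size();
    if (N == 1) return x;
    const std::size_t Q = N / 4;
    std::vector<Complex> part[4];
    for (int r = 0; r < 4; ++r) {
        std::vector<Complex> sub(Q);
        for (std::size_t n = 0; n < Q; ++n) sub[n] = x[4 * n + r];  // x_{r+1}[n] = x[4n + r]
        part[r] = radix4Fft(sub);                                   // DFT of N/4 samples
    }
    std::vector<Complex> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        for (int r = 0; r < 4; ++r) {
            // Twiddle factor e^{-2*pi*i*r*k/N}; the N/4-point DFT is periodic, hence k % Q.
            X[k] += std::polar(1.0, -2.0 * pi * double(r * k) / double(N)) * part[r][k % Q];
        }
    }
    return X;
}
```

For a 16-sample input the two routines agree to within floating point rounding. The sketch recomputes every twiddle factor with std::polar, which is the repeated trigonometric work that look-up tables are meant to avoid.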
5.12.3 Radix-4 Recursive FFT
The following is pseudo code for a recursive Radix-4 FFT defined by the recursive step obtained in section 5.12.2.

FFT_N(x)
begin
    if N = 1 then return x[0]
    for n = 0, 1, 2, 3, ..., N/4 - 1 define the following signals
    begin
        x1[n] = x[4n]
        x2[n] = x[4n + 1]
        x3[n] = x[4n + 2]
        x4[n] = x[4n + 3]
    end
    for k = 0 to N-1 do
        X[k] = FFT_{N/4}(x1)[k mod N/4]
             + e^{-2 pi i k / N} * FFT_{N/4}(x2)[k mod N/4]
             + e^{-4 pi i k / N} * FFT_{N/4}(x3)[k mod N/4]
             + e^{-6 pi i k / N} * FFT_{N/4}(x4)[k mod N/4]
    return X
end

5.12.4 Recursive Call Tree of Radix-4 FFT
The following is the recursive call tree of a recursive Radix-4 FFT.

[Figure: recursive call tree of the Radix-4 FFT.]

The leaves are FFTs of a single sample. The call tree has a depth of $\log_4 N$. Each row of the tree is executed in $O(N)$ time, so the total complexity is $O(N \log N)$.

5.12.5 Look-Up Tables Optimization
An FFT requires the values of

$$e^{-2\pi i n / N} = \cos\!\left(\frac{2\pi n}{N}\right) - i \sin\!\left(\frac{2\pi n}{N}\right) \quad \text{for } n = 0, 1, 2, 3, \ldots, N-1.$$

In order to prevent repeated recalculation of trigonometric values, an appropriate look-up table is constructed.

Index    Cosine Value              Sine Value
0        1                         0
1        cos(2 pi * 1 / N)         sin(2 pi * 1 / N)
2        cos(2 pi * 2 / N)         sin(2 pi * 2 / N)
3        cos(2 pi * 3 / N)         sin(2 pi * 3 / N)
...      ...                       ...
N-1      cos(2 pi * (N-1) / N)     sin(2 pi * (N-1) / N)

In the non-recursive version another look-up table, for Bit-Reversal, is required to re-order the values of the final FFT result. We do not discuss the Bit-Reversal look-up table here, nor why a re-order is needed.

5.12.6 Radix-4 FFT Parallelism
The following is an illustration of how the Radix-4 FFT is parallelized across 4 different threads, each running on a separate core.

[Figure: Thread 0, Thread 1, Thread 2, Thread 3.]

5.12.7 FFT Class Diagram
The following is the class diagram of the Multi Core non-recursive Radix-4 FFT supported by Trigonometric and Bit-Reversal look-up tables.

The following is an internal class representing a parallel unit running under a thread in our FFT.
6 Visual Audio Processor

Visual Audio Processor is a Windows MDI (Multiple Document Interface) GUI that exposes the capabilities of the Audio Processing Engine and offers easy visual design, composition and exploration of different audio processors. It is coded entirely in .NET C# 4.0, using the .NET API provided by the Audio Streaming Engine and the Audio Processing Engine.

6.1 Visual Audio Processor Requirements

The following section introduces the requirements of Visual Audio Processor.

6.1.1 Windows MDI (Multiple Document Interface) GUI

The software has a graphical user interface consisting of a main window that encapsulates several documents, opened and visualized simultaneously.

[Figure: main window, showing the Project Documents List and the Multiple Documents Area]

6.1.2 Block Diagram User Interface

The system allows creating, editing, loading and saving a Block Diagram visually. See section 5.1.4 about the Block Diagram supported by the Audio Processing Engine.

[Figure: Block Diagram window, showing the Node Properties, the Block Diagram Area and the Diagram Nodes]

6.1.3 Frequency Response User Interface

The system allows creating, editing, loading and saving a Frequency Response visually. See section 5.1.3 about the Frequency Response supported by the Audio Processing Engine.

[Figure: Frequency Response window, showing the Frequency Response Chart and the Frequency Constraints List]

6.1.4 Audio Streamer User Interface

The system provides an audio streaming window in which media files can be loaded and played while the Audio Processing Engine is applied to the audio signal in real time.

6.2 Visual Audio Processor Architecture

Visual Audio Processor is a GUI front end for the Audio Processing Engine and the Audio Streaming Engine. It is coded entirely in .NET C# 4.0, using the .NET API provided by the Audio Streaming Engine and the Audio Processing Engine. Since the .NET API for audio processing and audio streaming is a complete API analogous to the original C++ API, the C# code is free of any logic involving streaming or processing of the audio signal.
Everything is executed using the object model provided by the Audio Streaming Engine and the Audio Processing Engine. The design of this Windows application focuses on finding the common behavior among the main user interfaces, such as the Block Diagram and Frequency Response user interfaces. Such common behavior is implemented by base classes.

6.2.1 Document Form Base Class

The document form base class encapsulates the common behavior of two major user interfaces: Block Diagram and Frequency Response. This base class defines the common behavior for the following scenarios:

- The user interface form is closing.
- The "Save" action has been called on the user interface form.
- The "Save As" action has been called on the form.
- Writing the contents of the form to a file (an abstract scenario, implemented by the derived classes).

6.2.2 Block Diagram Form Class

The Block Diagram form class is a concrete implementation of the Document Form abstract class. It implements the Block Diagram user interface. See section 6.1.2 about the Block Diagram user interface requirement. This class relies heavily on Microsoft Visio 2010, using the Visio Drawing Control (an ActiveX control) as its drawing solution. The shapes that are valid in a Block Diagram drawing are defined in a Visio Shape Template file authored especially for Visual Audio Processor.

6.2.3 Frequency Response Form Class

The Frequency Response form class is a concrete implementation of the Document Form abstract class. It implements the Frequency Response user interface. See section 6.1.3 about the Frequency Response user interface requirement. This class relies heavily on the Microsoft Chart Control to visualize the Frequency Response graph.

6.2.4 Project Class

The project class is an abstraction of a well-defined collection of file references to documents such as Block Diagram and Frequency Response documents. File references can be added to and removed from the project collection. The project collection can be saved to and loaded from a file.
6.2.5 XML Serialization Class

Visual Audio Processor relies on the built-in serialization of .NET. Most of the classes and data structures provided by .NET are immediately serializable to XML. Visual Audio Processor uses this property heavily to save Frequency Response files and Project files to disk in XML format. Saving a Block Diagram is handled entirely by the Microsoft Visio 2010 drawing control and not by the XML serialization introduced in this section. The XML Serialization class is a generic class for exporting an object to an XML file and importing it back.

6.2.6 Signal Flow Graph Builder Class

The SFGBuilder class is a primary functional unit that converts the Visio document containing the drawing of the Block Diagram into the actual diagram objects accepted by the Audio Processing Engine. For each Visio Shape in the drawing, a corresponding node accepted by the Audio Processing Engine is created. For each Visio connection between two Visio Shapes, a corresponding connection is made between the appropriate Audio Processing Engine nodes. See section 5.6 about the Audio Processing Engine classes. Once the complete Block Diagram Visio drawing has been converted to the representation accepted by the Audio Processing Engine, SFGBuilder instructs the Audio Processing Engine to derive the Shock Response vector coefficients representing the Block Diagram.

6.2.7 Audio Engine Form Class

The Audio Engine form class is the user interface defined in section 6.1.4. It invokes the Audio Streaming Engine together with the Audio Processing Engine to play back audio signals originating from Wave files while applying real-time audio processing to the streaming signal.

7 Visual Studio Solution Structure

This chapter introduces the structure of the Visual Studio Solution implementing the Audio Streaming Engine, the Audio Processing Engine and Visual Audio Processor. A Visual Studio Solution is a collection of Visual Studio Projects.
Each project is one unit of compilation generating a single binary target file such as an EXE, DLL or LIB. A project in our solution may be a native C++ project, a .NET C# project or a CLR project. Our complete solution is made of 16 Visual Studio projects.

7.1 Visual Studio Projects

The following section introduces the Visual Studio projects that form our software solution.

| Project | Abstract | Target Binary | Language |
|---------|----------|---------------|----------|
| AudioLib\Common | Common data types and functionality | NONE | C++ |
| AudioLib\Streamer | Audio Streaming Engine | LIB | C++ |
| AudioLib\Devices\WaveInput | Wave input device | LIB | C++ |
| AudioLib\Devices\XAudio2Output | XAudio2 output device | LIB | C++ |
| AudioLib\DSP\FFTLib | Multi-core Fast Fourier Transform | NONE | C++ |
| AudioLib\DSP\LTISystem | LTI audio processing system | LIB | C++ |
| AudioLib\DSP\SFG | Signal Flow Graph (Block Diagram) | LIB | C++ |
| AudioLib\DSP\Filters | Frequency filters | LIB | C++ |
| AudioLib\AudioLib | Cumulative library for both the Audio Streaming Engine and the Audio Processing Engine | LIB | C++ |
| Test\AudioTerminal | Console application for testing the C++ API of the Audio Streaming Engine and the Audio Processing Engine | EXE | C++ |
| Test\ConsoleApplication1 | Console application for testing the .NET API of the Audio Streaming Engine and the Audio Processing Engine | EXE | C++ |
| Test\FFTCorrectnessTest | Correctness test of the multi-core FFT | EXE | C++ |
| Test\FFTSpeedTest | Performance test of the multi-core FFT versus the single-core FFT | EXE | C++ |
| Test\MyTest | Multiband filter test | EXE | |
| .Net.AudioLib\.Net.AudioLib | .NET cumulative library for both the Audio Streaming Engine and the Audio Processing Engine | DLL | CLR |
| VisualAudioProcessor | Visual Audio Processor application | EXE | .NET C# |

7.2 Audio Engine C++ Namespace

[Diagram: C++ namespace hierarchy of the audio engine. The names recoverable from the diagram are listed below.]

Namespaces: audio, dsp, devices, sfg, nodes, filters

Classes and interfaces:

- CStreamer
- IInputDevice
- CWaveInput : public IInputDevice
- IOutputDevice
- CXAudio2Output : public IOutputDevice
- IAudioProcessor
- CLTISystem : public IAudioProcessor
- CFilter
- CLowPass, CHighPass, CBandPass, CMultiBand : public CFilter
- CNode : public CGraph::CNode
- CIn, COut, CMultiplier, CDelay, CAdder, CSplitter, CBlock : public CNode
- CTopology
- CTopologyException : public std::exception

7.3 Audio Engine .NET Namespace

[Diagram: .NET namespace hierarchy of the audio engine. The names recoverable from the diagram are listed below.]

Namespaces: Audio, DSP, Devices, SFG, Nodes, Filters

Classes and interfaces:

- Streamer
- InputDevice
- WaveInput : IInputDevice
- CDAudioInput(*) : IInputDevice
- OutputDevice
- XAudio2Output : IOutputDevice
- AudioProcessor
- LTISystem : IAudioProcessor
- Filter
- LowPass, HighPass, BandPass : CFilter
- CustomFilter(*) : Filter
- Node
- In, Out, Multiplier, Delay, Adder, Splitter, Block : CNode
- Topology
- TopologyException : System::Exception

8 System Requirements

The following section summarizes the system requirements.

8.1 Audio Streaming Engine Requirements

- Microsoft Windows 7
- Microsoft DirectX SDK June 2010

8.2 Audio Processing Engine Requirements

- Microsoft Windows 7
- Microsoft DirectX SDK June 2010

8.3 Visual Audio Processor Requirements

- Microsoft Windows 7
- Microsoft DirectX SDK June 2010
- Microsoft Visio 2010

9 Summary

9.1 Further Development

The primary goal was to design an audio processing engine. Our processing engine is a linear system whose main feature is that it can be programmed by means of Block Diagrams.
- The Block Diagram object model can be extended to support general Directed Graphs rather than only Directed Acyclic Graphs, allowing even more flexible designs.
- The Audio Processing Engine and the Audio Streaming Engine can be extended to support multiple inputs and multiple outputs (a MIMO system).
- The Audio Processing Engine can be extended to support an array of Processing Engines and a "Super Block Diagram" defining the connections and operators between them. These operators may be non-linear.
- Visual Audio Processor can be rewritten in C++ instead of its current C# implementation, and can drop its dependency on Microsoft Visio in favor of a proprietary drawing solution dedicated to this system, thereby increasing the scale of Visual Audio Processor.

9.2 Thanks and Gratitude

I would like to thank the following:

- Dr. Ilana David, for supervising this project and for her personal care for it.
- Victor Kulikov, for supplying all the tools necessary to complete the project.