Audio Processor Engine
Real Time Audio Processor

Student: Asaf Bercovich
Supervisor: Dr. Ilana David
Contents
Abstract
1 Introduction
1.1 Motivation
1.2 Primary Goal
1.3 Secondary Goal
2 Background
2.1 Audio Channel Configuration
2.1.1 "2 Channel Stereo"
2.1.2 "5.1 Channel Surround"
2.2 Audio Signal
2.2.1 Audio Signal Definition
2.2.2 Audio Sample Definition
2.2.3 Audio Signal Example
2.2.4 Audio Signal Resolution
2.3 Complex Functions
2.4 Consumer Audio Specifications
2.5 Digital Audio Processor
2.5.1 Digital Audio Processor Definition
2.5.2 Linear Audio Processor
2.6 Audio Processor Representation
2.6.1 Block Diagram
2.6.2 Shock Response
2.6.3 Frequency Response
3 Common Software Architecture
3.1 Target Machine and Operating System
3.2 Development Environment
3.3 Programming Languages
3.4 Programming Frameworks
3.5 Exceptions
3.5.1 Exception Base Class
3.5.2 HRESULT Exception
3.6 "2 Channel Stereo" Signal Representation
3.6.1 Audio Signal Sample
3.6.2 Audio Signal
4 Audio Streaming Engine
4.1 Audio Streaming Engine Requirements
4.1.1 Audio Streaming
4.1.2 Generic Audio Processing
4.1.3 Modularity
4.1.4 Programming API for C++ and .NET
4.2 Streaming Engine Architecture Diagram
4.3 Audio Streaming Engine Classes
4.3.1 Audio Streamer Class – CStreamer
4.3.2 Input Device Abstract Class - IInputDevice
4.3.3 Input Device Factory Abstract Class – IInputDeviceFactory
4.3.4 Wave Input Class – CWaveInput
4.3.5 Output Device Abstract Class – IOutputDevice
4.3.6 Output Device Factory Abstract Class – IOutputDeviceFactory
4.3.7 XAudio2Output Class – CXAudio2Output
4.3.8 Audio Processor Abstract Class – IAudioProcessor
4.4 Audio Streamer Class – CStreamer
4.4.1 Audio Streaming Process Diagram
4.4.2 Audio Streamer Class Diagram
4.5 Input and Output Devices
4.5.1 Input Device Abstract Class - IInputDevice
4.5.2 Input Device Class Diagram
4.5.3 Output Device Abstract Class - IOutputDevice
4.5.4 Output Device Class Diagram
4.6 Wave Input Device Class - CWaveInput
4.7 XAudio2 Output Device Class – CXAudio2Output
4.8 Audio Processor Abstract Class – IAudioProcessor
5 Audio Processing Engine
5.1 Audio Processing Engine Requirements
5.1.1 Linear System
5.1.2 Programming by Shock Response
5.1.3 Programming by Frequency Response
5.1.4 Programming by Block Diagram
5.1.5 Real Time
5.1.6 Scale
5.1.7 C++ and .NET API
5.2 Audio Processing Engine Challenge
5.2.1 Block Diagram Simulation
5.2.2 Shock Response Simulation
5.3 Audio Processing Engine Architecture
5.3.1 Simulation by Polynomial Multiplication
5.3.2 Fast Polynomial Multiplication (Fast Convolution)
5.3.3 Block Diagram Topology Analyzer
5.4 Audio Processing Engine Architecture Diagram
5.5 Block Diagram Topology Analysis
5.5.1 Example
5.5.2 Block Diagram Concatenation
5.5.2 Operator Tree
5.5.3 Block Diagram Representation as Operator Tree
5.5.4 Operator Tree Optimization
5.5.5 Deriving Shock Response from an Optimized Operator Tree
5.5.6 Time Complexity of Block Diagram Analysis
5.6 Audio Processing Engine Classes
5.6.1 LTI System Class – CLTISystem
5.6.3 Topology Class – CTopology
5.6.4 Adder Class – CAdder
5.6.5 Multiplier Class – CMultiplier
5.6.6 Delay Class – CDelay
5.6.7 Splitter Class – CSplitter
5.6.8 Block Class - CBlock
5.6.9 Input Class – CIn
5.6.10 Output Class – COut
5.6.11 Low Pass Class – CLowPass
5.6.12 High Pass Class – CHighPass
5.6.13 Band Pass Class – CBandPass
5.6.14 Multi Band Class – CMultiBand
5.7 LTI System Class - CLTISystem
5.9 Block-Diagram Public Class Diagram
5.10 Block-Diagram Public Exceptions Class Diagram
5.10 Filter Public Class Diagram
5.11 Topology Analyzer Internal Classes
5.11.1 Topology Analyzer Class – CTopologyAnalyzer
5.11.2 Transfer Function Class – CTransferFunction
5.12 Multi Core Fast Fourier Transform (FFT)
5.12.1 Discrete Fourier Transform – DFT
5.12.2 Radix-4 "Decimation in Time"
5.12.3 Radix-4 Recursive FFT
5.12.4 Recursive Call Tree of Radix-4 FFT
5.12.5 Look-Up Tables Optimization
5.12.6 Radix-4 FFT Parallelism
5.12.7 FFT Class Diagram
6 Visual Audio Processor
6.1 Visual Audio Processor Requirements
6.1.1 Windows MDI (Multiple Document Interface) GUI
6.1.2 Block Diagram User Interface
6.1.3 Frequency Response User Interface
6.1.4 Audio Streamer User Interface
6.2 Visual Audio Processor Architecture
6.2.1 Document Form Base Class
6.2.2 Block Diagram Form Class
6.2.3 Frequency Response Form Class
6.2.4 Project Class
6.2.5 XML Serialization Class
6.2.6 Signal Flow Graph Builder Class
6.2.7 Audio Engine Form Class
7 Visual Studio Solution Structure
7.1 Visual Studio Projects
7.2 Audio Engine C++ Namespace
7.3 Audio Engine .NET Namespace
8 System Requirements
8.1 Audio Streaming Engine Requirements
8.2 Audio Processing Engine Requirements
8.3 Visual Audio Processor Requirements
9 Summary
9.1 Further Development
9.2 Thanks and Gratitude
10 References
Abstract
Audio processors are common in home entertainment products and professional audio
equipment. They intentionally alter an audio signal into a new signal for useful purposes
such as filtering, enhancement and other effects.
Most audio processors are dedicated to a single operation and cannot be programmed.
This project covers the design and development of a software system implementing an
Audio Processing Engine that can be programmed flexibly using block diagrams, among
other design methodologies. The Audio Processing Engine runs in real time on an incoming
signal and provides high performance.
Our Audio Processing Engine is a software system written in C++ that exposes an API for
both C++ and .NET users.
The internals of the system are coded in C++ to achieve high performance and to make the
most of software optimizations and parallel programming.
The .NET API allows C#, J#, VB and even VBScript users to use the system the same way it
can be used directly from its original C++ API.
Furthermore, a Windows GUI is provided to expose the capabilities of the Audio Processing
Engine and to offer easy visual design, composition, real-time execution and exploration of
different audio processors.
[Figure: Windows GUI application]
1 Introduction
Today, digital audio processors dominate the market. They come in various software and
hardware forms:
• A hardware audio processor chip.
• A hardware board which includes an audio processor.
• A complete hardware rack audio processor dedicated to echo effects.
• A software audio processor as part of Windows Media Player.
1.1 Motivation
Most processors available for home entertainment are limited to a single purpose, and
their operation cannot be programmed or changed.
1.2 Primary Goal
Design and development of a software system which implements an Audio Processing
Engine that can be programmed in terms of Block Diagrams, among other design
methodologies such as Frequency Response and System Shock Response. The Audio
Processing Engine should be able to handle large block diagrams and simulate them in real
time.
1.3 Secondary Goal
Design and development of a Windows MDI (Multiple Document Interface) GUI to expose
the capabilities of the Audio Processing Engine, offer easy design, composition and
exploration of different audio processors visually.
2 Background
2.1 Audio Channel Configuration
Audio playback, for both personal and professional use, is characterized by a
configuration defining the number of audio channels in the signal, the number of
loudspeakers and their placement around the audience.
2.1.1 "2 Channel Stereo"
The most common audio configuration is the "2 Channel Stereo":
[Figure: a CD audio player (playback device) feeding a left audio channel and a right audio channel]
In this example, the output signal of the playback device carries 2 channels of auditory
signals. Each loudspeaker reproduces exactly one channel.
2.1.2 "5.1 Channel Surround"
Another common deployment is the "5.1 Channel Surround":
[Figure: a DVD player (playback device) feeding six loudspeakers: Center, Low Frequency Channel, Front Left, Front Right, Rear Left, Rear Right]
The "5.1 Channel Surround" carries 6 channels of auditory signals. Each is reproduced by
the corresponding loudspeaker.
2.2 Audio Signal
In general there are many audio signal configurations in addition to the "2 Channel Stereo"
and the "5.1 Channel Surround". We provide a general method/definition for the digital
representation of any audio signal configuration.
2.2.1 Audio Signal Definition
Definition – Digital Audio Signal
A Digital Audio Signal is a vector function $x[n]$ in a vector space $V$ such that:
$$V = \{\, x[n] \mid x[n] = (a_1[n], a_2[n], \ldots, a_N[n]),\ a_i \text{ is a scalar function},\ n \in \mathbb{Z} \,\}$$
$x[n]$ is a vector of $N$ components of a discrete time variable $n$, where each component is
a scalar function. Each component represents the auditory signal played by the corresponding
loudspeaker.
Defining an audio signal as a vector function with $N$ components allows representing any
number of audio channels.
2.2.2 Audio Sample Definition
Let $x[n]$ be a vector function representing an audio signal.
The value of $x[n]$ at a specified time $n = n_0$ is called a Sample or Audio Sample.
2.2.3 Audio Signal Example
$$x[n] = (\sin(2n), \cos(5n))$$
In this example $x[n]$ is a vector function of 2 components, where the first component is
$a_1[n] = \sin(2n)$ and the second component is $a_2[n] = \cos(5n)$.
[Figure: a CD audio player (playback device) feeding a left audio channel and a right audio channel]
This vector function represents a stereo signal in which $a_1[n]$ is the left auditory signal
and $a_2[n]$ is the right auditory signal.
2.2.4 Audio Signal Resolution
Audio signal resolution measures the density of samples per second and the precision of
the digital representation of each sample value.
The more samples per second, and the more precisely each sample value is represented,
the finer the changes the digital audio signal can reflect.
High fidelity audio is characterized by high signal resolution.
2.3 Complex Functions
A Complex Function is a special case of a 2-component vector function and is written as
follows:
$$x[n] = L[n] + iR[n]$$
A 2 channel audio configuration can be represented as a complex function as well as a
vector function.
$L[n]$ is the real part, representing the auditory signal of the left channel.
$R[n]$ is the imaginary part, representing the auditory signal of the right channel.
2.4 Consumer Audio Specifications
Consumer audio is high fidelity audio, exceeding the limits of the human ear, with the
following specifications:
Channel Configuration:
• "2 Channel Stereo" (2 channels)
Signal Resolution:
• 44100 samples per second.
• Each sample is a 16-bit integer.
This is the most common audio format. We are exposed to this resolution every day on
Digital TV, Audio CD, portable players and other home entertainment devices.
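The specification above fixes the raw data rate of consumer audio. The following sketch works out that arithmetic in C++. The constant and function names are illustrative only and are not part of the project code; it assumes each of the 2 channels is stored as its own 16-bit integer.

```cpp
#include <cstdint>

// Consumer audio parameters taken from the specification above.
constexpr std::uint32_t kSampleRateHz  = 44100; // samples per second
constexpr std::uint32_t kBitsPerSample = 16;    // 16-bit integer per channel (assumption)
constexpr std::uint32_t kChannels      = 2;     // "2 Channel Stereo"

// Raw (uncompressed) data rate in bits per second.
constexpr std::uint32_t BitsPerSecond()
{
    return kSampleRateHz * kBitsPerSample * kChannels;
}

// Raw data rate in bytes per second.
constexpr std::uint32_t BytesPerSecond()
{
    return BitsPerSecond() / 8;
}

// Highest representable frequency: half the sample rate.
constexpr std::uint32_t HighestFrequencyHz()
{
    return kSampleRateHz / 2;
}
```

Under these assumptions the raw stream amounts to 1,411,200 bits (176,400 bytes) per second, and the highest representable frequency is 22050 Hz, half the sample rate.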
2.5 Digital Audio Processor
Audio processors are common in home entertainment products and professional audio
equipment. They intentionally alter an audio signal into a new signal for useful purposes
such as filtering, enhancement and other effects.
[Figure: a CD audio player (playback device) feeding an Audio Processor, which outputs a left audio channel and a right audio channel]
In the example above the Audio Processor accepts a "2 Channel Stereo" signal and generates
an output of the same configuration type.
In the general case an Audio Processor can accept one type of signal configuration and
generate an output of a different signal configuration.
2.5.1 Digital Audio Processor Definition
Digital Audio Processors are entities which operate mathematically on the digital
representation of a signal.
Definition - Digital Audio Processor
A Digital Audio Processor is a transformation $T : V \to K$ from a signal vector space $V$ to a
signal vector space $K$.
The transformation can also be illustrated as follows:
[Figure: block diagram of Audio Processor T]
$x \in V$ is the input signal and $y \in K$ is the transformation result, that is, the output
signal of the Digital Audio Processor.
Signal vector spaces $V$ and $K$ are of the same type of vector space defined in 2.2.1; we
repeat their definitions here for completeness:
$$V = \{\, x[n] \mid x[n] = (a_1[n], a_2[n], \ldots, a_N[n]),\ a_i \text{ is a scalar function},\ n \in \mathbb{Z} \,\}$$
$$K = \{\, y[n] \mid y[n] = (b_1[n], b_2[n], \ldots, b_M[n]),\ b_i \text{ is a scalar function},\ n \in \mathbb{Z} \,\}$$
$x[n]$ and $y[n]$ are signals of $N$ and $M$ scalar components respectively. Each of them
is a function of a discrete time variable $n$.
2.5.2 Linear Audio Processor
A linear audio processor $T$ is an Audio Processor as defined in section 2.5.1 for which the
following two additional properties hold:
• Assume $x_1[n]$ is the input signal and its corresponding output is $y_1[n]$.
  If we define $x[n] = \alpha \cdot x_1[n]$ as the input of the system, for a scalar $\alpha$, then the
  output is the signal $y[n]$ such that $y[n] = \alpha \cdot y_1[n]$.
• Assume $x_1[n]$ and $x_2[n]$ are two separate signals and their outputs are the signals
  $y_1[n]$ and $y_2[n]$ respectively.
  If we define $x[n] = x_1[n] + x_2[n]$ as the input of the system, then the output of the
  system $y[n]$ is the sum of the separate outputs $y_1[n]$ and $y_2[n]$, such that
  $y[n] = y_1[n] + y_2[n]$.
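The two properties above can be spot-checked numerically. The sketch below is illustrative only: `Signal`, `NearlyEqual`, `IsHomogeneous` and `IsAdditive` are hypothetical names, signals are single-channel for brevity, and passing the check on a few test signals shows consistency with linearity rather than proving it.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Signal = std::vector<double>; // single-channel signal, for brevity

// Compares two signals sample by sample, within a small tolerance.
bool NearlyEqual(const Signal& a, const Signal& b, double tolerance = 1e-9)
{
    if (a.size() != b.size()) return false;
    for (std::size_t n = 0; n < a.size(); ++n) {
        if (std::fabs(a[n] - b[n]) > tolerance) return false;
    }
    return true;
}

// First property: T(alpha * x1) must equal alpha * T(x1).
template <typename Processor>
bool IsHomogeneous(Processor T, const Signal& x1, double alpha)
{
    Signal scaledInput(x1);
    for (double& sample : scaledInput) sample *= alpha;

    Signal scaledOutput = T(x1);
    for (double& sample : scaledOutput) sample *= alpha;

    return NearlyEqual(T(scaledInput), scaledOutput);
}

// Second property: T(x1 + x2) must equal T(x1) + T(x2).
// Assumes x1 and x2 have the same length.
template <typename Processor>
bool IsAdditive(Processor T, const Signal& x1, const Signal& x2)
{
    Signal summedInput(x1);
    for (std::size_t n = 0; n < x1.size(); ++n) summedInput[n] += x2[n];

    Signal summedOutput = T(x1);
    const Signal y2 = T(x2);
    for (std::size_t n = 0; n < y2.size(); ++n) summedOutput[n] += y2[n];

    return NearlyEqual(T(summedInput), summedOutput);
}
```

A constant gain passes both checks, whereas a processor that squares each sample fails both, which is why squaring is not a linear audio processor.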
2.6 Audio Processor Representation
Audio Processors are transformations from one vector space to another. Such
transformations can be represented in more than one way. The following sections are
examples of different methods to represent linear signal transformations.
2.6.1 Block Diagram
Audio Processors can be represented as Block Diagrams.
[Figure: example block diagram with an input "in", a delay "D", a multiplier "−2" and an output "out"]

Diagram Block    Semantics                       Block Type
D                Register / Delay by 1 Sample    Unary Operator
(multiplier)     Multiplication by α             Unary Operator
(adder)          Summation                       Binary Operator
(splitter)       Splitter                        Special Operator
in               Input Source                    Special Operator
out              Output Source                   Special Operator

In this example the audio processing procedure is represented as a Block Diagram.
The above block diagram should be interpreted as follows:
$$y[n] = x[n] - 2x[n-1]$$
In general a block diagram can contain more types of operators. A block diagram that is
limited to the above operators is a Linear Transformation, or a Linear Audio Processor.
See section 2.5.2 about linearity.
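To make the block semantics concrete, here is a minimal sketch that simulates the example diagram sample by sample, assuming it computes y[n] = x[n] − 2x[n−1] (the "−2" multiplier of the figure) and that the register holds 0 before the first sample. `DelayRegister` and `RunExampleDiagram` are hypothetical names, not the project's classes, and the signal is single-channel for brevity.

```cpp
#include <vector>

// One-sample delay register (the "D" block): outputs the previous input.
class DelayRegister {
public:
    double Step(double input)
    {
        const double output = state_;
        state_ = input;
        return output;
    }

private:
    double state_ = 0.0; // register content before the first sample
};

// Simulates the example diagram: the input is split, one copy goes
// straight to the adder, the other goes through D and the -2 multiplier.
std::vector<double> RunExampleDiagram(const std::vector<double>& x)
{
    DelayRegister delay;
    std::vector<double> y;
    y.reserve(x.size());
    for (double sample : x) {
        const double delayed = delay.Step(sample); // x[n-1]
        const double scaled = -2.0 * delayed;      // multiplier block
        y.push_back(sample + scaled);              // adder block
    }
    return y;
}
```

For the input 1, 2, 3 this produces 1, 0, −1.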
2.6.2 Shock Response
A Shock Response (commonly known as an impulse response) is a vector $h[n] \in V$ where
$V$ is a signal vector space of the type defined in section 2.2.1.
The output $y[n]$ of an audio processor defined by a Shock Response $h[n]$ for an input
signal $x[n]$ is defined as follows:
$$y[n] = \sum_{m=-\infty}^{\infty} h[m]\, x[n-m]$$
This operation is known as "Convolution".
Any Linear Time Invariant (LTI) processing system can be represented in the form of a
Convolution. That includes the Block Diagram and the Frequency Response representations.
Consider section 2.6.1, where a specific block diagram defined the output $y[n]$ such that
(*) $y[n] = x[n] - 2x[n-1]$.
If we define $h[n] = \{1, -2\}$, that is $h[0] = 1$, $h[1] = -2$ and $h[m] = 0$ elsewhere, and
substitute it into the Convolution operation, we obtain:
$$y[n] = \sum_{m=-\infty}^{\infty} h[m]\, x[n-m] = \sum_{m=0}^{1} h[m]\, x[n-m] = \underbrace{h[0]}_{1}\, x[n] + \underbrace{h[1]}_{-2}\, x[n-1] = x[n] - 2x[n-1]$$
which is exactly (*).
A Block Diagram always defines $y[n]$ as a linear combination of terms $x[n-m]$, where
$m$ are integers such that $m \ge 0$, and is therefore equivalent to a "Convolution" with a
Shock Response vector.
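The convolution sum can be evaluated directly when the Shock Response is finite. The sketch below is a straightforward (not fast) implementation under stated assumptions: both h and x are zero outside their stored range, the signal is single-channel, and the example Shock Response is taken as h = {1, −2}. `Convolve` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Direct convolution: y[n] = sum over m of h[m] * x[n - m].
// h and x are treated as zero outside their stored range, and the
// output is truncated to the length of x.
std::vector<double> Convolve(const std::vector<double>& h,
                             const std::vector<double>& x)
{
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        // m <= n keeps the index n - m inside the input signal.
        for (std::size_t m = 0; m < h.size() && m <= n; ++m) {
            y[n] += h[m] * x[n - m];
        }
    }
    return y;
}
```

Calling `Convolve({1.0, -2.0}, {1.0, 2.0, 3.0})` yields 1, 0, −1, the same values as evaluating y[n] = x[n] − 2x[n−1] term by term. The cost grows with the product of the two lengths, which is acceptable only for short Shock Responses.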
2.6.3 Frequency Response
Audio Processors can be represented by their Frequency Response.
[Figure: frequency response chart of an audio processor; horizontal axis: frequency, up to 22050 Hz]
The above chart represents an audio processor acting as a frequency filter.
It shows which frequencies pass through the processor and how they are amplified or
attenuated.
The Frequency Response is a graph defined by a function $H(\omega)$ of a continuous variable
$\omega$ such that $-\pi \le \omega \le \pi$.
$-\pi$ stands for the lowest achievable frequency (−22050 Hz) and $\pi$ stands for the highest
achievable frequency (22050 Hz).
The horizontal axis reaches up to 22050 Hz, which is the highest frequency supported by
Consumer Audio (see section 2.4 for more about Consumer Audio Specifications).
Frequency Filtering is a linear operation and can therefore be represented as a Block
Diagram and as a Shock Response as well.
To convert the Frequency Response representation to a Shock Response vector the following
calculation is required:
$$h[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} H(\omega)\, e^{i\omega n}\, d\omega$$
This is a version of the Fourier Transform which converts from the "Frequency Domain" to a
discrete "Time Domain".
A Shock Response $h[n]$ is the "Time Domain" representation, whereas the Frequency
Response $H(\omega)$ is the "Frequency Domain" representation.
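The conversion from a Frequency Response to a Shock Response can be approximated numerically. The following sketch evaluates the integral over ω from −π to π with a midpoint rule. It is an illustration only, not the method used by the engine: `ShockResponseSample`, `IdealLowPass` and the step count are assumptions introduced here.

```cpp
#include <cmath>
#include <complex>
#include <functional>

// Approximates h[n] = 1/(2*pi) * integral of H(w) * e^{i*w*n} dw over
// [-pi, pi], using the midpoint rule with `steps` equal sub-intervals.
std::complex<double> ShockResponseSample(
    const std::function<std::complex<double>(double)>& H,
    int n,
    int steps = 20000)
{
    const double pi = std::acos(-1.0);
    const double dw = 2.0 * pi / steps;
    std::complex<double> sum(0.0, 0.0);
    for (int k = 0; k < steps; ++k) {
        const double w = -pi + (k + 0.5) * dw;  // midpoint of sub-interval k
        sum += H(w) * std::polar(1.0, w * n);   // H(w) * e^{i*w*n}
    }
    return sum * (dw / (2.0 * pi));
}

// Example frequency response: an ideal low-pass filter that passes
// |w| <= cutoff unchanged and blocks everything else.
std::complex<double> IdealLowPass(double w, double cutoff)
{
    return std::complex<double>((std::fabs(w) <= cutoff) ? 1.0 : 0.0, 0.0);
}
```

For the ideal low-pass filter with cutoff π/2, the integral has the closed form h[n] = sin(πn/2)/(πn) with h[0] = 1/2, so the numerical values can be compared against 0.5 for n = 0, 1/π for n = 1 and 0 for n = 2.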
3 Common Software Architecture
This chapter introduces the common software development environment, framework and
fundamental data types used across the project.
3.1 Target Machine and Operating System
The project target machine is any x86/amd64 compatible machine running Windows 7.
3.2 Development Environment
Microsoft Visual Studio 2010 is used as the complete software development environment
across the project.
3.3 Programming Languages
C++ is the main programming language. Most of the project code is aimed at performance.
Parallel programming and software optimizations are used in this project, and native C++
code allows them to be exploited to the fullest.
.NET C# is used only as a complementary development language, for coding the graphical
user interface.
3.4 Programming Frameworks
The Standard Template Library (STL) is used extensively across the project for the following
types: vectors, strings, lists, queues, maps and multimaps.
.NET Framework 4.0 is used wherever .NET C# code is involved.
3.5 Exceptions
Exceptions are used extensively across the project. The following sections introduce the data
types which represent basic exceptions across the project.
3.5.1 Exception Base Class
The base class for exceptions is std::exception. All exceptions in our project inherit from
this type.
3.5.2 HRESULT Exception
HRESULT Exception represents an HRESULT failure returned by a Win32 API function. All
errors returned in the form of an HRESULT failure are translated to the CHRException type,
which represents the failure as an exception rather than an error code.
The benefit of using CHRException is that, once initialized with the specific error code
returned by the Windows API, it automatically translates the error code to a readable
std::string.
Conversion from HRESULT to a readable string is done internally using FormatMessage
(Windows API).
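The idea can be sketched in portable C++. This is not the project's CHRException: `HResultCode`, `CHRExceptionSketch` and `ThrowIfFailed` are hypothetical names, and instead of calling FormatMessage (which is available only on Windows) the sketch simply formats the code in hexadecimal so that it compiles anywhere.

```cpp
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>

// Stand-in for the Windows HRESULT type (a 32-bit signed integer).
using HResultCode = std::int32_t;

// Hypothetical sketch of an exception that wraps an HRESULT failure.
class CHRExceptionSketch : public std::exception {
public:
    explicit CHRExceptionSketch(HResultCode code)
        : code_(code), message_(Describe(code)) {}

    const char* what() const noexcept override { return message_.c_str(); }
    HResultCode Code() const noexcept { return code_; }

private:
    // Assumption: the real class obtains this text from FormatMessage;
    // here the code is only printed in hexadecimal.
    static std::string Describe(HResultCode code)
    {
        std::ostringstream text;
        text << "HRESULT failure 0x" << std::hex << std::uppercase
             << static_cast<std::uint32_t>(code);
        return text.str();
    }

    HResultCode code_;
    std::string message_;
};

// Translates a failing code into an exception. An HRESULT signals
// failure when its value is negative.
inline void ThrowIfFailed(HResultCode hr)
{
    if (hr < 0) throw CHRExceptionSketch(hr);
}
```

Because the sketch derives from std::exception, a caller can catch it through the common base class and read the message with `what()`, which matches the exception hierarchy described in section 3.5.1.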
3.6 "2 Channel Stereo" Signal Representation
The following sections introduce a data type for representing a "2 Channel Stereo" audio signal.
3.6.1 Audio Signal Sample
A single audio signal sample is a complex number represented by the type TComplex.
L is the real part of the complex number and it represents a left audio channel value.
R is the imaginary part of the complex number and it represents a right audio channel value.
3.6.2 Audio Signal
A 2 channel audio configuration can be represented in the form of a complex function. A
discrete time complex function is a series of samples. We use TComplex * as a pointer to a
contiguous block of the audio samples introduced in section 3.6.1.
The series of samples pointed to by a TComplex * is a discrete time complex function and
therefore represents a discrete time 2 channel audio signal.
All audio signals in our project are of type TComplex *.
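As an illustration of this representation, the sketch below packs interleaved 16-bit stereo PCM into complex samples. The definition of TComplex as `std::complex<float>` is an assumption (the report does not show the real definition here), and `FromInterleavedPcm` is a hypothetical helper.

```cpp
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: TComplex is a complex number made of two floats.
typedef std::complex<float> TComplex;

// Converts interleaved 16-bit stereo PCM (L0, R0, L1, R1, ...) into
// complex samples L + iR. `frameCount` is the number of stereo frames,
// so `pcm` must hold 2 * frameCount values.
std::vector<TComplex> FromInterleavedPcm(const std::int16_t* pcm,
                                         std::size_t frameCount)
{
    std::vector<TComplex> signal;
    signal.reserve(frameCount);
    for (std::size_t n = 0; n < frameCount; ++n) {
        const float left = static_cast<float>(pcm[2 * n]);      // real part
        const float right = static_cast<float>(pcm[2 * n + 1]); // imaginary part
        signal.push_back(TComplex(left, right));
    }
    return signal;
}
```

The vector stores its samples contiguously, so `signal.data()` yields a TComplex * of the kind described above.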
4 Audio Streaming Engine
"Audio Streaming Engine" is our software solution for a playback process that integrates
generic audio processing. It is coded entirely in C++. In addition to the C++ API it exposes a
.NET API, and therefore all .NET languages such as C#, J# and VB.NET can make use of the
streaming engine.
4.1 Audio Streaming Engine Requirements
The following sections introduce the major requirements of the playback system, the Audio
Streaming Engine.
4.1.1 Audio Streaming
Audio streaming is the process in which a piece of audio is played continuously on a PC or
any other digital system. The streaming solution should be able to monitor and control
playback of consumer audio.
4.1.2 Generic Audio Processing
The system should allow integration and application of an external audio processor as part
of the audio streaming process.
4.1.3 Modularity
The system should be composed of logical components to benefit from modularity. The
streaming engine should not be aware of tasks such as decoding the format of consumer
audio files. Nor should it be aware of tasks such as encoding for sound hardware or another
output device.
4.1.4 Programming API for C++ and .NET
The system exposes both a C++ API and an equivalent .NET API.
4.2 Streaming Engine Architecture Diagram
[Figure: Streaming Engine architecture diagram. Labels: "Input Device" (IInputDevice, Wave File); "Audio Streamer" (CStreamer); "Output Device" (IOutputDevice, DirectX XAudio2); "Audio Processor" (IAudioProcessor); Input Signal; Output Signal.]

Type              Semantics
CStreamer         Concrete class representing the streaming engine.
IInputDevice      Abstract class representing the Input Device.
IOutputDevice     Abstract class representing the Output Device.
IAudioProcessor   Abstract class representing the Audio Processor.
4.3 Audio Streaming Engine Classes
The following sections summarize the major public classes that form the Audio Streaming
Engine.
4.3.1 Audio Streamer Class – CStreamer
CStreamer is a concrete class implementing the audio streaming procedure. It allows
continuous playback from an "Input Device" to an "Output Device".
4.3.2 Input Device Abstract Class - IInputDevice
IInputDevice is a pure abstract class representing an Input Device, the audio signal source.
An Input Device can represent an Audio CD, a Wave file or even a live audio source from
the Internet.
4.3.3 Input Device Factory Abstract Class – IInputDeviceFactory
IInputDeviceFactory is a pure abstract class representing a factory for an Input Device. The
Audio Streamer class does not accept an Input Device directly but rather accepts the
appropriate factory to instantiate the Input Device.
4.3.4 Wave Input Class – CWaveInput
CWaveInput is a concrete class representing a Wave file audio source (Microsoft RIFF media
format, PCM, 44.1 kHz, 16-bit, stereo). It implements the IInputDevice interface.
4.3.5 Output Device Abstract Class – IOutputDevice
IOutputDevice is a pure abstract class representing an Output Device. An Output Device can
represent the sound hardware or even a Wave file as the final target for streaming.
4.3.6 Output Device Factory Abstract Class – IOutputDeviceFactory
IOutputDeviceFactory is a pure abstract class representing a factory for an Output Device.
The Audio Streamer class does not accept an Output Device directly but rather accepts the
appropriate factory to instantiate the Output Device.
4.3.7 XAudio2Output Class – CXAudio2Output
CXAudio2Output is a concrete class representing the sound hardware. It uses the
XAudio2 API to send audio data for playback on the sound hardware. It implements the
IOutputDevice interface.
4.3.8 Audio Processor Abstract Class – IAudioProcessor
IAudioProcessor is a pure abstract class representing the Audio Processor to be applied
during the streaming process. Any object implementing this interface can serve as an Audio
Processor for the Audio Streaming Engine.
4.4 Audio Streamer Class – CStreamer
The CStreamer class is a concrete implementation of an audio streaming process.
A streaming process involves negotiation between an Input Device and an Output Device. It
is a special case of the well-known "Producer-Consumer Problem": the Output Device is the
consumer and the Input Device is the producer.
The primary goal of the Audio Streamer is to make sure that an Output Device connected to the
streamer is never starved.
The streamer accepts basic commands such as Play, Rewind, Pause, Stop and Seek.
A generic audio processor can also be connected to the streamer so that it is applied to the
audio signal during streaming.
Starvation is the case in which the Output Device has no more audio data to play and is
waiting for an additional delivery of audio data. A starving Output Device that represents
sound hardware generates no sound while it waits, so playback is interrupted and is no
longer fluent.
Starvation should never happen unless the Input Device fails to decode/extract the
audio data from the media source. Such a failure may occur if a removable disk is removed or
an Internet address is no longer available.
To prevent starvation, the Output Device notifies the audio streamer
when it requires more audio data. The Output Device issues the notification early enough,
before it actually becomes starved. When the streamer receives the notification
from the Output Device, it orders the Input Device to decode more audio data to be
delivered to the Output Device.
The streamer is implemented as a listening thread waiting on a queue of notifications
originating from the Output Device. When the queue is no longer empty, the listening thread
dequeues the next event from the queue and processes it.
Thread procedure (Pseudo Code):
(1) Wait here as long as the Queue is empty.
(2) Dequeue an event from the Queue.
(3) Process the event.
(4) Go back to (1).
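The thread procedure above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the project's CEventDrivenThread: the names Event, CBlockingQueue and ThreadProcedure are hypothetical, and an empty event is used here as an assumed "stop" signal.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical event type: a callable to run; an empty callable means "stop".
using Event = std::function<void()>;

// Minimal blocking FIFO queue (illustrative names, not the project's API).
class CBlockingQueue {
public:
    // Called by the producer of notifications, e.g. an output device.
    void Enqueue(Event e) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_events.push(std::move(e));
        }
        m_notEmpty.notify_one();
    }

    // Blocks as long as the queue is empty, then removes the oldest event.
    Event Dequeue() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notEmpty.wait(lock, [this] { return !m_events.empty(); });
        Event e = std::move(m_events.front());
        m_events.pop();
        return e;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_notEmpty;
    std::queue<Event> m_events;
};

// The listening loop: wait, dequeue, process, repeat.
void ThreadProcedure(CBlockingQueue& queue) {
    for (;;) {
        Event e = queue.Dequeue();  // steps (1) and (2)
        if (!e) return;             // assumed stop signal
        e();                        // step (3)
    }                               // step (4): back to (1)
}
```

In a non-blocking API the loop would run on its own thread, for example `std::thread worker(ThreadProcedure, std::ref(queue));`, while callers only enqueue events.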
Implementing the streamer as a thread makes the Audio Streamer API a NON-BLOCKING API. It allows
playback in the background while the program flow goes on with the rest of the code. The
streamer inherits from CEventDrivenThread (protected inheritance) and so reuses the
machinery of an event-driven thread, which is exactly what an audio streaming process is.
4.4.1 Audio Streaming Process Diagram
[Figure: Audio Streaming Process diagram. Labels: CStreamer (CEventDrivenThread) with "Control", Event Processor and Blocking Queue (FIFO) of Events (Enqueue, Dequeue); RequestFeed() Event; IInputDevice; IAudioProcessor with Audio Data (pX, nXLength) and Audio Data (ppY, pnYLength); IOutputDevice; Audio Data.]
4.4.2 Audio Streamer Class Diagram
The audio streamer is a non-blocking API which allows control over the playback of an audio
signal. It exposes basic commands such as Play, Rewind, Pause, Stop and Seek.
4.5 Input and Output Devices
Input and Output Devices are entities which represent the source and the target in a
streaming process.
4.5.1 Input Device Abstract Class - IInputDevice
IInputDevice is an abstract class representing the Input Device. An Input Device in the
streaming process is a decoder for a specific kind of audio data. The audio data can be a Wave file,
an Audio CD or any other well-defined audio format, including a live source of audio from the
Internet.
The decoded audio is delivered to the Audio Streamer in a digital representation that is valid
and accepted by the Audio Streamer.
The Audio Streamer itself is not aware of the decoding task, nor does it know how to actually
decode a Wave file or any other audio format.
The Input Device represents the audio media source to the Audio Streamer.
4.5.2 Input Device Class Diagram
A concrete Input Device ships with the appropriate concrete factory. The Audio Streamer
does not accept an Input Device dynamically but rather accepts the appropriate factory
during initialization of the streamer.
An Input Device implements basic methods such as Read and SetPosition, which
allow the streamer to command the Input Device to read audio data or to set the position on the
linear timeline of the audio signal.
4.5.3 Output Device Abstract Class - IOutputDevice
IOutputDevice is an abstract class representing the Output Device. An Output Device in the
streaming process is an encoder for a specific type of audio target. The audio target can be
sound hardware or even a Wave file.
The audio streamer delivers the signal in a digital representation that is valid and accepted
by the Output Device.
The Audio Streamer itself is not aware of the encoding task, nor does it know how to actually
handle sound hardware or a target Wave file.
The Output Device usually represents the sound hardware.
4.5.4 Output Device Class Diagram
A concrete Output Device ships with the appropriate factory. The Audio Streamer does not
accept an Output Device dynamically but rather accepts the appropriate factory during
initialization of the streamer.
An Output Device implements basic methods such as Write, which allow the
streamer to command the Output Device to write more audio data to its target.
4.6 Wave Input Device Class - CWaveInput
CWaveInput is a concrete implementation of IInputDevice. It represents a Wave file
decoder compatible with the audio streamer. It ships with an appropriate factory,
CWaveInputFactory, which is a concrete implementation of IInputDeviceFactory.
4.7 XAudio2 Output Device Class – CXAudio2Output
CXAudio2Output is a concrete implementation of IOutputDevice. It represents the sound
hardware as an Output Device for the audio streamer. It uses the Microsoft DirectX XAudio2
API to handle the sound hardware. It ships with an appropriate factory,
CXAudio2OutputFactory, which is a concrete implementation of IOutputDeviceFactory.
CXAudio2Output may throw an exception of type CXAudio2OutputException. The exception
may be thrown, for example, when no sound hardware is available.
The Microsoft XAudio2 API is a COM interface. Since every COM component requires
initialization of the Windows COM Library, a helper class CComLibrary is used internally by
CXAudio2Output to initialize the Windows COM Library automatically and to release it
automatically when it is no longer needed.
XAudio2 is a relatively new audio API. XAudio2 was originally and exclusively part of the
Microsoft Xbox development framework. When it was evaluated as a successful API, easier
to use than DirectSound, XAudio2 was officially made part of Windows DirectX. It replaces
the deprecated DirectSound.
4.8 Audio Processor Abstract Class – IAudioProcessor
IAudioProcessor is a pure abstract class representing the Audio Processor to be applied
during the streaming process. Any object implementing this interface can serve as an Audio
Processor for the Audio Streaming Engine.
Any audio processor compatible with the Audio Streaming Engine must implement and
conform to the IAudioProcessor interface (a pure abstract class).
The pure virtual function Process accepts an input signal pointed to by pX, with the number of
samples given by nXLength.
The output of the audio processor is pointed to by *ppY and the number of output samples
is pointed to by pnYLength.
Our Audio Processing Engine conforms to this interface as well.
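The interface described above might look roughly like the following sketch. Only the names IAudioProcessor, Process, pX, nXLength, ppY and pnYLength come from the text; the exact signature, the modelling of TComplex as std::complex<float>, the buffer ownership rule and the CGain example class are assumptions made for illustration.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Assumption: TComplex is modelled as std::complex<float>; the project's own
// definition may differ.
using TComplex = std::complex<float>;

// Sketch of the interface; the real signature is not shown in the text.
class IAudioProcessor {
public:
    virtual ~IAudioProcessor() = default;
    virtual void Process(const TComplex* pX, std::size_t nXLength,
                         TComplex** ppY, std::size_t* pnYLength) = 0;
};

// Hypothetical processor that scales every sample by a constant gain.
class CGain : public IAudioProcessor {
public:
    explicit CGain(float gain) : m_gain(gain) {}

    void Process(const TComplex* pX, std::size_t nXLength,
                 TComplex** ppY, std::size_t* pnYLength) override {
        m_buffer.assign(pX, pX + nXLength);   // copy the input block
        for (TComplex& sample : m_buffer) sample *= m_gain;
        *ppY = m_buffer.data();               // assumed: processor owns the output
        *pnYLength = m_buffer.size();
    }

private:
    float m_gain;
    std::vector<TComplex> m_buffer;
};
```

In this sketch the output buffer stays owned by the processor and is valid until the next call to Process; the real engine may define ownership differently.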
5 Audio Processing Engine
"Audio Processing Engine" is our software solution for a real time audio processor.
The main class of the audio processor is CLTISystem, which is a Linear Time Invariant signal
processing system. It is a concrete implementation of IAudioProcessor and therefore it can
be connected to the audio streamer (CStreamer) introduced in chapter 4.
It is coded entirely in C++ to benefit from software optimizations and parallel
programming.
In addition to the C++ API it exposes a .NET API, so all .NET languages such as C#,
J# and VB.NET can make use of the processing engine.
5.1 Audio Processing Engine Requirements
The following sections introduce major requirements from the audio processing engine.
5.1.1 Linear System
The audio processing engine is a Linear Time Invariant signal processing system. It follows
the definition of a linear audio processor introduced in section 2.5.2. We repeat the
definition for the sake of completeness.
A linear audio processor T is an Audio Processor as defined in section 2.5.1 for which the
following two additional properties hold:
- Assume $x_1[n]$ is the input signal and its corresponding output is $y_1[n]$.
  If we define $x[n] = \alpha x_1[n]$ as the input of the system, then the output is the
  signal $y[n]$ such that $y[n] = \alpha y_1[n]$.
- Assume $x_1[n]$ and $x_2[n]$ are two separate signals and their outputs are the
  signals $y_1[n]$ and $y_2[n]$ respectively.
  If we define $x[n] = x_1[n] + x_2[n]$ as the input of the system, then the output of
  the system $y[n]$ is the sum of the separate outputs $y_1[n]$ and $y_2[n]$, such
  that $y[n] = y_1[n] + y_2[n]$.
5.1.2 Programming by Shock Response
The system can be programmed by Shock Response. See section 2.6.3. We repeat the
definition for the sake of completeness:
A Shock Response is a vector $h[n] \in V$, where $V$ is the signal vector space of the type defined in
section 2.2.1.
The output of an audio processor defined by a Shock Response $h[n]$ and an input signal
$x[n]$ is defined as follows:
$$y[n] = \sum_{m=-\infty}^{\infty} h[m]\, x[n-m]$$
This operation is known as "Convolution".
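For finite signals the convolution sum can be evaluated directly. The sketch below is an illustration only: the names Sample and Convolve are hypothetical, and samples outside the stored range of h and x are assumed to be zero.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using Sample = std::complex<double>;

// Direct evaluation of y[n] = sum over m of h[m] * x[n - m] for finite h and x.
// The result has h.size() + x.size() - 1 samples; the cost is O(N^2).
std::vector<Sample> Convolve(const std::vector<Sample>& h,
                             const std::vector<Sample>& x) {
    if (h.empty() || x.empty()) return {};
    std::vector<Sample> y(h.size() + x.size() - 1);
    for (std::size_t m = 0; m < h.size(); ++m) {
        for (std::size_t k = 0; k < x.size(); ++k) {
            y[m + k] += h[m] * x[k];   // h[m] * x[n - m] with n = m + k
        }
    }
    return y;
}
```

For instance, h = {1, -2} and x = {3, 4} give y = {3, -2, -8}.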
5.1.3 Programming by Frequency Response
The system can be programmed by Frequency Response. See section 2.6.2.
Audio Processors can be represented by their Frequency Response.
The chart represents an audio processor taking the role of a frequency filter.
It shows which frequencies pass through the processor and how they are amplified /
attenuated.
The horizontal axis reaches up to 22050 Hz, which is the highest frequency
supported by Consumer Audio (see section 2.4 for more about Consumer Audio
Specifications).
Frequency Filtering is a linear operation and can therefore be represented in the form of a Shock
Response as well.
5.1.4 Programming by Block Diagram
The system can be programmed by Block Diagrams.
[Figure: example Block Diagram with the nodes in, D, a multiplier by -2, and out.]
The Block Diagram is constrained by the following specifications:
- The diagram is a Connected Directed Acyclic Graph (C-DAG).
- The nodes of the diagram are the collection defined in the following table:

Diagram Block          Semantics                                    Block Type
D                      Register / Delay by 1 Sample                 Unary Operator
(multiplier symbol)    Multiplication by a constant factor          Unary Operator
(adder symbol)         Summation                                    Binary Operator
(splitter symbol)      Splitter                                     Special Operator
in                     Input Source                                 Special Operator
out                    Output Source                                Special Operator
H                      A hierarchy node which represents a nested   Unary Operator
                       diagram/other design method.

Consider section 2.6.1 about representing an audio processor in the form of a Block Diagram.
5.1.5 Real Time
The system can be used to process consumer audio in real time.
5.1.6 Scale
The system should be able to handle real time audio processing with Block Diagrams
containing tens of thousands of basic components.
5.1.7 C++ and .NET API
The system exposes both a C++ API and an equivalent .NET API.
5.2 Audio Processing Engine Challenge
The Audio Processing Engine is a real time simulator for systems represented by Block
Diagrams, Frequency Response and Shock Response. See section 2.6 for
Audio Processor Representation and section 5.1 for the Audio Processing Engine
Requirements.
The following sections introduce common simulation methods for applying real time audio
processing represented by a Block Diagram.
[Figure: example Block Diagram with the nodes in, D, a multiplier by -2, and out.]
5.2.1 Block Diagram Simulation
The Block Diagram Simulation method means managing a complete repository of values for
all diagram nodes and connections.
It requires taking each audio sample from $x[n]$ and processing it through each and every
diagram node and connection.
Considering requirement 5.1.6 about tens of thousands of components in such a diagram and
the Consumer Audio Specification from section 2.4 of 44100 samples per second, such a
simulation requires on the order of billions of calculations on complex
values for each second of Consumer Audio.
A modern PC does not handle billions of complex multiplications / summations per
second. Not even close to it.
The advantage of such a simulation is its simplicity and its ability to handle any Block Diagram, not
just linear systems.
The disadvantage of such a simulation is that it does not handle more than a few hundred
components in a diagram at Consumer Audio resolution.
Therefore such simulation is ruled out for our Audio Processing Engine.
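To make the per-sample method concrete, the following sketch simulates one small diagram sample by sample. It is an illustration under assumptions: the diagram is taken to compute y[n] = x[n] - 2x[n-1] (one delay register D and one multiplier by -2), real-valued samples are used for brevity although the project works with complex samples, and the names CExampleDiagram and Simulate are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-sample simulator for a diagram assumed to compute
// y[n] = x[n] - 2 * x[n-1].
class CExampleDiagram {
public:
    // Pushes one input sample through every node and returns one output sample.
    double Step(double in) {
        const double delayed = m_register;     // D outputs the previous sample
        m_register = in;                       // D stores the current sample
        const double scaled = -2.0 * delayed;  // multiplier node
        return in + scaled;                    // adder node
    }

private:
    double m_register = 0.0;                   // state of the delay node
};

// Runs the simulator over x, followed by `tail` zero samples to flush the delay.
std::vector<double> Simulate(const std::vector<double>& x, std::size_t tail) {
    CExampleDiagram diagram;
    std::vector<double> y;
    y.reserve(x.size() + tail);
    for (double sample : x) y.push_back(diagram.Step(sample));
    for (std::size_t i = 0; i < tail; ++i) y.push_back(diagram.Step(0.0));
    return y;
}
```

Every node is visited once per sample, so the work per second grows as (number of nodes) x 44100, which is what makes this approach impractical for large diagrams.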
5.2.2 Shock Response Simulation
Another simulation method is to take the Block Diagram, analyze it to extract the values
of the Shock Response vector $h[n]$ and then simulate the output $y[n]$ according to
section 2.6.2:
$$y[n] = \sum_{m=-\infty}^{\infty} h[m]\, x[n-m]$$
The disadvantage of this method is that it requires a preliminary analysis stage on the
Block Diagram, which must take place before the actual simulation. In real time it handles only a Block
Diagram which generates a Shock Response vector of a few hundred components.
The bigger the Block Diagram, the bigger the Shock Response vector. The cumulative number of
complex multiplications and summations per second required to handle large Block
Diagrams is in the billions.
Therefore such simulation is ruled out for our Audio Processing Engine.
5.3 Audio Processing Engine Architecture
The following sections introduce how real time and capacity requirements introduced in
section 5.1 are handled by this audio processor.
5.3.1 Simulation by Polynomial Multiplication
Consider the Block Diagram example from section 2.6.1.
[Figure: example Block Diagram with the nodes in, D, a multiplier by -2, and out.]
Assuming the input signal is $x[n] = \{3, 4\}$, the output $y[n]$ is
$$(*) \quad y[n] = x[n] - 2x[n-1] = \{3, 4\} - 2 \cdot \{0, 3, 4\} = \{3, -2, -8\}$$
This calculation can be represented as a simple multiplication between two polynomials.
One polynomial represents the input signal:
$$x[n] = \{3, 4\} \;\rightarrow\; 3 + 4x$$
The other polynomial represents the Block Diagram:
$$y[n] = 1 \cdot x[n] - 2x[n-1] \;\rightarrow\; 1 - 2x$$
The output of the system can be obtained by multiplying the polynomials:
$$(3 + 4x)(1 - 2x) = 3 - 2x - 8x^2$$
Note the coefficients of the resulting polynomial. They are the same values obtained
in (*).
A Block Diagram can always be reduced to a polynomial multiplication where one polynomial
represents the input signal and the other represents the Block Diagram.
Simulation by polynomial multiplication requires handling large polynomials to achieve
the capacity requirement of the system.
5.3.2 Fast Polynomial Multiplication (Fast Convolution)
Efficient multiplication between two polynomials can be done in time complexity of
$O(N \log N)$ instead of the standard complexity of $O(N^2)$.
Let $h[n]$ be the Shock Response of the system and $x[n]$ the input signal of the system.
The output of the system $y[n]$ can be calculated by using the calculation introduced in
section 2.6.2:
$$y[n] = \sum_{m=-\infty}^{\infty} h[m]\, x[n-m] \quad ;\; O(N^2)$$
This calculation is actually the same calculation as in conventional polynomial multiplication.
Instead of executing the calculation for $y[n]$ introduced in section 2.6.2, the output $y[n]$
can be obtained by the following method:
$$y[n] = DFT^{-1}\{\, DFT\{h[n]\} \cdot DFT\{x[n]\} \,\} \quad ;\; O(N \log N)$$
where $DFT\{x[n]\}$ and $DFT^{-1}\{v[k]\}$ are defined as follows:
$$DFT\{x[n]\}[k] = \sum_{n=0}^{N-1} x[n]\, e^{-\frac{2\pi i k n}{N}}$$
$$DFT^{-1}\{v[k]\}[n] = \frac{1}{N} \sum_{k=0}^{N-1} v[k]\, e^{\frac{2\pi i k n}{N}}$$
$$k, n \in \mathbb{Z}, \quad 0 \le k, n \le N-1$$
$x[n]$ and $v[k]$ are finite signals of $N$ samples each.
DFT stands for Discrete Fourier Transform.
The DFT can be executed in $O(N \log N)$ time using the FFT, thereby
reducing the total time required for such a polynomial multiplication from $O(N^2)$ to
$O(N \log N)$. See section 5.12 about FFT.
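The method above can be sketched as follows. This is not the engine's implementation: for brevity it uses a simple recursive radix-2 FFT rather than the Radix-4 variant of section 5.12, and the names Cx, Fft and FastConvolve are hypothetical. Both signals are zero-padded to a common power-of-two length of at least the length of the result, an assumption needed so that the circular convolution computed through the DFT equals the linear one.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Cx = std::complex<double>;

// Recursive radix-2 FFT, in place; a.size() must be a power of two.
// inverse = true flips the exponent sign; scaling by 1/N is left to the caller.
void Fft(std::vector<Cx>& a, bool inverse) {
    const std::size_t n = a.size();
    if (n <= 1) return;
    std::vector<Cx> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    Fft(even, inverse);
    Fft(odd, inverse);
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double angle = sign * 2.0 * pi * static_cast<double>(k) /
                             static_cast<double>(n);
        const Cx w = std::polar(1.0, angle);   // twiddle factor
        a[k] = even[k] + w * odd[k];
        a[k + n / 2] = even[k] - w * odd[k];
    }
}

// y = DFT^-1( DFT(h) * DFT(x) ), with zero-padding to a power of two.
std::vector<Cx> FastConvolve(std::vector<Cx> h, std::vector<Cx> x) {
    if (h.empty() || x.empty()) return {};
    const std::size_t outLength = h.size() + x.size() - 1;
    std::size_t n = 1;
    while (n < outLength) n *= 2;
    h.resize(n);                               // zero-pad both signals
    x.resize(n);
    Fft(h, false);
    Fft(x, false);
    for (std::size_t k = 0; k < n; ++k) h[k] *= x[k];   // pointwise product
    Fft(h, true);
    for (Cx& v : h) v /= static_cast<double>(n);        // 1/N of the inverse DFT
    h.resize(outLength);
    return h;
}
```

Up to floating-point rounding, FastConvolve applied to {1, 2, 3} and {1, 4} yields {1, 6, 11, 12}, the coefficients of (1 + 2x + 3x^2)(1 + 4x).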
5.3.3 Block Diagram Topology Analyzer
In order to process the input signal by the Fast Polynomial Multiplication introduced in sections
5.3.1 and 5.3.2, the Block Diagram topology must be analyzed and converted to a series of
coefficients representing the characteristic polynomial of the Block Diagram.
Deriving the characteristic polynomial, or the Shock Response vector, of a Block Diagram
requires an algorithm whose input is a Block Diagram and whose output is the coefficients of
the Shock Response vector.
See section 5.5 about Block Diagram analysis.
5.4 Audio Processing Engine Architecture Diagram
The following diagram shows how the Audio Processing Engine reduces each of its programming methods, whether a Block Diagram or a Frequency Response, to a Shock Response
vector and then applies Fast Polynomial Multiplication to process the input signal with the appropriate Shock Response.
[Figure: Audio Processing Engine architecture diagram. Labels: Block Diagram Topology Analyzer; Frequency Response; Shock Response vector; Fast Convolution (FFT); Input Signal; Output Signal.]
5.5 Block Diagram Topology Analysis
This section introduces a method for reducing a Block Diagram to its corresponding
characteristic polynomial or Shock Response vector.
The algorithm accepts a Block Diagram as introduced in section 5.1 as its input and generates
the corresponding Shock Response vector as its output.
The algorithm is divided into 3 main stages:
- Converting the Block Diagram to an Operator Tree.
- Refactoring the Operator Tree to obtain an Optimized Operator Tree.
- Deriving the Shock Response vector from the Optimized Operator Tree.
5.5.1 Example
Consider the following Block Diagram.
[Figure: Block Diagram with the nodes in, D, a multiplier by 2, D, a multiplier by 3, and out.]
Given the above Block Diagram, the output of the algorithm is the Shock Response
vector $h[n] = \{1, 2, 3\}$. See section 2.6.2 about Shock Response.
5.5.2 Block Diagram Concatenation
Consider the following Block Diagram.
[Figure: Block Diagram with the nodes in, out, three D nodes and multipliers by 2, 4 and 3.]
Given the above Block Diagram, the output of the algorithm is the Shock Response
vector $h[n] = \{1, 6, 11, 12\}$. See section 2.6.2 about Shock Response.
The topology of the Block Diagram in this example is a case where two Block Diagrams are
connected by concatenation.
It is the topology introduced in section 5.5.1 concatenated with the topology introduced in
section 2.6.1.
The final Shock Response vector $h[n]$ can be obtained from two separate Shock Responses,
one for each topology in the concatenation.
Let $h_1[n] = \{1, 2, 3\}$ be the Shock Response of the first topology in the concatenation.
Let $h_2[n] = \{1, 4\}$ be the Shock Response of the second topology in the concatenation.
$h[n]$ can be obtained by Convolution:
$$h[n] = \sum_{m=-\infty}^{\infty} h_1[m]\, h_2[n-m]$$
Since Convolution can be reduced to Fast Polynomial Multiplication, such concatenations
can be handled and analyzed in time complexity of $O(N \log N)$, assuming the analysis of each of the
topologies is bounded by that complexity as well. See section 5.3.2 about Fast
Polynomial Multiplication.
5.5.2 Operator Tree
An Operator Tree is a directed tree with the following nodes.

Tree Node Graphic Symbol   Node Name                   Node Type
D                          "Delay Node"                Operand (Leaf)
(symbol not recovered)     "Multiplication Node"       Binary Operator
(symbol not recovered)     "Summation Node"            Binary Operator
(symbol not recovered)     "Numerical Constant Node"   Operand (Leaf)
5.5.3 Block Diagram Representation as Operator Tree
The Block Diagram from section 5.5.1 is represented as the following Operator Tree.
[Figure: Operator Tree with the leaves 1, D, 2, 3 and D.]
This operator tree should be interpreted as follows: $(*) \;\; 1 + D \cdot (2 + 3 \cdot D)$
If we treat D as an operand in a multiplication, then we can expand the parentheses and
rewrite (*) as follows: $1 + D \cdot (2 + 3 \cdot D) = 1 + 2D + 3D^2$, which is a polynomial in the variable D.
Furthermore, it is the characteristic polynomial of Block Diagram 5.5.1, required to process an
input signal with the Fast Polynomial Multiplication introduced in section 5.3.2.
The Block Diagram from section 5.5.2 is represented as the following Operator Tree.
[Figure: Operator Tree in which the sub-tree 1 + 4·D, surrounded by a dashed line, is pointed to 3 times.]
This operator tree should be interpreted as follows:
$$(1 + 4 \cdot D) + D \cdot \big( 2 \cdot (1 + 4 \cdot D) + 3 \cdot D \cdot (1 + 4 \cdot D) \big)$$
This interpretation is the characteristic polynomial, or the Shock Response vector, of the Block
Diagram introduced in section 5.5.2.
Note how the same sub-tree, surrounded by a dashed line, is repeated / pointed to 3
times and how $(1 + 4 \cdot D)$ is repeated 3 times in the characteristic polynomial. A
node which is pointed to more than once is a key to the optimization in the next section.
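The interpretation of an Operator Tree as a polynomial in D can be sketched with a small recursive evaluator. This is an illustration only, not the engine's CTopologyAnalyzer: the types Node and Poly and the helper names are hypothetical, real coefficients are used for brevity, and polynomials are multiplied by the direct O(N^2) method rather than by Fast Convolution.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Poly = std::vector<double>;   // Poly[i] is the coefficient of D^i

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
    enum class Kind { Delay, Constant, Add, Mul };
    Kind kind;
    double value;          // used by Constant nodes only
    NodePtr left, right;   // used by Add and Mul nodes only
};

NodePtr MakeNode(Node::Kind kind, double value, NodePtr left, NodePtr right) {
    return std::make_shared<Node>(Node{kind, value, std::move(left), std::move(right)});
}
NodePtr MakeDelay() { return MakeNode(Node::Kind::Delay, 0.0, nullptr, nullptr); }
NodePtr MakeConstant(double v) { return MakeNode(Node::Kind::Constant, v, nullptr, nullptr); }
NodePtr MakeAdd(NodePtr a, NodePtr b) { return MakeNode(Node::Kind::Add, 0.0, std::move(a), std::move(b)); }
NodePtr MakeMul(NodePtr a, NodePtr b) { return MakeNode(Node::Kind::Mul, 0.0, std::move(a), std::move(b)); }

// Expands the tree into the coefficients of its polynomial in D.
// Shared sub-trees are simply re-evaluated at every reference.
Poly Evaluate(const Node& n) {
    switch (n.kind) {
        case Node::Kind::Delay:    return {0.0, 1.0};   // the polynomial D
        case Node::Kind::Constant: return {n.value};
        case Node::Kind::Add: {
            Poly a = Evaluate(*n.left);
            Poly b = Evaluate(*n.right);
            if (a.size() < b.size()) a.swap(b);
            for (std::size_t i = 0; i < b.size(); ++i) a[i] += b[i];
            return a;
        }
        case Node::Kind::Mul: {
            const Poly a = Evaluate(*n.left);
            const Poly b = Evaluate(*n.right);
            Poly c(a.size() + b.size() - 1, 0.0);
            for (std::size_t i = 0; i < a.size(); ++i)
                for (std::size_t j = 0; j < b.size(); ++j)
                    c[i + j] += a[i] * b[j];
            return c;
        }
    }
    return {};
}
```

Building 1 + D·(2 + 3·D) from these helpers and evaluating it gives {1, 2, 3}; building the second tree with a single shared node for 1 + 4·D gives {1, 6, 11, 12}.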
5.5.4 Operator Tree Optimization
Consider the following algebraic maneuver of "Common Term Extraction" applied to the above
polynomial:
$$\big(1 + D \cdot (2 \cdot 1 + 3D \cdot 1)\big) \cdot (1 + 4D)$$
The following operator tree reflects the "Common Term Extraction" optimization:
[Figure: Operator Tree after Common Term Extraction, with unit constants in place of the common term and the extracted term 1 + 4·D as a separate operand.]
The "Common Term Extraction" leaves unit constants as placeholders in the original positions
of the common term. Such factoring generates an additional multiplication node which takes
the refactored tree and the extracted common term as operands. This optimization
reduces the original polynomial to a multiplication between two smaller polynomials:
$$\big(1 + D \cdot (2 \cdot 1 + 3D \cdot 1)\big) \cdot (1 + 4D)$$
An Optimized Operator Tree is an Operator Tree with no repetition of sub-trees. This
optimization is repeated until an Optimized Operator Tree is obtained.
5.5.5 Deriving Shock Response from an Optimized Operator Tree
An Optimized Operator Tree has a typical structure which describes a series of multiplications
between polynomials, such as:
[Figure: a multiplication node with the operands Polynomial A and Polynomial B, evaluated as (Polynomial A) · (Polynomial B) by Fast Convolution.]
Applying Fast Polynomial Multiplication (or Fast Convolution) repeatedly on the Optimized
Operator Tree reduces the tree to a series of coefficients which form the Shock
Response vector, that is, the coefficients of the characteristic polynomial of the original Block
Diagram.
5.5.6 Time Complexity of Block Diagram Analysis
The algorithm is divided into 3 main stages:
- Converting the Block Diagram to an Operator Tree.
  Using DFS (Depth First Search), an Operator Tree can be generated in $O(E + N)$,
  where $E$ is the number of connections on the Block Diagram and $N$ is the
  number of nodes on the Block Diagram.
- Refactoring the Operator Tree to obtain an Optimized Operator Tree.
  A repeated sub-tree cannot repeat more than $E$ times (the total number of
  connections on the Block Diagram). Therefore replacing the original positions of the
  sub-tree with the unit constant and inserting the artificial multiplier cannot exceed
  a time complexity of $O(E)$.
- Deriving the Shock Response vector from the Optimized Operator Tree.
  Derivation of the Shock Response vector is bounded by the time complexity of Fast
  Polynomial Multiplication, which is $O(E \log E)$.
The total Block Diagram analysis complexity is bounded by $O(N + E \log E)$.
5.6 Audio Processing Engine Classes
The following sections summarize the major public classes that form the Audio Processing
Engine.
5.6.1 LTI System Class – CLTISystem
CLTISystem is a concrete class implementing the IAudioProcessor abstract class (see section
4.8 about IAudioProcessor).
CLTISystem is an abstraction of an LTI System which can be executed in real time to process
input signals and generate output signals.
The LTI System accepts a Shock Response vector as its processing instruction. Any other
design method, such as a Block Diagram or a Frequency Response, can be reduced to its
corresponding Shock Response and then used by CLTISystem to apply the audio processing.
5.6.3 Topology Class – CTopology
CTopology is a concrete class representing operations on a Block Diagram topology such as
connection/disconnection of nodes on a diagram. The Topology class is then used to analyze
the Block Diagram to derive the corresponding Shock Response vector. See section 5.1.4
about Block Diagram and section 5.5 about Block Diagram Topology analysis.
The derived Shock Response vector is then used by the LTI System (CLTISystem) to apply the
audio processing in real time.
5.6.4 Adder Class – CAdder
CAdder is a concrete class representing an Adder node in a Block Diagram. See section 5.1.4
about Block Diagram. An Adder is a summation over 2 or more incoming audio signals.
5.6.5 Multiplier Class – CMultiplier
CMultiplier is a concrete class representing a Multiplier in a Block Diagram. See section 5.1.4
about Block Diagram. A Multiplier takes an incoming signal and amplifies / attenuates it by a
constant complex factor. The Multiplier is a unary operator in a Block Diagram.
5.6.6 Delay Class – CDelay
CDelay is a concrete class representing a Delay in a Block Diagram. See section 5.1.4 about
Block Diagram. A Delay takes an incoming signal and delays it by one sample or more,
according to the initialization of the node. The Delay is a unary operator in a Block Diagram.
5.6.7 Splitter Class – CSplitter
CSplitter is a concrete class representing a Splitter in a Block Diagram. See section 5.1.4 about
Block Diagram. A Splitter takes an incoming signal and generates 2 identical copies of the
original signal.
5.6.8 Block Class - CBlock
CBlock is a concrete class representing an arbitrary nested design (a Block Diagram, a Frequency
Response or any arbitrary Shock Response vector). See section 5.1.4 about Block Diagram.
This node allows nesting designs inside designs.
5.6.9 Input Class – CIn
CIn is a concrete class representing an input node in a Block Diagram. See section 5.1.4 about
Block Diagram. Only one input is allowed per Block Diagram.
5.6.10 Output Class – COut
COut is a concrete class representing an output node in a Block Diagram. See section 5.1.4
about Block Diagram. Only one output is allowed per Block Diagram.
5.6.11 Low Pass Class – CLowPass
CLowPass is a concrete class representing the Shock Response of a Low Pass filter. This audio
filter is initialized at construction with the specifications required of the filter and then
generates the appropriate Shock Response vector. The Shock Response vector is then
consumed by the LTI System class (CLTISystem) to apply the filter in real time.
5.6.12 High Pass Class – CHighPass
CHighPass is a concrete class representing the Shock Response of a High Pass filter. This audio
filter is initialized at construction with the specifications required of the filter and then
generates the appropriate Shock Response vector. The Shock Response vector is then
consumed by the LTI System class (CLTISystem) to apply the filter in real time.
5.6.13 Band Pass Class – CBandPass
CBandPass is a concrete class representing the Shock Response of a Band Pass filter. This audio
filter is initialized at construction with the specifications required of the filter and then
generates the appropriate Shock Response vector. The Shock Response vector is then
consumed by the LTI System class (CLTISystem) to apply the filter in real time.
5.6.14 Multi Band Class – CMultiBand
CMultiBand is a concrete class representing the Shock Response of a Multi Band Pass filter.
This audio filter is initialized at construction with the specifications required of the filter and
then generates the appropriate Shock Response vector. The Shock Response vector is then
consumed by the LTI System class (CLTISystem) to apply the filter in real time.
5.7 LTI System Class - CLTISystem
CLTISystem is a concrete class implementing the IAudioProcessor abstract class (see section
4.8 about IAudioProcessor).
CLTISystem is an abstraction of an LTI System which can be executed in real time to process
input signals and generate output signals. The LTI System accepts a Shock Response vector only.
Any other design method, such as a Block Diagram or a Frequency Response, can be
reduced to its corresponding Shock Response and then used by CLTISystem to apply the
audio processing.
The Process function is called by the user whenever a new signal needs to be processed.
SetTransferFunction is called by the user whenever a new Shock Response vector is set on
the system.
5.9 Block-Diagram Public Class Diagram
The following are the public classes which form the Block Diagram abstraction in our Audio Processing Engine.
5.10 Block-Diagram Public Exceptions Class Diagram
The following are the public exceptions thrown by the Block Diagram abstraction in our Audio Processing Engine.
5.10 Filter Public Class Diagram
The following section introduces the audio filters class diagram. See section 5.6 about Audio Processing Engine classes.
5.11 Topology Analyzer Internal Classes
The following section introduces the internal classes implementing the Block Diagram
Analysis introduced in section 5.5.
5.11.1 Topology Analyzer Class – CTopologyAnalyzer
The following is an internal private class which implements the algorithm introduced in
section 5.5.
5.11.2 Transfer Function Class – CTransferFunction
The following is an internal private class which internally represents a "Transfer Function" or
a Shock Response for a sub-tree in an Operator Tree.
5.12 Multi Core Fast Fourier Transform (FFT)
FFT is a primary functional unit in our Audio Processing Engine. It serves both the actual
signal processing and the Block Diagram topology analysis.
The following section introduces a specific version of FFT known as non-recursive Radix-4,
supported by Trigonometric and Bit-Reversal look-up tables.
Fast Fourier Transform is a family of fast algorithms which execute the DFT in
$O(N \log N)$ time instead of $O(N^2)$, where $N$ is the number of samples
transformed by the FFT.
5.12.1 Discrete Fourier Transform – DFT
The DFT is defined as follows:
DFTN  x k 
N 1
 x n e

2 ikn
N
n 0
Where x is defined such as x  n  is a valid sample for n  0,1, 2,..., N 1 .
Calculating $\mathrm{DFT}_N$ for a specific $k = k_0$ requires $N$ summations and multiplications of
complex values.
Calculating $\mathrm{DFT}_N$ for $k = 0, 1, 2, 3, \ldots, N-1$ therefore has a time complexity of $O(N^2)$.
The next section introduces the Radix-4 "Decimation in Time" algebraic manipulation of the
original DFT, which reduces the computational time complexity from $O(N^2)$ to
$O(N \log N)$.
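As a point of comparison for the fast algorithms that follow, the definition above can be evaluated directly. The following is a minimal C++ sketch, not the engine's code: the function name `NaiveDft` and the `Complex` alias are assumptions made for illustration, and the negative-exponent sign convention is assumed. The two nested loops make the $O(N^2)$ cost visible.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Direct evaluation of the DFT definition (hypothetical helper, not part of the engine).
// N output values, each a sum of N terms: O(N^2) complex multiplications.
std::vector<Complex> NaiveDft(const std::vector<Complex>& x)
{
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    std::vector<Complex> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        Complex sum(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            // e^{-2*pi*i*k*n/N} as a unit-magnitude complex number
            const double angle =
                -2.0 * pi * static_cast<double>(k * n) / static_cast<double>(N);
            sum += x[n] * std::polar(1.0, angle);
        }
        X[k] = sum;
    }
    return X;
}
```

Feeding a unit impulse (1, 0, 0, 0) to this function yields the value 1 at every k, which is a quick sanity check of the definition.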
5.12.2 Radix-4 "Decimation in Time"
The following algebraic manipulation expresses the original DFT of N samples in terms of 4 smaller DFTs of N/4 samples each.
$$
\begin{aligned}
\mathrm{DFT}_N\{x\}(k) &= \sum_{n=0}^{N-1} x(n)\, e^{-\frac{2\pi i k n}{N}} \\
&= \sum_{n=0}^{N/4-1} x(4n)\, e^{-\frac{2\pi i k (4n)}{N}}
 + \sum_{n=0}^{N/4-1} x(4n+1)\, e^{-\frac{2\pi i k (4n+1)}{N}}
 + \sum_{n=0}^{N/4-1} x(4n+2)\, e^{-\frac{2\pi i k (4n+2)}{N}}
 + \sum_{n=0}^{N/4-1} x(4n+3)\, e^{-\frac{2\pi i k (4n+3)}{N}} \\
&= \underbrace{\sum_{n=0}^{N/4-1} x(4n)\, e^{-\frac{2\pi i k n}{N/4}}}_{\mathrm{DFT}_{N/4}\{x_1\}(k)}
 + e^{-\frac{2\pi i k}{N}} \underbrace{\sum_{n=0}^{N/4-1} x(4n+1)\, e^{-\frac{2\pi i k n}{N/4}}}_{\mathrm{DFT}_{N/4}\{x_2\}(k)}
 + e^{-\frac{4\pi i k}{N}} \underbrace{\sum_{n=0}^{N/4-1} x(4n+2)\, e^{-\frac{2\pi i k n}{N/4}}}_{\mathrm{DFT}_{N/4}\{x_3\}(k)}
 + e^{-\frac{6\pi i k}{N}} \underbrace{\sum_{n=0}^{N/4-1} x(4n+3)\, e^{-\frac{2\pi i k n}{N/4}}}_{\mathrm{DFT}_{N/4}\{x_4\}(k)} \\
&= \mathrm{DFT}_{N/4}\{x_1\}(k) + e^{-\frac{2\pi i k}{N}}\, \mathrm{DFT}_{N/4}\{x_2\}(k) + e^{-\frac{4\pi i k}{N}}\, \mathrm{DFT}_{N/4}\{x_3\}(k) + e^{-\frac{6\pi i k}{N}}\, \mathrm{DFT}_{N/4}\{x_4\}(k)
\end{aligned}
$$

$$
(*)\quad \mathrm{DFT}_N\{x\}(k) = \mathrm{DFT}_{N/4}\{x_1\}(k) + e^{-\frac{2\pi i k}{N}}\, \mathrm{DFT}_{N/4}\{x_2\}(k) + e^{-\frac{4\pi i k}{N}}\, \mathrm{DFT}_{N/4}\{x_3\}(k) + e^{-\frac{6\pi i k}{N}}\, \mathrm{DFT}_{N/4}\{x_4\}(k)
$$
We obtained an alternative definition * of the DFT in terms of smaller DFTs of N/4 samples each, which operate on the following signals: $x_1(n) = x(4n)$, $x_2(n) = x(4n+1)$, $x_3(n) = x(4n+2)$ and $x_4(n) = x(4n+3)$. This result is known as Radix-4 "Decimation in Time".
Result * is the recursive step of the recursive Radix-4 version. It is also the starting point when unrolling the recursion to obtain a non-recursive procedure.
5.12.3 Radix-4 Recursive FFT
The following is pseudo code for a recursive Radix-4 FFT defined by the recursive step obtained in section 5.12.2. Since a DFT of N/4 samples is periodic in k with period N/4, the results of the smaller FFTs are read at index k mod N/4.
FFT_N{x}
begin
    if N = 1 then return x(0)
    for n = 0, 1, 2, 3, ..., N/4 - 1 define the following signals
    begin
        x1(n) = x(4n)
        x2(n) = x(4n + 1)
        x3(n) = x(4n + 2)
        x4(n) = x(4n + 3)
    end
    for k = 0 to N-1 do
        x(k) = FFT_{N/4}{x1}(k) + e^(-2πik/N) · FFT_{N/4}{x2}(k) + e^(-4πik/N) · FFT_{N/4}{x3}(k) + e^(-6πik/N) · FFT_{N/4}{x4}(k)
    return x
end

5.12.4 Recursive Call Tree of Radix-4 FFT
The following is the recursive call tree of a recursive Radix-4 FFT. The leaves are FFTs of a single sample each. The call tree has a depth of $\log_4(N)$.
Each row of the tree is executed in $O(N)$ time, and therefore the total complexity is $O(N \log N)$.
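The recursive procedure of section 5.12.3, whose calls form the tree described above, can be sketched in C++ as follows. This is an illustrative sketch only, not the engine's Multi Core non-recursive implementation: the name `FftRadix4` and the `Complex` alias are assumptions, the negative-exponent sign convention is assumed, and the input length must be a power of 4. Each recursion level does $O(N)$ work in the combining loop, and there are $\log_4(N)$ levels.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Complex = std::complex<double>;

// Recursive Radix-4 "Decimation in Time" FFT (hypothetical helper name).
std::vector<Complex> FftRadix4(const std::vector<Complex>& x)
{
    const std::size_t N = x.size();
    if (N == 0 || (N > 1 && N % 4 != 0)) {
        throw std::invalid_argument("input length must be a power of 4");
    }
    if (N == 1) {
        return x;  // the DFT of a single sample is the sample itself
    }
    const std::size_t Q = N / 4;

    // Split x into x1(n)=x(4n), x2(n)=x(4n+1), x3(n)=x(4n+2), x4(n)=x(4n+3)
    // and transform each quarter recursively.
    std::vector<Complex> F[4];
    for (std::size_t j = 0; j < 4; ++j) {
        std::vector<Complex> sub(Q);
        for (std::size_t n = 0; n < Q; ++n) {
            sub[n] = x[4 * n + j];
        }
        F[j] = FftRadix4(sub);
    }

    // Combine the four smaller transforms according to result *.
    const double pi = std::acos(-1.0);
    std::vector<Complex> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        const std::size_t m = k % Q;  // smaller DFTs are periodic with period N/4
        Complex sum = F[0][m];
        for (std::size_t j = 1; j < 4; ++j) {
            const double angle =
                -2.0 * pi * static_cast<double>(j * k) / static_cast<double>(N);
            sum += std::polar(1.0, angle) * F[j][m];
        }
        X[k] = sum;
    }
    return X;
}
```

A length such as 8, which is divisible by 4 but is not a power of 4, is rejected one level down the recursion when a sub-signal of length 2 is reached.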
5.12.5 Look-Up Tables Optimization
An FFT requires the values of
$$e^{-\frac{2\pi i n}{N}} = \cos\left(\frac{2\pi n}{N}\right) - i\,\sin\left(\frac{2\pi n}{N}\right)$$
for $n = 0, 1, 2, 3, \ldots, N-1$.
In order to prevent repeated recalculation of trigonometric values, an appropriate look-up
table is constructed.
| Index | Cosine Value | Sine Value |
|---|---|---|
| 0 | 1 | 0 |
| 1 | $\cos(2\pi \cdot 1 / N)$ | $\sin(2\pi \cdot 1 / N)$ |
| 2 | $\cos(2\pi \cdot 2 / N)$ | $\sin(2\pi \cdot 2 / N)$ |
| 3 | $\cos(2\pi \cdot 3 / N)$ | $\sin(2\pi \cdot 3 / N)$ |
| ... | ... | ... |
| N-1 | $\cos(2\pi (N-1) / N)$ | $\sin(2\pi (N-1) / N)$ |
The non-recursive version requires another look-up table, for Bit-Reversal, to re-order
the values of the final FFT result. We shall not discuss the Bit-Reversal look-up table here,
nor why a re-order is needed.
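The trigonometric look-up table described above can be sketched in C++ as follows. The type `TrigTable` and the functions `BuildTrigTable` and `Twiddle` are hypothetical names introduced for illustration, not the engine's classes, and the negative-exponent sign convention is assumed. The table is filled once per FFT size, after which every exponential factor is obtained by an index computation instead of a call to a trigonometric function.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical container for the cosine and sine columns of the table.
struct TrigTable {
    std::vector<double> cosine;
    std::vector<double> sine;
};

// Fill entries n = 0 .. N-1 with cos(2*pi*n/N) and sin(2*pi*n/N).
TrigTable BuildTrigTable(std::size_t N)
{
    const double pi = std::acos(-1.0);
    TrigTable table;
    table.cosine.resize(N);
    table.sine.resize(N);
    for (std::size_t n = 0; n < N; ++n) {
        const double angle =
            2.0 * pi * static_cast<double>(n) / static_cast<double>(N);
        table.cosine[n] = std::cos(angle);
        table.sine[n] = std::sin(angle);
    }
    return table;
}

// Look up e^{-2*pi*i*k*n/N}; the product k*n wraps around modulo N
// because the exponential is periodic with period N.
std::complex<double> Twiddle(const TrigTable& table, std::size_t k, std::size_t n)
{
    const std::size_t index = (k * n) % table.cosine.size();
    return {table.cosine[index], -table.sine[index]};
}
```

Storing the sine with a positive sign and negating it at look-up time keeps the table identical to the one listed above while still serving the negative exponent.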
5.12.6 Radix-4 FFT Parallelism
The following is an illustration of how the Radix-4 FFT is parallelized across 4 different threads, each running on a separate core.
[Figure: Radix-4 FFT work divided among Thread 0, Thread 1, Thread 2 and Thread 3]
5.12.7 FFT Class Diagram
The following is the class diagram of the Multi Core non-recursive Radix-4 FFT supported by
Trigonometric and Bit-Reversal look-up tables.
The following is an internal class representing a parallel unit running under a thread in our
FFT.
6 Visual Audio Processor
Visual Audio Processor is a Windows MDI (Multiple Document Interface) GUI which exposes the
capabilities of the Audio Processing Engine and offers easy visual design, composition and
exploration of different audio processors.
It is coded entirely using .NET C# 4.0 and using the .NET API provided by the Audio Streaming
Engine and Audio Processing Engine.
6.1 Visual Audio Processor Requirements
The following section introduces the requirements of Visual Audio Processor.
6.1.1 Windows MDI (Multiple Document Interface) GUI
The software has a graphical user interface consisting of a main window which encapsulates
several documents that are open and visualized simultaneously.
[Figure: main window with the Project Documents List and the Multiple Documents Area]
6.1.2 Block Diagram User Interface
The system allows creating, editing, loading and saving of a Block Diagram visually. See
section 5.1.4 about the Block Diagram supported by the Audio Processing Engine.
[Figure: Block Diagram user interface with the Node Properties, the Block Diagram Area and the Diagram Nodes]
6.1.3 Frequency Response User Interface
The system allows creating, editing, loading and saving of a Frequency Response visually. See
section 5.1.3 about Frequency Response supported by the Audio Processing Engine.
[Figure: Frequency Response user interface with the Frequency Response Chart and the Frequency Constraints List]
6.1.4 Audio Streamer User Interface
The system provides an audio streaming window in which media files can be loaded and
played while the Audio Processing Engine is applied to the audio signal in real time.
6.2 Visual Audio Processor Architecture
Visual Audio Processor is a GUI representation for the Audio Processing Engine and the
Audio Streaming Engine.
It is coded entirely using .NET C# 4.0 and using the .NET API provided by the Audio Streaming
Engine and Audio Processing Engine.
Since the .NET API for audio processing and audio streaming is a complete API analogous to
the original C++ API, the C# code is free of any logic involving streaming/processing of
the audio signal.
Everything is executed using the object model provided by the audio streaming and audio
processing engine.
The design of this Windows application focuses on finding the common behavior
among the main user interfaces, such as the Block Diagram and Frequency Response user
interfaces. Such common behavior is implemented by base classes.
6.2.1 Document Form Base Class
The document form base class encapsulates the common behavior of two major user interfaces –
Block Diagram and Frequency Response.
This base class defines the common behavior for the following scenarios:
• The user interface form is closing.
• "Save" action has been called on the user interface form.
• "Save As" action has been called on the form.
• Writing the contents of the form to a file. (Abstract scenario, implemented by inheritance.)
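The scenarios above follow the pattern of a base class that owns the shared save logic and delegates the actual writing to derived classes. The following is a minimal sketch of that pattern, written in C++ for consistency with the engine code even though the real class is written in C#. All names (`DocumentFormBase`, `RecordingDocument`, `WriteToFile` and the other members) are assumptions made for illustration, as is the exact behavior of each scenario.

```cpp
#include <string>

// Hypothetical base class: common "Save", "Save As" and closing behavior.
class DocumentFormBase {
public:
    virtual ~DocumentFormBase() = default;

    void MarkModified() { modified_ = true; }
    bool IsModified() const { return modified_; }
    const std::string& Path() const { return path_; }

    // "Save As": write through the abstract hook, then remember the new path.
    bool SaveAs(const std::string& path)
    {
        if (path.empty() || !WriteToFile(path)) {
            return false;
        }
        path_ = path;
        modified_ = false;
        return true;
    }

    // "Save": only possible once the document already has a path.
    bool Save() { return !path_.empty() && SaveAs(path_); }

    // Form closing: safe without a prompt only when nothing is unsaved.
    bool CanClose() const { return !modified_; }

protected:
    // Abstract scenario: each concrete form knows how to write its contents.
    virtual bool WriteToFile(const std::string& path) = 0;

private:
    std::string path_;
    bool modified_ = false;
};

// Hypothetical concrete form which only records where it was asked to write.
class RecordingDocument : public DocumentFormBase {
public:
    std::string lastWrite;

protected:
    bool WriteToFile(const std::string& path) override
    {
        lastWrite = path;
        return true;
    }
};
```

With this split, a concrete form such as the Block Diagram or Frequency Response form only has to supply the writing step, while the save and close logic is shared.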
6.2.2 Block Diagram Form Class
The Block Diagram form class is a concrete implementation of the Document Form abstract
class. It implements the Block Diagram user interface. See section 6.1.2 about the Block Diagram
user interface requirement.
This class relies heavily on Microsoft Visio 2010, using the Visio Drawing Control (an ActiveX
control) as a drawing solution. The shapes which are valid for a drawing in a Block Diagram are
defined in a Visio Shape Template file authored especially for Visual Audio Processor.
6.2.3 Frequency Response Form Class
The Frequency Response form class is a concrete implementation of the Document Form
abstract class. It implements the Frequency Response user interface. See section 6.1.3 about the
Frequency Response user interface requirement.
This class relies heavily on the Microsoft Chart Control to visualize the Frequency Response
graph.
6.2.4 Project Class
The project class is an abstraction of a well-defined collection of file references to
documents such as Block Diagram and Frequency Response documents. Such file references
can be added to or removed from the project collection. The project collection can be
saved to or loaded from a file.
6.2.5 XML Serialization Class
Visual Audio Processor relies on the built-in .NET serialization. Most of the classes/data
structures provided by .NET are directly serializable to XML.
This property is heavily used by Visual Audio Processor to save Frequency Response Files and
Project files to disk in XML format.
Saving a Block Diagram is handled entirely by Microsoft Visio 2010 drawing control and not
by XML serialization introduced in this section.
XML Serialization class is a generic class for exporting/importing an object to/from an XML
file.
6.2.6 Signal Flow Graph Builder Class
SFGBuilder class is a primary functional unit which converts the Visio document containing
the drawing of the Block Diagram to the actual diagram objects which are acceptable by the
Audio Processing Engine.
For each Visio Shape on the drawing a corresponding node acceptable by the Audio
Processing Engine is created.
For each Visio connection between two Visio Shapes, a corresponding connection is made
between the appropriate Audio Processing Engine nodes. See section 5.6 about Audio
Processing Engine classes.
When the complete Block Diagram Visio drawing is converted to the representation
acceptable by the Audio Processing Engine, SFGBuilder commands the Audio Processing
Engine to derive the Shock Response vector coefficients representing the Block Diagram.
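The two conversion passes described above, one node per shape and one connection per connector, can be sketched as follows. This is not the SFGBuilder code: `Shape`, `Connector`, `Node`, `Diagram` and `BuildDiagram` are hypothetical stand-ins for the Visio objects and for the Audio Processing Engine nodes, and the derivation of the Shock Response coefficients is left out.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for what is read from the drawing.
struct Shape { int id; std::string kind; };
struct Connector { int fromShape; int toShape; };

// Hypothetical stand-ins for the engine-side diagram objects.
struct Node {
    std::string kind;
    std::vector<Node*> outputs;
};
struct Diagram {
    std::vector<std::unique_ptr<Node>> nodes;
};

Diagram BuildDiagram(const std::vector<Shape>& shapes,
                     const std::vector<Connector>& connectors)
{
    Diagram diagram;
    std::map<int, Node*> nodeByShapeId;

    // Pass 1: create one node for each shape on the drawing.
    for (const Shape& shape : shapes) {
        diagram.nodes.push_back(std::make_unique<Node>());
        diagram.nodes.back()->kind = shape.kind;
        nodeByShapeId[shape.id] = diagram.nodes.back().get();
    }

    // Pass 2: create one node-to-node connection for each drawn connector.
    for (const Connector& c : connectors) {
        const auto from = nodeByShapeId.find(c.fromShape);
        const auto to = nodeByShapeId.find(c.toShape);
        if (from == nodeByShapeId.end() || to == nodeByShapeId.end()) {
            throw std::invalid_argument("connector references an unknown shape");
        }
        from->second->outputs.push_back(to->second);
    }
    return diagram;
}
```

Creating all nodes before any connection is made means a connector may refer to shapes in any order on the drawing.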
6.2.7 Audio Engine Form Class
The Audio Engine form class is the user interface defined in section 6.1.4. It invokes the Audio
Streaming Engine along with the Audio Processing Engine to play back audio signals
originating from Wave files while applying real time audio processing to the streaming signal.
7 Visual Studio Solution Structure
This chapter introduces the structure of the Visual Studio Solution implementing the Audio
Streaming Engine, Audio Processing Engine and Visual Audio Processor.
A Visual Studio Solution is a collection of Visual Studio Projects.
Each Project is one unit of compilation generating a single binary target file such as EXE, DLL
or LIB.
A project in our solution may be a native C++ project, a .NET C# project or a CLR project.
Our complete solution is made of 16 Visual Studio projects.
7.1 Visual Studio Projects
The following section introduces the Visual Studio projects which form our software
solution.
| Project | Abstract | Target Binary | Language |
|---|---|---|---|
| AudioLib\Common | Common Data Types and Functionality | NONE | C++ |
| AudioLib\Streamer | Audio Streaming Engine | LIB | C++ |
| AudioLib\Devices\WaveInput | Wave input device | LIB | C++ |
| AudioLib\Devices\XAudio2Output | XAudio2 output device | LIB | C++ |
| AudioLib\DSP\FFTLib | Multi-Core Fast Fourier Transform | NONE | C++ |
| AudioLib\DSP\LTISystem | LTI Audio Processing System | LIB | C++ |
| AudioLib\DSP\SFG | Signal Flow Graph (Block Diagram) | LIB | C++ |
| AudioLib\DSP\Filters | Frequency Filters | LIB | C++ |
| AudioLib\AudioLib | Cumulative library for both the Audio Streaming Engine and the Audio Processing Engine | LIB | C++ |
| Test\AudioTerminal | Console application for testing the C++ API of the Audio Streaming Engine and Audio Processing Engine | EXE | C++ |
| Test\ConsoleApplication1 | Console application for testing the .NET API of the Audio Streaming Engine and Audio Processing Engine | EXE | C++ |
| Test\FFTCorrectnessTest | Correctness test of the Multi Core FFT | EXE | C++ |
| Test\FFTSpeedTest | Performance test of the Multi Core FFT versus the Single Core FFT | EXE | C++ |
| Test\MyTest | Multiband filter test | EXE | |
| .Net.AudioLib\.Net.AudioLib | .NET cumulative library for both the Audio Streaming Engine and the Audio Processing Engine | DLL | CLR |
| VisualAudioProcessor | Visual Audio Processor Application | EXE | .NET C# |
7.2 Audio Engine C++ Namespace
Namespaces: audio, dsp, devices, sfg, nodes, filters
Classes and interfaces:
CStreamer
IInputDevice
CWaveInput : public IInputDevice
IOutputDevice
CXAudio2Output : public IOutputDevice
IAudioProcessor
CLTISystem : public IAudioProcessor
CFilter
CLowPass : public CFilter
CHighPass : public CFilter
CBandPass : public CFilter
CMultiBand : public CFilter
CNode : public CGraph::CNode
CIn : public CNode
COut : public CNode
CMultiplier : public CNode
CDelay : public CNode
CAdder : public CNode
CSplitter : public CNode
CBlock : public CNode
CTopology
CTopologyException : public std::exception
7.3 Audio Engine .NET Namespace
Namespaces: Audio, DSP, Devices, SFG, Nodes, Filters
Classes and interfaces:
Streamer
InputDevice
WaveInput : IInputDevice
CDAudioInput(*) : IInputDevice
OutputDevice
XAudio2Output : IOutputDevice
AudioProcessor
LTISystem : IAudioProcessor
Filter
LowPass : CFilter
HighPass : CFilter
BandPass : CFilter
CustomFilter(*) : Filter
Node
In : CNode
Out : CNode
Multiplier : CNode
Delay : CNode
Adder : CNode
Splitter : CNode
Block : CNode
Topology
TopologyException : System::Exception
8 System Requirements
The following section summarizes system requirements.
8.1 Audio Streaming Engine Requirements
• Microsoft Windows 7
• Microsoft DirectX SDK June 2010
8.2 Audio Processing Engine Requirements
• Microsoft Windows 7
• Microsoft DirectX SDK June 2010
8.3 Visual Audio Processor Requirements
• Microsoft Windows 7
• Microsoft DirectX SDK June 2010
• Microsoft Visio 2010
9 Summary
9.1 Further Development
The primary goal was to design an audio processing engine. Our processing engine is a linear
system which can be programmed, as a main feature, by means of Block Diagrams. It can be
developed further in the following directions:
• The Block Diagram object model can be extended to support Directed Graphs rather
than Directed Acyclic Graphs, thereby allowing even more flexible designs.
• The Audio Processing Engine and the Audio Streaming Engine can be extended to support
Multiple Inputs and Multiple Outputs (MIMO system).
• The Audio Processing Engine can be extended to support an array of Processing
Engines and a "Super Block Diagram" defining connections and operators between
them. The operators between them can be non-linear operations.
• Visual Audio Processor can be rewritten in C++ instead of its current implementation
in C#, dropping its dependency on Microsoft Visio in favor of a proprietary
drawing solution dedicated to this system, thereby increasing the scale of Visual
Audio Processor.
9.2 Thanks and Gratitude
I would like to thank the following:
• Dr. Ilana David, for supervising this project and for her personal care.
• Victor Kulikov, for supplying all the necessary tools to complete the project.
10 References
• 234122 – Introduction to Systems Programming.
• Effective C++ by Scott Meyers.
• More Effective C++ by Scott Meyers.
• 104214 – Fourier-Series and Integral Transformations.
• 044130 – Signals and Systems.
• 044198 – Introduction to Digital Signal Processing.
• 046745 – Digital Signal Processing.
• 234247 – Algorithms 1.