Implementation of speech modification on hardware
School of Engineering and the Built Environment

Implementation of speech modification on hardware

Author: Marco Gloeckler (40050956)
Honours Bachelor Thesis 2011/2012
Supervisors: 1. Mr. Jay Hoy, 2. Mr. James McWhinnie
German Supervisor: Prof. Dr. D. Pross

Abstract

The main objective of this dissertation was to implement an algorithm called the "Phase Vocoder" on a hardware platform. This algorithm is used to time-compress or time-expand audio or speech. A Rapid Prototyping Workflow was used for this, so the whole range of developing a product is covered. This includes choosing suitable hardware to implement the "Phase Vocoder". Furthermore, the software MATLAB/Simulink was evaluated and chosen because the tool allows Rapid Prototyping. The engineered workflow makes it possible to develop a program at an abstract level and build an executable program with one click. The "Phase Vocoder" algorithm itself was evaluated and compared to another time-stretching method. It was then implemented on the hardware platform, which shows the differences between a simulation and an executable version for hardware.

Acknowledgement

This thesis has benefited greatly from the support of many people, some of whom I would sincerely like to thank here. To begin with, I am really grateful for the help of my supervisor Mr. Jay Hoy. I also want to thank my second supervisor and German supervisor, Mr. James McWhinnie and Prof. Dr. D. Pross. Furthermore, I want to thank the technicians of Edinburgh Napier University who helped me to set up a computer to work with. Finally, I want to thank my family and my friends who supported me and gave me the opportunity to write this thesis in Edinburgh.

Affirmation

Hereby I, Marco Gloeckler, affirm that I wrote the present thesis without any inadmissible help by a third party and without using any other means than indicated. Thoughts that were taken directly or indirectly from other sources are indicated as such. This thesis has not been presented to any other examination board in this or a similar form. I have written this dissertation at Edinburgh Napier University under the scientific supervision of Mr. Jay Hoy.

Marco Gloeckler, Mat. 40050956, Edinburgh, Scotland, 30.03.2012

Table of contents
Abstract
Acknowledgement
Affirmation
Table of contents
1 Introduction
  1.1 Motivation
  1.2 Objectives
  1.3 Approach
  1.4 Rapid Prototyping Workflow
  1.5 General Information about DSP
    1.5.1 What is DSP?
    1.5.2 FPGA or DSP
    1.5.3 Fixed-point or floating-point
      1.5.3.1 Fixed-point representation of numbers
      1.5.3.2 Floating-point representation of numbers
      1.5.3.3 Comparison
2 Preparation
  2.1 Software
    2.1.1 Introduction of Tools
      2.1.1.1 Code Composer Studio (CCS)
      2.1.1.2 MATLAB/Simulink
      2.1.1.3 LabVIEW
  2.2 Decision process of the suitable tool and board
  2.3 Hardware
    2.3.1 TMS320C6713 DSP Starter Kit
  2.4 The Rapid Prototyping Workflow
3 Theory
  3.1 Time Domain Harmonic Scaling
  3.2 Phase Vocoder
    3.2.1 Overview
    3.2.2 Detail
4 Algorithm and Simulation
  4.1 Result of Simulation
5 Implementation on Hardware
  5.1 The different parts of the model
    5.1.1 FindEndOfFile
    5.1.2 "Processing" subsystem
    5.1.3 "Play" subsystem
    5.1.4 "Embedded Control Unit"
    5.1.5 "LedFlash" subsystem
    5.1.6 Delay
  5.2 Problems within the design
6 Conclusion
7 References
8 Table of figures
9 Appendix
  9.1 Configure MATLAB/Simulink and CCS 3.3
    9.1.1 CCS
    9.1.2 MATLAB (29)(30)
  9.2 Used software versions
  9.3 RTW file description
  9.4 CCS File Type (31)
  9.5 Code of the "Embedded Control Unit"
1 Introduction

1.1 Motivation

Nowadays everything in the field of audio, video and picture processing, industrial control and communication systems uses digital signal processing (DSP). It is therefore important for students and engineers in this field to know the basics and how to work with it. In the past digital signal processing was described as very complex and mathematical. Today, DSP can also be described at an abstract level, for example with block diagrams or state flows.

When developing algorithms in the field of DSP, Rapid Prototyping is often used nowadays. The goal is to get from a simulation to a prototype quickly. This type of development allows the developed DSP algorithm to be transferred from the high level, like state flows, onto hardware for testing. This process helps to prevent costly production errors. This procedure should be examined and documented for later developments.

As an example for applying a Rapid Prototyping process in the field of audio or speech there is an algorithm called the "Phase Vocoder", which speeds up or slows down audio/speech. This is useful because studies have shown that people read/hear and understand faster than they can talk. It is important, however, that the pitch itself is not changed (1). This knowledge can be used to play language files faster, for example for an answering machine, or to study from audio CDs. Changing the speed of language is also used for other applications, like speech recognition, or to fit a 32-second radio advertisement into an available frame of 30 seconds. Slowing down speech can also be useful to generate effects in movies.

The range of use is not limited to speech: DJs and producers use this technique to generate special effects, or to bring two different sound tracks to the same speed in order to unite them.

To sum up, there is Rapid Prototyping, which includes hardware and software. The other field of interest is audio/speech time stretching, which needs a mathematical algorithm.

1.2 Objectives

The goal of this project was to develop or use an algorithm to slow down or speed up speech and implement it on a board with a Digital Signal Processor (DSP). The idea of the algorithm should be based on the "Phase Vocoder" method. An important issue is the timbre of the voice, which should sound as natural as possible after modification.
A Rapid Prototyping process should be used to allow fast changes and a well readable program. The project started from scratch, so the whole development environment had to be set up. Therefore different hardware platforms and also suitable software had to be considered, evaluated and finally organised. Once suitable hardware and software were found, the workflow would have to be tested with some simple examples.

The theory of speech shifting methods had to be analysed. The goal was to use the "Phase Vocoder" method, but other possibilities had to be read about and understood, too. At the end of the project a running version should be on a hardware board and ready for a demonstration. The algorithm on the board should be modifiable by switches on the hardware or by computer software.

1.3 Approach

As there was no former project or development environment available for this kind of task, there were a lot of different aspects to consider. After a first overview of what this project includes, 3 main topics can be defined:

1. Gathering information in fields like DSP vs. FPGA, fixed-point vs. floating-point, processing power and the theory of the "Phase Vocoder"
2. A software/hardware combination which allows a high-level approach had to be organised/bought
3. A "Phase Vocoder" algorithm had to be found or developed and adapted to the hardware needs

First, research about the "Phase Vocoder" theory had to be done to get ideas that had to be considered. As it was also part of the project that the whole development environment had to be set up, other aspects like hardware and software tools had to be considered as well. So really basic topics like DSP vs. FPGA and fixed-point vs. floating-point had to be analysed. As there was the possibility that the University would not have suitable hardware, not only processing power and architecture played a key role, but also budget and possible ways of ordering.

But even if suitable hardware was found, it would not mean that this was the solution, because the objective of using a Rapid Prototyping Workflow needs a software/hardware combination. This leads to main topic 2, where suitable hardware had to be correlated with software. In this field not just technical aspects were in demand, but also available licences and costs. The result of topics one and two had to be a hardware/software combination permitting a Rapid Prototyping Workflow, allowing hardware to be programmed very easily and quickly. Besides, it had to be suitable for an algorithm like the "Phase Vocoder".

The last step would be the implementation of the "Phase Vocoder". Therefore thought had to be given to peripherals like microphone and speakers, and to how the user can interact with the program.

1.4 Rapid Prototyping Workflow

Rapid Prototyping is a method of development which is getting more and more common. With this kind of development it is relatively "easy and fast" to implement a piece of software on a hardware platform. A general workflow of software development for a hardware platform is shown in Figure 1.

Figure 1: Rapid Prototyping process (algorithm development and design -> software coding -> hardware implementation)

Those are the key stages which have to be considered, and they will need some iteration until the final product can be released. There are tools available helping engineers to achieve the development of products as quickly and cost-effectively as possible. In the first step the tools allow developing algorithms in a high-level language (HLL).
This means that after designing in state flows or function blocks the tools translate these into code, often C or Ada. This translation can often be specified for different hardware platforms, so the code will be more efficient and flexible. With the coding finished, the code has to be downloaded onto the hardware. Sometimes this is done in C or, for even faster applications, in Assembler.

Testing and verification can then take place, and if errors occur the whole workflow has to be repeated. But because the tools do most of it automatically, errors can be fixed quickly, compared to chips which must be produced and tested. It makes the development process much cheaper and faster. Also, a change of hardware platform can be done easily, as the adaptation can be done in the tool which generates the HLL (2)(3)(4).

1.5 General Information about DSP

1.5.1 What is DSP?

The term DSP stands for digital signal processing but also for digital signal processor. Digital signal processing is a field of communications engineering and is engaged in the processing of digital signals with the help of digital systems. In practice almost all of the recording, transmission and storage methods for images, movies and sound are based on digital processing of the corresponding signals. The main advantage of DSP systems is that very complex algorithms and filters can be implemented, even in real-time. The hardware platforms used for signal processing are mostly digital signal processors or Field Programmable Gate Arrays (FPGA). Such chips are optimised for digital signal processing, which means that they can do complex calculations extremely fast.

1.5.2 FPGA or DSP

The DSP is a specialised microprocessor, mostly programmed in C or, for better performance, in Assembler. It is optimised to do complex maths-intensive tasks and is limited by the useful operations it can do per clock cycle and the clock rate itself. DSPs have operations specialised for fast signal processing called MAC (Multiply, Add, and Accumulate). This operation can be performed in one clock cycle on a DSP, whereas an ordinary processor would need 70 clock cycles (5).

An FPGA, however, is an integrated circuit (IC) of digital technology in which a logic circuit can be programmed. Due to the specific configuration of internal structures (gates) a variety of circuits can be created, starting with low-complexity circuits, such as multipliers, registers and adders, up to highly complex circuits such as microprocessors. FPGAs are used in all areas of digital technology, but above all where quick response and flexible modification of the circuit are required. The performance is limited by the number of gates in the chip and the clock rate.

Because of the two completely different approaches to building the chips, both have advantages and disadvantages. If the sampling rate exceeds a few MHz, it is difficult for a DSP to process the data without loss. This is due to the access to shared resources like memory or buses. An FPGA, however, with its fixed internal structure, allows high I/O data rates. A DSP is designed so that its entities can be reused. Thus, the multipliers used for the FFT can be used for filters or other things afterwards. In an FPGA such reuse is hard to achieve and is normally not used. Therefore, a DSP is capable of working with huge and varied programs. It can perform a big context switch by loading another part of the program.
An FPGA, in contrast, has to have a routine to reconfigure itself, which can take a long time; this is necessary for huge programs because they cannot fit on one FPGA with its limited number of gates. One major factor in industry is also the cost: a DSP is cheaper than its counterpart in FPGA logic. Technology benefits from the diversity of advantages; thus, complex programs should be split between a DSP and an FPGA.

To summarise, a DSP should be used when the sampling rate is low and the complexity of the program is high, but other factors like available tools and the background of the engineer are important and must be considered in every project (6)(7). For the "Phase Vocoder" project both hardware platforms are viable, because the complexity is not a problem for a current DSP or FPGA (2)(3)(4).

1.5.3 Fixed-point or floating-point

Because of the finite word length in digital systems there are two ways to represent numbers: fixed-point and floating-point. Both have advantages and disadvantages. A brief discussion is provided here; further information can be found in (4).

1.5.3.1 Fixed-point representation of numbers

Fixed-point is a generalisation of the decimal format. The most significant bit (MSB) is used for the sign, the remaining binary digits for the numeric representation. Like decimal numbers, fixed-point numbers have the integer part to the left of the radix point and the fractional digits to the right. Due to the fixed position of the radix point, fewer calculations are required than with floating-point numbers. Furthermore, the conversion and correction necessary for multiplications and divisions can be replaced by fast shift operations. So it takes less processing power to calculate, and the calculation can be done more easily.

The main problem with this number representation is rounding errors and overflows. It is possible that in a multiplication the range of numbers is insufficient, and a huge number will become negative because it runs out of the range - an arithmetic overflow. Therefore the developer has to take care of this and scale the numbers during development, which is time-consuming and fault-prone. To minimise the rounding errors, today's processors with 32 or even 64 bits normally have double the number of bits for intermediate values within the accumulators (3). Fixed-point operations simplify numerical operations and save space, but require a compromise between accuracy and dynamics (4).

1.5.3.2 Floating-point representation of numbers

When performing calculations using floating-point numbers, each and every intermediate result is scaled individually. This scaling of each intermediate result requires additional computational effort and makes it more complex. Floating-point numbers can be represented with arbitrary bases, generally as x = (-1)^s * m * b^e, with b as base, m as mantissa, e as exponent and s as sign. Computing systems use b = 2, the dual system, and also a normalisation of the mantissa. This normalisation is important and limits the range of the mantissa to 1 <= m < 2. Because the mantissa always starts with a 1 it is never written (hidden bit). Thus, one bit is saved (8). Common 32-bit processors (e.g. TMS320C67xx) have 23 bits for the mantissa, 1 bit for the sign and 8 bits for the exponent. But with the one bit saved for the mantissa there are 24 bits of resolution (3). There are also other word lengths like 64 bits, called "double", or 128 bits, called "quad". This is standardised by the IEEE and is called IEEE Std. 754-2008 (8).
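To make the 32-bit layout concrete, the following short MATLAB sketch (an illustration added here, not part of the thesis toolchain) decodes the sign, exponent and mantissa fields of a single-precision number and reconstructs its value. It assumes a normalised number (no zeros, denormals, infinities or NaNs).

    % Decode the IEEE 754 single-precision fields of a number.
    x    = single(-6.25);
    bits = typecast(x, 'uint32');                      % raw 32-bit pattern
    s    = bitshift(bits, -31);                        % 1 sign bit
    e    = bitand(bitshift(bits, -23), uint32(255));   % 8 exponent bits (bias 127)
    m    = bitand(bits, uint32(2^23 - 1));             % 23 stored mantissa bits
    % Hidden bit: the leading 1 of the mantissa is not stored.
    v    = (-1)^double(s) * (1 + double(m)/2^23) * 2^(double(e) - 127)
    % v is -6.25 again: s = 1, e = 129, m/2^23 = 0.5625.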
1.5.3.3 Comparison

The floating-point processors of today give a high dynamic range and a good resolution. Thus, in most cases the limitation of the range and the accuracy can be ignored, which makes development easier. This is in contrast to fixed-point designs, where the engineer has to implement scaling factors to protect against arithmetic overflow. This is very time-consuming, and therefore it can sometimes be justified to use a floating-point processor, especially where development costs are high and production volumes are low.

To sum up, the advantages of fixed-point are that the hardware is cheaper and sometimes faster, but floating-point processors are more flexible, easier to handle and numerically more precise. Therefore a mix of both platforms is often used to combine the advantages of both (3). For the "Phase Vocoder" both representations would be suitable, but other aspects had to be taken into account, as described in section 2.2.

2 Preparation

Before the development of the algorithm could start, a suitable development environment had to be found. Therefore software and compatible hardware had to be chosen. As known from the introduction chapter, the kind of processor plays a minor role: it is not important whether floating-point or fixed-point numbers are used, or whether an FPGA or a DSP is used, although a floating-point processor is preferred because the development needs fewer thoughts about data types and normalisation. Important factors were availability, costs and sufficient performance for the required algorithm. However, the major factor was the compatibility of hardware and software, which was a difficult part. At Edinburgh Napier University the TMS320C6711 DSP Starter Kit (DSK) from Texas Instruments was available. Therefore this board and the software required to use it were evaluated first.

2.1 Software

When working with any processor from Texas Instruments, Code Composer Studio (CCS) is needed. So software compatible with CCS had to be found and evaluated. An overview of possible tools for a Rapid Prototyping process with a Texas Instruments processor had to be worked out. The tools are introduced briefly in the following.

2.1.1 Introduction of Tools

2.1.1.1 Code Composer Studio (CCS)

"Code Composer Studio is an integrated development environment (IDE) for Texas Instruments' (TI) embedded processor families. CCS comprises a suite of tools used to develop and debug embedded applications. It includes compilers for each of TI's device families, source code editor, project build environment, debugger, profiler, simulators, real-time operating system and many other features." (9)

With this tool it is easy to program hardware at a low level. It enables developing and using C code to program the DSK. Since it is very extensive and complex to write programs in C or C++, some tools are presented in the following to simplify the development. One tool could be MATLAB/Simulink, another one LabVIEW, as they both generate the C code automatically.

2.1.1.2 MATLAB/Simulink

"MATLAB is a high-level language and interactive environment that enables you to perform computationally intensive tasks faster than with traditional programming languages such as C, C++, and Fortran." (10)

"Simulink is an environment for multidomain simulation and Model-Based Design for dynamic and embedded systems.
It provides an interactive graphical environment and a customizable set of block libraries that let you design, simulate, implement, and test a variety of time-varying systems, including communications, controls, signal processing, video processing, and image processing." (10)

The combination of MATLAB and Simulink is very popular and very well documented. It has a lot of increments and can also be used with third-party products which are directly integrated in the software. With this tool it is possible to develop a program at an abstract level and write it directly onto hardware. It also supports third-party products such as CCS, but also hardware directly, such as some processors from Texas Instruments.

2.1.1.3 LabVIEW

"LabVIEW is a graphical programming environment used to develop sophisticated measurement, test, and control systems using intuitive graphical icons and wires that resemble a flowchart. It offers unrivalled integration with thousands of hardware devices and provides hundreds of built-in libraries for advanced analysis and data visualization - all for creating virtual instrumentation." (11)

2.2 Decision process of the suitable tool and board

As discussed already, the C6711 DSK was evaluated as a first step. Since it was clear that CCS is required in order to use the C6711, tools were sought to work with CCS. The evaluation started with MATLAB/Simulink, as this is the standard tool for developments of this kind. In addition, licences existed at the school and there was previous knowledge of MATLAB. LabVIEW was not evaluated in detail because a solution was found using MATLAB. Besides, the licence for the tool, at around 2500 GBP, was too expensive for the project.

Initial research with MATLAB/Simulink and CCS showed that there are ways to use a DSP board with Simulink. To make this possible, however, extensions for MATLAB/Simulink are necessary, such as the Target Support Library, Embedded Coder, Embedded Target for TI C6000, Real-Time Workshop (RTW), IDE-Link, Developer's Kit for Texas Instruments, etc. The extension names vary with the versions of MATLAB and are sometimes combined into suites. This makes it very difficult and time-consuming to get an overview of the enhancements really needed. Since these versions must also be compatible with CCS, it was difficult to find the appropriate versions and organise them. The finally used software versions are listed in 9.2.

The extensions are necessary to generate optimised C code for the DSK. This allows real-time programs to be implemented on a DSP board. They also provide hardware support for various processor manufacturers in Simulink. To work with the processors, special settings for code development must be applied, but special Simulink function blocks are also required. These blocks are contained in the "Embedded Coder", but the supported processors differ with the version of the "Embedded Coder". These blocks are optimised and include functions such as multiplication, FFT and filtering, as well as specially adapted blocks to tap data from the AD converter or to control the LEDs on the DSK.

Unfortunately there were no function blocks for the C6711 DSK in the existing MATLAB/Simulink version. In addition, the CCS version available for the C6711 was 2.1, which does not support the MATLAB extension "IDE-Link". This "IDE-Link", also called "Link for Code Composer Studio", is important to link the two tools, CCS and MATLAB, ensuring automatic code generation.
This makes it possible to go from the abstract Simulink model onto the DSK without further interaction, see 2.4. Without "IDE-Link" it is still possible to download the model to the board. However, this involves more effort, because the generated C code from Simulink must be loaded into CCS in a project with several other files. An explanation of the file types is attached in 9.4. But without the appropriate library in Simulink it is really difficult to develop a model, because there are no blocks which allow access to data like the audio stream or the LEDs. To make those things possible, the functions would have to be written by hand, which would be a huge expense and would have delayed the project.

Thus, it made more sense to look for a new board. The other alternative would have been a suitable MATLAB and CCS version for the C6711, but this was discarded because of costs. The DSK board fully compatible with MATLAB and CCS was 330 GBP. In contrast, MATLAB with the required extensions would be several thousand pounds. Therefore it was decided to buy the successor of the C6711 DSK, the C6713 DSK. This DSK is compatible with the existing MATLAB licence, and with USB support it allows the board to be used with "all" PCs. Another advantage of the C6713 DSK was that training had already been done in MATLAB and CCS, so this knowledge could be used later in the project (12)(13). Furthermore, the software/hardware combination can be used for other projects. With this high-performance C6713 DSK (further information in 2.3.1) it is possible to develop complex applications like a DSL modem.

2.3 Hardware

As already mentioned, the TMS320C6711 DSP Starter Kit could not be used because of software incompatibility. Therefore this board is not described.

2.3.1 TMS320C6713 DSP Starter Kit

"The TMS320C6713 DSP Starter Kit (DSK) developed jointly with Spectrum Digital is a low-cost development platform designed to speed the development of high precision applications based on TI's TMS320C6000 floating point DSP generation. The kit uses USB communications for true plug-and-play functionality. Both experienced and novice designers can get started immediately with innovative product designs with the DSK's full featured Code Composer Studio(TM) IDE and eXpressDSP(TM) Software which includes DSP/BIOS and Reference Frameworks." (14)

The TMS320C6713 DSP Starter Kit is the newer version of the TMS320C6711 DSP Starter Kit. This DSK, with up to 1800 MIPS of processing power, allows the development of algorithms in fields like networking, communications, imaging and other applications. Important for the project were the USB support and sufficient processing power (15)(16).

Figure 2: Layout DSK C6713 (14)

The features of the DSK:
- TMS320C6713 DSP operating at 225 MHz
- An AIC23 stereo codec with 8-96 kHz sample rates (8-32 bit word length)
- 16 MB of synchronous DRAM
- 512 KB of non-volatile Flash memory (256 KB usable in default configuration)
- 4 user accessible LEDs and DIP switches
- Software board configuration through registers implemented in CPLD
- Configurable boot options
- Standard expansion connectors for daughter card use

The CPU works with very long instruction words (VLIW), 256 bits wide. The C6713 DSP interfaces to the on-board peripherals through a 32-bit wide EMIF (External Memory Interface) bus. The SDRAM, Flash and CPLD (Complex Programmable Logic Device) are all connected to this bus, see Figure 3.
Third parties use this expansion of the EMIF bus for video support, memory extension, other sound codecs, etc. Analogue audio signals are accessed via an on-board AIC23 codec and four 3.5 mm audio jacks (microphone input, line input, line output and headphone output). The analogue input can be microphone (fixed gain) or line (boost), the output line-out (fixed gain) or headphone (adjustable gain). The CPLD is a programmable logic device used to tie the board components together and has a register-based interface to configure the DSK. The DSK has 4 LEDs and DIP switches to allow the user to work interactively with the board. To use this interactive method, the CPLD registers are read and written.

Figure 3: Functional Block Diagram of the DSK C6713 (14)

Code Composer Studio communicates with the DSK via the integrated on-board JTAG emulator. They are connected via a USB interface. Programs can be downloaded to the board into the SDRAM or Flash. The advantage of the flash memory is that it keeps the program after a restart of the board.

2.4 The Rapid Prototyping Workflow

The principle of the workflow shown in Figure 4 is a simplified representation. The algorithm developed in Simulink is saved in "model.mdl". To allow efficient C code generation, special blocks of the Texas Instruments C6x library should be used. Then the "IDE-Link" and CCS transfer the code onto the target. To use this kind of workflow, the environment has to be set up as described in the tutorial in 9.1.

Figure 4: Workflow Simulink (17)

In truth there are a lot of steps and tools needed to make this workflow run. As can be seen in Figure 5, different extensions for Simulink are needed.

Figure 5: Software pieces used in workflow

First of all there are limitations for the development of the Simulink model because of memory management (further described in chapter 5.2). Another difference is the approach of running the program on hardware rather than in simulation. Thus, it is not possible to halt and start the program as is done in simulation. Therefore it is a different approach which needs to consider problems like different tasks or memory management (further information in the RTW user guide within the MATLAB help). Testing is different too, because there is no comfortable way to see what happens once the software has been downloaded to the board. Furthermore, not all blocks of the Simulink libraries can be used, because some are not supported for code generation. Some MATLAB commands are not supported either; therefore it can be necessary to write some of the functions manually.

If all limitations are considered and adhered to when developing the model, and the workflow components are set up as described in 9.1, the code generation can be done without further manual interaction. This is possible because the different pieces of software are perfectly chained together with their different tasks. The RTW automatically generates ANSI C/C++ code from the Simulink model. It also adds the I/O device (driver) as an inline S-function to the code. The "Embedded Target for TI C6000" provides RTW with the APIs (application programming interfaces) which are needed to build code for the C6000 platform. The generated data types are listed and explained in 9.3. With the C code available, the "Link for Code Composer Studio" invokes CCS and builds the executable automatically. In doing so, a project is generated with the different file types and functions described in 9.4. The link also invokes the program and downloads it onto the target. So with one click all of this is done and the program can be tested on the hardware within Code Composer Studio, as the sketch below illustrates.
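A minimal sketch of how this one-click build can be triggered from the MATLAB command line, assuming the environment is configured as described in 9.1. The model name 'model' is a placeholder, and the system target file name 'ti_c6000.tlc' is an assumption for the MATLAB release listed in 9.2; both command and file names vary between versions.

    % Generate code, invoke CCS, build and download with one command.
    open_system('model');
    set_param('model', 'SystemTargetFile', 'ti_c6000.tlc');  % assumed target file
    rtwbuild('model');   % RTW + Embedded Target + IDE-Link do the rest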
This workflow can easily be changed to other targets by changing the driver, as long as there are no essential differences in the memory management.

3 Theory

Definition of "Time Stretching"

"Time Stretching", also known as Time Compression/Expansion, means the slowdown or acceleration of an audio or speech signal without altering its pitch. Pitch shifting is to some extent the counterpart, i.e. the change in pitch, but without changing the tempo. There are a lot of different methods to change the tempo of a signal, but there are just two basic approaches on which the other methods are based. This document focuses on the "Phase Vocoder" method, but Time Domain Harmonic Scaling (TDHS) is briefly discussed, too. None of the algorithms is perfect, and the quality is highly dependent on the input signal, i.e. whether it is a voice, music or a sine wave. The degree by which the signal should be accelerated or decelerated plays an important role, too. In general, stretch factors of 130% can be achieved for music signals, for individual instruments or speech signals even up to 200%.

3.1 Time Domain Harmonic Scaling

The Time Domain Harmonic Scaling technique stretches or compresses a signal in the time domain and was developed in 1978 by Lawrence Rabiner and Ronald Schafer (18). To manipulate the signal, it is processed in short pieces of the original signal. These sections may only be a maximum of 40 ms long, otherwise the human sense of hearing would notice the manipulation of the signal, as it has a temporal resolution of about 40 ms. For humans it is not possible to resolve what happens within a period of 40 ms (19).

Figure 6: Signal before and after modification (20)

An arbitrary choice of the sections can have the effect of phase hits; therefore the signal must first be examined for its period. This information is determined with the Autocorrelation Function (ACF) and is used for the section length. If the input signal is periodic, it can be reduced by integer factors without altering the pitch. With natural signals (music, language) additional difficulties arise, because no two sections are completely identical. Thus, there are phase hits again, and triangulation has to be used to achieve better results. Triangulation is a method which avoids phase hits by multiplying a triangle function onto every section. In other words, section A is multiplied by a falling triangle function and section B by a rising one; thus the effect of phase hits is avoided (a small sketch of this cross-fade follows below). To slow down a periodic signal, the periodic section is simply doubled. For natural signals triangulation is used again.

The quality of the output signal depends strongly on the determination of the section length. Signals which have a periodic pattern can be manipulated very well with this method. Sound elements of short duration, such as clicks, drums and percussion, are difficult to process because they have a pulse-like character and are not periodic. With blocks of a maximum of 40 ms, such pulse-like sounds appear twice in a row. This can be avoided if the maximum length of a section is shortened. As a result, the processed signal loses much of its bass content, which argues against short sections. Therefore the optimum cut-off has to be determined (20)(21)(22).
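The following MATLAB sketch illustrates the triangulation idea on a toy signal: one period is repeated and joined with falling/rising ramps. It is a simplified illustration, not Rabiner and Schafer's full algorithm; in TDHS the period P would come from the ACF analysis described above.

    % Slow down a periodic signal by inserting one extra period,
    % cross-faded with triangle functions to avoid phase hits.
    fs = 8000;                                 % assumed sampling rate
    t  = (0:799)'/fs;
    x  = sin(2*pi*200*t);                      % toy periodic input
    P  = fs/200;                               % period in samples (from ACF in TDHS)
    fall = linspace(1, 0, P)';                 % falling triangle function
    rise = linspace(0, 1, P)';                 % rising triangle function
    A = x(1:P); B = x(P+1:2*P);                % two consecutive sections
    extra = A.*fall + B.*rise;                 % cross-faded repeat of the section
    y = [x(1:P); extra; x(P+1:end)];           % output is P samples longer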
3.2 Phase Vocoder

3.2.1 Overview

This method was first developed in 1966 by Flanagan and Golden (23). Since then there have been a lot of different extensions and improvements, which depend strongly on the kind of signal. There is not one perfect algorithm for music, speech or simple sine waves. The advantages and disadvantages of the different approaches have to be evaluated for each implementation.

The principle of the "Phase Vocoder" is shown in Figure 7. The signal is windowed into small parts and then transformed with the Fast Fourier Transform (FFT). Those two steps together are called the Short Time Fourier Transform (STFT). The next step is the important one for getting a precise result: basically, the "spectral manipulation" produces a good estimate of the frequencies within one windowed signal. Afterwards the process is inverted with the Inverse Fast Fourier Transform (IFFT), windowed and summed (24)(25)(26)(27)(28). To change the tempo of the signal, the overlap factor of the window segments is changed.

Figure 7: "Phase Vocoder" overview (30)

3.2.2 Detail

The principle without the "spectral manipulation" can be found in the literature under STFT, and it has its limitation in the FFT. The resolution of the FFT is:

    Resolution = SamplingRate / WindowLength    (3.1)

The window length is normally between 512 and 4096 samples. It could be assumed that taking a long window gives a good resolution of the frequency. Unfortunately it is not that simple, because with a long window the changes of frequency are missed, due to the fact that the FFT assumes that everything within one frame happens at once. Therefore a trade-off between resolution and accuracy of frequency change must be made. Assuming a medium window length of 2048 samples and sampling an audio signal at 44.1 kHz, the resolution will be 21.5 Hz. For some speech signals this might be acceptable, but for audio with a piano, for example, the resolution is not good enough. If the fundamental of the piano note is at 80 Hz, there is an error of up to 25%. The piano, however, has just 6% between consecutive notes (27).

To get a better frequency resolution without harming the time resolution too much, the "Phase Vocoder" method is used. This is achieved with the "spectral manipulation", which uses information in the signal that the STFT ignores. The first information used is the phase at two sample instants. Assume there is a sinusoidal signal of 220 Hz. As shown in Figure 8, there is an angle φ1 at time t1 and an angle φ2 at time t2. The signal could have a frequency where the angle changes from φ1 to φ2 directly. It could also change from φ1 to φ2 + 2πn for any integer n, where one full turn of the angle is 2π.

Figure 8: Phase of 2 samples (29)

With this information an equation is defined:

    f_n = (φ2 - φ1 + 2πn) / (2π * Δt)    (3.2)

This equation is not solvable yet, as n is unknown. But there is a way to get a good estimation of the frequency, which is described in the following. The "Phase Vocoder" analyses a peak in magnitude within two different frames; then the candidate frequency closest to that peak is chosen. This principle is shown in Figure 9. The 220 Hz sinusoidal signal is the example signal again. It is windowed and transformed with the FFT. After the FFT the signal is represented by magnitude and phase. As there is just a sinusoidal signal, the magnitude spectrum in both frames is the same, but the phases differ. From these different phases the values φ1 and φ2 are found.

Figure 9: "Spectral Manipulation" (26)
The difference in time can be gathered directly from the window length, overlap factor and sampling rate:

    Δt = WindowLength / (OverlapFactor * SamplingRate) = HopSize / SamplingRate    (3.3)

The OverlapFactor describes the samples which overlap between two consecutive windows. If the OverlapFactor is 2, half of the samples of the first window will be used in the next window. Another way to describe the overlap is the HopSize, which is the temporal shift of the window. Described with an example: with WindowLength = 256 samples and HopSize = 64 samples, the windows overlap by 256 - 64 = 192 samples.

With the time information of equation 3.3, equation 3.2 for f_n can be solved. There will not be one result but many. Thus, the value nearest to the peak in magnitude received from the FFT is taken. To describe this with an example, the 220 Hz sinusoidal signal with a sampling rate of 44.1 kHz and an overlap factor of 2 was chosen. Using just the FFT, the result would be 215.3 Hz instead of the 220 Hz of the signal. With the "spectral manipulation" there is a more accurate result, as shown with the values of the example. The phases corresponding to the magnitude peak are φ1 and φ2 (see Figure 9), and the time difference Δt results from equation 3.3. Inserting those values into equation 3.2, the results for the first 6 values of n are 47.7472 Hz, 90.8136 Hz, 133.8800 Hz, 176.946 Hz, 220.0129 Hz, and 263.0793 Hz. The closest frequency to the FFT result of 215.3 Hz is obviously 220.0129 Hz, which is just 0.0129 Hz away from the real value of the signal. This is a vast improvement over the plain FFT, and it is not just a coincidence of well-chosen values, as Puckette and Brown (30) showed. A small numerical check of this example is sketched below.
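The example can be reproduced with a few lines of MATLAB. This is a sketch with a rectangular window (no "hanning window") for simplicity; the exact numbers then differ slightly from those above, but the chosen candidate is again approximately 220.0 Hz.

    % Numerical check of the 220 Hz example (equations 3.2 and 3.3).
    fs = 44100; N = 2048; hop = N/2;          % overlap factor 2
    t  = (0:N+hop-1)'/fs;
    x  = sin(2*pi*220*t);
    X1 = fft(x(1:N)); X2 = fft(x(hop+1:hop+N));
    [~, k] = max(abs(X1(1:N/2)));             % magnitude peak (1-based bin index)
    fbin   = (k-1)*fs/N;                      % plain FFT estimate, ~215.3 Hz
    dphi   = mod(angle(X2(k)) - angle(X1(k)), 2*pi);   % phase difference
    dt     = hop/fs;                          % equation 3.3
    cand   = (dphi/(2*pi) + (0:7)) / dt;      % equation 3.2 for n = 0..7
    [~, i] = min(abs(cand - fbin));           % pick candidate nearest to FFT peak
    fest   = cand(i)                          % ~220.0 Hz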
Until now the explanation was restricted to a simple sinusoidal signal. If the signal is more complex and has more frequencies, the algorithm stays the same, with the difference that the operation is repeated for every magnitude peak in the spectrum. This is reasonable as long as the peaks in magnitude are adequately separated by the FFT. With this result of the "spectral manipulation", where a good estimate of the actual frequency is available, it is possible to apply different changes to the signal, like reading direction inversion, frame shuffling, changing the pitch or, as in this project, time stretching.

In the synthesis part (illustrated in Figure 7) the IFFT is used to transform the changed spectrum back into pieces of the time signal, and with the window function it is added back into one time signal. To change the tempo of the signal, the OverlapFactor or HopSize of the window is changed, which obviously makes the resulting output file longer or shorter. If the file is played with the same sampling rate as the input file, the speed is changed.

The algorithm described so far is the simplest one to understand and was chosen for that reason. In the literature it is referred to as "spectral peak following". However, the algorithm used in MATLAB/Simulink in chapter 4 works slightly differently; the theory is explained in the following.

Another implementation

The implementation is basically the same; the difference is that not just the angles of the magnitude peaks are considered, but every angle. This means phases are not chosen corresponding to a peak but to a bin. A bin is an amplitude/phase pair of data for each channel/band. A channel or band is used within the FFT. So, for example, a window length of 512 has 256 channels. This is because of the double sideband of the FFT. To sum it up: when windowing and transforming 512 samples there will be 256 bins. Those bins are used for the phase estimation. This algorithm calculates the angle for every bin and compares it with the angle of the same bin from one frame before. So instead of searching for maxima in the magnitude and comparing the corresponding phases, the algorithm checks every bin.

This algorithm has another challenge not mentioned so far, called "phase unwrapping". The phases after the FFT are modulo 2π. In the "spectral peak following" method, the n of equation 3.2 could be guessed with the knowledge of the closest FFT result. In this implementation, however, the phase is unwrapped, which means that 360 degrees (2π) are added if there is more than one cycle, as Figure 10 illustrates.

Figure 10: Phase unwrapping (25)

The unwrapping recovers the precise phase values for each bin and is therefore an important part of getting a good result. Except for the guessing of the phase/frequency, the algorithm stays the same. This method is implemented in the used algorithm described in chapter 4, and is summarised in the sketch below.
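To summarise the bin-by-bin variant, here is a compact offline MATLAB sketch of the whole chain: STFT, per-bin phase difference, principal argument, rescaling with the stretch factor and overlap-add. It is a simplified sketch (amplitude normalisation of the overlapping windows is omitted), not the Simulink model itself; x must be a column vector.

    function y = pvoc_sketch(x, N, Ha, Hs)
        % N = WindowLength, Ha = AnalysisHopSize, Hs = SynthesisHopSize.
        w    = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);       % hanning window
        nom  = 2*pi*Ha*(0:N-1)'/N;                   % nominal phase advance per hop
        nfrm = floor((length(x)-N)/Ha);
        y    = zeros(N + nfrm*Hs, 1);
        X0   = fft(w .* x(1:N));
        prev = angle(X0);                            % phases of the previous frame
        psi  = prev;                                 % accumulated synthesis phase
        y(1:N) = real(ifft(X0)) .* w;
        for m = 1:nfrm
            X    = fft(w .* x(m*Ha+1 : m*Ha+N));
            dphi = angle(X) - prev - nom;            % deviation from nominal phase
            dphi = mod(dphi + pi, 2*pi) - pi;        % principal argument
            psi  = psi + (nom + dphi) * Hs/Ha;       % rescale with stretch factor
            yfrm = real(ifft(abs(X) .* exp(1i*psi))) .* w;
            idx  = m*Hs+1 : m*Hs+N;
            y(idx) = y(idx) + yfrm;                  % overlap-add
            prev = angle(X);
        end
    end

Saved as pvoc_sketch.m, a call like y = pvoc_sketch(x, 512, 64, 90) stretches x by the factor 90/64.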
4 Algorithm and Simulation

This part explains how the algorithm in MATLAB/Simulink implements the theory of chapter 3. The algorithm was developed by MathWorks and is available as an example within Simulink. To open the example, type dsppitchtime in the MATLAB command line. The algorithm takes three parameters: WindowLength, AnalysisHopSize and SynthesisHopSize. The last two parameters are similar to the previously described OverlapFactor; their real meaning will become clear later. WindowLength must be a power of two, 2^x, where x is a positive integer, because the FFT allows just these numbers. Furthermore, the hop sizes must be smaller than the window length. The top level of the algorithm, shown in Figure 11, is similar to Figure 7.

Figure 11: "Phase Vocoder" Simulink

The "Overlap ST-FFT" is responsible for the windowing and the transformation into the frequency domain. After splitting the transformed values into magnitude and phase, the "Synthesis Phase Calculation" does the spectral manipulation and returns a better phase estimate. After combining angle and magnitude again, the "Overlap IST-FFT" reverses the windowing and the IFFT changes the signal back into the time domain. The last block is just a multiplication and is responsible for rescaling the values to the input range. To describe the algorithm, the following values were chosen: WindowLength = 512, AnalysisHopSize = 64 and SynthesisHopSize = 90.

The following takes a closer look at the subsystems of the algorithm, starting with the "Overlap ST-FFT" in Figure 12.

Figure 12: "Overlap ST-FFT" detail

This subsystem changes the time signal into the frequency domain with the FFT and the "hanning window" function. It also adds an overlap; for this the "Overlap buffer" is used. The numbers at the signal paths describe the dimensions of the signal. This means that the input is a frame with 64 samples and at the output there are 512 samples. The 512 comes from the WindowLength. The overlap of the frames in samples is WindowLength - AnalysisHopSize = 512 - 64 = 448. There are other windows, like the "hamming window", which can be used. Further information on the "hanning window" and why it is a good window function can be found in (31).

After splitting the signal into magnitude and phase, the phase manipulation takes place (see Figure 13). The phases at the input are normalised between -π and π.

Figure 13: "Synthesis Phase Calculation" detail

This is the complex part and needs some focus. The basic idea is to get a good frequency estimate by comparing the phases within each bin. For this, the addition block takes the actual phases of the frame and subtracts the phases of the frame before; the result is Δφ. The phase change expected because of the time difference between the frames is then subtracted from Δφ. This is done with the constant frame generated by the function shown as number 3 in Figure 13. Thus the nominal phase for each bin is subtracted from Δφ. To illustrate this, the signals from the 4th bin of the signal are shown in Figure 14.

Figure 14: Signal 1,2,3,4

To apply phase unwrapping, the subsystem "Principal Argument" is used. This block was developed by Carlo Drioli and is shown in Figure 15.

Figure 15: "Principal Argument" detail

This block computes the principal argument of the nominal initial phase of each frame. After this subsystem, the expected phase value for the bin is added again, because it was subtracted before. This happens again with the constant block shown as number 3 in Figure 13. This is shown for the 4th bin in Figure 16. It can be seen that between signal 5 and 6 there is just a small difference in the y-scale because of the added nominal phase from number 3. This difference gets larger from bin to bin.

Figure 16: Signal 4,5,6

Now that the real phase increment is available, it is rescaled with the time-stretching factor SynthesisHopSize/AnalysisHopSize. This rescaling is needed because when the time scale is changed, the phase changes occur over a longer time. In other words, if there is a 45° change between consecutive frames and the time scale is changed, it would result in an altered frequency. This happens because the IFFT spreads the frames further apart and changes the frequency, as it now occurs over a longer time interval. To prevent this, the rescaling with the time-stretching factor is used.

After that, the values are accumulated frame by frame. This is shown in Figure 17.

Figure 17: Signal 7,8

As shown, the phase increment of the actual frame is added to the accumulated phase of the previous frame, so there is a continuous slope of phase. Now that the optimised phase is available, it is combined with the magnitude again. The signal is transformed back into the time domain and multiplied with the window function, as illustrated in Figure 18.

Figure 18: "Overlap IST-FFT" detail

In the last step of the subsystem, the overlap is added with the "OverlapAndAdd" block. The output is now 90 samples per frame, defined by the SynthesisHopSize. So the time scale was changed by the factor SynthesisHopSize/AnalysisHopSize = 90/64 ≈ 1.41.

4.1 Result of Simulation

With the parameters of the "Phase Vocoder", different speeds of voice/audio can be achieved. The maximum stretching factor achieved without an audible loss was 2. The following values were chosen: WindowLength = 512, AnalysisHopSize = 32 and SynthesisHopSize = 64. The input speech signal was 3 seconds long and is shown in Figure 19.

Figure 19: Input signal

The output signal is shown in Figure 20 with the stretching factor of 2 and is therefore 6 seconds long.

Figure 20: Output signal

The scope of both signals shows that they are not exactly the same, but when hearing them there is no recognisable loss. The stretch factor follows directly from the two hop sizes, as the short check below shows.
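The relation between the two hop sizes and the resulting duration can be checked directly:

    factor  = 64/32           % SynthesisHopSize / AnalysisHopSize = 2
    dur_out = 3 * factor      % 3 s input -> 6 s output, as in Figure 20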
5 Implementation on Hardware

The implementation on the hardware was tricky, because the "Phase Vocoder" is not a real-time application. This is in the nature of the applied processing. Consider talking into a microphone and slowing the speech down by a factor of 2: the algorithm would always only have processed half of the input, so after 1 minute of talking just 30 seconds could have been heard. The other values must be stored in memory and would cause a buffer overflow when talking for a long time. Time compression would be even worse, because the algorithm would have to process values which have not even been spoken yet. After one minute of talking it should already have produced an output of 2 minutes, which is obviously not possible. Thus another implementation had to be chosen.

The general idea was to implement a "Processing" and a "Play" block. When the input signal is recorded, it gets processed and saved into memory. Afterwards the processed file in memory gets played and the user can hear it. The input file was not a microphone signal but a sample voice signal which was loaded into Simulink as a variable. Using a microphone would just need another subsystem and is not a real change to the design. The top-level design of the Simulink model is shown in Figure 21.

Figure 21: Top-Level Simulink

At the top left is the "C6713DSK" block, where parameters for the code generation are set. The other blocks are used for controlling the algorithm. As shown, the dip switch is used as input for the "Embedded Control Unit" to allow interactive use. This block controls the other 3 blocks, which are used for flashing LEDs to show the user what is happening, to start "Processing" the signal and to enable the "Play" block.

To get a good design, much time was spent in the Simulink help file reading about the pros and cons of different subsystems. The result was the "enabled subsystem", because this block executes the subsystem as long as there is a "1" at the "enable" input. This was considered a good solution because generating a "1" is easy and could be done with a lot of different blocks. It also allows working with different sample times within one system, which was important because the control of the subsystems should not run at a high sample rate. Using a high sample rate in the control would use a lot of processing power and is unnecessary, because the user will not change the configuration a few thousand times per second. However, the "Processing" and "Play" blocks must work with a sample time of Ts = 1/8000 s, because the input file was sampled at this rate. In order not to waste processing power, the control block works not with Ts but with Tdip = 100 ms. This sample time is fast enough to control the "enabled subsystems".

Because of code generation there were limitations in using Simulink blocks and MATLAB commands. This had to be considered while designing the control and led to the final design. The management of variables was also difficult and is described in 5.2.

5.1 The different parts of the model

The following subchapters describe the subsystems of the developed model.

5.1.1 FindEndOfFile

Working with a voice example it would be possible to use a fixed processing time for that file, as the file length is known. To make the control more flexible and to make it possible to load any audio or voice signal, it was necessary to find the end of the input file. To achieve this, the elements of the subsystem "FindEndOfFile" shown in Figure 22 are used.

Figure 22: "FindEndOfFile" subsystem

The "Overlap Buffer1" changes the frame-based signal into a sample-based one.
After that, the signal is integrated over a number of samples. 64 samples were chosen and tested with different examples with a satisfying result. The integrated values are then compared to nonzero: if the 64 values are not all 0, there will be a 1 at the output (see Figure 23). The "Rate Transition1" is needed so that this subsystem works together with the slower-running control unit. The "Data Type Conversion" changes the data type to double, as the "Embedded Control Unit" needs a "double" as input.

Figure 23: "FindEndOfFile" signal

As shown in Figure 23, the output signal is "1" as long as there is an input signal. This block is used as input for the "Embedded Control Unit" to help enable and disable subsystems; its logic amounts to the small test sketched below.
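Expressed as plain MATLAB (a sketch of the logic, not the generated code), the block chain performs this per-block test; abs() is added here for robustness, while the model integrates the raw samples.

    function active = find_end_of_file(block)
        % block: 64 consecutive samples of the sample-based signal.
        active = double(sum(abs(block)) ~= 0);   % 1 while signal is present
    end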
5.1.4 "Embedded Control Unit"

This block implements the control of the algorithm and ties all components together. The basic idea is that the dip switches control the other components of the program, such as "Processing" and "Play", so the user can work with the system interactively. The Embedded Matlab block was chosen because it accepts ordinary Matlab commands and is therefore flexible. As it turned out during development, this flexibility could not be exploited: the commands that would have been useful could not be used because of the limitations of RTW (see 5.2). Another possibility would have been a "Stateflow Chart", but it would not have been as flexible with regard to Matlab commands. The complete code is shown in 9.5; some parts are described here to explain the working principle. The block operates with Tdip, which means it is executed every 100 ms.

First the inputs and outputs are defined, matching the ports shown in Figure 21:

function [enable,enableplay,ledFlashOut] = fcn(processing,dip, playing)

The explanation focuses on the "Processing" block, as the "Play" block is quite similar. First some variables are defined and initialised:

enablevar=0;
persistent enableFOld;
persistent enableFNew;
enableFNew=processing;

As there is no explicit type definition, the default type is used, which is double.

The first "if" detects a falling edge of the processing input, which comes from the "FindEndOfFile" block. The second "if" checks whether the dip switches represent the integer 1 and the processed file has not yet reached its end. As long as this is true, enablevar is 1 and keeps the processing alive.

if ((enableFNew==0) && (enableFOld==1))
    fileend = 1;
    startplay = 1; % enables the start of the "Play" block
end
if ((dip==1) && (fileend==0))
    enablevar=1;
else
    enablevar=0;
end

This can also be seen in the previously shown Figure 23: after the signal changes from 1 to 0 the subsystem stops processing, which is why no further values appear after the 5-second mark. As a last step the value of enablevar has to be written to the enable output:

enable=enablevar;

This is necessary when working with "if" statements and output variables, otherwise the compiler reports errors. The output variable, in this case "enable", cannot be assigned within "if/else" statements; therefore a helper variable "enablevar" is used inside the "if/else" and its final state is assigned to the output "enable".

Besides enabling the processing, the other functions of the Embedded Control Unit are:

1. Enabling the "Play" block
2. Controlling the flashing of the LEDs
3. Resetting the control

Integer number   Function of dip switches
0                Reset the variables used for the "Processing" and "Play" enables
1                Start the processing of the file with subsequent playback of the file

Another useful extension would be changing the "Phase Vocoder" parameters via the dip switches, which would make it possible to change the speed of the example in different ways. Unfortunately this is not easy to implement, as further described in chapter 5.2. The way the implemented system actually works is shown in Figure 26. The first signal is the enable signal at the "Processing" block, the second one is the enable of the "Play" block.

Figure 26: Enable signals

So if the dip switch is set to "1", the processing starts. As soon as it has finished, the "Play" block is enabled and the file can be heard directly. To restart the system, the switches have to be set to "0" to reset the variables and then back to "1".

5.1.5 "LedFlash" subsystem

As the user cannot see what happens inside the board, the LEDs are used to show at least some information. They are generally used to display which dip switches are set: the 4 LEDs represent an integer number between 0 and 15, just like the dip switches. There are 2 input ports for this subsystem, as shown in Figure 27.

Figure 27: "LedFlash" subsystem

The "Enable" port is used to make the LEDs flash at the rate of Tdip. If "Enable" is not set, the switch connects the "Dip" port to the LEDs so the user can see and check the choice of the dip switches.

Figure 28: "LedFlash" signals

The first signal in Figure 28 is the "Enable", the second one is the "Dip" and the last is the signal at the LED block. As shown, if there is no "Enable" the LEDs represent the number of the dip switches, in this case "1".
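Both the enable control and the LED flashing rely on the same pattern: a persistent variable carries the previous value across the 100 ms control steps. A minimal, runnable sketch of the falling-edge detection (the pattern from the code above, isolated into its own function):

function edge = falling_edge(sig)
% Sketch of the edge detection used in the "Embedded Control Unit":
% a persistent variable stores the previous input so a 1 -> 0
% transition is recognised one control step (Tdip) later.
persistent old;
if isempty(old)
    old = 0;
end
edge = (sig == 0) && (old == 1); % true exactly once, on the falling edge
old = sig;                       % remember the current value for the next call
end

As a side note, with Tdip = 100 ms the LED toggle in 9.5 inverts its state on every call while processing runs, which corresponds to a blink rate of 5 Hz.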
5.1.6 Delay

So far all blocks have been described except for the memory blocks at the top level. These blocks are necessary for Simulink to solve the model: without them, Matlab reports an error about an algebraic loop that cannot be solved. They introduce a small delay in finding the end of the file, so the "Embedded Control Unit" receives the end-of-signal information with a delay of Tdip. However, this is not a problem: even if the end of the file were not recognised for some seconds, the affected samples would just be zeros. During playback there would be a slightly longer stretch of silence, but it would not be noticed as it is not audible.

5.2 Problems within the design

As mentioned before, there were limitations when designing for hardware. One issue was the management of memory and variables. This affected two parts of the design: one feature was not implemented at all, and another does not work properly.

Starting with the part that does not work: this is a real issue and is the reason the system does not work as intended. The blocks "Signal To Workspace" and "Signal From Workspace", which are used to write and read the variable holding the speech file, can only be used under special circumstances which do not apply to this system. The cause lies in the RTW: normally these blocks write the variable at the end of a simulation. As there is no such end when the program runs continuously on the hardware, the values never get written. So at the moment the model reads the stored variables from the loaded configuration instead of the processed ones. There is a solution, but there was no time to implement it: typing "rtwdemo_cscpredef" at the Matlab command line opens an example of how it can be done. Essentially, a new storage class has to be implemented. Within this class there has to be a variable to hold the file, and the signal must be given the same name as the variable so that it is written directly into the variable. As this happens in real time, the values can then be read from there.

The other, unimplemented feature is changing the "Phase Vocoder" parameters at runtime. The reason is also related to the storage of data. In simulation something like this is mostly done manually: before starting the simulation, the variables are read from an m-file or from a mask. In this case a mask is used; it opens when clicking on the "Phase Vocoder" block and provides fields where values can be typed in and assigned to variable names defined in the mask. Masked parameters have the benefit that their values can be read or written during a simulation: with commands like get_param and set_param these values can be queried and changed while the simulation is running. Unfortunately this is not possible when developing for hardware. The solution would be similar to the one above, just using different data types.

Another aspect not mentioned so far is "Tunable Parameters". These are the values which can be changed at runtime. As some parameters in the "Phase Vocoder" are not tunable, because Simulink reports internal errors, this needs further research. From today's point of view, however, there should not be a problem, because the "Phase Vocoder" is controlled by the "Embedded Control Unit" and those values would only be changed while processing is not running, i.e. while the subsystem is disabled. Therefore the corresponding errors should be switched off in the RTW configuration.
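For reference, the runtime parameter access that works in pure simulation looks as follows (a sketch; the model path and parameter name are made up for illustration and do not come from the actual model):

% Sketch (simulation only): reading and changing a masked parameter of
% the "Phase Vocoder" block while the model runs. 'myModel' and
% 'ScalingFactor' are hypothetical names.
val = get_param('myModel/Phase Vocoder', 'ScalingFactor');
set_param('myModel/Phase Vocoder', 'ScalingFactor', '1.5');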
6 Conclusion

The project covered several fields of development. It was possible to set up a new development environment from scratch. Thanks to this project, future projects can use the implemented workflow and the procured hardware to develop with a Rapid Prototyping approach. There is thus no need to spend a lot of time investigating hardware requirements and tool workflows, since the Simulink model can be downloaded onto the hardware with one click. Furthermore, the hardware platform is powerful enough to run complex algorithms, so future projects could implement "Pitch Shifting" in real time or a DSL modem. With the daughter-card expansion for the DSK it is even possible to work in fields like video and image processing.

Besides the development environment, a mathematically interesting algorithm was implemented. Unfortunately the project ran out of time, as much of it was spent setting up the Rapid Prototyping Workflow and finding suitable hardware. The problem with the memory and variable management could therefore not be solved within this project; however, a suitable approach was found that would solve it. The implemented system works perfectly in simulation, and the results of the "Phase Vocoder" up to a scaling factor of 2 could be achieved without audible loss. This is similar to other people's research results (24). The implementation on hardware was not finished because of time constraints, but many challenges were solved and only one topic could not be fully covered.

Very interesting in this project was the fact that there was nothing to build on. This meant that many things had to be considered and evaluated, not only technical aspects but also the financial side of the project. This project therefore gives an insight into the whole development process, from hardware and software selection to simulation, implementation on hardware and, finally, testing.

7 References

1. Arons, Barry. SpeechSkimmer: A System for Interactively Skimming Recorded Speech. ACM Transactions on Computer-Human Interaction, 1997.
2. Bateman, Andy and Paterson-Stephens, Iain. The DSP Handbook: Algorithms, Applications and Design Techniques. Prentice Hall, 2002. 978-0201398519.
3. Kuo, Sen M., Lee, Bob H. and Tian, Wenshun. Real-Time Digital Signal Processing: Implementations and Applications. Wiley, 2003. 978-0470014950.
4. Proakis, John G. and Manolakis, Dimitris K. Digital Signal Processing (4th Edition). Prentice Hall, 2006. 978-0131873742.
5. Akhan, Mehmet and Larson, Keith. DSP Intro Slides. University of Hertfordshire; Texas Instruments, 1998.
6. Hunt Engineering. [Online] 09 06 2011. [Cited: 22 12 2011.] http://www.hunteng.co.uk/info/fpga-or-dsp.htm.
7. Poole, Ian. FPGAs for DSP Hardware. Radio-electronics.com. [Online] [Cited: 11 1 2012.] http://www.radio-electronics.com/info/rf-technology-design/digital-signal-processing/fpga-dsp.php.
8. IEEE. IEEE Standard for Floating-Point Arithmetic Std. 754-2008. 2008. 978-0-7381-5753-5.
9. Texas Instruments. [Online] [Cited: 18 11 2011.] http://www.ti.com/tool/ccstudio.
10. MathWorks. [Online] [Cited: 18 11 2011.] http://www.mathworks.co.uk.
11. National Instruments. [Online] [Cited: 21 11 2011.] http://www.ni.com/labview/whatis/.
12. MathWorks, Inc. Matlab R2009b Product Help.
13. MathWorks, Inc. Embedded Target for TI C6000 DSP Release Notes.
14. Texas Instruments Inc. TMS320C6713 DSP Starter Kit. Product Information.
15. Spectrum Digital, Inc. TMS320C6713 DSK Module Technical Reference. 2003.
16. Texas Instruments Inc. Datasheet - TMS320C6711.
17. MathWorks. Developing Embedded Targets using Real-Time Workshop Embedded Coder. 2010.
18. Rabiner, Lawrence R. and Schafer, Ronald W. Digital Processing of Speech Signals. New Jersey: Prentice-Hall, Inc., 1978.
19. Ostrop, Dennis and Buhr, Daniel de. Time Domain Harmonic Scaling. Köln: FH Köln, 2007.
20. Bühler, Christian and Liechti, Christian. Veränderung der Wiedergabegeschwindigkeit von Musiksignalen. Hochschule Rapperswil, 1999.
21. Brennan, David. Time Modification of Speech. Edinburgh: Napier University, 2007/08. Honours Thesis.
22. Marti, Adrian. Time Domain Harmonic Scaling. HS Rapperswil, 2002.
23. Flanagan, J. L. and Golden, R. M. Phase Vocoder. Bell System Technical Journal, 1966.
24. TheDSPDimension. [Online] 8 1999. [Cited: 18 11 2011.] http://www.dspdimension.com.
25. Dolson, Mark. The Phase Vocoder: A Tutorial. Computer Music Journal, 1986.
26. Laroche, Jean and Dolson, Mark. New Phase Vocoder Techniques for Pitch-Shifting, Harmonizing and Other Exotic Effects. New York: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 1999.
27. Sethares, William A. A Phase Vocoder in Matlab. [Online] [Cited: 18 11 2011.] http://sethares.engr.wisc.edu/vocoders/phasevocoder.html.
28. Portnoff, Michael R. Implementation of the Digital Phase Vocoder Using the Fast Fourier Transform. IEEE Trans. Acoustics, Speech, and Signal Processing, 1976.
29. Sethares, William A. Rhythm and Transforms. Springer, 2007. 9781846286391.
30. Puckette, Miller S. and Brown, Judith C. Accuracy of Frequency Estimates Using the Phase Vocoder. IEEE Transactions on Speech and Audio Processing, 1998.
31. Götzen, Amalia De, Bernardini, Nicola and Arfib, Daniel. Traditional (?) Implementations of a Phase Vocoder: The Tricks of the Trade. Verona: Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-00), 2000.
32. Ganapathi, S. Introduction to Simulink, Link for CCS. M.Sc. Thesis. 2006.
33. The MathWorks, Inc. Target for TI C6000™. [Online] [Cited: 12 12 2011.] http://www.kxcad.net/cae_MATLAB/toolbox/tic6000/f3-108524.html.
34. Murmu, Manas. Application of Digital Signal Processing on TMS320C6713 DSK. Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela. 2008. Bachelor Thesis.

8 Table of figures

Figure 1: Rapid Prototyping process
Figure 2: Layout DSK C6713 (14)
Figure 3: Functional Block Diagram of the DSK C6713 (14)
Figure 4: Workflow Simulink (17)
Figure 5: Software pieces used in workflow
Figure 6: Signal before and after modification (20)
Figure 7: "Phase Vocoder" overview (30)
Figure 8: Phase of 2 samples (29)
Figure 9: "Spectral Manipulation" (26)
Figure 10: Phase unwrapping (25)
Figure 11: "Phase Vocoder" Simulink
Figure 12: "Overlap ST-FFT" detail
Figure 13: "Synthesis Phase Calculation" detail
Figure 14: Signal 1,2,3,4
Figure 15: "Principal Argument" detail
Figure 16: Signal 4,5,6
Figure 17: Signal 7,8
Figure 18: "Overlap IST-FFT" detail
Figure 19: Input signal
Figure 20: Output signal
Figure 21: Top-Level Simulink
Figure 22: "FindEndOfFile" subsystem
Figure 23: "FindEndOfFile" signal
Figure 24: "Processing" subsystem
Figure 25: "Play" subsystem
Figure 26: Enable signals
Figure 27: "LedFlash" subsystem
Figure 28: "LedFlash" signals

9 Appendix

9.1 Configure MATLAB/Simulink and CCS 3.3

9.1.1 CCS

This tutorial describes how to set up the C6713 DSK within CCS 3.3. After the installation, the package with the board-specific data (drivers, examples etc.) has to be downloaded from the Spectrum Digital homepage and copied to the installation path. Now the board can be registered in the CCS environment:

1. Launch "Setup CCStudio v3.3" from the start menu.
2. Choose the "C6713 DSK" board.
3. In the properties (right-click) of "C6713 DSK", choose the "Diagnostic Utility" file in the installation path. Hint: if problems like "can't generate data file" occur later, you should choose this file manually in these settings.
4. As a sub-device, choose the "TMS320C671x_0" processor.
5. In the properties (right-click) of "TMS320C671x_0", choose the GEL file in the installation path.
6. Now you can save the configuration and start the CCS 3.3 software. It should initialise the DSK while starting.

Sometimes there are problems with the USB emulation. To fix this, start the "C6713 DSK Diagnostic Utility" and unplug the board; plug it in again and it should work.

Trouble can also occur because of wrong linking.
If you have built a project (or MATLAB has built it automatically), a wrong link may have been registered and the project will not compile. To change this setting, right-click on your *.pjt and click on "Build Options". You will see something similar to the following figure. There you have to change the "include search path" to the installation path with your DSK-specific files.

The environment is now ready to work with. There are some nice tutorials for the first steps, such as (32), and the help file of CCS provides further useful information about the whole program. To get CCS 3.3 and MATLAB connected, you have to choose "Connect" from the "Debug" menu. Now MATLAB can be configured to use CCS.

9.1.2 MATLAB (32)(33)

To verify that CCS is properly installed on the system, enter ccsboardinfo at the Matlab command line. Matlab should return information about the configured boards. To ensure the Embedded Target for TI C6000 DSP is installed, enter c6000lib. Matlab should display the C6000 block library containing libraries such as C6000 DSP Core Support, C62x DSP Library, C64x DSP Library and, most importantly, the C6713 DSK support library. Now that the hardware is known to be addressable and the Simulink libraries are available, the MATLAB/Simulink environment can be set up:

1. To start Simulink, type simulink.
2. Create a new model in Simulink.
3. To open the Configuration Parameters, select Simulation -> Configuration Parameters.
4. In the Select tree, choose the Real-Time Workshop category.
5. For Target Selection, choose the file "ccslink_ert.tlc" in the Real-Time Workshop. This will automatically change the Make command and Template makefile selections.
6. Choose the Optimization category in the Select tree. For Simulation and Code generation, unselect Block reduction and Implement logic signals.
7. In the Select tree, choose the Hardware Implementation category and select the board and little endian.
8. Choose the TI C6000 compiler and set Symbolic debugging.
9. In the Select tree, choose the Debug category and select Verbose build.
10. In the Select tree, choose the Solver category. Ensure that the solver is set to Fixed-step / discrete.
11. Set the following Real-Time Workshop run-time options: Build action: Build_and_execute; Interrupt overrun notification method: Print_message.

In the model itself you need to add the targetc6713 preferences block. This block represents your driver and will be included when generating C code. The default parameters should be fine for most programs; however, if you want to change memory settings you can do it there.
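As a final check of the toolchain, the two commands from the beginning of this section can be combined into a short verification script (commands as given above; the exact output depends on the installed versions):

% Sketch: verify the CCS link and the C6000 blockset from the
% Matlab command line before building a model.
ccsboardinfo   % should list the C6713 DSK board and its processor
c6000lib       % should open the C6000 block library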
9.2 Used software versions

The software used is listed in the following table.

Module title                        Version
Matlab                              7.9
Simulink                            7.4
Embedded IDE Link                   4.0
Real-Time Windows Target            3.4
Real-Time Workshop                  7.4
Real-Time Workshop Embedded Coder   5.4
Signal Processing Blockset          6.10
Signal Processing Toolbox           6.12
Target Support Package              4.0
Code Composer Studio                3.3.81.6
CCS 3.3 driver package              CCSPlatinum_v30330

9.3 RTW file description

Table 1: RTW-files description (12)

File: model.c or .cpp
Description: Contains entry points for the code implementing the model algorithm (for example model_step, model_initialize and model_terminate).

File: model_private.h
Description: Contains local macros and local data that are required by the model and subsystems. This file is included by the generated source files in the model. You do not need to include model_private.h when interfacing hand-written code to a model.

File: model.h
Description: Declares model data structures and a public interface to the model entry points and data structures. Also provides an interface to the real-time model data structure (model_M) with accessor macros. model.h is included by subsystem .c or .cpp files in the model. If you are interfacing your hand-written code to generated code for one or more models, you should include model.h for each model to which you want to interface.

File: model_data.c or .cpp (conditional)
Description: model_data.c or .cpp is conditionally generated. It contains the declarations for the parameters data structure, the constant block I/O data structure, and any zero representations used for the model's structure data types. If these data structures and zero representations are not used in the model, model_data.c or .cpp is not generated. Note that these structures and zero representations are declared extern in model.h.

File: model_types.h
Description: Provides forward declarations for the real-time model data structure and the parameters data structure. These may be needed by function declarations of reusable functions. Also provides type definitions for user-defined types used by the model.

File: rtwtypes.h
Description: Defines data types, structures and macros required by Real-Time Workshop Embedded Coder generated code. Most other generated code modules require these definitions.

File: ert_main.c or .cpp (optional)
Description: This file is generated only if the "Generate an example main program" option is on (it is on by default). See "Generate an example main program".

File: autobuild.h (optional)
Description: This file is generated only if the "Generate an example main program" option is off. autobuild.h contains #include directives required by the static version of the ert_main.c main program module. Since the static ert_main.c is not created at code generation time, it includes autobuild.h to access model-specific data structures and entry points. See "Static Main Program Module" for further information.

File: model_capi.c, model_capi.h (optional) or .cpp
Description: Provides data structures that enable a running program to access model parameters and signals without the use of external mode. To learn how to generate and use the model_capi.c or .cpp and .h files, see the "Monitoring Signals With the C API" chapter in the Real-Time Workshop documentation.

9.4 CCS file types (34)

Explanation of the important file types in CCS:

1. file.pjt: project file used to create and build a project named file
2. file.c: C source program
3. file.asm: assembly source program created by the user, by the C compiler, or by the linear optimizer
4. file.sa: linear assembly source program; the linear optimizer uses file.sa as input to produce an assembly program file.asm
5. file.h: header support file
6. file.lib: library file, such as the run-time support library file rts6700.lib
7. file.cmd: linker command file that maps sections to memory
8. file.obj: object file created by the assembler
9. file.out: executable file created by the linker to be loaded and run on the C6713 processor
10. file.cdb: configuration file used with DSP/BIOS
9.5 Code of the "Embedded Control Unit"

function [enable,enableplay,ledFlashOut] = fcn(processing,dip, playing)
%#eml
%-----------------------------------------------------------------------
% Defining variables
% ***
enablevar=0;
enableplayvar=0;
persistent ledFlash;
persistent fileend;
persistent fileendplay;
persistent startplay;
persistent playend;
persistent enableFOld;
persistent enableFNew;
persistent enablePOld;
persistent enablePNew;
% ***
% Initialise persistent variables
% ***
if isempty(ledFlash)
    ledFlash = 0;
end
if isempty(fileend)
    fileend = 0;
end
if isempty(fileendplay)
    fileendplay = 0;
end
if isempty(startplay)
    startplay = 0;
end
if isempty(playend)
    playend = 0;
end
if isempty(enableFOld)
    enableFOld = 0;
end
if isempty(enableFNew)
    enableFNew = 0;
end
if isempty(enablePOld)
    enablePOld = 0;
end
if isempty(enablePNew)
    enablePNew = 0;
end
% ***
%-----------------------------------------------------------------------
% Defining start values
% ***
enableFNew=processing;
dipnew=dip;
enablePNew=playing;
% ***
%-----------------------------------------------------------------------
% Control to enable the "Processing" block
% ***
if ((enableFNew==0) && (enableFOld==1))
    fileend = 1;
    startplay = 1; % enables the start of the "Play" block
end
if ((dip==1) && (fileend==0))
    enablevar=1;
else
    enablevar=0;
end
% ***
% Control to enable the "Play" block
% ***
if ((dip==1) && (enablePNew==0) && (enablePOld==1))
    fileendplay = 1;
end
if ((dip==1) && (startplay==1) && (fileendplay==0))
    enableplayvar=1;
else
    enableplayvar=0;
end
% ***
% Reset of variables
% ***
if (dip==0)
    fileendplay=0;
    enableplayvar=0;
    enablevar=0;
    startplay=0;
    fileend=0;
    ledFlash=0;
end
% ***
% Toggle enable for LEDs
% ***
if ((ledFlash==0) && (processing==1))
    ledFlash=1;
else
    ledFlash=0;
end
% ***
%-----------------------------------------------------------------------
% Writing outputs and storing values for the next step
% ***
enable=enablevar;
enableplay=enableplayvar;
ledFlashOut=ledFlash;
enableFOld=enableFNew;
enablePOld=enablePNew;
% ***
end
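To observe the sequencing without the board, the function can be exercised offline (a sketch; it assumes the code above is saved as fcn.m on the Matlab path, and the two flag vectors are made-up stand-ins for the outputs of "FindEndOfFile" and "FindEndOfFileDAC"):

% Sketch: calling fcn() over six control steps of Tdip. The dip switch
% is held at 1; the processing flag drops after step 3 (end of the
% input file) and the playing flag drops after step 5 (end of playback).
clear fcn               % reset the persistent state between runs
dip  = 1;
proc = [1 1 1 0 0 0];   % assumed "FindEndOfFile" output
play = [0 0 0 1 1 0];   % assumed "FindEndOfFileDAC" output
for k = 1:numel(proc)
    [en, enp, led] = fcn(proc(k), dip, play(k));
    fprintf('step %d: enable=%d enableplay=%d led=%d\n', k, en, enp, led);
end

The printed sequence shows the "Processing" enable high for the first three steps and the "Play" enable taking over after the falling edge, matching the behaviour shown in Figure 26.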