Implementation of speech modification on hardware
School of Engineering and the Built Environment

Implementation of speech modification on hardware

Author: Marco Gloeckler (40050956)
Honours Bachelor Thesis 2011/2012
Supervisors: 1. Mr. Jay Hoy, 2. Mr. James McWhinnie
German Supervisor: Prof. Dr. D. Pross

Abstract

The main objective of this dissertation was to implement an algorithm called the "Phase Vocoder" on a hardware platform. This algorithm is used to time-compress or time-expand audio or speech. A Rapid Prototyping Workflow was used for this, so the whole range of developing a product is covered. This includes choosing suitable hardware to implement the "Phase Vocoder". Furthermore, the software MATLAB/Simulink was evaluated and chosen because the tool allows Rapid Prototyping. The engineered workflow makes it possible to develop a program at an abstract level and build an executable program with one click. The "Phase Vocoder" algorithm itself was evaluated and compared to another time-stretching method. It was then implemented on the hardware platform, which shows the differences between a simulation and an executable version for hardware.

Acknowledgement

This thesis has benefited greatly from the support of many people, some of whom I would sincerely like to thank here. To begin with, I am really grateful for the help of my supervisor Mr. Jay Hoy. I also want to thank my second supervisor and German supervisor, Mr. James McWhinnie and Prof. Dr. D. Pross. Furthermore, I want to thank the technicians of Edinburgh Napier University who helped me to set up a computer to work with. Finally, I want to thank my family and my friends who supported me and gave me the opportunity to write this thesis in Edinburgh.

Affirmation

Hereby I, Marco Gloeckler, affirm that I wrote the present thesis without any inadmissible help by a third party and without using any other means than indicated. Thoughts that were taken directly or indirectly from other sources are indicated as such. This thesis has not been presented to any other examination board in this or a similar form. I have written this dissertation at Edinburgh Napier University under the scientific supervision of Mr. Jay Hoy.

Marco Gloeckler, Mat. 40050956, Edinburgh, Scotland, 30.03.2012

Table of contents
Abstract
Acknowledgement
Affirmation
Table of contents
1 Introduction
  1.1 Motivation
  1.2 Objectives
  1.3 Approach
  1.4 Rapid Prototyping Workflow
  1.5 General Information about DSP
    1.5.1 What is DSP?
    1.5.2 FPGA or DSP
    1.5.3 Fixed-point or floating-point
      1.5.3.1 Fixed-point representation of numbers
      1.5.3.2 Floating-point representation of numbers
      1.5.3.3 Comparison
2 Preparation
  2.1 Software
    2.1.1 Introduction of Tools
      2.1.1.1 Code Composer Studio (CCS)
      2.1.1.2 MATLAB/Simulink
      2.1.1.3 LabVIEW
  2.2 Decision process of the suitable tool and board
  2.3 Hardware
    2.3.1 TMS320C6713 DSP Starter Kit
  2.4 The Rapid Prototyping Workflow
3 Theory
  3.1 Time Domain Harmonic Scaling
  3.2 Phase Vocoder
    3.2.1 Overview
    3.2.2 Detail
4 Algorithm and Simulation
  4.1 Result of Simulation
5 Implementation on Hardware
  5.1 The different parts of the model
    5.1.1 FindEndOfFile
    5.1.2 "Processing" subsystem
    5.1.3 "Play" subsystem
    5.1.4 "Embedded Control Unit"
    5.1.5 "LedFlash" subsystem
    5.1.6 Delay
  5.2 Problems within the design
6 Conclusion
7 References
8 Table of figures
9 Appendix
  9.1 Configure MATLAB/Simulink and CCS 3.3
    9.1.1 CCS
    9.1.2 MATLAB (29)(30)
  9.2 Used software versions
  9.3 RTW file description
  9.4 CCS File Type (31)
  9.5 Code of the "Embedded Control Unit"
1 Introduction

1.1 Motivation

Nowadays everything in the field of audio, video and picture processing, industrial control and communication systems uses digital signal processing (DSP). It is therefore important for students and engineers in this field to know the basics and how to work with it. In the past digital signal processing was described as very complex and mathematical. Today, DSP can also be described at an abstract level, for example with block diagrams or state flows.

When developing algorithms in the field of DSP, Rapid Prototyping is often used nowadays. The goal is to get from a simulation to a prototype quickly. This type of development allows the developed DSP algorithm to be transferred from the high level, like state flows, onto hardware for testing. This process helps to prevent costly production errors. This procedure should be examined and documented for later developments.

As an example for applying a Rapid Prototyping process in the field of audio or speech there is an algorithm called the "Phase Vocoder", which speeds up or slows down audio/speech. This is useful because studies have shown that people read/hear and understand faster than they can talk. It is important, however, that the pitch itself is not changed (1). This knowledge can be used to play language files faster, for example for an answering machine, or to study from audio CDs. Changing the speed of language is also used for other applications, like speech recognition, or to fit a 32-second radio advertisement into an available frame of 30 seconds. Slowing down speech can also be useful to generate effects in movies.

The range of use is not limited to speech: DJs and producers use this technique to generate special effects, or to bring two different sound tracks to the same speed in order to unite them.

To sum up, there is Rapid Prototyping, which includes hardware and software. The other field of interest is audio/speech time stretching, which needs a mathematical algorithm.

1.2 Objectives

The goal of this project was to develop or use an algorithm to slow down or speed up speech and implement it on a board with a Digital Signal Processor (DSP). The idea of the algorithm should be based on the "Phase Vocoder" method. An important issue is the timbre of the voice, which should sound as natural as possible after modification.
A Rapid Prototyping process should be used to allow fast changes and a well readable program. The project started from scratch, so the whole development environment had to be set up. Therefore different hardware platforms and also suitable software had to be considered, evaluated and finally organised. Once suitable hardware and software were found, the workflow would have to be tested with some simple examples.

The theory of speech shifting methods had to be analysed. The goal was to use the "Phase Vocoder" method, but other possibilities had to be read about and understood, too. At the end of the project a running version should be on a hardware board and ready for a demonstration. The algorithm on the board should be modifiable by switches on the hardware or by computer software.

1.3 Approach

As there was no former project or development environment available for this kind of task, there were a lot of different aspects to consider. After a first overview of what this project includes, 3 main topics can be defined:

1. Gathering information in fields like DSP vs. FPGA, fixed-point vs. floating-point, processing power and the theory of the "Phase Vocoder"
2. A software/hardware combination which allows a high-level approach had to be organised/bought
3. A "Phase Vocoder" algorithm had to be found or developed and adapted to the hardware needs

First, research about the "Phase Vocoder" theory had to be done to get ideas that had to be considered. As it was also part of the project that the whole development environment had to be set up, other aspects like hardware and software tools had to be considered as well. So really basic topics like DSP vs. FPGA and fixed-point vs. floating-point had to be analysed. As there was the possibility that the University would not have suitable hardware, not only processing power and architecture played a key role, but also budget and possible ways of ordering.

But even if suitable hardware was found, it would not mean that this was the solution, because the objective of using a Rapid Prototyping Workflow needs a software/hardware combination. This leads to main topic 2, where suitable hardware had to be correlated with software. In this field not just technical aspects were in demand, but also available licences and costs. The result of topics one and two had to be a hardware/software combination permitting a Rapid Prototyping Workflow, allowing hardware to be programmed very easily and quickly. Besides, it had to be suitable for an algorithm like the "Phase Vocoder".

The last step would be the implementation of the "Phase Vocoder". Therefore thought had to be given to peripherals like microphone and speakers, and to how the user can interact with the program.

1.4 Rapid Prototyping Workflow

Rapid Prototyping is a method of development which is getting more and more common. With this kind of development it is relatively "easy and fast" to implement a piece of software on a hardware platform. A general workflow of software development for a hardware platform is shown in Figure 1.

Figure 1: Rapid Prototyping process (algorithm development and design -> software coding -> hardware implementation)

Those are the key stages which have to be considered, and they will need some iteration until the final product can be released. There are tools available helping engineers to achieve the development of products as quickly and cost-effectively as possible. In the first step the tools allow developing algorithms in a high-level language (HLL).
This means that after designing in state flows or function blocks the tools translate these into code, often C or Ada. This translation can often be specified for different hardware platforms, so the code will be more efficient and flexible. With the coding finished, the code has to be downloaded onto the hardware. Sometimes this is done in C or, for even faster applications, in Assembler.

Testing and verification can then take place, and if errors occur the whole workflow has to be repeated. But because the tools do most of it automatically, errors can be fixed quickly, compared to chips which must be produced and tested. It makes the development process much cheaper and faster. Also, a change of hardware platform can be done easily, as the adaptation can be done in the tool which generates the HLL (2)(3)(4).

1.5 General Information about DSP

1.5.1 What is DSP?

The term DSP stands for digital signal processing but also for digital signal processor. Digital signal processing is a field of communications engineering and is engaged in the processing of digital signals with the help of digital systems. In practice almost all of the recording, transmission and storage methods for images, movies and sound are based on digital processing of the corresponding signals. The main advantage of DSP systems is that very complex algorithms and filters can be implemented, even in real-time. The hardware platforms used for signal processing are mostly digital signal processors or Field Programmable Gate Arrays (FPGA). Such chips are optimised for digital signal processing, which means that they can do complex calculations extremely fast.

1.5.2 FPGA or DSP

The DSP is a specialised microprocessor, mostly programmed in C or, for better performance, in Assembler. It is optimised to do complex maths-intensive tasks and is limited by the useful operations it can do per clock cycle and the clock rate itself. DSPs have operations specialised for fast signal processing called MAC (Multiply, Add, and Accumulate). This operation can be performed in one clock cycle on a DSP, whereas an ordinary processor would need 70 clock cycles (5).

An FPGA, however, is an integrated circuit (IC) of digital technology in which a logic circuit can be programmed. Due to the specific configuration of internal structures (gates) a variety of circuits can be created, starting with low-complexity circuits, such as multipliers, registers and adders, up to highly complex circuits such as microprocessors. FPGAs are used in all areas of digital technology, but above all where quick response and flexible modification of the circuit are required. The performance is limited by the number of gates in the chip and the clock rate.

Because of the two completely different approaches to building the chips, both have advantages and disadvantages. If the sampling rate exceeds a few MHz, it is difficult for a DSP to process the data without loss. This is due to the access to shared resources like memory or buses. An FPGA, however, with its fixed internal structure, allows high I/O data rates. A DSP is designed so that its entities can be reused. Thus, the multipliers used for the FFT can be used for filters or other things afterwards. In an FPGA such reuse is hard to achieve and is normally not used. Therefore, a DSP is capable of working with huge and varied programs. It can perform a big context switch by loading another part of the program.
An FPGA, in contrast, has to have a routine to reconfigure itself, which can take a long time; this is necessary for huge programs because they cannot fit on one FPGA with its limited number of gates. One major factor in industry is also the cost: a DSP is cheaper than its counterpart in FPGA logic. Technology benefits from the diversity of advantages; thus, complex programs should be split between a DSP and an FPGA.

To summarise, a DSP should be used when the sampling rate is low and the complexity of the program is high, but other factors like available tools and the background of the engineer are important and must be considered in every project (6)(7). For the "Phase Vocoder" project both hardware platforms are viable, because the complexity is not a problem for a current DSP or FPGA (2)(3)(4).

1.5.3 Fixed-point or floating-point

Because of the finite word length in digital systems there are two ways to represent numbers: fixed-point and floating-point. Both have advantages and disadvantages. A brief discussion is provided here; further information can be found in (4).

1.5.3.1 Fixed-point representation of numbers

Fixed-point is a generalisation of the decimal format. The most significant bit (MSB) is used for the sign, the remaining binary digits for the numeric representation. Like decimal numbers, fixed-point numbers have the integer part to the left of the radix point and the fractional digits to the right. Due to the fixed position of the radix point, fewer calculations are required than with floating-point numbers. Furthermore, the conversion and correction necessary for multiplications and divisions can be replaced by fast shift operations. So it takes less processing power to calculate, and the calculation can be done more easily.

The main problem with this number representation is rounding errors and overflows. It is possible that in a multiplication the range of numbers is insufficient, and a huge number will become negative because it runs out of the range - an arithmetic overflow. Therefore the developer has to take care of this and scale the numbers during development, which is time-consuming and fault-prone. To minimise the rounding errors, today's processors with 32 or even 64 bits normally have double the number of bits for intermediate values within the accumulators (3). Fixed-point operations simplify numerical operations and save space, but require a compromise between accuracy and dynamics (4).

1.5.3.2 Floating-point representation of numbers

When performing calculations using floating-point numbers, each and every intermediate result is scaled individually. This scaling of each intermediate result requires additional computational effort and makes it more complex. Floating-point numbers can be represented with arbitrary bases, generally as x = (-1)^s * m * b^e, with b as base, m as mantissa, e as exponent and s as sign. Computing systems use b = 2, the dual system, and also a normalisation of the mantissa. This normalisation is important and limits the range of the mantissa to 1 <= m < 2. Because the mantissa always starts with a 1 it is never written (hidden bit). Thus, one bit is saved (8). Common 32-bit processors (e.g. TMS320C67xx) have 23 bits for the mantissa, 1 bit for the sign and 8 bits for the exponent. But with the one bit saved for the mantissa there are 24 bits of resolution (3). There are also other word lengths like 64 bits, called "double", or 128 bits, called "quad". This is standardised by the IEEE and is called IEEE Std. 754-2008 (8).
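To make the 32-bit layout concrete, the following short MATLAB sketch (an illustration added here, not part of the thesis toolchain) decodes the sign, exponent and mantissa fields of a single-precision number and reconstructs its value. It assumes a normalised number (no zeros, denormals, infinities or NaNs).

    % Decode the IEEE 754 single-precision fields of a number.
    x    = single(-6.25);
    bits = typecast(x, 'uint32');                      % raw 32-bit pattern
    s    = bitshift(bits, -31);                        % 1 sign bit
    e    = bitand(bitshift(bits, -23), uint32(255));   % 8 exponent bits (bias 127)
    m    = bitand(bits, uint32(2^23 - 1));             % 23 stored mantissa bits
    % Hidden bit: the leading 1 of the mantissa is not stored.
    v    = (-1)^double(s) * (1 + double(m)/2^23) * 2^(double(e) - 127)
    % v is -6.25 again: s = 1, e = 129, m/2^23 = 0.5625.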
1.5.3.3 Comparison

The floating-point processors of today give a high dynamic range and a good resolution. Thus, in most cases the limitation of the range and the accuracy can be ignored, which makes development easier. This is in contrast to fixed-point designs, where the engineer has to implement scaling factors to protect against arithmetic overflow. This is very time-consuming, and therefore it can sometimes be justified to use a floating-point processor, especially where development costs are high and production volumes are low.

To sum up, the advantages of fixed-point are that the hardware is cheaper and sometimes faster, but floating-point processors are more flexible, easier to handle and numerically more precise. Therefore a mix of both platforms is often used to combine the advantages of both (3). For the "Phase Vocoder" both representations would be suitable, but other aspects had to be taken into account, as described in section 2.2.

2 Preparation

Before the development of the algorithm could start, a suitable development environment had to be found. Therefore software and compatible hardware had to be chosen. As known from the introduction chapter, the kind of processor plays a minor role: it is not important whether floating-point or fixed-point numbers are used, or whether an FPGA or a DSP is used, although a floating-point processor is preferred because the development needs fewer thoughts about data types and normalisation. Important factors were availability, costs and sufficient performance for the required algorithm. However, the major factor was the compatibility of hardware and software, which was a difficult part. At Edinburgh Napier University the TMS320C6711 DSP Starter Kit (DSK) from Texas Instruments was available. Therefore this board and the software required to use it were evaluated first.

2.1 Software

When working with any processor from Texas Instruments, Code Composer Studio (CCS) is needed. So software compatible with CCS had to be found and evaluated. An overview of possible tools for a Rapid Prototyping process with a Texas Instruments processor had to be worked out. The tools are introduced briefly in the following.

2.1.1 Introduction of Tools

2.1.1.1 Code Composer Studio (CCS)

"Code Composer Studio is an integrated development environment (IDE) for Texas Instruments' (TI) embedded processor families. CCS comprises a suite of tools used to develop and debug embedded applications. It includes compilers for each of TI's device families, source code editor, project build environment, debugger, profiler, simulators, real-time operating system and many other features." (9)

With this tool it is easy to program hardware at a low level. It enables developing and using C code to program the DSK. Since it is very extensive and complex to write programs in C or C++, some tools are presented in the following to simplify the development. One tool could be MATLAB/Simulink, another one LabVIEW, as they both generate the C code automatically.

2.1.1.2 MATLAB/Simulink

"MATLAB is a high-level language and interactive environment that enables you to perform computationally intensive tasks faster than with traditional programming languages such as C, C++, and Fortran." (10)

"Simulink is an environment for multidomain simulation and Model-Based Design for dynamic and embedded systems.
It provides an interactive graphical environment and a customizable set of block libraries that let you design, simulate, implement, and test a variety of time-varying systems, including communications, controls, signal processing, video processing, and image processing." (10)

The combination of MATLAB and Simulink is very popular and very well documented. It has a lot of increments and can also be used with third-party products which are directly integrated in the software. With this tool it is possible to develop a program at an abstract level and write it directly onto hardware. It also supports third-party products such as CCS, but also hardware directly, such as some processors from Texas Instruments.

2.1.1.3 LabVIEW

"LabVIEW is a graphical programming environment used to develop sophisticated measurement, test, and control systems using intuitive graphical icons and wires that resemble a flowchart. It offers unrivalled integration with thousands of hardware devices and provides hundreds of built-in libraries for advanced analysis and data visualization - all for creating virtual instrumentation." (11)

2.2 Decision process of the suitable tool and board

As discussed already, the C6711 DSK was evaluated as a first step. Since it was clear that CCS is required in order to use the C6711, tools were sought to work with CCS. The evaluation started with MATLAB/Simulink, as this is the standard tool for developments of this kind. In addition, licences existed at the school and there was previous knowledge of MATLAB. LabVIEW was not evaluated in detail because a solution was found using MATLAB. Besides, the licence for the tool, at around 2500 GBP, was too expensive for the project.

Initial research with MATLAB/Simulink and CCS showed that there are ways to use a DSP board with Simulink. To make this possible, however, extensions for MATLAB/Simulink are necessary, such as the Target Support Library, Embedded Coder, Embedded Target for TI C6000, Real-Time Workshop (RTW), IDE-Link, Developer's Kit for Texas Instruments, etc. The extension names vary with the versions of MATLAB and are sometimes combined into suites. This makes it very difficult and time-consuming to get an overview of the enhancements really needed. Since these versions must also be compatible with CCS, it was difficult to find the appropriate versions and organise them. The finally used software versions are listed in 9.2.

The extensions are necessary to generate optimised C code for the DSK. This allows real-time programs to be implemented on a DSP board. They also provide hardware support for various processor manufacturers in Simulink. To work with the processors, special settings for code development must be applied, but special Simulink function blocks are also required. These blocks are contained in the "Embedded Coder", but the supported processors differ with the version of the "Embedded Coder". These blocks are optimised and include functions such as multiplication, FFT and filtering, as well as specially adapted blocks to tap data from the AD converter or to control the LEDs on the DSK.

Unfortunately there were no function blocks for the C6711 DSK in the existing MATLAB/Simulink version. In addition, the CCS version available for the C6711 was 2.1, which does not support the MATLAB extension "IDE-Link". This "IDE-Link", also called "Link for Code Composer Studio", is important to link the two tools, CCS and MATLAB, ensuring automatic code generation.
This makes it possible to go from the abstract Simulink model onto the DSK without further interaction, see 2.4. Without "IDE-Link" it is still possible to download the model to the board. However, this involves more effort, because the generated C code from Simulink must be loaded into CCS in a project with several other files. An explanation of the file types is attached in 9.4. But without the appropriate library in Simulink it is really difficult to develop a model, because there are no blocks which allow access to data like the audio stream or the LEDs. To make those things possible, the functions would have to be written by hand, which would be a huge expense and would have delayed the project.

Thus, it made more sense to look for a new board. The other alternative would have been a suitable MATLAB and CCS version for the C6711, but this was discarded because of costs. The DSK board fully compatible with MATLAB and CCS was 330 GBP. In contrast, MATLAB with the required extensions would be several thousand pounds. Therefore it was decided to buy the successor of the C6711 DSK, the C6713 DSK. This DSK is compatible with the existing MATLAB licence, and with USB support it allows the board to be used with "all" PCs. Another advantage of the C6713 DSK was that training had already been done in MATLAB and CCS, so this knowledge could be used later in the project (12)(13). Furthermore, the software/hardware combination can be used for other projects. With this high-performance C6713 DSK (further information in 2.3.1) it is possible to develop complex applications like a DSL modem.

2.3 Hardware

As already mentioned, the TMS320C6711 DSP Starter Kit could not be used because of software incompatibility. Therefore this board is not described.

2.3.1 TMS320C6713 DSP Starter Kit

"The TMS320C6713 DSP Starter Kit (DSK) developed jointly with Spectrum Digital is a low-cost development platform designed to speed the development of high precision applications based on TI's TMS320C6000 floating point DSP generation. The kit uses USB communications for true plug-and-play functionality. Both experienced and novice designers can get started immediately with innovative product designs with the DSK's full featured Code Composer Studio(TM) IDE and eXpressDSP(TM) Software which includes DSP/BIOS and Reference Frameworks." (14)

The TMS320C6713 DSP Starter Kit is the newer version of the TMS320C6711 DSP Starter Kit. This DSK, with up to 1800 MIPS of processing power, allows the development of algorithms in fields like networking, communications, imaging and other applications. Important for the project were the USB support and sufficient processing power (15)(16).

Figure 2: Layout DSK C6713 (14)

The features of the DSK:
- TMS320C6713 DSP operating at 225 MHz
- An AIC23 stereo codec with 8-96 kHz sample rates (8-32 bit word length)
- 16 MB of synchronous DRAM
- 512 KB of non-volatile Flash memory (256 KB usable in default configuration)
- 4 user accessible LEDs and DIP switches
- Software board configuration through registers implemented in CPLD
- Configurable boot options
- Standard expansion connectors for daughter card use

The CPU works with very long instruction words (VLIW), 256 bits wide. The C6713 DSP interfaces to the on-board peripherals through a 32-bit wide EMIF (External Memory Interface) bus. The SDRAM, Flash and CPLD (Complex Programmable Logic Device) are all connected to this bus, see Figure 3.
Third parties use this expansion of the EMIF bus for video support, memory extension, other sound codecs, etc. Analogue audio signals are accessed via an on-board AIC23 codec and four 3.5 mm audio jacks (microphone input, line input, line output and headphone output). The analogue input can be microphone (fixed gain) or line (boost), the output line-out (fixed gain) or headphone (adjustable gain). The CPLD is a programmable logic device used to tie the board components together and has a register-based interface to configure the DSK. The DSK has 4 LEDs and DIP switches to allow the user to work interactively with the board. To use this interactive method, the CPLD registers are read and written.

Figure 3: Functional Block Diagram of the DSK C6713 (14)

Code Composer Studio communicates with the DSK via the integrated on-board JTAG emulator. They are connected via a USB interface. Programs can be downloaded to the board into the SDRAM or Flash. The advantage of the flash memory is that it keeps the program after a restart of the board.

2.4 The Rapid Prototyping Workflow

The principle of the workflow shown in Figure 4 is a simplified representation. The algorithm developed in Simulink is saved in "model.mdl". To allow efficient C code generation, special blocks of the Texas Instruments C6x library should be used. Then the "IDE-Link" and CCS transfer the code onto the target. To use this kind of workflow, the environment has to be set up as described in the tutorial in 9.1.

Figure 4: Workflow Simulink (17)

In truth there are a lot of steps and tools needed to make this workflow run. As can be seen in Figure 5, different extensions for Simulink are needed.

Figure 5: Software pieces used in workflow

First of all there are limitations for the development of the Simulink model because of memory management (further described in chapter 5.2). Another difference is the approach of running the program on hardware rather than in simulation. Thus, it is not possible to halt and start the program as is done in simulation. Therefore it is a different approach which needs to consider problems like different tasks or memory management (further information in the RTW user guide within the MATLAB help). Testing is different too, because there is no comfortable way to see what happens once the software has been downloaded to the board. Furthermore, not all blocks of the Simulink libraries can be used, because some are not supported for code generation. Some MATLAB commands are not supported either; therefore it can be necessary to write some of the functions manually.

If all limitations are considered and adhered to when developing the model, and the workflow components are set up as described in 9.1, the code generation can be done without further manual interaction. This is possible because the different pieces of software are perfectly chained together with their different tasks. The RTW automatically generates ANSI C/C++ code from the Simulink model. It also adds the I/O device (driver) as an inline S-function to the code. The "Embedded Target for TI C6000" provides RTW with the APIs (application programming interfaces) which are needed to build code for the C6000 platform. The generated data types are listed and explained in 9.3. With the C code available, the "Link for Code Composer Studio" invokes CCS and builds the executable automatically. In doing so, a project is generated with the different file types and functions described in 9.4. The link also invokes the program and downloads it onto the target. So with one click all of this is done and the program can be tested on the hardware within Code Composer Studio, as the sketch below illustrates.
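A minimal sketch of how this one-click build can be triggered from the MATLAB command line, assuming the environment is configured as described in 9.1. The model name 'model' is a placeholder, and the system target file name 'ti_c6000.tlc' is an assumption for the MATLAB release listed in 9.2; both command and file names vary between versions.

    % Generate code, invoke CCS, build and download with one command.
    open_system('model');
    set_param('model', 'SystemTargetFile', 'ti_c6000.tlc');  % assumed target file
    rtwbuild('model');   % RTW + Embedded Target + IDE-Link do the rest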
This workflow can easily be changed to other targets by changing the driver, as long as there are no essential differences in the memory management.

3 Theory

Definition of "Time Stretching"

"Time Stretching", also known as Time Compression/Expansion, means the slowdown or acceleration of an audio or speech signal without altering its pitch. Pitch shifting is to some extent the counterpart, i.e. the change in pitch, but without changing the tempo. There are a lot of different methods to change the tempo of a signal, but there are just two basic approaches on which the other methods are based. This document focuses on the "Phase Vocoder" method, but Time Domain Harmonic Scaling (TDHS) is briefly discussed, too. None of the algorithms is perfect, and the quality is highly dependent on the input signal, i.e. whether it is a voice, music or a sine wave. The degree by which the signal should be accelerated or decelerated plays an important role, too. In general, stretch factors of 130% can be achieved for music signals, for individual instruments or speech signals even up to 200%.

3.1 Time Domain Harmonic Scaling

The Time Domain Harmonic Scaling technique stretches or compresses a signal in the time domain and was developed in 1978 by Lawrence Rabiner and Ronald Schafer (18). To manipulate the signal, it is processed in short pieces of the original signal. These sections may only be a maximum of 40 ms long, otherwise the human sense of hearing would notice the manipulation of the signal, as it has a temporal resolution of about 40 ms. For humans it is not possible to resolve what happens within a period of 40 ms (19).

Figure 6: Signal before and after modification (20)

An arbitrary choice of the sections can have the effect of phase hits; therefore the signal must first be examined for its period. This information is determined with the Autocorrelation Function (ACF) and is used for the section length. If the input signal is periodic, it can be reduced by integer factors without altering the pitch. With natural signals (music, language) additional difficulties arise, because no two sections are completely identical. Thus, there are phase hits again, and triangulation has to be used to achieve better results. Triangulation is a method which avoids phase hits by multiplying a triangle function onto every section. In other words, section A is multiplied by a falling triangle function and section B by a rising one; thus the effect of phase hits is avoided (a small sketch of this cross-fade follows below). To slow down a periodic signal, the periodic section is simply doubled. For natural signals triangulation is used again.

The quality of the output signal depends strongly on the determination of the section length. Signals which have a periodic pattern can be manipulated very well with this method. Sound elements of short duration, such as clicks, drums and percussion, are difficult to process because they have a pulse-like character and are not periodic. With blocks of a maximum of 40 ms, such pulse-like sounds appear twice in a row. This can be avoided if the maximum length of a section is shortened. As a result, the processed signal loses much of its bass content, which argues against short sections. Therefore the optimum cut-off has to be determined (20)(21)(22).
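The following MATLAB sketch illustrates the triangulation idea on a toy signal: one period is repeated and joined with falling/rising ramps. It is a simplified illustration, not Rabiner and Schafer's full algorithm; in TDHS the period P would come from the ACF analysis described above.

    % Slow down a periodic signal by inserting one extra period,
    % cross-faded with triangle functions to avoid phase hits.
    fs = 8000;                                 % assumed sampling rate
    t  = (0:799)'/fs;
    x  = sin(2*pi*200*t);                      % toy periodic input
    P  = fs/200;                               % period in samples (from ACF in TDHS)
    fall = linspace(1, 0, P)';                 % falling triangle function
    rise = linspace(0, 1, P)';                 % rising triangle function
    A = x(1:P); B = x(P+1:2*P);                % two consecutive sections
    extra = A.*fall + B.*rise;                 % cross-faded repeat of the section
    y = [x(1:P); extra; x(P+1:end)];           % output is P samples longer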
3.2 Phase Vocoder

3.2.1 Overview

This method was first developed in 1966 by Flanagan and Golden (23). Since then there have been a lot of different extensions and improvements, which depend strongly on the kind of signal. There is not one perfect algorithm for music, speech or simple sine waves. The advantages and disadvantages of the different approaches have to be evaluated for each implementation.

The principle of the "Phase Vocoder" is shown in Figure 7. The signal is windowed into small parts and then transformed with the Fast Fourier Transform (FFT). Those two steps together are called the Short Time Fourier Transform (STFT). The next step is the important one for getting a precise result: basically, the "spectral manipulation" produces a good estimate of the frequencies within one windowed signal. Afterwards the process is inverted with the Inverse Fast Fourier Transform (IFFT), windowed and summed (24)(25)(26)(27)(28). To change the tempo of the signal, the overlap factor of the window segments is changed.

Figure 7: "Phase Vocoder" overview (30)

3.2.2 Detail

The principle without the "spectral manipulation" can be found in the literature under STFT, and it has its limitation in the FFT. The resolution of the FFT is:

    Resolution = SamplingRate / WindowLength    (3.1)

The window length is normally between 512 and 4096 samples. It could be assumed that taking a long window gives a good resolution of the frequency. Unfortunately it is not that simple, because with a long window the changes of frequency are missed, due to the fact that the FFT assumes that everything within one frame happens at once. Therefore a trade-off between resolution and accuracy of frequency change must be made. Assuming a medium window length of 2048 samples and sampling an audio signal at 44.1 kHz, the resolution will be 21.5 Hz. For some speech signals this might be acceptable, but for audio with a piano, for example, the resolution is not good enough. If the fundamental of the piano note is at 80 Hz, there is an error of up to 25%. The piano, however, has just 6% between consecutive notes (27).

To get a better frequency resolution without harming the time resolution too much, the "Phase Vocoder" method is used. This is achieved with the "spectral manipulation", which uses information in the signal that the STFT ignores. The first information used is the phase at two sample instants. Assume there is a sinusoidal signal of 220 Hz. As shown in Figure 8, there is an angle φ1 at time t1 and an angle φ2 at time t2. The signal could have a frequency where the angle changes from φ1 to φ2 directly. It could also change from φ1 to φ2 + 2πn for any integer n, where one full turn of the angle is 2π.

Figure 8: Phase of 2 samples (29)

With this information an equation is defined:

    f_n = (φ2 - φ1 + 2πn) / (2π * Δt)    (3.2)

This equation is not solvable yet, as n is unknown. But there is a way to get a good estimation of the frequency, which is described in the following. The "Phase Vocoder" analyses a peak in magnitude within two different frames; then the candidate frequency closest to that peak is chosen. This principle is shown in Figure 9. The 220 Hz sinusoidal signal is the example signal again. It is windowed and transformed with the FFT. After the FFT the signal is represented by magnitude and phase. As there is just a sinusoidal signal, the magnitude spectrum in both frames is the same, but the phases differ. From these different phases the values φ1 and φ2 are found.

Figure 9: "Spectral Manipulation" (26)
The difference in time can be gathered directly from the window length, overlap factor and sampling rate:

    Δt = WindowLength / (OverlapFactor * SamplingRate) = HopSize / SamplingRate    (3.3)

The OverlapFactor describes the samples which overlap between two consecutive windows. If the OverlapFactor is 2, half of the samples of the first window will be used in the next window. Another way to describe the overlap is the HopSize, which is the temporal shift of the window. Described with an example: with WindowLength = 256 samples and HopSize = 64 samples, the windows overlap by 256 - 64 = 192 samples.

With the time information of equation 3.3, equation 3.2 for f_n can be solved. There will not be one result but many. Thus, the value nearest to the peak in magnitude received from the FFT is taken. To describe this with an example, the 220 Hz sinusoidal signal with a sampling rate of 44.1 kHz and an overlap factor of 2 was chosen. Using just the FFT, the result would be 215.3 Hz instead of the 220 Hz of the signal. With the "spectral manipulation" there is a more accurate result, as shown with the values of the example. The phases corresponding to the magnitude peak are φ1 and φ2 (see Figure 9), and the time difference Δt results from equation 3.3. Inserting those values into equation 3.2, the results for the first 6 values of n are 47.7472 Hz, 90.8136 Hz, 133.8800 Hz, 176.946 Hz, 220.0129 Hz, and 263.0793 Hz. The closest frequency to the FFT result of 215.3 Hz is obviously 220.0129 Hz, which is just 0.0129 Hz away from the real value of the signal. This is a vast improvement over the plain FFT, and it is not just a coincidence of well-chosen values, as Puckette and Brown (30) showed. A small numerical check of this example is sketched below.
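The example can be reproduced with a few lines of MATLAB. This is a sketch with a rectangular window (no "hanning window") for simplicity; the exact numbers then differ slightly from those above, but the chosen candidate is again approximately 220.0 Hz.

    % Numerical check of the 220 Hz example (equations 3.2 and 3.3).
    fs = 44100; N = 2048; hop = N/2;          % overlap factor 2
    t  = (0:N+hop-1)'/fs;
    x  = sin(2*pi*220*t);
    X1 = fft(x(1:N)); X2 = fft(x(hop+1:hop+N));
    [~, k] = max(abs(X1(1:N/2)));             % magnitude peak (1-based bin index)
    fbin   = (k-1)*fs/N;                      % plain FFT estimate, ~215.3 Hz
    dphi   = mod(angle(X2(k)) - angle(X1(k)), 2*pi);   % phase difference
    dt     = hop/fs;                          % equation 3.3
    cand   = (dphi/(2*pi) + (0:7)) / dt;      % equation 3.2 for n = 0..7
    [~, i] = min(abs(cand - fbin));           % pick candidate nearest to FFT peak
    fest   = cand(i)                          % ~220.0 Hz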
Until now the explanation was restricted to a simple sinusoidal signal. If the signal is more complex and has more frequencies, the algorithm stays the same, with the difference that the operation is repeated for every magnitude peak in the spectrum. This is reasonable as long as the peaks in magnitude are adequately separated by the FFT. With this result of the "spectral manipulation", where a good estimate of the actual frequency is available, it is possible to apply different changes to the signal, like reading direction inversion, frame shuffling, changing the pitch or, as in this project, time stretching.

In the synthesis part (illustrated in Figure 7) the IFFT is used to transform the changed spectrum back into pieces of the time signal, and with the window function it is added back into one time signal. To change the tempo of the signal, the OverlapFactor or HopSize of the window is changed, which obviously makes the resulting output file longer or shorter. If the file is played with the same sampling rate as the input file, the speed is changed.

The algorithm described so far is the simplest one to understand and was chosen for that reason. In the literature it is referred to as "spectral peak following". However, the algorithm used in MATLAB/Simulink in chapter 4 works slightly differently; the theory is explained in the following.

Another implementation

The implementation is basically the same; the difference is that not just the angles of the magnitude peaks are considered, but every angle. This means phases are not chosen corresponding to a peak but to a bin. A bin is an amplitude/phase pair of data for each channel/band. A channel or band is used within the FFT. So, for example, a window length of 512 has 256 channels. This is because of the double sideband of the FFT. To sum it up: when windowing and transforming 512 samples there will be 256 bins. Those bins are used for the phase estimation. This algorithm calculates the angle for every bin and compares it with the angle of the same bin from one frame before. So instead of searching for maxima in the magnitude and comparing the corresponding phases, the algorithm checks every bin.

This algorithm has another challenge not mentioned so far, called "phase unwrapping". The phases after the FFT are modulo 2π. In the "spectral peak following" method, the n of equation 3.2 could be guessed with the knowledge of the closest FFT result. In this implementation, however, the phase is unwrapped, which means that 360 degrees (2π) are added if there is more than one cycle, as Figure 10 illustrates.

Figure 10: Phase unwrapping (25)

The unwrapping recovers the precise phase values for each bin and is therefore an important part of getting a good result. Except for the guessing of the phase/frequency, the algorithm stays the same. This method is implemented in the used algorithm described in chapter 4, and is summarised in the sketch below.
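To summarise the bin-by-bin variant, here is a compact offline MATLAB sketch of the whole chain: STFT, per-bin phase difference, principal argument, rescaling with the stretch factor and overlap-add. It is a simplified sketch (amplitude normalisation of the overlapping windows is omitted), not the Simulink model itself; x must be a column vector.

    function y = pvoc_sketch(x, N, Ha, Hs)
        % N = WindowLength, Ha = AnalysisHopSize, Hs = SynthesisHopSize.
        w    = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);       % hanning window
        nom  = 2*pi*Ha*(0:N-1)'/N;                   % nominal phase advance per hop
        nfrm = floor((length(x)-N)/Ha);
        y    = zeros(N + nfrm*Hs, 1);
        X0   = fft(w .* x(1:N));
        prev = angle(X0);                            % phases of the previous frame
        psi  = prev;                                 % accumulated synthesis phase
        y(1:N) = real(ifft(X0)) .* w;
        for m = 1:nfrm
            X    = fft(w .* x(m*Ha+1 : m*Ha+N));
            dphi = angle(X) - prev - nom;            % deviation from nominal phase
            dphi = mod(dphi + pi, 2*pi) - pi;        % principal argument
            psi  = psi + (nom + dphi) * Hs/Ha;       % rescale with stretch factor
            yfrm = real(ifft(abs(X) .* exp(1i*psi))) .* w;
            idx  = m*Hs+1 : m*Hs+N;
            y(idx) = y(idx) + yfrm;                  % overlap-add
            prev = angle(X);
        end
    end

Saved as pvoc_sketch.m, a call like y = pvoc_sketch(x, 512, 64, 90) stretches x by the factor 90/64.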
4 Algorithm and Simulation

This part explains how the algorithm in MATLAB/Simulink implements the theory of chapter 3. The algorithm was developed by MathWorks and is available as an example within Simulink. To open the example, type dsppitchtime in the MATLAB command line. The algorithm takes three parameters: WindowLength, AnalysisHopSize and SynthesisHopSize. The last two parameters are similar to the previously described OverlapFactor; their real meaning will become clear later. WindowLength must be a power of two, 2^x, where x is a positive integer, because the FFT allows just these numbers. Furthermore, the hop sizes must be smaller than the window length. The top level of the algorithm, shown in Figure 11, is similar to Figure 7.

Figure 11: "Phase Vocoder" Simulink

The "Overlap ST-FFT" is responsible for the windowing and the transformation into the frequency domain. After splitting the transformed values into magnitude and phase, the "Synthesis Phase Calculation" does the spectral manipulation and returns a better phase estimate. After combining angle and magnitude again, the "Overlap IST-FFT" reverses the windowing and the IFFT changes the signal back into the time domain. The last block is just a multiplication and is responsible for rescaling the values to the input range. To describe the algorithm, the following values were chosen: WindowLength = 512, AnalysisHopSize = 64 and SynthesisHopSize = 90.

The following takes a closer look at the subsystems of the algorithm, starting with the "Overlap ST-FFT" in Figure 12.

Figure 12: "Overlap ST-FFT" detail

This subsystem changes the time signal into the frequency domain with the FFT and the "hanning window" function. It also adds an overlap; for this the "Overlap buffer" is used. The numbers at the signal paths describe the dimensions of the signal. This means that the input is a frame with 64 samples and at the output there are 512 samples. The 512 comes from the WindowLength. The overlap of the frames in samples is WindowLength - AnalysisHopSize = 512 - 64 = 448. There are other windows, like the "hamming window", which can be used. Further information on the "hanning window" and why it is a good window function can be found in (31).

After splitting the signal into magnitude and phase, the phase manipulation takes place (see Figure 13). The phases at the input are normalised between -π and π.

Figure 13: "Synthesis Phase Calculation" detail

This is the complex part and needs some focus. The basic idea is to get a good frequency estimate by comparing the phases within each bin. For this, the addition block takes the actual phases of the frame and subtracts the phases of the frame before; the result is Δφ. The phase change expected because of the time difference between the frames is then subtracted from Δφ. This is done with the constant frame generated by the function shown as number 3 in Figure 13. Thus the nominal phase for each bin is subtracted from Δφ. To illustrate this, the signals from the 4th bin of the signal are shown in Figure 14.

Figure 14: Signal 1,2,3,4

To apply phase unwrapping, the subsystem "Principal Argument" is used. This block was developed by Carlo Drioli and is shown in Figure 15.

Figure 15: "Principal Argument" detail

This block computes the principal argument of the nominal initial phase of each frame. After this subsystem, the expected phase value for the bin is added again, because it was subtracted before. This happens again with the constant block shown as number 3 in Figure 13. This is shown for the 4th bin in Figure 16. It can be seen that between signal 5 and 6 there is just a small difference in the y-scale because of the added nominal phase from number 3. This difference gets larger from bin to bin.

Figure 16: Signal 4,5,6

Now that the real phase increment is available, it is rescaled with the time-stretching factor SynthesisHopSize/AnalysisHopSize. This rescaling is needed because when the time scale is changed, the phase changes occur over a longer time. In other words, if there is a 45° change between consecutive frames and the time scale is changed, it would result in an altered frequency. This happens because the IFFT spreads the frames further apart and changes the frequency, as it now occurs over a longer time interval. To prevent this, the rescaling with the time-stretching factor is used.

After that, the values are accumulated frame by frame. This is shown in Figure 17.

Figure 17: Signal 7,8

As shown, the phase increment of the actual frame is added to the accumulated phase of the previous frame, so there is a continuous slope of phase. Now that the optimised phase is available, it is combined with the magnitude again. The signal is transformed back into the time domain and multiplied with the window function, as illustrated in Figure 18.

Figure 18: "Overlap IST-FFT" detail

In the last step of the subsystem, the overlap is added with the "OverlapAndAdd" block. The output is now 90 samples per frame, defined by the SynthesisHopSize. So the time scale was changed by the factor SynthesisHopSize/AnalysisHopSize = 90/64 ≈ 1.41.

4.1 Result of Simulation

With the parameters of the "Phase Vocoder", different speeds of voice/audio can be achieved. The maximum stretching factor achieved without an audible loss was 2. The following values were chosen: WindowLength = 512, AnalysisHopSize = 32 and SynthesisHopSize = 64. The input speech signal was 3 seconds long and is shown in Figure 19.

Figure 19: Input signal

The output signal is shown in Figure 20 with the stretching factor of 2 and is therefore 6 seconds long.

Figure 20: Output signal

The scope of both signals shows that they are not exactly the same, but when hearing them there is no recognisable loss. The stretch factor follows directly from the two hop sizes, as the short check below shows.
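The relation between the two hop sizes and the resulting duration can be checked directly:

    factor  = 64/32           % SynthesisHopSize / AnalysisHopSize = 2
    dur_out = 3 * factor      % 3 s input -> 6 s output, as in Figure 20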
5 Implementation on Hardware

The implementation on the hardware was tricky, because the "Phase Vocoder" is not a real-time application. This is in the nature of the applied processing. Consider talking into a microphone and slowing the speech down by a factor of 2: the algorithm would always only have processed half of the input, so after 1 minute of talking just 30 seconds could have been heard. The other values must be stored in memory and would cause a buffer overflow when talking for a long time. Time compression would be even worse, because the algorithm would have to process values which have not even been spoken yet. After one minute of talking it should already have produced an output of 2 minutes, which is obviously not possible. Thus another implementation had to be chosen.

The general idea was to implement a "Processing" and a "Play" block. When the input signal is recorded, it gets processed and saved into memory. Afterwards the processed file in memory gets played and the user can hear it. The input file was not a microphone signal but a sample voice signal which was loaded into Simulink as a variable. Using a microphone would just need another subsystem and is not a real change to the design. The top-level design of the Simulink model is shown in Figure 21.

Figure 21: Top-Level Simulink

At the top left is the "C6713DSK" block, where parameters for the code generation are set. The other blocks are used for controlling the algorithm. As shown, the dip switch is used as input for the "Embedded Control Unit" to allow interactive use. This block controls the other 3 blocks, which are used for flashing LEDs to show the user what is happening, to start "Processing" the signal and to enable the "Play" block.

To get a good design, much time was spent in the Simulink help file reading about the pros and cons of different subsystems. The result was the "enabled subsystem", because this block executes the subsystem as long as there is a "1" at the "enable" input. This was considered a good solution because generating a "1" is easy and could be done with a lot of different blocks. It also allows working with different sample times within one system, which was important because the control of the subsystems should not run at a high sample rate. Using a high sample rate in the control would use a lot of processing power and is unnecessary, because the user will not change the configuration a few thousand times per second. However, the "Processing" and "Play" blocks must work with a sample time of Ts = 1/8000 s, because the input file was sampled at this rate. In order not to waste processing power, the control block works not with Ts but with Tdip = 100 ms. This sample time is fast enough to control the "enabled subsystems".

Because of code generation there were limitations in using Simulink blocks and MATLAB commands. This had to be considered while designing the control and led to the final design. The management of variables was also difficult and is described in 5.2.

5.1 The different parts of the model

The following subchapters describe the subsystems of the developed model.

5.1.1 FindEndOfFile

Working with a voice example it would be possible to use a fixed processing time for that file, as the file length is known. To make the control more flexible and to make it possible to load any audio or voice signal, it was necessary to find the end of the input file. To achieve this, the elements of the subsystem "FindEndOfFile" shown in Figure 22 are used.

Figure 22: "FindEndOfFile" subsystem

The "Overlap Buffer1" changes the frame-based signal into a sample-based one.
After that, the signal is integrated over a number of samples. 64 samples were chosen and tested with different examples with a satisfying result. The integrated values are then compared to nonzero: if the 64 values are not all 0, there will be a 1 at the output (see Figure 23). The "Rate Transition1" is needed so that this subsystem works together with the slower-running control unit. The "Data Type Conversion" changes the data type to double, as the "Embedded Control Unit" needs a "double" as input.

Figure 23: "FindEndOfFile" signal

As shown in Figure 23, the output signal is "1" as long as there is an input signal. This block is used as input for the "Embedded Control Unit" to help enable and disable subsystems; its logic amounts to the small test sketched below.
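Expressed as plain MATLAB (a sketch of the logic, not the generated code), the block chain performs this per-block test; abs() is added here for robustness, while the model integrates the raw samples.

    function active = find_end_of_file(block)
        % block: 64 consecutive samples of the sample-based signal.
        active = double(sum(abs(block)) ~= 0);   % 1 while signal is present
    end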
5.1.4 "Embedded Control Unit"

This block implements the control of the algorithm and ties all components together. The basic idea is that the dip switches control the other components of the program, such as "Processing" and "Play", so the user can work with the system interactively. The Embedded Matlab block was chosen because it accepts ordinary Matlab commands and is therefore flexible. As it turned out during development, this flexibility could not be exploited: the commands that would have been useful could not be used because of the limitations of RTW (see 5.2). Another possibility would have been a "Stateflow Chart", but it would not have been as flexible with regard to Matlab commands. The complete code is shown in 9.5; some parts are described here to explain the working principle. The block operates with Tdip, which means it is executed every 100 ms.

First the inputs and outputs are defined, matching the ports shown in Figure 21:

function [enable,enableplay,ledFlashOut] = fcn(processing,dip, playing)

The explanation focuses on the "Processing" block, as the "Play" block is quite similar. First some variables are defined and initialised:

enablevar=0;
persistent enableFOld;
persistent enableFNew;
enableFNew=processing;

As there is no explicit type definition, the default type is used, which is double.

The first "if" detects a falling edge of the processing input, which comes from the "FindEndOfFile" block. The second "if" checks whether the dip switches represent the integer 1 and the processed file has not yet reached its end. As long as this is true, enablevar is 1 and keeps the processing alive.

if ((enableFNew==0) && (enableFOld==1))
    fileend = 1;
    startplay = 1; % enables the start of the "Play" block
end
if ((dip==1) && (fileend==0))
    enablevar=1;
else
    enablevar=0;
end

This can also be seen in the previously shown Figure 23: after the signal changes from 1 to 0 the subsystem stops processing, which is why no further values appear after the 5-second mark. As a last step the value of enablevar has to be written to the enable output:

enable=enablevar;

This is necessary when working with "if" statements and output variables, otherwise the compiler reports errors. The output variable, in this case "enable", cannot be assigned within "if/else" statements; therefore a helper variable "enablevar" is used inside the "if/else" and its final state is assigned to the output "enable".

Besides enabling the processing, the other functions of the Embedded Control Unit are:

1. Enabling the "Play" block
2. Controlling the flashing of the LEDs
3. Resetting the control

Integer number   Function of dip switches
0                Reset the variables used for the "Processing" and "Play" enables
1                Start the processing of the file with subsequent playback of the file

Another useful extension would be changing the "Phase Vocoder" parameters via the dip switches, which would make it possible to change the speed of the example in different ways. Unfortunately this is not easy to implement, as further described in chapter 5.2. The way the implemented system actually works is shown in Figure 26. The first signal is the enable signal at the "Processing" block, the second one is the enable of the "Play" block.

Figure 26: Enable signals

So if the dip switch is set to "1", the processing starts. As soon as it has finished, the "Play" block is enabled and the file can be heard directly. To restart the system, the switches have to be set to "0" to reset the variables and then back to "1".

5.1.5 "LedFlash" subsystem

As the user cannot see what happens inside the board, the LEDs are used to show at least some information. They are generally used to display which dip switches are set: the 4 LEDs represent an integer number between 0 and 15, just like the dip switches. There are 2 input ports for this subsystem, as shown in Figure 27.

Figure 27: "LedFlash" subsystem

The "Enable" port is used to make the LEDs flash at the rate of Tdip. If "Enable" is not set, the switch connects the "Dip" port to the LEDs so the user can see and check the choice of the dip switches.

Figure 28: "LedFlash" signals

The first signal in Figure 28 is the "Enable", the second one is the "Dip" and the last is the signal at the LED block. As shown, if there is no "Enable" the LEDs represent the number of the dip switches, in this case "1".
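Both the enable control and the LED flashing rely on the same pattern: a persistent variable carries the previous value across the 100 ms control steps. A minimal, runnable sketch of the falling-edge detection (the pattern from the code above, isolated into its own function):

function edge = falling_edge(sig)
% Sketch of the edge detection used in the "Embedded Control Unit":
% a persistent variable stores the previous input so a 1 -> 0
% transition is recognised one control step (Tdip) later.
persistent old;
if isempty(old)
    old = 0;
end
edge = (sig == 0) && (old == 1); % true exactly once, on the falling edge
old = sig;                       % remember the current value for the next call
end

As a side note, with Tdip = 100 ms the LED toggle in 9.5 inverts its state on every call while processing runs, which corresponds to a blink rate of 5 Hz.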
5.1.6 Delay

So far all blocks have been described except for the memory blocks at the top level. These blocks are necessary for Simulink to solve the model: without them, Matlab reports an error about an algebraic loop that cannot be solved. They introduce a small delay in finding the end of the file, so the "Embedded Control Unit" receives the end-of-signal information with a delay of Tdip. However, this is not a problem: even if the end of the file were not recognised for some seconds, the affected samples would just be zeros. During playback there would be a slightly longer stretch of silence, but it would not be noticed as it is not audible.

5.2 Problems within the design

As mentioned before, there were limitations when designing for hardware. One issue was the management of memory and variables. This affected two parts of the design: one feature was not implemented at all, and another does not work properly.

Starting with the part that does not work: this is a real issue and is the reason the system does not work as intended. The blocks "Signal To Workspace" and "Signal From Workspace", which are used to write and read the variable holding the speech file, can only be used under special circumstances which do not apply to this system. The cause lies in the RTW: normally these blocks write the variable at the end of a simulation. As there is no such end when the program runs continuously on the hardware, the values never get written. So at the moment the model reads the stored variables from the loaded configuration instead of the processed ones. There is a solution, but there was no time to implement it: typing "rtwdemo_cscpredef" at the Matlab command line opens an example of how it can be done. Essentially, a new storage class has to be implemented. Within this class there has to be a variable to hold the file, and the signal must be given the same name as the variable so that it is written directly into the variable. As this happens in real time, the values can then be read from there.

The other, unimplemented feature is changing the "Phase Vocoder" parameters at runtime. The reason is also related to the storage of data. In simulation something like this is mostly done manually: before starting the simulation, the variables are read from an m-file or from a mask. In this case a mask is used; it opens when clicking on the "Phase Vocoder" block and provides fields where values can be typed in and assigned to variable names defined in the mask. Masked parameters have the benefit that their values can be read or written during a simulation: with commands like get_param and set_param these values can be queried and changed while the simulation is running. Unfortunately this is not possible when developing for hardware. The solution would be similar to the one above, just using different data types.

Another aspect not mentioned so far is "Tunable Parameters". These are the values which can be changed at runtime. As some parameters in the "Phase Vocoder" are not tunable, because Simulink reports internal errors, this needs further research. From today's point of view, however, there should not be a problem, because the "Phase Vocoder" is controlled by the "Embedded Control Unit" and those values would only be changed while processing is not running, i.e. while the subsystem is disabled. Therefore the corresponding errors should be switched off in the RTW configuration.
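For reference, the runtime parameter access that works in pure simulation looks as follows (a sketch; the model path and parameter name are made up for illustration and do not come from the actual model):

% Sketch (simulation only): reading and changing a masked parameter of
% the "Phase Vocoder" block while the model runs. 'myModel' and
% 'ScalingFactor' are hypothetical names.
val = get_param('myModel/Phase Vocoder', 'ScalingFactor');
set_param('myModel/Phase Vocoder', 'ScalingFactor', '1.5');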
6 Conclusion

The project covered several fields of development. It was possible to set up a new development environment from scratch. Thanks to this project, future projects can use the implemented workflow and the procured hardware to develop with a Rapid Prototyping approach. There is thus no need to spend a lot of time investigating hardware requirements and tool workflows, since the Simulink model can be downloaded onto the hardware with one click. Furthermore, the hardware platform is powerful enough to run complex algorithms, so future projects could implement "Pitch Shifting" in real time or a DSL modem. With the daughter-card expansion for the DSK it is even possible to work in fields like video and image processing.

Besides the development environment, a mathematically interesting algorithm was implemented. Unfortunately the project ran out of time, as much of it was spent setting up the Rapid Prototyping Workflow and finding suitable hardware. The problem with the memory and variable management could therefore not be solved within this project; however, a suitable approach was found that would solve it. The implemented system works perfectly in simulation, and the results of the "Phase Vocoder" up to a scaling factor of 2 could be achieved without audible loss. This is similar to other people's research results (24). The implementation on hardware was not finished because of time constraints, but many challenges were solved and only one topic could not be fully covered.

Very interesting in this project was the fact that there was nothing to build on. This meant that many things had to be considered and evaluated, not only technical aspects but also the financial side of the project. This project therefore gives an insight into the whole development process, from hardware and software selection to simulation, implementation on hardware and, finally, testing.

7 References

1. Arons, Barry. SpeechSkimmer: A System for Interactively Skimming Recorded Speech. ACM Transactions on Computer-Human Interaction, 1997.
2. Bateman, Andy and Paterson-Stephens, Iain. The DSP Handbook: Algorithms, Applications and Design Techniques. Prentice Hall, 2002. 978-0201398519.
3. Kuo, Sen M., Lee, Bob H. and Tian, Wenshun. Real-Time Digital Signal Processing: Implementations and Applications. Wiley, 2003. 978-0470014950.
4. Proakis, John G. and Manolakis, Dimitris K. Digital Signal Processing (4th Edition). Prentice Hall, 2006. 978-0131873742.
5. Akhan, Mehmet and Larson, Keith. DSP Intro Slides. University of Hertfordshire; Texas Instruments, 1998.
6. Hunt Engineering. [Online] 09 06 2011. [Cited: 22 12 2011.] http://www.hunteng.co.uk/info/fpga-or-dsp.htm.
7. Poole, Ian. FPGAs for DSP Hardware. Radio-electronics.com. [Online] [Cited: 11 1 2012.] http://www.radio-electronics.com/info/rf-technology-design/digital-signal-processing/fpga-dsp.php.
8. IEEE. IEEE Standard for Floating-Point Arithmetic Std. 754-2008. 2008. 978-0-7381-5753-5.
9. Texas Instruments. [Online] [Cited: 18 11 2011.] http://www.ti.com/tool/ccstudio.
10. MathWorks. [Online] [Cited: 18 11 2011.] http://www.mathworks.co.uk.
11. National Instruments. [Online] [Cited: 21 11 2011.] http://www.ni.com/labview/whatis/.
12. MathWorks, Inc. Matlab R2009b Product Help.
13. MathWorks, Inc. Embedded Target for TI C6000 DSP Release Notes.
14. Texas Instruments Inc. TMS320C6713 DSP Starter Kit. Product Information.
15. Spectrum Digital, Inc. TMS320C6713 DSK Module Technical Reference. 2003.
16. Texas Instruments Inc. Datasheet - TMS320C6711.
17. MathWorks. Developing Embedded Targets using Real-Time Workshop Embedded Coder. 2010.
18. Rabiner, Lawrence R. and Schafer, Ronald W. Digital Processing of Speech Signals. New Jersey: Prentice-Hall, Inc., 1978.
19. Ostrop, Dennis and Buhr, Daniel de. Time Domain Harmonic Scaling. Köln: FH Köln, 2007.
20. Bühler, Christian and Liechti, Christian. Veränderung der Wiedergabegeschwindigkeit von Musiksignalen. Hochschule Rapperswil, 1999.
21. Brennan, David. Time Modification of Speech. Edinburgh: Napier University, 2007/08. Honours Thesis.
22. Marti, Adrian. Time Domain Harmonic Scaling. HS Rapperswil, 2002.
23. Flanagan, J. L. and Golden, R. M. Phase Vocoder. Bell System Technical Journal, 1966.
24. TheDSPDimension. [Online] 8 1999. [Cited: 18 11 2011.] http://www.dspdimension.com.
25. Dolson, Mark. The Phase Vocoder: A Tutorial. Computer Music Journal, 1986.
26. Laroche, Jean and Dolson, Mark. New Phase Vocoder Techniques for Pitch-Shifting, Harmonizing and Other Exotic Effects. New York: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 1999.
27. Sethares, William A. A Phase Vocoder in Matlab. [Online] [Cited: 18 11 2011.] http://sethares.engr.wisc.edu/vocoders/phasevocoder.html.
28. Portnoff, Michael R. Implementation of the Digital Phase Vocoder Using the Fast Fourier Transform. IEEE Trans. Acoustics, Speech, and Signal Processing, 1976.
29. Sethares, William A. Rhythm and Transforms. Springer, 2007. 9781846286391.
30. Puckette, Miller S. and Brown, Judith C. Accuracy of Frequency Estimates Using the Phase Vocoder. IEEE Transactions on Speech and Audio Processing, 1998.
31. Götzen, Amalia De, Bernardini, Nicola and Arfib, Daniel. Traditional (?) Implementations of a Phase Vocoder: The Tricks of the Trade. Verona: Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-00), 2000.
32. Ganapathi, S. Introduction to Simulink, Link for CCS. M.Sc. Thesis. 2006.
33. The MathWorks, Inc. Target for TI C6000™. [Online] [Cited: 12 12 2011.] http://www.kxcad.net/cae_MATLAB/toolbox/tic6000/f3-108524.html.
34. Murmu, Manas. Application of Digital Signal Processing on TMS320C6713 DSK. Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela. 2008. Bachelor Thesis.

8 Table of figures

Figure 1: Rapid Prototyping process
Figure 2: Layout DSK C6713 (14)
Figure 3: Functional Block Diagram of the DSK C6713 (14)
Figure 4: Workflow Simulink (17)
Figure 5: Software pieces used in workflow
Figure 6: Signal before and after modification (20)
Figure 7: "Phase Vocoder" overview (30)
Figure 8: Phase of 2 samples (29)
Figure 9: "Spectral Manipulation" (26)
Figure 10: Phase unwrapping (25)
Figure 11: "Phase Vocoder" Simulink
Figure 12: "Overlap ST-FFT" detail
Figure 13: "Synthesis Phase Calculation" detail
Figure 14: Signal 1,2,3,4
Figure 15: "Principal Argument" detail
Figure 16: Signal 4,5,6
Figure 17: Signal 7,8
Figure 18: "Overlap IST-FFT" detail
Figure 19: Input signal
Figure 20: Output signal
Figure 21: Top-Level Simulink
Figure 22: "FindEndOfFile" subsystem
Figure 23: "FindEndOfFile" signal
Figure 24: "Processing" subsystem
Figure 25: "Play" subsystem
Figure 26: Enable signals
Figure 27: "LedFlash" subsystem
Figure 28: "LedFlash" signals

9 Appendix

9.1 Configure MATLAB/Simulink and CCS 3.3

9.1.1 CCS

This tutorial describes how to set up the C6713 DSK within CCS 3.3. After the installation, the package with the board-specific data (drivers, examples etc.) has to be downloaded from the Spectrum Digital homepage and copied to the installation path. Now the board can be registered in the CCS environment:

1. Launch "Setup CCStudio v3.3" from the start menu.
2. Choose the "C6713 DSK" board.
3. In the properties (right-click) of "C6713 DSK", choose the "Diagnostic Utility" file in the installation path. Hint: if problems like "can't generate data file" occur later, you should choose this file manually in these settings.
4. As a sub-device, choose the "TMS320C671x_0" processor.
5. In the properties (right-click) of "TMS320C671x_0", choose the GEL file in the installation path.
6. Now you can save the configuration and start the CCS 3.3 software. It should initialise the DSK while starting.

Sometimes there are problems with the USB emulation. To fix this, start the "C6713 DSK Diagnostic Utility" and unplug the board; plug it in again and it should work.

Trouble can also occur because of wrong linking.
If you have built a project (or MATLAB has built it automatically), a wrong link may have been registered and the project will not compile. To change this setting, right-click on your *.pjt and click on "Build Options". You will see something similar to the following figure. There you have to change the "include search path" to the installation path with your DSK-specific files.

The environment is now ready to work with. There are some nice tutorials for the first steps, such as (32), and the help file of CCS provides further useful information about the whole program. To get CCS 3.3 and MATLAB connected, you have to choose "Connect" from the "Debug" menu. Now MATLAB can be configured to use CCS.

9.1.2 MATLAB (32)(33)

To verify that CCS is properly installed on the system, enter ccsboardinfo at the Matlab command line. Matlab should return information about the configured boards. To ensure the Embedded Target for TI C6000 DSP is installed, enter c6000lib. Matlab should display the C6000 block library containing libraries such as C6000 DSP Core Support, C62x DSP Library, C64x DSP Library and, most importantly, the C6713 DSK support library. Now that the hardware is known to be addressable and the Simulink libraries are available, the MATLAB/Simulink environment can be set up:

1. To start Simulink, type simulink.
2. Create a new model in Simulink.
3. To open the Configuration Parameters, select Simulation -> Configuration Parameters.
4. In the Select tree, choose the Real-Time Workshop category.
5. For Target Selection, choose the file "ccslink_ert.tlc" in the Real-Time Workshop. This will automatically change the Make command and Template makefile selections.
6. Choose the Optimization category in the Select tree. For Simulation and Code generation, unselect Block reduction and Implement logic signals.
7. In the Select tree, choose the Hardware Implementation category and select the board and little endian.
8. Choose the TI C6000 compiler and set Symbolic debugging.
9. In the Select tree, choose the Debug category and select Verbose build.
10. In the Select tree, choose the Solver category. Ensure that the solver is set to Fixed-step / discrete.
11. Set the following Real-Time Workshop run-time options: Build action: Build_and_execute; Interrupt overrun notification method: Print_message.

In the model itself you need to add the targetc6713 preferences block. This block represents your driver and will be included when generating C code. The default parameters should be fine for most programs; however, if you want to change memory settings you can do it there.
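As a final check of the toolchain, the two commands from the beginning of this section can be combined into a short verification script (commands as given above; the exact output depends on the installed versions):

% Sketch: verify the CCS link and the C6000 blockset from the
% Matlab command line before building a model.
ccsboardinfo   % should list the C6713 DSK board and its processor
c6000lib       % should open the C6000 block library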
9.2 Used software versions

The software used is listed in the following table.

Module title                        Version
Matlab                              7.9
Simulink                            7.4
Embedded IDE Link                   4.0
Real-Time Windows Target            3.4
Real-Time Workshop                  7.4
Real-Time Workshop Embedded Coder   5.4
Signal Processing Blockset          6.10
Signal Processing Toolbox           6.12
Target Support Package              4.0
Code Composer Studio                3.3.81.6
CCS 3.3 driver package              CCSPlatinum_v30330

9.3 RTW file description

Table 1: RTW-files description (12)

File: model.c or .cpp
Description: Contains entry points for the code implementing the model algorithm (for example model_step, model_initialize and model_terminate).

File: model_private.h
Description: Contains local macros and local data that are required by the model and subsystems. This file is included by the generated source files in the model. You do not need to include model_private.h when interfacing hand-written code to a model.

File: model.h
Description: Declares model data structures and a public interface to the model entry points and data structures. Also provides an interface to the real-time model data structure (model_M) with accessor macros. model.h is included by subsystem .c or .cpp files in the model. If you are interfacing your hand-written code to generated code for one or more models, you should include model.h for each model to which you want to interface.

File: model_data.c or .cpp (conditional)
Description: model_data.c or .cpp is conditionally generated. It contains the declarations for the parameters data structure, the constant block I/O data structure, and any zero representations used for the model's structure data types. If these data structures and zero representations are not used in the model, model_data.c or .cpp is not generated. Note that these structures and zero representations are declared extern in model.h.

File: model_types.h
Description: Provides forward declarations for the real-time model data structure and the parameters data structure. These may be needed by function declarations of reusable functions. Also provides type definitions for user-defined types used by the model.

File: rtwtypes.h
Description: Defines data types, structures and macros required by Real-Time Workshop Embedded Coder generated code. Most other generated code modules require these definitions.

File: ert_main.c or .cpp (optional)
Description: This file is generated only if the "Generate an example main program" option is on (it is on by default). See "Generate an example main program".

File: autobuild.h (optional)
Description: This file is generated only if the "Generate an example main program" option is off. autobuild.h contains #include directives required by the static version of the ert_main.c main program module. Since the static ert_main.c is not created at code generation time, it includes autobuild.h to access model-specific data structures and entry points. See "Static Main Program Module" for further information.

File: model_capi.c, model_capi.h (optional) or .cpp
Description: Provides data structures that enable a running program to access model parameters and signals without the use of external mode. To learn how to generate and use the model_capi.c or .cpp and .h files, see the "Monitoring Signals With the C API" chapter in the Real-Time Workshop documentation.

9.4 CCS file types (34)

Explanation of the important file types in CCS:

1. file.pjt: project file used to create and build a project named file
2. file.c: C source program
3. file.asm: assembly source program created by the user, by the C compiler, or by the linear optimizer
4. file.sa: linear assembly source program; the linear optimizer uses file.sa as input to produce an assembly program file.asm
5. file.h: header support file
6. file.lib: library file, such as the run-time support library file rts6700.lib
7. file.cmd: linker command file that maps sections to memory
8. file.obj: object file created by the assembler
9. file.out: executable file created by the linker to be loaded and run on the C6713 processor
10. file.cdb: configuration file used with DSP/BIOS
9.5 Code of the "Embedded Control Unit"

function [enable,enableplay,ledFlashOut] = fcn(processing,dip, playing)
%#eml
%-----------------------------------------------------------------------
% Defining variables
% ***
enablevar=0;
enableplayvar=0;
persistent ledFlash;
persistent fileend;
persistent fileendplay;
persistent startplay;
persistent playend;
persistent enableFOld;
persistent enableFNew;
persistent enablePOld;
persistent enablePNew;
% ***
% Initialise persistent variables
% ***
if isempty(ledFlash)
    ledFlash = 0;
end
if isempty(fileend)
    fileend = 0;
end
if isempty(fileendplay)
    fileendplay = 0;
end
if isempty(startplay)
    startplay = 0;
end
if isempty(playend)
    playend = 0;
end
if isempty(enableFOld)
    enableFOld = 0;
end
if isempty(enableFNew)
    enableFNew = 0;
end
if isempty(enablePOld)
    enablePOld = 0;
end
if isempty(enablePNew)
    enablePNew = 0;
end
% ***
%-----------------------------------------------------------------------
% Defining start values
% ***
enableFNew=processing;
dipnew=dip;
enablePNew=playing;
% ***
%-----------------------------------------------------------------------
% Control to enable the "Processing" block
% ***
if ((enableFNew==0) && (enableFOld==1))
    fileend = 1;
    startplay = 1; % enables the start of the "Play" block
end
if ((dip==1) && (fileend==0))
    enablevar=1;
else
    enablevar=0;
end
% ***
% Control to enable the "Play" block
% ***
if ((dip==1) && (enablePNew==0) && (enablePOld==1))
    fileendplay = 1;
end
if ((dip==1) && (startplay==1) && (fileendplay==0))
    enableplayvar=1;
else
    enableplayvar=0;
end
% ***
% Reset of variables
% ***
if (dip==0)
    fileendplay=0;
    enableplayvar=0;
    enablevar=0;
    startplay=0;
    fileend=0;
    ledFlash=0;
end
% ***
% Toggle enable for LEDs
% ***
if ((ledFlash==0) && (processing==1))
    ledFlash=1;
else
    ledFlash=0;
end
% ***
%-----------------------------------------------------------------------
% Writing outputs and storing values for the next step
% ***
enable=enablevar;
enableplay=enableplayvar;
ledFlashOut=ledFlash;
enableFOld=enableFNew;
enablePOld=enablePNew;
% ***
end
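To observe the sequencing without the board, the function can be exercised offline (a sketch; it assumes the code above is saved as fcn.m on the Matlab path, and the two flag vectors are made-up stand-ins for the outputs of "FindEndOfFile" and "FindEndOfFileDAC"):

% Sketch: calling fcn() over six control steps of Tdip. The dip switch
% is held at 1; the processing flag drops after step 3 (end of the
% input file) and the playing flag drops after step 5 (end of playback).
clear fcn               % reset the persistent state between runs
dip  = 1;
proc = [1 1 1 0 0 0];   % assumed "FindEndOfFile" output
play = [0 0 0 1 1 0];   % assumed "FindEndOfFileDAC" output
for k = 1:numel(proc)
    [en, enp, led] = fcn(proc(k), dip, play(k));
    fprintf('step %d: enable=%d enableplay=%d led=%d\n', k, en, enp, led);
end

The printed sequence shows the "Processing" enable high for the first three steps and the "Play" enable taking over after the falling edge, matching the behaviour shown in Figure 26.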