System-on-Chip Deployment with MCAPI Abstraction and IP-XACT Metadata

Lauri Matilainen, Lasse Lehtonen, Joni-Matti Määttä, Erno Salminen, Timo D. Hämäläinen
Tampere University of Technology, Department of Computer Systems, P.O. Box 553, 33101 Tampere, Finland
Email: {lauri.matilainen, lasse.lehtonen, joni-matti.maatta, erno.salminen, timo.d.hamalainen}@tut.fi

Abstract—IP-XACT, the recent IEEE 1685 standard, defines a metadata format for IP packaging and integration in System-on-Chip designs. It was originally proposed for hardware descriptions, but we have extended it to software, HW/SW mappings and application communication abstraction. The latter is realized with the Multicore Association MCAPI, a lightweight message passing interface. In this paper we present, as a work in progress, how we utilize all of these to deploy and move application tasks between different platforms for FPGA prototyping, execution acceleration or verification. The focus is on the metadata format, since it is a foundation for automation and tool development. The design flow is illustrated with two case studies: a motion-JPEG encoder and a 12-node workload model of a video object plane decoder (VOPD). These are deployed to a PC and to Altera and Xilinx FPGA boards in five variations. The results are reported as the deployment time for both non-recurring and deployment-specific tasks. Setting up a new deployment is a matter of hours when there is an IP-XACT library of HW and SW components.

Index Terms—electronic design automation, multiprocessor system-on-chip, metadata, IP-XACT, IEEE 1685, MCAPI, Multicore Association, Kactus2, deployment

I. INTRODUCTION

Current System-on-Chip designs include many Intellectual Property (IP) blocks, currently about 50 IPs in total and ten processor cores. The number of IPs increases steadily, about 20% per year. At the same time, the amount of embedded software increases even more rapidly. Besides the SoC design itself, the challenge is also the overall product design, especially for industrial embedded systems. Today, 60% of a SoC design should be reused, and productivity should be increased tenfold by the year 2022 [1]. One of the most acute problems is how to quickly deploy a product to a new platform, either because it offers better performance or cost reduction, or because the old one is discontinued. The same challenge is inherently present in design verification, e.g. in FPGA prototyping and emulation. In addition, reuse often proves complicated in practice, although all designers recognize its importance and benefits.

SoC design tools are principally categorized as creation and integration environments [2]. The former start from high abstraction level models and transform the models into executable code for the supported platforms. In this paper we focus on the latter approach, which includes modularizing, reusing and putting together components from libraries. Specifically, we focus on abstracting and describing the components, designs and design configurations in XML metadata. The practical goal of the metadata is to ease compatibility of design automation tools. However, in this paper we exclude the automation and show how the metadata is used in each design step. Two case studies demonstrate the flow and metadata usage for five different deployments on FPGAs. The performance of the methodology is measured in terms of design effort, time, and system performance.

Our approach is based on the IEEE 1685/IP-XACT metadata standard [3], which was originally intended for hardware IP-block integration. We extend IP-XACT to SW components and HW/SW mappings, as well as to application communication abstraction using the Multicore Association Communications API (MCAPI) [4]. The benefit is the ability to describe all SoC components in a uniform, standard way, and to include also information related to the design process and tools. Throughout the deployment flow, we use the freely available Kactus2 IP-XACT tool [5], [6] and IP cores [7]. The key contributions of this paper are:

- New extensions to IP-XACT 1.5 for MCAPI, SW and HW/SW mappings
- Two design examples using the extensions
- New MCAPI platform-dependent transport layer implementations for shared memory and message passing architectures
- Design effort measures for changing a SoC deployment between different platforms

The paper is organized as follows: Section II discusses related work. Section III introduces the design flow and Section IV explains our IP-XACT extensions for SW and MCAPI. Sections V and VI present the tools and the case studies. Sections VII and VIII present the results and conclusions.

II. RELATED WORK

Any system design includes lots of source files (VHDL, C/C++), HW and SW libraries, tool-specific project files and target-specific executable files. A project can consist of 10k files in hundreds of folders. Traditionally, the dependencies between the parts are in the form of file references and path definitions, and care must be taken that the paths remain valid. One major problem is how to quickly apply changes in platforms, applications or both. Version management alone does not solve the whole problem, since it is agnostic of the meaning of the files. Improvements include database-based system descriptions and the use of metadata, most often in XML or in script languages like TCL.

Several SoC design frameworks include domain-specific and custom metadata formats, both in companies and in research frameworks like Koski [8] and Daedalus [9]. Recently, IP-XACT extensions have been proposed for exchanging application descriptions from process networks to reconfigurable computing HW generation in DWARV (Delft Workbench Automated Reconfigurable VHDL) [10]. However, their approach is to fit IP-XACT to their domain, which reduces the general applicability of IP-XACT. Xilinx has presented a big set of vendor extensions, in total 119 different XML elements. The extensions serve their own tool support and add new attributes and dependencies, e.g. for supported devices, IP taxonomy and port endianness. Xilinx ISE Design Suite 13 includes 50 Xilinx IP cores, and by the end of 2012 all Xilinx IP cores will support IP-XACT [18]. Although vendor extensions are handy when working with one tool only, they create vendor lock-in. Consequently, design data is harder to port from one tool to another, and requires transformation tools.

As far as we know, we present for the first time IP-XACT extensions for explicit SW components, MCAPI nodes, endpoints, channels and HW/SW mappings. It should be noted that we do not present a principally new system model but a metadata format that reuses IP-XACT 1.5 constructs. MCAPI was chosen since it aims at embedded and many-core systems with lightweight and efficient implementations. One principal requirement was support for platforms with and without an operating system, as well as abstracting HW blocks at the same abstraction level. MCAPI was found to be a suitable standard.
Since MCAPI is quite a new standard, there are not many publications comparing it to other parallel APIs.

III. DESIGN FLOW OVERVIEW

Figure 1 depicts an overview of the proposed deployment flow. The left side (A) shows the phases with an example design, and the right side (B) the IP-XACT metadata objects used during the flow.

Our starting point is an executable application description, and here we do not consider how it is obtained. An example is a set of C source codes that should be deployed to a new platform, possibly with some parts going to HW acceleration. The application can be complete or a model representing its external behavior.

The next step is grouping the application modules according to their mutual communication, by performing static analysis or using simulation traces. For example, highly data-dependent modules are put together. For such modules, the next step is abstracting the communication using MCAPI nodes and endpoints. MCAPI channels connect the endpoints and are illustrated as dotted lines in Figure 1A. This step makes the application modules independently portable. Abstract communication incurs some overhead, and therefore it is not beneficial to convert all function calls into MCAPI endpoints. The appropriate granularity depends on both the application and the underlying platform.

The third step is deciding what is executed on HW and what on SW, by means of automated or manual architecture exploration. Accelerator modules for HW execution have the same MCAPI endpoints as their SW counterparts, so it does not matter which implementation is selected. The HW implementation, e.g. in VHDL, is either reused from a library or newly created at this step.

In the last step, the application modules are deployed to the desired platforms, including e.g. several different FPGAs, boards and computers. The deployment can be the final product implementation, or prototyping and verification. Independent of the technologies, the platform must include MCAPI protocol layers for the processors. For accelerator HW IP-blocks, the MCAPI endpoints can be virtual (access included in a processor's MCAPI implementation) or directly implemented in HDL at the IP-block. The final outcome is quick change of the deployment between different platforms.

Figure 1 The proposed deployment flow on the left (A). Summary of metadata objects for each design step on the right (B):
1. Executable application description: SW application module descriptions (as IP-XACT component); SW structure descriptions (as IP-XACT design)
2. Communication abstraction: MCAPI endpoint descriptions (as IP-XACT bus interface); endpoint mappings to SW applications and SW designs (VLNV reference)
3. Architecture exploration and HW/SW partition: HW components and designs; SW platform components; SW platform structure; SW platform mapping to HW components; endpoint mappings to HW components and SW (VLNV reference)
4. Deployment: system design; SW to HW mapping; MCAPI channel descriptions; design configuration descriptions

In the proposed design flow, Kactus2 is used in all steps that need metadata capture. The main principle is to use the standard IP-XACT elements like components, objects and bus interfaces as far as possible, to help tool development and compatibility. The right-hand side of Figure 1 lists the deployment flow related objects (IP-XACT component, design) and the extensions defined as explicit relations (VLNV reference).
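Before moving to the metadata details, the communication abstraction step (step 2 above) can be made concrete with a minimal C sketch: a direct function call between two modules is replaced with an MCAPI message exchange, so that the callee may later run in another thread, on another processor or as a HW accelerator. The sketch assumes the MCAPI v1.0 C API; the node and port numbers are hypothetical, chosen only for illustration.

/* Minimal sketch: replacing a direct call with an MCAPI message
 * exchange (MCAPI v1.0 API assumed; node/port IDs hypothetical). */
#include <mcapi.h>
#include <stddef.h>

#define THIS_NODE     1   /* hypothetical node ID of this module  */
#define OUT_PORT      0   /* hypothetical local port              */
#define PEER_NODE     2   /* hypothetical node hosting the callee */
#define PEER_IN_PORT  0

/* Before: tight coupling, callee linked into the same executable. */
extern void consume_block(const unsigned char *buf, size_t len);

/* After: the module only talks to an endpoint; whether the peer is
 * a thread, another CPU or a HW block is decided by the deployment. */
static void send_block(const unsigned char *buf, size_t len)
{
    mcapi_status_t  status;
    mcapi_version_t version;

    /* In a real node this initialization is done once at start-up. */
    mcapi_initialize(THIS_NODE, &version, &status);

    mcapi_endpoint_t src = mcapi_create_endpoint(OUT_PORT, &status);
    mcapi_endpoint_t dst = mcapi_get_endpoint(PEER_NODE, PEER_IN_PORT,
                                              &status);

    mcapi_msg_send(src, dst, (void *)buf, len, /*priority*/ 0, &status);
    mcapi_finalize(&status);
}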
IV. HANDLING SW AND COMMUNICATION

In IP-XACT, the component is the main object type, and it describes one reusable IP, for example a CPU, memory or accelerator. Its internal structure can be captured with a design that contains instances of other components and their connections. In a sense, the component corresponds to a VHDL entity and the design to an architecture. A bus definition groups several I/O pins together and simplifies making connections between components. All IP-XACT objects are referenced using a unique VLNV (Vendor, Library, Name, Version) to build up systems, which means that file names or locations on disk do not matter.

An increasing fraction of SoC design concerns SW, but currently IP-XACT considers mostly HW. However, many aspects are very similar in SW, and hence IP-XACT is directly applicable; for example, SW modules correspond to HW components. Moreover, we propose the following extensions to describe, combine and configure SW building blocks, as well as to bind them to HW resources.

A. Referring to software from IP-XACT objects

IP-XACT components have one or multiple views, for example a hierarchical one with references to other component VLNVs, or a flat one with references to files, as shown in Figure 2. An IP-XACT design is a separate object, and it is referred to in a component's view. In addition to referring to a SW file directly from a HW component, we extend IP-XACT to refer to a new SW design. It is just like a regular IP-XACT HW design, but includes a set of SW components instead of HW components. The benefit is that SW components can now exist without a HW component, and can be independently referenced and reused.

Figure 2 Regarding SW as an IP-XACT component brings flexibility to referencing. Our extensions in the metadata are written in italic font. [The figure shows a HW component and SW designs in an IP-XACT object library: flat views hold file sets referring to HW and SW source files on disk (VHDL files, test bench files, SW application sources), while hierarchical views hold object references, e.g. to a SW design.]

An IP-XACT HW component can include CPUs and, by means of file references, the executable images that are mapped to them. Unfortunately, that is by default component-specific and not instance-specific, just like the file sets and the object reference to a HW design. Therefore, having multiple identical CPUs (HW instances) executing different SW is cumbersome: either one needs to add a new view to the HW component for each SW, or make a new HW component version for each SW. The former blows up the number of views in the HW component, and the latter the number of HW components in a library. Thus, SW should also be attachable to HW component instances. Altogether, we have three choices for referring to SW from a HW component: i) the standard way, file set references per CPU for SW; ii) adding a custom parameter for each CPU to store the VLNV reference to a SW design; or iii) storing the mapping information into a separate, new design type. We call the latter a system design, which we explain later.

B. Capturing SW stack structure

We follow the general model of a SW stack, which includes a platform-independent application layer on top and a SW platform below it, with a hardware-dependent layer at the bottom. When designed properly, changing one layer does not require changes in the other layers. The SW platform as a whole should be independent of the application (the case). In our SW design, we can describe the structure of SW as a SW stack (with the layers), or without it, just to show how the SW components are connected together. Figure 3 shows the simplified model of the SW design, in which instances are denoted with i0, i1, and so on.

Figure 3 Software design includes application and/or SW platform. [The figure shows SW component instance i0 (application) connected via a SW API (IP-XACT bus interface) to SW platform components i1 and i2, whose file sets contain e.g. Ethernet driver and MCAPI transport files.]

IP-XACT bus interfaces, originally meant for HW pin interfaces, now correspond to the APIs between the SW components. This allows an IP-XACT tool to automatically check that all connections are valid; for example, a driver may require a POSIX-compliant API. For the SW stack, we separate the application and SW platform components using custom attributes. The same information could also be stored e.g. in the library field of the VLNV, but the attributes are safer for tool implementation. SW components can be flat or hierarchical like the HW components. The simplest possible SW component includes only references to the needed files, and a general interface that does not specify any API. The benefits of SW components and designs are formalized documentation, better library management and the potential to add automatic checking and compilation.

C. Communication abstraction with MCAPI

Structural packetizing of the software helps in design management but does not reveal the inter-processor communication patterns in the system. A structural description tells what parts are connected, but not how they communicate, as the exact details are "hidden" in the source codes. MCAPI is very suitable for alleviating this. It abstracts the system as nodes (e.g. threads) that have multiple endpoints. Consequently, the top-level view of the system is a set of interconnected nodes, regardless of the HW hierarchy level on which they reside. These nodes are usually processors, e.g. inside a workstation, on a PCB, or on an FPGA, and all are shown together. Moreover, in our work HW accelerators can be accessed using the MCAPI endpoint abstraction instead of e.g. register addresses.

The MCAPI channels between the nodes' endpoints define which nodes communicate with each other. There are three types of unidirectional endpoints: scalar, message, and packet. For example, a DCT function may have two inputs (pixel data and quantization parameter) and one output (frequency coefficients). Many functions can be easily chained, and the same DCT can be reused in many systems by changing the connections between the endpoints, e.g. depending on whether the pixels come from a camera or from a processor.

Obviously, MCAPI function calls must be translated to the underlying platform, for example into calls to an Ethernet driver on an FPGA [14] and POSIX calls on a Linux PC. The implementation of this MCAPI transport layer is part of the SW platform stack, as depicted in Figure 3. Such abstraction simplifies making different deployments with modest effort. Explicit description of MCAPI nodes is implemented as IP-XACT bus interfaces for SW components.
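As an illustration of the DCT example above, the following sketch shows how such a node could create its two input endpoints and one output endpoint with the MCAPI v1.0 C API. The node and port numbers, buffer sizes and the run_dct() helper are illustrative assumptions, not taken from the paper.

/* Sketch of the DCT node: two inbound message endpoints (pixels,
 * quantization parameter), one outbound (frequency coefficients).
 * All IDs, sizes and run_dct() are illustrative assumptions. */
#include <mcapi.h>
#include <stddef.h>

#define DCT_NODE     3
#define PORT_PIXELS  0
#define PORT_QPARAM  1
#define PORT_COEFFS  2

extern void run_dct(const short *pixels, int qparam, short *coeffs);

void dct_node_main(void)
{
    short pixels[64], coeffs[64];   /* one 8x8 block */
    int   qparam;
    size_t rx;
    mcapi_status_t  st;
    mcapi_version_t ver;

    mcapi_initialize(DCT_NODE, &ver, &st);
    mcapi_endpoint_t in_pix = mcapi_create_endpoint(PORT_PIXELS, &st);
    mcapi_endpoint_t in_qp  = mcapi_create_endpoint(PORT_QPARAM, &st);
    mcapi_endpoint_t out    = mcapi_create_endpoint(PORT_COEFFS, &st);

    /* The consumer is found by a (node, port) pair; rewiring this
     * pair is all it takes to reuse the DCT in another system. */
    mcapi_endpoint_t sink = mcapi_get_endpoint(1 /*node*/, 5 /*port*/, &st);

    for (;;) {
        mcapi_msg_recv(in_pix, pixels, sizeof(pixels), &rx, &st);
        mcapi_msg_recv(in_qp,  &qparam, sizeof(qparam), &rx, &st);
        run_dct(pixels, qparam, coeffs);
        mcapi_msg_send(out, sink, coeffs, sizeof(coeffs), 0, &st);
    }
}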
However, we can also define virtual endpoints for HW components by introducing special communication interfaces for them. MCAPI channels are implemented as IP-XACT interconnections between bus interfaces, and they are defined in our system design. To help importing code, the Kactus2 tool can analyze legacy source codes looking for endpoints and then add them to the IP-XACT SW component automatically.

D. Separate system design phase for mapping

In the simplest cases, component-specific SW has been associated with a HW component already during the IP packetization, for example in the case of a fixed application-specific processor that always computes the same task in all of its instances. However, in general, we must have a way to specify instance-specific SW for each CPU. Therefore, we create a separate system design which describes the mapping between SW and HW instances, as depicted in Figure 4. An upper level component, e.g. an "FPGA-based product", refers to both the HW design and the system design. This way we can introduce several SW mappings for the same HW, or the same SW for several HW platforms.

Figure 4 System design describes instance-specific mapping of SW to HW in IP-XACT. [The figure shows a hierarchical component, e.g. an FPGA-based product, whose views hold object references to a HW design and a system design; SW component instances sw_i0 and sw_i1 in the system design carry object references to SW components and an instance reference to HW instance i0 of the HW design.]

Our system design graphically shows a flattened HW design that tells which SW instances are mapped to which HW instances. Each CPU instance (a programmable component), and each HW instance with virtual MCAPI endpoints, is automatically imported into the system design by Kactus2. The user then drag-and-drops SW components from the library onto the HW placeholders, or creates new SW components from scratch for the HW instance. Unfortunately, there is no native way in IP-XACT to store references to instances, only to VLNVs, which always point to a library object. Hence, we add vendor extensions. The system design is now linked to a HW design, and each SW component instance is linked to its host HW component instance, as shown in Figure 4. It should be noted that only HW components with CPUs or virtual MCAPI endpoints are available for the SW mapping. Each CPU placeholder contains all the application and SW platform components executed by that CPU.

If IPC ports, such as MCAPI endpoints, have been defined, the designer connects them as well to visualize which application parts communicate. These system-wide MCAPI channels are defined as IP-XACT interconnections and bus interfaces between SW component instances, as depicted in Figure 5. There, HW instance i0 is of type CPU and includes SW components with MCAPI endpoints, whereas i1 is a fixed HW component with virtual MCAPI endpoints. Creating MCAPI channels is not only for documentation; it is also used when writing the application code in Kactus2. The code editor suggests auto-completion of valid system-wide endpoint identifiers and helps in other MCAPI-related tasks. During SW generation, the system-wide unique endpoint IDs are automatically generated by Kactus2. It also automatically checks the endpoint types and that the communicating parties also have a physical connection.
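The paper does not show the format of the generated identifiers; the following is a hypothetical sketch of what a header generated from the system design might look like. All names and values are our illustrative assumptions.

/* Hypothetical sketch of a generated header: system-wide unique
 * (node, port) pairs for every endpoint in the system design.
 * All names and values are illustrative assumptions. */
#ifndef SYSTEM_ENDPOINTS_H
#define SYSTEM_ENDPOINTS_H

/* One node per CPU or per HW block with a virtual endpoint. */
#define NODE_PC       0
#define NODE_NIOS0    1
#define NODE_DCT_HW   2   /* fixed HW block, virtual endpoint */

/* Endpoints as (node, port) pairs, referenced by application code
 * instead of hard-coded literals. */
#define EP_MAIN_OUT_NODE  NODE_PC
#define EP_MAIN_OUT_PORT  0
#define EP_DCT_IN_NODE    NODE_DCT_HW
#define EP_DCT_IN_PORT    0
#define EP_DCT_OUT_NODE   NODE_DCT_HW
#define EP_DCT_OUT_PORT   1

#endif /* SYSTEM_ENDPOINTS_H */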
Figure 5 MCAPI channel definitions in system design. [The figure shows HW component instance i0 (CPU) hosting SW components sw_i0–sw_i2, and i1, a fixed HW component, with an MCAPI channel drawn as an IP-XACT bus between their endpoints.]

In summary, the system design describes the following: 1) the SW structure, including application and SW platform components for layered models (stack); 2) the binding of SW components or a stack to each CPU; 3) the association of SW components and MCAPI endpoints; and 4) the system-wide MCAPI channels between the endpoints.

V. TOOLS

The case study applications are first executed on a workstation to perform functional verification of the algorithms, and then moved and remapped to multiple embedded processors or accelerators. Hence, HW/SW co-design is required and multiple tools are needed, as listed in Table 1. The MCAPI abstraction helps especially in the remapping. The Kactus2 tool helps mainly in the integration phase, by capturing information from individual components and in interconnecting them. After that, the legacy synthesis and compilation tools are used to create the executables. Applications can be profiled with Intel VTune on the PC, or with timers and specialized monitors on the FPGA.

Table 1 Utilized tools in different phases of the case studies.

  Design step                          Tool
  Executable application description   Manual coding in C
  MCAPI communication abstraction      Manual coding in C
  Nios generation                      SOPC/Qsys
  uBlaze generation                    Platform Studio
  IP-XACT design capture (all steps)   Kactus2
  Structural VHDL code generation      Kactus2
  MCAPI code generation                Kactus2
  FPGA synthesis                       Quartus, Xilinx ISE
  Software compilation                 Altera IDE, Xilinx SDK
  Performance measurement              Intel VTune, timers
  User time measurement                Grindstone, manual

A. Kactus2 tool

Kactus2 is the first open source IP-XACT tool [15]. It is compatible with the v1.5 standard and also implements the extensions presented in this paper. It is freely available under the GPL2 license on SourceForge [6]. It is written in C++ and uses the Qt 4.8.0 framework. Currently, Kactus2 includes over 87k lines of code. The main modules are a design manager and a library handler, which are also visible in the GUI. The former takes care of the main tasks, including design capture and invoking generators. The library functions parse the XML metadata into C++ classes for the manager. They also provide editors for adding and modifying IP-XACT objects. Kactus2 includes moderate design automation by itself, e.g. object validation, tracking of changes and built-in generators for VHDL, Quartus, Modelsim simulation, MCAPI code templates and header file generation. For example, the VHDL generator takes a hierarchical IP-XACT component as input and creates a structural top-level VHDL description from it. Architecture exploration, automated HW/SW mapping and other system-level automation can be implemented as standard IP-XACT generators.

B. Kactus2 usage

In this paper, Kactus2 has three main roles. First, it is used to packetize the HW and SW platform and application components, including MCAPI endpoints. The objects include generic parameters, associated files, and information about the tools used and their languages and options. The second use of Kactus2 is to create the designs: hierarchical deployment-specific HW designs, application-independent SW stacks, SW designs to be associated with HW components, and system designs that collect all the parts together. All of these designs are created manually in this paper. However, generators are used to create the implementations on FPGAs. Third, Kactus2 also helps to manage the library and to show dependencies between the objects, both user and generator created. This becomes the more important the more deployment experiments and versions exist [16].

VI. CASE STUDIES

The proposal is demonstrated with two case study applications and three platforms. The case studies consider how the application modules are described and moved from a PC to different FPGA platforms. Given the limited space and the complexity of the topic, simple toy applications are used here for demonstration.

A. Applications

The first case is an intra-frame video encoder (motion-JPEG) given in ANSI C code. The original encoder uses a shared memory architecture and runs in a single thread. We modified the code so that the DCT transform can be more easily executed on another processor or on a specialized HW accelerator.

The second case is a workload model of a Video Object Plane Decoder [17]. It is a process network including 12 nodes, 14 arcs and a timer, as shown in Figure 6. Each graph node is implemented as an MCAPI node and the arcs as MCAPI channels.
Since this is a coarse model, it shows only communication performance. However, we can easily modify the ratio between the computational load and the amount of communication. The tasks wait for input tokens (a certain number of bytes) before they are triggered for execution. Here, "execution" means just waiting, since this is a workload model and not a functional model. After execution, a task sends one or more tokens to other tasks.
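A single task of such a workload model could be sketched as below, assuming the MCAPI v1.0 C API; the token sizes, endpoint arguments and the busy_wait_us() delay helper are illustrative assumptions.

/* Sketch of one VOPD workload task: block until enough input
 * bytes (tokens) have arrived, idle for the modeled execution
 * time, then forward tokens. Sizes and helpers are assumptions. */
#include <mcapi.h>
#include <stddef.h>

extern void busy_wait_us(unsigned us);   /* assumed delay helper */

void workload_task(mcapi_endpoint_t in, mcapi_endpoint_t out,
                   mcapi_endpoint_t next_task,
                   size_t trigger_bytes, unsigned exec_time_us)
{
    unsigned char buf[1024];     /* assumes trigger_bytes <= 1024 */
    size_t got, pending = 0;
    mcapi_status_t st;

    for (;;) {
        /* Accumulate incoming tokens until the trigger threshold. */
        while (pending < trigger_bytes) {
            mcapi_msg_recv(in, buf, sizeof(buf), &got, &st);
            pending += got;
        }
        pending -= trigger_bytes;

        busy_wait_us(exec_time_us);  /* "execution" is just waiting */

        /* Emit one output token to the successor task. */
        mcapi_msg_send(out, next_task, buf, trigger_bytes, 0, &st);
    }
}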
Figure 6 The task graph of VOPD. [Tasks include INV SCAN, RUN LEN DEC, IQUAN, IDCT, ACDC PRED, UP SAMP, PAD, STRIPE MEM, VOP MEM, VOP REC, ARM and VID.]

B. HW platforms

Our physical hardware platform consists of a PC, an Altera DE2 board and a Xilinx Spartan-3E FPGA board. The FPGAs accommodate soft Nios and uBlaze processors, a DCT as an exemplary HW IP-block, and Avalon, HIBI and Xilinx PLB on-chip buses as well as Ethernet interfaces. The Nios processors also include a HIBI DMA connected to Avalon and a dual-port memory, as in [14]. The simplified structure of the HW platform is shown in Figure 7. In the experiments, the FPGAs ran at 50 MHz. The logic utilization was a few thousand LUTs.

Figure 7 HW platform with three different types of CPUs. [A PC (x86) is connected over Ethernet to the DE2 board (Nios CPUs and the DCT block) and to the Spartan-3E board (uBlaze).]

C. SW platform and MCAPI implementations

The software platform includes MCAPI transport layers for Linux (PC), uC-Linux (Nios) and a non-OS version (Nios, uBlaze), Ethernet stacks, and HIBI bus macros and drivers. The PC runs Debian GNU/Linux 6.0 and the Nios uC-Linux ver. 2.6. OpenMCAPI is used on the PC and on the Nios that runs uC-Linux. For the Nios and uBlaze without any OS, the MCAPI reference implementation from the Multicore Association (MCA) was modified to use only message passing. It allows only one MCAPI node per processor since no scheduler is available. Furthermore, the channels are created statically. The implementation is very concise, less than 5k lines of C, and took a few weeks to implement. It should be noted that an MCAPI port is a non-recurring task.

A special feature is that HW accelerators are seen as virtual endpoints [14]. Other nodes can use them in the same way regardless of whether they are executed in SW or HW. For each virtual endpoint, we need an adapter that transforms MCAPI transactions into native device operations. In this case, this was done in a device driver for the DCT IP-block.

Figure 8 depicts the protocol stack used in the PC, Nios and uBlaze. OpenMCAPI uses POSIX and a shared memory model for MCAPI. In heterogeneous platforms we may also need message passing, so we implemented a topology mapping layer in the MCAPI transport. In practice, we have a header file describing where the MCAPI nodes are mapped. The header is generated by Kactus2 from the system design. Based on the node ID, the mapper selects whether to use shared memory or message passing.

Figure 8 Layered model of the system. [From top to bottom: applications (M-JPEG, VOPD); MCAPI top layer (API); MCAPI transport layer with OpenMCAPI, the MCAPI reference implementation, topology mapping and virtual endpoint adapters; hardware abstraction layer (POSIX Linux, shared memory, message passing, standard and custom devices); HW platforms (shared memory, Ethernet, PCIe, Avalon, CoreConnect, AXI, HIBI) on x86, x64, Nios, MicroBlaze and HW IP-blocks.]
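The node-ID-based selection of that topology mapping layer could look like the sketch below. Only the idea (choosing shared memory for local peers and message passing for remote ones) is from the text; the table contents, device IDs and all names are our illustrative assumptions.

/* Sketch of the topology mapping: a generated table records on
 * which device each MCAPI node resides; local peers use shared
 * memory, remote peers the configured message passing transport.
 * Table contents and all names are illustrative assumptions. */
enum transport { TR_SHARED_MEM, TR_ETHERNET, TR_HIBI };

struct node_map {
    unsigned       node_id;
    unsigned       device_id;   /* chip/board hosting the node */
    enum transport remote_via;  /* used when the peer is remote */
};

/* Hypothetical content generated from the system design;
 * node_id doubles as the table index. */
static const struct node_map topology[] = {
    { 0, 0 /* PC  */, TR_ETHERNET },
    { 1, 1 /* DE2 */, TR_ETHERNET },
    { 2, 1 /* DE2 */, TR_HIBI     },
};

static enum transport select_transport(unsigned self, unsigned peer)
{
    if (topology[self].device_id == topology[peer].device_id)
        return TR_SHARED_MEM;          /* same chip: shared memory */
    return topology[peer].remote_via;  /* off-chip: message passing */
}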
VII. RESULTS

Table 2 summarizes the overall complexity of the case studies in terms of files and lines of code. We notice that MCAPI is the largest software module in terms of lines. Realistic applications are easily an order of magnitude larger, and hence the overhead from MCAPI is not as drastic as here.

Table 2 Case study components without metadata files.

  Item                               # of Files   Lines of code (language)
  Video enc. (orig, Socket, MCAPI)            8   C: <1k
  VOPD (MCAPI)                               12   C: <10k
  OpenMCAPI transport, Linux                 77   C: 4.5k
  OpenMCAPI transport, uC-Linux              75   C: 4.5k
  MCAPI transport, no OS                      5   C: 1.5k
  HIBI driver NIOS                            3   C: 2k
  Xilinx lwIP                                40   C: 2.5k
  uC-Linux                                  55k   C: 18M
  DCT accelerator                             7   VHDL: 3k
  HIBI PE DMA controller                      5   VHDL: 2k
  HIBI segment                               15   VHDL: 7k

It is not easy to measure the benefit of reuse and abstraction, or of a design methodology in general. The primary metric here is the required user effort in terms of wall-clock time, as well as the number of files and code lines. The former varies greatly between individuals, whereas the latter provide a first-order approximation of comprehension complexity.

A. Performance results

It should be noted that performance is not the primary objective in this paper. However, the MCAPI cost was measured from the video encoder running on the PC. The calls to the most time-consuming function, DCT, were modified in four ways.

First, the pass-by-reference was converted to pass-by-value, i.e. using memcpy() for the input data and the results. This increased the runtime only by 2%. This is the lower bound of the overhead if the DCT were executed by an accelerator (or another processor) that does not have access to the processor's data memory. Then, the DCT and main functions were turned into separate threads, with communication via shared memory. The runtime increased by 43%, since it turns out that system calls take quite a large fraction of the time in this case. Note that the main function was just waiting for the DCT results instead of a pipelined execution. Last, the communication was done using sockets and using MCAPI calls with OpenMCAPI; both cases used two threads. The overheads were measured with Intel VTune. As expected, the runtime increased, up to 3x the original. The overhead from the threads is the same as before, but there is a large amount of bookkeeping associated with the communication abstraction. This was surprising, especially because our earlier experiments on the Nios had extremely low overheads using our modified MCAPI reference implementation from the MCA, just tens of cycles per call [14]. However, OpenMCAPI uses lots of Linux system calls, and its overhead was nearly 20 kcycles per transfer. This result shows that further optimization is required to gain both reusability and performance at the same time with OpenMCAPI.
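The first variant can be sketched as follows; the function names and the block size are illustrative assumptions.

/* Sketch of the first measurement variant: DCT input and output
 * copied with memcpy() as they would be for an accelerator with
 * no access to the caller's memory. Names/sizes are assumptions. */
#include <string.h>

#define BLOCK 64                     /* one 8x8 block of samples */

extern void dct(short *data);        /* original in-place, by reference */

void dct_by_value(const short *in, short *out)
{
    short local[BLOCK];
    memcpy(local, in, sizeof(local));   /* copy-in  (to "device")  */
    dct(local);
    memcpy(out, local, sizeof(local));  /* copy-out (from "device") */
}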
B. Deployment effort

The MCAPI communication abstraction and the other modifications to the video encoder were very straightforward, taking just about half a working day for an engineer already familiar with the original version. In other words, converting a program to use MCAPI proved to be efficient. The total number of code lines increased only about 8%. In practice, only two files were modified (main.c and dct.c). Similarly, implementing the workload model of VOPD took only two days for an engineer without any earlier MCAPI experience.

Another part is the effort needed to create the IP-XACT metadata descriptions, and most importantly the designs that reflect the different deployments. The video encoder experiment has the following use cases:

1) Run everything on the PC
2) Run the DCT accelerator on the FPGA and the rest on the PC
3) Run the DCT on a Nios processor and the rest on the PC

We captured all the needed metadata for the case in Kactus2. At least three HW designs are needed: PC + FPGA board, the structure of the FPGA board, and the internal structure of the FPGA. The needed components include: the PC, the board, the FPGA chip, memories, the Ethernet chip, and all board peripherals. All components also need bus interface definitions. Table 3 summarizes the number of files and the time associated with Kactus2 usage.

Table 3 Created metadata files and the time in hours spent on creating them.

                   MJPEG            VOPD
  Metadata type    #Files  Time     #Files      Time
  HW component         20     4         +5        +1
  HW interface         12     2         +2      +0.5
  HW design             3     1         +2        +1
  SW component          2   0.2         13         2
  System design         4     1         10         4
  Total                41   8.2   35+32=67  9.5+6=15.5

Most of the HW-related metadata was already in the library, shown at the top of the table. The application and the mappings are case-dependent and created from scratch, shown at the bottom of the table. Moreover, we took a ready-made DCT HW IP-block described in VHDL, created an IP-XACT HW component for it and associated a virtual MCAPI endpoint with it. Packetizing a working HW component is a straightforward task and takes about 10 minutes. Reusing the DCT IP-block in another SoC is just a drag-and-drop operation, a matter of seconds. The application is divided in two parts, the DCT and the rest, as two applications and two endpoints. The deployments are carried out on two platforms according to the three use cases.

Table 4 Video encoder deployment time [hours].

  MJPEG phase                        PC   DCT@HW   DCT@NIOS
  MCAPI comm. abstraction             4       +0         +0
  IP-XACT library setup (HWP/SWP)     1       +6         +1
  IP-XACT HW design                 0.2       +1       +0.2
  IP-XACT System design             0.2     +0.2         +0
  Generators (VHDL, C)              0.2     +0.2         +0
  Synthesis and compilations          -     +0.8         +0
  Total                             5.6     +8.2       +1.2

For VOPD, we used three different platforms and tried out two different deployments. The VOPD is first executed on the PC only, and then parts of it on the Altera and Xilinx FPGA boards, both connected to the PC via Ethernet. The VOPD was packetized similarly to the previous case. The right-hand columns of Table 3 show the difference when the video encoder case had already been created. Large parts of the HW were the same, but the components for the Xilinx board needed a few additions, hence markings such as "+5". The total time for these was about one working day.

Table 4 and Table 5 summarize the deployment effort in hours. For the MJPEG, the first column reports the time for the PC-only deployment. To move the DCT to HW on the Altera FPGA, most of the extra time is required for the library setup. It included packetizing such HW and SW platform components that do not depend on any application. Moving the DCT from HW to the Nios and uC-Linux requires only the Nios-related HW and SW platform components to be packetized (+1 hour). In summary, the PC deployment took about 6 hours, moving the DCT to the FPGA about 8 hours, and after that moving the DCT to the Nios 1 hour. If the library had been complete in the beginning, all deployment times would have been about 1 hour.

The VOPD case included deployments on the PC and on a combination of the PC, DE2 and Spartan boards. Again, we measure the time to set up everything from scratch for the PC, and report the difference when the FPGAs are involved.

Table 5 VOPD deployment time [hours].

  VOPD phase                                   PC   PC+ALTERA+XILINX
  Creation, incl. MCAPI abstr.                 16                 +0
  IP-XACT library setup (HWP/SW platforms)      9                 +6
  IP-XACT HW design                           2.5                 +1
  IP-XACT System design                       0.5               +0.2
  Generators (VHDL, C)                        0.2               +0.2
  Synthesis and compilations                    -               +1.8
  Total                                      28.2               +9.2

Figure 9 Kactus2 v1.3 system design of the VOPD case study with 4 CPU instances, 12 application tasks and SW platforms. [The screenshot shows HW component instances, flat SW components (applications), hierarchical SW components (SW platforms), MCAPI endpoints and MCAPI channels.]

Figure 9 shows a screenshot of the Kactus2 system design of the VOPD case. Independent of the HW hierarchy, the system design flattens all components that can accommodate MCAPI endpoints. Thus, Figure 9 shows a PC, two Nios processors and one uBlaze as hardware components. Each has a SW platform stack, the endpoints and the associated application code. The MCAPI channels show clearly that the PC communicates only with the first Nios, but not with the other one or with the MicroBlaze.

VIII. CONCLUSIONS

We presented a proposal for deploying applications on the MCAPI communication abstraction, and showed how IP-XACT metadata is extended to handle it. The case studies illustrate the effort when no automation is involved: the packetizing of the HW and SW components, and the HW and system designs with MCAPI definitions, were carried out manually. The effort figures included both non-recurring tasks and deployment-specific, recurring effort. Provided that a solid IP-XACT library exists, deployments can be carried out in less than an hour of user time. Our future work includes the introduction of non-functional property extensions and IP-XACT generators to help automate the deployments.

REFERENCES

[1] The International Technology Roadmap for Semiconductors: 2011 edition, System Drivers. Available: http://www.itrs.net/.
[2] D. Densmore, R. Passerone, A. Sangiovanni-Vincentelli, "A Platform-Based Taxonomy for ESL Design", IEEE Design & Test of Computers, vol. 23, no. 5, pp. 359-374, Sep/Oct 2006.
[3] IEEE Standard for IP-XACT, Standard Structure for Packaging, Integrating, and Reusing IP within Tool Flows, IEEE Std 1685-2009, Feb. 2010, pp. C1-360.
[4] Multicore Association, Multicore Communications API Specification Version 1, 2008. http://www.multicore-association.org.
[5] L. Matilainen, A. Kamppi, J.-M. Määttä, E. Salminen, T.D. Hämäläinen, "KACTUS2: IP-XACT/IEEE1685 compatible design environment for embedded MP-SoC products", TUT Report 37, 2011, ISBN 978-952-15-2625-1.
[6] Kactus2. [Online]. Available: http://sourceforge.net/projects/kactus2/.
[7] Funbase IP library: Overview. [Online]. Available: http://opencores.org/project,funbase_ip_library.
[8] T. Kangas et al., "UML-based Multi-Processor SoC Design Framework", ACM Transactions on Embedded Computing Systems, vol. 5, issue 2, pp. 281-320, May 2006.
[9] H. Nikolov, T. Stefanov, E. Deprettere, "Systematic and automated multiprocessor system design, programming, and implementation", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(3):542-555, March 2008.
[10] R. Nane et al., "IP-XACT extensions for Reconfigurable Computing", in ASAP, Sept. 2011, pp. 215-218.
[11] A. El Mrabti, F. Pétrot, A. Bouchhima, "Extending IP-XACT to support an MDE based approach for SoC design", in DATE, 2009, pp. 586-589.
[12] C. Neely, G. Brebner, Weijia Shang, "ShapeUp: A High-Level Design Approach to Simplify Module Interconnection on FPGAs", in FCCM, May 2010, pp. 141-148.
[13] PolyCore Software Inc., PolyPlatform. [Online]. Available: http://www.polycoresoftware.com/products.php.
[14] L. Matilainen, E. Salminen, T.D. Hämäläinen, M. Hännikäinen, "Multicore Communications API (MCAPI) implementation on an FPGA multiprocessor", in SAMOS XI, July 2011, pp. 286-293.
[15] A. Kamppi et al., "Kactus2: Environment for Embedded Product Development Using IP-XACT and MCAPI", in Euromicro DSD, Aug-Sep. 2011, pp. 262-265.
[16] E. Salminen, T.D. Hämäläinen, M. Hännikäinen, "Applying IP-XACT in Product Data Management", in SoC Symposium, Oct-Nov. 2011, pp. 86-91.
[17] E. Pekkarinen et al., "A Set of Traffic Models for Network-on-Chip Benchmarking", in SoC Symposium, Oct-Nov. 2011, pp. 78-81.
[18] Xilinx Inc. [Online]. Available: http://press.xilinx.com/phoenix.zhtml?c=212763&p=irol-newsArticle&ID=1536758