Slides - WCET 2013
Predictable hardware: The AURIX Microcontroller Family
Worst-Case Execution Time Analysis, WCET 2013, July 9, 2013, Paris, France
Jens Harnisch ([email protected]), Infineon Technologies AG, Automotive Microcontroller, Application and Concept Engineering
22-Jul-13 Copyright © Infineon Technologies 2011. All rights reserved. Page 1 Confidential

Outline
• Challenges for Automotive Microcontrollers
• Power train application characteristics
• AURIX: WCET related system features
• AURIX: WCET related core features
• Multi-Core Software Considerations
• Tracing Options

Automotive Microcontroller – The Challenge
[Diagram. Labels: Microcontroller Concept, Safety & Security, Performance, Power Consumption, Cost]
Keep Constraints for Hard Real Time Support:
• Predictability
• Keep latency constraints (e.g. for interrupts)
• Non intrusive trace support

Characteristics of typical applications running on AURIX (power train)
• Static assignment of tasks to cores, separated schedulers (AMP)
• Time and event driven components (event driven components handled via preemption)
• Roughly every 7th instruction is a branch
• Floating point instructions: varying, depends on code generation
• Data locality: Data (except for LUTs) fits usually into DSPR, having 0 clocks latency
• Look up tables are frequently used
• Instruction level parallelism: 0.8 is often a good result; with high optimization more than 1 instruction per cycle is feasible (TriCore TC 1.6P has 3 pipelines)

AURIX Multi-Core Controller: WCET related system features
[Block diagram. Labels: three TriCore 1.6P cores, each with FPU, PMI and DMI; Checker Core; Crossbar; Program Flash; Data Flash; Sys RAM; Bridge; FFT; SDMA; HSSL; Peripheral Bus; BCU; SCU; STM; GTM; GPT12x; CCU6x; HSM; DS-ADC; ADC; IOM; FCE; I²C; PSI5(S); SENT; QSPI; ASCLIN; MSC; MultiCAN+; FlexRay; Ethernet; SCR; EVR; 5V or 3.3V single supply]
• Highly predictable architecture with duplicated resources (local memories, crossbar) to avoid resource conflicts
• Priority driven and round robin arbitration for masters attached to crossbar
• Starvation protection in crossbar
• No cache coherence
• Dedicated and scalable communication instructions

TriCore 1.6E and TriCore 1.6P: WCET related features
Common TriCore 1.6 Instruction Set
TriCore 1.6Efficiency:
• Scalar Harvard
• One pipeline only for high power efficiency
• 4 pipeline stages (F, D, E1, E2) for up to 200MHz
• Power 0.2mW/MHz
• 1.2 – 1.4 DMIPS/MHz
TriCore 1.6Performance:
• Superscalar Harvard
• Superscalar: integer, load store and loop pipeline
• 6 pipeline stages (F1, F2, PD, D, E1, E2) for up to 300MHz
• Power 0.3mW/MHz
• 1.6 – 2.3 DMIPS/MHz
Further WCET related features:
• Caches: 2 way LRU
• Easy to describe RAW and structural dependencies
• No support of simultaneous multithreading
• 4 stage data store buffer
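
The "Caches: 2 way LRU" bullet above can be illustrated with a minimal sketch of a single cache set. This is a generic model written for illustration only: the type `lru2_set_t` and the functions `lru2_init` and `lru2_access` are hypothetical names, and the model says nothing about the actual TriCore cache geometry (line size, number of sets, tag width). It only shows why LRU is convenient for static analysis: the content of a set after any access sequence is fully determined by the last two distinct tags accessed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One set of a generic 2-way set-associative cache with LRU replacement.
 * Hypothetical model for illustration; not the real TriCore cache layout. */
typedef struct {
    uint32_t tag[2];    /* tag stored in each way */
    bool     valid[2];  /* whether the way holds a line */
    int      lru;       /* index of the least recently used way */
} lru2_set_t;

/* Start with an empty set; way 0 is filled first. */
void lru2_init(lru2_set_t *s)
{
    assert(s != NULL);
    s->tag[0] = s->tag[1] = 0;
    s->valid[0] = s->valid[1] = false;
    s->lru = 0;
}

/* Access a tag. Returns true on a hit. On a miss the line replaces the
 * least recently used way and false is returned. */
bool lru2_access(lru2_set_t *s, uint32_t tag)
{
    assert(s != NULL);
    for (int way = 0; way < 2; way++) {
        if (s->valid[way] && s->tag[way] == tag) {
            s->lru = 1 - way;   /* the other way is now the older one */
            return true;
        }
    }
    int victim = s->lru;        /* evict the least recently used way */
    s->tag[victim] = tag;
    s->valid[victim] = true;
    s->lru = 1 - victim;
    return false;
}
```

Calling `lru2_access` with the tag sequence A, B, A, C, B on a freshly initialised set gives miss, miss, hit, miss, miss: the access to C evicts B (the older line), so the following access to B misses and in turn evicts A.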

Support for hard real time systems
• Support for high average case performance usually contradicts high predictability
• Static timing analysis: precision of results and efficiency of analysis strongly depend on hardware architecture features
• Three classes of hardware architectures [1]:
  – Fully timing compositional architectures: no timing anomalies
  – Compositional architectures with constant-bounded effects: timing anomalies without domino effects -> TriCore is assumed to belong to this class
  – Non-compositional architectures
[1] C. Cullmann, C. Ferdinand, G. Gebhard, D. Grund, C. Maiza, J. Reineke, B. Triquet, S. Wegener, and R. Wilhelm. Predictability Considerations in the Design of Multi-Core Embedded Systems. Ingénieurs de l'Automobile, 807, 2010.

Design Guidelines for predictable multicore architectures
Established by C. Cullmann, C. Ferdinand, G. Gebhard, D. Grund, C. Maiza, J. Reineke, B. Triquet, S. Wegener and R. Wilhelm [1]
1. Fully timing compositional architecture
2. Disjoint instruction and data caches
3. Caches with LRU replacement policy
4. A shared bus protocol with bounded access delay
5. Private caches
6. Private memories, or, only share lonely resources
Annotation on the slide: AURIX (with bounds)

Some software mapping examples
• Interrupt driven tasks on one core, time driven tasks on other(s)
• Split all tasks across cores to balance performance
• With and without OS (computationally intensive functionality)
• Split by ASIL level
• Split by tier 1 / OEM
Important: Differentiate between your(!) optimization objectives (metrics) and constraints
Examples for both optimization objectives and constraints: Core load, memory balance, specific task latencies, end-to-end latencies
[Block diagram. Labels: TriCore 1.6P and TriCore 1.6E cores with FPU, PMI, DMI and Overlay; Checker Cores; LMU; SRI Cross Bar; PMU; Progr. Flash; Data Flash; BROM; Key Flash; RAM; Standby]

What does mapping of software mean?
• To map: functionality to cores, code and data to memories!
• To use: scheduling and communication (AUTOSAR mechanisms and/or direct mechanisms).
• Major mapping criteria: Core performance, support for lock step, memory access latencies and sizes, access conflicts
Summary for the software developer:
• Mapping of functionality is often constrained by the performance of the cores and by lock step, and hence is not so complex
• Mapping of code and data: Up to 9(!) different memory locations to consider: possibly complex
[Block diagram. Labels: TriCore 1.6P and TriCore 1.6E cores with FPU, PMI and DMI; Checker Cores; PMU; RAM; SRI Cross Bar]

Real life example: Race condition and deadlock after a few microseconds!
[Block diagram. Labels: two TriCore 1.6P cores and one TriCore 1.6E core, each with PMI and DMI; SRI Cross Bar; RAM holding a variable with the bits "core 0 ready", "core 1 ready", "core 2 ready", remaining bits not used]
3 bits of a 32 bit variable used to synchronize all three cores, located in RAM and cached in all 3 DMIs

Real life example: Details
• 3 bits of a 32 bit variable in RAM are used to synchronize all 3 cores; after synchronization the cores should proceed independently.
• An atomic operation is applied to the 32 bit variable to modify the ready bit for each core.
• Because the synchronization may often be necessary, the developer decided to cache the variable.
• However, atomicity refers only to the physical memory location (in this case the cache, and not LMU RAM)! Atomicity was wanted, but not really guaranteed, hence a race condition was provoked!
• Incorrect status of the variable (due to the race condition) led to deadlock.
• Incorrect solution of the developer: change timing via cross bar priorities -> software was executing without deadlock, but was still incorrect!
• Correct solution: Do not cache globally used variables!

Trace, Measure and Calibrate in Target System with Emulation Devices (EDs)
[Block diagram (ED_block_diagram.vsd) of the TC1798ED. Labels: Product Chip Part (SoC); Cerberus; Shared Bus I/F (DMA); MLI0; DAP/JTAG; SBCU; PCP2; SRI XBAR; TC1.6; BOB; POB; LMU; EEC; EMLI; Message Sequencer; MCX; MCDS; IOC32; DMC; Emulation Memory EMEM (768 KB SRAM); Back Bone Bus BBB; EBCU]
• Tracing as non-intrusive technology (with regard to timing) gains importance for multi-core software development
• EDs: Same package, no additional pins needed
• Calibration memory with same access speed as flash

Flow for using MCDS
Stages from the Target to the Debug Tool:
– Execution Control: Establish test case
– Trace Qualification: Configure “trigger, event, action”
– Trace Message: Generate only relevant information
– Trace Storage: Save messages chronologically
– Trace Reconstruction: Interpolate history
– Debug Tool: Present in human readable form
Annotations on the slide: GBits, mHz
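
The ready-bit synchronization from the real life example above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the project described on the slides: it uses standard C11 atomics, and the names `sync_flags`, `core_signal_ready`, `all_cores_ready` and `wait_for_all_cores` are hypothetical. The decisive point from the slides is not visible in the C code itself: the variable has to be placed in shared, non-cached memory, which is done through a toolchain-specific section or attribute that is deliberately not shown here. How the atomic read-modify-write maps to a hardware instruction also depends on the compiler and target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CORES      3u
#define ALL_READY_MASK ((1u << NUM_CORES) - 1u)   /* bits 0..2 */

/* Assumption: the linker places this variable in shared, NON-cached memory
 * (toolchain-specific, not shown), so that every core performs its atomic
 * update on the same physical location. */
static _Atomic uint32_t sync_flags;   /* zero-initialised: no core ready */

/* Called once by each core: atomically sets that core's ready bit. */
void core_signal_ready(unsigned core_id)
{
    assert(core_id < NUM_CORES);
    atomic_fetch_or(&sync_flags, (uint32_t)1u << core_id);
}

/* True once all three ready bits are set. */
bool all_cores_ready(void)
{
    return (atomic_load(&sync_flags) & ALL_READY_MASK) == ALL_READY_MASK;
}

/* Busy-wait until every core has signalled; afterwards the cores proceed
 * independently. */
void wait_for_all_cores(void)
{
    while (!all_cores_ready()) {
        /* spin */
    }
}
```

Each core would call `core_signal_ready` with its own index and then `wait_for_all_cores`. If the variable were cached per core, each read-modify-write would be atomic only with respect to that core's cached copy, which is the failure described on the slide.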

Summary
• Reaching higher performance with more cores rather than with core and memory hierarchy optimizations might simplify WCET analysis, if there is little interference between the cores
• Powerful protection and tracing mechanisms will be needed to assure non-interference or at least assess the degree of interference
• AURIX has a focus on protection mechanisms to avoid interference (where not needed) and powerful tracing options to assess interference (where unavoidable)

Timing and Performance Analysis Partners, Acknowledgements
Academia Partners:
• Prof. Dr. Bernhard Bauer, Augsburg University
• Prof. Dr. Heiko Falk, Ulm University
Industry Partners:
• AbsInt Angewandte Informatik GmbH, Gliwa GmbH, Symtavision GmbH, Timing Architects GmbH
• Debugger Vendors with Trace support: iSYSTEM AG für Informatiksysteme, Lauterbach GmbH, PLS Programmierbare Logik & Systeme GmbH
Acknowledgements:
• Prof. Dr. Claire Maiza (Grenoble INP/ Ensimag)
• Reinhard Deml, Frank Hellwig, Pawel Jewstafjew, Dr. Albrecht Mayer (Infineon)

Thank you for your attention