Werden sich Chips in Zukunft selbst heilen? - Eikon eV

Transcription

Werden sich Chips in Zukunft selbst heilen? - Eikon eV
Werden sich Chips in Zukunft selbst heilen?
Einsatz von Autonomic Computing Prinzipien in künftigen
System on Chip Architekturen
A. Herkersdorf
Lehrstuhl für Integrierte Systeme
TU München
www.lis.ei.tum.de
Outline
Integrated Systems (R)Evolution
Moore’s Law continues and enables SoCs of ever more
increasing complexity, heterogeneity
Conservative worst-case design approach no longer
feasible when reaching physical CMOS limits
Autonomic ASoC Design Concept – Conquer Complexity
with Complexity
ASoC Architecture Platform
ASoC Research Challenges
Acknowledgements:
Conclusions
This is joint ongoing work with
W. Rosenstiel, Uni Tübingen
Autonomic SoC - 2
© Lehrstuhl für
Integrierte Systeme
1
Source: ITRS
Microprocessor
complexity
Lmin
109
108
106
105
104
Pentium 4
107
103
Pentium
Pentium III
80486
Pentium II
80386
106
100
80286
8086
105
8008
104
4004
103
10
1
Designer Productivity
0.1
10
5
2
1
0.5
0.2
0.1
0.05
Min.transistor channel length (µm)
1011
1010
Designer Productivity (K trans. / PM)
Proposition:
1968 1972 1976 1980 1984 1988 1992 1996 2000 2004 2008 2012
Year
Challenges originating from CMOS scaling:
New conceptual approach
in IC design method
necessary
Incorporating autonomic
principles into SoC
architecture and design
tools
Increasing fabrication defects and
Coping with billion transistor design
sensitivity for stochastic events from
reaching CMOS physical limits
Increasing dynamic and static power
dissipation densities
complexities at all abstraction levels
“Productivity gap” increasing
engineering costs and/or development
time cycles
© Lehrstuhl für
Integrierte Systeme
Autonomic SoC - 3
Last Year’s EIKON Seminar
Aktuelle Herausforderungen an die EDA
1,000
10,000
Transistors / Staff Month.
1,000
100
58%/Yr. compounded
Complexity growth rate
10
100
10
1
x
0.1
xx
0.01
xx
x
x
1
21%/Yr. compound
Productivity growth rate
x
0.1
0.01
2007
2009
2005
2001
2003
1999
1997
1995
1993
1989
1987
1985
1983
0.001
1981
Complexity
Systemkomplexität
Siliziumkomplexität
Fertigungsschwankungen
Sinkende Zuverlässigkeit
…
Transistor per Chip (106)
Transistors / Chip
Productivity
(K) Transistors / Staff - Mo.
100,000
10,000
1991
Chipcapacity (transistors per chip)
Integrated Systems (R)Evolution
Source: U. Schlichtmann, “Das Zeitalter der
Entwurfsautomatisierung für Integrierte
Schaltungen”, EIKON Mitgliederversammlung,
3.02.2004
Autonomic SoC - 4
© Lehrstuhl für
Integrierte Systeme
2
Last Year’s EIKON Seminar
Aktuelle Herausforderungen an die EDA
10,000
Transistors / Staff Month.
1,000
100
58%/Yr. compounded
Complexity growth rate
10
10
1
0.1
xx
0.01
xx
x
x
x
1
21%/Yr. compound
Productivity growth rate
x
0.1
0.01
Kritischer
Pfad
100%
2007
2009
2005
2001
2003
1999
1997
1995
1993
1991
1989
1987
1985
1983
1981
0.001
• Entwurfskosten
100
Productivity
(K) Transistors / Staff - Mo.
100,000
Transistors / Chip
1,000
Systemkomplexität
Nominale Bibliothekscharakterisierung
Siliziumkomplexität
NOM
Fertigungsschwankungen
Fertigungsschwankungen
Sinkende Zuverlässigkeit Abwägung möglich zwischen
• Produkteigenschaften
Korrelationen
…
• Fertigungskosten
Zellbibliothek
Complexity
Transistorcharakterisierung
Transistor per Chip (106)
10,000
ASIC-Design morgen (Statistisches
Design)
Source: U. Schlichtmann, “Das Zeitalter der
für Integrierte
Schaltungen”, EIKON Mitgliederversammlung,
f 3.02.2004
Parametrische Ausbeute
Entwurfsautomatisierung
tcrit
CLK
Autonomic SoC - 5
© Lehrstuhl für
Integrierte Systeme
Autonomic Computing – IBM Manifesto
Server Trends:
Inter-networked computer
systems witness ever increasing
complexity and diversity
Millions of lines of code
Heterogeneous OS and
application SW stacks
#1 I/T Challenge:
Dealing with the complexity of
I/T systems originating as a
consequence of technological
progress
How Does Nature Deal with
Complexity?
Example: human organism and its
autonomic nervous system
Conquer Complexity
with Complexity…
Sub-conscious selfcontrol, -management,
-organisation,
-healing capabilities
and learn to work
around imperfections
Figure source: Encarta Encyclopedia © Microsoft Corporation
Autonomic SoC - 6
© Lehrstuhl für
Integrierte Systeme
3
Autonomic System on Chip (ASoC)
Systems are defined at different levels:
Compute Servers, Internet Routers
“Box level”
Chip-level multi-processors, platformbased mixed-signal SoCs
“Chip level”
Why to embed “autonomic” concepts into chip HW layer?
Holistic approach:
SoC CMOS technology is basis of higher-order IT systems
Investigate and establish HW
“hooks” for higher layer autonomic software
Nanosecond control loop cycle
periods don’t allow for SW-in-theloop
HW defects may prevent SW
diagnosis activation
Autonomic SoC - 7
© Lehrstuhl für
Integrierte Systeme
Autonomic SoC – Future Perspectives
CPU
stand-by
SRAM
CPU
Network i/f
CPU
su
B
HW
accel.
SDRAM
Cntrl.
Dealing with defects:
Continuous, SoC component-
specific diagnosis determines and
disables faulty:
(sub-) building blocks,
interconnect resources,
memory regions
Active/Stand-by selection among
redundant IP functions
Determines minimum feature set
for current system operation
Peripherals
Autonomic SoC - 8
© Lehrstuhl für
Integrierte Systeme
4
Autonomic SoC – Future Perspective
CPU
stand-by
SRAM
su
B
Performance/Power Optimization:
fF
CPU
vV
When work load increases,
raw bus capacity is increased,
critical transactions among bus
HW
accel.
entities are prioritized,
voltage and frequency of CPU are
raised,
secondary interfaces are temp.
SDRAM
Cntrl.
deactivated,
redundant resources may be
activated for load balancing,
Network i/f
reconfigurable HW accelerators
Peripherals
are activated on demand to
facilitate CPU off-load
Autonomic SoC - 9
© Lehrstuhl für
Integrierte Systeme
Existing Techniques Reusable for ASoC
DVFS: dynamic voltage and frequency scaling
BIST: built-in-self test
On-Chip debugging aids: ChipScope, RISCWatch
Reconfigurable computing platforms
General area of fault tolerant systems
However,
... either not yet widely used, or mostly in isolation
Have to be made syntactically compatible and semantically coupled
Standardization in industry forum required
Autonomic SoC - 10
© Lehrstuhl für
Integrierte Systeme
5
ASoC Architecture Platform
FUNCTIONAL layer
Evolutionary approach with
compatibility to SoC design method
CPU
CPU
Bus
RAM
PU
Rededicate part of SoC capacity to
self-x surveillance and control
Autonomic
Network i/f
Element
functions in Autonomic SoC layer
Autonomic Element (AE) library
monitor
Closed HW control loop between
evaluator
FE and AE element
actuator
communication i/f
monitor, evaluate, memorize,
AUTONOMIC layer
(communicate), execute
Autonomic SW functions have
Maintaining state info on PU history
access to AE parameters
in AE elements can be exploited for
emergent system behaviour
Autonomic SoC - 11
© Lehrstuhl für
Integrierte Systeme
ASoC Architecture: Planned Activities
CPU:
Extension of DIVA/RAZOR concepts
Functional and runtime errors
Dynamic switch-over between
parallel EXE pipelines
Demonstrate with public domain soft
core (e.g. LEON2)
Memories:
On the fly interleaved rd/wr
AE Interconnect:
Shared Buffer-Insertion Ring
Native broadcast support
Simultaneous p2p connections
Simple comm. i/f
Distributed AE Control:
accesses (w/o parity, ECC) to
monitor memory locations
Detect / repair sporadic and
persistent bit defects
Exclude defective locations from
future accesses by addr. translation
Couple network / system i/f event
Autonomic SoC - 12
rate with DVFS and on-chip bus
data path width & frequency
modulation
© Lehrstuhl für
Integrierte Systeme
6
ASoC Research Challenges
New Architectures, Design Methods and Tools
Dealing with graceful degradation and redundancy in
distributed SoCs with changing structure
Interleaving and/or co-existence of functional and
autonomic layers
Validation techniques for partially verified interdependent
macros
Concepts for dynamic and coupled power/performance
mgmt.
Methods for flexible and dynamic HW/SW repartitioning
Defective HW is replaced on demand by equivalent SW
process
Dynamically loaded HW configuration replaces lowerperforming SW process
Autonomic SoC - 13
© Lehrstuhl für
Integrierte Systeme
Conclusions
Reliable design of complex SoCs is the key challenge for
nanoelectronics when CMOS technology approaches its
physical limits
Application of autonomic or organic computing principles
has been proposed to tackle this challenge
Requires conceptual change in SoC design process,
architecture enhancements, and tools support
Requires innovative contributions at all levels of
abstraction: technology, device, gate, circuit, system,
software
Thanks!
Autonomic SoC - 14
© Lehrstuhl für
Integrierte Systeme
7