How to Bridge the Abstraction Gap in System
Level Modeling and Design
A. Bernstein, Intel Corp., Israel
M. Burton, ARM, UK
F. Ghenassia, STMicroelectronics, France
Abstract— As more and more processors and subsystems are
integrated in a single system, the verification bottleneck is
driving designers away from RTL and RTL-like strategies for
verification and design toward higher abstraction levels.
Increasing system complexity, on the other hand, requires much
faster simulation and analysis tools. This is leading to new
standards and tools around transaction level modeling.
Languages such as SystemC and SystemVerilog are rich in
behavioral and structural constructs which enable modeling
designs at different levels of abstraction without imposing a
top-down or bottom-up design flow. In fact, most design flows
are iterative, and modules at different levels of abstraction have
to be considered. A more abstract model is very useful to
increase simulation speed and to improve formal verification.
SystemC and SystemVerilog stress the importance of
verification support for complex SOCs, including improvements
for hardware verification as well as for the verification of
hardware-dependent software. In today's design flows the
software development can often only start after the hardware is
available. This causes unacceptable delays for the software
development. The idea of transaction level modeling (TLM) is to
provide transaction level models of the hardware in an early
phase of the hardware development. Based on these TLMs, a
sufficiently fast simulation environment forms the basis for the
development of hardware and hardware-dependent software.
The expectation is that these transaction level models run at
several tens to hundreds of thousands of transactions per
second, which should be fast enough for system level modeling
and verification.
I. INTRODUCTION
Developing an Integrated Circuit today is a totally
different experience compared to the same task a decade
ago. It used to be that the specification of an IC could be
defined adequately in terms of input-output relationship. The
relationship might have been deterministic or statistical in
nature but was self-contained. The development team could
design and validate an IC based on these concise
requirements. Designers were mainly concerned about
executing the "RTL to GDSII" flow. With ever finer silicon
geometries and densities this flow continues to be challenging
today, however it now represents only a small portion of a
bigger picture.
0-7803-8702-3/04/$20.00 ©2004 IEEE.
In today's systems an IC can rarely be designed
in isolation from the rest of the system. For example, in the
world of cost sensitive Cellular and Handheld products it is
critical to tune the architecture to meet performance, cost and
power requirements. With RF, processor, graphics and
memory subsystems incorporated, and user applications
running on top of complex software stacks and an operating
system, the real challenge is to optimize the "Antenna to
Application" flow.
Use of spreadsheets or hand calculations is not sufficient to
cope with the complex hardware and software interactions.
The system is so complicated that end-to-end modeling has
become essential for the assessment of performance and
effectiveness prior to design completion.
II. DESIGN FLOW
The ideal design flow would encompass product
development from preliminary requirements definition to
system performance measurements. Ideally the product
requirement specifications should be captured in an
executable form (a "program") and then continuously refined
into the final product design. Such a product would be correct
by construction. Its performance would be formally traceable
to the initial requirement specifications and its quality would
be limited only by the quality of the tool managing the flow.
In recent years this holistic approach has become usable in
certain limited cases, one notable example
being Processor Design [1]. However for the general case this
is still not feasible today. The best known methods today still
need to apply different tools to different steps of the product
development flow and do not formally guarantee end-to-end
equivalence. Understanding the gaps in the capabilities of
today's flows is the first step in solving the problem.
Examples from ARM, Intel (especially Intel's Cellular and
Handheld Group) and STMicroelectronics will illustrate
different aspects of this ideal flow and relate them to the
current status of design flows. Before the different applications
are described, Transaction Level Modeling (TLM) will be introduced.
III. TRANSACTION LEVEL MODELING
Transaction Level Modeling (TLM) addresses a number of
practical system level design problems. The OSCI Transaction
Level Working Group (TLMWG) has identified a number of
abstraction levels at which modeling can take place [3, 5].
AL (Algorithmic): At the algorithmic level, there is no
distinction made between hardware and software.
SW (Software View): At this level, there is a division made
between hardware and software. The model is at least suitable
for programmers to develop their software. Of course, not
every system will have programmable elements. This level is
also referred to as the Architectural view.
HW (Hardware View): Finally, this level has enough
information for hardware engineers to develop both the device
itself and the devices surrounding it. It may not have the
fidelity of the RTL, but enough for the hardware designer.
This level is also referred to as the Micro-Architectural view.
Orthogonally to this, two technologies have been suggested
that people are using to build models. Models are either built
in a "parallel" style or a "sequential" style (we have avoided
using the terms blocking and non-blocking because, to
different people, they mean different things!). Parallel models break
functionality down into units that can be executed at the same
time. Communication between these units is often achieved
using a “FIFO” style interface. The expectation on this
interface is that the initiator to target path and the target to
initiator response path will be separate.
In a sequential model, functionality is broken down into
sub-functional blocks which call each-other in sequence. An
initiator will call a target for a “service”, and receive a
response directly. Communication between these units is often
achieved using a function call interface. The function call
typically transports information both from an initiator to a
target and back from the target to the initiator.
The expectation on this interface is that the initiator to
target path will be combined with the target to initiator response.
(Of course it is totally possible to use either technology to
replicate the other, but that serves no purpose other than to
hide the problem!)
In addition to these abstraction levels, several “types” of
model have been described that operate over one or more
abstraction levels and use one (or, in the case of PVT, both)
of the model technologies.
In order to visualize them, it is convenient to describe a
modeling space (Fig. 1). The horizontal axis describes how
fine- or coarse-grained the split of the model's functionality is.
The vertical axis describes the level of abstraction that the
model attempts to represent.
The types that have been identified are:
1. AL : Algorithmic
2. CP : Communicating Processes
3. PV : Programmers View
4. PVT : PV + Timing
5. TA : (Interconnect) Transaction Accurate
6. CC : Cycle Callable
7. RT : RTL level
Fig. 1: TLM space. The vertical axis runs from Functional
(explore algorithms, concurrent processing, software
development) down to Implementation (hardware
development, RTL); the horizontal axis runs from Parallel to
Sequential. AL, CP, PV, PVT, TA, CC and RTL occupy
regions of this space, with hardware/software partitioning and
benchmarking in between.
IV. TLM BRIDGING THE GAP
Since similar technology can be used at many different
abstraction levels, the issue is not how to bridge the
abstraction gap, but how, or whether to mix simulations of
different technologies.
Using one technology, it is possible to write a model at the
algorithmic level, and move all the way down to a hardware
view.
However, fundamentally the combined initiator-target-initiator
path of a function call paradigm is very different from
the separate initiator-to-target and target-to-initiator dual
paths of a FIFO-style interface.
Attempting to mix PV models with TA models is a tricky
problem. One conclusion is to only use (say) a FIFO-style
interface. But there are difficulties with this approach.
At this point it is worth re-visiting the key fundamental
reasons that modeling is used to reduce time to market. There
are three principal areas in which this can be achieved: early
embedded software development, performance analysis and
functional verification.
Possibly the single biggest effect can be achieved by
commencing early embedded software development activity
while the hardware is still unavailable. In order to provide
models of systems that are suitable for software engineers, the
software programmer will have two principal considerations:
speed (ideally "real time"), and the ability of the environment
to ensure that the software written will work on the hardware.
At the heart of a model used for software development will
be a programmable device. For these, using today's
technology, a model encapsulating the same (software view)
information can be significantly faster when written using a
function call paradigm than one using a FIFO-style interface.
Hence, today, the majority of models used for this task are
based on function call technology.
The second of the two requirements is interesting in its own
right. It is often confused and misinterpreted as a requirement
to exactly replicate the hardware. In fact in some cases this is
an inappropriate decision as it may be very difficult to find
hardware/software interaction bugs on the hardware (or an
exact replication of it). A model can potentially do “better”
than the hardware for this task by specifically stressing the
software. This will be explained in more detail in the example
from STMicroelectronics in the following section.
V. TLM BASED DESIGN AND VERIFICATION FLOW AT
STMICROELECTRONICS
Multi-million gate circuits currently under design with the
latest CMOS technologies do not only include hardwired
functionality but also embedded software, running most often
on more than one processor. In fact, they embed so much
functionality of the system they are designed for that they
have become systems on their own! This extended scope is
driving the need for extensions of the traditional RTL-to-Layout
design and verification flow. These extensions must
address three key topics: early embedded software
development, performance analysis and functional verification.
The embedded software accounts for more than half of the
total expected functionality of the circuit, and most of the
modifications that occur during the design of a chip based on
an existing platform are software updates. An obvious
consequence is that the critical path for the development of
such a circuit is the software, not the hardware. Enabling
software development to start very early in the development
cycle is therefore of paramount importance to reduce the
time-to-market. At the same time, it is worth noticing that
adding a significant amount of functionality to an existing
core platform may have a significant impact on the real-time
behavior of the circuit, and many applications that these chips
are used in have strong real-time constraints (e.g. automotive,
multimedia, telecom). It is therefore equally important to be
able to analyze the impact of adding new functionality to a
platform with respect to the expected real-time behavior. This
latter activity relates to the performance analysis of the
defined architecture. The functional verification of the IPs
that compose the system, as well as of their integration, has
also become crucial. The design flow must support an
efficient verification process to reduce the development time
and also to avoid silicon re-spins that could jeopardize the
return on investment of the product under design.
At STMicroelectronics, one direction to address the above
issues is to extend the CAD solution proposed to product
divisions, known as Unicad, beyond the RTL entry point; this
extension is referred to as the System-to-RTL flow [4] (Fig. 2).
As the current ASIC flow mainly relies on three
implementation views of a design, namely the layout, gate and
RTL levels, the extended flow adds two new views: TLM and
algorithmic.
Fig. 2: Unicad System-to-RTL flow (the customer system spec
is refined through algorithmic, TLM and RTL views down to
the embedded SW, supporting SoC performance analysis and
SoC verification)
The algorithmic view models the expected behavior of the
circuit, without taking into account how it is implemented.
The SoC architecture (i.e. SoC TLM platform) captures
all information required to program the embedded software of
the circuit, using SystemC [5].
The SoC micro-architecture (i.e. SoC RTL platform)
captures all information that enables cycle-accurate
simulations. Most often, it is modeled at the register-transfer
level (RTL) using VHDL or Verilog. Such models are almost
always available because they are used as input for logic
synthesis.
VI. DESIGN FLOW FOR HANDHELD MOBILE TERMINAL AT
INTEL
A modern handheld mobile terminal includes two major
elements, the communication subsystem and the application
subsystem. The communications subsystem – sometimes
referred to as the "modem" – handles data flow between the
antenna and the digital bit stream. The application subsystem
can best be described as a general purpose computer platform,
running user applications.
Historically, the communications subsystem has been the
more challenging to develop and received more attention. Its
performance, being easily measurable, is dictated by
international standards which are strictly enforced by
powerful mobile carriers. Following many years of academic
research and industrial experience, the art of modem design
and validation has progressed to a stage where its performance
can be specified, modeled and verified to sub-dB resolution.
Starting from a floating-point model and moving to a fixed-point model, the modem performance is simulated in the
presence of a noisy channel. The fixed-point model is
manually transformed into an implementation, partially DSP
firmware and partially dedicated digital logic circuitry. The
fixed-point model is bit-exact, i.e. at the same abstraction level
as the actual implementation. Written in plain C and running
on a small Linux computer farm, the simulation speed is
adequate to suit the development team needs.
The application subsystem environment is quite different.
Its processing needs are not as regular and predictable, and are
heavily influenced by end-user compound usage scenarios.
There are no pre-set minimum performance targets governed
by a regulatory body nor are there any established
benchmarks. Historically the approach was to bundle an
available CPU core and memory subsystem and accept the
resulting performance ("you get what you get"). While this
was sufficient for the first generation of data-enabled phones,
it is no longer adequate for modern 3G devices with their
heavy multimedia workloads. As vendors differentiate
themselves by optimizing in multiple dimensions (power, cost,
speed), a proper modeling infrastructure becomes essential.
Such infrastructure includes the following basic elements:
modeling engine, model, workload collection and results
analysis.
The modeling engine is sometimes referred to as the
"simulator". Modeling of silicon can be done at different
abstraction levels. RTL modeling is very detailed and allows
direct synthesis to gates and layout. RTL simulation is useful
for validation at the module level and above; however, the
slow speed makes it practically impossible to use the
complete chip RTL for system simulations. At the other end of
the possible range of abstractions is functional simulation,
accurately modeling the instruction execution flow of a
processor. This is much faster and useful for software
development but not for performance analysis. In between
these alternatives lies Transaction Level Models (TLM),
which includes timing information. Mixed mode modeling is
useful to allow the inclusion of RTL models into the system
model. Although very slow, including an RTL unit allows
cross-verification of the two models.
VII. SYSTEMC APPLICATION AT CHG
Intel's Cellular and Handheld Group (CHG) has chosen
SystemC as their standard modeling language [2]. The
technique of developing a functional model is familiar to
most engineers and has been an established practice in the
world of CPU design for decades. Likewise, RTL modeling is
well understood and used for silicon design. C is typically
used to develop functional models, and converting such a
model to SystemC is a simple exercise. Converting Verilog or
VHDL RTL to SystemC is also simple (or can be avoided by
mixed-mode simulations). Unfortunately neither functional
nor RTL is the right abstraction level for effective system
modeling. It is the TLM level that brings most rewards, but
also poses most challenges in model development. Design
engineers are not used to thinking at this abstraction level, and
trade-offs exist between development speed, accuracy and
runtime speed. The required accuracy level, lower than 100%,
has to be derived empirically. It is obvious that lower accuracy
models are quicker to develop and run faster, but when
validation and maintenance costs are considered the decision
becomes more complicated. TLM technology is evolving and
there are no agreed classifications as yet. CHG developed its
own TLM, which can be characterized as "cycle counting
accurate". This TLM provides higher simulation speed at the
cost of slightly lower accuracy. It complements a
commercially-available "cycle accurate" TLM.
Having settled on the simulator and the model, the next step
is to develop the input stimuli (workload). In the past, system
models were rarely constructed. Only very simple
benchmarks, like Dhrystone or MPEG decode, were run on
simple CPU models to assess CPU performance. Since small
benchmarks typically fit inside the instruction cache, the
impact of external program memory can be completely
overlooked. To verify that a cellular handset will not drop a
call or that a graphics-intensive game will perform properly, it
is necessary to port real-world software and operating systems
to the model. This places a lower bound on the simulation
speed (around 1 MHz); an operating system boot must not
take more than a few minutes, otherwise software debugging
becomes impractical. Porting large software packages must
also be supported by a proper software debugging
environment.
Finally, having the system running, processing workloads
and producing an output (for example a graphics frame
image) enables performance optimization. To do this, peeking
into internal system nodes and resources is necessary.
Typically, information about interconnect and buffer usage is
needed and runtime statistics have to be collected. The
analysis of this data allows locating bottlenecks or identifying
redundant resources. Since the SystemC model, at this point,
is just a compiled C program, it is tempting to think that
generic C code debuggers can do the job. While true to some
extent, a generic C debugger is not aware of the SystemC
higher-level constructs. Dedicated, commercial SystemC-aware
analysis tools provide significant value at this step.
VIII. CONCLUSION
The success criterion for any modeling-related investment
is the impact it has on product architecture. To have an impact,
the analysis results must be available early enough in the
product development process, when tradeoffs can still be made.
In practice the chip design team will not wait for modeling
results and will make ad-hoc decisions as needed to match the
chip design and tape-out schedule.
Timely delivery depends on proper schedule planning,
modeling tool ("engine") selection and model development
strategy. Schedule planning means the modeling activity has
to start early. The point in which the design team is already
struggling with system performance issues is too late to start
and will not result in a real architectural impact. Therefore the
ability to develop models in-house is essential. The turnaround for custom model development by third parties is such
that models quickly become stale. Attempts to bring the
modeling engine development in-house were abandoned
because the large investment could not be justified as
SystemC tools became available outside. An early project
engagement during the first requirements collection phase is
supported by a standard SystemC-based tool infrastructure
and in-house model development for non-standard IP
modules.
TLM, especially in connection with SystemC, is a new
modeling and design style to support this process. But the
problem, as described above, is the mixing of different
modeling technologies, which is always very hard. Especially
for the purposes of performance analysis and verification,
lower-level models with detailed timing are very important.
One aspect of both verification and performance analysis is
typically replacing components written at a higher level of
abstraction with those written at a lower level, and finally with
RTL. In order for this to work, it is clearly preferable to use a
model that is written using a similar technology, such as a
FIFO-based interface model.
What becomes important from an IP provider's perspective
is to be able to support all the possible combinations of design
flows and use models. Fortunately the TLM working group
has only identified two basic technologies: a function call
paradigm suitable for "PV" and "PVT" models, and a FIFO
interface paradigm. IP providers like ARM can then satisfy
most of the people most of the time by providing two basic
classes of models. For EDA vendors, the challenges are not
only to provide means by which models can be progressively
refined through different abstraction levels, but also to provide
means by which models of different technologies can be
deployed.
REFERENCES
[1] A. Hoffmann, H. Meyr and R. Leupers, "Architecture
Exploration for Embedded Processors with LISA",
Kluwer Academic Publishers, 2002
[2] T. Groetker, S. Liao, G. Martin and S. Swan, "System
Design with SystemC", Kluwer Academic Publishers,
2002
[3] W. Müller, W. Rosenstiel, J. Ruf (Eds.), "SystemC:
Methodologies and Applications", Kluwer Academic
Publishers, 2003
[4] A. Clouard, K. Jain, F. Ghenassia, L. Maillet-Contoz, and
J. P. Strassen, “Using transactional level models in a SoC
design flow”, in: [3]
[5] “SystemC 2.0 Functional Specification” – Open SystemC
Initiative, 2000