RISC-V “Rocket Chip” SoC Generator in Chisel

Transcription

RISC-V “Rocket Chip” SoC Generator in Chisel
RISC-V “Rocket Chip”
SoC Generator in Chisel
Yunsup Lee
UC Berkeley
[email protected]
What is the Rocket Chip SoC Generator?
§  Parameterized SoC generator written in Chisel
§  Generates Tiles
- (Rocket) Core + Private Caches
§  Generates Uncore (Outer Memory System)
- Coherence Agent
- Shared Caches
- DMA Engines
- Memory Controllers
§  Glues all the pieces together
2
“Rocket Chip” SoC Generator
Tile
Rocket
Core
ROCC
Accel.
FPU
Tile
Rocket
Core
ROCC
Accel.
FPU
L1 Inst
L1 Data
L1 Inst
L1 Data
sets,
ways
sets,
ways
sets,
ways
sets,
ways
L1 Network
Coherence Manager
HTIF
§  Generates n Tiles
- (Rocket) Core
- RoCC Accelerator
- L1 I$
- L1 D$
§  Generates HTIF
- Host DMA Engine
§  Generates Uncore
- L1 Crossbar
- Coherence Manager
- Exports MemIO
Interface
TileLink / MemIO Converter
3
Why SoC Generators?
§  Helps tune the design under different
performance, power, area constraints, and
diverse technology nodes
§  Parameters include:
- number of cores
- instantiation of floating-point units, vector units
- cache sizes, associativity, number of TLB entries,
cache-coherence protocol
- number of floating-point pipeline stages
- width of off-chip I/O, and more
4
Why Chisel?
§  RTL generator written in Chisel
- HDL embedded in Scala
§  Full power of Scala
for writing generators
- object-oriented
programming
- functional
programming
Chisel Program
Scala/JVM
C++
code
FPGA
Verilog
ASIC
Verilog
C++ Compiler
Software
Simulator
FPGA Tools
ASIC Tools
FPGA
Emulation
GDS
Layout
5
Rocket Scalar Core
PC
PC
Gen.
IF
ITLB
I$
Access
bypass paths omitted
for simplicity
ID
Int.RF
Inst.
Decode
EX
Int.EX
FP.RF
MEM
DTLB
D$
Access
FP.EX1
WB
Commit
To RoCC
to Hwacha
Accelerator
FP.EX2
FP.EX3
Rocket Pipeline
VITLB
- 64-bit
5-stage
single-issue
in-order pipeline
PC
VInst.
SeqBank1 ...
Bank8
Expand
VI$
- Design
minimizes
impact
of
long
clock-to-output
delays
Gen.
Decode
uencer
R/W
R/W
Access
of compiler-generated RAMs
from Rocket
- 64-entry
BTB, 256-entry BHT, 2-entry RAS Hwacha Pipeline
- MMU supports page-based virtual memory
- IEEE 754-2008-compliant FPU
-  Supports SP, DP FMA with hw support for subnormals
6
ARM Cortex-A5 vs. RISC-V Rocket
Category
ARM Cortex-A5
RISC-V Rocket
ISA
32-bit ARM v7
64-bit RISC-V v2
Architecture
Single-Issue In-Order
Single-Issue In-Order 5-stage
Performance
1.57 DMIPS/MHz
1.72 DMIPS/MHz
Process
TSMC 40GPLUS
TSMC 40GPLUS
Area w/o Caches
0.27 mm2
0.14 mm2
Area with 16K Caches
0.53 mm2
0.39 mm2
Area Efficiency
2.96 DMIPS/MHz/mm2
4.41 DMIPS/MHz/mm2
Frequency
>1GHz
>1GHz
Dynamic Power
<0.08 mW/MHz
0.034 mW/MHz
- PPA reporting conditions
-  85% utilization, use Dhrystone for benchmark, frequency/power
at TT 0.9V 25C, all regular VT transistors
- 10% higher in DMIPS/MHz, 49% more area-efficient
7
HTIF: Host-Target Interface
§  UC Berkeley specific block mainly used to
emulate devices for simple test chips
- Emulates system calls, console, block devices,
frame buffer, network devices
- No need for this block once the SoC has actual
devices on the target machine
§  Consider it as a “host DMA engine”
§  A port for for host system to read/write
- Core CSRs (control and status registers)
- Target Memory
8
Important Interfaces in the Rocket Chip
Tile
Rocket
Core
Tile
Rocket
Core
ROCC
Accel.
ROCCIO
FPU
HTIF
HTIFIO
ROCC
Accel.
FPU
HostIO
L1 Inst
L1 Data
sets,
ways
sets,
ways
sets,
ways
sets,
ways
client
client
client
client
Tile
L
L1 NetworkO
Lin
Tile
kIO
Li
Tile
nkI
client
arb
TileLink
L1 Data
inkI
O
L1 Inst
Coherence Manager
mngr
mngr
TileLink / MemIO Converter
TileLink
arb
TileLinkIO
client
MemIO
§  ROCCIO
- Interface between
Rocket/Accelerator
§  HTIFIO
- Read/Write CSRs
§  TileLinkIO
- Coherence Fabric
§  MemIO
- Simple AXI-like
memory interface
§  HostIO
- Host Interface to
HTIF
9
TileLinkIO
Client
Client
Finish
Grant
Release
Probe
Acquire
Cache
Finish
Grant
Release
Probe
Acquire
Cache
Manager
- TileLinkIO consists of Acquire, Probe, Release, Grant,
Finish
10
UncachedTileLinkIO
Client
Client
Finish
Grant
Acquire
Finish
Grant
Release
Probe
Acquire
Cache
Manager
- UncachedTileLinkIO consists of Acquire, Grant, Finish
- Convertors for TileLinkIO/UncachedTileLinkIO in uncore
library
11
MemIO
Master
MemReqCmd
Slave
MemReqCmd.valid
MemReqCmd.ready
Decoupled(MemData)
Decoupled(MemResp)
- MemReqCmd consists of addr, rw (write=true), tag
- MemData consists of 128 bit data payload
- MemResp consists of 128 bit data payload, tag
- Decoupled(interface) means an interface with ready/
valid signals
12
ROCCIO
§  Rocket sends
Rocket
Decoupled(Cmd)
Decoupled(Resp)
ROCC
Accel.
§ 
CacheIO
busy
IRQ
§ 
supervisor bit
§ 
§ 
UncachedTileLinkIO
§ 
PTWIO
exception
§ 
coprocessor instruction
via the Cmd interface
Accelerator responds
through Resp interface
Accelerator sends
memory requests to
L1D$ via CacheIO
busy bit for fences
IRQ, S, exception bit
used for virtualization
UncachedTileLinkIO for
instruction cache on
accelerator
PTWIO for page-table
walker ports on
accelerator
13
HTIFIO
HTIF
reset
Tile
core_id
Decoupled(CSRReq)
Decoupled(CSRResp)
Decoupled(IPIReq)
Decoupled(IPIResp)
- reset signal and core_id routed from HTIF (historical
reasons nothing technical)
- CSR Read/Write requests go through CSRReq/CSRResp
- IPI Requests go through IPIReq/IPIResp
- HTIFIO likely to be modified in the near future
14
Rocket Chip C++ Emulator Setup
Rocket Chip
HostIO
RISC-V
Frontend
Server
MemIO
DRAMSim2
15
Rocket Chip FPGA Setup
Rocket Chip
HostIO
MemIO
HostIO/AXI
Convertor
MemIO/AXIHP
Convertor
AXI HP
AXI
AXI Master
AXI HP Slave
ARM
Processing System
RISC-V Frontend Server
DDR3
DRAM
16
Rocket Chip Berkeley Test Chip Setup
MemIO
Rocket Chip
MemIO
Serializer
HostIO
M
em
Se
HostIO/AXI
Convertor
MemIO
lIO
MemIO
Deserializer
AXI HP
AXI
AXI Master
MemIO/AXIHP
Convertor
ria
AXI HP Slave
ARM
Processing System
RISC-V Frontend Server
DDR3
DRAM
17
Rocket Chip “SoC” Setup
Interrupts
Rocket Chip
TiileLinkIO
Devices
Uncached
TiileLinkIO
MemIO
LPDDR3
Memory Controller
Me
mIO
LPDDR3
DRAM
18
Who should use the Rocket Chip Generator?
People who would like to develop …
Tile
Rocket
Core
Tile
Rocket
Core
ROCC
Accel.
ROCCIO
FPU
HTIF
HTIFIO
ROCC
Accel.
FPU
HostIO
L1 Inst
L1 Data
sets,
ways
sets,
ways
sets,
ways
sets,
ways
client
client
client
client
Tile
L
L1 NetworkO
Lin
Tile
kIO
Li
Tile
nkI
client
arb
TileLink
L1 Data
inkI
O
L1 Inst
Coherence Manager
mngr
mngr
TileLink / MemIO Converter
TileLink
arb
TileLinkIO
client
§  A RISC-V SoC
-  Look into Chisel
parameters
§  New Accelerators
-  Drop in at ROCCIO level
§  Own RISC-V Core
-  Drop in at TileLinkIO level
or MemIO level
§  Own Device
-  Drop in at TileLinkIO or
UncachedTileLinkIO
MemIO
19
New Features: L2$ with Directory Bits
Tile
ROCC
Accel.
ROCC
Accel.
FPU
L1 Inst
L1 Data
L1 Inst
L1 Data
sets,
ways
sets,
ways
sets,
ways
sets,
ways
client
client
client
client
L1 Network
L2Cache
client
arb
ManagerL2Cache
L2CacheCoherence
L2Cache
L2Cache
mngr
mngr
mngr
mngr
mngr
sets,
ways
sets,
ways
sets,
ways
sets,
ways
sets,
ways
client
client
client
client
client
mngr
§  Shared L2$ with
TileLink / MemIO Converter
TileLink
FPU
Rocket
Core
HTIF
multiple banks
§  Each L2$ will act as a
coherence manager
with directory bits
(snoop filter)
§  These caches can be
composed to build
outer-level caches
such as an L3$
TileLink
Rocket
Core
Tile
20
New Features: ROCC interfaces with L2$
Tile
ROCC
Accel.
FPU
L1 Inst
L1 Data
L1 Inst
L1 Data
sets,
ways
sets,
ways
sets,
ways
client
client
client
ROCC
Accel.
client
L1 Network
L2Cache
to the L2$ to
address more data
Rocket
Core
client
arb
ManagerL2Cache
L2CacheCoherence
L2Cache
L2Cache
mngr
mngr
mngr
mngr
mngr
sets,
ways
sets,
ways
sets,
ways
sets,
ways
sets,
ways
client
client
client
client
client
mngr
TileLink / MemIO Converter
TileLink
FPU
§  ROCC talks directly
HTIF
TileLink
Rocket
Core
Tile
21
New Features on the Deck
§  Dual-issue Rocket Core
§  Hwacha Vector Unit (checkout hwacha.org)
§  Dump MemIO and use AXI
22