Bixby - Hot Chips

Transcription

Copyright © 2013 Oracle and/or its affiliates. All rights reserved.
Bixby: the Scalability and Coherence Directory ASIC in Oracle's Highly Scalable Enterprise Systems
Thomas Wicki and Jürgen Schulz
Senior Principal Hardware Engineers, Microelectronics
Hot Chips 25 – August 25-27, 2013
The following is intended to outline our general product
direction. It is intended for information purposes only, and
may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality,
and should not be relied upon in making purchasing
decisions. The development, release, and timing of any
features or functionality described for Oracle’s products
remains at the sole discretion of Oracle.
Outline
 Motivation and Design Objectives
 M5 System and Beyond
 System RAS Features
 Implementation Details
 Debug and DFT Features
 Summary
Motivation
 M5 and M6’s direct interconnects scale up to 8 processors using Coherence Links (CL) (Glueless system)
 To enable systems to scale beyond 8 processors:
– Scalability Links (SL) were added to M5 and M6
– Bixby ASICs are needed (Glued system)
[Figure: M6 processors with their Coherence Link (CL) and Scalability Link (SL) connections]
Bixby Design Objectives
 Scalable up to 96 processors
 Communication switch between 8-processor SMPs
 Directory for L3 caches of all processors
 Multi-generation support
 Enabling mixed processor systems
 Enterprise-Class RAS feature set
 High bandwidth, low latency
[Figure: Bixby surrounded by three focus areas: Large System Scaling; Coherence Directory & Processing; Enterprise System Focus]
Challenges and Trade-Offs
 Directory Size
– Challenge: Large directory size requirement
– Solution: Scale up number of Bixbys with system size
 Directory Width
– Challenge: Massive number of L3 cache ways x number of processors per look-up
– Solution: Pipeline look-ups
 Switch Size
– Challenge: 24 x 24 crossbar efficiency
– Solution: Overprovision switching bandwidth
 Shared Resources
– Challenge: Some resources shared by multiple hardware domains
– Solution: Associate errors with single domain and clean up shared resources after error
Oracle’s M5-32 System
 32 M5 SPARC processors
 12 Bixbys
 4 physical (hardware) domains
 3.1TB/s payload coherence bandwidth
 1.5TB/s payload scalability bandwidth
M5-32 System Coherence Interconnect
 Coherence Links (CL)
– 12 lanes per direction
 Scalability Links (SL)
– 4 lanes per direction
 12Gbps per lane
 7 CLs + 6 SLs per processor
 16 SLs per Bixby
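As a rough cross-check of the link counts above, the following sketch multiplies out the numbers from the M5-32 slides. Two points are assumptions of this sketch rather than statements from the slides: that each CL joins two processors (so the per-processor count is halved), and that raw signaling is summed over both directions. The slides do not say how the quoted payload figures of 3.1TB/s and 1.5TB/s are derived from the raw rate.

```python
# Figures taken from the M5-32 slides.
LANE_GBPS = 12
CL_LANES_PER_DIRECTION = 12
SL_LANES_PER_DIRECTION = 4
PROCESSORS = 32
BIXBYS = 12
CLS_PER_PROCESSOR = 7
SLS_PER_PROCESSOR = 6
SLS_PER_BIXBY = 16

# Every SL has a processor on one end and a Bixby on the other,
# so counting from either side must give the same total.
sl_links = PROCESSORS * SLS_PER_PROCESSOR
assert sl_links == BIXBYS * SLS_PER_BIXBY

# Assumption: each CL joins two processors, so halve the per-processor count.
cl_links = PROCESSORS * CLS_PER_PROCESSOR // 2


def raw_tb_per_s(links, lanes_per_direction):
    """Raw signaling rate in TB/s, summed over both directions."""
    gbps = links * lanes_per_direction * LANE_GBPS * 2
    return gbps / 8 / 1000


raw_sl = raw_tb_per_s(sl_links, SL_LANES_PER_DIRECTION)
raw_cl = raw_tb_per_s(cl_links, CL_LANES_PER_DIRECTION)
print(f"{sl_links} SLs, raw {raw_sl:.2f} TB/s")
print(f"{cl_links} CLs, raw {raw_cl:.2f} TB/s")
```

Under these assumptions both raw totals come out above the payload figures quoted for the system, which is the direction one would expect once framing and protocol overhead are subtracted.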
Scalability Link Bandwidth
Physical (Hardware) Domain Support
 Up to 12 physical domains
 Dynamically configurable by Service Processor
 Packet filtering and Physical Address fencing
 Errors resolved to physical domain
 Per-domain Cease Operation support
[Figure: 8-processor SMPs (S) grouped into Domains A, B, and C and connected through four Bixbys (BX)]
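The slides name packet filtering and physical address fencing without describing the mechanism. The sketch below is one hypothetical way to model the idea: each link belongs to one domain, and a packet passes only if its source and destination links share a domain and its address falls inside that domain's range. The `Domain` record, the link numbering, and the address ranges are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    links: frozenset  # link IDs owned by this domain (hypothetical numbering)
    addr_lo: int      # inclusive start of the domain's physical address range
    addr_hi: int      # exclusive end of the range


def accept(src_link, dst_link, addr, domains):
    """Return the owning domain's name if the packet may pass, else None."""
    for d in domains:
        if src_link in d.links:
            same_domain = dst_link in d.links          # packet filtering
            fenced_in = d.addr_lo <= addr < d.addr_hi  # address fencing
            return d.name if (same_domain and fenced_in) else None
    return None  # source link is not assigned to any domain


domains = [
    Domain("A", frozenset({0, 1}), 0x0000, 0x4000),
    Domain("B", frozenset({2, 3}), 0x4000, 0x8000),
]
print(accept(0, 1, 0x1000, domains))  # stays inside domain A
print(accept(0, 2, 0x1000, domains))  # crosses into domain B, rejected
```

Returning the domain name on success also hints at how an error could be attributed to a single physical domain, although the slides do not say how Bixby does this.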
5-of-6 Redundancy Mode
 System can boot with any 5 out of each group of 6 Bixbys
 Increases availability since system can be used until service is performed
[Figure: a normal configuration with six Bixbys (BX) per group and failover configurations with one failed Bixby, labeled "Still Boots!"]
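The boot rule can be written as a small check. This is a minimal sketch, assuming that Bixbys are listed in group order so that every consecutive run of six forms one redundancy group; the slides do not say how groups are assigned.

```python
GROUP_SIZE = 6
MIN_HEALTHY = 5


def can_boot(bixby_healthy):
    """bixby_healthy: one boolean per Bixby, listed in group order."""
    if len(bixby_healthy) % GROUP_SIZE != 0:
        raise ValueError("expected whole groups of 6 Bixbys")
    groups = [bixby_healthy[i:i + GROUP_SIZE]
              for i in range(0, len(bixby_healthy), GROUP_SIZE)]
    return all(sum(group) >= MIN_HEALTHY for group in groups)


# A 12-Bixby system (two groups) with one failure in each group still boots.
one_per_group = [True] * 12
one_per_group[2] = False
one_per_group[9] = False
print(can_boot(one_per_group))

# Two failures in the same group fall below 5-of-6.
two_in_group = [True] * 12
two_in_group[0] = False
two_in_group[1] = False
print(can_boot(two_in_group))
```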
Hot Maintenance Support
 In a running system, failing Bixby or SMP can be:
– Replaced
– Tested
– Re-integrated
[Figure: Domains A and B connected through four Bixbys (BX), with BIST and IBIST labels and a new SMP being added]
Link Protection
 CRC check and auto retry
– Replay, if CRC error detected
– Guaranteed lane failure detection
 Built in PRBS testing during link training
 Auto link re-initialization
– Re-training link, if Replay unsuccessful
– No Service Processor intervention required
 Auto single lane failover (per direction)
– Based on PRBS testing
– No Service Processor intervention required
[Figure: Chip A and Chip B, each with an LFU, connected by an SL]
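The escalation from CRC check to replay to link re-training can be sketched as follows. This is an illustration only: `zlib.crc32` stands in for whatever CRC the link actually uses, and `MAX_REPLAYS`, the frame layout, and the `flaky_channel` test helper are hypothetical.

```python
import zlib

MAX_REPLAYS = 3  # hypothetical limit before the link is re-trained


def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(framed: bytes):
    """Return the payload if the CRC matches, else None."""
    payload, crc = framed[:-4], int.from_bytes(framed[-4:], "big")
    return payload if zlib.crc32(payload) == crc else None


def send_with_replay(payload, channel):
    """channel(framed) returns the bytes as seen by the receiver."""
    framed = frame(payload)  # the sender keeps this copy for replay
    for attempt in range(1 + MAX_REPLAYS):
        received = check(channel(framed))
        if received is not None:
            return received, ("ok" if attempt == 0 else f"replayed x{attempt}")
    return None, "retrain link"


def flaky_channel(bad_transfers):
    """Test helper: flips one bit in the first `bad_transfers` transfers."""
    remaining = [bad_transfers]

    def channel(framed):
        if remaining[0] > 0:
            remaining[0] -= 1
            return bytes([framed[0] ^ 0x01]) + framed[1:]
        return framed

    return channel


print(send_with_replay(b"req", flaky_channel(2)))
print(send_with_replay(b"req", flaky_channel(10)))
```

In this model a transient error is absorbed by replay, while a persistent one ends in a request to re-train the link, mirroring the order of the bullets above.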
Implementation Details
 96 Tx + 96 Rx 16Gb/s Long-Reach AC coupled SerDes
 Package: 45mm x 45mm 1677-pin FPBGA (~500 signal IO)
 Process: 28nm 10 layer metal 0.85V ASIC
 ~160 Mbits SRAM (~20MB Tags)
 ~70M Gates (nand2 equivalent)
Functional Blocks
[Block diagram with the Forward Packet path labeled; blocks listed from input to output]
– SerDes Links (24 x4)
– Link Framing Units (LFU)
– LQU Input Queues (IQU)
– ASU Crossbar In (AXI), eight Address Serialization Units (ASU), and ASU Crossbar Out (AXO), with the Forwarding Crossbar (FXU) alongside
– LQU Output Queues (OQU)
– Link Framing Units (LFU)
– SerDes Links (24 x4)
Functional Blocks
[The same block diagram as on the preceding Functional Blocks slide, with the Directory Lookup path labeled]
Floorplan
 SEC-DED on all major datapaths
 Parity on control signals
 Custom top level wires on top two routing layers
– Critical nets implemented by Buffer on route
– Faster ps/mm
 PVT invariant clock distribution
[Figure: floorplan annotated with CRC/Retry on the links, ECC for Datapath, and Parity for Control]
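To make SEC-DED concrete, here is a minimal sketch using an extended Hamming code on a 4-bit value. The word width is chosen only to keep the example short; the slides do not state the code width or construction used on Bixby's datapaths.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    code = [0] * 8  # index 0: overall parity; indices 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the parity of the whole word even
    return code


def decode(code):
    """Return (nibble, status); nibble is None when the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    odd_parity = sum(code) % 2 == 1
    if odd_parity:
        # Single-bit error; syndrome 0 means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    elif syndrome:
        return None, "uncorrectable"  # double-bit error detected
    else:
        status = "ok"
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status


word = encode(0b1011)
word[6] ^= 1  # inject a single-bit error
print(decode(word))
```

A single flipped bit is corrected in place, while any two flipped bits leave even overall parity with a non-zero syndrome and are reported instead of being miscorrected.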
Link Queuing Unit (LQU)
 Each manages an x4 Scalability Link
 Provides queuing support for multiple Virtual Channels
 Each LQU is part of a single physical domain
 SEC-DED on Link FIFOs (RAM based)
[Figure: functional block diagram locating the LQU Input Queues (IQU) and LQU Output Queues (OQU)]
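Per-virtual-channel queuing lets a stalled channel wait without holding up the others. The sketch below shows that idea with one FIFO per channel and round-robin selection. The channel count, the arbitration policy, and the `LinkQueue` class are assumptions made for illustration; the slides give none of these details.

```python
from collections import deque


class LinkQueue:
    def __init__(self, num_vcs=4):  # the VC count is an assumption
        self.fifos = [deque() for _ in range(num_vcs)]
        self.next_vc = 0

    def enqueue(self, vc, packet):
        self.fifos[vc].append(packet)

    def dequeue(self, vc_ready):
        """Round-robin over VCs that hold a packet and are allowed to send."""
        n = len(self.fifos)
        for offset in range(n):
            vc = (self.next_vc + offset) % n
            if self.fifos[vc] and vc_ready[vc]:
                self.next_vc = (vc + 1) % n
                return vc, self.fifos[vc].popleft()
        return None


q = LinkQueue(num_vcs=2)
q.enqueue(0, "req0")
q.enqueue(0, "req1")
q.enqueue(1, "data0")
# VC 0 is blocked downstream, yet VC 1 still makes progress.
print(q.dequeue([False, True]))
```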
Cross Bar Units (XBU)
 A separate data path forwards traffic and is sized to account for any Head-of-Line blocking inefficiencies
– FXU: 24in x 24out (2-cycle packet)
– AXI: 24in x 8out
– AXO: 16in x 24out
– Switch fabrics implemented as custom layout hard macros
 Bixby fully sustains mixed request and data traffic at full line rate
 FXU is single domain, AXI/AXO are multi-domain logic
 Flow through SEC-DED, parity on routing control
[Chart: Switch Efficiencies; input and output % efficiency (axis from 0 to 1) for FXU, AXI, and AXO]
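Head-of-line blocking is the reason a crossbar needs headroom. The simulation below uses the textbook model of a saturated switch with one FIFO per input, uniformly random destinations, and a random winner per output each cycle. It is not a model of Bixby's fabrics or of the efficiency chart; the function name, the traffic pattern, and the arbitration rule are assumptions of this sketch.

```python
import random


def hol_throughput(ports=24, cycles=2000, seed=0):
    """Fraction of output slots used by a saturated FIFO input-queued crossbar."""
    rng = random.Random(seed)
    # Each input always has a head-of-line packet with a random destination.
    heads = [rng.randrange(ports) for _ in range(ports)]
    delivered = 0
    for _ in range(cycles):
        contenders = {}
        for inp, dest in enumerate(heads):
            contenders.setdefault(dest, []).append(inp)
        for dest, inputs in contenders.items():
            winner = rng.choice(inputs)          # one packet per output per cycle
            heads[winner] = rng.randrange(ports)  # next packet moves to the head
            delivered += 1
        # Losing inputs keep their head packet and block everything behind it.
    return delivered / (ports * cycles)


print(f"24 x 24 saturated throughput: {hol_throughput():.3f}")
```

For large port counts this model is known to saturate at about 0.586 of line rate, which illustrates why a fabric expected to sustain full line rate would be given spare switching bandwidth.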
Address Serialization Unit (ASU)
 Partitioned into eight parallel units
 Each directory unit can compare and process up to 22,656 bits per cycle (total 181,248 bits per chip per cycle)
 0.5 request lookups per cycle (total 4 per chip)
 Flow-through correction on incoming packets and tag directory contents
 Retry on directory tag staging flops
 Supports up to 12 hardware domains with error steering
 Per domain Built-In Self Initialization (BISI)
 Tag RAM scrubber
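The per-chip totals on the ASU slide follow directly from the per-unit figures, as this short check shows. The last value, the number of bits available to a single lookup, is derived here under the assumption that each unit starts one lookup every two cycles at a steady rate; it does not appear on the slide.

```python
ASUS_PER_CHIP = 8
BITS_PER_ASU_PER_CYCLE = 22_656
LOOKUPS_PER_ASU_PER_CYCLE = 0.5

chip_bits_per_cycle = ASUS_PER_CHIP * BITS_PER_ASU_PER_CYCLE
chip_lookups_per_cycle = ASUS_PER_CHIP * LOOKUPS_PER_ASU_PER_CYCLE

# Derived, not from the slides: bits compared per lookup if one lookup
# occupies a unit for two cycles.
bits_per_lookup = BITS_PER_ASU_PER_CYCLE / LOOKUPS_PER_ASU_PER_CYCLE

print(chip_bits_per_cycle)     # 181248, matching the slide
print(chip_lookups_per_cycle)  # 4.0, matching the slide
print(bits_per_lookup)         # 45312.0
```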
Debug and DFT Features
 Monitoring links at full signaling speed is challenging
– Two internal rings to allow capturing packet flow in ingress or egress direction
– Internal triggering logic and RAM to store packet flow
– External DDR interface to allow capturing packet flow on Logic Analyzer
 In-system test features:
– MemBIST
– InterconnectBIST
– ASU tag RAM can be read, written, or updated by read-modify-write
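Trigger logic combined with a capture RAM behaves much like a logic analyzer's trace buffer. The sketch below is a hypothetical model: a fixed-depth buffer records packets continuously, and once a trigger condition matches it records a set number of further packets and then freezes. The `CaptureRam` class, its depth, and the trigger predicate are illustrative and are not taken from the slides.

```python
from collections import deque


class CaptureRam:
    def __init__(self, depth=8, post_trigger=3, trigger=lambda pkt: False):
        self.ram = deque(maxlen=depth)  # oldest entries are overwritten
        self.post_trigger = post_trigger
        self.trigger = trigger
        self.remaining = None           # None until the trigger fires

    def observe(self, packet):
        if self.remaining == 0:
            return                      # frozen: keep the captured window
        self.ram.append(packet)
        if self.remaining is None:
            if self.trigger(packet):
                self.remaining = self.post_trigger
        else:
            self.remaining -= 1


cap = CaptureRam(depth=4, post_trigger=1, trigger=lambda p: p == "ERR")
for pkt in ["a", "b", "c", "ERR", "d", "e", "f"]:
    cap.observe(pkt)
print(list(cap.ram))
```

The frozen buffer keeps a window of traffic on both sides of the trigger, which is the usual reason for pairing trigger logic with on-chip storage.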
Bixby Design Objectives Accomplished
 Scales to 96 processors
 Provides communication switching between 8-processor SMPs
 Directory for L3 caches of all processors
 Multi-generation support
 Enabling mixed processor systems
 Enterprise-Class RAS feature set
 High bandwidth, low latency
[Figure: Bixby surrounded by three focus areas: Large System Scaling; Coherence Directory & Processing; Enterprise System Focus]
Bixby Scalability ASIC
 Hosts L3 cache directory
 Processes coherence requests
 Includes comprehensive Enterprise-Class RAS
 Provides extensive debug and DFT features
 Pushes ASIC boundaries: technology, complexity, die size, SerDes count, power, …
 Flexible scaling up to 12x
[Figure: processor counts of 2, 4, and 8 for glueless systems and 16, 24, 32, 48, 64, and 96 for glued systems]
Q&A
References
• White Paper:
• SPARC M5-32 Server Architecture
http://www.oracle.com/technetwork/server-storage/sun-sparcenterprise/documentation/o13-024-m5-32-architecture-1920556.pdf
• Data Sheets:
• SPARC M5-32 Server
http://www.oracle.com/us/products/servers-storage/servers/sparc/oracle-sparc/m532/sparc-m5-32-ds-1922642.pdf
• SPARC M5 Processor
http://www.oracle.com/us/products/servers-storage/servers/sparc/oracle-sparc/m5-32/m5processor-ds-1922646.pdf
Glossary
 BISI – Built-In Self Initialization
 BIST – Built-In Self Test
 BX – Bixby ASIC
 CL – Coherence Link
 CRC – Cyclic Redundancy Check
 IBIST – Interconnect Built-In Self Test
 MemBIST – Memory Built-In Self Test
 PRBS – Pseudo-Random Binary Sequence
 PVT – Process Voltage Temperature
 RAS – Reliability Availability Serviceability
 SEC-DED – Single-bit Error Correction - Double-bit Error Detection
 SL – Scalability Link
 SMP – Shared Memory Processor
 SPARC – Scalable Processor ARChitecture