FPGA Accelerator Virtualization in an OpenPOWER cloud

Transcription

Fei Chen, Yonghua Lin
IBM China Research Lab
Trend of Acceleration Technology
Acceleration in Cloud is Taking Off
• Used FPGAs to accelerate Bing search on 1,632 servers
• A 6*8 2D-torus design for a high-throughput network topology
• Storage >2000PB; processing 10~100PB/day; logs 100TB~1PB/day
• Uses FPGAs for the storage controller
• Used GPUs for deep learning
Acceleration programming is becoming a hot topic: OpenCL, Sumatra (Oracle), LiMe (IBM), …
Appliance:
 TB-scale problems
 Acceleration architecture for a single node
 Dedicated acceleration resources
 Proprietary accelerator framework
 Closed innovation model

Acceleration in Cloud:
 PB-scale problems
 Architecture for thousands of nodes
 Shared acceleration resources
 Open framework to enable accelerator sharing & integration
 Open innovation model through an eco-system
Innovations Required
• Scalable acceleration fabric
• Open framework for accelerator integration and sharing
• Accelerator resource abstraction, re-configuration and scheduling in cloud
• Modeling & advisory tool for dynamic acceleration system composition
Resources on FPGA are huge
• Programmable resources
  – Logic cells (LCs)
  – DSP slices: fixed/floating-point
  – On-chip memory blocks
  – Clock resources
• Miscellaneous peripherals (Xilinx Virtex as an example)
  – DDR3 controllers
  – PCIe Gen3 interfaces
  – 10G Ethernet controllers
  – ...
• Hard processor cores
  – PowerPC: Xilinx Virtex-5 FXT
  – ARM: Xilinx Zynq-7000
  – Atom: Intel + Altera E600C
FPGA Capacity Trends
The Xilinx Virtex UltraScale 440 FPGA, the largest-scale FPGA in the world when delivered in 2014, consists of more than 4 million logic cells (LCs). Using this chip, we can build up to 250 AES crypto accelerators, or 520 ARM7 processor cores.
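As a back-of-envelope check of these figures (assuming the VU440's roughly 4.4 million LCs, since the slide only says "more than 4 million"), the implied cost is about 17,600 LCs per AES accelerator and about 8,500 per ARM7 core:

```python
# Back-of-envelope check of the slide's capacity claims. The per-core
# LC costs below are derived from the slide's own numbers, not from
# vendor datasheets.
TOTAL_LCS = 4_400_000        # Virtex UltraScale 440: ~4.4M LCs (assumed)

aes_cores = 250              # slide: up to 250 AES crypto accelerators
arm7_cores = 520             # slide: up to 520 ARM7 processor cores

print(f"LCs per AES core : ~{TOTAL_LCS // aes_cores:,}")   # ~17,600
print(f"LCs per ARM7 core: ~{TOTAL_LCS // arm7_cores:,}")  # ~8,461
```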
FPGA on Cloud – Double Win
 Cloud benefits from FPGA
• Performance
• Power consumption
 FPGA benefits from cloud
• Lower cost
  – Tenants need not purchase and maintain FPGAs.
  – Tenants pay for accelerators only when using them.
• More applications
  – High FPGA utilization
• Ecosystem
  – Grows with the cloud ecosystem
Motivation of Accelerator/FPGA as a Service in Cloud
• Enable manageability: can an FPGA (pool) be managed in the data center? (ID, location, reconfiguration, performance, etc.)
• Reduce system cost: how can system cost be reduced by sharing FPGA resources among applications, VMs, and containers, in a dynamic, flexible, priority-controllable way?
• Reduce deployment complexity: how can FPGA/accelerator resources be orchestrated easily with VM, network, and storage resources, according to the needs of the application?
• Bring high value to the cloud infrastructure: could we generate new value for IaaS?
[Diagram: applications in VMs and containers on a host, sharing FPGA resources alongside network and storage.]
FPGA Ecosystem in Cloud
Accelerator developers
• Accelerator Market Place: companies or individual developers can upload and sell their accelerators through the marketplace (e.g. on OpenPOWER).
• Accelerator Cloudify Tool (planned): the marketplace will "cloudify" accelerators by integrating the service layer with the accelerator and its compilation. All integration, compilation, test, verification, and certification will be done automatically.

Cloud tenants
• Pay for the usage of an accelerator, rather than for licenses and hardware.
• Get the accelerator service in a self-service way.
• Use the single HEAT orchestrator to deploy workloads with accelerators, together with compute, network, and storage (see the sketch below).

Cloud service provider
• Buys the "cloudified" accelerators on the marketplace.
• Creates the service category for FPGA accelerators and sells it on the cloud as a service.
[Diagram: HEAT orchestrator driving compute, network, storage, and FPGA accelerator services; an OpenStack extension for the accelerator service, with service logic in the FPGA, runs on POWER8/PowerKVM with FPGA cards.]
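A minimal sketch of that tenant-side HEAT flow, using python-heatclient. The OS::Accelerator::FPGA resource type, its properties, and the endpoint/token values are hypothetical placeholders for the OpenStack accelerator extension described here; only the Heat client calls themselves are real.

```python
# Sketch: deploying a VM together with an FPGA accelerator through a
# single Heat stack. OS::Accelerator::FPGA and its properties are
# HYPOTHETICAL placeholders for the accelerator-service extension in
# these slides, not an existing Heat resource type.
from heatclient import client as heat_client

template = {
    "heat_template_version": "2015-04-30",
    "resources": {
        "worker": {
            "type": "OS::Nova::Server",
            "properties": {
                "image": "ubuntu-ppc64le",   # assumed image name
                "flavor": "m1.medium",
            },
        },
        "aes_acc": {
            "type": "OS::Accelerator::FPGA",  # hypothetical extension
            "properties": {
                "accelerator": "aes-128",     # marketplace accelerator ID (assumed)
                "attach_to": {"get_resource": "worker"},
            },
        },
    },
}

heat = heat_client.Client("1",
                          endpoint="http://heat.example:8004/v1/TENANT_ID",
                          token="AUTH_TOKEN")
heat.stacks.create(stack_name="accel-demo", template=template)
```

The point of routing everything through one stack is that the accelerator is scheduled, attached, and torn down with the same lifecycle as the VM, network, and storage it serves.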
Accelerator as a Service on SuperVessel
• Accelerator MarketPlace for developers to upload and compile accelerators for the SuperVessel POWER cloud (Fig. 1)
• Allows users to request clusters of different sizes (Fig. 2)
Fig. 1: Accelerator MarketPlace for SuperVessel Cloud
Fig. 2: Cloud users can request an accelerator when creating a VM
Enabling FPGA virtualization in OpenStack cloud
[Architecture diagram: an OpenStack-based cloud with a control node (scheduler and service logic) and FPGA compute nodes, built from FPGA-framework components plus enhanced OpenStack.]
• KVM-based compute node: a guest process calls accelerator APIs through a bitfile library; a guest control module and guest driver inside the virtual machine expose a virtual FPGA to the guest OS and forward requests to the host control module in the hypervisor, which sits alongside the OpenStack agent and utilities.
• Docker-based compute node: applications call the APIs directly; the host control module/driver (with driver images and library) lives in the kernel, next to the OpenStack agent and utilities.
• Hardware: the FPGA is attached via CAPI, with on-card DRAM.
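A minimal runnable model of the call chain in this diagram. Every class and method name here (GuestControlModule, HostControlModule, and so on) is a hypothetical illustration of how a guest-side control module could forward accelerator requests to the host, not a published driver API.

```python
# Minimal model of the virtualization chain: guest process -> guest
# library/control module -> (hypervisor) host control module ->
# physical FPGA. Models the control flow only, not a real driver stack.

class HostControlModule:
    """Hypervisor side: owns the physical FPGA and multiplexes guests."""
    def __init__(self):
        self.loaded_bitfiles = {}          # vm_id -> loaded bitfile

    def load(self, vm_id, bitfile):
        # In the real system this would drive partial reconfiguration.
        self.loaded_bitfiles[vm_id] = bitfile

    def submit(self, vm_id, payload):
        # In the real system: DMA the payload to the FPGA over CAPI/PCIe.
        return f"result of {self.loaded_bitfiles[vm_id]} on {len(payload)} bytes"


class GuestControlModule:
    """Guest side: what the in-VM library and driver forward to the host."""
    def __init__(self, vm_id, host):
        self.vm_id, self.host = vm_id, host

    def load_bitfile(self, bitfile):
        self.host.load(self.vm_id, bitfile)

    def run(self, payload):
        return self.host.submit(self.vm_id, payload)


host = HostControlModule()
guest = GuestControlModule(vm_id="vm-1", host=host)
guest.load_bitfile("aes128.bit")
print(guest.run(b"\x00" * 4096))
```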
FPGA accelerator as a service, online on SuperVessel Cloud
Try it here: www.ptopenlab.com
• SuperVessel Cloud Service: 1. VM and container service; 2. Storage service; 3. Network service; 4. Accelerator as a service (preparing); 5. Image service (online)
• SuperVessel Big Data and HPC Service: 1. Big Data: MapReduce (Symphony), SPARK; 2. Performance tuning service
• OpenPOWER Enablement Service: 1. X-to-P migration; 2. AutoPort Tool; 3. OpenPOWER new system test service
• Super Class Service: 1. On-line video courses; 2. Teacher course management; 3. User contribution management (preparing)
• Super Project Team Service: 1. Project management service; 2. DevOps automation
• Super Marketplace (online)
All services run on the SuperVessel Cloud Infrastructure: IBM POWER and OpenPOWER servers with FPGA/GPU, storage, and Docker.
Thanks!
FPGA Implementation
[Diagram: an FPGA chip with a user sublayer of shared accelerator slots A–D; a service sublayer with registers, job queue, job scheduler, DMA engine, switch, context controller, reconfiguration controller, and security controller; and a platform sublayer with DRAM, service logic, and high-bandwidth I/O (PCIe/CAPI, Ethernet). The analogy to a computer: apps / OS / hardware.]
• User Sublayer: A, B, C, D — shared FPGA resources (the "apps")
• Service Sublayer: job queue, switch, … (the "OS")
• Platform Sublayer: DRAM, PCIe, ICAP, … (the "hardware")
The FPGA subsystem is designed as a computer system.
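A toy model of the service sublayer's job queue and scheduler, under the assumption that jobs carry a per-tenant priority and target one of the shared accelerator slots A–D. All names are illustrative; the real logic is implemented in hardware, not Python.

```python
# Toy model of the service sublayer: a priority job queue whose
# scheduler dispatches the highest-priority job to its target
# accelerator slot (A-D). Illustrative only, not the RTL design.
import heapq
import itertools

class JobScheduler:
    def __init__(self):
        self._queue = []                  # (-priority, seq, job)
        self._seq = itertools.count()     # tie-breaker keeps FIFO order

    def enqueue(self, tenant, accel, payload, priority=0):
        job = {"tenant": tenant, "accel": accel, "payload": payload}
        heapq.heappush(self._queue, (-priority, next(self._seq), job))

    def dispatch(self):
        # In hardware this step would program the DMA engine and the
        # switch that routes the job to slot A, B, C, or D.
        _, _, job = heapq.heappop(self._queue)
        return job

sched = JobScheduler()
sched.enqueue("vm-1", "A", b"x" * 256 * 1024, priority=0)
sched.enqueue("vm-2", "A", b"y" * 4 * 1024 * 1024, priority=0)
sched.enqueue("vm-1", "A", b"z" * 256 * 1024, priority=1)  # raised priority
print(sched.dispatch()["tenant"])   # vm-1's high-priority job runs first
```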
System Implementation
[Diagram of the deployment flow: 1. A developer submits an accelerator source code package to the compiler. 2. The compiler produces an image_file and stores it in Glance. 3. A tenant issues a VM request (with accelerator) through the dashboard to the control node's scheduler. 4. The acc_file is delivered to the chosen compute node. 5. The compute node launches the VM.]
• Control Node: Nova, Glance, Horizon, Neutron, Swift
• Compute Node: Nova Compute
• Compiler: FPGA incremental compilation environment (a placement-filter sketch follows below)
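One way the enhanced control node could steer accelerator-requesting VMs onto FPGA compute nodes is a Nova filter-scheduler plug-in. The sketch below assumes an accel:type flavor extra-spec and assumes the per-node OpenStack agent reports free virtual-FPGA slots into the host stats; only the BaseHostFilter interface (host_passes signature as in recent Nova releases) is Nova's own.

```python
# Sketch of an FPGA-aware placement filter for Nova's filter scheduler.
# The "accel:type" extra-spec key and the "vfpga_free_slots" stat are
# ASSUMPTIONS about how the slides' OpenStack extension could track
# accelerators; only the BaseHostFilter hook itself is Nova's API.
from nova.scheduler import filters


class FPGAAcceleratorFilter(filters.BaseHostFilter):
    """Pass only hosts that can satisfy the requested FPGA accelerator."""

    def host_passes(self, host_state, spec_obj):
        extra_specs = spec_obj.flavor.extra_specs or {}
        wanted = extra_specs.get("accel:type")   # e.g. "aes-128" (assumed key)
        if not wanted:
            return True                          # no accelerator requested

        # Assumed bookkeeping: the OpenStack agent on each compute node
        # reports its free virtual-FPGA slots into host_state.stats.
        free = int(host_state.stats.get("vfpga_free_slots", 0))
        return free > 0
```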
Evaluation

(1) Accelerator Sharing Evaluation
• Host: all processes run in the host environment
• One VM: all processes run in one VM
• VMs: each process runs in its own VM
• AESs: each VM uses one independent AES accelerator
[Figure: total bandwidth (MB/s, log scale from 1 to 10000) and coefficient of variation (CV, 0%–80%) versus number of processes (1–8), with annotated points at 1194 MB/s and 25 MB/s.]

(2) Management – Bandwidth Control
[Figure: bandwidth (MB/s) of Process 0, Process 1, and the total over roughly 61 seconds, with runtime adjustments annotated "Reduce VM bandwidth", "Increase P0 bandwidth", and "Reduce P0 bandwidth"; the total bandwidth stays within roughly 900–1300 MB/s.]

(3) Management – Priority Control
• Process 0: 256KB payload, 100 times per second
• Process 1: 4MB payload, best-effort use
• Same priority during seconds 1~38
• Process 0's priority is raised at second 38
[Figure: average latency (ms) of Process 0 over roughly 61 seconds; latency rises to about 2.3 ms after Process 1 begins, then falls to about 0.21 ms after priority control, close to the 0.22 ms baseline.]
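The bandwidth-control behavior measured above can be pictured as a token bucket that meters each process's DMA traffic and whose rate can be changed at runtime. This is an illustrative model with assumed names and numbers, not the actual host control module.

```python
# Illustrative token-bucket model of per-process bandwidth control:
# the host meters how many bytes each process may transfer per
# interval, and the cap can be raised or lowered while jobs run.
import time

class BandwidthLimiter:
    def __init__(self, rate_mb_s):
        self.rate = rate_mb_s * 1024 * 1024   # bytes per second
        self.tokens = 0.0
        self.last = time.monotonic()

    def set_rate(self, rate_mb_s):            # runtime control knob
        self.rate = rate_mb_s * 1024 * 1024

    def admit(self, nbytes):
        """Block until `nbytes` may be transferred at the current rate."""
        now = time.monotonic()
        # Refill, capping the burst at one second's worth of tokens.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes > self.tokens:
            time.sleep((nbytes - self.tokens) / self.rate)
            self.tokens = 0.0
        else:
            self.tokens -= nbytes

p0 = BandwidthLimiter(rate_mb_s=400)   # cap process 0 at 400 MB/s (example)
p0.admit(256 * 1024)                   # one 256KB job passes the meter
p0.set_rate(800)                       # "Increase P0 bandwidth" at runtime
```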