SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market
Prateek Sharma, Stephen Lee, Tian Guo, David Irwin, Prashant Shenoy
University of Massachusetts — Amherst
OUR SOLUTION: DERIVATIVE CLOUDS
• System to manage a mix of spot and on-demand instances
• Intermediate layer between users and the IaaS cloud provider
• Provides users with an interface similar to the one the IaaS provider offers (virtual machines)
• Transparently migrates VMs between pools
Cloud servers have different costs and availability tradeoffs:
On-demand Servers:
• Fixed price per unit time
• Non-revocable
[Figure: user VMs migrate between the on-demand pool and the spot instance pool]
Spot Servers:
• Variable prices based on market conditions
• Revocable ⇒ lower availability
• Prices tend to be lower than for on-demand servers
• Allow the cloud provider to sell surplus capacity
• A second-price auction determines the spot price
• (Spot price > Bid) ⇒ termination
• Short termination warning (~2 minutes)
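The revocation rule in the list above can be sketched in a few lines of Python. The price trace, the bid values, and the function name are illustrative assumptions, not data from the poster; the sketch only applies the stated condition that a spot price above the bid leads to termination.

```python
from typing import Optional, Sequence


def first_revocation(prices: Sequence[float], bid: float) -> Optional[int]:
    """Return the index of the first sample where the spot price exceeds
    the bid (the instance is revoked), or None if it is never revoked."""
    for i, price in enumerate(prices):
        if price > bid:  # termination rule: spot price > bid
            return i
    return None


# Hypothetical hourly spot prices in $/hr (illustrative values only).
trace = [0.010, 0.012, 0.011, 0.065, 0.013]

print(first_revocation(trace, bid=0.060))  # revoked at the price spike
print(first_revocation(trace, bid=0.070))  # bid is never exceeded
```

A higher bid tolerates larger price spikes, so under this rule it trades a higher worst-case price for fewer revocations.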
[Figure: ratio of spot price to on-demand price; annotation: "Spot instances are really cheap!"]
1. Dirty VM memory pages are transferred to a backup server continuously, incrementally, and asynchronously
2. A backup server can support multiple (~50) VMs
3. The memory checkpoint is lazily restored from the backup server if the spot instance terminates: each page is fetched on first access
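The steps above can be illustrated with a minimal in-memory sketch. All class and method names are hypothetical, pages are modeled as dictionary entries, and the asynchronous transfer is simplified to an explicit `flush_dirty()` call; this is not SpotCheck's implementation, only an illustration of incremental checkpointing and fetch-on-first-access restore.

```python
from typing import Dict, Set


class BackupServer:
    """Hypothetical in-memory stand-in for the backup server."""

    def __init__(self) -> None:
        self.pages: Dict[int, bytes] = {}

    def write(self, page_no: int, data: bytes) -> None:
        self.pages[page_no] = data

    def read(self, page_no: int) -> bytes:
        return self.pages[page_no]


class CheckpointedVM:
    """Tracks dirty pages and sends only those to the backup server."""

    def __init__(self, backup: BackupServer) -> None:
        self.memory: Dict[int, bytes] = {}
        self.dirty: Set[int] = set()
        self.backup = backup

    def write_page(self, page_no: int, data: bytes) -> None:
        self.memory[page_no] = data
        self.dirty.add(page_no)

    def flush_dirty(self) -> int:
        """Incremental checkpoint: send pages changed since the last flush."""
        sent = len(self.dirty)
        for page_no in self.dirty:
            self.backup.write(page_no, self.memory[page_no])
        self.dirty.clear()
        return sent


class LazilyRestoredVM:
    """Starts with empty memory and fetches each page on first access."""

    def __init__(self, backup: BackupServer) -> None:
        self.memory: Dict[int, bytes] = {}
        self.backup = backup
        self.fetches = 0

    def read_page(self, page_no: int) -> bytes:
        if page_no not in self.memory:  # first access: fetch from backup
            self.memory[page_no] = self.backup.read(page_no)
            self.fetches += 1
        return self.memory[page_no]


backup = BackupServer()
vm = CheckpointedVM(backup)
vm.write_page(0, b"kernel")
vm.write_page(1, b"heap")
vm.flush_dirty()                     # both pages are sent
vm.write_page(1, b"heap-v2")
vm.flush_dirty()                     # only page 1 is sent this time

restored = LazilyRestoredVM(backup)  # stands in for the VM after termination
restored.read_page(1)                # fetched from the backup server
restored.read_page(1)                # served locally, no second fetch
```

With lazy restore the VM can resume before its whole checkpoint has been copied; the cost is a fetch delay the first time each page is touched.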
[Figure: spot price and bid price ($/hr) over time; labels: Spot Instance, Instance Terminated, On-demand Instance]
[Figure: on each instance, a user VM runs on Xen Blanket over a Linux kernel (Dom 0); dirty memory pages are written to the backup server and VM pages are lazily restored from it]
[Figure: bar chart over pool configurations (1-Pool, 2-Pools, 4-Pools, 4-Pools Equal Distributed Cost, 4-Pools Stability); legend: Xen Live migration, Unoptimized Full restore, SpotCheck with Full restore, SpotCheck with Lazy restore]
Save 80% on your EC2 bill
AVAILABILITY
• Unavailability is due to migrations from spot to on-demand instances
• Short downtime (~20 seconds) during each migration
• The downtime comes from the latency of IaaS operations: detaching and reattaching network and storage
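As a rough worked example, availability over a period can be related to the number of migrations and the per-migration downtime. The 30-day accounting period and the function names are assumptions made for illustration; only the ~20 second downtime and the 99.9989% availability figure appear on the poster.

```python
def availability_percent(migrations: float, downtime_s: float, period_s: float) -> float:
    """Percentage of the period during which the VM is reachable."""
    return 100.0 * (1.0 - migrations * downtime_s / period_s)


def downtime_budget_s(availability_pct: float, period_s: float) -> float:
    """Total downtime allowed in the period at a given availability."""
    return period_s * (1.0 - availability_pct / 100.0)


MONTH_S = 30 * 24 * 3600  # assumed 30-day accounting period, in seconds

# One migration with ~20 s of downtime in a 30-day period.
print(f"{availability_percent(1, 20, MONTH_S):.4f}%")

# Downtime implied by 99.9989% availability over the same assumed period.
print(f"{downtime_budget_s(99.9989, MONTH_S):.1f} s")
```

Under these assumptions, a single 20-second migration per month yields roughly 99.9992% availability, and 99.9989% corresponds to about 28.5 seconds of downtime per 30 days.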
[Figure labels: Unavailability (%); Availability CDF; legend: Xen Live migration, Unoptimized Full restore]
BOUNDED TIME VM MIGRATION
[Figure labels: TPC-W response time; Average cost per hour ($)]
1. Ability to run interactive, disruption-intolerant applications
2. Not lose application state
3. Provide servers to customers at low cost
4. Not adversely impact application performance
Run on spot instances when possible; move to on-demand instances when evicted
Bounded-time migration: the VM migrates within a specified time
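One way to read the time bound: if the amount of dirty memory not yet sent to the backup server is capped, the final flush after a termination warning completes within the bound. The sketch below assumes a constant network bandwidth and a 4 KiB page size; those values and the function names are hypothetical and do not come from the poster.

```python
PAGE_SIZE = 4096  # bytes; assumed page size


def max_dirty_pages(bandwidth_bytes_per_s: float, time_bound_s: float) -> int:
    """Largest number of unsent dirty pages that can still be flushed
    to the backup server within the time bound."""
    return int(bandwidth_bytes_per_s * time_bound_s // PAGE_SIZE)


def flush_time_s(dirty_pages: int, bandwidth_bytes_per_s: float) -> float:
    """Time needed to send the given number of dirty pages."""
    return dirty_pages * PAGE_SIZE / bandwidth_bytes_per_s


def meets_bound(dirty_pages: int, bandwidth_bytes_per_s: float, time_bound_s: float) -> bool:
    """True if the outstanding dirty pages can be flushed within the bound."""
    return flush_time_s(dirty_pages, bandwidth_bytes_per_s) <= time_bound_s


# Hypothetical numbers: 100 MB/s to the backup server and a 10 s bound,
# which is well inside the ~2 minute termination warning.
bw = 100e6
limit = max_dirty_pages(bw, 10.0)
print(limit, meets_bound(limit, bw, 10.0), meets_bound(limit + 1, bw, 10.0))
```

Keeping the dirty set below `limit` is the design lever here: a tighter bound means more frequent background transfers while the VM runs.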
[Figure: x-axis: number of VMs per backup server]
Expected cost = 0.2 × on-demand price = $0.014/hour
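The arithmetic behind this figure can be checked directly. The sketch below uses only the two numbers stated on the poster (the 0.2 ratio and $0.014/hour); the implied on-demand price is derived from them, and the function and variable names are illustrative.

```python
def expected_cost(on_demand_price: float, spot_ratio: float) -> float:
    """Expected hourly cost when paying `spot_ratio` of the on-demand price."""
    return spot_ratio * on_demand_price


expected = 0.014  # $/hour, from the poster
ratio = 0.2       # expected cost as a fraction of on-demand, from the poster

implied_on_demand = expected / ratio  # on-demand price consistent with both
savings = 1.0 - ratio                 # fraction saved relative to on-demand

print(f"implied on-demand price: ${implied_on_demand:.3f}/hour")
print(f"savings vs on-demand: {savings:.0%}")
```

The 0.2 ratio is the same figure as the poster's "Save 80%" headline, expressed as cost rather than savings.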
SpotCheck: a derivative cloud built on spot and on-demand instances
SPOT INSTANCES
[Figure legend: m3.medium, m3.large, m3.xlarge, m3.2xlarge]
40 VMs can share one backup server
OUR SYSTEM: SPOTCHECK
Run interactive applications on a mix of spot and on-demand servers
[Figure label: SpecJBB throughput]
COST
PROBLEM STATEMENT
[Figure: y-axis: response time (ms)]
• Infrastructure as a Service (IaaS)
• Examples: Amazon EC2, Google Compute Engine, Rackspace
• IaaS rents out physical or virtual computing resources
PERFORMANCE & SCALABILITY
[Figure: y-axis: throughput (bops)]
IAAS CLOUDS
[Figure: bar chart over pool configurations (1-Pool, 2-Pools, 4-Pools, 4-Pools Equal Distributed Cost, 4-Pools Stability); legend: SpotCheck with Full restore, SpotCheck with Lazy restore]
99.9989% Availability