midterm_present_20141002(scheduling)

Research on Embedded Hypervisor
Scheduler Techniques
2014/10/02
Background

Asymmetric multi-core is becoming increasingly popular over homogeneous multi-core systems.
◦ An asymmetric multi-core platform consists of cores with different capabilities.
 ▪ ARM: big.LITTLE architecture
 ▪ Qualcomm: asynchronous Symmetric Multi-Processing (aSMP)
 ▪ Nvidia: variable Symmetric Multiprocessing (vSMP)
 ▪ …etc.
Motivation

Scheduling goals differ between homogeneous and asymmetric multi-core platforms.
◦ Homogeneous multi-core: load balancing.
 ▪ Distribute workloads evenly in order to obtain maximum performance.
◦ Asymmetric multi-core: maximize power efficiency with modest performance sacrifices.
Motivation (Cont.)

New scheduling strategies are needed for asymmetric multi-core platforms.
◦ The power and computing characteristics vary across the different types of cores.
◦ These differences must be taken into consideration while scheduling.
Project Goal

Research the current scheduling algorithms for homogeneous and asymmetric multi-core architectures.
Design and implement a hypervisor scheduler for asymmetric multi-core platforms.
◦ Assign virtual cores to physical cores for execution.
◦ Minimize the power consumption with a performance guarantee.
Hypervisor Architecture with VMI

[Architecture diagram: two Android guests (GUEST1 and GUEST2, each an Android Framework on a Linux/Linaro kernel with its own scheduler) run on vCPUs above the hypervisor. The VM Introspector (VMI) gathers task information from each Guest OS, and the CPU mask of each task is modified according to that information, so that tasks with low computing-resource requirements are grouped onto one vCPU and tasks with high computing-resource requirements onto another. A vCPU onto which only low-demand tasks are scheduled is treated as a LITTLE core. The Task-to-vCPU Mapper and the hypervisor vCPU scheduler then schedule big vCPUs onto the ARM Cortex-A15 cluster (performance) and LITTLE vCPUs onto the Cortex-A7 cluster (power saving).]
Hypervisor Scheduler

Assigns the virtual cores to physical cores for execution.
◦ Determines the execution order and the amount of time assigned to each virtual core according to a scheduling policy.
◦ Xen: credit-based scheduler.
◦ KVM: completely fair scheduler.
Virtual Core Scheduling Problem

For every time period, the hypervisor scheduler is given a set of virtual cores.
Given the operating frequency of each virtual core, the scheduler generates a scheduling plan such that the power consumption is minimized and the performance is guaranteed.
Scheduling Plan

Must satisfy three constraints:
◦ Each virtual core should run on each physical core for a certain amount of time, to satisfy the workload requirement.
◦ A virtual core can run on only a single physical core at any time.
◦ A virtual core should not switch among physical cores frequently, so as to reduce the migration overheads.
Example of A Scheduling Plan

Execution slice:  t1   t2   t3   t4   …   t100
Core0:            V4   x    x    V4   …   x
Core1:            V3   x    V3   x    …   x
Core2:            V2   V4   V1   V2   …   V4
Core3:            V1   V1   V2   V3   …   V1

◦ x: physical core idle
Three-phase Solution

[Phase 1] Generate the amount of time each virtual core should run on each physical core.
[Phase 2] Determine the execution order of the virtual cores on each physical core.
[Phase 3] Exchange the order of execution slices in order to reduce the number of core switches.
Phase 1

Given the objective function and the constraints, we can use integer programming to find a_{i,j}.
◦ a_{i,j}: the number of time slices virtual core i should run on physical core j.
 ▪ Divide a time interval into time slices.
◦ Integer programming can find a feasible solution in a short time when the number of vCPUs and the number of pCPUs are small constants.
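The slides do not reproduce the exact program, but one plausible formulation, under assumed symbols ($P_j(f_i)$: power cost per slice of serving vCPU $i$'s frequency on pCPU $j$; $c_j$: capacity of pCPU $j$ per slice; $w_i$: workload of vCPU $i$; $T$: slices per interval), is:

\begin{align*}
\min\;        & \sum_i \sum_j P_j(f_i)\, a_{i,j} \\
\text{s.t.}\; & \sum_j c_j\, a_{i,j} \ge w_i \quad \forall i && \text{(workload requirement)} \\
              & \sum_i a_{i,j} \le T \quad \forall j && \text{(pCPU capacity)} \\
              & a_{i,j} \in \mathbb{Z}_{\ge 0} \quad \forall i, j
\end{align*}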
Phase 1 (Cont.)

If the relationship between power and load is linear:
◦ Use a greedy assignment instead (see the sketch below).
◦ Assign each virtual core to the physical core with the least power/instruction ratio whose load is still under 100%.
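A minimal sketch of this greedy rule, under assumed data structures (all names here are illustrative; none come from the actual implementation):

/* Hypothetical sketch of the greedy assignment used when power is
 * linear in load. */
#include <stddef.h>

struct pcpu {
    double power_per_insn;  /* power/instruction ratio of this core type */
    double load;            /* fraction of the period already allocated */
};

struct vcpu_req {
    double demand;          /* required load, e.g. vCPU freq / pCPU capacity */
    int assigned_to;        /* index of the chosen physical core */
};

/* Assign each virtual core to the physical core with the lowest
 * power/instruction ratio whose load stays under 100%. */
static void greedy_assign(struct vcpu_req *v, size_t nv,
                          struct pcpu *p, size_t np)
{
    for (size_t i = 0; i < nv; i++) {
        int best = -1;
        for (size_t j = 0; j < np; j++) {
            if (p[j].load + v[i].demand > 1.0)
                continue;  /* this core would exceed 100% load */
            if (best < 0 || p[j].power_per_insn < p[best].power_per_insn)
                best = (int)j;
        }
        v[i].assigned_to = best;   /* -1 if no core has room */
        if (best >= 0)
            p[best].load += v[i].demand;
    }
}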
Phase 2

With the information from Phase 1, the scheduler has to determine the execution order of the virtual cores on each physical core.
◦ A virtual core cannot appear on two or more physical cores at the same time.
Example

Time slices allocated to each vCPU on (Core0, Core1, Core2, Core3) for the interval t=0 to t=100:

vCPU0: (50, 40,  0,  0)
vCPU1: (20, 20, 20, 20)
vCPU2: (10, 10, 20, 20)
vCPU3: (10, 10, 20, 20)
vCPU4: (10, 10, 10, 10)
vCPU5: ( 0,  0, 10, 10)
Phase 2 (Cont.)

We can formulate the problem as an open-shop scheduling problem (OSSP).
◦ OSSP with preemption can be solved in polynomial time. [1]

[1] T. Gonzalez and S. Sahni. Open shop scheduling to minimize finish time. J. ACM, 23(4):665–679, Oct. 1976.
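In this mapping (a standard reading of the formulation, stated here as an assumption), each physical core is an OSSP machine, each virtual core is a job, and the Phase 1 output $a_{i,j}$ is the processing time of job $i$ on machine $j$. For preemptive OSSP, Gonzalez and Sahni [1] show the optimal makespan is

$$C_{\max} = \max\Bigl\{\, \max_i \sum_j a_{i,j},\;\; \max_j \sum_i a_{i,j} \,\Bigr\},$$

so the plan fits in the period whenever no single vCPU and no single pCPU is allocated more time slices than the period contains.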
After Phase 1 & 2

After the first two phases, the scheduler generates a scheduling plan.
◦ x: physical core idle

Execution slice:  t1   t2   t3   t4   …   t100
Core0:            V4   x    x    V4   …   x
Core1:            V3   x    V3   x    …   x
Core2:            V2   V4   V1   V2   …   V4
Core3:            V1   V1   V2   V3   …   V1
Phase 3

Migrating tasks between cores incurs overhead.
Reduce this overhead by exchanging the order of execution slices so as to minimize the number of core switches.
Number of Switching Minimization Problem

Given a scheduling plan, we want to find an order of the execution slices such that the switching cost is minimized.
◦ This is an NP-complete problem.
 ▪ By reduction from the Hamiltonian Path problem.
◦ We propose a greedy heuristic (sketched below).
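A minimal sketch of such a greedy heuristic, assuming plan[s][c] holds the vCPU id running on physical core c in slice s (or -1 when idle); this illustrates the idea and is not the actual implementation:

#include <stdbool.h>
#include <limits.h>

#define NUM_SLICES 24
#define NUM_CORES   4

/* Count vCPUs that run on different physical cores in slices a and b. */
static int switch_cost(const int a[NUM_CORES], const int b[NUM_CORES])
{
    int cost = 0;
    for (int c = 0; c < NUM_CORES; c++) {
        if (a[c] < 0)
            continue;                /* core idle in slice a */
        for (int d = 0; d < NUM_CORES; d++)
            if (b[d] == a[c] && d != c)
                cost++;              /* vCPU a[c] migrated to core d */
    }
    return cost;
}

/* Fix the first slice, then repeatedly append the unused slice that
 * introduces the fewest switches after the slice placed last. */
static void greedy_order(int plan[NUM_SLICES][NUM_CORES],
                         int order[NUM_SLICES])
{
    bool used[NUM_SLICES] = { false };
    order[0] = 0;
    used[0] = true;
    for (int k = 1; k < NUM_SLICES; k++) {
        int best = -1, best_cost = INT_MAX;
        for (int s = 0; s < NUM_SLICES; s++) {
            if (used[s])
                continue;
            int c = switch_cost(plan[order[k - 1]], plan[s]);
            if (c < best_cost) {
                best_cost = c;
                best = s;
            }
        }
        order[k] = best;
        used[best] = true;
    }
}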
Example

[Slides 20 through 27 step through the greedy heuristic on an example plan with six execution slices over three physical cores p1, p2, and p3. Starting from #switching = 0, slice t1 is fixed first, and each subsequent step appends the remaining slice that introduces the fewest core switches after the slice placed last, ordering t1 through t6; the completed ordering ends with #switching = 7.]
Evaluation

Conduct simulations to compare the power consumption of our asymmetry-aware scheduler with that of a credit-based scheduler.
Compare the number of core switches produced by our greedy heuristic with that of an optimal solution.
Environment

Two types of physical cores:
◦ Power-hungry "big" cores
 ▪ Frequency: 1600 MHz
◦ Power-efficient "little" cores
 ▪ Frequency: 600 MHz
◦ The DVFS mechanism is disabled.
Power Model

Relation between power consumption, core frequency, and load.
◦ bzip2
[Figure: measured power consumption (Watt) versus load (%) for the bzip2 workload at 250 MHz, 600 MHz, 800 MHz, and 1600 MHz, each series with a linear fit; power grows roughly linearly with load at every frequency.]
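Read this way, the figure suggests a per-frequency linear model (an assumed reading of the fitted lines; the coefficients $a(f)$ and $b(f)$ are fitted per frequency and are not given on the slide):

$$P(f, u) = a(f)\,u + b(f), \qquad 0 \le u \le 100\%,$$

which is exactly the linear power-load relationship that enables the greedy assignment of Phase 1 (Cont.).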
Scenario I – 2 Big and 2 Little

Two dual-core VMs.

Two sets of input:
◦ Case 1: both VMs with light workloads.
 ▪ 250 MHz for each virtual core.
◦ Case 2: one VM with heavy workloads, the other with modest workloads.
 ▪ Heavy: 1200 MHz for each virtual core.
 ▪ Modest: 600 MHz for each virtual core.
Scenario I - Results

                                         Power (Watt)
Case 1: Light-load VMs
  Asymmetry-aware                        0.295
  Credit-based                           0.683
Case 2: Heavy-load VM + Modest-load VM
  Asymmetry-aware                        2.382
  Credit-based                           2.491

◦ Case 1: the power drawn by the asymmetry-aware method is about 43.2% of that of the credit-based method.
◦ Case 2: the asymmetry-aware method uses 95.6% of the energy used by the credit-based method.
Scenario 2 – 4 Big and 4 Little

Three quad-core VMs. Three cases:

                        VM1            VM2            VM3
Case 1: Light-load      All 250 MHz    All 250 MHz    All 250 MHz
Case 2: Modest-load     All 600 MHz    All 600 MHz    All 250 MHz
Case 3: Heavy-load      All 1600 MHz   All 1600 MHz   All 1600 MHz
Scenario 2 - Results

                         Power (Watt)   Savings
Case 1: Light-load
  Asymmetry-aware        1.205          41.2%
  Credit-based           2.049
Case 2: Modest-load
  Asymmetry-aware        3.524          11.1%
  Credit-based           3.960
Case 3*: Heavy-load
  Asymmetry-aware        6.009          0%
  Credit-based           6.009

* In Case 3 the load on every physical core is 100% under both methods: no power can be saved when the computing resources are not enough.
Evaluation

Conduct simulations to compare the power consumption of our asymmetry-aware scheduler with that of a credit-based scheduler.
Compare the number of core switches produced by our greedy heuristic with that of an optimal solution.
Setting

25 sets of input
◦ 4 physical cores, 12 virtual cores, 24 distinct execution slices.

Optimal solution
◦ Enumerates all possible permutations of the execution slices.
◦ Uses A* search to reduce the search space.
Evaluation Result

                              Greedy Heuristic   A* Search
Average number of switches    31.2               27.7
Average execution time        0.006 seconds      10+ minutes
XEN HYPERVISOR SCHEDULER: CODE STUDY
Xen Hypervisor

Scheduler:
◦ xen/common/
 ▪ schedule.c
 ▪ sched_credit.c
 ▪ sched_credit2.c
 ▪ sched_sedf.c
 ▪ sched_arinc653.c
xen/common/schedule.c

Generic CPU scheduling code
◦ Implements support functionality for the Xen scheduler API.
◦ The scheduler defaults to the credit-based scheduler.

static void schedule(void)
◦ De-schedules the current domain.
◦ Picks a new domain.
xen/common/sched_credit.c

Credit-based SMP CPU scheduler

static struct task_slice csched_schedule;
◦ Implementation of credit-based scheduling.
◦ SMP load balancing:
 ▪ If the next highest-priority local runnable VCPU has already eaten through its credits, look on other PCPUs to see if we have more urgent work.
xen/common/sched_credit2.c

Credit-based SMP CPU scheduler
◦ Based on an earlier version.

static struct task_slice csched2_schedule;
◦ Selects the next runnable local VCPU (i.e., the top of the local run queue).

static void balance_load(const struct scheduler *ops, int cpu, s_time_t now);
Scheduling Steps

Xen calls do_schedule() of the current scheduler on each physical CPU (PCPU).
The scheduler selects a virtual CPU (VCPU) from its run queue and returns it to the Xen hypervisor.
The Xen hypervisor deploys that VCPU on the current PCPU.

A simplified view of this interface is sketched below.
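A minimal model of the hook, loosely based on Xen's scheduler interface; the struct layouts here are simplified assumptions for illustration, not Xen's exact definitions:

typedef long s_time_t;

struct vcpu;                      /* opaque: a virtual CPU */

struct task_slice {
    struct vcpu *task;            /* VCPU chosen to run next */
    s_time_t     time;            /* how long it may run */
};

struct scheduler {
    /* Xen invokes this hook on each physical CPU; the policy picks a
     * runnable VCPU from its run queue and returns it together with
     * the length of its time slice. */
    struct task_slice (*do_schedule)(const struct scheduler *ops,
                                     s_time_t now);
};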
Adding Our Scheduler

Our scheduler periodically generates a scheduling plan.
The run queue of each physical core is organized according to the scheduling plan.
The Xen hypervisor then assigns VCPUs to PCPUs according to those run queues (see the sketch below).
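A hypothetical sketch of a plan-driven scheduling hook: the periodic plan generator fills a per-PCPU queue with (vCPU, slice length) entries, and the hook simply dequeues the head. All names are illustrative; this is not the actual Xen patch.

#include <stddef.h>

typedef long s_time_t;

struct vcpu;

struct plan_entry {
    struct vcpu *v;      /* vCPU to run, or NULL to idle */
    s_time_t len;        /* length of this execution slice */
};

struct pcpu_queue {
    struct plan_entry entries[128];
    size_t head, tail;   /* ring-buffer indices */
};

/* Refilled periodically from the three-phase scheduling plan. */
static struct pcpu_queue plan_queue[4];

struct task_slice {
    struct vcpu *task;
    s_time_t time;
};

static struct task_slice plan_do_schedule(int cpu, s_time_t now)
{
    struct pcpu_queue *q = &plan_queue[cpu];
    struct task_slice ret = { .task = NULL, .time = 0 };
    (void)now;

    if (q->head != q->tail) {             /* queue not empty */
        struct plan_entry *e = &q->entries[q->head];
        q->head = (q->head + 1) % 128;
        ret.task = e->v;                  /* NULL means an idle slice */
        ret.time = e->len;
    }
    return ret;
}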
Current Status

We propose a three-phase solution for generating a scheduling plan on asymmetric multi-core platforms.
Our simulation results show that the asymmetry-aware strategy yields potential energy savings of up to 56.8% over the credit-based method.
Ongoing: implementing the solution in the Xen hypervisor.
Questions or Comments?