How NUMA Balancing Moves KVM Guests - Linux

Transcription

How NUMA Balancing Moves KVM Guests
Takuya Yoshikawa, Mariko Onishi
NTT OSS Center
Copyright©2014 NTT Corp. All Rights Reserved.
Agenda
• Introduction
• What's NUMA
• Automatic NUMA balancing
• NUMA and KVM
• Testing basic KVM guest placement
• Summary
Introduction
• Why we chose this topic
• Not just interesting…
• Virt management is getting more complicated
  • Cloud services with many advanced features
  • VMs can be dynamically deployed
  • Less static configurations are welcome
The objective of this talk
• Let you know
  • Our KVM virtualization use cases
  • How NUMA balancing can affect them
• Feedback to the upstream developers
  • We expect NUMA balancing to mitigate our problems in the future
  • KVM management software should understand NUMA balancing to some extent
What’s NUMA
• Abbreviation for Non-Uniform Memory Access
• Each CPU has memory attached to it
  • Local memory
• Accessing another CPU's memory is slower
  • Remote memory
• Currently, NUMA is a common architecture
• NUMA system example (simple x86 server)
  • Each socket corresponds to one NUMA node
[Diagram: two NUMA nodes (Node 0 and Node 1), each consisting of a processor and its local memory]
NUMA nodes
• CPUs and memory are divided into nodes
• A node is a physical CPU paired with its local memory
• Nodes are connected by a bus
  • Interconnect
Automatic NUMA Balancing
• The kernel places tasks and their memory to get better performance on NUMA systems
• The kernel tries to achieve this by
  • running tasks where their memory is
  • moving memory to where it is accessed
See the following presentation slides for more details:
"Automatic NUMA Balancing" -- Rik van Riel, Vinod Chegu
NUMA hinting page faults
• Faults incurred by the kernel to get hints for NUMA balancing
• Each task's memory is unmapped periodically
  • The scan period is based on the task's run time
  • 256MB is unmapped at a time
• Used to learn which parts of memory a task uses
• Can drive page migration
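The scan behavior above is exposed through sysctl tunables. The following is a minimal sketch (not from the slides) that prints them, assuming the 3.14-era files under /proc/sys/kernel (numa_balancing, numa_balancing_scan_size_mb, numa_balancing_scan_period_min_ms, numa_balancing_scan_period_max_ms):

/* Minimal sketch: print the automatic NUMA balancing tunables.
 * Assumes the 3.14-era sysctl files under /proc/sys/kernel. */
#include <stdio.h>

static void print_tunable(const char *path)
{
    char buf[64];
    FILE *f = fopen(path, "r");

    if (!f) {
        printf("%s: not available\n", path);
        return;
    }
    if (fgets(buf, sizeof(buf), f))
        printf("%s: %s", path, buf);
    fclose(f);
}

int main(void)
{
    /* 1 = automatic NUMA balancing enabled */
    print_tunable("/proc/sys/kernel/numa_balancing");
    /* memory unmapped per scan step (256MB by default) */
    print_tunable("/proc/sys/kernel/numa_balancing_scan_size_mb");
    /* bounds on the scan period; the actual period adapts to run time */
    print_tunable("/proc/sys/kernel/numa_balancing_scan_period_min_ms");
    print_tunable("/proc/sys/kernel/numa_balancing_scan_period_max_ms");
    return 0;
}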
Task grouping
• Related tasks get grouped to improve NUMA task placement
• Threads in a multi-threaded application process
  • KVM guest (QEMU process)
  • JVM
NUMA and KVM
• KVM virtualization with x86 servers
• NUMA systems are common
• Multiple KVM guests on a host
  • E.g. cloud services
  • Servers get consolidated
  • Better resource utilization
  • May be dynamically deployed
What constitutes a KVM VM?
• QEMU process
• VCPU threads
• Memory for the guest's RAM
• Many other things (not the focus of this talk)
These are placed by automatic NUMA balancing.
What’s special in the case of KVM?
• A VCPU thread's memory access depends on
  • Applications running inside the guest
  • How the guest schedules these tasks
What a VCPU thread accesses may suddenly change if the guest schedules another task on it.
Basic guest placement problem
• Problem
  • Each guest can fit into one node
    • # VCPU threads < # CPU cores in one node
    • Guest memory < node size
• Has a trivial solution
  • Placing each guest onto a certain node
    – We can do this by pinning VCPU threads manually (a sketch follows below)
Basic but important for many cloud services.
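For reference, the manual pinning mentioned above can be done with virsh vcpupin or through the libvirt API. A minimal sketch, assuming a hypothetical guest named numa-vm1 with 8 VCPUs and that node 0 owns physical CPUs 0-7:

/* Minimal sketch: pin all VCPUs of a guest to the CPUs of NUMA node 0.
 * "numa-vm1" is a hypothetical guest name; node 0 is assumed to own
 * physical CPUs 0-7 (check with numactl --hardware).
 * Build with: gcc pin.c -lvirt */
#include <stdio.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virDomainPtr dom;
    /* bitmap of physical CPUs: bits 0-7 set => CPUs 0-7 (node 0) */
    unsigned char cpumap[VIR_CPU_MAPLEN(16)] = { 0xff, 0x00 };
    int vcpu;

    if (!conn)
        return 1;
    dom = virDomainLookupByName(conn, "numa-vm1");
    if (!dom) {
        virConnectClose(conn);
        return 1;
    }
    for (vcpu = 0; vcpu < 8; vcpu++) {
        if (virDomainPinVcpu(dom, vcpu, cpumap, sizeof(cpumap)) < 0)
            fprintf(stderr, "failed to pin VCPU %d\n", vcpu);
    }
    virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}

Note that this only pins the VCPU threads; guest memory placement is still left to the kernel unless it is bound as well.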
Basic guest placement problem
[Diagram: the desired placement of five guests (VM1-VM5) across the two NUMA nodes, with each guest's VCPUs and memory confined to a single node]
Note: automatic NUMA balancing is not designed
to achieve this placement, but …
Testing basic KVM guest placement
• Multiple KVM guests on a 2-socket x86 server
• Host ran a 3.14 kernel
  • 3.14 upstream kernel installed on Fedora 19
• Server with two NUMA nodes
  • 32GB memory in total
    – 16GB in each node
  • 16 CPU cores in total
    – 8 CPU cores in each node
Test cases
• Multiple KVM guests on the host
• 2 KVM guests with equal size
  • (8 VCPUs, 16GB mem) * 2
  • (7 VCPUs, 16GB mem) * 2
• 4 KVM guests with equal size
  • (4 VCPUs, 8GB mem) * 4
  • (5 VCPUs, 8GB mem) * 4
• 4 KVM guests with different sizes
  • (8 VCPUs, 16GB mem), (4 VCPUs, 8GB mem), (2 VCPUs, 4GB mem), (1 VCPU, 2GB mem)
What’s running inside the guest
• Ran the following C program in parallel
  • Allocate 896MB of memory and access it (a minimal sketch is shown below)
[Diagram: several instances of the program, each allocating and accessing 896MB, running in parallel inside the guest]
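The program itself was not reproduced in the slides; a minimal sketch of such a workload, which simply keeps touching every page of an 896MB buffer:

/* Minimal sketch of the in-guest workload: allocate 896MB of memory
 * and keep accessing it. The actual program was not shown in the
 * slides; this version just touches every page of the buffer in a
 * loop. */
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE (896UL * 1024 * 1024)   /* 896MB */
#define STEP 4096UL                      /* touch one byte per page */

int main(void)
{
    unsigned char *buf = malloc(BUF_SIZE);
    unsigned long i;

    if (!buf)
        return 1;
    memset(buf, 0, BUF_SIZE);            /* fault every page in */

    for (;;) {                           /* access the memory repeatedly */
        for (i = 0; i < BUF_SIZE; i += STEP)
            buf[i]++;
    }
    return 0;                            /* not reached */
}

Several instances of this kind of program were run in parallel inside each guest.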
Getting VCPU scheduling information
• We checked how VCPU threads got scheduled
• Used virsh vcpuinfo
  • 1 sec interval
It shows where each virtual CPU is running:
# virsh vcpuinfo numa-vm2
VCPU:           0
CPU:            14
…
VCPU:           1
CPU:            12
…
VCPU:           2
CPU:            11
…
VCPU:           3
CPU:            10
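The slides do not show how the 1-second sampling was scripted; a minimal sketch that re-runs virsh vcpuinfo once per second (the guest name numa-vm2 matches the example above):

/* Minimal sketch: sample "virsh vcpuinfo" once per second and print
 * the VCPU-to-physical-CPU mapping. The authors' actual sampling
 * method was not shown. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char line[256];

    for (;;) {
        FILE *p = popen("virsh vcpuinfo numa-vm2", "r");

        if (!p)
            return 1;
        while (fgets(line, sizeof(line), p)) {
            /* keep only the VCPU and CPU lines */
            if (!strncmp(line, "VCPU:", 5) || !strncmp(line, "CPU:", 4))
                fputs(line, stdout);
        }
        pclose(p);
        sleep(1);
    }
}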
NUMA hinting faults statistics
• We checked the NUMA hinting fault stats
• /proc/vmstat includes
  • numa_hint_faults
    – Number of NUMA hinting faults
  • numa_hint_faults_local
    – Number of hinting faults caused by local access
• 1 sec interval
Calculated the convergence rate as:
numa_hint_faults_local / numa_hint_faults

# cat /proc/vmstat
…
numa_hint_faults 13945
numa_hint_faults_local 10586
…
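A minimal sketch of the convergence-rate calculation from a single read of /proc/vmstat (the counters are cumulative, so a real monitor would sample them every second and work with the deltas; the authors' exact script was not shown):

/* Minimal sketch: compute the convergence rate
 * numa_hint_faults_local / numa_hint_faults from /proc/vmstat. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char name[64];
    unsigned long long value, faults = 0, faults_local = 0;
    FILE *f = fopen("/proc/vmstat", "r");

    if (!f)
        return 1;
    while (fscanf(f, "%63s %llu", name, &value) == 2) {
        if (!strcmp(name, "numa_hint_faults"))
            faults = value;
        else if (!strcmp(name, "numa_hint_faults_local"))
            faults_local = value;
    }
    fclose(f);

    if (faults)
        printf("convergence rate: %.2f%%\n",
               100.0 * faults_local / faults);
    return 0;
}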
Test results
• How VCPU threads got moved
• Convergence rate
Case 1: (8, 16), (8, 16)
• This seems the easiest
• Two guests with equal size
  • 8 VCPUs (= # CPU cores in one node)
  • 16GB memory (= size of one node)
Case 1: (8, 16), (8, 16)
Each VM gradually moved to one node, but not perfectly, as the convergence rate shows.
Case 2: (7, 16), (7, 16)
• Relaxing the problem a bit
• Two guests with equal size
  • 7 VCPUs (= #{CPU cores in one node} - 1)
  • 16GB memory (= size of one node)
Case 2: (7, 16), (7, 16)
Almost perfectly converged
Case 3: (4, 8), (4, 8), (4, 8), (4, 8)
• More than two guests
• Four guests with equal size
  • 4 VCPUs (= #{CPU cores in one node} / 2)
  • 8GB memory (= {size of one node} / 2)
Case 3: (4, 8), (4, 8), (4, 8), (4, 8)
Each VM gradually
moved to one node, but
some noise still exists.
Case 4: (5, 8), (5, 8), (5, 8), (5, 8)
• Over-committing CPUs a bit
• Four guests with equal size
  • 5 VCPUs (= #{CPU cores in one node} / 2 + 1)
  • 8GB memory (= {size of one node} / 2)
Case 4: (5, 8), (5, 8), (5, 8), (5, 8)
Similar result to the non-over-committed case.
Case 5: (8, 16), (4, 8), (2, 4), (1, 2)
• Non-uniform
• Four guests with different sizes
  • (8 VCPUs, 16GB mem)
  • (4 VCPUs, 8GB mem)
  • (2 VCPUs, 4GB mem)
  • (1 VCPU, 2GB mem)
Case 5: (8, 16), (4, 8), (2, 4), (1, 2)
VM1 (8VCPUs, 16GB mem)
alone did not show any
convergence.
Summary
• Automatic NUMA balancing achieved the same results as manually pinning the VCPU threads in many cases
• For the other cases, some manual pinning may still be needed
  • Can we know such cases before running the guests?