How NUMA Balancing Moves KVM Guests - Linux
Transcription
How NUMA Balancing Moves KVM Guests
Takuya Yoshikawa, Mariko Onishi
NTT OSS Center
Copyright © 2014 NTT Corp. All Rights Reserved.

Agenda
• Introduction
• What's NUMA
• Automatic NUMA balancing
• NUMA and KVM
• Testing basic KVM guests placement
• Summary

Introduction
• Why we chose this topic
  • Not just interesting…
  • Virtualization management is getting more complicated
    • Cloud services with many advanced features
    • VMs can be deployed dynamically
    • Less static configuration is welcome

The objective of this talk
• Let you know
  • Our KVM virtualization use cases
  • How NUMA balancing can affect them
• Feedback to the upstream developers
  • We expect NUMA balancing to mitigate our problems in the future
  • KVM management software should understand NUMA balancing to some extent

What's NUMA
• Abbreviation for Non-Uniform Memory Access
• Each CPU has memory attached to it
  • Local memory
• Accessing another CPU's memory is slower
  • Remote memory
• Today, NUMA is a common architecture
• NUMA system example (simple x86 server)
  • Each socket corresponds to one NUMA node
  [Figure: two nodes (Node 0 and Node 1), each pairing a processor with its own memory]

NUMA nodes
• CPUs and memory are divided into nodes
• A node is a physical CPU paired with its local memory
• Nodes are connected by a bus
  • Interconnect

Automatic NUMA balancing
• The kernel places tasks and their memory to get better performance on NUMA systems
• The kernel tries to achieve this by
  • running tasks where their memory is
  • moving memory to where it is accessed
• See the following presentation slides for more details:
  "Automatic NUMA Balancing" -- Rik van Riel, Vinod Chegu

NUMA hinting page faults
• Faults incurred by the kernel to get hints for NUMA balancing
• Each task's memory is unmapped periodically
  • Period based on run time
  • 256MB is unmapped at a time
• Used to learn which parts of memory a task uses
• Can drive page migration

Task grouping
• Related tasks are grouped to improve NUMA task placement
• Threads in a multi-threaded application process
  • KVM guest (QEMU process)
  • JVM

NUMA and KVM
• KVM virtualization on x86 servers
  • NUMA systems are common
• Multiple KVM guests on a host
  • E.g. cloud services
  • Servers get consolidated
  • Better resource utilization
  • Guests may be deployed dynamically

What constitutes a KVM VM?
• QEMU process
  • VCPU threads
  • Memory for the guest's RAM
  • Many other things (not the focus of this talk)
These are what automatic NUMA balancing places.

What's special in the case of KVM?
• A VCPU thread's memory access pattern depends on
  • the applications running inside the guest
  • how the guest schedules those tasks
What a VCPU accesses may change suddenly if the guest schedules another task on it.

Basic guests placement problem
• Problem
  • Each guest can fit into one node
    • # VCPU threads < # CPU cores in one node
    • Guest memory < Node size
• Has a trivial solution
  • Place each guest onto a particular node
    – We can do this by pinning VCPU threads manually (a minimal libvirt sketch follows this slide)
Basic but important for many cloud services.
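The slides only mention manual pinning in passing; on the command line this is typically done with virsh vcpupin. As a hedged illustration that is not part of the talk, the following minimal sketch shows how the same pinning could be done with the libvirt C API. The domain name "numa-vm1" and the assumption that host CPUs 0-7 make up node 0 are hypothetical, chosen to match the two-node test server described later.

  /* Pin the 8 VCPUs of a guest to the 8 CPUs of one node.
   * Build with: gcc pin.c $(pkg-config --cflags --libs libvirt) */
  #include <stdio.h>
  #include <string.h>
  #include <libvirt/libvirt.h>

  int main(void)
  {
      /* Connect to the local QEMU/KVM hypervisor. */
      virConnectPtr conn = virConnectOpen("qemu:///system");
      if (conn == NULL) {
          fprintf(stderr, "failed to connect to libvirtd\n");
          return 1;
      }

      /* "numa-vm1" is a hypothetical domain name. */
      virDomainPtr dom = virDomainLookupByName(conn, "numa-vm1");
      if (dom == NULL) {
          fprintf(stderr, "domain not found\n");
          virConnectClose(conn);
          return 1;
      }

      /* Pin VCPU i to physical CPU i, keeping the whole guest on
       * CPUs 0-7 (assumed here to be node 0 on the test server). */
      for (unsigned int vcpu = 0; vcpu < 8; vcpu++) {
          unsigned char cpumap[2] = { 0, 0 };   /* bitmap for 16 host CPUs */
          cpumap[vcpu / 8] |= (unsigned char)(1 << (vcpu % 8));
          if (virDomainPinVcpu(dom, vcpu, cpumap, sizeof(cpumap)) < 0)
              fprintf(stderr, "pinning VCPU %u failed\n", vcpu);
      }

      virDomainFree(dom);
      virConnectClose(conn);
      return 0;
  }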
Basic guests placement problem (continued)
[Figure: placement of VM1-VM5 across the two nodes]
Note: automatic NUMA balancing is not designed to achieve this placement, but…

Testing basic KVM guests placement
• Multiple KVM guests on a 2-socket x86 server
• Host ran a 3.14 kernel
  • 3.14 upstream kernel installed on Fedora 19
• Server with two NUMA nodes
  • 32GB memory in total
    – 16GB in each node
  • 16 CPU cores in total
    – 8 CPU cores in each node

Test cases
• Multiple KVM guests on the host
• 2 KVM guests of equal size
  • (8 VCPUs, 16GB mem) * 2
  • (7 VCPUs, 16GB mem) * 2
• 4 KVM guests of equal size
  • (4 VCPUs, 8GB mem) * 4
  • (5 VCPUs, 8GB mem) * 4
• 4 KVM guests of different sizes
  • (8 VCPUs, 16GB mem), (4 VCPUs, 8GB mem), (2 VCPUs, 4GB mem), (1 VCPU, 2GB mem)

What's running inside the guest
• Ran the following C program in parallel
  • Allocate 896MB of memory and access it
[Figure: three instances, each allocating and accessing 896MB]
(The program itself is not reproduced in the transcription; a minimal sketch of such a workload is given at the end of this document.)

Getting VCPU scheduling information
• We checked how the VCPU threads were scheduled
• Used virsh vcpuinfo
  • at a 1-second interval
• Shows where each virtual CPU is running

  # virsh vcpuinfo numa-vm2
  VCPU: 0
  CPU:  14
  …
  VCPU: 1
  CPU:  12
  …
  VCPU: 2
  CPU:  11
  …
  VCPU: 3
  CPU:  10

NUMA hinting faults statistics
• We checked the NUMA hinting fault statistics
• /proc/vmstat includes
  • numa_hint_faults
    – Number of NUMA hinting faults
  • numa_hint_faults_local
    – Number of hinting faults caused by local accesses
• Sampled at a 1-second interval
• Calculated the convergence rate as numa_hint_faults_local / numa_hint_faults
  (a sketch of this calculation is given after the Case 4 results below)

  # cat /proc/vmstat
  …
  numa_hint_faults 13945
  numa_hint_faults_local 10586
  …

Test results
• How the VCPU threads were moved
• Convergence rate

Case 1: (8, 16), (8, 16)
• This seems the easiest case
• Two guests of equal size
  • 8 VCPUs (= # CPU cores in one node)
  • 16GB memory (= size of one node)
Result: Each VM gradually moved to one node, but not perfectly, as the convergence rate shows.

Case 2: (7, 16), (7, 16)
• Relaxing the problem a bit
• Two guests of equal size
  • 7 VCPUs (= #{CPU cores in one node} - 1)
  • 16GB memory (= size of one node)
Result: Almost perfectly converged.

Case 3: (4, 8), (4, 8), (4, 8), (4, 8)
• More than two guests
• Four guests of equal size
  • 4 VCPUs (= #{CPU cores in one node} / 2)
  • 8GB memory (= {size of one node} / 2)
Result: Each VM gradually moved to one node, but some noise still exists.

Case 4: (5, 8), (5, 8), (5, 8), (5, 8)
• Over-committing CPUs a bit
• Four guests of equal size
  • 5 VCPUs (= #{CPU cores in one node} / 2 + 1)
  • 8GB memory (= {size of one node} / 2)
Result: Similar to the non-over-committed case.
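The convergence rate plotted for these cases comes from the two /proc/vmstat counters introduced earlier. The talk does not show the measurement tooling, so the following is only a minimal sketch, assuming the ratio is taken over the running totals sampled once per second (the counters are cumulative; the authors may instead have used per-interval deltas).

  /* Print numa_hint_faults_local / numa_hint_faults once per second. */
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  /* Read one named counter (e.g. "numa_hint_faults") from /proc/vmstat. */
  static long long read_vmstat(const char *name)
  {
      char key[64];
      long long value;
      FILE *fp = fopen("/proc/vmstat", "r");

      if (fp == NULL)
          return -1;
      while (fscanf(fp, "%63s %lld", key, &value) == 2) {
          if (strcmp(key, name) == 0) {
              fclose(fp);
              return value;
          }
      }
      fclose(fp);
      return -1;
  }

  int main(void)
  {
      /* Sample at the same 1-second interval used in the tests. */
      for (;;) {
          long long total = read_vmstat("numa_hint_faults");
          long long local = read_vmstat("numa_hint_faults_local");

          if (total > 0 && local >= 0)
              printf("convergence rate: %.2f (%lld / %lld)\n",
                     (double)local / (double)total, local, total);
          sleep(1);
      }
      return 0;
  }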
Case 5: (8, 16), (4, 8), (2, 4), (1, 2)
• Non-uniform
• Four guests of different sizes
  • (8 VCPUs, 16GB mem)
  • (4 VCPUs, 8GB mem)
  • (2 VCPUs, 4GB mem)
  • (1 VCPU, 2GB mem)
Result: VM1 (8 VCPUs, 16GB mem) alone did not show any convergence.

Summary
• Automatic NUMA balancing achieved the same results as manually pinning the VCPU threads in many cases
• For other cases, some manual pinning may still be needed
  • Can we know such cases before running the guests?
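Appendix: the C program run inside the guests (the "What's running inside the guest" slide) is only described, not shown. A minimal sketch of such a workload, assuming it simply allocates 896MB and keeps touching every page, could look like the following; the page size and the endless loop are assumptions, not the authors' actual program.

  /* Allocate 896MB and keep accessing it to generate memory traffic. */
  #include <stdio.h>
  #include <stdlib.h>

  #define MEM_SIZE ((size_t)896 * 1024 * 1024)   /* 896MB, as on the slide */
  #define PAGE     4096                          /* assume 4KB pages */

  int main(void)
  {
      /* Allocate 896MB of anonymous memory. */
      char *buf = malloc(MEM_SIZE);
      if (buf == NULL) {
          perror("malloc");
          return 1;
      }

      /* Touch every page repeatedly so the guest generates a steady
       * stream of accesses (and NUMA hinting faults on the host);
       * the process is simply killed when the test ends. */
      for (;;) {
          for (size_t off = 0; off < MEM_SIZE; off += PAGE)
              buf[off]++;
      }
  }

As the figure on that slide suggests, several instances of such a program were run in parallel inside each guest.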