Intel® Open Network Platform Server
(Release 1.2)
Benchmark Performance Test Report
Document Revision 1.0
December 2014
Revision History

Revision | Date              | Comments
1.0      | December 15, 2014 | Initial document for release of Intel® Open Network Platform Server 1.2
Contents

1.0 Introduction
2.0 Ingredient Specifications
    2.1 Hardware Versions
    2.2 Software Versions
3.0 Test Cases
4.0 Test Descriptions
    4.1 Performance Metrics
    4.2 Test Methodology
        4.2.1 Layer 2 and Layer 3 Throughput Tests
        4.2.2 Latency Tests
5.0 Platform Configuration
    5.1 Linux Operating System Configuration
    5.2 BIOS Configuration
    5.3 Core Usage for Intel DPDK Accelerated vSwitch
    5.4 Core Usage for OVS
6.0 Test Results
    6.1 Host Performance with DPDK
        6.1.1 Test Case 0 - Host L3 Forwarding
        6.1.2 Test Case 1 - Host L2 Forwarding
    6.2 Virtual Switching Performance
        6.2.1 Test Case 2 - OVS with DPDK-netdev L3 Forwarding (Throughput)
        6.2.2 Test Case 3 - OVS with DPDK-netdev L3 Forwarding (Latency)
        6.2.3 Test Case 4 - OVDK vSwitch L3 Forwarding (Throughput)
    6.3 VM Throughput Performance without a vSwitch
        6.3.1 Test Case 5 - VM L2/L3 Forwarding with PCI Pass-through
        6.3.2 Test Case 6 - DPDK L3 Forwarding: SR-IOV Pass-through
    6.4 VM Throughput Performance with vSwitch
        6.4.1 Test Case 7 - Throughput Performance with One VM (OVS with DPDK-netdev, L3 Forwarding)
        6.4.2 Test Case 8 - Throughput Performance with Two VMs in Series (OVS with DPDK-netdev, L3 Forwarding)
        6.4.3 Test Case 9 - Throughput Performance with Two VMs in Parallel (OVS with DPDK-netdev, L3 Forwarding)
    6.5 Encrypt / Decrypt Performance
        6.5.1 Test Case 10 - VM-VM IPSec Tunnel Performance Using Quick Assist Technology
Appendix A DPDK L2 and L3 Forwarding Applications
    A.1 Building DPDK and L3/L2 Forwarding Applications
    A.2 Running the DPDK L3 Forwarding Application
    A.3 Running the DPDK L2 Forwarding Application
Appendix B Building DPDK, Open vSwitch, and Intel DPDK Accelerated vSwitch
    B.1 Getting DPDK
        B.1.1 Getting DPDK Git Source Repository
        B.1.2 Getting DPDK Source tar from DPDK.ORG
    B.2 Building DPDK for OVS
        B.2.1 Getting OVS Git Source Repository
        B.2.2 Getting OVS DPDK-netdev User Space vHost Patch
        B.2.3 Applying OVS DPDK-netdev User Space vHost Patch
        B.2.4 Building OVS with DPDK-netdev
    B.3 Building DPDK for Intel DPDK Accelerated vSwitch
        B.3.1 Building Intel DPDK Accelerated vSwitch
    B.4 Host Kernel Boot Configuration
        B.4.1 Kernel Boot Hugepage Configuration
        B.4.2 Kernel Boot CPU Isolation Configuration
        B.4.3 IOMMU Configuration
        B.4.4 Generating GRUB Boot File
        B.4.5 Verifying Kernel Boot Configuration
        B.4.6 Configuring System Variables (Host System Configuration)
    B.5 Setting Up Tests
        B.5.1 OVS Throughput Tests (PHY-PHY without a VM)
        B.5.2 Intel DPDK Accelerated vSwitch Throughput Tests (PHY-PHY without a VM)
        B.5.3 Intel DPDK Accelerated vSwitch with User Space vHost
        B.5.4 OVS with User Space vHost
        B.5.5 SR-IOV VM Test
        B.5.6 Affinitization and Performance Tuning
    B.6 PCI Passthrough with Intel® QuickAssist
        B.6.1 VM Installation
        B.6.2 Installing strongSwan IPSec Software
        B.6.3 Configuring strongSwan IPsec Software
Appendix C Glossary
Appendix D Definitions
    D.1 Packet Throughput
    D.2 RFC 2544
Appendix E References
1.0 Introduction
The primary audiences for this document are architects and engineers implementing the Intel® Open
Network Platform Server. Software ingredients of this architecture include:
• Fedora 20*
• Data Plane Development Kit (DPDK)
• Open vSwitch (OVS)*
• OpenStack*
• OpenDaylight*
An understanding of system performance is required to develop solutions that meet the demanding
requirements of the telecom industry and transform telecom networks. This document provides a guide
for performance characterization using the Intel® Open Network Platform Server (Intel ONP Server).
Ingredient versions, integration procedures, configuration parameters, and test methodologies all
influence performance.
This document provides a suite of Intel ONP Server performance tests, including test cases, configuration details, performance data, and lessons learned to help optimize system configurations. The test methodologies and data provide a baseline for performance characterization of a Network Function Virtualization (NFV) compute node using Commercial Off-the-Shelf (COTS) hardware with the Intel® Xeon® E5-2697 v3 processor. The data does not represent optimal performance, but it should be easy to reproduce, helping architects evaluate their specific performance requirements. The tests described are not comprehensive and do not cover any conformance aspects. Providing a baseline configuration of well-tested procedures can help achieve optimal system performance when developing an NFV/SDN solution.
Benchmarking is not a trivial task, as it takes equipment, time, and resources. Engineers doing NFV
performance characterization need strong domain knowledge (telco, virtualization, etc.), good
understanding of the hardware (compute platforms, networking technologies), familiarity with network
protocols, and hands-on experience working with relevant open-source software (Linux virtualization,
vSwitch, software acceleration technologies like DPDK, system optimization/tuning, etc.).
2.0 Ingredient Specifications

2.1 Hardware Versions
Table 2-1 Hardware Ingredients

Ingredient | Description | Notes
Platform | Intel® Server Board S2600WTT | Intel® Xeon® DP-based server (2 CPU sockets). 120 GB SSD 2.5in SATA 6 Gb/s, Intel Wolfsville SSDSC2BB120G4.
Processors | Intel® Xeon® E5-2697 v3 processor | (Formerly code-named Haswell.) 14 cores, 2.60 GHz, 145 W, 35 MB cache, 9.6 GT/s QPI, DDR4-1600/1866/2133. Tested with DDR4-1067.
Cores | 14 physical cores per CPU (i.e., per socket) | 28 hyper-threaded cores per CPU (i.e., per socket) for 56 total cores per platform (i.e., 2 sockets).
Memory | 8 GB DDR4 RDIMM Crucial | Server capacity = 64 GB RAM (8 x 8 GB). Tested with 32 GB memory.
NICs (Niantic) | 2x Intel® 82599 10 Gigabit Ethernet Controller | NICs are on socket zero (3 PCIe slots available on socket 0).
BIOS | Revision GRNDSDP1.86B.0038.R01.1409040644, release date 09/04/2014 | Intel® Virtualization Technology for Directed I/O (Intel® VT-d) enabled. Hyper-threading disabled.
Quick Assist Technology | Intel® QuickAssist Adapter 8950-SCCP (formerly code-named Walnut Hill) with Intel® Communications Chipset 8950 (formerly code-named Coleto Creek) | PCIe acceleration card with Intel® Communications Chipset 8950. Capabilities include RSA, SSL/IPsec, wireless crypto, security crypto, and compression.
2.2 Software Versions
Table 2-2 Software Versions

Software Component | Function | Version/Configuration
Fedora 20 x86_64 | Host OS | 3.15.6-200.fc20.x86_64
Fedora 20 x86_64 | VM | 3.15.6-200.fc20.x86_64
Qemu-kvm | Virtualization technology | 1.6.2-9.fc20.x86_64
Data Plane Development Kit (DPDK) | Network stack bypass | DPDK 1.7.1, git commit 99213f3827bad956d74e2259d06844012ba287a4
Intel® DPDK Accelerated vSwitch (OVDK)¹ | vSwitch | v1.1.0.61, git commit bd8e8ee7565ca7e843a43204ee24a7e1e2bf9c6f
Open vSwitch (OVS) | vSwitch | Open vSwitch v2.3.90, git commit e9bbe84b6b51eb9671451504b79c7b79b7250c3b
Table 2-2 Software Versions (Continued)

Software Component | Function | Version/Configuration
DPDK-netdev² | vSwitch patch | This is currently an experimental feature of OVS that needs to be applied to the OVS version specified above. http://patchwork.openvswitch.org/patch/6280/
Intel® QuickAssist Technology Driver (QAT) | Crypto accelerator | Intel® Communications Chipset 89xx Series Software for Linux* - Intel® QuickAssist Technology Driver (L.2.2.0-30). Latest drivers posted at: https://01.org/packet-processing/intel%C2%AE-quickassist-technology-drivers-and-patches
strongSwan | IPSec stack | strongSwan v4.5.3, http://download.strongswan.org/strongswan-4.5.3.tar.gz
PktGen | Software network packet generator | v2.7.7
1. Intel has stopped further investment in OVDK (available on GitHub). Intel OVDK demonstrated that the Data Plane Development Kit (DPDK) accelerates performance. Intel has also contributed to the Open vSwitch project, which has adopted the DPDK, and it is therefore not desirable to have two different code bases that use the DPDK for accelerated vSwitch performance. As a result, Intel has decided to increase investment in the Open vSwitch community project with a focus on using the DPDK and advancing hardware acceleration.
2. Keeping up with line rate at 10 Gb/s in a flow of 64-byte packets requires processing more than 14.8 million packets per second, but the current standard OVS kernel datapath pushes more like 1.1 million to 1.3 million, and OVS latency in processing small packets becomes intolerable as well. The new mainstream-OVS code goes under the name OVS with DPDK-netdev and has been on GitHub since March. It is available as an "experimental feature" in OVS version 2.3. Intel hopes to include the code officially in OVS 2.4, which OVS committers are hoping to release early next year.
3.0 Test Cases
Network throughput, latency, and jitter performance are measured for L2/L3 forwarding using the standard RFC 2544 test methodology.
L3 forwarding on the host and port-to-port switching performance using a vSwitch also serve as verification tests after installing the compute node ingredients. The table below summarizes the performance test cases (a subset of many possible test configurations).
Table 3-1 Test Case Summary

Test Case | Description | Configuration | Workload/Metrics

Host-Only Performance
0 | L3 Forwarding on host | DPDK | L3 Fwd (DPDK test application); no pkt modification; Throughput¹; Latency (avg, max, min)
1 | L2 Forwarding on host | DPDK | L2 Fwd (DPDK application); pkt MAC modification; Throughput¹; Latency (avg, max, min)

Virtual Switching Performance
2 | L3 Forwarding by vSwitch - throughput | OVS with DPDK-netdev² | Switch L3 Fwd; Throughput¹; Latency (avg, max, min)
3 | L3 Forwarding by vSwitch - latency | OVS with DPDK-netdev² | Switch L3 Fwd; Latency distribution (percentile ranges)³
4 | L3 Forwarding by vSwitch | Intel® DPDK Accelerated vSwitch | Switch L3 Fwd; Throughput¹; Latency (avg, max, min)
5 | L2 Forwarding by single VM | Pass-through | L2 Fwd in VM; pkt MAC modification; Throughput¹; Latency (avg, max, min)
6 | L3 Forwarding by single VM | SR-IOV | L2 Fwd in VM; pkt MAC modification; Throughput¹; Latency (avg, max, min)
7 | L3 Forwarding by single VM | OVS with DPDK-netdev², user space vhost | L3 table lookup in switch with port fwd (testpmd app) in VM without pkt modification; Throughput¹; Latency (avg, max, min)
8 | L3 Forwarding by two VMs in series | OVS with DPDK-netdev², user space vhost | L3 table lookup in switch with port fwd (testpmd app) in VM without pkt modification; Throughput¹; Latency (avg, max, min)
9 | L3 Forwarding by VMs in parallel | OVS with DPDK-netdev², user space vhost | L3 table lookup in switch with port fwd (testpmd app) in VM without pkt modification; Throughput¹; Latency (avg, max, min)

Security
10 | Encrypt/decrypt with Intel® Quick Assist Technology | VM-VM IPSec tunnel performance | Throughput¹
1. Tests run for 60 seconds per load target.
2. OVS with DPDK-netdev test cases were performed using the DPDK-netdev patch, which is available as an "experimental feature" in OVS version 2.3. This shows the performance improvements possible today with standard OVS code when using DPDK. Refer to Section 2.2 for details of software versions with commit IDs.
3. Tests run for 1 hour per load target.
4.0 Test Descriptions

4.1 Performance Metrics
The Metro Ethernet Forum (http://metroethernetforum.org/) gives Ethernet network performance
parameters that affect service quality as:
• Frame delay (latency)
• Frame delay variation (jitter)
• Frame loss
• Availability
Furthermore, the following metrics can characterize a network-connected device (i.e., a compute node):
• Throughput¹ — The maximum frame rate that can be received and transmitted by the node without any error for "N" flows.
• Latency — Introduced by the node; it is cumulative with all other end-to-end network delays.
• Jitter² — Introduced by the node infrastructure.
• Frame loss³ — While overall network losses do occur, all packets should be accounted for within a node.
• Burst behavior⁴ — This measures the ability of the node to buffer packets.
• Metrics⁵ — Used to characterize the availability and capacity of the compute node. These include start-up performance metrics as well as fault handling and recovery metrics. Metrics measured when the network service is fully "up" and connected include:
— Power consumption of the CPU in various power states.
— Number of cores available/used.
— Number of NIC interfaces supported.
— Headroom available for processing workloads (for example, available for VNF processing).
1. See Appendix D for the definition of packet throughput.
2. Jitter data is not included here. While “round-trip” jitter is fairly easy to measure, ideally it should be measured from the
perspective of the application that has to deal with the jitter (e.g. VNF running on the node).
3. All data provided is with zero packet loss conditions.
4. Burst behavior data is not included here.
5. Availability and capacity data is not included here.
4.2 Test Methodology

4.2.1 Layer 2 and Layer 3 Throughput Tests
These tests determine the maximum Layer 2 or Layer 3 forwarding rate without traffic loss, and the average latency for different packet sizes, for the Device Under Test (DUT). While packets do get lost in telco networks, communications equipment is generally expected NOT to lose any packets. Therefore, it is important to do a zero packet-loss test of sufficient duration. Test results are zero packet-loss unless otherwise noted.
Tests are usually performed full duplex with traffic transmitting in both directions. Tests here were conducted with a variety of flows, for example, a single flow (unidirectional traffic), two flows (bidirectional traffic with one flow in each direction), and 30,000 flows (bidirectional with 15k flows in each direction).
For Layer 2 forwarding, the DUT must perform packet parsing and Layer 2 address look-ups on the ingress port and then modify the MAC header before forwarding the packet on the egress port.
For Layer 3 forwarding, the DUT must perform packet parsing and Layer 3 address look-ups on the ingress port and then forward the packet on the egress port.
Test duration refers to the measurement period for a particular packet size with an offered load and assumes the system has reached a steady state. Using the RFC 2544 test methodology, this is specified as at least 60 seconds. To detect outlier data, it is desirable to run longer duration "soak" tests once the maximum zero-loss throughput is determined. This is particularly relevant for latency tests. In this document, throughput tests used 60-second load iterations, and latency tests were run for one hour (per load target).
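The zero-loss search itself can be automated around whatever traffic generator is in use. The script below is a minimal sketch only; it assumes a hypothetical helper, ./measure_loss.sh <load_percent> <duration_s>, that offers the given load for the given duration and prints the number of lost packets. It is not part of the test setup described in this document.

#! /bin/sh
# Hypothetical RFC 2544-style zero-loss binary search (illustration only).
# Assumes ./measure_loss.sh <load_percent> <duration_s> prints the lost packet count.
lo=0; hi=100; best=0
while [ $((hi - lo)) -gt 1 ]; do
    load=$(( (lo + hi) / 2 ))
    lost=`./measure_loss.sh $load 60`
    if [ "$lost" -eq 0 ]; then
        best=$load; lo=$load    # no loss: try a higher offered load
    else
        hi=$load                # loss observed: back off
    fi
done
echo "Zero-loss throughput: ${best}% of line rate"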
4.2.2 Latency Tests
Minimum, maximum, and average latency numbers were collected with the throughput tests. This test aims to understand the latency distribution for different packet sizes over an extended test run, in order to uncover outliers. This means that latency tests should run for extended periods (at least 1 hour and ideally 24 hours). Practical considerations dictate picking the highest throughput that has demonstrated zero packet loss (for a particular packet size), as determined with the above throughput tests.
Note: Latency through the system depends on various parameters configured by the software, such as the ring size in the Ethernet adapters, Ethernet tail pointer update rates, platform characteristics, etc.
5.0 Platform Configuration
The configuration parameters described in this section are based on experience running various tests
and may not be optimal.
5.1 Linux Operating System Configuration
Table 5-1 Linux Configuration

Parameter | Setting | Explanation
Firewall | Disable |
NetworkManager | Disable | As set by install.
irqbalance | Disable |
ssh | Enabled |
Address Space Layout Randomization (ASLR) | Disable |
IPV4 Forwarding | Disable |
Host kernel boot parameters (regenerate grub.conf) | default_hugepagesz=1G hugepagesz=1G hugepages=32 intel_iommu=off isolcpus=1,2,3,4,5,6,7,8,9,...,n-1 for n-core systems | Isolate DPDK cores on host.
VM kernel boot parameters (regenerate grub.conf) | default_hugepagesz=1GB hugepagesz=1GB hugepages=1 isolcpus=1,2 | Isolate DPDK cores on VM.
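The service settings in Table 5-1 can be applied with systemctl. The commands below are a minimal sketch assuming the Fedora 20 unit names firewalld, NetworkManager, irqbalance, and sshd; adjust them to the services actually installed. The ASLR sysctl setting is applied as shown in Appendix B.4.6, and the kernel boot parameters are configured as described in Appendix B.4.

# systemctl disable firewalld
# systemctl stop firewalld
# systemctl disable NetworkManager
# systemctl disable irqbalance
# systemctl enable sshd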
5.2 BIOS Configuration
The configuration parameters described in this section are based on experience running the tests
described and may not be optimal.
Table 5-2 BIOS Configuration

Parameter | Enable/Disable | Explanation

Power Management Settings
CPU Power and Performance Policy | Traditional | This is the equivalent of the "performance" setting.
Intel® Turbo Boost | Enable |
Processor C3 | Disable | Prevents low power state.
Processor C6 | Enable | Not relevant when C3 is disabled.

Processor Configuration
Intel® Hyper-Threading Technology (HTT) | Disable |
MLC Streamer | Enable | Hardware Prefetcher: Enabled
MLC Spatial Prefetcher | Enable | Adjacent Cache Prefetch: Enabled
DCU Data Prefetcher | Enable | DCU Streamer Prefetcher: Enabled
DCU Instruction Prefetcher | Enable | DCU IP Prefetcher: Enabled

Virtualization Configuration
Intel® Virtualization Technology for Directed I/O (VT-d) | Enable | This is disabled by the kernel when iommu=off (default).

Memory Configuration
Memory RAS and Performance Configuration -> NUMA Optimized | Auto |

I/O Configuration
DDIO (Direct Cache Access) | Auto |

Other Configuration
HPET (High Precision Timer) | Disable | Requires a kernel rebuild to utilize (Fedora 20 sets it to disabled by default).

5.3 Core Usage for Intel DPDK Accelerated vSwitch
Performance results in this section do not include VM tests with the Intel DPDK Accelerated vSwitch (OVDK). The core usage information here is provided as a reference for doing additional tests with OVDK.
Table 5-3 Core Usage for Intel DPDK Accelerated vSwitch

Process | Core | Linux Commands

OVS_DPDK Switching Cores
OVS_DPDK | 0, 1, 2, 3 | Affinity is set on the ovs_dpdk command line. Tests used coremask 0x0F (i.e., cores 0, 1, 2, and 3).
vswitchd | 8 | taskset -a <pid_of_vswitchd_process>. This is only relevant when setting up flows.

OS/VM Cores
QEMU* VM1 VCPU0 | 4 | taskset -p 10 <pid_of_vm1_qemu_vcpu0_process>
QEMU* VM1 VCPU1 | 5 | taskset -p 20 <pid_of_vm1_qemu_vcpu1_process>
QEMU* VM2 VCPU0 | 6 | taskset -p 40 <pid_of_vm2_qemu_vcpu0_process>
QEMU* VM2 VCPU1 | 7 | taskset -p 80 <pid_of_vm2_qemu_vcpu1_process>
Kernel | 0 + CPU socket 1 cores | All other CPUs isolated (isolcpus boot parameter).
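The QEMU vCPU rows above pin individual QEMU threads rather than the whole process. The commands below are a sketch of one way to do this; the PID and thread ID placeholders are illustrative only.

# Show the threads of a running QEMU process (replace the placeholder with the real PID)
# top -H -p <pid_of_vm1_qemu>
# Pin the two vCPU threads to cores 4 and 5 (core masks 0x10 and 0x20)
# taskset -p 10 <tid_of_vm1_vcpu0_thread>
# taskset -p 20 <tid_of_vm1_vcpu1_thread>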
Intel® Open Network Platform Server
Benchmark Performance Test Report
5.4 Core Usage for OVS
Table 5-4 Core Usage for OVS

Process | Core | Linux Commands

OVS switchd Cores
ovs-vswitchd (all tasks except pmd) | 3 | OVS user space processes affinitized to core mask 0x04:
    # ./vswitchd/ovs-vswitchd --dpdk -c 0x4 -n 4 --socket-mem 4096,0 -- unix:$DB_SOCK --pidfile
ovs-vswitchd pmd task | 4 | Sets the PMD core mask to 2 for CPU core 1 affinitization (set in the OVS database and persistent):
    # ./ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2
All Linux threads of the OVS vSwitch daemon can be listed with:
$ top -p `pidof ovs-vswitchd` -H -d1
Using taskset, the current thread affinity can be checked and changed to other CPU cores as needed.
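For example, to check a single thread's affinity and move it to another core (the thread ID placeholder is illustrative):

# taskset -p <tid_of_ovs_thread>
# taskset -p 8 <tid_of_ovs_thread>

The first command shows the current affinity mask; the second moves the thread to core 3 (mask 0x8).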
6.0 Test Results

Note: In this section, all performance testing is done using the RFC 2544 test methodology (zero loss); refer to Appendix D for more information.
6.1 Host Performance with DPDK
Figure 6-1
Host Performance with DPDK
Configuration:
1. No VMs are configured and no vSwitch is started or used.
2. DPDK-based port forwarding sample application (L2/L3fwd).
Data Path (Numbers Matching Red Circles):
1. The packet generator creates flows based on RFC 2544.
2. The DPDK L2 or L3 forwarding application forwards the traffic from the first physical port to the
second.
3. The traffic flows back to the packet generator and is measured per RFC 2544 throughput test.
6.1.1 Test Case 0 - Host L3 Forwarding
Test Configuration Notes:
• Traffic is unidirectional (1 flow) or bidirectional (2 flows).
• Packets are not modified.
Figure 6-2
Host L3 Forwarding Performance – Throughput
The average latency data in Table 6-1 implies that latency increases with packet size; however, this is because throughput is approaching the maximum (where latency increases dramatically). Therefore these latency figures may not represent realistic conditions (i.e., at lower data rates).
Table 6-1 shows performance with bidirectional traffic. Note that performance is limited by the network card.
Table 6-1 Host L3 Forwarding Performance – Average Latency

Packet Size (Bytes) | Average Latency (µs) | Bandwidth (% of 10 GbE)
64   | 9.77  | 78
128  | 12.33 | 97
256  | 13.17 | 99
512  | 20.24 | 99
1024 | 33.02 | 99
1280 | 39.82 | 99
1518 | 46.61 | 99
6.1.2 Test Case 1 - Host L2 Forwarding
Test Configuration Notes:
• Traffic is bidirectional (2 flows).
• Packets are modified (Ethernet header).
Figure 6-3
Host L2 Forwarding Performance – Throughput
6.2 Virtual Switching Performance
Figure 6-4
Virtual Switching Performance
Configuration:
1. No VMs are configured (i.e. PHY-to-PHY).
2. Intel® DPDK Accelerated vSwitch or OVS with DPDK-netdev, port-to-port.
Data Path (Numbers Matching Red Circles):
1. The packet generator creates flows based on RFC 2544.
2. The Intel® DPDK Accelerated vSwitch or OVS network stack forwards the traffic from the first
physical port to the second.
3. The traffic flows back to the packet generator.
6.2.1 Test Case 2 - OVS with DPDK-netdev L3 Forwarding (Throughput)
Test Configuration Notes:
• Traffic is bidirectional, except for one flow case (unidirectional). Number of flows in chart represents
total flows in both directions.
• Packets are not modified.
• These tests are “PHY-PHY” (physical port to physical port without a VM).
Figure 6-5
OVS with DPDK-netdev Switching Performance - Throughput
Figure 6-6
OVS with DPDK-netdev Switching Performance - Average Latency1
1. Latency is measured by the traffic generator on a per packet basis.
6.2.2 Test Case 3 - OVS with DPDK-netdev L3 Forwarding (Latency)
Test Configuration Notes:
• Traffic is bidirectional.
• Packets are not modified.
• Test run for 1 hour per target load.
• Only 64-byte results are shown in the chart.
• These tests are “PHY-PHY” (physical port to physical port without a VM).
Figure 6-7
Host L2 Forwarding Performance – Latency Distribution for 64B Packets
The chart in Figure 6-7 shows the latency distribution of packets for a range of target loads. For example, with a target load of 10% (i.e., 1 Gb/s) almost all packets have a latency between 2.5 and 5.1 µs, whereas at a target load of 50% (i.e., 5 Gb/s) almost all packets have a latency between 10.2 and 20.5 µs.
6.2.3 Test Case 4 - OVDK vSwitch L3 Forwarding (Throughput)
Test Configuration Notes:
• Traffic is bidirectional.
• Packets are not modified.
• Results for three different flow settings are shown (2, 10k, 30k).
• These tests are “PHY-PHY” (physical port to physical port without a VM).
Figure 6-8
OVDK Switching Performance - Throughput
Figure 6-9
OVDK Switching Performance - Average Latency
6.3 VM Throughput Performance without a vSwitch
Figure 6-10 VM Throughput Performance without a vSwitch
Configuration (Done Manually):
1. One VM gets brought up and connected to the NIC with PCI pass-through or SR-IOV.
2. The IP addresses of the VM get configured.
Data Path (Numbers Matching Red Circles):
1. The packet generator creates flows based on RFC 2544.
2. The flows arrive at the first vPort of the VM.
3. The VM receives the flows and forwards them out through its second vPort.
4. The flows are sent back to the packet generator.
6.3.1 Test Case 5 - VM L2/L3 Forwarding with PCI Pass-through
Test Configuration Notes for L3 Forwarding:
• Traffic is bidirectional.
• Packets not modified.
Figure 6-11 VM L3 Forwarding Performance with PCI Pass-through - Throughput
Test Configuration Notes for L2 Forwarding:
• Traffic is bidirectional.
• Packets are modified (Ethernet Header).
Figure 6-12 VM L2 Forwarding Performance with PCI Pass-through - Throughput
6.3.2 Test Case 6 - DPDK L3 Forwarding: SR-IOV Pass-through
Test Configuration Notes for L3 Forwarding:
• Traffic is bidirectional.
• Packets not modified.
Figure 6-13 VM L3 Forwarding Performance with SR-IOV - Throughput
6.4 VM Throughput Performance with vSwitch

6.4.1 Test Case 7 - Throughput Performance with One VM (OVS with DPDK-netdev, L3 Forwarding)
Figure 6-14 VM Throughput Performance with vSwitch - One VM
Configuration (Done Manually):
1. One VM gets brought up and connected to the vSwitch.
2. IP address of the VM gets configured.
3. Flows get programmed to the vSwitch.
Data Path (Numbers Matching Red Circles):
1. The packet generator creates flows based on RFC 2544.
2. The vSwitch forwards the flows to the first vPort of the VM.
3. The VM receives the flows and forwards them out through its second vPort.
4. The vSwitch forwards the flows back to the packet generator.
Test Configuration Notes:
• Traffic is bidirectional.
• Packets not modified.
Figure 6-15 VM Throughput Performance - One VM (OVS with DPDK-netdev)
6.4.2 Test Case 8 - Throughput Performance with Two VMs in Series (OVS with DPDK-netdev, L3 Forwarding)
Figure 6-16 VM Throughput Performance with vSwitch - Two VMs in Series
Configuration (Done Manually):
1. Set up 2 VMs and connect them to the vSwitch.
2. IP addresses of VMs are configured.
3. Flows get programmed to the vSwitch.
Data Path (Numbers Matching Red Circles):
1. The packet generator creates flows based on RFC 2544.
2. The vSwitch forwards the flows to the first vPort of VM #1.
3. VM #1 receives the flows and forwards them to VM #2 through its second vPort (via the vSwitch).
4. The vSwitch forwards the flows to the first vPort of VM #2.
5. VM #2 receives the flows and forwards them back to the vSwitch through its second vPort.
6. The vSwitch forwards the flows back to the packet generator.
Test Configuration Notes:
• Traffic is bidirectional
• Packets not modified
Figure 6-17 VM Throughput Performance - Two VMs in Series (OVS with DPDK-netdev)
6.4.3 Test Case 9 - Throughput Performance with Two VMs in Parallel (OVS with DPDK-netdev, L3 Forwarding)
Figure 6-18 VM Throughput Performance with vSwitch - Two VMs in Parallel
Configuration (Done Manually):
1. Set up 2 VMs and connect them to the vSwitch.
2. IP addresses of VMs are configured.
3. Flows get programmed to the vSwitch.
Data Path (Numbers Matching Red Circles):
1. The packet generator creates flows based on RFC 2544.
2. The vSwitch forwards the flows to the first vPort of VM #1 and first vPort of VM #2.
3. VM #1 receives the flows and forwards out through its second vPort. VM #2 receives the flows and
forwards out through its second vPort.
4. The vSwitch forwards the flows back to the packet generator.
Test Configuration Notes:
• Traffic is bidirectional
• Packets not modified
Figure 6-19 VM Throughput Performance - Two VMs in Parallel (OVS with DPDK-netdev)
6.5 Encrypt / Decrypt Performance

6.5.1 Test Case 10 - VM-VM IPSec Tunnel Performance Using Quick Assist Technology
An IPSec tunnel is set up between two tunnel endpoints, one on each virtual machine, and each VM uses Intel® Quick Assist Technology to offload encryption and decryption.
Figure 6-20 VM-VM IPSec Tunnel Performance Using QAT
Throughput tests use UDP data through the IPSec tunnel; encap/decap in the VM is handled by strongSwan (an open-source IPsec-based VPN solution, https://www.strongswan.org/). Refer to Appendix B.6 to set up a VM with the QAT drivers, the netkeyshim module, and the strongSwan IPSec application.
Configuration:
• Two Quick Assist Engines are required (one for each VM).
• PCI pass-through is used to attach the Quick Assist PCI accelerator to a VM.
• The virtual network interface of the VMs is virtio with vHost i.e. per standard Open vSwitch.
• Netperf UDP stream test is used to generate traffic in the VM.
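As an illustration of the traffic pattern, a UDP stream of this kind can be generated from one VM toward the other with netperf. The peer address below is a placeholder, and the message size can be varied to match the packet sizes shown in the chart.

# Run a 60-second UDP stream of 64-byte messages through the IPSec tunnel
# (10.0.0.2 is a placeholder for the remote VM's tunnel address)
# netperf -H 10.0.0.2 -t UDP_STREAM -l 60 -- -m 64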
The chart below provides a performance comparison with and without Quick Assist, indicating an improvement of more than twice the throughput for 64-byte packets, and even more for larger packets.
Figure 6-21 VM-to-VM IPsec Tunnel Performance Using Intel® Quick Assist Technology
Note: Results were obtained using an Intel Grizzly Pass Xeon® DP server, 64 GB RAM (8x 8 GB), Intel® Xeon® processor E5-2680 v2 (LGA2011, 2.8 GHz, 25 MB cache, 115 W, 10 cores), 240 GB SSD 2.5in SATA. See the Intel® Open Network Platform for Servers Reference Architecture (Version 1.1); the data is provided here for reference. Similar results are expected for the ingredients specified in Section 2.0, "Ingredient Specifications."
Appendix A DPDK L2 and L3 Forwarding Applications

A.1 Building DPDK and L3/L2 Forwarding Applications
The DPDK is installed with IVSHMEM support. It can also be installed without IVSHMEM (see the DPDK documentation).
1. If DPDK was previously built for OVS or OVDK, it needs to be uninstalled first.
# make uninstall
# make install T=x86_64-ivshmem-linuxapp-gcc
2. Set the DPDK environment variables (the paths assume the DPDK Git repository; change them for a tar installation).
# export RTE_SDK=/usr/src/dpdk
# export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
3. Build l3fwd.
# cd /usr/src/dpdk/examples/l3fwd/
# make
4. Build l2fwd.
# cd /usr/src/dpdk/examples/l2fwd/
# make
A.2 Running the DPDK L3 Forwarding Application
Build DPDK and l3fwd as described in Appendix A.1. Set up the host boot line with 1 GB and 2 MB hugepage memory and isolcpus.
1. Initialize host hugepage memory after reboot.
# mount -t hugetlbfs nodev /dev/hugepages
# mkdir /dev/hugepages_2mb
# mount -t hugetlbfs nodev /dev/hugepages_2mb -o pagesize=2MB
2. Initialize host UIO driver.
# modprobe uio
# insmod $RTE_SDK/x86_64-ivshmem-linuxapp-gcc/kmod/igb_uio.ko
3. Locate the target NICs to be used for the test (assuming 06:00.0 and 06:00.1 were found as targets), attach them to the UIO driver, and check.
# lspci -nn |grep Ethernet
# $RTE_SDK/tools/dpdk_nic_bind.py --bind=igb_uio 06:00.0
# $RTE_SDK/tools/dpdk_nic_bind.py --bind=igb_uio 06:00.1
# $RTE_SDK/tools/dpdk_nic_bind.py --status
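If the ports need to be returned to the Linux kernel driver after testing, the same script can rebind them. The ixgbe driver name below is an assumption based on the 82599 NICs used in this setup.

# $RTE_SDK/tools/dpdk_nic_bind.py --bind=ixgbe 06:00.0
# $RTE_SDK/tools/dpdk_nic_bind.py --bind=ixgbe 06:00.1
# $RTE_SDK/tools/dpdk_nic_bind.py --status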
4. To run the DPDK L3 forwarding application on the compute node host, use the following command line:
# cd $RTE_SDK/examples/l3fwd
# ./build/l3fwd -c 0x6 -n 4 --socket-mem 1024,0 -- -p 0x3 --config="(0,0,1),(1,0,2)"
Note: The -c option enables cores 1 and 2 to run the application. The --socket-mem option allocates 1 GB of hugepage memory from target NUMA node 0 and zero memory from NUMA node 1. The -p option is the hexadecimal bit mask of the ports to configure. The --config (port, queue, lcore) option determines which queues from which ports are mapped to which cores.
Additionally, to run the standard Linux network stack L3 forwarding without DPDK, enable the kernel forwarding parameter as below and start the traffic.
# echo 1 > /proc/sys/net/ipv4/ip_forward
To set up the test for verifying the devstack installation on the compute node, and to contrast DPDK host L3 forwarding with Linux kernel forwarding, refer to Section 7.1. That section also details the performance data for both test cases.
A.3 Running the DPDK L2 Forwarding Application
The L2 forwarding application modifies the packet data by changing the Ethernet address in the outgoing buffer, which might impact performance. The l3fwd program forwards the packet without modifying it.
Build DPDK and l2fwd as described in Appendix A.1. Set up the host boot line with 1 GB and 2 MB hugepage memory and isolcpus.
1. Initialize host hugepage memory after reboot.
# mount -t hugetlbfs nodev /dev/hugepages
# mkdir /dev/hugepages_2mb
# mount -t hugetlbfs nodev /dev/hugepages_2mb -o pagesize=2MB
2. Initialize host UIO driver.
# modprobe uio
# insmod $RTE_SDK/x86_64-ivshmem-linuxapp-gcc/kmod/igb_uio.ko
3. Locate the target NICs to be used for the test (assuming 06:00.0 and 06:00.1 were found as targets), attach them to the UIO driver, and check.
# lspci -nn |grep Ethernet
# $RTE_SDK/tools/dpdk_nic_bind.py --bind=igb_uio 06:00.0
# $RTE_SDK/tools/dpdk_nic_bind.py --bind=igb_uio 06:00.1
# $RTE_SDK/tools/dpdk_nic_bind.py --status
4. To run the DPDK L2 forwarding application on the compute node host, use the following command line:
# cd $RTE_SDK/examples/l2fwd
# ./build/l2fwd -c 0x6 -n 4 --socket-mem 1024,0 -- -p 0x3 -q 1
Note: The -c option enables cores 1 and 2 to run the application. The --socket-mem option allocates 1 GB of hugepage memory from target NUMA node 0 and zero memory from NUMA node 1. The -p option is the hexadecimal bit mask of the ports to configure. The -q option sets the number of queues (ports) serviced per core.
Appendix B Building DPDK, Open vSwitch, and Intel DPDK Accelerated vSwitch

B.1 Getting DPDK
The DPDK needs to be built differently for 1) the various test programs (e.g., the DPDK L2 and L3 test programs), 2) OVS with DPDK-netdev, and 3) the Intel DPDK Accelerated vSwitch. The DPDK should be built in the same way for the host machine and the VM. There are two ways of obtaining the DPDK source, as listed in Appendix B.1.1 and Appendix B.1.2.
B.1.1 Getting DPDK Git Source Repository
The git repository was the method used to obtain the code for the tests in this document. The DPDK Git source repository was used because the latest available tar file changes often and older tar files are more difficult to obtain, while the target git code is obtained by simply checking out the correct commit or tag.
# cd /usr/src
# git clone git://dpdk.org/dpdk
# cd /usr/src/dpdk
Need to checkout the target DPDK version:
# git checkout -b test_v1.7.1 v1.7.1
The export directory for DPDK git repository is:
# export RTE_SDK=/usr/src/dpdk
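The checkout can be verified before building; this is an optional check rather than a step from the original procedure.

# git describe --tags
# git log -1 --oneline

The first command should report v1.7.1 for the checkout above; the second shows the exact commit in use.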
B.1.2 Getting DPDK Source tar from DPDK.ORG
This is an alternative way to get the DPDK source and is provided here as additional information.
Get DPDK release 1.7.1 and untar it. Download DPDK 1.7.1 at: http://www.dpdk.org/download
Move to the install location, move the DPDK tar file to the destination, and untar:
# cd /usr/src
# mv ~/dpdk-1.7.1.tar.gz .
# tar -xf dpdk-1.7.1.tar.gz
# cd /usr/src/dpdk-1.7.1
The specific DPDK version directory must be used for the code builds. The git repository DPDK directory
is shown in the documentation below and would need to be changed for the DPDK 1.7.1 target
directory.
# export RTE_SDK=/usr/src/dpdk-1.7.1
B.2 Building DPDK for OVS
CONFIG_RTE_BUILD_COMBINE_LIBS=y needs to be set in the config/common_linuxapp file. It can be changed using a text editor or by applying the following patch:
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 9047975..6478520 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -81,7 +81,7 @@ CONFIG_RTE_BUILD_SHARED_LIB=n
#
# Combine to one single library
#
-CONFIG_RTE_BUILD_COMBINE_LIBS=n
+CONFIG_RTE_BUILD_COMBINE_LIBS=y
CONFIG_RTE_LIBNAME="intel_dpdk"
#
The patch can be applied by copying it into a file in the directory just above the DPDK directory (/usr/src). If the patch file is called patch_dpdk_single_lib, the following is an example of applying it and then rebuilding:
# cd /usr/src/dpdk
# patch -p1 < ../patch_dpdk_single_lib
# make install T=x86_64-ivshmem-linuxapp-gcc
B.2.1 Getting OVS Git Source Repository
Make sure the DPDK is currently built for the OVS build; if not, change it as documented above and rebuild DPDK.
Get the OVS Git repository:
# cd /usr/src
# git clone https://github.com/openvswitch/ovs.git
# cd /usr/src/ovs
Assuming that the latest commit is the target commit, create a working branch off the current branch for testing and for applying the user space vhost patch:
# git checkout -b test_branch
Or check out the specific commit used in testing:
# git checkout -b test_branch e9bbe84b6b51eb9671451504b79c7b79b7250c3b
The user space vhost interface patch needs to be applied for the user space vhost tests.
B.2.2 Getting OVS DPDK-netdev User Space vHost Patch
Currently (10/31/14), not all of the user space vhost DPDK patches have been upstreamed into the OVS source code.
The current patch can be obtained from:
Sept. 29, 2014: http://patchwork.openvswitch.org/patch/6280/
[ovs-dev,v5,1/1] netdev-dpdk: add dpdk vhost ports
ovs-dev-v5-1-1-netdev-dpdk-add-dpdk-vhost-ports.patch
B.2.3 Applying OVS DPDK-netdev User Space vHost Patch
Before applying the patch, it is desirable to move to and use your own git branch to prevent source-tree switching issues. The following creates and switches to the new git branch test_srt, but any unused branch name can be used:
# git checkout -b test_srt
Switched to a new branch 'test_srt'
Assuming the patch has been placed in the /usr/src directory, the following applies it:
# cd /usr/src/ovs
# patch -p1 < ../ovs-dev-v5-1-1-netdev-dpdk-add-dpdk-vhost-ports.patch
Assuming that the patch applies correctly, the changes can be checked into the git repository on the test branch, allowing switching between different OVS commit versions. At a later time the patch can be applied to a later point in the code repository.
B.2.4 Building OVS with DPDK-netdev
The DPDK needs to be built for OVS as described above. The following assumes DPDK is from the git repository; adjust the DPDK path if using another DPDK build directory:
# cd /usr/src/ovs
# export DPDK_BUILD=/usr/src/dpdk/x86_64-ivshmem-linuxapp-gcc
# ./boot.sh
# ./configure --with-dpdk=$DPDK_BUILD
# make
# make eventfd
Note: The "ivshmem" build target adds support for IVSHMEM and also supports the other interface options (user space vhost). Using this option keeps the build consistent across all test configurations.
B.3 Building DPDK for Intel DPDK Accelerated vSwitch
The DPDK must be built as part of the Intel DPDK Accelerated vSwitch master build, or it may not build correctly. It must not have the CONFIG_RTE_BUILD_COMBINE_LIBS=y flag set.
B.3.1 Building Intel DPDK Accelerated vSwitch
The OVDK master build is the best way to build OVDK. Make sure DPDK is the base release code without the CONFIG_RTE_BUILD_COMBINE_LIBS=y flag set. The following can be used for the git repository, as long as changes were not checked into the repository:
# cd /usr/src/dpdk
# git diff -M -C
If there are no changes in the code, no output is generated. If the CONFIG_RTE_BUILD_COMBINE_LIBS=y flag is set, the output will be similar to the OVS DPDK patch above. The following discards the change:
# git checkout -- config/common_linuxapp
Applying the DPDK changes as a patch allows easy switching back and forth between DPDK builds. The other method is to create two branches, with and without the CONFIG_RTE_BUILD_COMBINE_LIBS=y flag set, and jump between them.
Getting the OVDK git source repository:
# cd /usr/src/
# git clone git://github.com/01org/dpdk-ovs
Check out code for testing (recommended to use more recent code):
# cd /usr/src/dpdk-ovs
# git checkout -b test_my_xx
However, for testing OVDK equivalent to the data in this document, check out the following commit¹:
# cd /usr/src/dpdk-ovs
# git checkout -b test_v1.2.0-pre1 bd8e8ee7565ca7e843a43204ee24a7e1e2bf9c6f
The OVDK master build:
# cd /usr/src/dpdk
# export RTE_SDK=$(pwd)
# cd ../dpdk-ovs
# make config && make
1. The commit ID shown here was used to collect the data in this document.
B.4 Host Kernel Boot Configuration
The host system needs several boot parameters added or changed for optimal operation.
B.4.1 Kernel Boot Hugepage Configuration
Hugepages make large memory pages available to applications, improving performance by reducing the processor overhead of updating Translation Lookaside Buffer (TLB) entries. The number of 1 GB hugepages must be configured on the kernel boot command line and cannot be changed at runtime; the number of 2 MB pages, however, can be adjusted at runtime. Configuring both on the kernel boot line allows the system to support 1 GB and 2 MB hugepages at the same time.
The hugepage configuration can be added to the default configuration file /etc/default/grub by adding it to GRUB_CMDLINE_LINUX; the grub configuration file is then regenerated to produce an updated configuration file for Linux boot.
# vim /etc/default/grub // edit file
. . .
GRUB_CMDLINE_LINUX_DEFAULT="... default_hugepagesz=1G hugepagesz=1G hugepages=16 hugepagesz=2M hugepages=2048 ..."
. . .
This sets up 16 x 1 GB hugepages (16 GB of 1 GB hugepage memory) and 2048 x 2 MB hugepages (4 GB of 2 MB hugepage memory). After boot the number of 1 GB pages cannot be changed, but the number of 2 MB pages can be changed at runtime. Half of each hugepage set comes from each of the NUMA nodes.
This target setup is for OVDK or OVS using two 1 GB Node 0 hugepages plus two VMs using three 1 GB Node 0 hugepages each, for a total of eight 1 GB Node 0 hugepages used for testing. Because the pages are split evenly across NUMA nodes, 16 x 1 GB hugepages are allocated on the kernel boot line to obtain eight 1 GB Node 0 hugepages (and eight 1 GB Node 1 hugepages).
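After rebooting, the per-NUMA-node allocation can be checked through sysfs (an optional check, not part of the original procedure):

# cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
# grep -i huge /proc/meminfo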
B.4.2 Kernel Boot CPU Isolation Configuration
CPU isolation removes CPU cores from the scheduler to prevent a core targeted for particular processing from being assigned general processing tasks.
You need to know which CPUs are available in the system. The following isolates the 9 CPU cores used in testing:
# vim /etc/default/grub // edit file
. . .
GRUB_CMDLINE_LINUX_DEFAULT="... isolcpus=1,2,3,4,5,6,7,8,9 ..."
. . .
B.4.3 IOMMU Configuration
The IOMMU is needed for the PCI pass-through and SR-IOV tests and should be disabled for all other tests. The IOMMU is enabled by setting the kernel boot IOMMU parameters:
# vim /etc/default/grub // edit file
. . .
GRUB_CMDLINE_LINUX_DEFAULT="... iommu=pt intel_iommu=on ..."
. . .
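When the IOMMU is expected to be enabled, this can be confirmed after reboot from the kernel log (an optional check, not part of the original procedure); look for DMAR and Intel-IOMMU messages:

# dmesg | grep -i -e DMAR -e IOMMU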
B.4.4 Generating GRUB Boot File
After editing, the grub.cfg boot file needs to be regenerated to set the new kernel boot parameters:
# grub2-mkconfig -o /boot/grub2/grub.cfg
The system needs to be rebooted for the kernel boot changes to take effect.
B.4.5 Verifying Kernel Boot Configuration
Verify that the system booted with the correct kernel parameters by reviewing the kernel boot messages using dmesg:
# dmesg | grep command
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.11.10-301.fc20.x86_64 default_hugepagesz=1G hugepagesz=1G hugepages=32 root=UUID=978575a5-45f3-4675-9e4be17f3fd0a03 ro vconsole.font=latarcyrheb-sun16 rhgb quiet LANG=en_US.UTF-8
This can be used to verify that the kernel is set correctly for the system. The dmesg buffer only holds the most recent kernel messages; if the kernel boot command line is no longer present, search the /var/log/messages file.
B.4.6 Configuring System Variables (Host System Configuration)
# echo "# Disable Address Space Layout Randomization (ASLR)" > /etc/sysctl.d/aslr.conf
# echo "kernel.randomize_va_space=0" >> /etc/sysctl.d/aslr.conf
# echo "# Enable IPv4 Forwarding" > /etc/sysctl.d/ip_forward.conf
# echo "net.ipv4.ip_forward=1" >> /etc/sysctl.d/ip_forward.conf
# systemctl restart systemd-sysctl.service
# cat /proc/sys/kernel/randomize_va_space
0
# cat /proc/sys/net/ipv4/ip_forward
0
B.5 Setting Up Tests

B.5.1 OVS Throughput Tests (PHY-PHY without a VM)
Build OVS as described in the previous section, reboot the system, and check the kernel boot line for the 1 GB hugepage setting, the isolcpus setting for the target cores (1,2,3,4,5,6,7,8,9), and that the IOMMU is not enabled.
1. Install the OVS kernel module and check that it is present.
# modprobe openvswitch
# lsmod |grep open
openvswitch           70953  0
gre                   13535  1 openvswitch
vxlan                 37334  1 openvswitch
libcrc32c             12603  1 openvswitch
2. Install kernel UIO driver and DPDK UIO driver.
# modprobe uio
# insmod /usr/src/dpdk/x86_64-ivshmem-linuxapp-gcc/kmod/igb_uio.ko
3. Find the target Ethernet interfaces.
# lspci -nn |grep Ethernet
03:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)
03:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)
06:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
06:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
4. Bind the NICs to the UIO interface (the interfaces must not be configured; run ifconfig xxxx down if configured).
# /usr/src/dpdk/tools/dpdk_nic_bind.py --bind=igb_uio 06:00.0
# /usr/src/dpdk/tools/dpdk_nic_bind.py --bind=igb_uio 06:00.1
# /usr/src/dpdk/tools/dpdk_nic_bind.py --status
5. Mount the hugepage file systems.
# mount -t hugetlbfs nodev /dev/hugepages
# mkdir /dev/hugepages_2mb
# mount -t hugetlbfs nodev /dev/hugepages_2mb -o pagesize=2MB
# cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
# echo 2048 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
6. Terminate OVS, clear any previous OVS or OVDK database, and set up for a new database.
# pkill -9 ovs
# rm -rf /usr/local/var/run/openvswitch/
# rm -rf /usr/local/etc/openvswitch/
# rm -f /tmp/conf.db
# mkdir -p /usr/local/etc/openvswitch
# mkdir -p /usr/local/var/run/openvswitch
7. Initialize the new OVS database.
# cd /usr/src/ovs
# ./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db ./vswitchd/vswitch.ovsschema
8. Start the OVS database server.
# cd /usr/src/ovs
# ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
9. Initialize the OVS database.
# cd /usr/src/ovs
# ./utilities/ovs-vsctl --no-wait init
10. Start OVS with the DPDK portion using 1 GB of Node 0 memory.
# cd /usr/src/ovs
# ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 -- unix:/usr/local/var/run/openvswitch/db.sock --pidfile --detach
11. Create the OVS DPDK bridge and add the two physical NICs.
# cd /usr/src/ovs
# ./utilities/ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
# ./utilities/ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
# ./utilities/ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
# ./utilities/ovs-vsctl show
12. Set the OVS port-to-port flows for a NIC 0 endpoint IP address of 1.1.1.1 and a NIC 1 endpoint IP address of 5.1.1.1 using a script (ovs_flow_ports.sh):
#! /bin/sh
# Move to command directory
cd /usr/src/ovs/utilities/
# Clear current flows
./ovs-ofctl del-flows br0
# Add flow for port 0 to port 1 and port 1 to port 0
./ovs-ofctl add-flow br0 in_port=1,dl_type=0x800,nw_src=1.1.1.1,nw_dst=5.1.1.1,idle_timeout=0,action=output:2
./ovs-ofctl add-flow br0 in_port=2,dl_type=0x800,nw_src=5.1.1.1,nw_dst=1.1.1.1,idle_timeout=0,action=output:1

Run the script:
# ./ovs_flow_ports.sh
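The programmed flows can be confirmed before starting traffic (an optional check):

# cd /usr/src/ovs/utilities
# ./ovs-ofctl dump-flows br0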
13. Check the CPU load to verify that the OVS PMD is operating.
# top (1)
top - 18:29:24 up 37 min, 4 users, load average: 0.86, 0.72, 0.63
Tasks: 275 total, 1 running, 274 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
14. Affinitize the DPDK PMD task to CPU core 1 (core mask 2).
# cd /usr/src/ovs/utilities
# ./ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2
Example of checking OVS thread affinitization:
# top –p `pidof ovs-switchd` -H –d1
…
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1724 root      20   0 5035380   6100   4384 R  99.7  0.0   7:37.19 pmd62
 1658 root      20   0 5035380   6100   4384 S  11.0  0.0   1:40.35 urcu2
 1656 root      20   0 5035380   6100   4384 S   0.0  0.0   0:01.37 ovs-vswitchd
 1657 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 dpdk_watchdog1
 1659 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 cuse_thread3
 1693 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler60
 1694 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler61
 1695 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler39
 1696 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler38
 1697 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler37
 1698 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler36
 1699 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler33
 1700 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler34
 1701 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler35
 1702 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler40
 1703 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler46
 1704 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler47
 1705 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler48
 1706 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler45
 1707 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler50
 1708 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler49
 1709 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler41
 1710 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler44
 1711 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler42
 1712 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 handler43
 1713 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.02 revalidator55
 1714 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.01 revalidator51
 1715 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 revalidator56
 1716 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 revalidator58
 1717 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 revalidator52
 1718 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.01 revalidator57
 1719 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.00 revalidator54
 1720 root      20   0 5035380   6100   4384 S   0.0  0.0   0:00.01 revalidator53
# taskset -p 1724
pid 1724's current affinity mask: 2
# taskset -p 1658
pid 1658's current affinity mask: 4
# taskset -p 1656
pid 1656's current affinity mask: 4
# taskset -p 1657
pid 1657's current affinity mask: 4
# taskset -p 1659
pid 1659's current affinity mask: 4
# taskset -p 1693
pid 1693's current affinity mask: 4
# taskset -p 1694
pid 1694's current affinity mask: 4
# taskset -p 1695
pid 1695's current affinity mask: 4
# taskset -p 1696
pid 1696's current affinity mask: 4
# taskset -p 1697
pid 1697's current affinity mask: 4
# taskset -p 1698
pid 1698's current affinity mask: 4
# taskset -p 1699
pid 1699's current affinity mask: 4
# taskset -p 1700
pid 1700's current affinity mask: 4
# taskset -p 1701
pid 1701's current affinity mask: 4
# taskset -p 1702
pid 1702's current affinity mask: 4
# taskset -p 1703
pid 1703's current affinity mask: 4
# taskset -p 1704
pid 1704's current affinity mask: 4
# taskset -p 1705
pid 1705's current affinity mask: 4
# taskset -p 1706
pid 1706's current affinity mask: 4
# taskset -p 1707
pid 1707's current affinity mask: 4
# taskset -p 1708
pid 1708's current affinity mask: 4
# taskset -p 1709
pid 1709's current affinity mask: 4
# taskset -p 1710
pid 1710's current affinity mask: 4
# taskset -p 1711
pid 1711's current affinity mask: 4
# taskset -p 1712
pid 1712's current affinity mask: 4
# taskset -p 1713
pid 1713's current affinity mask: 4
# taskset -p 1714
pid 1714's current affinity mask: 4
# taskset -p 1715
pid 1715's current affinity mask: 4
# taskset -p 1716
pid 1716's current affinity mask: 4
# taskset -p 1717
pid 1717's current affinity mask: 4
# taskset -p 1718
pid 1718's current affinity mask: 4
# taskset -p 1719
pid 1719's current affinity mask: 4
# taskset -p 1720
pid 1720's current affinity mask: 4
Note: OVS task affinitization will change in the future.
B.5.2
Intel DPDK Accelerated vSwitch Throughput Tests
(PHY-PHY without a VM)
1. Start with a clean system.
# cd <Intel_DPDK_Accelerated_vSwitch_INSTALL_DIR>
# pkill -9 ovs
# rm -rf /usr/local/var/run/openvswitch/
# rm -rf /usr/local/etc/openvswitch/
# mkdir -p /usr/local/var/run/openvswitch/
# mkdir -p /usr/local/etc/openvswitch/
# rm -f /tmp/conf.db
# rmmod vhost-net
# rm -rf /dev/vhost-net
# rmmod ixgbe
# rmmod uio
# rmmod igb_uio
# umount /sys/fs/cgroup/hugetlb
# umount /dev/hugepages
# umount /mnt/huge
2. Mount hugepage filesystem on host.
# mount -t hugetlbfs nodev /dev/hugepages
# mount|grep huge
3. Install the DPDK kernel modules.
# cd <Intel_DPDK_Accelerated_vSwitch_INSTALL_DIR>
# modprobe uio
# insmod <DPDK_INSTALL_DIR>/x86_64-ivshmem-linuxapp-gcc/kmod/igb_uio.ko
4. Bind the physical interfaces to the igb_uio driver. For example, if the PCI addresses on the setup are
08:00.0 and 08:00.1, the commands below show how to bind them to the DPDK driver. Replace the
PCI addresses according to the setup.
# ./dpdk*/tools/dpdk_nic_bind.py --status
# ./dpdk*/tools/dpdk_nic_bind.py --bind=igb_uio 08:00.0
# ./dpdk*/tools/dpdk_nic_bind.py --bind=igb_uio 08:00.1
5. Create an Intel® DPDK Accelerated vSwitch database file.
# ./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db
<OVS_INSTALL_DIR>/openvswitch/vswitchd/vswitch.ovsschema
6. Run the ovsdb-server.
# ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock
--remote=db:Open_vSwitch,Open_vSwitch,manager_options &
7. Create Intel® DPDK Accelerated vSwitch bridge, add physical interfaces (type: dpdkphy) to the
bridge:
# ./utilities/ovs-vsctl --no-wait add-br br0 -- set Bridge br0 datapath_type=dpdk
# ./utilities/ovs-vsctl --no-wait add-port br0 port1 -- set Interface port1
type=dpdkphy ofport_request=1 option:port=0
# ./utilities/ovs-vsctl --no-wait add-port br0 port2 -- set Interface port2
type=dpdkphy ofport_request=2 option:port=1
The output should be:
----------------------------------------------
00000000-0000-0000-0000-000000000000
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "port1"
            Interface "port1"
                type: dpdkphy
                options: {port="0"}
        Port "port2"
            Interface "port2"
                type: dpdkphy
                options: {port="1"}
----------------------------------------------
8. Start the Intel® DPDK Accelerated vSwitch for the PHY-PHY throughput test.
# ./datapath/dpdk/ovs-dpdk -c 0x0F -n 4 --proc-type primary --socket-mem 4096 -- -p 0x03 --stats_core 0 --stats_int 5
Exit the process using CTRL-C. Copy the valid address on the system from the stdout output of the
above command. For example, EAL: virtual area found at 0x7F1740000000 (size = 0x80000000)
and use it for the base-virtaddr parameter of the following command:
# ./datapath/dpdk/ovs-dpdk -c 0x0F -n 4 --proc-type primary --base-virtaddr=0x7f9a40000000 --socket-mem 4096 -- -p 0x03 --stats_core 0 --stats_int 5
9. Run the vswitchd daemon.
# ./vswitchd/ovs-vswitchd -c 0x100 --proc-type=secondary -- --pidfile=/tmp/vswitchd.pid
10. Using the ovs-ofctl utility, add flows for bidirectional test. An example below shows how to add
flows when the source and destination IPs are 1.1.1.1 and 6.6.6.2 for a bidirectional use case.
# ./utilities/ovs-ofctl del-flows br0
# ./utilities/ovs-ofctl add-flow br0
in_port=1,dl_type=0x0800,nw_src=1.1.1.1,nw_dst=6.6.6.2,idle_timeout=0,action=output:2
# ./utilities/ovs-ofctl add-flow br0 in_port=2,dl_type=0x0800,nw_src=6.6.6.2,
nw_dst=1.1.1.1,idle_timeout=0,action=output:1
# ./utilities/ovs-ofctl dump-flows br0
# ./utilities/ovs-vsctl --no-wait -- set Open_vSwitch . other_config:n-handler-threads=1
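For convenience, the flow setup above can also be collected into a small script, similar to the ovs_flow_ports.sh script used for the OVS with DPDK-netdev test. The following is a minimal sketch (the script name is hypothetical):
# cat ovdk_flow_ports.sh
#! /bin/sh
# Move to the Intel DPDK Accelerated vSwitch install directory
cd <Intel_DPDK_Accelerated_vSwitch_INSTALL_DIR>
# Clear current flows
./utilities/ovs-ofctl del-flows br0
# Add bidirectional flows between port 1 and port 2 for endpoints 1.1.1.1 and 6.6.6.2
./utilities/ovs-ofctl add-flow br0 in_port=1,dl_type=0x0800,nw_src=1.1.1.1,nw_dst=6.6.6.2,idle_timeout=0,action=output:2
./utilities/ovs-ofctl add-flow br0 in_port=2,dl_type=0x0800,nw_src=6.6.6.2,nw_dst=1.1.1.1,idle_timeout=0,action=output:1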
B.5.3
Intel DPDK Accelerated vSwitch with User Space
vHost
B.5.3.1
Host Configuration for Running a VM with User Space vHost
1. Start with a clean system.
# cd <Intel_DPDK_Accelerated_vSwitch_INSTALL_DIR>
# pkill -9 ovs
# rm -rf /usr/local/var/run/openvswitch/
# rm -rf /usr/local/etc/openvswitch/
# mkdir -p /usr/local/var/run/openvswitch/
# mkdir -p /usr/local/etc/openvswitch/
# rm -f /tmp/conf.db
# rmmod vhost-net
# rm -rf /dev/vhost-net
# rmmod ixgbe
# rmmod uio
# rmmod igb_uio
# umount /sys/fs/cgroup/hugetlb
# umount /dev/hugepages
# umount /mnt/huge
2. Mount hugepage filesystem on host.
# mount -t hugetlbfs nodev /dev/hugepages
# mount|grep huge
3. Install the DPDK, cuse and eventfd kernel modules.
# cd <Intel_DPDK_Accelerated_vSwitch_INSTALL_DIR>
# modprobe cuse
# insmod ./openvswitch/datapath/dpdk/fd_link/fd_link.ko
# modprobe uio
# insmod <DPDK_INSTALL_DIR>/x86_64-ivshmem-linuxapp-gcc/kmod/igb_uio.ko
4. Bind/Unbind to/from the igb_uio module.
# <DPDK_INSTALL_DIR>/tools/dpdk_nic_bind.py --status
Network devices using IGB_UIO driver
====================================
<none>
Network devices using kernel driver
===================================
0000:01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=p1p1 drv=ixgbe
unused=igb_uio
0000:01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=p1p2 drv=ixgbe
unused=igb_uio
0000:08:00.0 'I350 Gigabit Network Connection' if=em0 drv=igb unused=igb_uio *Active*
0000:08:00.1 'I350 Gigabit Network Connection' if=enp8s0f1 drv=igb unused=igb_uio
0000:08:00.2 'I350 Gigabit Network Connection' if=enp8s0f2 drv=igb unused=igb_uio
0000:08:00.3 'I350 Gigabit Network Connection' if=enp8s0f3 drv=igb unused=igb_uio
Other network devices
=====================
<none>
To bind devices p1p1 and p1p2 (01:00.0 and 01:00.1) to the igb_uio driver:
# <DPDK_INSTALL_DIR>/tools/dpdk_nic_bind.py --bind=igb_uio 01:00.0
# <DPDK_INSTALL_DIR>/tools/dpdk_nic_bind.py --bind=igb_uio 01:00.1
# <DPDK_INSTALL_DIR>/tools/dpdk_nic_bind.py --status Network devices using IGB_UIO driver
====================================
0000:01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio unused=
0000:01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio unused=
Network devices using kernel driver
50
Intel® Open Network Platform Server
Benchmark Performance Test Report
===================================
0000:08:00.0 'I350 Gigabit Network Connection'
0000:08:00.1 'I350 Gigabit Network Connection'
0000:08:00.2 'I350 Gigabit Network Connection'
0000:08:00.3 'I350 Gigabit Network Connection'
Other network devices
=====================
<none>
if=em0 drv=igb unused=igb_uio *Active*
if=enp8s0f1 drv=igb unused=igb_uio
if=enp8s0f2 drv=igb unused=igb_uio
if=enp8s0f3 drv=igb unused=igb_uio
5. Create an Intel® DPDK Accelerated vSwitch database file.
# ./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db
<OVS_INSTALL_DIR>/openvswitch/vswitchd/vswitch.ovsschema
6. Run the ovsdb-server.
# ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock
--remote=db:Open_vSwitch,Open_vSwitch,manager_options &
7. Create Intel® DPDK Accelerated vSwitch bridge, add physical interfaces (type: dpdkphy) and vHost
interfaces (type: dpdkvhost) to the bridge.
# ./utilities/ovs-vsctl --no-wait add-br br0 -- set Bridge br0 datapath_type=dpdk
# ./utilities/ovs-vsctl --no-wait add-port br0 port1 -- set Interface port1
type=dpdkphy ofport_request=1 option:port=0
# ./utilities/ovs-vsctl --no-wait add-port br0 port2 -- set Interface port2
type=dpdkphy ofport_request=2 option:port=1
# ./utilities/ovs-vsctl --no-wait add-port br0 port3 -- set Interface port3
type=dpdkvhost ofport_request=3
# ./utilities/ovs-vsctl --no-wait add-port br0 port4 -- set Interface port4
type=dpdkvhost ofport_request=4
# ./utilities/ovs-vsctl show
The output should be:
----------------------------------------------
00000000-0000-0000-0000-000000000000
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "port16"
            Interface "port1"
                type: dpdkphy
                options: {port="1"}
        Port "port17"
            Interface "port2"
                type: dpdkphy
                options: {port="2"}
        Port "port80"
            Interface "port3"
                type: dpdkvhost
        Port "port81"
            Interface "port4"
                type: dpdkvhost
----------------------------------------------
8. Start the Intel® DPDK Accelerated vSwitch for 1 VM with two user space vHost interfaces.
# ./datapath/dpdk/ovs-dpdk -c 0x0F -n 4 --proc-type primary --socket-mem 4096
-- -p 0x03 --stats_core 0 --stats_int 5
Exit the process using CTRL-C. Copy the valid address on the system from the stdout output of the
above command. For example, EAL: virtual area found at 0x7F1740000000 (size = 0x80000000).
Use it for the base-virtaddr parameter of the following command:
# ./datapath/dpdk/ovs-dpdk -c 0x0F -n 4 --proc-type primary --base-virtaddr=0x7f9a40000000 --socket-mem 4096 -- -p 0x03 --stats_core 0 --stats_int 5
9. Run the vswitchd daemon.
# ./vswitchd/ovs-vswitchd -c 0x100 --proc-type=secondary -- --pidfile=/tmp/vswitchd.pid
10. Using the ovs-ofctl utility, add flow entries to switch packets from port 1 to port 3, port 3 to port 1,
port 2 to port 4, and port 4 to port 2 on the ingress and egress paths. The example below shows how to
add the flows when the source and destination IPs are 1.1.1.1 and 6.6.6.2 for a bidirectional case.
# ./utilities/ovs-ofctl del-flows br0
# ./utilities/ovs-ofctl add-flow br0 in_port=2,dl_type=0x0800,nw_src=1.1.1.1,nw_dst=6.6.6.2,
idle_timeout=0,action=output:4
# ./utilities/ovs-ofctl add-flow br0 in_port=3,dl_type=0x0800,nw_src=1.1.1.1,nw_dst=6.6.6.2,
idle_timeout=0,action=output:1
# ./utilities/ovs-ofctl add-flow br0 in_port=4,dl_type=0x0800,nw_src=6.6.6.2,nw_dst=1.1.1.1,
idle_timeout=0,action=output:2
# ./utilities/ovs-ofctl add-flow br0 in_port=1,dl_type=0x0800,nw_src=6.6.6.2,nw_dst=1.1.1.1,
idle_timeout=0,action=output:3
# ./utilities/ovs-ofctl dump-flows br0
11. Copy DPDK source to a shared directory to be passed to the VM.
# rm -rf /tmp/qemu_share
# mkdir -p /tmp/qemu_share
# mkdir -p /tmp/qemu_share/DPDK
# chmod 777 /tmp/qemu_share
# cp -aL <DPDK_INSTALL_DIR>/* /tmp/qemu_share/DPDK
12. Start the VM using qemu command.
# cd <Intel_DPDK_Accelerated_vSwitch_INSTALL_DIR>
# taskset 0x30 ./qemu/x86_64-softmmu/qemu-system-x86_64 -cpu host -boot c -hda
<VM_IMAGE_LOCATION> -m 4096 -smp 2 --enable-kvm -name 'client 1' -nographic -vnc :12
-pidfile /tmp/vm1.pid -drive file=fat:rw:/tmp/qemu_share -monitor
unix:/tmp/vm1monitor,server,nowait -net none -no-reboot -mem-path /dev/hugepages
-mem-prealloc -netdev type=tap,id=net1,script=no,downscript=no,ifname=port80,vhost=on
-device virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
-netdev type=tap,id=net2,script=no,downscript=no,ifname=port81,vhost=on
-device virtio-net-pci,netdev=net2,mac=00:00:00:00:00:02,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
The VM can be accessed using a VNC client at the port mentioned in the qemu startup command, which
is port 12 in this instance.
B.5.3.2
Guest Configuration to Run DPDK Sample Forwarding
Application with User Space vHost
1. Install DPDK on the VM using the source from the shared drive.
# mkdir -p /mnt/vhost
# mkdir -p /root/vhost
# mount -o iocharset=utf8 /dev/sdb1 /mnt/vhost
# cp -a /mnt/vhost/* /root/vhost
# export RTE_SDK=/root/vhost/DPDK
# export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
# cd /root/vhost/DPDK
# make uninstall
# make install T=x86_64-ivshmem-linuxapp-gcc
2. Build the DPDK vHost application testpmd.
# cd /root/vhost/DPDK/app/test-pmd
# make clean
# make
3. Run the testpmd application after installing the UIO drivers.
# modprobe uio
# echo 1280 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
# insmod /root/vhost/DPDK/x86_64-ivshmem-linuxapp-gcc/kmod/igb_uio.ko
# /root/vhost/DPDK/tools/dpdk_nic_bind.py --bind igb_uio 0000:00:03.0 0000:00:04.0
# ./testpmd -c 0x3 -n 4 --socket-mem 128 -- --burst=64 -i
At the prompt, enter the following to start the application:
testpmd> set fwd mac_retry
testpmd> start
B.5.4
OVS with User Space vHost
Build OVS as described in the previous section, reboot the system, and check the kernel boot line for 1 GB
hugepages, the isolcpus setting for the target cores (1,2,3,4,5,6,7,8,9), and that the IOMMU is not enabled.
1. Check that DPDK was built correctly, and build OVS with DPDK and the eventfd module if not already built.
2. Check system boot for valid Hugepage and isolcpus settings.
# dmesg|grep command
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.15.6-200.fc20.x86_64
root=UUID=27da0816-bbbd-4b28-acaf-993939cfa758 ro default_hugepagesz=1GB hugepagesz=1GB
hugepages=8 hugepagesz=2M hugepages=2048 isolcpus=1,2,3,4,5,6,7,8,9
vconsole.font=latarcyrheb-sun16 rhgb quiet
3. Remove vhost-net module and device directory.
# rmmod vhost-net
# rm /dev/vhost-net
4. Mount hugepage file systems.
# mount -t hugetlbfs nodev /dev/hugepages
# mkdir /dev/hugepages_2mb
# mount -t hugetlbfs nodev /dev/hugepages_2mb -o pagesize=2MB
# cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
# echo 2048 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
5. Install OVS kernel module.
# modprobe openvswitch
# lsmod |grep open
openvswitch            70953  0
gre                    13535  1 openvswitch
vxlan                  37334  1 openvswitch
libcrc32c              12603  1 openvswitch
6. Install cuse and fuse kernel modules.
# modprobe cuse
# modprobe fuse
7. Install eventfd kernel module.
# insmod /usr/src/ovs/utilities/eventfd_link/eventfd_link.ko
8. Install UIO and DPDK UIO kernel modules.
# modprobe uio
# insmod /usr/src/dpdk/x86_64-ivshmem-linuxapp-gcc/kmod/igb_uio.ko
9. Find target Ethernet interfaces.
# lspci -nn |grep Ethernet
03:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)
03:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)
06:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
06:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
10. Bind the interfaces to the UIO driver (the interfaces must not be configured; use ifconfig <interface> down first if they are).
# /usr/src/dpdk/tools/dpdk_nic_bind.py --bind=igb_uio 06:00.0
# /usr/src/dpdk/tools/dpdk_nic_bind.py --bind=igb_uio 06:00.1
# /usr/src/dpdk/tools/dpdk_nic_bind.py --status
11. Terminate OVS, clear any previous OVS or OVDK database, and set up for a new database.
# pkill -9 ovs
# rm -rf /usr/local/var/run/openvswitch/
# rm -rf /usr/local/etc/openvswitch/
# rm -f /tmp/conf.db
# mkdir -p /usr/local/etc/openvswitch
# mkdir -p /usr/local/var/run/openvswitch
12. Initialize new OVS database.
# cd /usr/src/ovs
# ./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db ./vswitchd/vswitch.ovsschema
13. Start OVS database server.
# cd /usr/src/ovs
# ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock
--remote=db:Open_vSwitch,Open_vSwitch,manager_options --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate
--bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
14. Initialize OVS database.
# cd /usr/src/ovs
# ./utilities/ovs-vsctl --no-wait init
15. Start the OVS DPDK portion using 1 GB or 2 GB of Node 0 memory.
# cd /usr/src/ovs
# ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 2048,0
-- unix:/usr/local/var/run/openvswitch/db.sock --pidfile --detach
16. Set the DPDK PMD thread affinity (persistent if the OVS database is not cleared).
# cd /usr/src/ovs
# ./utilities/ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2
17. Create the OVS DPDK bridge and add the two physical NICs and the user space vHost interfaces.
# cd /usr/src/ovs
# ./utilities/ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
# ./utilities/ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
# ./utilities/ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
# ./utilities/ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost
# ./utilities/ovs-vsctl add-port br0 dpdkvhost1 -- set Interface dpdkvhost1 type=dpdkvhost
The ovs-vswitchd daemon must be running and each port must be added in the order shown after the
database has been cleared, so that dpdk0 becomes port 1, dpdk1 port 2, dpdkvhost0 port 3, and
dpdkvhost1 port 4; otherwise the flow example entries will not be correct. If the database was not
cleared, the previous port ID assignments remain and the port IDs are often not in the expected order.
In that case, either adjust the flow commands, or terminate OVS, clear the OVS database, restart OVS,
and re-create the bridge in the correct order (see the sketch below).
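The following is a minimal sketch of such a reset (the script name is hypothetical; it reuses the paths and commands from steps 11 through 17 above):
# cat reset_ovs_bridge.sh
#! /bin/sh
# Stop OVS and clear the database so port IDs are reassigned starting from 1
cd /usr/src/ovs
pkill -9 ovs
rm -f /usr/local/etc/openvswitch/conf.db /tmp/conf.db
./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db ./vswitchd/vswitch.ovsschema
./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach
./utilities/ovs-vsctl --no-wait init
./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 2048,0 -- unix:/usr/local/var/run/openvswitch/db.sock --pidfile --detach
./utilities/ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2
# Re-create the bridge and add the ports in the required order:
# dpdk0 -> port 1, dpdk1 -> port 2, dpdkvhost0 -> port 3, dpdkvhost1 -> port 4
./utilities/ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
./utilities/ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
./utilities/ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
./utilities/ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost
./utilities/ovs-vsctl add-port br0 dpdkvhost1 -- set Interface dpdkvhost1 type=dpdkvhost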
18. Check the OVS DPDK bridge.
# ./utilities/ovs-vsctl show
c2669cc3-55ac-4f62-854c-b88bf5b66c10
Bridge "br0"
Port "dpdk0"
Interface "dpdk0"
type: dpdk
Port "dpdkvhost0"
Interface "dpdkvhost0"
type: dpdkvhost
Port "dpdk1"
Interface "dpdk1"
type: dpdk
Port "br0"
Interface "br0"
type: internal
Port "dpdkvhost1"
Interface "dpdkvhost1"
type: dpdkvhost
Note: The UUID (c2669cc3-55ac-4f62-854c-b88bf5b66c10 in this instance) is different for each instance.
19. Apply the vhost to VM vhost flow operations.
# ./ovs_flow_vm_vhost.sh
# cat ovs_flow_vm_vhost.sh
#! /bin/sh
# Move to command directory
cd /usr/src/ovs/utilities/
# Clear current flows
./ovs-ofctl del-flows br0
# Add Flow for port 0 (16) to port 1 (17)
./ovs-ofctl add-flow br0
in_port=2,dl_type=0x800,nw_dst=5.1.1.1,idle_timeout=0,action=output:4
./ovs-ofctl add-flow br0
in_port=1,dl_type=0x800,nw_src=5.1.1.1,idle_timeout=0,action=output:3
./ovs-ofctl add-flow br0
in_port=4,dl_type=0x800,nw_src=5.1.1.1,idle_timeout=0,action=output:2
./ovs-ofctl add-flow br0
in_port=3,dl_type=0x800,nw_dst=5.1.1.1,idle_timeout=0,action=output:1
In this case, the port 2 endpoint has the fixed IP 5.1.1.1, while the port 1 endpoint has multiple
endpoint IPs starting at 1.1.1.1 and incrementing up to half the number of bidirectional flows of the
test. This allows a large number of flows to be set using a small number of inexact flow commands.
The nw_src and nw_dst matches can also be deleted, which results in the same flows but without any
IP matches on the inexact flow commands, as shown in the example below.
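For example, the equivalent inexact flows without IP matches could be written as follows (a sketch; this is not the configuration used for the measurements in this report):
./ovs-ofctl del-flows br0
./ovs-ofctl add-flow br0 in_port=2,dl_type=0x800,idle_timeout=0,action=output:4
./ovs-ofctl add-flow br0 in_port=1,dl_type=0x800,idle_timeout=0,action=output:3
./ovs-ofctl add-flow br0 in_port=4,dl_type=0x800,idle_timeout=0,action=output:2
./ovs-ofctl add-flow br0 in_port=3,dl_type=0x800,idle_timeout=0,action=output:1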
B.5.4.1
VM Startup for (OVS User Space vHost) Throughput Test
The following script is a sample manual VM startup for the vHost throughput test, with one standard
Linux network interface for control and test interaction and two user space vHost compatible network
interfaces for attaching to DPDK bridge ports 3 and 4. It assumes that the Linux network bridge
"br-mgt" has been created on the management network. The VM is started with 3 GB of memory using
1 GB hugepage memory; three 1 GB pages is the minimum amount of 1 GB hugepage memory that
leaves one 1 GB hugepage available for use in the VM.
# cat vm_vhost_start.sh
#!/bin/sh
vm=/vm/Fed20-mp.qcow2
vm_name="vhost-test"
vnc=10
n1=tap46
bra=br-mgt
dn_scrp_a=/vm/vm_ctl/br-mgt-ifdown
mac1=00:1f:33:16:64:44
if [ ! -f $vm ];
then
echo "VM $vm not found!"
else
echo "VM $vm started! VNC: $vnc, management network: $n1"
tunctl -t $n1
brctl addif $bra $n1
ifconfig $n1 0.0.0.0 up
taskset 0x30 qemu-system-x86_64 -cpu host -hda $vm -m 3072 -boot c -smp 2 -pidfile
/tmp/vm1.pid -monitor unix:/tmp/vm1monitor,server,nowait -mem-path /dev/hugepages
-mem-prealloc -enable-kvm -net nic,model=virtio,netdev=eth0,macaddr=$mac1 -netdev
tap,ifname=$n1,id=eth0,script=no,downscript=$dn_scrp_a -netdev
type=tap,id=net1,script=no,downscript=no,ifname=port3,vhost=on -device
virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
-netdev type=tap,id=net2,script=no,downscript=no,ifname=port4,vhost=on -device
virtio-net-pci,netdev=net2,mac=00:00:00:00:00:02,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
-name $vm_name -vnc :$vnc &
fi
Shutdown support script:
# cat br-mgt-ifdown
#!/bin/sh
bridge='br-mgt'
/sbin/ifconfig $1 0.0.0.0 down
brctl delif ${bridge} $1
B.5.4.2
Operating VM for (OVS User Space vHost) Throughput Test
1. The VM kernel bootline needs to have 1 GB hugepage memory configured and vCPU 1 isolated.
Check for correct bootline parameters, change if needed and reboot.
# dmesg |grep command
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.15.6-200.fc20.x86_64
root=UUID=6635b8e3-0312-4892-92e7-ccff453ec06d ro default_hugepagesz=1GB
hugepagesz=1G hugepages=1 isolcpus=1 vconsole.font=latarcyrheb-sun16 rhgb quiet
LANG=en_US.UTF-8
2. Get DPDK package (same as host).
# cd /root
# git clone git://dpdk.org/dpdk
# cd /root/dpdk
3. Need to checkout the target DPDK version.
# git checkout -b test_v1.7.1 v1.7.1
4. The CONFIG_RTE_BUILD_COMBINE_LIBS=y needs to be set in config/common_linuxapp file. It can be
changed by using a text editor or using a patch as discussed earlier in this document.
# make install T=x86_64-ivshmem-linuxapp-gcc
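As a sketch, the flag can also be set non-interactively before running the build above, assuming the default value in config/common_linuxapp is CONFIG_RTE_BUILD_COMBINE_LIBS=n:
# cd /root/dpdk
# sed -i 's/CONFIG_RTE_BUILD_COMBINE_LIBS=n/CONFIG_RTE_BUILD_COMBINE_LIBS=y/' config/common_linuxapp
# grep CONFIG_RTE_BUILD_COMBINE_LIBS config/common_linuxapp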
5. Build the vHost PMD application.
# cd /root/dpdk/app/test-pmd/
# export RTE_SDK=$(pwd)
# export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
# make
CC testpmd.o
CC parameters.o
CC cmdline.o
CC config.o
CC iofwd.o
CC macfwd.o
CC macfwd-retry.o
CC macswap.o
CC flowgen.o
CC rxonly.o
CC txonly.o
CC csumonly.o
CC icmpecho.o
CC mempool_anon.o
LD testpmd
INSTALL-APP testpmd
INSTALL-MAP testpmd.map
6. Setup hugepage filesystem.
# mkdir -p /mnt/hugepages
# mount -t hugetlbfs hugetlbfs /mnt/hugepages
7. Load the UIO kernel modules.
# modprobe uio
# insmod /root/dpdk/x86_64-ivshmem-linuxapp-gcc/kmod/igb_uio.ko
8. Find the passed user space network devices. The devices appear in the order listed in the QEMU
startup command line. The first network device is the management network; the second and third
network devices are the user space vHost devices.
# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
00:04.0 Ethernet controller: Red Hat, Inc Virtio network device
00:05.0 Ethernet controller: Red Hat, Inc Virtio network device
9. Bind the Userside vHost devices to the igb_uio device driver.
# /root/dpdk/tools/dpdk_nic_bind.py -b igb_uio 0000:00:04.0
# /root/dpdk/tools/dpdk_nic_bind.py -b igb_uio 0000:00:05.0
10. Check bind status to make sure PCI 0000:00:04.0 and 0000:00:05.0 bound to igb_uio device
driver.
# /root/dpdk/tools/dpdk_nic_bind.py --status
Network devices using DPDK-compatible driver
============================================
0000:00:04.0 'Virtio network device' drv=igb_uio unused=virtio_pci
0000:00:05.0 'Virtio network device' drv=igb_uio unused=virtio_pci
Network devices using kernel driver
===================================
0000:00:03.0 'Virtio network device' if= drv=virtio-pci unused=virtio_pci,igb_uio
Other network devices
=====================
<none>
11. Run test-pmd application.
# cd /root/dpdk/app/test-pmd/
# ./testpmd -c 0x3 -n 4 --socket-mem 128 -- --burst=64 -i --txqflags=0xf00
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 0
EAL: Support maximum 64 logical core(s) by configuration.
EAL: Detected 2 lcore(s)
EAL:   unsupported IOMMU type!
EAL: VFIO support could not be initialized
EAL: Searching for IVSHMEM devices...
EAL: No IVSHMEM configuration found!
EAL: Setting up memory...
EAL: Ask a virtual area of 0x40000000 bytes
EAL: Virtual area found at 0x7fc980000000 (size = 0x40000000)
EAL: Requesting 1 pages of size 1024MB from socket 0
EAL: TSC frequency is ~2593994 KHz
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: Master core 0 is ready (tid=7f723880)
EAL: Core 1 is ready (tid=7eee5700)
EAL: PCI device 0000:00:03.0 on NUMA socket -1
EAL:   probe driver: 1af4:1000 rte_virtio_pmd
EAL:   0000:00:03.0 not managed by UIO driver, skipping
EAL: PCI device 0000:00:04.0 on NUMA socket -1
EAL:   probe driver: 1af4:1000 rte_virtio_pmd
EAL:   PCI memory mapped at 0x7fca7f72c000
EAL: PCI device 0000:00:05.0 on NUMA socket -1
EAL:   probe driver: 1af4:1000 rte_virtio_pmd
EAL:   PCI memory mapped at 0x7fca7f72b000
Interactive-mode selected
Configuring Port 0 (socket 0)
Port 0: 00:00:00:00:00:01
Configuring Port 1 (socket 0)
Port 1: 00:00:00:00:00:02
Checking link statuses...
Port 0 Link Up - speed 10000 Mbps - full-duplex
Port 1 Link Up - speed 10000 Mbps - full-duplex
Done
testpmd>
12. Enter the set fwd mac_retry command to set up the forwarding operation.
testpmd> set fwd mac_retry
Set mac_retry packet forwarding mode
13. Start forwarding operation.
testpmd> start
mac_retry packet forwarding - CRC stripping disabled - packets/burst=64
nb forwarding cores=1 - nb forwarding ports=2
RX queues=1 - RX desc=128 - RX free threshold=0
RX threshold registers: pthresh=8 hthresh=8 wthresh=0
TX queues=1 - TX desc=512 - TX free threshold=0
TX threshold registers: pthresh=32 hthresh=0 wthresh=0
TX RS bit threshold=0 - TXQ flags=0xf00
testpmd>
14. Back on the host, affinitize the vCPU to an available target CPU core. Review the QEMU tasks.
# ps -eLF|grep qemu
root      1658     1  1658  0    4 948308 37956  4 13:29 pts/2   00:00:01 qemu-system-x86_64 -cpu host -hda /vm/Fed20-mp.qcow2 -m 3072 -boot c -smp 2 -pidfile /tmp/vm1.pid -monitor unix:/tmp/vm1monitor,server,nowait -mem-path /dev/hugepages -mem-prealloc -enable-kvm -net nic,model=virtio,netdev=eth0,macaddr=00:1f:33:16:64:44 -netdev tap,ifname=tap46,id=eth0,script=no,downscript=/vm/vm_ctl/br-mgt-ifdown -netdev type=tap,id=net1,script=no,downscript=no,ifname=port3,vhost=on -device virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off -netdev type=tap,id=net2,script=no,downscript=no,ifname=port4,vhost=on -device virtio-net-pci,netdev=net2,mac=00:00:00:00:00:02,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off -name vhost-test -vnc :10
root      1658     1  1665  1    4 948308 37956  4 13:29 pts/2   00:00:20 qemu-system-x86_64 ... (same command line as above)
root      1658     1  1666 89    4 948308 37956  4 13:29 pts/2   00:18:59 qemu-system-x86_64 ... (same command line as above)
root      1658     1  1668  0    4 948308 37956  4 13:29 pts/2   00:00:00 qemu-system-x86_64 ... (same command line as above)
15. In this case, we see that task 1666 is accumulating CPU run time. We change the affinitization to
the target core.
# taskset -p 1666
pid 1666's current affinity mask: 30
# taskset -p 40 1666
pid 1666's current affinity mask: 30
pid 1666's new affinity mask: 40
16. Check the CPU core load to make sure the real-time task is running on the correct CPU core.
# top (1)
top - 13:51:26 up  1:37,  3 users,  load average: 2.16, 2.12, 1.93
Tasks: 272 total,   1 running, 271 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 96.7 us,  3.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  4.5 us,  7.6 sy,  0.0 ni, 87.6 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu27 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  32813400 total, 17556504 used, 15256896 free,    18756 buffers
KiB Swap:  8191996 total,        0 used,  8191996 free,   301248 cached
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1581 root      20   0 10.802g   6268   4412 S 116.3  0.0  31:02.42 ovs-vswitchd
 1658 root      20   0 3793232  37956  20400 S 100.3  0.1  20:08.86 qemu-system-x86
    1 root      20   0   47528   5080   3620 S   0.0  0.0   0:02.72 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd/0
    4 root      20   0       0      0      0 S   0.0  0.0   0:02.23 kworker/0:0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    6 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kworker/u288:0
17. Set the OVS flows.
# ./ovs_flow_vm_vhost.sh
# cat ovs_flow_vm_vhost.sh
#! /bin/sh
# Move to command directory
cd /usr/src/ovs/utilities/
# Clear current flows
./ovs-ofctl del-flows br0
# Add Flow for port 0 (16) to port 1 (17)
./ovs-ofctl add-flow br0
in_port=1,dl_type=0x800,nw_dst=5.1.1.1,idle_timeout=0,action=output:3
./ovs-ofctl add-flow br0
in_port=2,dl_type=0x800,nw_src=5.1.1.1,idle_timeout=0,action=output:4
./ovs-ofctl add-flow br0
in_port=3,dl_type=0x800,nw_src=5.1.1.1,idle_timeout=0,action=output:1
./ovs-ofctl add-flow br0
in_port=4,dl_type=0x800,nw_dst=5.1.1.1,idle_timeout=0,action=output:2
B.5.5
SR-IOV VM Test
The SR-IOV test passes Niantic 10 GbE SR-IOV Virtual Function interfaces to a VM in order to run a
network throughput test of the VM using the SR-IOV network interfaces.
1. Make sure the host has boot line configuration for 1 GB hugepage memory, CPU isolation, and the
Intel IOMMU is on and running in PT mode.
# dmesg |grep command
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.15.6-200.fc20.x86_64
root=UUID=27da0816-bbbd-4b28-acaf-993939cfa758 ro default_hugepagesz=1GB hugepagesz=1GB
hugepages=16 hugepagesz=2M hugepages=2048 isolcpus=1,2,3,4,5,6,7,8,9 iommu=pt
intel_iommu=on vconsole.font=latarcyrheb-sun16 rhgb quiet
2. The hugepage file systems are initialized.
# mount -t hugetlbfs nodev /dev/hugepages
# mkdir /dev/hugepages_2mb
# mount -t hugetlbfs nodev /dev/hugepages_2mb -o pagesize=2MB
3. Blacklist the ixgbe virtual function (ixgbevf) device driver, to prevent the host from loading it and
configuring the VF NICs, by adding it to /etc/modprobe.d/blacklist.conf.
# cat /etc/modprobe.d/blacklist.conf
. . .
# Intel ixgbe sr-iov vf (virtual driver)
blacklist ixgbevf
4. Uninstall and reinstall the ixgbe 10 GbE Niantic NIC device driver to enable SR-IOV and create the
SR-IOV interfaces.
# modprobe -r ixgbe
# modprobe ixgbe max_vfs=1
# service network restart
In this case only one SR-IOV VF (Virtual Function) was created per PF (Physical Function) using
max_vfs=1, but often more than 1 is desired. The network needs to be restarted afterward.
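To make the VF count persistent across reboots, the same module option can also be placed in a modprobe configuration file, for example (a sketch; the file name is arbitrary):
# cat /etc/modprobe.d/ixgbe-sriov.conf
# Create one SR-IOV Virtual Function per ixgbe Physical Function
options ixgbe max_vfs=1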
5. Bring up and configure the Niantic PF interfaces to allow the SR-IOV interfaces to be used. Leaving
the PF unconfigured will cause the SR-IOV interfaces to not be usable for the test. The Niantic 10
GbE interfaces happen to have default labeling of p786p1 and p786p2 in this example.
# ifconfig p786p1 11.1.1.225/24 up
# ifconfig p786p2 12.1.1.225/24 up
6. Find the target SR-IOV interfaces to use.
# lspci -nn |grep Ethernet
03:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)
03:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)
03:10.0 Ethernet controller [0200]: Intel Corporation X540 Ethernet Controller Virtual Function [8086:1515] (rev 01)
03:10.1 Ethernet controller [0200]: Intel Corporation X540 Ethernet Controller Virtual Function [8086:1515] (rev 01)
06:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
06:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
06:10.0 Ethernet controller [0200]: Intel Corporation 82599 Ethernet Controller Virtual Function [8086:10ed] (rev 01)
06:10.1 Ethernet controller [0200]: Intel Corporation 82599 Ethernet Controller Virtual Function [8086:10ed] (rev 01)
We see the two Niantic (device 82599) Virtual Functions, 1 for each Physical Device, are PCI
devices 06:10.0 and 06:10.1 with PCI device types [8086:10ed].
7. Make sure the pci-stub device driver is loaded, since it is used to transfer physical devices to VMs. It
was observed that pci-stub is loaded by default on Fedora 20 but not on Ubuntu 14.04. The following
checks whether pci-stub is present:
# ls /sys/bus/pci/drivers |grep pci-stub
pci-stub
If not present, load it:
# modprobe pci-stub
8. The Niantic (82599) Virtual Function interfaces are the target interfaces which have a type ID of
8086:10ed. Load the VF devices into pci-stub.
# echo "8086 10ed" > /sys/bus/pci/drivers/pci-stub/new_id
9. List the devices owned by the pci-stub driver.
# ls /sys/bus/pci/drivers/pci-stub
0000:06:10.0  0000:06:10.1  bind  new_id  remove_id  uevent  unbind
10. If the target Virtual Function devices are not listed, they are probably attached to an IXGBEVF
driver on the host. The following might be used in that case.
# echo 0000:06:10.0 > /sys/bus/pci/devices/0000:06:10.0/driver/unbind
# echo 0000:06:10.1 > /sys/bus/pci/devices/0000:06:10.1/driver/unbind
# echo "8086 10ed" > /sys/bus/pci/drivers/pci-stub/new_id
11. Verify devices present for transfer to VM.
# ls /sys/bus/pci/drivers/pci-stub
0000:06:10.0  0000:06:10.1  bind  new_id  remove_id  uevent  unbind
B.5.5.1
SR-IOV VM Startup
The VM is started with the SR-IOV physical devices passed to it on the QEMU command line. The VM is
given 3 GB of preallocated 1 GB hugepage memory (3 GB is the minimum for one 1 GB page to be
available in the VM) from /dev/hugepages, and 3 vCPUs of host type. The first network interface is a
virtio device used for the bridged management network, the second is SR-IOV device 06:10.0, and the
third is SR-IOV device 06:10.1. The following script was used to start the VM for the test:
# cat vm_sr-iov_start.sh
#!/bin/sh
vm=/vm/Fed20-mp.qcow2
vm_name="SRIOVtest"
vnc=10
n1=tap46
bra=br-mgt
dn_scrp_a=/vm/vm_ctl/br-mgt-ifdown
mac1=00:1f:33:16:64:44
if [ ! -f $vm ];
then
echo "VM $vm not found!"
else
echo "VM $vm started! VNC: $vnc, management network: $n1"
tunctl -t $n1
brctl addif $bra $n1
ifconfig $n1 0.0.0.0 up
taskset 0x70 qemu-system-x86_64 -cpu host -hda $vm -m 3072 -boot c -smp 3 -pidfile
/tmp/vm1.pid -monitor unix:/tmp/vm1monitor,server,nowait -mem-path /dev/hugepages
-mem-prealloc -enable-kvm -net nic,model=virtio,netdev=eth0,macaddr=$mac1 -netdev
tap,ifname=$n1,id=eth0,script=no,downscript=$dn_scrp_a -device pci-assign,host=06:10.0
-device pci-assign,host=06:10.1 -name $vm_name -vnc :$vnc &
fi
Script for VM shutdown support:
# cat br-mgt-ifdown
#!/bin/sh
bridge='br-mgt'
/sbin/ifconfig $1 0.0.0.0 down
brctl delif ${bridge} $1
Start the VM:
# ./vm_sr-iov_start.sh
VM /vm/Fed20-mp.qcow2 started! VNC: 10, management network: tap46
Set 'tap46' persistent and owned by uid 0
B.5.5.2
Operating VM for SR-IOV Test
In the VM, the first network is configured for management network access and used for test control.
1. Verify that the kernel has bootline parameters for 1 GB hugepage support with one 1 GB page and
that vCPU 1 and vCPU 2 have been isolated from the VM task scheduler.
# dmesg |grep command
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.15.6-200.fc20.x86_64
root=UUID=6635b8e3-0312-4892-92e7-ccff453ec06d ro default_hugepagesz=1GB
hugepagesz=1G hugepages=1 isolcpus=1,2 vconsole.font=latarcyrheb-sun16 rhgb quiet
If not configured, edit the grub defaults, generate a new grub configuration file, and reboot the VM.
# vim /etc/default/grub
// edit file
. . .
GRUB_CMDLINE_LINUX="default_hugepagesz=1GB hugepagesz=1G hugepages=1
isolcpus=1,2..."
. . .
# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-3.15.6-200.fc20.x86_64
Found initrd image: /boot/initramfs-3.15.6-200.fc20.x86_64.img
Found linux image: /boot/vmlinuz-3.11.10-301.fc20.x86_64
Found initrd image: /boot/initramfs-3.11.10-301.fc20.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-92606197a3dd4d3e8a2b95312bca842f
Found initrd image: /boot/initramfs-0-rescue-92606197a3dd4d3e8a2b95312bca842f.img
Done
# reboot
2. Verify 1 GB memory page available.
# cat /proc/meminfo
...
HugePages_Total:       1
HugePages_Free:        1
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
DirectMap4k:       45048 kB
DirectMap2M:     2052096 kB
DirectMap1G:     1048576 kB
3. Check that the CPU is host type and that 3 vCPUs are available.
# cat /proc/cpuinfo
...
processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
stepping        : 2
microcode       : 0x1
cpu MHz         : 2593.992
cache size      : 4096 KB
physical id     : 2
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm xsaveopt fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
bogomips        : 5187.98
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
4. Install DPDK.
# cd /root
# git clone git://dpdk.org/dpdk
# cd /root/dpdk
# git checkout -b test_v1.7.1 v1.7.1
# make install T=x86_64-ivshmem-linuxapp-gcc
5. Build l3fwd example program.
# cd examples/l3fwd
# export RTE_SDK=/root/dpdk
# export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
# make
CC main.o
LD l3fwd
INSTALL-APP l3fwd
INSTALL-MAP l3fwd.map
6. Mount the hugepage file system.
# mount -t hugetlbfs nodev /dev/hugepages
7. Install UIO driver in VM.
# modprobe uio
# insmod /root/dpdk/x86_64-ivshmem-linuxapp-gcc/kmod/igb_uio.ko
8. Find the SR-IOV devices.
# lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation 440FX - 82441FX PMC [Natoma] [8086:1237] (rev
02)
00:01.0 ISA bridge [0601]: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
[8086:7000]
00:01.1 IDE interface [0101]: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
[8086:7010]
00:01.3 Bridge [0680]: Intel Corporation 82371AB/EB/MB PIIX4 ACPI [8086:7113] (rev 03)
00:02.0 VGA compatible controller [0300]: Cirrus Logic GD 5446 [1013:00b8]
00:03.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000]
00:04.0 Ethernet controller [0200]: Intel Corporation 82599 Ethernet Controller Virtual Function [8086:10ed] (rev 01)
00:05.0 Ethernet controller [0200]: Intel Corporation 82599 Ethernet Controller Virtual Function
The network devices appear in the order of the QEMU command line. The first network interface,
00:03.0, is the bridged control network. The second network interface, 00:04.0, is the first SR-IOV
physical PCI device passed on the QEMU command line, and the third network interface, 00:05.0, is
the second SR-IOV physical PCI device passed.
9. Bind the SR-IOV devices to the igb_uio device driver.
# /root/dpdk/tools/dpdk_nic_bind.py --bind=igb_uio 00:04.0
# /root/dpdk/tools/dpdk_nic_bind.py --bind=igb_uio 00:05.0
10. Run the l3fwd example program.
# cd /root/dpdk/examples/l3fwd
# ./build/l3fwd -c 6 -n 4 --socket-mem 1024 -- -p 0x3 --config="(0,0,1),(1,0,2)" &
The -c 6 and --config parameters correctly affinitize the l3fwd task to the intended vCPUs in the VM.
However, because the CPU cores on the host are isolated and the QEMU task was started on isolated
cores, both vCPUs run on the same host CPU core and the real-time l3fwd task will thrash. The CPU
affinitization must therefore be set correctly on the host or poor network performance will occur.
11. Locate the vCPU QEMU thread IDs on the host.
# ps -eLF |grep qemu
root      1718     1  1718  0    6 982701 44368  4 17:41 pts/0   00:00:05 qemu-system-x86_64 -cpu host -hda /vm/Fed20-mp.qcow2 -m 3072 -boot c -smp 3 -pidfile /tmp/vm1.pid -monitor unix:/tmp/vm1monitor,server,nowait -mem-path /dev/hugepages -mem-prealloc -enable-kvm -net nic,model=virtio,netdev=eth0,macaddr=00:1f:33:16:64:44 -netdev tap,ifname=tap46,id=eth0,script=no,downscript=/vm/vm_ctl/br-mgt-ifdown -device pci-assign,host=06:10.0 -device pci-assign,host=06:10.1 -name SRIOVtest -vnc :10
root      1718     1  1720  2    6 982701 44368  4 17:41 pts/0   00:00:32 qemu-system-x86_64 ... (same command line as above)
root      1718     1  1721 30    6 982701 44368  4 17:41 pts/0   00:06:07 qemu-system-x86_64 ... (same command line as above)
root      1718     1  1722 30    6 982701 44368  4 17:41 pts/0   00:06:06 qemu-system-x86_64 ... (same command line as above)
root      1718     1  1724  0    6 982701 44368  4 17:41 pts/0   00:00:00 qemu-system-x86_64 ... (same command line as above)
root      1718     1  1815  0    6 982701 44368  4 18:01 pts/0   00:00:00 qemu-system-x86_64 ... (same command line as above)
We see that QEMU thread task IDs 1721 and 1722 are accumulating process times of 6:07 and 6:06,
respectively, compared to the other low-usage threads in this case. We know that the two 100% load
DPDK threads are running on vCPU1 and vCPU2, respectively, so this indicates that vCPU1 is task ID
1721 and vCPU2 is task ID 1722. Since QEMU creates vCPU task threads sequentially, vCPU0 must be
QEMU thread task 1720.
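Alternatively, because the VM was started with a monitor socket (-monitor unix:/tmp/vm1monitor,server,nowait), the vCPU-to-thread-ID mapping can be read directly from the QEMU monitor instead of being inferred from CPU time. A sketch, assuming the socat utility is available on the host (the info cpus command lists each vCPU with its host thread_id):
# echo "info cpus" | socat - UNIX-CONNECT:/tmp/vm1monitor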
12. With CPU cores 8 and 9 available on this 10-core processor, we move vCPU1 to CPU core 8 and
vCPU2 to CPU core 9 for the test.
# taskset -p 100 1721
pid 1721's current affinity mask: 70
pid 1721's new affinity mask: 100
# taskset -p 200 1722
pid 1722's current affinity mask: 70
pid 1722's new affinity mask: 200
13. Using top, we verify that tasks are running on target CPU cores.
# top
(1)
top - 18:03:21 up 35 min, 2 users, load average: 1.93, 1.89, 1.24
Tasks: 279 total,   1 running, 278 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu27 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  32879352 total, 21912988 used, 10966364 free,    16680 buffers
KiB Swap:  8191996 total,        0 used,  8191996 free,   519040 cached
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1718 root      20   0 3930804  44368  20712 S 199.9  0.1  15:10.12 qemu-system-x86
  218 root      20   0       0      0      0 S   0.3  0.0   0:00.31 kworker/10:1
    1 root      20   0   47536   5288   3744 S   0.0  0.0   0:02.94 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
This shows that the DPDK l3fwd CPU 100% loads are on the target CPU cores 8 and 9.
14. Check for the correct vCPU loads in the VM.
# top
(1)
top - 15:03:45 up 22 min, 3 users, load average: 2.00, 1.88, 1.24
Tasks:  86 total,   2 running,  84 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 99.7 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
KiB Mem:   3082324 total,  1465384 used,  1616940 free,    23304 buffers
KiB Swap:  2097148 total,        0 used,  2097148 free,   298540 cached
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7388 root      20   0 1681192   3144   2884 R 200.1  0.1   2:11.14 l3fwd
 7221 root      20   0       0      0      0 S   0.3  0.0   0:00.14 kworker/0:0
    1 root      20   0   47496   5052   3692 S   0.0  0.2   0:00.33 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
In the VM, the top utility shows that the DPDK l3fwd task loads are on vCPU1 and vCPU2. Before the host
affinitization was corrected, both vCPU loads were running on the same host core (CPU load scheduling is
disabled on isolated cores), so top did not show close to 100% user task loads in the VM. The VM is now
ready for SR-IOV throughput tests.
In a production system, the VM startup and affinitization would be done in a different manner to avoid
disrupting any real-time processes currently running on the host or in other VMs.
B.5.6
Affinitization and Performance Tuning
To maximize network throughput, individual cores must be affinitized to particular tasks. This can be
achieved by using either the taskset command on the host and/or by passing a core mask parameter
to the VM application.
The VM starts with two cores: vCPU0 and vCPU1. The Linux operating system and related tasks must
use only vCPU0. vCPU1 is reserved to run the DPDK processes.
B.5.6.1
Affinitization Using the Core Mask Parameter in the qemu and test-pmd Startup Commands
The qemu and test-pmd startup commands offer a core mask parameter that can be set with a hex mask
to ensure the tasks use specific cores.
Use core mask -c 0x1 for the test-pmd command. This ensures that the DPDK application in the VM
uses vCPU1. Verify with the top command that vCPU1 is used at 100%.
However, with the qemu command, even though the core mask is set to use two host cores for the VM's
vCPU0 and vCPU1, it allocates the vCPU0 and vCPU1 tasks on a single host core, usually the first core of
the specified core mask. Hence, the qemu tasks need to be re-affinitized.
B.5.6.2
Affinitized Host Cores for VMs vCPU0 and vCPU1
Use the taskset command to pin specific processes to a core.
# taskset -p <core_mask> <pid>
Ensure that the VM's vCPU0 and vCPU1 are assigned to two separate host cores. For example:
• Intel® DPDK Accelerated vSwitch uses cores 0, 1, 2 and 3 (-c 0x0F)
• QEMU task for VM's vCPU0 uses core 4 (-c 0x30)
• QEMU task for VM's vCPU1 uses core 5 (-c 0x30; taskset -p 20 <pid_vcpu1>)
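For example, with the vCPU thread PIDs identified as in Appendix B.5.5.2 (the PIDs below are placeholders), the two vCPU threads could be pinned to cores 4 and 5 as follows (a sketch):
# taskset -p 10 <pid_vcpu0>     (core 4, mask 0x10)
# taskset -p 20 <pid_vcpu1>     (core 5, mask 0x20)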
B.6
PCI Passthrough with Intel® QuickAssist
When setting up a QAT PCI device for passthrough to a VM, make sure that VT-d is enabled in the BIOS
and that "intel_iommu=on iommu=pt" is used in the grub.cfg file to boot the OS with the IOMMU enabled.
The VM then has access to the QAT PCI device using PCI passthrough. Make sure the host has two QAT
cards, since we will set up an IPSec tunnel between two VMs, each VM using QAT to accelerate the
tunnel. The VMs use the standard Open vSwitch virtio plus standard vhost I/O virtualization method for
networking.
1. Run the following command to verify that the host has two QAT devices provisioned in it.
# lspci -nn |grep 043
0c:00.0 Co-processor [0b40]: Intel Corporation Coleto Creek PCIe Endpoint [8086:0435]
85:00.0 Co-processor [0b40]: Intel Corporation Coleto Creek PCIe Endpoint [8086:0435]
2. Use the following commands to detach PCI devices from the host.
# echo 8086 0435 > /sys/bus/pci/drivers/pci-stub/new_id
# echo 0000:85:00.0 > /sys/bus/pci/devices/0000\:85\:00.0/driver/unbind
# echo 0000:0c:00.0 > /sys/bus/pci/devices/0000\:0c\:00.0/driver/unbind
Note:
You may need to use the specific PCI bus ID per your system setup.
3. After detaching the acceleration complex from the host operating system, bind the appropriate
bus/ device/function to pci-stub driver.
# echo 0000:85:00.0 > /sys/bus/pci/drivers/pci-stub/bind
# echo 0000:0c:00.0 > /sys/bus/pci/drivers/pci-stub/bind
4. Verify that the devices are bound to pci-stub.
# lspci -vv |grep pci-stub
5. On a separate compute node that uses standard Open vSwitch for networking, add an OVS bridge called br0. The tap devices tap1, tap2, tap3, and tap4 are used as data network vNICs for the two VMs. Each of the two 10 GbE ports on the host is bridged to the OVS bridge br0 as follows:
# ovs-vsctl add-br br0
# tunctl -t tap1
# tunctl -t tap2
# tunctl -t tap3
# tunctl -t tap4
# ovs-vsctl add-port br0 tap1
# ovs-vsctl add-port br0 tap2
# ovs-vsctl add-port br0 tap3
# ovs-vsctl add-port br0 tap4
# ovs-vsctl add-port br0 p786p1
# ovs-vsctl add-port br0 p786p2
6. Bring up the tap devices and ports added to the bridge.
# ip link set br0 up
# ip link set tap1 up
# ip link set tap2 up
# ip link set tap3 up
# ip link set tap4 up
# ip link set p786p1 up
# ip link set p786p2 up
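Before starting the VMs, the bridge configuration can be verified with read-only OVS commands (a quick sanity check; a sketch):
# ovs-vsctl show
# ovs-vsctl list-ports br0
The list of ports on br0 should include tap1 through tap4 and the two physical interfaces p786p1 and p786p2.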
7. Disable Linux Kernel forwarding on the host.
# echo 0 > /proc/sys/net/ipv4/ip_forward
8. The two VMs can now be started with the QAT devices assigned via PCI passthrough. The following qemu commands pass 85:00.0 to VM1 and 0c:00.0 to VM2:
# qemu-system-x86_64 -cpu host -enable-kvm -hda <VM1_image_path> -m 8192 -smp 4
-net nic,model=virtio,netdev=eth0,macaddr=00:00:00:00:00:01 -netdev
tap,ifname=tap1,id=eth0,vhost=on,script=no,downscript=no -net
nic,model=virtio,netdev=eth1,macaddr=00:00:00:00:00:02 -netdev
tap,ifname=tap2,id=eth1,vhost=on,script=no,downscript=no -vnc :15 -name vm1
-device pci-assign,host=85:00.0 &
# qemu-system-x86_64 -cpu host -enable-kvm -hda <VM2_image_path> -m 8192 -smp 4
-net nic,model=virtio,netdev=eth0,macaddr=00:00:00:00:00:03 -netdev
tap,ifname=tap3,id=eth0,vhost=on,script=no,downscript=no -net
nic,model=virtio,netdev=eth1,macaddr=00:00:00:00:00:04 -netdev
tap,ifname=tap4,id=eth1,vhost=on,script=no,downscript=no -vnc :16 -name vm2
-device pci-assign,host=0c:00.0 &
Refer to Appendix B.6.1 to set up a VM with the QAT drivers, the netkeyshim module, and the strongSwan IPSec application.
B.6.1 VM Installation
B.6.1.1 Creating VM Image and VM Configuration
Refer to Appendix B.4.6.
B.6.1.2 Verifying Pass-through
Once the guest starts, run the following command within guest:
# lspci -nn
Pass-through PCI devices should appear with the same description as the host originally showed. For
example, if the following was shown on the host:
0c:00.0 Co-processor [0b40]: Intel Corporation Coleto Creek PCIe Endpoint [8086:0435]
It should show up on the guest as:
Ethernet controller [0200]: Intel Corporation Device [8086:0435]
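To also see which guest driver, if any, has claimed the passed-through device, the kernel driver information from lspci can be inspected (a sketch; the device ID 0435 follows the example above and the output will vary by system):
# lspci -nnk | grep -A2 0435
The "Kernel driver in use" line, if present, shows which guest driver has bound to the device.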
B.6.1.3 Installing Intel® Communications Chipset Software in KVM Guest
The instructions in this section assume that you have super user privileges. The QAT build directory used is /QAT.
# su
# mkdir /QAT
# cd /QAT
1. Transfer the tarball to the /QAT directory using any preferred method (for example, USB memory stick, CD-ROM, or network transfer), then extract it:
# tar -zxof <QAT_tarball_name>
2. Launch the script using the following command:
# ./installer.sh
3. Choose option 2 to build and install the acceleration software. Choose option 6 to build the sample
LKCF code.
4. Start or stop the acceleration software.
# service qat_service start/stop
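The status of the acceleration software can be checked at any time with the same service script (this is also done later, in Appendix B.6.3.1, before inserting the netkeyshim module):
# service qat_service status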
5. Set up the environment to install the Linux kernel crypto framework driver.
# export ICP_ROOT=/QAT
# export KERNEL_SOURCE_ROOT=/usr/src/kernels/`uname -r`
6. Unpack the Linux kernel crypto driver.
# mkdir -p $ICP_ROOT/quickassist/shims/netkey
# cd $ICP_ROOT/quickassist/shims/netkey
# tar xzof <path_to>/icp_qat_netkey.L.<version>.tar.gz
7. Build the Linux kernel crypto driver.
# cd $ICP_ROOT/quickassist/shims/netkey/icp_netkey
# make
8. Install the Linux kernel crypto driver.
# cd $ICP_ROOT/quickassist/shims/netkey/icp_netkey
# insmod ./icp_qat_netkey.ko
9. Verify that the module has been installed.
# lsmod | grep icp
The following is the expected output:
icp_qat_netkey         21868  0
icp_qa_al            1551435  2 icp_qat_netkey
B.6.2 Installing strongSwan IPSec Software
Install strongSwan IPsec software on both VM1 and VM2.
1. Download the original strongswan-4.5.3 software package from the following link:
http://download.strongswan.org/strongswan-4.5.3.tar.gz
2. Navigate to the shims directory.
# cd $ICP_ROOT/quickassist/shims
3. Extract the source files from the strongSwan software package to the shims directory.
# tar xzof <path_to>/strongswan-4.5.3.tar.gz
4. Navigate to the strongswan-4.5.3 directory.
# cd $ICP_ROOT/quickassist/shims/strongswan-4.5.3
5. Configure strongSwan using the following command:
# ./configure --prefix=/usr --sysconfdir=/etc
6. Build strongSwan using the following command:
# make
7. Install strongSwan using the following command:
# make install
Repeat the installation of strongSwan IPsec software on the other VM.
B.6.3 Configuring strongSwan IPsec Software
1. Add the following line to the configuration file /etc/strongswan.conf on the VM1 and VM2 platforms
after the charon { line:
load = curl aes des sha1 sha2 md5 pem pkcs1 gmp random x509 revocation hmac xcbc
stroke kernel-netlink socket-raw updown
After adding the line, the section looks like:
# strongswan.conf - strongSwan configuration file
charon {
load = curl aes des sha1 sha2 md5 pem pkcs1 gmp random x509 revocation hmac xcbc
stroke kernel-netlink socket-raw updown
# number of worker threads in charon
threads = 16
Note:
When the /etc/strongswan.conf text files are created, the line that starts with load and
ends with updown must be on the same line, despite the appearance of it being on two
separate lines in this documentation.
2. Update the strongSwan configuration files on the VM1 platform:
a. Edit /etc/ipsec.conf.
# ipsec.conf - strongSwan IPsec configuration file
# basic configuration
config setup
    plutodebug=none
    crlcheckinterval=180
    strictcrlpolicy=no
    nat_traversal=no
    charonstart=no
    plutostart=yes
conn %default
    ikelifetime=60m
    keylife=1m
    rekeymargin=3m
    keyingtries=1
    keyexchange=ikev1
    ike=aes128-sha-modp2048!
    esp=aes128-sha1!
conn host-host
    left=192.168.99.2
    leftfirewall=no
    right=192.168.99.3
    auto=start
    authby=secret
b. Edit /etc/ipsec.secrets (this file might not exist and might need to be created).
# /etc/ipsec.secrets - strongSwan IPsec secrets file
192.168.99.2 192.168.99.3 : PSK "shared key"
3. Update the strongSwan configuration files on the VM2 platform:
a. Edit /etc/ipsec.conf.
# /etc/ipsec.conf - strongSwan IPsec configuration file
config setup
    crlcheckinterval=180
    strictcrlpolicy=no
    plutostart=yes
    plutodebug=none
    charonstart=no
    nat_traversal=no
conn %default
    ikelifetime=60m
    keylife=1m
    rekeymargin=3m
    keyingtries=1
    keyexchange=ikev1
    ike=aes128-sha1-modp2048!
    esp=aes128-sha1!
conn host-host
    left=192.168.99.3
    leftfirewall=no
    right=192.168.99.2
    auto=start
    authby=secret
b. Edit /etc/ipsec.secrets (this file might not exist and might need to be created).
# /etc/ipsec.secrets - strongSwan IPsec secrets file
192.168.99.3 192.168.99.2 : PSK "shared key"
B.6.3.1 Starting strongSwan IPsec Software on the VM
The IPSec tunnel is set up between the two VMs as shown in Figure 6-20 on page 32. The tunnel is established for the network interfaces on the subnet 192.168.99.0/24. The data networks for VM1 and VM2 are 1.1.1.0/24 and 6.6.6.0/24, respectively.
1. VM1 configuration settings:
a. Network configuration:
# cat /etc/sysconfig/network-scripts/ifcfg-eth1
TYPE=Ethernet
NAME=eth1
BOOTPROTO=none
ONBOOT=yes
NETMASK=255.255.255.0
IPADDR=1.1.1.2
HWADDR=00:00:00:00:00:01
# cat /etc/sysconfig/network-scripts/ifcfg-eth2
TYPE=Ethernet
NAME=eth2
BOOTPROTO=none
ONBOOT=yes
NETMASK=255.255.255.0
IPADDR=192.168.99.2
HWADDR=00:00:00:00:00:02
b. Add a route to the routing table for the VM2 network.
# ip route add 6.6.6.0/24 dev eth2
# route -n
c. Enable Linux Kernel forwarding and stop the firewall daemon.
# echo 1 > /proc/sys/net/ipv4/ip_forward
# service firewalld stop
d. Check QAT service status and insert the netkeyshim module.
# export ICP_ROOT=/qat/QAT1.6
# service qat_service status
# insmod $ICP_ROOT/quickassist/shims/netkey/icp_netkey/icp_qat_netkey.ko
e. Start the IPSec process.
# ipsec start
2. VM2 configuration settings:
a. Network configuration:
# cat /etc/sysconfig/network-scripts/ifcfg-eth1
TYPE=Ethernet
NAME=eth1
BOOTPROTO=none
ONBOOT=yes
NETMASK=255.255.255.0
IPADDR=6.6.6.6
HWADDR=00:00:00:00:00:03
# cat /etc/sysconfig/network-scripts/ifcfg-eth2
TYPE=Ethernet
NAME=eth2
BOOTPROTO=none
ONBOOT=yes
NETMASK=255.255.255.0
IPADDR=192.168.99.3
HWADDR=00:00:00:00:00:04
b. Add a route to the routing table for the VM1 network.
# ip route add 1.1.1.0/24 dev eth2
# route -n
c. Enable Linux Kernel forwarding and stop the firewall daemon.
# echo 1 > /proc/sys/net/ipv4/ip_forward
# service firewalld stop
d. Check QAT service status and insert the netkeyshim module.
# export ICP_ROOT=/qat/QAT1.6
# service qat_service status
# insmod $ICP_ROOT/quickassist/shims/netkey/icp_netkey/icp_qat_netkey.ko
e. Start the IPSec process.
# ipsec start
3. Enable the IPsec tunnel “host-host” on both VM1 and VM2:
# ipsec up host-host
4. To test the VM-VM IPSec tunnel performance, use Netperf.
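For example (a minimal sketch; netperf and netserver are assumed to be installed in both VMs, and the test length is illustrative): first confirm on either VM that the host-host connection is established with an installed ESP security association, then start the netperf server on VM2 and run the client from VM1 against VM2's tunnel endpoint.
# ipsec statusall
# netserver
# netperf -H 192.168.99.3 -t TCP_STREAM -l 60
The reported TCP_STREAM throughput is the VM-to-VM rate through the QAT-accelerated IPSec tunnel.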
Appendix C Glossary
Acronym     Description
COTS        Commercial Off-The-Shelf
DPI         Deep Packet Inspection
IOMMU       Input/Output Memory Management Unit
Kpps        Kilo packets per second
KVM         Kernel-based Virtual Machine
Mpps        Million packets per second
NIC         Network Interface Card
pps         Packets per second
QAT         QuickAssist Technology
RA          Reference Architecture
RSS         Receive Side Scaling
SP          Service Provider
SR-IOV      Single Root I/O Virtualization
Appendix D Definitions
D.1 Packet Throughput
There is a difference between an Ethernet frame, an IP packet, and a UDP datagram. In the seven-layer OSI model of computer networking, packet refers to a data unit at layer 3 (network layer). The
correct term for a data unit at layer 2 (data link layer) is a frame, and at layer 4 (transport layer) is a
segment or datagram.
Important concepts related to 10GbE performance are frame rate and throughput. The MAC bit rate of
10GbE, defined in the IEEE standard 802.3ae, is 10 billion bits per second. Frame rate is based on the
bit rate and frame format definitions. Throughput, defined in IETF RFC 1242, is the highest rate at
which the system under test can forward the offered load, without loss.
The frame rate for 10GbE is determined by a formula that divides the 10 billion bits per second by the
preamble + frame length + inter-frame gap.
The maximum frame rate is calculated using the minimum values of the following parameters, as
described in the IEEE 802.3ae standard:
• Preamble: 8 bytes * 8 = 64 bits
• Frame length: 64 bytes (minimum) * 8 = 512 bits
• Inter-frame gap: 12 bytes (minimum) * 8 = 96 bits
Therefore, Maximum Frame Rate (64-byte frames)
= MAC Transmit Bit Rate / (Preamble + Frame Length + Inter-frame Gap)
= 10,000,000,000 / (64 + 512 + 96)
= 10,000,000,000 / 672
= 14,880,952.38 frames per second (fps)
Table D-1    Maximum Throughput

IP Packet Size (Bytes)   Theoretical Line Rate (fps),   Theoretical Line Rate (fps),
                         Full Duplex                    Half Duplex
64                       29,761,904                     14,880,952
128                      16,891,891                      8,445,946
256                       9,057,971                      4,528,986
512                       4,699,248                      2,349,624
1024                      2,394,636                      1,197,318
1280                      1,923,076                        961,538
1518                      1,625,487                        812,744
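The half-duplex rates in Table D-1 follow directly from the formula above: each frame on the wire occupies the frame length plus 20 bytes of overhead (8-byte preamble plus 12-byte minimum inter-frame gap), and the full-duplex rates are twice the half-duplex values. The table can be reproduced, to within a frame per second of rounding, with a small calculation (a sketch; awk is used here only as a convenient calculator):
# for s in 64 128 256 512 1024 1280 1518; do
>   awk -v s=$s 'BEGIN { r = 1e10 / ((s + 20) * 8); printf "%5d bytes: %.0f fps half duplex, %.0f fps full duplex\n", s, r, 2 * r }'
> done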
D.2 RFC 2544
RFC 2544 is an Internet Engineering Task Force (IETF) RFC that outlines a benchmarking methodology
for network Interconnect Devices. The methodology results in performance metrics such as latency,
frame loss percentage, and maximum data throughput.
In this document network “throughput” (measured in millions of frames per second) is based on
RFC 2544, unless otherwise noted. Frame size refers to Ethernet frames ranging from smallest frames
of 64 bytes to largest frames of 1518 bytes.
Types of tests are:
• Throughput test defines the maximum number of frames per second that can be transmitted
without any error. Test time during which frames are transmitted must be at least 60 seconds.
• Latency test measures the time required for a frame to travel from the originating device through
the network to the destination device.
• Frame loss test measures the network’s response in overload conditions—a critical indicator of the
network’s ability to support real-time applications in which a large amount of frame loss will rapidly
degrade service quality.
• Burst test assesses the buffering capability of a switch. It measures the maximum number of
frames received at full line rate before a frame is lost. In carrier Ethernet networks, this
measurement validates the excess information rate (EIR) as defined in many SLAs.
• System recovery test characterizes the speed of recovery from an overload condition.
• Reset test characterizes the speed of recovery from a device or software reset.
Although not included in the defined RFC 2544 standard, another crucial measurement in Ethernet
networking is packet jitter.
Appendix E References
Internet Protocol version 4
    http://www.ietf.org/rfc/rfc791.txt
Internet Protocol version 6
    http://www.faqs.org/rfc/rfc2460.txt
Intel® 82599 10 Gigabit Ethernet Controller Datasheet
    http://www.intel.com/content/www/us/en/ethernet-controllers/8259910-gbe-controller-datasheet.html
Intel DDIO
    https://www-ssl.intel.com/content/www/us/en/io/direct-data-i-o.html?
Bandwidth Sharing Fairness
    http://www.intel.com/content/www/us/en/network-adapters/10-gigabitnetwork-adapters/10-gbe-ethernet-flexible-port-partitioning-brief.html
Design Considerations for efficient network applications with Intel® multi-core processor-based systems on Linux
    http://download.intel.com/design/intarch/papers/324176.pdf
OpenFlow with Intel 82599
    http://ftp.sunet.se/pub/Linux/distributions/bifrost/seminars/workshop2011-03-31/Openflow_1103031.pdf
Wu, W., DeMar, P., & Crawford, M. (2012). A Transport-Friendly NIC for Multicore/Multiprocessor Systems. IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 4, April 2012.
    http://lss.fnal.gov/archive/2010/pub/fermilab-pub-10-327-cd.pdf
Why Does Flow Director Cause Packet Reordering?
    http://arxiv.org/ftp/arxiv/papers/1106/1106.0443.pdf
IA packet processing
    http://www.intel.com/p/en_US/embedded/hwsw/technology/packetprocessing
High Performance Packet Processing on Cloud Platforms using Linux* with Intel® Architecture
    http://networkbuilders.intel.com/docs/network_builders_RA_packet_processing.pdf
Packet Processing Performance of Virtualized Platforms with Linux* and Intel® Architecture
    http://networkbuilders.intel.com/docs/network_builders_RA_NFV.pdf
DPDK
    http://www.intel.com/go/dpdk
Intel® DPDK Accelerated vSwitch
    https://01.org/packet-processing
RFC 1242 (Benchmarking Terminology for Network Interconnection Devices)
    http://www.ietf.org/rfc/rfc1242.txt
RFC 2544 (Benchmarking Methodology for Network Interconnect Devices)
    http://www.ietf.org/rfc/rfc2544.txt
RFC 6349 (Framework for TCP Throughput Testing)
    http://www.faqs.org/rfc/rfc6349.txt
RFC 3393 (IP Packet Delay Variation Metric for IP Performance Metrics)
    https://www.ietf.org/rfc/rfc3393.txt
MEF end-end measurement metrics
    http://metroethernetforum.org/Assets/White_Papers/Metro-EthernetServices.pdf
DPDK Sample Application User Guide
    http://dpdk.org/doc/intel/dpdk-sample-apps-1.7.0.pdf
Benchmarking Methodology for Virtualization Network Performance
    https://tools.ietf.org/html/draft-liu-bmwg-virtual-network-benchmark-00
Benchmarking Virtual Network Functions and Their Infrastructure
    https://tools.ietf.org/html/draft-morton-bmwg-virtual-net-01
LEGAL
By using this document, in addition to any agreements you have with Intel, you accept the terms set forth below.
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel
products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which
includes subject matter disclosed herein.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY
ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN
INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL
DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and
performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when
combined with other products.
The products described in this document may contain design defects or errors known as errata which may cause the product to
deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or
your distributor to obtain the latest specifications and before placing your product order.
Intel technologies may require enabled hardware, specific software, or services activation. Check with your system manufacturer or
retailer. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or
configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your
purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change
without notice. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and
provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your
actual performance.
No computer system can be absolutely secure. Intel does not assume any liability for lost or stolen data or systems or any damages
resulting from such losses.
Intel does not control or audit third-party web sites referenced in this document. You should visit the referenced web site and confirm
whether referenced data are accurate.
Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that
relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any
license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property
rights.
© 2014 Intel Corporation. All rights reserved. Intel, the Intel logo, Core, Xeon and others are trademarks of Intel Corporation in the
U.S. and/or other countries. *Other names and brands may be claimed as the property of others.