VNT: Virtual Networks over TRILL
Transcription
VNT: Virtual Networks over TRILL
Ahmed Amamou, [email protected]
Benoît Ganne, [email protected]
Accelerate networking innovation through a programmable data plane: removing switches from datacenters with TRILL/VNT and smartNICs

Who is Gandi?
• Gandi has been a domain name registrar since 1999 and a cloud provider since 2008
• We provide both:
  – IaaS: Infrastructure as a Service
  – PaaS: Platform as a Service
• We support the open source community*:
  – We provide open source code: https://github.com/Gandi
  – We support open source projects: VLC, Debian, …
* Check http://www.gandi.net/supports/ for an exhaustive list

IaaS: the new network challenges
• Cisco forecast report*:
  – Cloud traffic was about 3.3 zettabytes (10^21 bytes) in 2013
  – Cloud traffic will reach 6.6 zettabytes in 2016
  – 76% of cloud traffic is East-West (within the same datacenter)
  → A high density of links within the datacenter is needed
• Customers need full network access:
  – It should be isolated
  – VM network configuration should not be restrictive
  → Overlaying tenant traffic should be considered
* Cisco Global Cloud Index Forecast and Methodology, 2011-2016.

Why OpenCompute?
• New protocols are proposed to solve these problems (TRILL, VXLAN, 802.1ad, STT, …) but:
  – Hardware integration is slow
  – Protocol extensions are hard to integrate
• We believe the OpenCompute community can help us:
  – Define an open, vendor-neutral API for the programmable data plane
  – Bring open hardware fulfilling those needs

New datacenter architecture
• Switch from a classic datacenter architecture to a full-mesh one
• Upgrade hardware to improve performance

TRILL @ Gandi
• Gandi has been using commodity hardware as TRILL RBridges since 2013
• We have not yet found hardware that suits our needs

TRILL: TRansparent Interconnection of Lots of Links
• Layer 2 routing protocol
• Uses a control plane and a data plane
• Control plane: based on IS-IS, which computes all routing information
• Data plane: forwards packets using the information provided by the control plane
• Uses MAC-in-MAC encapsulation (TRILL header + original payload)

TRILL benefits
                 Switching (L2)   Routing (L3)   TRILL
Configuration    Minimal          Intense        Minimal
Plug & play      Yes              No             Yes
Discovery        Automatic        Configured     Automatic
Learning         Automatic        Configured     Automatic
Multipath        No               Yes            Yes
Convergence      Slow             Fast           Fast
Connectivity     Inflexible       Flexible       Flexible
Scale            Limited          Important      Important

Control plane: forwarding database

Multitenancy: Virtual Network over TRILL (VNT)
• New cloud architectures have to take multitenancy into consideration
• TRILL does not provide multitenancy handling mechanisms → we need to extend it

VNT vs TRILL
• Update both the control and data planes:
  – Control plane: prune the multicast tree to limit multicast traffic
  – Data plane: forwarding is conditioned by VNI support
• VNT encapsulation of the original Ethernet frame:
  – L2 routing information: outer destination MAC address, outer source MAC address, egress RBridge nickname, ingress RBridge nickname
  – Optional outer IEEE 802.1Q tag
  – TRILL header
  – VNT header extensions (tenant identification): options description TLV + VNI tag (24 bits)
  – Original packet payload
• Publication: Amamou, A., Haddadou, K., & Pujolle, G. (2014). A TRILL-based multi-tenant data center network. Computer Networks.
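To make the encapsulation above concrete, here is a minimal C sketch of the header layout as packed structs. The TRILL field widths and the 0x22F3 Ethertype follow RFC 6325; the exact shape of the VNT extension (an options description TLV carrying the 24-bit VNI tag) is an assumption based on the slide, not Gandi's actual wire format.

    #include <stdint.h>

    struct outer_eth_hdr {              /* outer MAC-in-MAC header                  */
        uint8_t  dst_mac[6];            /* outer destination MAC (next-hop RBridge) */
        uint8_t  src_mac[6];            /* outer source MAC (transmitting RBridge)  */
        uint16_t ethertype;             /* 0x22F3 = TRILL (RFC 6325)                */
    } __attribute__((packed));

    struct trill_hdr {                  /* TRILL header (RFC 6325)                  */
        uint16_t flags;                 /* V(2) R(2) M(1) op-length(5) hop count(6) */
        uint16_t egress_nickname;       /* egress RBridge (tree root when M = 1)    */
        uint16_t ingress_nickname;      /* ingress RBridge                          */
    } __attribute__((packed));

    struct vnt_option {                 /* VNT header extension (assumed TLV shape) */
        uint8_t  type;                  /* options description TLV: type            */
        uint8_t  length;                /* options description TLV: length          */
        uint8_t  vni[3];                /* 24-bit VNI tag, tenant identification    */
    } __attribute__((packed));

    /* On the wire: outer_eth_hdr | optional 802.1Q tag | trill_hdr | vnt_option
     * | original Ethernet frame (payload). */

When the M bit is set the frame is multi-destination and egress_nickname designates the distribution tree root, which is what the smartNIC dispatch example later in the deck keys on.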
VNT: multicast tree pruning
[Figure: the physical topology (RBridges n1-n8, interfaces i1-i3), the full multicast distribution tree, and the tree pruned to the branches reaching hosts A and B on VNI 1]

Current VNT implementation on Linux
• Control plane: Quagga daemon (https://github.com/Gandi/)
• Data plane: Linux bridge module

Data plane: performance
[Figure: throughput and delay measurements]
• Throughput is affected by the additional processing operations
• Processing of a single packet is not affected

Improving performance
• Shift the data plane from the host to a smartNIC:
  – Increase performance
  – Offload the x86 for other usages (e.g. customer workloads)
[Figure: the control plane stays on the host while the data plane moves from the host kernel to the smartNIC]

KALRAY: deterministic supercomputing on a chip
• Founded in 2008, fabless semiconductor company
• Kalray has developed the disruptive MPPA® (Multi-Purpose Processing Array) programmable architecture:
  – First MPPA®-256 chips in 28 nm TSMC CMOS
  – Leading performance/energy ratio worldwide
  – Time predictability and low latency
  – Heterogeneous applications on the same chip
  – High programmability
• Working with industry-leading partners and customers
• 55 employees
• Offices in France and the US

MPPA®-256 Bostan networking strengths
• High throughput / line rate:
  – 80 Gbps full-duplex line rate (2x120 MPPS)
  – 3400 instructions per packet @ 64 B
  – AES, SHA-1, SHA-2, CRC accelerators
• Software-defined NIC:
  – Smart packet classification/dispatching
  – 256 cores for packet processing
  – Standard C/C++ with GCC 4.9
  – Advanced debugging and profiling
• Low latency:
  – Zero-copy between Ethernet and PCIe
  – < 1 µs port to port in transparent mode
  – < 1 µs port to system memory
• System integration:
  – 2x PCIe Gen3 8-lane
  – Linux support
  – Virtualization support
  – Low power

MPPA®-256 Bostan
• 64-bit processor, up to 800 MHz
• High performance:
  – 845 GFLOPS SP / 422 GFLOPS DP
  – 1 TOPS
• High-bandwidth network on chip: 2x12.8 GB/s
• High-speed Ethernet: up to 2x40 Gbps / 2x120 MPPS @ 64 B
• DDR3 memory interfaces: 2x 64-bit + ECC @ 2133 MT/s / 2x17 GB/s
• PCIe Gen3 interface: 2x 8-lane / 2x8 GB/s full duplex, endpoint / root complex
• NoCX extension: 2x40 Gbps + 2x80 Gbps ILK
• Flash controller, GPIOs, …

MPPA®-256 processor hierarchical architecture
• 256 processing engine cores + 32 resource management cores
• VLIW core → instruction-level parallelism
• Compute cluster → thread-level parallelism
• Manycore processor → process-level parallelism

High-speed Ethernet packet processing
• Ethernet Rx dispatcher:
  – 8 classification tables (classify, extract fields)
  – Smart dispatch: round-robin or classification-based, flexible core allocation, per 10G port
• Ethernet Tx:
  – 64 Tx FIFOs
  – QoS between the FIFOs
  – Flow control between the clusters and the Tx FIFOs
• Patent pending
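As a concrete illustration of the "classify and dispatch" step described above (and of the dispatch rule shown on the next slides), here is a minimal C sketch that extracts the TRILL egress nickname from an incoming frame and hashes it to an owning cluster. This is plain C written for illustration under stated assumptions (no outer 802.1Q tag, a hypothetical NB_CLUSTERS constant and dispatch_cluster() function); it is not the Kalray classification-table API.

    #include <stddef.h>
    #include <stdint.h>

    #define ETHERTYPE_TRILL 0x22F3   /* TRILL Ethertype (RFC 6325)         */
    #define NB_CLUSTERS     16       /* assumed number of compute clusters */

    /* Read a big-endian 16-bit field from the frame. */
    static uint16_t rd16(const uint8_t *p)
    {
        return (uint16_t)((p[0] << 8) | p[1]);
    }

    /* Return the compute cluster that owns this frame, or -1 to punt it
     * to the host slow path. Assumes no outer 802.1Q tag for simplicity. */
    int dispatch_cluster(const uint8_t *frame, size_t len)
    {
        if (len < 14 + 6)                        /* outer Ethernet + TRILL header */
            return -1;

        if (rd16(frame + 12) != ETHERTYPE_TRILL) /* outer Ethertype field */
            return -1;

        /* TRILL header: 2 bytes flags, 2 bytes egress nickname, 2 bytes
         * ingress nickname. For multicast the egress nickname is the tree
         * root, so every copy of one tree lands on the same cluster (which
         * owns the corresponding FIB subset). */
        uint16_t egress_nickname = rd16(frame + 14 + 2);

        return egress_nickname % NB_CLUSTERS;
    }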
VNT on a programmable data plane: a multicast forwarding example
• Kalray Bostan smartNIC:
  – Explore programmable data plane opportunities
  – Study the feasibility and architecture of a VNT smartNIC
• On-going work between Gandi and Kalray
• Multicast forwarding puts a high load on each node
[Figure: 8x10GbE ports and the IO Ethernet driver on the MPPA, attached over PCIe to the x86 hypervisor, which runs the MPPA Linux Ethernet driver, the Linux networking stack, the TRILL controller and userspace applications]

• Dispatch the packet based on the egress RBridge, e.g. for an incoming frame <Ethertype=TRILL, Egress=DTROOT, VNI=VNI-1>:
    if (Packet[Ethertype] == TRILL) {
        send to cluster #HASH(Egress RBridge)
    }
  – In case of multicast, the egress RBridge is set to the tree root
  – Each cluster "owns" a subset of the possible egress RBridges (i.e. a FIB subset)

• Look up the list of next-hop RBridges for this multicast tree
  – The clusters owning those RBridges can be local or remote
• Look up the LIB for local ports, if any
    FIB[Egress RBridge] = {
        Egress RBridge MAC;
        Egress RBridge Interface;
        MCTree = [ RBx, RBy, … ];
        VNI = [ VNI-1, VNI-2, … ];
    }
    LIB = {
        (Local MACx, Local Portx, VNI-1);
        …
    }

• Forward the frame
  – Remote: forward to the clusters owning the next-hop RBridges
  – Local: decapsulate the inner frame and forward it to the local VM

• Check whether the next-hop RBridge supports the appropriate VNI
  – If yes, forward to that RBridge
  – If not, stop here
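The per-cluster state and the VNI check above can be sketched in C as follows. The structure fields mirror the FIB and LIB pseudo-structures on the slides; the array bounds, helper names and the send_to_port() Tx primitive are illustrative assumptions, not the actual smartNIC implementation.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_TREE_HOPS 8           /* assumed bound on next-hop RBridges */
    #define MAX_VNIS      16          /* assumed bound on VNIs per RBridge  */

    struct fib_entry {                  /* FIB[Egress RBridge] on the slide   */
        uint8_t  rbridge_mac[6];        /* Egress RBridge MAC                 */
        uint8_t  rbridge_if;            /* Egress RBridge interface           */
        uint16_t mctree[MAX_TREE_HOPS]; /* next-hop RBridges of the tree      */
        uint8_t  nb_hops;
        uint32_t vni[MAX_VNIS];         /* VNIs supported by this RBridge     */
        uint8_t  nb_vnis;
    };

    struct lib_entry {                  /* LIB: local attachment              */
        uint8_t  mac[6];                /* local VM MAC                       */
        uint8_t  port;                  /* local port                         */
        uint32_t vni;                   /* tenant VNI (24 significant bits)   */
    };

    /* Hypothetical NIC Tx primitive, declared here only for the sketch. */
    void send_to_port(uint8_t port, const uint8_t *frame, size_t len);

    /* A copy is forwarded towards a next-hop RBridge only if that RBridge
     * supports the frame's VNI; otherwise this branch of the tree is pruned. */
    bool rbridge_supports_vni(const struct fib_entry *fe, uint32_t vni)
    {
        for (unsigned i = 0; i < fe->nb_vnis; i++)
            if (fe->vni[i] == vni)
                return true;
        return false;
    }

    /* Local delivery: the inner frame is already decapsulated; hand it to
     * every local port attached to the frame's VNI. */
    void deliver_local(const struct lib_entry *lib, unsigned nb_lib,
                       uint32_t vni, const uint8_t *inner, size_t len)
    {
        for (unsigned i = 0; i < nb_lib; i++)
            if (lib[i].vni == vni)
                send_to_port(lib[i].port, inner, len);
    }

In the dispatch scheme sketched earlier, each cluster would hold only the fib_entry records for the egress nicknames it owns, so these lookups stay local to that cluster's memory.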
Innovation and efficiency
• Solving SDN and network virtualization challenges requires new protocols
  – e.g. VXLAN, NVGRE, TRILL/VNT, …
• Efficiency generally means hardware support
  … but hardware development cannot keep up with software and slows down innovation
• Gandi and Kalray think a programmable data plane can reconcile efficiency and innovation
  … but we need open ecosystems, standards and APIs

Ahmed Amamou, [email protected]
Benoît Ganne, [email protected]
Thank you for your attention! Questions?