The Open Source IRATI Prototype: design
The Open Source IRATI Prototype: design, implementation and future plans
28th January 2015, Ghent
Francesco Salvestrini, Nextworks s.r.l.

Implementing RINA, previous prototypes …
• Pre-2013, only a few RINA prototypes had been implemented:
  – ProtoRINA (https://github.com/ProtoRINA/users/wiki)
  – Alba (closed source)
• (Design and implementation vary depending on the goals to accomplish)
• Pre-IRATI prototypes:
  1. Focused on the validation of the architecture
  2. Written in Java → residing in user space

… previous RINA prototypes …
• Focus on concepts, not performance
• Constrained by the limitations imposed by the OS:
  – e.g. they inherit the limitations of both the TCP/IP stack and the (POSIX) sockets API
[Figure: RINA reference architecture. Hosts and a router run application processes, management agents, normal IPC Processes and shim IPC Processes (shim DIF over TCP/UDP, shim DIF over Ethernet) on top of the legacy network stack and NICs. Each IPC Process comprises the IPC API, data transfer, data transfer control and layer management components.]

… what was needed next?
[Figure: IPC Process components ordered by increasing timescale (functions performed less often): data transfer, then data transfer control, then layer management.]
• Start thinking about performance
• Allow RINA to run on all the devices supported by today's OSes
• Move to a more mature prototype

Where did we start …
• We decided to implement (part of) the IPC Process functionalities in kernel space …
• Which ones? How do we split them? How do they communicate? How can we increase performance …
• … various possibilities …

What goes where?
• We placed SW components in different “paths”, depending on their timing requirements…
  – Data transfer → stringent timings → kernel space
  – Layer management → loose timings → user space
[Figure: the IPC Process components split across the user/kernel boundary: layer management in user space; data transfer, data transfer control and the shim IPC Processes in kernel space.]
• The data-transfer parts were going to reside in kernel space…

Layer management & OS processes
• We decided to keep the layer management functionalities of each IPC Process in a separate OS process, the IPC Process Daemon
  – 1 OS process ↔ 1 IPC Process Daemon instance
• That approach aims at:
  – A more “reliable” (SW) solution
    • IPC Processes can have problems without interfering with each other (too much)
  – Working closely with the OS
    • Let the OS do what it is for: manage the resources among its processes
• However, another entity was needed… the IPC Manager:
  – Manages the IPC Processes lifecycle
  – Broker between applications and IPC Processes
  – Local management agent
  – …
[Figure: IPC Process Daemons 1…N and the IPC Manager Daemon in user space, above the kernel.]

Inter-communications …
• OS processes request services from the kernel via syscalls
• Modern *NIX systems extend the user/kernel communication mechanisms
  – Netlink, uevent, devfs, procfs, sysfs, etc.
• We needed a “bus-like” mechanism
  – Syscalls: user originated (user → kernel), “unicast”
  – Netlink: user OR kernel originated, unicast/multicast/broadcast
• We adopted syscalls + Netlink
  – Syscalls (fast-path): bootstrapping the IPCP, then reading/writing SDUs
  – Netlink (mostly slow-path): management, configuration, notifications …
[Figure: applications, IPCP daemons 1…N (layer management) and the IPC Manager Daemon in user space; kernel IPCPs 1…N (data transfer) in the kernel; syscalls and Netlink (& syscalls) cross the user/kernel boundary.]

Avoid (major) problems & abstract comms
• Syscalls are “wrapped” by libc (glibc on GNU/Linux)
  – i.e. syscall(SYS_write, …) → write(…)
• Libraries are normally used to “hide” Netlink mechanisms (libnl family)
  – However, they (quite often) still expose Netlink details
• A change in the kernel/user API implies changes in user space
• All applications in the OS are linked to glibc
  – Changes to the syscalls → changes to glibc
• Breaking glibc could break the whole host
  – Sandboxed environments are necessary
  – Dependency invalidation → time-consuming compilations
  – That sort of change is really hard to get approved upstream
• … we introduced librina as the initial way to overcome these problems …
[Figure: before, Application → libc → kernel (RINA functions); after, Application → librina + libc → kernel.]

Librina (HL) SW architecture
• It started as the placeholder for the common functionalities shared among the IPC Process Daemon, the IPC Manager Daemon and applications …
• … and became (on purpose) an event-based, multi-threaded framework with bindings for interpreted languages (SWIG)
• API components: application, ipc-manager, ipc-process, sdu-protection, faux-sockets, cdap, common
  – application: allocate/deallocate flows; read/write SDUs to flows; register/unregister to 1+ DIF(s)
  – ipc-manager: IPC Process creation, deletion and configuration
  – ipc-process: configure the PDU Forwarding Table; create/delete EFCP instances; allocate resources to support a flow
• Event API: eventPoll(), eventWait(), eventPost()
• Core components: Event Queue, NetlinkManager, NetlinkSessions
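librina's NetlinkManager and NetlinkSessions sit on libnl / libnl_genl so that daemons and applications never handle raw Netlink bytes. As a rough illustration of what such a layer crafts and parses for a slow-path request, here is a minimal sketch in Python using only the standard struct module. The header layouts (nlmsghdr, genlmsghdr, nlattr) and the NLM_F_REQUEST flag follow the Linux Netlink UAPI headers; the family id, command number and attribute number are hypothetical placeholders, not IRATI's values (a real Generic Netlink family id is resolved at runtime).

```python
import struct

# Wire layouts from the Linux Netlink / Generic Netlink UAPI headers
# (native byte order, no implicit padding):
#   struct nlmsghdr   { __u32 len; __u16 type; __u16 flags; __u32 seq; __u32 pid; } -> 16 bytes
#   struct genlmsghdr { __u8 cmd; __u8 version; __u16 reserved; }                   ->  4 bytes
#   struct nlattr     { __u16 nla_len; __u16 nla_type; } + payload                  ->  4 bytes + data
NLMSG_HDR = struct.Struct("=IHHII")
GENL_HDR = struct.Struct("=BBH")
NLA_HDR = struct.Struct("=HH")
NLM_F_REQUEST = 0x1  # "this is a request" flag from <linux/netlink.h>

# Hypothetical identifiers, for illustration only (not IRATI's real values).
HYPOTHETICAL_RINA_FAMILY_ID = 0x1C
HYPOTHETICAL_CMD_ASSIGN_TO_DIF_REQ = 3
HYPOTHETICAL_ATTR_DIF_NAME = 1


def nla_align(length: int) -> int:
    """Round up to the 4-byte alignment used by Netlink attributes."""
    return (length + 3) & ~3


def pack_attr(attr_type: int, payload: bytes) -> bytes:
    """Type-length-value attribute, padded to a 4-byte boundary."""
    length = NLA_HDR.size + len(payload)  # nla_len does not count the padding
    return NLA_HDR.pack(length, attr_type) + payload + b"\0" * (nla_align(length) - length)


def craft_message(family_id: int, cmd: int, attrs: dict, seq: int, pid: int = 0) -> bytes:
    """Build nlmsghdr + genlmsghdr + attributes for a request."""
    body = GENL_HDR.pack(cmd, 1, 0) + b"".join(pack_attr(t, v) for t, v in attrs.items())
    header = NLMSG_HDR.pack(NLMSG_HDR.size + len(body), family_id, NLM_F_REQUEST, seq, pid)
    return header + body


def parse_message(msg: bytes):
    """Inverse of craft_message: returns (family_id, cmd, seq, {attr_type: payload})."""
    length, family_id, _flags, seq, _pid = NLMSG_HDR.unpack_from(msg, 0)
    cmd, _version, _reserved = GENL_HDR.unpack_from(msg, NLMSG_HDR.size)
    attrs, offset = {}, NLMSG_HDR.size + GENL_HDR.size
    while offset < length:
        nla_len, nla_type = NLA_HDR.unpack_from(msg, offset)
        attrs[nla_type] = msg[offset + NLA_HDR.size : offset + nla_len]
        offset += nla_align(nla_len)
    return family_id, cmd, seq, attrs


if __name__ == "__main__":
    request = craft_message(HYPOTHETICAL_RINA_FAMILY_ID, HYPOTHETICAL_CMD_ASSIGN_TO_DIF_REQ,
                            {HYPOTHETICAL_ATTR_DIF_NAME: b"normal.DIF\0"}, seq=7)
    print(len(request), parse_message(request))
```

A real request would then travel over an AF_NETLINK socket; in librina that step sits behind the NetlinkSessions and libnl's nl_send() / nl_recv().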
[Figure: librina's lower layers: NetlinkSessions and the RINA Manager use nl_send() / nl_recv() over libnl / libnl_genl (RINA Netlink); syscall wrappers use syscall(SYS_*) (RINA syscalls); both cross the user/kernel boundary.]

High level software architecture (1st take)
[Figure: user-space software layers: third-party SW packages (applications) in language X; SWIG HL wrappers (language X) and SWIG LL wrappers (C++, for language X); the ipcmd and ipcpd daemons (rinad, C++); librina (C++) with API (C), API (C++) and Core (C++) layers; libnl / libnl-genl; Netlink & syscalls down to Linux with RINA extensions.]

Details on the user space framework
[Figure: the IPC Manager Daemon (main logic, DIF allocator, local management agent, RIB & RIB Daemon), a normal IPC Process Daemon (layer management: PDU Forwarding Table generation, RIB & RIB Daemon, resource allocation, flow allocation, enrollment) and application A (application logic), each linked to librina and using Netlink sockets, system calls and sysfs to reach the kernel.]
• IPC Manager Daemon
  – Manages the IPC Processes lifecycle
  – Broker between applications and IPC Processes
  – Local management agent
  – DIF Allocator client (to search for applications not available through local DIFs)
• IPC Process Daemon
  – Layer Management components of the IPC Process (RIB Daemon, RIB, CDAP parsers/generators, CACEP, Enrollment, Flow Allocation, Resource Allocation, PDU Forwarding Table Generation, Security Management)
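The “1 OS process ↔ 1 IPC Process Daemon” choice, with the IPC Manager Daemon managing the daemons' lifecycle, can be sketched as follows. This is a minimal illustration in Python using the standard subprocess module, not the ipcmd implementation: the IpcManager class, its method names and the stand-in daemon command (a Python process that only sleeps, in place of a real IPC Process Daemon binary) are all hypothetical.

```python
import subprocess
import sys

# Hypothetical stand-in for the IPC Process Daemon executable: it just sleeps.
STAND_IN_DAEMON = [sys.executable, "-c", "import time; time.sleep(30)"]


class IpcManager:
    """Hypothetical sketch: one OS process per IPC Process Daemon instance."""

    def __init__(self, daemon_cmd):
        self.daemon_cmd = list(daemon_cmd)
        self.daemons = {}  # ipc-process-id -> subprocess.Popen

    def create_ipcp(self, ipcp_id: int, name: str) -> int:
        """Start a dedicated daemon process; the OS schedules and isolates it."""
        if ipcp_id in self.daemons:
            raise ValueError(f"IPC Process {ipcp_id} already exists")
        proc = subprocess.Popen(self.daemon_cmd + [str(ipcp_id), name])
        self.daemons[ipcp_id] = proc
        return proc.pid

    def destroy_ipcp(self, ipcp_id: int) -> None:
        """Ask the daemon to stop (SIGTERM on POSIX) and reap it."""
        proc = self.daemons.pop(ipcp_id)
        proc.terminate()
        proc.wait(timeout=5)

    def alive(self, ipcp_id: int) -> bool:
        proc = self.daemons.get(ipcp_id)
        return proc is not None and proc.poll() is None

    def reap_crashed(self) -> list:
        """Forget daemons that exited on their own; the others keep running."""
        crashed = [i for i, p in self.daemons.items() if p.poll() is not None]
        for ipcp_id in crashed:
            del self.daemons[ipcp_id]
        return crashed
```

If one daemon dies, only that IPC Process is lost: reap_crashed() reports it while the remaining daemons keep running, which is the kind of isolation the separate-process design aims at.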
IPC Manager Daemon
[Figure: IPC Manager Daemon (C++) internals. A Bootstrapper reads the configuration file; a main event loop blocks on EventProducer.eventWait(); a Command Line Interface Server Thread serves CLI sessions over a local TCP connection. They call operations on the IPC Manager core classes (IPC Process Manager, Flow Manager, Application Registration Manager), which call librina's IPC Process Factory, IPC Process or Application Manager (system calls and Netlink messages).]

IPC Process Daemon
[Figure: IPC Process Daemon (Java) internals. Layer management function classes (Enrollment Task, Flow Allocator, Resource Allocator, Forwarding Table Generator, Registration Manager) work with the RIB Daemon and the Resource Information Base (RIB), plus supporting classes (CDAP parser, Delimiter, Encoder). A CDAP message reader thread uses KernelIPCProcess.readMgmtSDU() and RIBDaemon.cdapMessageReceived(); outgoing messages use RIBDaemon.sendCDAPMessage() and KernelIPCProcess.writeMgmtSDU(). A main event loop blocks on EventProducer.eventWait() and calls the IPCManager or KernelIPCProcess classes of librina (C++).]
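Both daemons are organised around a main event loop that blocks on EventProducer.eventWait(), while helper threads (such as the CDAP message reader) feed it. A minimal sketch of that pattern in Python, with queue.Queue as a stand-in for librina's event queue; the class and method names (event_post, event_wait, event_poll), the event kinds and the handler table are hypothetical and only mirror the eventPost() / eventWait() / eventPoll() calls named earlier. The byte strings used as management SDUs are toy placeholders.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str                                    # hypothetical event kinds
    payload: dict = field(default_factory=dict)


class EventProducer:
    """Hypothetical stand-in for librina's thread-safe event queue."""

    def __init__(self):
        self._events = queue.Queue()

    def event_post(self, event: Event) -> None:  # cf. eventPost()
        self._events.put(event)

    def event_wait(self) -> Event:               # cf. eventWait(): blocks
        return self._events.get()

    def event_poll(self):                        # cf. eventPoll(): never blocks
        try:
            return self._events.get_nowait()
        except queue.Empty:
            return None


def cdap_reader(producer: EventProducer, mgmt_sdus) -> None:
    """Stand-in for the thread that reads management SDUs and posts them as events."""
    for sdu in mgmt_sdus:
        producer.event_post(Event("cdap_message", {"sdu": sdu}))
    producer.event_post(Event("shutdown"))


def main_event_loop(producer: EventProducer, handlers: dict) -> list:
    """Block on event_wait() and dispatch each event to its layer-management handler."""
    results = []
    while True:
        event = producer.event_wait()
        if event.kind == "shutdown":
            return results
        handler = handlers.get(event.kind)
        results.append(handler(event) if handler else ("unhandled", event.kind))


if __name__ == "__main__":
    producer = EventProducer()
    reader = threading.Thread(target=cdap_reader, args=(producer, [b"M_READ", b"M_WRITE"]))
    reader.start()
    print(main_event_loop(producer, {"cdap_message": lambda ev: ("rib-daemon", ev.payload["sdu"])}))
    reader.join()
```

Keeping a single consumer thread means the layer-management classes never race with each other; only the queue has to be thread-safe.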
High level software architecture (2nd take)
[Figure: the IPC Process split between a user-space framework (layer management: RIB, RIB Daemon, CDAP parser/generator, CACEP, authentication, enrollment, flow allocation, resource allocation, routing, namespace and security management) and a kernel-space framework (data transfer and data transfer control). User space hosts third-party SW packages, SWIG HL/LL wrappers, librina (API (C), API (C++), Core (C++)), libnl / libnl-genl, ipcmd and ipcpd (rinad). The kernel hosts RNL and the KIPCM (personality mux/demux, KIPCM core, KFA, IPCP factories), reached through Netlink and syscalls, plus the normal IPC Process (EFCP, RMT, PFT), shim-eth-vlan with RINA-ARP, and shim-dummy.]

User/kernel interface: KIPCM + RNL
• Interface = syscalls + Netlink messages
• Kernel IPC Manager (KIPCM):
  – Manages the syscalls
    • Syscalls: a small, well-defined set of calls (8):
      – IPCs: ipc_create and ipc_destroy
      – Flows: allocate_port and deallocate_port
      – SDUs: sdu_read, sdu_write, mgmt_sdu_read and mgmt_sdu_write
• RINA Netlink Layer (RNL):
  – Manages the Netlink part
    • Abstracts message reception, sending, parsing & crafting
    • Netlink: 36 message types (with dynamic attributes):
      – assign_to_dif_req, assign_to_dif_resp, dif_reg_notif, dif_unreg_notif…
• Partitioning:
  – Syscalls → KIPCM → “fast-path” (read and write SDUs)
  – Netlink → RNL → “slow-path” (configuration and management)

From recursion to iteration: KIPCM & KFA
• The Kernel Flow Allocator (KFA)
  – Manages ports and flows
  – Ports
    • Flow handler
    • Port ID Manager
  – Flows
    • maps: port-id → ipc-process-instance
• The KIPCM:
  – Manages the lifecycle of the IPC Processes
  – Abstracts IPC Process instances
    • Same API for all the IPC Processes regardless of their type
    • maps: ipc-process-id → ipc-process-instance
• Recursion in kernel space considered harmful
• They are the point where “recursion” is transformed into “iteration”
[Figure: syscalls and Netlink from user space enter the KIPCM and KFA, which drive normal IPCPs (EFCP, RMT, PFT; IN and OUT paths) and shim IPCPs through a common IPCP interface.]

Recursion and IPC Processes i/f
• The architecture describes
  – the (Normal) IPC Processes
  – the Shim IPC Processes
• W.r.t. “DIF stacking”
  – Normal IPC Processes
    • Have “compatible” NB/SB interfaces
    • Have “full-fledged” functionalities
  – Shim IPC Processes:
    • Have a “compatible” NB interface
    • Wrap the technology they are laid over
      – Minimum veneer over legacy technologies!
    • Don't have an “SB” interface
[Figure: DIF stacking: (Normal) IPC Process 2 over (Normal) IPC Process 1 over Shim IPC Process 0, over the hardware.]

Normal & Shim IPC Processes
• The stack provides the implementation of the “normal” IPC Process
  – DTP, DTCP, RMT, PDU Forwarding Table functionalities
• There are currently 4 shims implemented:
  – shim-dummy:
    • Confined to a single host (“loopback”)
    • Used for debugging & testing the stack
  – shim-eth-vlan:
    • Runs over 802.1Q
    • Uses our own ARP implementation
    • Offers 1 unreliable QoS cube
    • VLAN-id = DIF name
  – shim-tcp-udp:
    • Allows RINA to run “over” TCP/UDP
    • Offers 2 QoS cubes:
      – Reliable: mapped over a TCP socket (a different socket for each flow)
      – Unreliable: mapped over a UDP socket (1 socket for all the flows)

Shim IPC Processes (cont.)
• shim-hv:
  – Allows the stack to run in virtualised environments
    • QEMU/KVM and Xen
  – Works only with “shared memory” buffers (VMPI/VirtualQueues)
  – Offers 1 QoS cube
  – This shim is enough to allow RINA to take advantage of HV environments
    • Get rid of software bridges and the TCP/IP stack!
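The way the KIPCM and KFA turn DIF stacking into iteration can be sketched as a small Python model (not the kernel code). The class names, the toy bracket “encapsulation” and the rule that each normal IPC Process hands back a single N-1 port are hypothetical simplifications; method names echo the syscalls listed above for readability only. Each IPC Process returns the next port-id to the KFA instead of calling the IPC Process below it, so writing an SDU through a stack of any depth is one loop with no nested calls.

```python
from dataclasses import dataclass, field


@dataclass
class NormalIpcp:
    """Hypothetical normal IPC Process: encapsulates the SDU and names its N-1 port."""
    ipcp_id: int
    n_minus_1_port: int

    def write(self, port_id: int, sdu: bytes):
        pdu = f"[ipcp{self.ipcp_id}|".encode() + sdu + b"]"  # toy EFCP/RMT processing
        return self.n_minus_1_port, pdu  # hand control back to the KFA, no recursion


@dataclass
class ShimIpcp:
    """Hypothetical shim IPC Process: no southbound interface, it 'transmits'."""
    ipcp_id: int
    wire: list = field(default_factory=list)

    def write(self, port_id: int, sdu: bytes):
        self.wire.append(sdu)  # stands in for the legacy technology (e.g. Ethernet)
        return None, sdu


class Kipcm:
    """Lifecycle of IPC Processes; maps ipc-process-id -> ipc-process-instance."""

    def __init__(self):
        self.ipcps = {}

    def ipc_create(self, ipcp):
        self.ipcps[ipcp.ipcp_id] = ipcp
        return ipcp

    def ipc_destroy(self, ipcp_id: int) -> None:
        del self.ipcps[ipcp_id]


class Kfa:
    """Ports and flows; maps port-id -> ipc-process-instance."""

    def __init__(self):
        self.flows = {}
        self._next_port_id = 1  # trivial port ID manager

    def allocate_port(self, ipcp) -> int:
        port_id = self._next_port_id
        self._next_port_id += 1
        self.flows[port_id] = ipcp
        return port_id

    def deallocate_port(self, port_id: int) -> None:
        del self.flows[port_id]

    def sdu_write(self, port_id: int, sdu: bytes):
        """Walk down the DIF stack iteratively; returns (hops, frame on the wire)."""
        hops = 0
        while port_id is not None:
            port_id, sdu = self.flows[port_id].write(port_id, sdu)
            hops += 1
        return hops, sdu
```

Adding DIF levels only adds iterations of the while loop, so the kernel stack depth stays constant regardless of how many IPC Processes are stacked.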
The Open Source initiative
• After almost 2 years of continuous development, the code base was made available as Open Source material on GitHub:
  – http://github.com/IRATI
• It provides the implementation of the following (major) functional blocks:
  – IPC Manager daemon
    • Manages the IPC Processes lifecycle
    • Broker between applications and IPC Processes
    • DIF allocator client (to search for applications not available through local DIFs)
  – IPC Process daemon
    • Transport and management layers
    • Has routing functionalities (link-state based routing)
    • Provides unreliable and reliable flow functionalities
  – A set of shims:
    • shim-eth-vlan
    • shim-hv (KVM/QEMU & Xen flavours)
    • shim-tcp-udp
    • shim-dummy (testing)
  – A library for building native-RINA applications
  – A testing/debugging framework
    • Regression tests (run at build time, installation time …)
    • A testing application: rina-echo-time

Ongoing works …
• The stack:
  – Implements the core functionalities of the RINA architecture
  – Its policies are hardwired ...
• ... we needed to enable the customization capabilities provided by the architecture
• Leverage the stack, maturing a RINA SDK
  – Define the API for each SW component having a policy
  – Allow extension modules to be plugged in and out of the prototype
  – Allow it to dynamically load & accept changes to its behaviours at runtime

Pluggin' policies, places
[Figure: policy hooks across the IPC Process components, in both user and kernel space: RcvrInactivityTimer, SndrInactivityTimer, InitialSequenceNumber, TransmissionControl, RTTEstimation, SenderACK, MaxQ, RMTQMonitor, RMTScheduling, Checksum, Compression, Encryption, TTL, MonitorNMinus1Flow, NMinus1FlowDown, NewFlowRequest, AllocateRetry, Authentication, RoutingAlgorithm, NewMemberAccessControl, NewFlowAccessControl, RIBAccessControl.]
• ... policies are in both spaces ...

RINA Plugins Infrastructure
• The RINA Plugins Infrastructure (RPI)
  • Plugin = policy code + framework
  • The “framework” is ... all the functionalities required to use custom policies in the stack:
    – Workflow: load, plug, select, unselect, unplug and unload
• Since the stack is split in two halves ...
  – RPI must comply with the characteristics of both kernel space and user space ...
• ... RPI must be split in two as well:
  – Kernel RPI (kRPI) → leverages LKMs (loadable kernel modules)
  – User RPI (uRPI) → leverages SOs (shared objects)
• Policy set = a set of policies (in the same SW component) that can share state
  – This way, different policies in the same component can share state in a plugin-specific way

Components addressing
• Address of an IPC Process component in a processing system:
  • IPC Process ID (uint)
  • Path in the IPC Process component tree
• Example:
  • The custom passwd policy set for the Security Manager is addressed by
    • Security-manager.passwd

... and routing (between spaces) ...
• Commands (e.g. select a behaviour or set a value) are turned into Netlink request messages
• Requests are routed to user space or kernel space
  – depending on the addressed component
• Response messages are received back

… Next steps
• Improvements and new functionalities:
  – Short term:
    • Export a subset of the policies
  – Medium term:
    • Consolidate the RPI framework & export a larger set of policies
    • Librina
      – Bindings for interpreted languages (Java, Python)
      – Subsetting: librina-rib, librina-application, …
    • A RINA traffic generator
    • A multi-node configuration building tool
  – Medium/long term:
    • Implement a Management Agent
    • Minimise user-/kernel-space differences w.r.t. writing policies
• (In parallel)
  – Keep the implementation in sync with the specs
  – Hardening, cleanup, increase performance, reduce memory consumption, …

Have a look && join us!
http://irati.github.io/stack
https://github.com/IRATI
http://www.freelists.org/list/irati
Thanks!
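The component addressing and the routing of commands between spaces, described under “Components addressing” and “... and routing (between spaces) ...”, can be sketched in Python as below. The address is an IPC Process ID plus a dotted path in the component tree, as in the Security-manager.passwd example; the set of kernel-resident components, the request layout and the handler table are hypothetical, and plain dictionaries stand in for the Netlink request and response messages.

```python
from dataclasses import dataclass

# Hypothetical: top-level components that live in kernel space.
KERNEL_COMPONENTS = frozenset({"efcp", "rmt", "sdu-protection"})


@dataclass(frozen=True)
class ComponentAddress:
    ipcp_id: int   # IPC Process ID (unsigned)
    path: tuple    # path in the IPC Process component tree

    @classmethod
    def parse(cls, ipcp_id: int, dotted_path: str) -> "ComponentAddress":
        if ipcp_id < 0:
            raise ValueError("IPC Process ID must be unsigned")
        parts = tuple(dotted_path.lower().split("."))
        if not all(parts):
            raise ValueError(f"malformed component path: {dotted_path!r}")
        return cls(ipcp_id, parts)

    @property
    def space(self) -> str:
        """Which half of the stack owns the addressed component."""
        return "kernel" if self.path[0] in KERNEL_COMPONENTS else "user"


def route_command(addr: ComponentAddress, command: str, value=None):
    """Turn a command into a request (a dict standing in for a Netlink message) and pick its destination."""
    request = {"ipcp_id": addr.ipcp_id, "path": ".".join(addr.path),
               "command": command, "value": value}
    return addr.space, request


def dispatch(addr: ComponentAddress, command: str, value, handlers: dict):
    """Deliver the request to the handler for the owning space and return its response."""
    space, request = route_command(addr, command, value)
    return handlers[space](request)
```

Routing on the first path element keeps the decision local to the address itself, so a new policy set only needs a path, not a new message type.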