Update on Lustre and OpenSFS
Transcription
Slide 1: Update on Lustre and OpenSFS
John Carrier, Cray, Inc.
3/27/2012

Slide 2: Outline
• Why is there a Lustre talk at an IB conference?
• How is OpenSFS ensuring Lustre's future?

Slide 3: Lustre…
• …is a highly scalable, parallel file system
  – O(10,000) client nodes
  – supports both file-per-process (N-N) and shared-file (N-1) access
• …runs on the world's fastest supercomputers
  – 64 of the Top 100, 40 of the Top 50, 8 of the Top 10
• …is extremely fast
  – Two systems today exceed 240 GB/s
  – Three file systems will attempt 1 TB/s this year!
• …is a major consumer of InfiniBand

Slide 4: Lustre Overview
[Diagram: a Metadata Server (MDS) with its Metadata Target (MDT), a Management Server (MGS) with its Management Target (MGT), and Object Storage Servers (OSS) hosting Object Storage Targets (OST), reachable by clients over Ethernet, InfiniBand, SeaStar, or Gemini RDMA networks; deployments scale to >80K clients and >8K OSTs.]

Slide 5: Shared Site-wide Lustre
[Diagram: MDS/MGS pairs and multiple OSS/OST groups on a shared InfiniBand fabric; a single file system shares data with multiple compute resources.]

Slide 6: Lustre Networking
• Lustre remote procedure calls use the Lustre Network (LNET) message-passing API for client-server communication
• Lustre Networking Drivers (LNDs) transport LNET messages across multiple network types
• LNDs make Lustre agnostic to the underlying transport
• LNET and LNDs can exploit RDMA for efficient, low-CPU transfers
• Routing is possible at the LNET layer on nodes with two or more network interfaces
[Diagram: layered stack in which LNET users call the LNET API into the generic LNET layer, which calls the LND API into per-network LNDs and their network-specific APIs.]

Slide 7: Lustre Router Software Stack
[Diagram: on a Cray XE6/XK6 compute node, the application passes through the Linux VFS, Lustre client, LNET, and the Gemini LND/GNI driver to the Gemini interconnect; an I/O node runs a Lustre router in which LNET bridges the Gemini LND/GNI driver and the OFED LND/OFED RDMA/IB driver; a Lustre MDS or OSS runs the Lustre server over LNET, the OFED LND, OFED RDMA, and the IB driver to its IB HCA. The Cray HSN and an InfiniBand top-of-rack (TOR) switch connect the three.]

Slide 8: LNET Fine Grain Routing
• Lustre networking can create logical subnets within an IB fabric
• Define multiple LNETs to isolate I/O to specific physical paths through the fabric
• LNET fine-grain routing groups routers and OSSes connected through the same leaf switch
• Without link aggregation, failover and load balancing happen at the router, not the HCA
[Diagram: two HSN LNETs, GNI0.XIO1-4.IB_RED.OSS1-3 and GNI0.XIO5-8.IB_BLU.OSS4-6; XIO blades with I/O nodes 1-4 and 5-8 connect over QDR IB to the IB LNETs IB_RED.OSS1-3 (OSS1-OSS3) and IB_BLU.OSS4-6 (OSS4-OSS6).]

Slide 9: http://www.opensfs.org/ and http://lists.opensfs.org/

Slide 10: Lustre Community Transition
• In 2010, the Lustre community moved to support Lustre on its own
  – Two companies, Whamcloud and Xyratex, hire Lustre developers and offer level-three support to organizations
  – Three community groups, OpenSFS, HPCFS, and EOFS, form to offer venues for collaboration and support for end users
• In 2011, the community consolidated
  – The three community groups quickly aligned their efforts to create a unified, global effort to support Lustre
  – OpenSFS gathered community requirements, solicited proposals, then funded a two-year development contract to improve Lustre metadata performance and scalability
  – Whamcloud created Lustre 2.1 as the first community release, and OpenSFS agreed to fund future releases
• In 2012, the community grows…

Slide 11: What is OpenSFS?
• OpenSFS is a vendor-neutral, member-supported nonprofit organization bringing together the open-source file system community for the high-performance computing sector
• Our mission is to aggregate community resources and be the center of collaborative activities to ensure efficient coordination of technology advancement, development, and education
• The end goal is the continued evolution of robust open-source file systems for the HPC community

Slide 12: Who is OpenSFS?
• Promoters
• Adopters
• Supporters

Slide 13: OpenSFS Working Groups
• Technical Working Group
  – Gather requirements from the community
  – Propose and manage development projects
  – Generate the Lustre feature roadmap
• Community Development Working Group
  – Manage Lustre releases
  – Coordinate the release roadmap for new features
• Benchmarking Working Group
  – Investigate HPC user workloads
  – Define tests to evaluate file system scalability
• WAN Working Group
  – Coordinate use cases and features for wide-area Lustre

Slide 14: OpenSFS Development Projects
• Feature: Single Server Metadata Performance Improvements
  – Purpose: scale-up strategy to remove MDS processing bottlenecks
  – Projects: SMP Node Affinity; Parallel Directory Operations
• Feature: Distributed Namespace
  – Purpose: scale-out strategy to enable multiple MDSes per file system
  – Projects: Remote Directories; Striped Directories
• Feature: Lustre File System Checker
  – Purpose: monitor, validate, and repair file system state on-line
  – Projects: Inode Iterator & OI Scrub; MDT-OST Consistency; MDT-MDT Consistency
• Feature: WAN Authentication
  – Purpose: improve support for Lustre as a wide-area file system
  – Projects: UID Mapping; GSSAPI Extensions

Slide 15: Lustre Community Roadmap
Source: http://wiki.whamcloud.com/display/PUB/Community+Lustre+Roadmap

Slide 16: Upcoming Activities
• Lustre Users Group (LUG)
  – Annual meeting is April 23-25, 2012, in Austin, TX
  – See http://www.opensfs.org/lug for more info
• OpenSFS Working Groups
  – Community Release WG
    • Whamcloud & OpenSFS to release Lustre 2.2 this spring
    • Seeking contributions for future releases
  – Technical WG
    • Gathering requirements for new development contracts
  – Benchmarking WG
    • Defining workloads and investigating tests
  – WG face-to-face meetings Sunday 4/22 and Wednesday 4/25 in Austin

Slide 17: Summary
• Lustre uses RDMA natively and drives deployment of large-scale IB fabrics
• OpenSFS members are funding Lustre development and future releases
• If your organization uses Lustre, contribute to the next release by joining OpenSFS
http://www.opensfs.org/
http://lists.opensfs.org/

Slide 18: Thank You
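Appendix (transcriber's note): the LNET fine-grain routing scheme described on slide 8 is usually expressed as LNET module options. The fragment below is a minimal sketch only, not taken from the deck; all NIDs, interface names, hop counts, and logical network numbers are hypothetical, with o2ib1 standing in for IB_RED and o2ib2 for IB_BLU.

```
# Hypothetical /etc/modprobe.d/lustre.conf fragments illustrating two
# logical IB subnets carved out of one fabric, as on slide 8.

# OSS1-3 ("red" group): IB interface ib0 on logical net o2ib1, reached
# from the HSN only through router blades XIO1-4 (gni NIDs 1-4).
options lnet networks="o2ib1(ib0)" routes="gni0 1 10.10.1.[1-4]@o2ib1"

# OSS4-6 ("blue" group): same interface type, different logical net,
# served by router blades XIO5-8 (gni NIDs 5-8).
options lnet networks="o2ib2(ib0)" routes="gni0 1 10.10.2.[5-8]@o2ib2"

# Router nodes XIO1-4: one foot on the Cray HSN, one on the red IB net,
# with LNET forwarding enabled so they relay messages between the two.
options lnet networks="gni0,o2ib1(ib0)" forwarding="enabled"

# Compute clients on the HSN: each IB net is reachable only through its
# own router group, which pins I/O to specific physical paths.
options lnet networks="gni0" routes="o2ib1 1 [1-4]@gni0; o2ib2 1 [5-8]@gni0"
```

Because each OSS group and its routers share a leaf switch, this split keeps red and blue traffic off each other's links; as the slide notes, failover and load balancing across the group then happen at the router level rather than per HCA.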