Update on Lustre and OpenSFS

Transcription

John Carrier
Cray, Inc.
3/27/2012
Outline
• Why is there a Lustre talk at an IB conference?
• How is OpenSFS ensuring Lustre’s future?
Lustre…
• …is a highly scalable, parallel file system
– O(10,000) client nodes
– supports both file-per-process (N-N) and shared-file (N-1) access
• …runs on the world’s fastest supercomputers
– 64 of the Top 100, 40 of the Top 50, 8 of the Top 10
• …is extremely fast
– Two systems today exceed 240 GB/s
– Three file systems will attempt 1 TB/s this year!
• …is a major consumer of InfiniBand
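As a brief sketch of how the two access patterns above are typically set up from a Lustre client with the `lfs setstripe` command (the mount point and paths here are hypothetical, not from the talk):

```shell
# Shared-file (N-1): stripe one file across all available OSTs so that
# many processes can write disjoint regions of the same file in parallel
# (-c -1 means "use every OST").
lfs setstripe -c -1 /mnt/lustre/shared_checkpoint

# File-per-process (N-N): one stripe per file usually suffices, since
# the parallelism comes from each process writing its own file.
lfs setstripe -c 1 /mnt/lustre/per_rank_output/
```

A stripe count set on a directory is inherited by files created inside it, so the N-N case is usually configured once on the output directory.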
Lustre Overview
[Diagram: Lustre architecture. Metadata Servers (MDS) with a Metadata Target (MDT) and Management Servers (MGS) with a Management Target (MGT) handle namespace and configuration; Object Storage Servers (OSS) each serve Object Storage Targets (OST) holding file data; clients connect over Ethernet, InfiniBand, SeaStar, or Gemini using RDMA. Systems scale to >80K clients and >8K OSTs.]
Shared Site-wide Lustre
[Diagram: site-wide Lustre deployment. MDS/MGS pairs (with MDT and MGT) and multiple OSS nodes with their OSTs all attach to a shared InfiniBand fabric.]
A single file system shares data with multiple compute resources.
Lustre Networking
• Lustre remote procedure calls use the Lustre Networking (LNET) message-passing API for client-server communication
• Lustre Network Drivers (LNDs) transport LNET messages across multiple network types
• LNDs make Lustre agnostic to the underlying transport
• LNET and LNDs can exploit RDMA for efficient, low-CPU transfers
• Routing is possible at the LNET layer on nodes with two or more network interfaces
[Diagram: LNET software stack. LNET users call the LNET API; generic LNET sits above the LND API; multiple LNDs map onto network-specific APIs.]
Lustre Router Software Stack
[Diagram: software stacks on a Cray XE6/XK6 system. A compute node runs the application over the Linux VFS, Lustre client, LNET, Gemini LND, and GNI driver onto the Cray HSN. An IO node runs the Lustre router, with LNET bridging a Gemini LND/GNI driver on the HSN side and an OFED LND, OFED RDMA, and IB driver on the InfiniBand side. The Lustre MDS or OSS runs the Lustre server over LNET, the OFED LND, OFED RDMA, and the IB driver, reaching the IO node through a top-of-rack (TOR) IB switch. Key distinguishes Cray components from Lustre IB/OFED components.]
LNET Fine Grain Routing
• Lustre networking can create logical subnets within an IB fabric
• Define multiple LNETs to isolate I/O to specific physical paths through the fabric
• LNET fine-grain routing groups routers and OSSes connected through the same leaf switch
• Without link aggregation, failover and load balancing happen at the router, not at the HCA
[Diagram: fine-grain routing example. HSN LNET GNI0 routes through XIO blade IO nodes 1-4 to IB LNET IB_RED, serving OSS1-3, and through XIO blade IO nodes 5-8 to IB LNET IB_BLU, serving OSS4-6; each path uses QDR InfiniBand links.]
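This kind of split is normally expressed through LNET module options. A minimal sketch, assuming o2ib0/o2ib1 stand in for the IB_RED/IB_BLU LNETs; all network numbers, node IDs, and interface names below are invented for illustration:

```shell
# /etc/modprobe.d/lustre.conf fragments, one per node role.

# LNET routers 1-4 (RED path): bridge the HSN LNET to the RED IB LNET
options lnet networks="gni0,o2ib0(ib0)" forwarding="enabled"

# OSS1-3: attach only to the RED IB LNET
options lnet networks="o2ib0(ib0)"

# Compute clients: reach each IB LNET only through that LNET's routers,
# so RED traffic and BLU traffic take disjoint physical paths
options lnet networks="gni0" \
        routes="o2ib0 1@gni0 2@gni0 3@gni0 4@gni0; \
                o2ib1 5@gni0 6@gni0 7@gni0 8@gni0"
```

Because each client lists separate gateway sets for o2ib0 and o2ib1, I/O bound for OSS1-3 can only traverse routers 1-4, which is the isolation the slide describes.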
http://www.opensfs.org/
http://lists.opensfs.org/
Lustre Community Transition
• In 2010, the Lustre community moved to support Lustre on its own
– Two companies, Whamcloud and Xyratex, hire Lustre developers and offer level-three support to organizations
– Three community groups, OpenSFS, HPCFS, and EOFS, form to offer venues for collaboration and support for end users
• In 2011, the community consolidated
– The three community groups quickly aligned their efforts to create a unified, global effort to support Lustre
– OpenSFS gathered community requirements, solicited proposals, then funded a two-year development contract to improve Lustre metadata performance and scalability
– Whamcloud created Lustre 2.1 as the first community release, and OpenSFS agreed to fund future releases
• In 2012, the community grows…
What is OpenSFS?
• OpenSFS is a vendor-neutral, member-supported nonprofit organization bringing together the open-source file system community for the high-performance computing sector
• Our mission is to aggregate community resources and be the center of collaborative activities to ensure efficient coordination of technology advancement, development, and education

The end goal is the continued evolution of robust open-source file systems for the HPC community.
Who is OpenSFS?
• Promoters
• Adopters
• Supporters
OpenSFS Working Groups
• Technical Working Group
– Gather requirements from community
– Propose and manage development projects
– Generate Lustre feature roadmap
• Community Development Working Group
– Manage Lustre releases
– Coordinate release roadmap for new features
• Benchmarking Working Group
– Investigate HPC user workloads
– Define tests to evaluate file system scalability
• WAN Working Group
– Coordinate use cases and features for wide-area Lustre
OpenSFS Development Projects

Feature: Single Server Metadata Performance Improvements
Purpose: scale-up strategy to remove MDS processing bottlenecks
Projects: SMP Node Affinity; Parallel Directory Operations

Feature: Distributed Namespace
Purpose: scale-out strategy to enable multiple MDSes per file system
Projects: Remote Directories; Striped Directories

Feature: Lustre File System Checker
Purpose: monitor, validate, and repair file system state on-line
Projects: Inode Iterator & OI Scrub; MDT-OST Consistency; MDT-MDT Consistency

Feature: WAN Authentication
Purpose: improve support for Lustre as a wide-area file system
Projects: UID Mapping; GSSAPI Extensions
Lustre Community Roadmap
Source: http://wiki.whamcloud.com/display/PUB/Community+Lustre+Roadmap
Upcoming Activities
• Lustre Users Group (LUG)
– Annual meeting is April 23-25, 2012 in Austin, TX
– See http://www.opensfs.org/lug for more info
• OpenSFS Working Groups
– Community Release WG
• Whamcloud & OpenSFS to release Lustre 2.2 this spring
• Seeking contributions for future releases
– Technical WG
• Gathering requirements for new development contracts
– Benchmarking WG
• Defining workloads, investigating tests
– Working group face-to-face (F2F) meetings Sun 4/22 & Wed 4/25 in Austin
Summary
• Lustre uses RDMA natively and drives
deployment of large scale IB fabrics
• OpenSFS members are funding Lustre
development and future releases
• If your organization uses Lustre, contribute to
the next release by joining OpenSFS
http://www.opensfs.org/
http://lists.opensfs.org/
Thank You