Gfarm Grid File System
Osamu Tatebe, University of Tsukuba
First NEGST Workshop, June 23-24, 2006, Paris

Transcription
Petascale Data Intensive Computing
- High Energy Physics
  - CERN LHC, KEK-B Belle
  - ~MB/collision, 100 collisions/sec, ~PB/year
  - 2,000 physicists, 35 countries
  - [Images: detector for the LHCb experiment; detector for the ALICE experiment]
- Astronomical Data Analysis
  - analysis of the whole body of data
  - TB~PB/year/telescope
  - Subaru telescope: 10 GB/night, 3 TB/year

Petascale Data Intensive Computing: Requirements
- Storage capacity: peta/exabyte-scale files, millions of millions of files
- Computing power: > 1 TFLOPS, hopefully > 10 TFLOPS
- I/O bandwidth: > 100 GB/s, hopefully > 1 TB/s, within a system and between systems
- Global sharing: group-oriented authentication and access control

Does Grid computing help?
Yes, of course! Federation of resources increases total storage capacity and computing power. But the reality is:
- federation does not immediately mean a big high-performance computer
- many, many small resources, not tightly coupled, at distant locations
- sharing large-scale data at a 100 GB/s rate is not possible
- only a limited class of applications can be supported
Improved network bandwidth helps, but:
- Grid middleware needs to mature
- applications must be modified to fully utilize the resources
- Grid middleware should be a distributed OS that hides the details

Recap - What we really want
- A *high-performance* computer: > 10 TFLOPS (scalable computing power)
- A *high-performance* file system: > 1 PB (scalable capacity), > 1 TB/s (scalable I/O bandwidth)
Can we realize this in a Grid? The point is access locality in space: how to couple distributed storage with distributed computing is the key.

Gfarm Grid File System (1)
- Virtual file system that federates the local storage of commodity PCs (cluster nodes or Grid nodes)
- Enables transparent access, through a global namespace, to file data dispersed across a Grid
- Supports fault tolerance and avoids access concentration by automatic replica selection
- Can be mounted by all cluster nodes and clients
[Figure: global namespace /gfarm (directories ggf/aist and jp/gtrc, files file1-file4) mapped onto the Gfarm file system, with file replica creation]

Gfarm Grid File System (2)
- Files can be shared among all nodes and clients
- Physically, a file may be replicated and stored on any file system node
- Applications can access a file regardless of its location
- File system nodes can be distributed
[Figure: a Gfarm metadata server and compute & file system nodes in Japan and the US holding replicas of Files A, B, and C; client PCs and note PCs access the Gfarm file system directly or through GridFTP, samba, and NFS servers]

Scalable I/O performance in a distributed environment
User's view vs. the physical execution view in Gfarm (file-affinity scheduling):
- User A submits Job A, which accesses File A; Job A is executed on a node that holds File A
- User B submits Job B, which accesses File B; Job B is executed on a node that holds File B
[Figure: compute nodes with CPUs and local disks on a cluster/Grid network, forming the Gfarm file system as a shared network file system]
File system nodes = compute nodes:
- do not separate storage and CPU (no SAN necessary)
- move and execute the program instead of moving large-scale data
- exploiting local I/O is the key to scalable I/O performance (see the sketch below)
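To make the file-affinity idea concrete, here is a minimal scheduling sketch in C. This is not Gfarm's actual scheduler or API: the replica catalog and the names (catalog, schedule_job) are hypothetical stand-ins for the lookup that Gfarm performs against its metadata server.

```c
/* file_affinity.c - minimal sketch of file-affinity scheduling.
 * Hypothetical stand-in for Gfarm's scheduler: the catalog below
 * plays the role of the Gfarm metadata server's replica records. */
#include <stdio.h>
#include <string.h>

struct replica { const char *file; const char *node; };

/* Toy replica catalog: which node holds a local copy of which file. */
static const struct replica catalog[] = {
    { "/gfarm/ggf/file1",     "node03" },
    { "/gfarm/jp/gtrc/file2", "node17" },
};

/* Pick the node that already holds the job's input file, so the job
 * reads from local disk instead of pulling data over the network. */
static const char *schedule_job(const char *input_file)
{
    for (size_t i = 0; i < sizeof(catalog) / sizeof(catalog[0]); i++)
        if (strcmp(catalog[i].file, input_file) == 0)
            return catalog[i].node;
    return "any-node";  /* no local replica: fall back, or replicate first */
}

int main(void)
{
    const char *f = "/gfarm/ggf/file1";
    printf("Job reading %s -> run on %s\n", f, schedule_job(f));
    return 0;
}
```

The sketch only illustrates the policy of moving the job to the data; in Gfarm this placement decision is made automatically by file-affinity scheduling.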
Example - Gaussian 03 on Gfarm
- Ab initio quantum chemistry package
- Install once and run everywhere; no modification is required to access Gfarm
- Test 415 (an I/O-intensive test input) on one compute node: 1 h 54 min 33 s (NFS) vs. 1 h 0 min 51 s (Gfarm)
- Parallel analysis of all 666 test inputs using 47 compute nodes: NFS aborts with a write error due to the heavy I/O load; Gfarm finishes in 16 h 6 min 58 s
- Quite good scalability of I/O performance; the longest test input (test 574) takes 15 h 42 min 10 s
- Gfarm here consists of the local disks of the compute nodes

Software stack (from applications down to the file system)
- Applications: Gfarm applications, UNIX applications, Windows applications (Word, PowerPoint, Excel, etc.), UNIX commands, Gfarm commands
- Protocol clients and servers: NFS client / nfsd, Windows client / samba, FTP client / ftpd, SFTP/SCP client / sshd, Web browser or client / httpd
- Access layer: syscall hook and GfarmFS-FUSE (which mounts the file system in user space); libgfarm, the Gfarm I/O library
- Gfarm Grid File System

Features of the Gfarm Grid File System
- Commodity-PC-based scalable architecture
  - add commodity PCs to increase storage capacity during operation, to well beyond petabyte scale
  - even PCs at distant locations over the Internet
- Adaptive replica creation and consistent management
  - create multiple file replicas to increase performance and reliability
  - create file replicas at distant locations for disaster recovery
- Open source software
  - Linux binary packages, ports for *BSD, ...
  - included in NAREGI, KNOPPIX HTC edition, and the Rocks cluster distribution
- Existing applications can access it without any modification (network mount, Samba, HTTP, FTP, ...); a minimal POSIX sketch follows the summary below
- http://datafarm.apgrid.org/ (SC03 Bandwidth Challenge, SC05 StorCloud)

Toward an instant Grid file system
- Private IP issue: currently, each file system node needs to have a public IP address
- Evaluation of data?
- Probably there are some other issues as well

Ongoing collaboration
- JuxMem group, Gabriel Antoniu (IRISA/INRIA)
- JuxMem provides shared object access using JXTA: juxmem_malloc, juxmem_mmap, juxmem_acquire, juxmem_acquire_read, juxmem_release
- Brainstorming

Summary
- The Grid potentially has the capability to enable petascale data-intensive computing, but Grid middleware needs to mature
- Gfarm Grid File System
  - a global virtual file system
  - a dependable network shared file system in a cluster or a Grid
  - high-performance data computing support
  - associates the Computational Grid with the Data Grid
- Collaboration with the JuxMem group
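As promised above, here is a minimal sketch of how an existing application reaches Gfarm with no modification. It assumes GfarmFS-FUSE (or the syscall hook) has already mounted the Gfarm namespace under /gfarm; the file path is illustrative, and the program uses only ordinary POSIX calls, nothing Gfarm-specific.

```c
/* read_gfarm.c - an unmodified POSIX program reading a Gfarm file.
 * Assumes GfarmFS-FUSE has mounted the global namespace at /gfarm;
 * the path below is illustrative. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    ssize_t n;

    /* Ordinary open(2)/read(2): the user-space file system layer
     * translates these into Gfarm I/O, consulting the metadata
     * server and a file system node that holds a replica. */
    int fd = open("/gfarm/ggf/aist/file1", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    close(fd);
    return 0;
}
```

Because path resolution and replica selection happen below the POSIX interface, the same unmodified binary works whether the replica sits on the local disk or on a node across the Grid.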