CISL Operations and Services Update
CISL Update: Operations and Services
CISL HPC Advisory Panel (CHAP) Meeting, 29 October 2009
Anke Kamrath
Operations and Services Division
Computational and Information Systems Laboratory

Overview
• Staff Comings and Goings in OSD
• Computing, Storage, Data and Software Updates
• University Workshops
• Preparations for NWSC already underway
  – Security Review
  – User Surveys
  – Accounting/Allocation Systems
  – HPSS Migration
  – RFP Process for NWSC Procurement

Staff Comings and Goings…
• Changes
  – Director of Operations and Services
    • Anke Kamrath, joined NCAR July 2009
  – Consulting Services Group (CSG)
    • Si Liu, consultant, replaced Wei Huang (moved to VETS)
  – Computer Production Group (CPG)
    • Ronald Reed, operator, replaced Ken Albertson (moved to RAL)
  – Data Analysis Systems Group (DASG)
    • Kendall Southwick, student assistant on VAPOR
  – Supercomputer Systems Group (SSG)
    • Irfan Elahi, promoted to group head
    • Russell Gonsalves
    • Nate Rini
• Openings
  – CSG consulting position: SE III (Mike Page replacement)
  – DASG VAPOR position: SE I opening
  – Co-location manager position

Computing Update
• Strong Demand for Bluefire
  – 2 large-memory bluefire nodes redeployed into batch pool.
    Bluefire now has 120 32-processor batch nodes.
  – IPCC AR5 ramping up (Sept 09-Dec 10)
    • 12M GAUs needed to complete
    • Will generate 1.3 PB of data
  – High utilization, low queue wait times
• Decommissioning Lightning and Pegasus
• 3 racks added to Frost (BlueGene system)
  – TeraGrid resource (>20 Tflops)

High Utilization, Low Queue Wait Times
• Bluefire system utilization is routinely >90%
• Average queue-wait time < 1 hr

[Figure: Bluefire System Utilization (daily average), 0-100%, Nov-08 through Nov-09. Annotations on plot: "Spring ML Powerdown May 3 Firmware & Software Upgrade".]
[Figure: Average Queue Wait Times for User Jobs at the NCAR/CISL Computing Facility. Number of jobs by queue-wait time bin (<2m through >2d) for the Premium, Regular, Economy and Standby queues.]

Average queue-wait time by queue (hh:mm, all jobs):

System      Period          Peak TFLOPs   Premium   Regular   Economy   Stand-by
bluefire    since Jun'08    77.0          00:05     00:36     01:52     01:18
lightning   since Dec'04    1.1           00:11     00:16     01:10     00:55
blueice     Jan'07-Jun'08   12.2          00:11     00:37     03:37     02:41
bluevista   Jan'06-Sep'08   4.4           00:12     01:36     03:51     02:14

Archive Update
• NCAR MSS exceeds 8 PB total data stored
  – Sep 2009
• MSS transition to HPSS
  – Target date Jan 2011
  – Erich to present more on this.
• AMSTAR Data Ooze (migrating data to new tape technology)
  – Optimized ooze software deployed July 2009. Streams tape to tape, creates two copies in parallel.
  – Oozing on average 20+ TB per day; 6 PB of data to ooze, of which 4 PB are unique
  – Estimated completion date Mar-Apr 2010, well ahead of the Dec 2010 EOL on Powderhorn tape libraries

Storage Update: Data Services Redesign
• Pending management review
• Near-term Goals:
  – Creation of unified and consistent data environment for NCAR HPC
  – High-performance availability of central filesystem from many projects/systems (RDA, ESG, CAVS, Bluefire, TeraGrid)
• Longer-term Goals:
  – Filesystem cross-mounting
  – Global WAN filesystems (GPFS and/or Lustre)
• Tentative Schedule:
  – Phase 1: March 2010

Data Collections Update: Major update of International Comprehensive Ocean-Atmosphere Data Set (ICOADS) in the RDA
• Release 2.5 covers 1662-2007
• Many data sources added over the full period of record
• NEW monthly automated update procedure adds preliminary data, now for 2008 - Sept. 2009
• ICOADS is a 25-year collaborative project with NOAA

[Figure: Annual distribution (1937-2007) of major platform types in Release 2.5, shown as millions of reports per year.]

Software Update
• Version 1.5.0 released
  – Numerous new features aimed at WRF community
    • Geo-referenced imagery
    • Integration with NCL
    • Moving domains
  – Block structured AMR grids
  – Ease of use improvements
• New funding streams
  – NSF XD Vis Award
    • NCAR, TACC, Utah, Purdue
    • 2009-2012, $386k
    • Focus: petascale DAV
  – TeraGrid GIG Award
    • 2009-2011, $112k
    • Focus: progressive access data models

[Figure: This Typhoon Jangmi image from MMM’s Bill Kuo and Wei Wang illustrates several new capabilities. A moving, nested domain is tracked by VAPOR. Satellite imagery and plots generated by NCL are correctly positioned in the 4D scene with geo-referencing.]

Support for Summer ’09 Workshops
• Climate Modeling Primer, July 27-31
  – Organized by CGD (Gettelman lead) and held at NCAR
  – 41 participants, mostly graduate students
  – Ran low-resolution versions of CAM; some work with CCSM
  – Provided 10-15 dedicated nodes and overview of using bluefire to participants
• Scaling to Petascale, August 3-7
  – Organized by Great Lakes Consortium for Petascale Computation
  – Supported by the NSF through the Blue Waters Project (NCSA and others)
  – Participants joined in from four locations using high-definition streaming video
  – Ran a variety of applications
  – Provided 16 dedicated nodes during hands-on sessions
• Ice Sheet Modeling Workshop, August 3-14
  – Organized by Christina Hulbe (Portland State University), Jesse Johnson (U of Montana), and Cornelis van der Veen (U of Kansas); supported by NSF
  – Brought current and future ice-sheet scientists together to develop better models for the projection of future sea-level rise
  – Provided accounts for each participant and a dedicated share queue on bluefire, since participants were using single-processor codes
  – Some participants will continue to use bluefire based on Johnson’s NSF award for this workshop
  http://websrv.cs.umt.edu/isis/index.php/Summer_Modeling_School

Preparations for NWSC Already Underway
– Storage Environment
  • HPSS Migration
  • Data Services Redesign (DSR)
  • Data Workflow Requirements Analysis
– Security Review
  • Balancing “Science and Security”
  • Phase 1: Underway for DSR
  • Phase 2: Global Filesystems
– Accounting/Allocation Systems
  • Current system built in 70s.
  • Reviewing requirements
  • Leveraging other products (e.g., Gold, AmieGold)
– RFP Process for NWSC Procurement
  • Tom Engel to discuss further
– User Survey

User Survey Plans
Q1 2010 survey of active users
– Use to baseline current environment for NWSC
– Understand where we’re doing well and where we can improve
  • Investigate any area with >10% dissatisfaction
– Evaluate future needs
– Topics
  • Ease of access within current security environment
  • Ease of use (initially and on-going)
  • Consulting (include methodology of service delivery)
  • Documentation for users
  • Training desired and method of delivery
Follow-up survey ~4 months after NWSC computer available to users

On the road to a PetaFlop Computer…
• And 6-15 Petabyte Filesystem
• And +100 Petabyte Archive

Current NCAR Computing >100 TFLOPs
[Figure: Peak TFLOPs at NCAR (All Systems), 0-100 TFLOPs, Jan-00 through Jan-10. Systems shown: IBM POWER6/Power575/IB (bluefire), IBM POWER6/Power575/IB (firefly), IBM POWER5+/p575/HPS (blueice), IBM POWER5/p575/HPS (bluevista), IBM BlueGene/L (frost), IBM Opteron/Linux (pegasus), IBM Opteron/Linux (lightning), IBM POWER4/Federation (thunder), IBM POWER4/Colony (bluesky), IBM POWER4 (bluedawn), SGI Origin3800/128, IBM POWER3 (blackforest), IBM POWER3 (babyblue). Procurement phases marked: ARCS Phases 1-4, ICESS Phases 1-2.]

NWSC HPC Projection
[Figure: Peak PFLOPs at NCAR, 0.0-2.0 PFLOPs, Jan-04 through Jan-14. Series: NWSC HPC, NWSC HPC (uncertainty), IBM POWER6/Power575/IB (bluefire), IBM POWER5+/p575/HPS (blueice), IBM POWER5/p575/HPS (bluevista), IBM BlueGene/L (frost), IBM Opteron/Linux (pegasus), IBM Opteron/Linux (lightning), IBM POWER4/Colony (bluesky). Procurement phases marked: ARCS Phase 4, ICESS Phases 1-2.]

NCAR Data Archive Projection
[Figure: Total Data in the NCAR Archive (Actual and Projected), 0-120 petabytes, Jan-99 through Jan-15. Series: Total, Unique.]

Questions and Discussion