CISL Operations and Services Update

CISL HPC Advisory Panel Meeting
29 October 2009
Anke Kamrath
[email protected]
Operations and Services Division
Computational and Information Systems Laboratory

Overview
• Staff Comings and Goings in OSD
• Computing, Storage, Data and Software Updates
• University Workshops
• Preparations for NWSC already underway
  – Security Review
  – User Surveys
  – Accounting/Allocation Systems
  – HPSS Migration
  – RFP Process for NWSC Procurement
Staff Comings and Goings…
• Changes
  – Director of Operations and Services
    • Anke Kamrath, joined NCAR July 2009
  – Consulting Services Group (CSG)
    • Si Liu, consultant, replaced Wei Huang (moved to VETS)
  – Computer Production Group (CPG)
    • Ronald Reed, operator, replaced Ken Albertson (moved to RAL)
  – Data Analysis Systems Group (DASG)
    • Kendall Southwick, student assistant on VAPOR
  – Supercomputer Systems Group (SSG)
    • Irfan Elahi, promoted to group head
    • Russell Gonsalves
    • Nate Rini
• Openings
  – CSG consulting position: SE III (Mike Page replacement)
  – DASG VAPOR position: SE I opening
  – Co-location Manager position
Computing Update
• Strong Demand for Bluefire
  – 2 large-memory bluefire nodes redeployed into the batch pool; bluefire now has 120 32-processor batch nodes
  – IPCC AR5 ramping up (Sept 09-Dec 10)
    • 12M GAUs needed to complete
    • Will generate 1.3 PB of data (a rough rate estimate follows this list)
  – High utilization, low queue wait times
• Decommissioning Lightning and Pegasus
• 3 racks added to Frost (BlueGene system)
  – TeraGrid resource (>20 TFLOPs)
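
As referenced above, a back-of-the-envelope rate estimate from the AR5 figures quoted on this slide; the GAU total, data volume, and campaign length come from the slide, and the arithmetic below is only illustrative:

# Rough rates implied by the IPCC AR5 figures above (Sept 2009 - Dec 2010,
# 12M GAUs, 1.3 PB of output). Decimal units are used for simplicity.
gaus_needed = 12e6            # GAUs required to complete the AR5 runs
output_pb = 1.3               # expected data volume, in PB
campaign_days = 16 * 30.4     # Sept 2009 through Dec 2010, roughly 16 months

avg_tb_per_day = output_pb * 1000 / campaign_days    # ~2.7 TB of new data per day
avg_mb_per_gau = output_pb * 1e9 / gaus_needed       # ~108 MB written per GAU charged
print(f"~{avg_tb_per_day:.1f} TB/day, ~{avg_mb_per_gau:.0f} MB per GAU")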

High Utilization, Low Queue Wait Times
• Bluefire system utilization is routinely >90%
• Average queue-wait time < 1 hr

[Chart: Bluefire System Utilization (daily average), Nov-08 through Nov-09; dips annotated as the Spring ML Powerdown and the May 3 Firmware & Software Upgrade.]

[Chart: Average Queue Wait Times for User Jobs at the NCAR/CISL Computing Facility; number of jobs per queue-wait time bin (<2m through >2d) for the Premium, Regular, Economy, and Stand-by queues.]

Average queue-wait times (hh:mm) for all jobs, by system and queue:

  System      Period          Peak TFLOPs   Premium   Regular   Economy   Stand-by
  bluefire    since Jun'08    77.0          00:05     00:36     01:52     01:18
  lightning   since Dec'04    1.1           00:11     00:16     01:10     00:55
  blueice     Jan'07-Jun'08   12.2          00:11     00:37     03:37     02:41
  bluevista   Jan'06-Sep'08   4.4           00:12     01:36     03:51     02:14
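
The per-queue averages above come from scheduler accounting data; a minimal sketch of the computation, assuming hypothetical job records with a queue name, submit time, and start time (the record format and values below are placeholders, not the actual batch-system logs):

from datetime import datetime
from collections import defaultdict

# (queue, submit time, start time) -- placeholder job records
jobs = [
    ("premium", "2009-10-01 08:00", "2009-10-01 08:04"),
    ("regular", "2009-10-01 08:10", "2009-10-01 08:52"),
    ("economy", "2009-10-01 09:00", "2009-10-01 10:41"),
    ("standby", "2009-10-01 09:30", "2009-10-01 10:55"),
]

fmt = "%Y-%m-%d %H:%M"
waits = defaultdict(list)
for queue, submitted, started in jobs:
    wait = datetime.strptime(started, fmt) - datetime.strptime(submitted, fmt)
    waits[queue].append(wait.total_seconds() / 60.0)   # queue wait in minutes

for queue, minutes in waits.items():
    print(f"{queue:8s} average wait: {sum(minutes) / len(minutes):.0f} min")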
Archive Update
• NCAR MSS exceeds 8 PB of total data stored (Sep 2009)
• MSS transition to HPSS
  – Target date Jan 2011
  – Erich to present more on this
• AMSTAR data ooze (migrating data to new tape technology)
  – Optimized ooze software deployed July 2009; streams tape to tape, creating two copies in parallel
  – Oozing 20+ TB per day on average; 6 PB of data to ooze, of which 4 PB are unique
  – Estimated completion Mar-Apr 2010, well ahead of the Dec 2010 EOL of the Powderhorn tape libraries
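
As a check on the completion estimate above, simple arithmetic on the slide's own figures (assuming the 20 TB/day average holds and using decimal units):

# Back-of-the-envelope check of the AMSTAR ooze schedule, using only the
# figures quoted above; this is illustrative, not an operational plan.
total_to_ooze_tb = 6 * 1000          # 6 PB to copy onto the new tape technology
rate_tb_per_day = 20                 # average ooze rate observed
days_needed = total_to_ooze_tb / rate_tb_per_day     # 300 days
months_needed = days_needed / 30.4                   # ~9.9 months
# Counting from the July 2009 deployment of the optimized ooze software,
# ~10 months of copying lands in spring 2010, consistent with the
# Mar-Apr 2010 estimate and well ahead of the Dec 2010 Powderhorn EOL.
print(f"~{days_needed:.0f} days (~{months_needed:.1f} months) of oozing")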
Storage Update: Data Services Redesign
• Pending management review
• Near-term Goals:
  – Creation of a unified and consistent data environment for NCAR HPC
  – High-performance availability of the central filesystem from many projects/systems (RDA, ESG, CAVS, Bluefire, TeraGrid)
• Longer-term Goals:
  – Filesystem cross-mounting
  – Global WAN filesystems (GPFS and/or Lustre)
• Tentative Schedule:
  – Phase 1: March 2010
Data Collections Update:
Major update of the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) in the RDA
• Release 2.5 covers 1662-2007
• Many data sources added over the full period of record
• NEW monthly automated update procedure adds preliminary data, now for 2008 - Sept. 2009
• ICOADS is a 25-year collaborative project with NOAA

[Figure: Annual distribution (1937-2007) of major platform types in Release 2.5, shown as millions of reports per year.]
Software Update
• VAPOR version 1.5.0 released
  – Numerous new features aimed at the WRF community
    • Geo-referenced imagery
    • Integration with NCL
    • Moving domains
  – Block-structured AMR grids
  – Ease-of-use improvements
• New funding streams
  – NSF XD Vis Award
    • NCAR, TACC, Utah, Purdue
    • 2009-2012, $386k
    • Focus: petascale DAV
  – TeraGrid GIG Award
    • 2009-2011, $112k
    • Focus: progressive access data models

[Image: This Typhoon Jangmi image from MMM's Bill Kuo and Wei Wang illustrates several new capabilities. A moving, nested domain is tracked by VAPOR. Satellite imagery and plots generated by NCL are correctly positioned in the 4D scene with geo-referencing.]
Support for Summer ‘09 Workshops
• Climate Modeling Primer, July 27-31
  – Organized by CGD (Gettelman lead) and held at NCAR
  – 41 participants, mostly graduate students
  – Ran low-resolution versions of CAM; some work with CCSM
  – Provided 10-15 dedicated nodes and an overview of using bluefire to participants
Support for Summer ‘09 Workshops
• Scaling to Petascale, August 3-7
  – Organized by the Great Lakes Consortium for Petascale Computation
  – Supported by the NSF through the Blue Waters Project (NCSA and others)
  – Participants joined in from four locations using high-definition streaming video
  – Ran a variety of applications
  – Provided 16 dedicated nodes during hands-on sessions
Support for Summer ‘09 Workshops (continued)
• Ice Sheet Modeling Workshop, August 3-14
  – Organized by Christina Hulbe (Portland State University), Jesse Johnson (U of Montana), and Cornelis van der Veen (U of Kansas); supported by NSF
  – Brought current and future ice-sheet scientists together to develop better models for the projection of future sea-level rise
  – Provided accounts for each participant and a dedicated share queue on bluefire, since participants were using single-processor codes
  – Some participants will continue to use bluefire based on Johnson’s NSF award for this workshop
  http://websrv.cs.umt.edu/isis/index.php/Summer_Modeling_School
Preparations for NWSC Already Underway
– Storage Environment
  • HPSS Migration
  • Data Services Redesign (DSR)
  • Data Workflow Requirements Analysis
– Security Review
  • Balancing “Science and Security”
  • Phase 1: underway for DSR
  • Phase 2: global filesystems
– Accounting/Allocation Systems
  • Current system built in the ’70s
  • Reviewing requirements
  • Leveraging other products (e.g., Gold, AmieGold)
– RFP Process for NWSC Procurement
  • Tom Engel to discuss further
– User Survey
User Survey Plans
• Q1 2010 survey of active users
  – Use to baseline the current environment for NWSC
  – Understand where we’re doing well and where we can improve
    • Investigate any area with >10% dissatisfaction (a minimal screening sketch follows this list)
  – Evaluate future needs
  – Topics
    • Ease of access within the current security environment
    • Ease of use (initially and ongoing)
    • Consulting (including methodology of service delivery)
    • Documentation for users
    • Training desired and method of delivery
• Follow-up survey ~4 months after the NWSC computer is available to users
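
As referenced above, a minimal sketch of the “>10% dissatisfaction” screening rule; the topics and response counts are hypothetical placeholders, since the Q1 2010 survey has not yet been run:

# Flag any survey topic where more than 10% of respondents are dissatisfied.
DISSATISFACTION_THRESHOLD = 0.10

# responses[topic] = (number dissatisfied, total responses) -- placeholder data
responses = {
    "ease of access": (12, 180),
    "consulting services": (9, 175),
    "user documentation": (25, 182),
}

for topic, (dissatisfied, total) in responses.items():
    share = dissatisfied / total
    if share > DISSATISFACTION_THRESHOLD:
        print(f"Investigate '{topic}': {share:.0%} dissatisfied")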
On the road to a PetaFlop Computer…
• And a 6-15 Petabyte Filesystem
• And a 100+ Petabyte Archive
CURRENT NCAR COMPUTING >100 TFLOPs

[Chart: Peak TFLOPs at NCAR (All Systems), Jan-00 through Jan-10, showing growth past 100 TFLOPs across the ARCS Phase 1-4 and ICESS Phase 1-2 procurements. Systems shown: IBM POWER3 (babyblue, blackforest), SGI Origin3800/128, IBM POWER4 (bluedawn), IBM POWER4/Colony (bluesky), IBM POWER4/Federation (thunder), IBM Opteron/Linux (lightning, pegasus), IBM BlueGene/L (frost), IBM POWER5/p575/HPS (bluevista), IBM POWER5+/p575/HPS (blueice), and IBM POWER6/Power575/IB (bluefire, firefly).]
NWSC HPC Projection

[Chart: Peak PFLOPs at NCAR, Jan-04 through Jan-14, with the NWSC HPC system projected to reach roughly 1-2 PFLOPs (uncertainty band shown). Earlier systems shown for scale: IBM POWER4/Colony (bluesky), IBM Opteron/Linux (lightning, pegasus), IBM BlueGene/L (frost), IBM POWER5/p575/HPS (bluevista), IBM POWER5+/p575/HPS (blueice), and IBM POWER6/Power575/IB (bluefire), spanning the ARCS Phase 4 and ICESS Phase 1-2 procurements.]
NCAR Data Archive Projection
[Chart: Total Data in the NCAR Archive (Actual and Projected), Jan-99 through Jan-15; total and unique holdings in petabytes, with the projected total reaching on the order of 100 PB by 2015 (axis extends to 120 PB).]
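
For scale, the growth rate implied by reading the projection above against the Sep 2009 figure from the Archive Update slide; the 100 PB endpoint is an approximate reading of the chart, not an official CISL number:

# Implied compound annual growth of the NCAR archive, using ~8 PB total in
# Sep 2009 (Archive Update slide) and an approximate ~100 PB projected total
# near Jan 2015 read off the chart above.
pb_sep_2009 = 8.0
pb_jan_2015 = 100.0          # approximate reading of the projected curve
years = 5.3                  # Sep 2009 to Jan 2015

annual_growth = (pb_jan_2015 / pb_sep_2009) ** (1.0 / years) - 1.0
print(f"Implied growth: ~{annual_growth:.0%} per year")   # roughly 60% per year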
Questions and Discussion