Intermediate Supercomputing

Transcription

February 2015
Intermediate Supercomputing February 2015
Course Objectives
• Compile and run code on a Cray XC40
supercomputer
• Understand how to get good performance out of the
filesystem
• Develop and use advanced job scripts
• Explore current and past jobs
Intermediate Supercomputing February 2015
Course Prerequisites
• Introduction to iVEC
• What we offer and the benefits
• How to access resources
• Introduction to Linux
• Basic Linux usage
• Introduction to Supercomputing
• Parallel computing architecture and concepts
• Requesting resources and workflows
Intermediate Supercomputing February 2015
Course Outline
1. Programming Environment
2. Makefiles
3. Libraries
4. Launching Applications
5. Workflows
6. Job Arrays
7. Monitoring
8. Accounting
9. Lustre & Data
Intermediate Supercomputing February 2015
How do we get from here…
Intermediate Supercomputing February 2015
…to here?
Intermediate Supercomputing February 2015
1. PROGRAMMING ENVIRONMENT
Intermediate Supercomputing February 2015
Compiling
Source Code (func1.c, func2.c, func3.c)
   -> Compiler ->
Object Code (func1.o, func2.o, func3.o)
   -> Linker (+ External Libraries) ->
Executable (myprog.exe)
Intermediate Supercomputing February 2015
Compiling Source Code
• Compiling on other systems
gcc myprog.c -o myprog
mpicc my_mpiprog.c -o my_mpiprog
icc my_intelprog.c -o my_intelprog
• Compiling on a Cray
cc myprog.c -o myprog
• Cray compiler drivers
• Same command (ftn/cc/CC) regardless of compiler
• Links in libraries automatically if the module is loaded, e.g. MPI, Libsci, etc.
• Always use the wrapper!
Intermediate Supercomputing February 2015
Cray Programming Environment
• A cross-compiling environment
• Compiler runs on the login node
• Executable runs on the compute nodes
• Modules Environment
• Allows easy swapping of tools & libraries
• Cray compiler drivers
• Same command (ftn/cc/CC) regardless of compiler
• Links in libraries automatically if the module is loaded, e.g. MPI, Libsci, etc.
• Always use the wrapper!
Intermediate Supercomputing February 2015
Supported Compilers
• PrgEnv-cray, PrgEnv-intel and PrgEnv-gnu
• All have Fortran, C and C++ compilers
• All support OpenMP/MPI/Libsci/PETSc etc.
• Switch compilers easily:
module swap PrgEnv-cray PrgEnv-intel
module swap PrgEnv-intel PrgEnv-gnu
Intermediate Supercomputing February 2015
Compiler Comparison?
Compiler     Pros                             Cons
CCE (Cray)   • Fortran                        • C++
             • PGAS (CAF & UPC)               • No inline assembly support
             • Vectorization                  • Pedantic
             • OpenMP 3.0
             • Integration with Cray Tools
             • Best bug turnaround time
GNU          • C++                            • Vectorization
             • Scalar optimization            • Fortran
             • Universal availability
Intel        • Scalar optimization            • Threading/ALPS
             • Vectorization                  • PGAS (CAF & UPC)
             • Intel MKL                      • Pedantic
             • Compatibility with Epic
Intermediate Supercomputing February 2015
Which Compiler?
• No compiler is superior for all codes
• Some codes rely on specific compiler behaviour
• Try different compilers
• Good way of catching bugs, improving portability and standard-conformance, and finding optimization opportunities
• Don’t try to optimise your code until you know it is portable!
Intermediate Supercomputing February 2015
Compiler Flags
Flag                          Cray             Intel              GNU
Optimization                  -O3              -O3                -O3
OpenMP                        -h omp | nomp    -openmp            -fopenmp
Pre-processing                -eZ              -fpp               -cpp
Module location               -e m -J <dir>    -module <dir>      -M<dir>
Debugging                     -g | -G0 | -eD   -g | -debug        -g
Floating point                -h fp<n>         -fp-model <key>    -float-store
IPA                           -h ipa<n>        -ipo<n>            -flto=<n>
Zero uninitialized            -e 0             -zero              -finit-local-zero
Real promotion (Fortran)      -s real64        -r8                -fdefault-double-8 -fdefault-real-8
Integer promotion (Fortran)   -s integer64     -i8                -fdefault-integer-8
Intermediate Supercomputing February 2015
Compiler Flag Suggestions
Compiler   Recommended Flags                 Compiler Feedback
Cray       -O3 -hfp3 (this is the default)   -rm (Fortran)
                                             -hlist=m (C/C++)
GNU        -O3 -ffast-math -funroll-loops    -ftree-vectorizer-verbose=2
Intel      -O2 -ipo                          -fcode-asm

OpenMP: Enabled by default for Cray, disabled by default with Intel and GNU
MPI:    Enabled by default for all compilers

Note: With the craype-haswell module loaded you get Haswell-specific
optimisations. The login nodes are Sandy Bridge, not Haswell!
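For example, a minimal sketch of compile lines using the suggested flags; solver.f90 is a placeholder source file, and each line assumes the matching PrgEnv module is loaded:
# Cray: these optimisation flags are already the defaults; -rm writes a loopmark listing
ftn -O3 -hfp3 -rm -c solver.f90
# GNU (after: module swap PrgEnv-cray PrgEnv-gnu)
ftn -O3 -ffast-math -funroll-loops -ftree-vectorizer-verbose=2 -c solver.f90
# Intel (after: module swap PrgEnv-cray PrgEnv-intel)
ftn -O2 -ipo -c solver.f90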
Intermediate Supercomputing February 2015
Modules Environment
• Comes preconfigured with many modules!
• Part of the default login environment
• Compilers, MPI, Math Libraries, Performance Tools, etc...
• Updated versions compared to the previous Magnus environment
• Latest CLE and CDT
• New MPI version
• Haswell support
• Most of the applications iVEC builds for users are not there yet
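A few everyday module commands for exploring this environment (the package names below are just examples):
module avail                  # list every available module
module avail cray-libsci      # list the versions of one package
module show cray-mpich        # see what loading a module would change
module list                   # show what is currently loaded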
Intermediate Supercomputing February 2015
Exercise: Modules
• Display the default module environment
• See what the compiler wrappers do in different
programming environments.
>ftn -craype-verbose
>module swap PrgEnv-cray PrgEnv-intel
>ftn -craype-verbose
>module swap PrgEnv-intel PrgEnv-cray
Intermediate Supercomputing February 2015
Exercise: Default Environment
pryan@magnus:~> module list
Currently Loaded Modulefiles:
  1) modules/3.2.6.7                         13) craype-network-aries
  2) nodestat/2.2-1.0502.51228.1.86.ari      14) craype/2.1.2
  3) sdb/1.0-1.0502.52592.3.23.ari           15) cce/8.3.0
  4) alps/5.2.1-2.0502.8712.10.32.ari        16) cray-libsci/13.0.0
  5) lustre-cray_ari_s                       17) pmi/5.0.4-1.0000.10161.132.4.ari
  6) udreg/2.3.2-1.0502.8763.1.11.ari        18) rca/1.0.0-2.0502.51491.3.92.ari
  7) ugni/5.0-1.0502.9037.7.26.ari           19) atp/1.7.3
  8) gni-headers/3.0-1.0502.9038.7.4.ari     20) PrgEnv-cray/5.2.25
  9) dmapp/7.0.1-1.0502.9080.9.32.ari        21) craype-haswell
 10) xpmem/0.1-2.0502.51169.1.11.ari         22) slurm/2.6.6-2-ivec-1
 11) hss-llm/7.2.0                           23) cray-mpich/7.0.0
 12) Base-opts/1.0.2-1.0502.51201.1.4.ari    24) ddt/3.2.1_28503
pryan@magnus:~>
Intermediate Supercomputing February 2015
2. MAKEFILES
Intermediate Supercomputing February 2015
Why Makefiles are good
• Automated build
• Consistency, reproducible, transportable
• Efficient
• Only recompile changes
• Handles dependencies
• Parallel build support
• Easy to use
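Typical usage once a Makefile exists (a sketch; the hello and clean targets refer to the example on the next slides):
make              # build the default (first) target
make hello        # build a specific target
make -j 4         # build up to four targets in parallel
make clean        # remove generated files, if a clean rule is defined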
Intermediate Supercomputing February 2015
Makefile rules
• A Makefile tells make what to do
• Makefiles are composed of rules
• Order is not important except for the default target
• The default target is the first in the makefile
targets: prerequisites
command1
command2
• A rule says if the prerequisite is out of date or does
not exist then create the target with this command
• The whitespace before a command is a tab
Intermediate Supercomputing February 2015
Hello world example
How to link the executable:
hello: hello.o lib.o
    ftn -o hello hello.o lib.o -dynamic

How to compile the source:
hello.o: hello.f90 hello.h
    ftn -O3 -c hello.f90

How to clean up:
clean:
    rm -f hello hello.o lib.o
Intermediate Supercomputing February 2015
Variables
• Variables
• var_name = ...
• Automatic Variables – computed afresh for each rule
• $^ - matches all prerequisites
• $< - matches the first prerequisite
• $@ - matches the target
Intermediate Supercomputing February 2015
Pattern Rules
• Looks like an ordinary rule, except it contains %
• foo.o: foo.c -> %.o: %.c
• The target %.o is a pattern
Intermediate Supercomputing February 2015
Exercise: Makefile
• Use the variables and patterns to modify the basic
example
• Create variables for the compiler and flags
• Use pattern rules to simplify the rules
Intermediate Supercomputing February 2015
Exercise: Makefile
• FORTRAN Hello World
mkdir hello_worldf90
cd hello_worldf90
touch hello.f90 Makefile
vi hello.f90
program hello
print *, "Hello World!"
end program hello
Intermediate Supercomputing February 2015
Exercise: Makefile
FC=ftn
FCFLAGS=-O3
LDFLAGS=-dynamic
objects = hello.o lib.o
hello: $(objects)
$(FC) -o $@ $(objects)
%.o: %.f90 %.h
$(FC) $(FCFLAGS) -c $<
clean:
rm -f hello $(objects)
Intermediate Supercomputing February 2015
More
• Makefiles can do a lot more than this
• https://www.gnu.org/software/make/manual/make.html
• Other options also exist
• autoconf & automake
• cmake
• SCons
Intermediate Supercomputing February 2015
3. LIBRARIES
Intermediate Supercomputing February 2015
Libsci
• Dense Solvers
• BLAS, LAPACK, ScaLAPACK, IRT
• Sparse Solvers
• PETSc, Trilinos
• FFT
• FFTW
• Tuned for processor
• Tuned for interconnect
• Adaptive and auto-tuned
Intermediate Supercomputing February 2015
Libsci - How to Use
• Version provided for all compilers (Cray, Intel, GNU)
• cray-libsci module loaded by default
• Will link in automatically
• Will select the appropriate serial/threaded version
depending on context
• If OpenMP is enabled or not
• Inside a parallel region or not
• OMP_NUM_THREADS controls threading
• Can force single threaded version
• -l sci_cray | -l sci_intel | -l sci_gnu
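A minimal sketch, assuming a C file my_dgemm.c that calls BLAS; with cray-libsci loaded no extra link flags are needed:
cc -o my_dgemm my_dgemm.c     # the wrapper links Libsci automatically
export OMP_NUM_THREADS=12     # threaded Libsci obeys OMP_NUM_THREADS
aprun -n 1 -d 12 ./my_dgemm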
Intermediate Supercomputing February 2015
Exercise: Libsci
• Build the dgemm_example.c from
  /opt/intel/composer_xe_2013_sp1.1.106/Samples/en_US/mkl/tutorials.zip
• Build with the Cray compiler
• Build with the Intel compiler
Intermediate Supercomputing February 2015
Intel MKL
• What you love and know
• Linear Algebra
• BLAS, LAPACK, Sparse Solvers, ScaLAPACK
• FFT
• FFTW, Cluster FFT
• RNG
• Congruential, Mersenne Twister
• Statistics, Data Fitting, Vector Math
Intermediate Supercomputing February 2015
MKL How to Use
• Fully compatible with PrgEnv-intel
• module swap PrgEnv-cray PrgEnv-intel
• Single threaded MKL compatible with
PrgEnv-cray
• module load intel
• Must unload cray-libsci
• module unload cray-libsci
• Can use -mkl, -mkl=parallel|sequential
• Do not use -mkl=cluster
• Use Intel's link advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/
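A minimal sketch of the Intel route, using the dgemm_example.f file from the exercise below:
module swap PrgEnv-cray PrgEnv-intel
module unload cray-libsci        # avoid mixing Libsci and MKL
ftn -mkl=sequential dgemm_example.f -o dgemm_intel_mkl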
Intermediate Supercomputing February 2015
Exercise: MKL
• Re-use the example dgemm_example.f from
  /opt/intel/composer_xe_2013_sp1.1.106/Samples/en_US/mkl/tutorials.zip
• Increase M, N, P x10
• Build with the Intel compiler & MKL
• Try to build with the Cray compiler & MKL
Intermediate Supercomputing February 2015
Exercise: MKL
• Solution for Cray compiler & MKL
ftn -o dgemm_cray_mkl dgemm_example.f \
    -I$MKLROOT/include -L$MKLROOT/lib/intel64 \
    -Wl,--start-group \
    -lmkl_intel_lp64 -lmkl_core -lmkl_sequential \
    -Wl,--end-group -lpthread -lm
Intermediate Supercomputing February 2015
GNU Scientific Libraries
• Not provided by default
• Not available yet, we are working on it!
• If you can use Libsci or MKL, do so
Intermediate Supercomputing February 2015
FFTW
• Fastest Fourier Transform in the West
• Versions 2 & 3 supported
• Just load the module to use
• module load fftw
• module load fftw/2.1.5.7
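A sketch, assuming a source file fft_demo.c that calls FFTW3; once the module is loaded the compiler wrapper adds the include and link paths:
module load fftw
cc -o fft_demo fft_demo.c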
Intermediate Supercomputing February 2015
PETSc and Trilinos
• PETSc
• Large scale parallel solvers for PDEs
• Data structures and routines
• Linear/Non-linear equation solvers, ODE integrators
• Load the cray-petsc or cray-petsc-complex module
• Trilinos
• Robust parallel algorithms
• Preconditioners, optimization, iterative methods, and much more
• Load cray-trilinos module
• Third Party Scientific Libraries
• MUMPS, SuperLU, SuperLU_dist, ParMetis, Hypre, Sundials, and Scotch
Intermediate Supercomputing February 2015
PETSc and Trilinos – How to Use
• Version provided for all compilers
• Make sure cray-libsci is loaded
• Load cray-petsc and/or cray-trilinos
• cray-tpsl loaded automatically
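For example (my_solver.c is a placeholder for your own code):
module load cray-petsc           # or cray-petsc-complex
module load cray-trilinos        # cray-tpsl is pulled in automatically
cc -o my_solver my_solver.c      # the wrapper adds the PETSc/Trilinos paths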
Intermediate Supercomputing February 2015
HDF5 & NetCDF
• Hierarchical Data Format
• Store large amounts of data efficiently
• HDF5 1.8 support
• Parallel access
• module load cray-hdf5 | cray-hdf5-parallel
• Network Common Data Form
• Self describing, machine independent
• Array oriented scientific data
• NetCDF v4 support
• Parallel access
• module load cray-netcdf | cray-netcdf-hdf5parallel
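A sketch, assuming a Fortran program writer.f90 that writes NetCDF-4 files over parallel HDF5:
module load cray-hdf5-parallel
module load cray-netcdf-hdf5parallel
ftn -o writer writer.f90      # include and link paths come from the loaded modules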
Intermediate Supercomputing February 2015
4. LAUNCHING APPLICATIONS
Intermediate Supercomputing February 2015
SLURM
[Diagram: Login Node -> SLURM -> MOM Nodes -> Compute Nodes]
• Job submitted to SLURM (resource request)
• Once allocated, a copy of the job script is executed on a MOM node
• Parallel portions are sent to compute nodes (aprun)
Intermediate Supercomputing February 2015
ALPS
• Application Level Placement Scheduler
• Interface to execute and place jobs on Cray compute
nodes
• Manages passing of environment and limits to
compute nodes
• You will need to replace mpirun with aprun
• Even for serial/OpenMP only applications
• aprun has different but similar options
• Can’t I just use mpirun? No!
Intermediate Supercomputing February 2015
ALPS Terminology
• Node: Resources managed by a single CNL
(compute-node Linux) instance
• Processing Element (PE): Instance of the
executable, e.g., MPI process
• NUMA node: multi-core socket (2 NUMA nodes per node)
• CPU: a single core
Intermediate Supercomputing February 2015
aprun
Option   Description                      OpenMPI mpirun equivalent
-n       Total number of PEs to run       -n | -np
-N       Number of PEs per node           -npernode
-j       Number of PEs per CPU (core)
-S       Number of PEs per NUMA node      -bysocket
-d       Number of threads per PE
-cc      How to bind PEs to CPUs          -bind-to-*
Intermediate Supercomputing February 2015
aprun Examples
• Single-node job
aprun -n 24 ./prog.exe
• Multi-node job (8 nodes)
aprun -n 192 -N 24 ./prog.exe
Intermediate Supercomputing February 2015
aprun Examples
• 1 process, 24 OpenMP Threads
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --account=director100
#SBATCH --job-name=myjob
#SBATCH --time=00:05:00
#SBATCH --partition=workq
#SBATCH --export=NONE
export OMP_NUM_THREADS=24
aprun -n 1 -N 1 -d $OMP_NUM_THREADS ./a.out
Intermediate Supercomputing February 2015
aprun Examples
• 2 MPI processes/node, 12 OpenMP threads
#!/bin/bash -l
#SBATCH --nodes=2
#SBATCH --account=director100
#SBATCH --job-name=myjob
#SBATCH --time=00:05:00
#SBATCH --partition=workq
#SBATCH --export=NONE
export OMP_NUM_THREADS=12
aprun -n 4 -N 2 -d $OMP_NUM_THREADS -S 1 ./a.out
Intermediate Supercomputing February 2015
Multiple Program Multiple Data
• aprun supports this model
• Use the colon syntax
• Make sure you use a space before and after the colon
aprun -n 4 -d 12 -N 2 ./exe1 : -n 4 -d 16 -N 1 ./exe2
Intermediate Supercomputing February 2015
CPU Binding
• By default PEs are bound to CPUs, i.e., cores: -cc cpu
  Ideal for MPI and hybrid codes
• Or constrain PEs to a NUMA node/socket: -cc numa_node
• Binding can also be disabled: -cc none
• Or something more complicated: -cc 0,0-7,8,8-15,16,16-23
(See the aprun man page for more details)
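Putting the options together, a hedged example of a hybrid launch pinning one PE per NUMA node (a.out is a placeholder executable):
export OMP_NUM_THREADS=12
aprun -n 4 -N 2 -S 1 -d $OMP_NUM_THREADS -cc numa_node ./a.out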
Intermediate Supercomputing February 2015
Exercise: Intel Threading
• Reuse the previous example dgemm
• Build with Intel & MKL
• Run with 1 thread
• aprun -n 1 ./dgemm
• Run with 12 threads
• OMP_NUM_THREADS=12
• time aprun -q -n 1 -d 12 ./dgemm
• What is wrong?
Intermediate Supercomputing February 2015
Intel Threading
• Don’t need anything special for pure MPI jobs
• Intel’s OpenMP has additional helper threads which
cause problems with cpu binding
• First, disable CPU binding: -cc none
• Second, consider using KMP_AFFINITY
export KMP_AFFINITY="compact,1"
export OMP_NUM_THREADS=24
aprun -n 1 -d 24 -cc none ./a.out
Intermediate Supercomputing February 2015
Hyperthreading
• Not suitable for all applications
• CLE/ALPS supports hyperthreading
• Integration with SLURM not perfect
• Request number of nodes only
#SBATCH --nodes=2
• Be explicit with aprun resources
aprun -n 96 -N 48 -j 2 uname -a
Intermediate Supercomputing February 2015
5. WORKFLOWS
Intermediate Supercomputing February 2015
SLURM dependencies
• Supported between jobs, job arrays, array elements
• Not between jobs on different clusters
• #SBATCH --dependency=type:jobid,...
[Diagram: example dependency graph between jobs A-H]
Intermediate Supercomputing February 2015
SLURM dependencies
Dependency List    Description
after:jobid        Begin after the listed job has begun execution
afterany:jobid     Begin after the listed job has terminated
afternotok:jobid   Begin after the listed job has terminated in a failed state
afterok:jobid      Begin after the listed job has successfully executed
singleton          Begin after any job of the same job name and user has terminated

• Multiple jobids are allowed, e.g. jobid:jobid
• Job array elements are referenced as jobid_index
• Jobs that are requeued after failure are treated the same
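For example, assuming the first submission returned job ID 93100 (an illustrative number; sim.slurm and post.slurm are placeholder scripts):
sbatch sim.slurm                              # -> Submitted batch job 93100
sbatch --dependency=afterok:93100 post.slurm  # starts only if 93100 finishes successfully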
Intermediate Supercomputing February 2015
Chaining Jobs
• Submit the next job from within a batch job
• At the start or the end of the job
• Can combine both methods
• Useful when running jobs across clusters

[Diagram: chained jobs A -> B -> C -> D -> E]
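A sketch of chaining from the end of a batch script; next_step.slurm and stage_out.slurm are placeholder scripts:
aprun -n 24 ./step.exe
sbatch next_step.slurm              # submit the next job in the chain
sbatch -M pawsey stage_out.slurm    # e.g. hand results to a data-transfer job on another cluster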
Intermediate Supercomputing February 2015
Handling Job Failures
• Control if the batch job restarts after node failure
• Cluster default is to re-queue after failure
• #SBATCH --requeue
• #SBATCH --no-requeue
• Mail notification with --mail-type=REQUEUE
• Can control runtime behaviour by checking SLURM_RESTART_COUNT in the batch script
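For example, a restart-aware check near the top of the batch script (the checkpoint message is just illustrative):
if [ "${SLURM_RESTART_COUNT:-0}" -gt 0 ]; then
    echo "Requeued run number ${SLURM_RESTART_COUNT}, resuming from the last checkpoint"
fi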
Intermediate Supercomputing February 2015
6. JOB ARRAYS
Intermediate Supercomputing February 2015
Use Cases
• More efficient than submitting lots of individual jobs
• Parametric sweeps
• Running the same program on many different data
sets
Intermediate Supercomputing February 2015
Job Arrays
• Submit multiple jobs with identical parameters
• User defined range of elements
• 0,1,2,3
• 0-9
• 0-9,20-29
• #SBATCH --array=<indexes>
• Maximum number of elements is 1000
• Identify which index: $SLURM_ARRAY_TASK_ID
• Overall job: $SLURM_ARRAY_JOB_ID
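A minimal sketch of an array job script; the executable name and input-file naming are placeholders:
#!/bin/bash -l
#SBATCH --account=director100
#SBATCH --nodes=1
#SBATCH --time=00:10:00
#SBATCH --array=0-9
#SBATCH --export=NONE
echo "Element ${SLURM_ARRAY_TASK_ID} of array job ${SLURM_ARRAY_JOB_ID}"
aprun -n 24 ./model input_${SLURM_ARRAY_TASK_ID}.dat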
Intermediate Supercomputing February 2015
Exercise: Job Arrays
• Submit a job array with 4 elements, one node each
• Print each array id
• The third id should sleep for 60 seconds
• Monitor the status
Intermediate Supercomputing February 2015
7. NODE & JOB STATUS
Intermediate Supercomputing February 2015
xtnodestat
• Provides current job and node status
• Textual GUI output
• Information organized as physically installed
Intermediate Supercomputing February 2015
xtnodestat
pryan@magnus:~> xtnodestat
Current Allocation Status at Mon Sep 01 12:08:41 2014

[Per-cabinet node allocation maps for cabinets C0-0 to C3-0; rows are labelled by
chassis and node (c0n0 ... c2n3), columns by slot (0-f). Slide callouts: Cabinet,
Chassis & node, Idle Nodes, Admin Down.]
Intermediate Supercomputing February 2015
xtnodestat
[Node allocation maps for cabinets C4-0 to C7-0. Slide callouts: Service Nodes,
Allocated nodes, Allocated but Idle, Legend.]

Legend:
   nonexistent node                        S  service node
;  free interactive compute node           -  free batch compute node
A  allocated (idle) compute or ccm node    ?  suspect compute node
W  waiting or non-running job              X  down compute node
Y  down or admindown service node          Z  admindown compute node
Intermediate Supercomputing February 2015
xtnodestat
Available compute nodes:        0 interactive,     1134 batch

  Job ID    User      Size   Age     State   command line
--- ------- --------  ----   -----   -----   ------------------------------
a   2097949 pryan      171   0h26m   run     pflotran

[Slide callouts: Available nodes, Job legend/detail]
Intermediate Supercomputing February 2015
apstat
• Displays information on
• Applications
• Compute nodes
• Resource reservations
• Scheduler statistics
• Static information only
• Based on allocated resources
Intermediate Supercomputing February 2015
apstat
pryan@magnus:~> apstat
Compute node summary
   arch config     up    resv     use   avail   down
     XT   1488   1488     342     171    1146      0

No pending applications are present

Total placed applications: 1
   Apid   ResId    User   PEs Nodes    Age   State  Command
 2097949  325685  pryan  4096   171  2h02m     run  pflotran

No applications or reservations are being cleaned up
pryan@magnus:~>

[Slide callouts: Available nodes, Application ID, Resources, Running command]
Intermediate Supercomputing February 2015
apstat
pryan@magnus:~> apstat -n
  NID Arch State CU Rv Pl   PgSz      Avl     Conf   Placed  PEs Apids
    8   XT UP  B 24  -  -    4K 16777216        0        0    0
    9   XT UP  B 24  -  -    4K 16777216        0        0    0
   10   XT UP  B 24  -  -    4K 16777216        0        0    0
   11   XT UP  B 24  -  -    4K 16777216        0        0    0
   12   XT UP  B 24  -  -    4K 16777216        0        0    0
   13   XT UP  B 24  -  -    4K 16777216        0        0    0
   14   XT UP  B 24  -  -    4K 16777216        0        0    0
   15   XT UP  B 24  -  -    4K 16777216        0        0    0
   16   XT UP  B 24  -  -    4K 16777216        0        0    0
<snip>
[Rows for NIDs 1529-1535: nodes that are in use show reserved cores (Rv, 48 when
hyperthreading is enabled), placed cores (Pl), confirmed and placed memory, the
number of placed PEs, and the owning Apids; idle nodes show "-" and zeros.]

Compute node summary
   arch config     up    resv     use   avail   down
     XT   1488   1488     342     171    1146      0
pryan@magnus:~>

[Slide callouts: Node status, Number of cores, Reserved cores inc HT,
Number of PEs (processes), Number of cores placed, Apids]
Intermediate Supercomputing February 2015
Exercise: Job Status
• Run the hyperthread job from before, but replace xthi with "sleep 300"
• Verify if it is really using all the hardware resources
using apstat
Intermediate Supercomputing February 2015
8. ACCOUNTING
Intermediate Supercomputing February 2015
sacct
• Displays job accounting information
• Information updated in real-time
• Output can be tailored to requirements
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
93037               hpl      workq director1+      71136     FAILED      1:0
93037.batch       batch            director1+          1     FAILED      1:0
93038               hpl      workq director1+      71136     FAILED      1:0
93038.batch       batch            director1+          1     FAILED      1:0
93039               hpl      workq director1+      71136 CANCELLED+      0:0
93040               hpl      workq director1+      71136 CANCELLED+      0:0
93041               hpl      workq director1+      71136 CANCELLED+      0:0
93042               hpl      workq director1+      71136 CANCELLED+      0:0
Intermediate Supercomputing February 2015
sacct
• Display format
• export SACCT_FORMAT="jobid,jobname%20..."
• sacct --format="jobid,jobname%20..."
• Filter options
• Job name: --name
• Node list: --nodelist
• Start/End time: --starttime / --endtime
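For example (the format string, job name and times are illustrative):
export SACCT_FORMAT="jobid,jobname%20,elapsed,nnodes,state"
sacct --starttime=08:00 --name=myjob           # today's jobs named "myjob" since 8 am
sacct --format="jobid,nodelist%60" -j 93037    # nodes used by one particular job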
Intermediate Supercomputing February 2015
Exercise: sacct
• View the jobs you executed today
• Run a job which displays the nodes allocated using
sacct
Intermediate Supercomputing February 2015
sacct in a job script
#!/bin/bash -l
# 10 nodes, 24 MPI processes/node, 240 MPI processes total
#SBATCH --job-name="myjob"
#SBATCH --time=02:00:00
#SBATCH --ntasks=240
#SBATCH --ntasks-per-node=24
#SBATCH --output=myjob.%j.o
#SBATCH --error=myjob.%j.e
#SBATCH --account=director100
#======START=====
echo "The current job ID is $SLURM_JOB_ID"
echo "Running on $SLURM_JOB_NUM_NODES nodes"
echo "Using $SLURM_NTASKS_PER_NODE tasks per node"
echo "A total of $SLURM_NTASKS tasks is used"
echo "Node list:"
sacct --format=JobID,NodeList%100 -j $SLURM_JOB_ID
aprun -n 240 ./a.out
#=====END====
Intermediate Supercomputing February 2015
9. LUSTRE
Intermediate Supercomputing February 2015
What is Lustre
• Distributed file system
• Fault tolerant and highly available
• Scalable and high performance
• POSIX compliant
• Supports locking
• Supports quotas
Intermediate Supercomputing February 2015
Lustre Architecture
• Clients
• MDS
• The meta-data server
• open(), close(), stat(), unlink()
• MDT
• The meta-data target, i.e. the backend disk for the MDS
• OSS
• The object storage server
• write(), read(), seek(), flock()
• OST
• The object storage target, i.e. the backend disks for the OSS
Intermediate Supercomputing February 2015
Lustre Architecture
[Diagram: Lustre architecture. Clients connect over an InfiniBand interconnect to
the MDS (backed by its MDT) and to multiple OSSs, each serving several OSTs.]
Intermediate Supercomputing February 2015
Lustre File systems
On Magnus:
Filesystem  Type    Size    User quota  Group quota  Flush Policy  Backup
/scratch    Lustre  3 PB    -           -            30 days       No
/group      Lustre  750 TB  -           1 TB         -             No

• /scratch
  • Fast, run your jobs here under /scratch/projectid/userid
  • POSIX compliance and multi-node access
  • No quota, files will be deleted after 30 days (exceptions possible)
• /group
  • Share common files amongst a project group
  • Less performance than /scratch
  • Copy to/from /scratch as required
Intermediate Supercomputing February 2015
Exercise: What is your quota?
• lfs quota …?
Intermediate Supercomputing February 2015
Striping
• Will distribute files across OSTs
• The default is not to stripe
• This is the user's responsibility
• Striping can improve performance
• Small files should not be striped
• For large files, set a stripe count between 1 and 4
• Can tune the stripe size (default 1 MB)
• Can set a default stripe for a file or a directory
  • Files and sub-directories inherit stripe settings from their parent
  • Only effective when a new file is created
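For example, a sketch using a directory under /scratch (the path is illustrative):
lfs setstripe -c 4 /scratch/projectid/userid/bigrun   # new files in this directory stripe over 4 OSTs
lfs getstripe /scratch/projectid/userid/bigrun        # confirm the settings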
Intermediate Supercomputing February 2015
Exercise: File striping
• Create a new directory in /scratch
• View the default settings
• lfs getstripe
• Change the stripe settings
• lfs setstripe
• Create new files
• dd if=/dev/zero of=./file0 bs=1MB count=100
• View the stripe setting for the new file
• lfs getstripe
Intermediate Supercomputing February 2015
10. DATA TRANSFER
Intermediate Supercomputing February 2015
Data Mover Nodes
• Dedicated servers for transferring large amounts of
data
• /home, /scratch and /group visible
• Externally visible scp/gridftp
Supercomputer            Hostname
Magnus / Zeus / Zythos   magnusdata.ivec.org
Galaxy                   galaxydata.ivec.org
Fornax                   fornaxdata.ivec.org
Intermediate Supercomputing February 2015
copyq
• The data mover nodes are available through the pawsey cluster (-M pawsey)
• Submit data transfer jobs to the respective partition
• Fornax uses existing mechanisms

Supercomputer   Partition   Time Limit
Magnus/Zythos   magnusdm    2 days
Galaxy          galaxydm    2 days
Intermediate Supercomputing February 2015
Exercise: Copy some data
• Submit a batch script to magnusdm to get
  ftp://ftp.aarnet.edu.au/pub/centos/6.5/isos/x86_64/CentOS-6.5-x86_64-netinstall.iso
• Monitor progress of the job
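One possible sketch of the transfer script, assuming wget is available on the data mover nodes and using a placeholder project account and scratch path; submit it with sbatch -M pawsey:
#!/bin/bash -l
#SBATCH --account=director100
#SBATCH --partition=magnusdm
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --export=NONE
cd /scratch/projectid/userid
wget ftp://ftp.aarnet.edu.au/pub/centos/6.5/isos/x86_64/CentOS-6.5-x86_64-netinstall.iso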
Intermediate Supercomputing February 2015
For more help
http://portal.ivec.org
• Documentation
• iVEC-supported Software list
• Maintenance logs
Intermediate Supercomputing February 2015