Intermediate Supercomputing
Transcription
Course Objectives
• Compile and run code on a Cray XC40 supercomputer
• Understand how to get good performance out of the filesystem
• Develop and use advanced job scripts
• Explore current and past jobs

Course Prerequisites
• Introduction to iVEC
  • What we offer and the benefits
  • How to access resources
• Introduction to Linux
  • Basic Linux usage
• Introduction to Supercomputing
  • Parallel computing architecture and concepts
  • Requesting resources and workflows

Course Outline
1. Programming Environment
2. Makefiles
3. Libraries
4. Launching Applications
5. Workflows
6. Job Arrays
7. Monitoring
8. Accounting
9. Lustre & Data

How do we get from here…
…to here?

1. PROGRAMMING ENVIRONMENT

Compiling Source Code
[Diagram: source files (func1.c, func2.c, func3.c) are turned into object files (func1.o, func2.o, func3.o) by the compiler; the linker then combines the object files with external libraries to produce the executable myprog.exe]

Compiling Source Code
• Compiling on other systems
    gcc myprog.c -o myprog
    mpicc my_mpiprog.c -o my_mpiprog
    icc my_intelprog.c -o my_intelprog
• Compiling on a Cray
    cc myprog.c -o myprog
• Cray compiler drivers
  • Same command (ftn/cc/CC) regardless of compiler
  • Links in libraries automatically if the module is loaded, e.g. MPI, Libsci, etc.
• Always use the wrapper!

Cray Programming Environment
• A cross-compiling environment
  • Compiler runs on the login node
  • Executable runs on the compute nodes
• Modules environment
  • Allows easy swapping of tools and libraries
• Cray compiler drivers
  • Same command (ftn/cc/CC) regardless of compiler
  • Links in libraries automatically if the module is loaded, e.g. MPI, Libsci, etc.
• Always use the wrapper!

Supported Compilers
• PrgEnv-cray, PrgEnv-intel and PrgEnv-gnu
• All have Fortran, C and C++ compilers
• All support OpenMP, MPI, Libsci, PETSc, etc.
• Switch compilers easily:
    module swap PrgEnv-cray PrgEnv-intel
    module swap PrgEnv-intel PrgEnv-gnu

Compiler Comparison?
Compiler     Pros                                          Cons
CCE (Cray)   Fortran; PGAS (CAF & UPC); Vectorization;     C++; No inline assembly support;
             OpenMP 3.0; Integration with Cray tools;      Pedantic
             Best bug turnaround time
Intel        Scalar optimization; Vectorization;           Threading/ALPS; PGAS (CAF & UPC);
             Intel MKL                                     Pedantic
GNU          C++; Scalar optimization;                     Vectorization; Fortran
             Universal availability

Which Compiler?
• No compiler is superior for all codes
• Some codes rely on specific compiler behaviour
• Try different compilers (a sketch follows below)
  • A good way of catching bugs, improving portability and standards conformance, and finding optimization opportunities
• Don't try to optimise your code until you know it is portable!
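As a quick illustration of trying different compilers, a minimal sketch of building the same source with each programming environment (solver.f90 here is a hypothetical stand-in for your own code):

    # Cray compiler (the default environment)
    ftn -o solver_cray solver.f90

    # Intel compiler
    module swap PrgEnv-cray PrgEnv-intel
    ftn -o solver_intel solver.f90

    # GNU compiler
    module swap PrgEnv-intel PrgEnv-gnu
    ftn -o solver_gnu solver.f90

Comparing the warnings and results of the three builds is a cheap way to catch portability and standards problems early.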
Compiler Flags
Flag                          Cray             Intel             GNU
Optimization                  -O3              -O3               -O3
OpenMP                        -h omp | nomp    -openmp           -fopenmp
Pre-processing                -eZ              -fpp              -cpp
Module location               -e m -J <dir>    -module <dir>     -M<dir>
Debugging                     -g | -G0 | -eD   -g | -debug       -g
Floating point                -h fp<n>         -fp-model <key>   -float-store
IPA                           -h ipa<n>        -ipo<n>           -flto=<n>
Zero uninitialized            -e 0             -zero             -finit-local-zero
Real promotion (Fortran)      -s real64        -r8               -fdefault-double-8 -fdefault-real-8
Integer promotion (Fortran)   -s integer64     -i8               -fdefault-integer-8

Compiler Flag Suggestions
Compiler   Recommended flags                  Compiler feedback
Cray       -O3 -hfp3 (this is the default)    -rm (Fortran), -hlist=m (C/C++)
GNU        -O3 -ffast-math -funroll-loops     -ftree-vectorizer-verbose=2
Intel      -O2 -ipo                           -fcode-asm

• OpenMP: enabled by default for Cray, disabled by default with Intel and GNU
• MPI: enabled by default for all compilers
• Note: with the craype-haswell module loaded you get Haswell-specific optimisations
• The login nodes are SandyBridge, not Haswell!

Modules Environment
• Comes preconfigured with many modules!
• Part of the default login environment
• Compilers, MPI, math libraries, performance tools, etc.
• Updated versions from the previous magnus
  • Latest CLE and CDT
  • New MPI version
  • Haswell support
• Most of the applications iVEC builds for users are not there yet

Exercise: Modules
• Display the default module environment
• See what the compiler wrappers do in different programming environments:
    > ftn -craype-verbose
    > module swap PrgEnv-cray PrgEnv-intel
    > ftn -craype-verbose
    > module swap PrgEnv-intel PrgEnv-cray

Exercise: Default Environment
pryan@magnus:~> module list
Currently Loaded Modulefiles:
  1) modules/3.2.6.7                        13) craype-network-aries
  2) nodestat/2.2-1.0502.51228.1.86.ari     14) craype/2.1.2
  3) sdb/1.0-1.0502.52592.3.23.ari          15) cce/8.3.0
  4) alps/5.2.1-2.0502.8712.10.32.ari       16) cray-libsci/13.0.0
  5) lustre-cray_ari_s                      17) pmi/5.0.4-1.0000.10161.132.4.ari
  6) udreg/2.3.2-1.0502.8763.1.11.ari       18) rca/1.0.0-2.0502.51491.3.92.ari
  7) ugni/5.0-1.0502.9037.7.26.ari          19) atp/1.7.3
  8) gni-headers/3.0-1.0502.9038.7.4.ari    20) PrgEnv-cray/5.2.25
  9) dmapp/7.0.1-1.0502.9080.9.32.ari       21) craype-haswell
 10) xpmem/0.1-2.0502.51169.1.11.ari        22) slurm/2.6.6-2-ivec-1
 11) hss-llm/7.2.0                          23) cray-mpich/7.0.0
 12) Base-opts/1.0.2-1.0502.51201.1.4.ari   24) ddt/3.2.1_28503
pryan@magnus:~>
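Beyond module list and module swap, a few other module commands are useful when exploring the environment (a sketch; the module names shown are examples taken from the default environment above):

    module avail               # list every module available on the system
    module avail PrgEnv        # list modules whose names match a string
    module show cray-libsci    # display what a module sets (paths, variables)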
2. MAKEFILES

Why Makefiles are good
• Automated build
  • Consistent, reproducible, transportable
• Efficient
  • Only recompiles what has changed
  • Handles dependencies
• Parallel build support (see the usage sketch at the end of this section)
• Easy to use

Makefile rules
• A Makefile tells make what to do
• Makefiles are composed of rules
• Order is not important, except for the default target
• The default target is the first in the makefile

    targets: prerequisites
        command1
        command2

• A rule says: if the prerequisite is out of date or does not exist, then create the target with this command
• The whitespace before a command is a tab

Hello world example
    hello: hello.o lib.o
        ftn -o hello hello.o lib.o -dynamic    # how to link the executable

    hello.o: hello.f90 hello.h
        ftn -O3 -c hello.f90                   # how to compile the source

    clean:
        rm -f hello hello.o lib.o              # how to clean up

Variables
• Variables
  • var_name = ...
• Automatic variables: recomputed afresh for each rule
  • $^ matches all prerequisites
  • $< matches the first prerequisite
  • $@ matches the target

Pattern Rules
• Looks like an ordinary rule, except it has %
• foo.o: foo.c  ->  %.o: %.c
• The target %.o is a pattern

Exercise: Makefile
• Use variables and patterns to modify the basic example
• Create variables for the compiler and flags
• Use pattern rules to simplify the rules

Exercise: Makefile
• Fortran Hello World
    mkdir hello_worldf90
    cd hello_worldf90
    touch hello.f90 Makefile
    vi hello.f90

    program hello
      print *,"Hello World!"
    end program hello

Exercise: Makefile
    FC=ftn
    FCFLAGS=-O3
    LDFLAGS=-dynamic

    objects = hello.o lib.o

    hello: $(objects)
        $(FC) $(LDFLAGS) -o $@ $(objects)

    %.o: %.f90 %.h
        $(FC) $(FCFLAGS) -c $<

    clean:
        rm -f hello $(objects)

More
• Makefiles can do a lot more than this
  • https://www.gnu.org/software/make/manual/make.html
• Other options also exist
  • autoconf & automake
  • cmake
  • SCons
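To use the dependency tracking and parallel-build support described above, typical invocations look like the following (a sketch; -j sets how many recipes may run in parallel):

    make          # build the default (first) target, recompiling only what changed
    make -j 8     # build with up to 8 recipes running in parallel
    make clean    # remove the executable and object files via the clean rule
    make hello    # build a specific target by name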
3. LIBRARIES

Libsci
• Dense solvers
  • BLAS, LAPACK, ScaLAPACK, IRT
• Sparse solvers
  • PETSc, Trilinos
• FFT
  • FFTW
• Tuned for the processor
• Tuned for the interconnect
• Adaptive and auto-tuned

Libsci - How to Use
• Versions provided for all compilers (Cray, Intel, GNU)
• The cray-libsci module is loaded by default
• Will link in automatically
• Will select the appropriate serial/threaded version depending on context
  • Whether OpenMP is enabled or not
  • Whether inside a parallel region or not
• OMP_NUM_THREADS controls threading
• Can force the single-threaded version
  • -lsci_cray | -lsci_intel | -lsci_gnu

Exercise: Libsci
• Build dgemm_example.c from /opt/intel/composer_xe_2013_sp1.1.106/Samples/en_US/mkl/tutorials.zip
• Build with the Cray compiler
• Build with the Intel compiler

Intel MKL
• What you know and love
• Linear algebra
  • BLAS, LAPACK, sparse solvers, ScaLAPACK
• FFT
  • FFTW, Cluster FFT
• RNG
  • Congruential, Mersenne Twister
• Statistics, data fitting, vector math

MKL - How to Use
• Fully compatible with PrgEnv-intel
  • module swap PrgEnv-cray PrgEnv-intel
• Single-threaded MKL is compatible with PrgEnv-cray
  • module load intel
  • Must unload cray-libsci
    • module unload cray-libsci
• Can use -mkl, -mkl=parallel|sequential
• Do not use -mkl=cluster
• Use Intel's link advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/

Exercise: MKL
• Re-use the example dgemm_example.f from /opt/intel/composer_xe_2013_sp1.1.106/Samples/en_US/mkl/tutorials.zip
• Increase M, N, P by 10x
• Build with the Intel compiler & MKL
• Try to build with the Cray compiler & MKL

Exercise: MKL
• Solution for the Cray compiler & MKL:
    ftn -o dgemm_cray_mkl dgemm_example.f \
        -I$MKLROOT/include -L$MKLROOT/lib/intel64 \
        -Wl,--start-group -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -Wl,--end-group \
        -lpthread -lm

GNU Scientific Libraries
• Not provided by default
• Not available yet, we are working on it!
• If you can use Libsci or MKL, do so
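Because cray-libsci is loaded by default, linking against BLAS/LAPACK usually needs no explicit library flags. A minimal sketch, assuming a hypothetical Fortran source my_solver.f90 that calls dgemm:

    module list 2>&1 | grep libsci    # confirm cray-libsci is loaded
    ftn -o my_solver my_solver.f90    # the ftn wrapper links Libsci automatically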
FFTW
• Fastest Fourier Transform in the West
• Versions 2 & 3 supported
• Just load the module to use it:
    module load fftw
    module load fftw/2.1.5.7

PETSc and Trilinos
• PETSc
  • Large-scale parallel solvers for PDEs
  • Data structures and routines
  • Linear/non-linear equation solvers, ODE integrators
  • Load the cray-petsc or cray-petsc-complex module
• Trilinos
  • Robust parallel algorithms
  • Preconditioners, optimization, iterative methods, and much more
  • Load the cray-trilinos module
• Third-party scientific libraries
  • MUMPS, SuperLU, SuperLU_dist, ParMetis, Hypre, Sundials, and Scotch

PETSc and Trilinos - How to Use
• Versions provided for all compilers
• Make sure cray-libsci is loaded
• Load cray-petsc and/or cray-trilinos
• cray-tpsl is loaded automatically

HDF5 & NetCDF
• Hierarchical Data Format
  • Store large amounts of data efficiently
  • HDF5 1.8 support
  • Parallel access
  • module load cray-hdf5 | cray-hdf5-parallel
• Network Common Data Form
  • Self-describing, machine-independent
  • Array-oriented scientific data
  • NetCDF v4 support
  • Parallel access
  • module load cray-netcdf | cray-netcdf-hdf5parallel

4. LAUNCHING APPLICATIONS

[Diagram: the login node submits jobs to SLURM; once resources are allocated, the job script runs on a MOM node and parallel portions are launched onto the compute nodes]
• Job submitted to SLURM (resource request)
• Once allocated, a copy of the job script is executed on a MOM node
• Parallel portions are sent to the compute nodes (aprun)

ALPS
• Application Level Placement Scheduler
• Interface to execute and place jobs on Cray compute nodes
• Manages passing of environment and limits to compute nodes
• You will need to replace mpirun with aprun
  • Even for serial/OpenMP-only applications
  • aprun has different but similar options
• Can't I just use mpirun? No!
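In practice the change from another cluster is usually just the launcher line inside the job script. A minimal sketch for a hypothetical 48-process MPI executable prog.exe:

    # On a generic cluster:
    mpirun -np 48 ./prog.exe

    # On the Cray, inside the batch script (2 nodes x 24 cores per node):
    aprun -n 48 -N 24 ./prog.exe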
ALPS Terminology
• Node: resources managed by a single CNL (Compute Node Linux) instance
• Processing Element (PE): an instance of the executable, e.g. an MPI process
• NUMA node: a multi-core socket (2 NUMA nodes per node)
• CPU: a single core

aprun
Option   Description                    OpenMPI mpirun equivalents
-n       Total number of PEs to run     -n | -np
-N       Number of PEs per node         -npernode
-j       Number of PEs per CPU (core)
-S       Number of PEs per NUMA node
-d       Number of threads per PE
-cc      How to bind PEs to CPUs        -bysocket | -bind-to-*

aprun Examples
• Single-node job
    aprun -n 24 ./prog.exe
• Multi-node job (8 nodes)
    aprun -n 192 -N 24 ./prog.exe

aprun Examples
• 1 process, 24 OpenMP threads
    #!/bin/bash -l
    #SBATCH --nodes=1
    #SBATCH --account=director100
    #SBATCH --job-name=myjob
    #SBATCH --time=00:05:00
    #SBATCH --partition=workq
    #SBATCH --export=NONE

    export OMP_NUM_THREADS=24
    aprun -n 1 -N 1 -d $OMP_NUM_THREADS ./a.out

aprun Examples
• 2 MPI processes per node, 12 OpenMP threads each
    #!/bin/bash -l
    #SBATCH --nodes=2
    #SBATCH --account=director100
    #SBATCH --job-name=myjob
    #SBATCH --time=00:05:00
    #SBATCH --partition=workq
    #SBATCH --export=NONE

    export OMP_NUM_THREADS=12
    aprun -n 4 -N 2 -d $OMP_NUM_THREADS -S 1 ./a.out

Multiple Program Multiple Data
• aprun supports this model
• Use the colon syntax
• Make sure you use a space before and after the colon
    aprun -n 4 -d 12 -N 2 ./exe1 : -n 4 -d 16 -N 1 ./exe2

CPU Binding
• By default PEs are bound to CPUs, i.e. a core:
    -cc cpu
  Ideal for MPI and hybrid codes
• Or constrain PEs to a NUMA node/socket:
    -cc numa_node
• Can also be disabled:
    -cc none
• Or something more complicated:
    -cc 0,0-7,8,8-15,16,16-23
• (See the aprun man page for more details)

Exercise: Intel Threading
• Re-use the previous dgemm example
• Build with Intel & MKL
• Run with 1 thread
  • aprun -n 1 ./dgemm
• Run with 12 threads
  • OMP_NUM_THREADS=12
  • time aprun -q -n 1 -d 12 ./dgemm
• What is wrong?

Intel Threading
• Nothing special is needed for pure MPI jobs
• Intel's OpenMP has additional helper threads which cause problems with CPU binding
• First, disable CPU binding: "-cc none"
• Second, consider using KMP_AFFINITY
    export KMP_AFFINITY="compact,1"
    export OMP_NUM_THREADS=24
    aprun -n 1 -d 24 -cc none ./a.out

Hyperthreading
• Not suitable for all applications
• CLE/ALPS supports hyperthreading
• Integration with SLURM is not perfect
• Request the number of nodes only
    #SBATCH --nodes=2
• Be explicit with aprun resources
    aprun -n 96 -N 48 -j 2 uname -a

5. WORKFLOWS

SLURM dependencies
• Supported between jobs, job arrays, and array elements
• Not between jobs on different clusters
• #SBATCH --dependency=type:jobid,...
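As an illustration, a sketch of a two-step pipeline in which post.slurm starts only if sim.slurm finishes successfully (the script names and the job id 123456 are hypothetical; the dependency types are listed in the table that follows):

    sbatch sim.slurm
    # sbatch reports the new job id, e.g. "Submitted batch job 123456"
    sbatch --dependency=afterok:123456 post.slurm

The same option can instead be placed on an #SBATCH line inside post.slurm, as shown above.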
SLURM dependencies
Dependency list     Description
after:jobid         Begin after the listed job has begun execution
afterany:jobid      Begin after the listed job has terminated
afternotok:jobid    Begin after the listed job has terminated in a failed state
afterok:jobid       Begin after the listed job has successfully executed
singleton           Begin after any job of the same job name and user has terminated

• Multiple jobids are allowed, e.g. jobid:jobid
• Job array elements are referenced as jobid_index
• Jobs that are requeued after failure are treated the same

Chaining Jobs
• Submit the next job from within a batch job
• At the start or at the end of the job
• Can combine both methods
• Useful when running jobs across clusters
  A -> B -> C -> D -> E

Handling Job Failures
• Control whether the batch job restarts after a node failure
• The cluster default is to re-queue after failure
  • #SBATCH --requeue
  • #SBATCH --no-requeue
• Mail notification with --mail-type=REQUEUE
• Can control runtime behaviour by checking SLURM_RESTART_COUNT in the batch script

6. JOB ARRAYS

Use Cases
• More efficient than submitting lots of individual jobs
• Parametric sweeps
• Running the same program on many different data sets

Job Arrays
• Submit multiple jobs with identical parameters
• User-defined range of elements
  • 0,1,2,3
  • 0-9
  • 0-9,20-29
• #SBATCH --array=<indexes>
• Maximum number of elements is 1000
• Identify which index: $SLURM_ARRAY_TASK_ID
• Overall job: $SLURM_ARRAY_JOB_ID

Exercise: Job Arrays
• Submit a job array with 4 elements, one node each (a sketch script follows)
• Print each array id
• The third element should sleep for 60 seconds
• Monitor the status
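A minimal sketch of a job-array script along the lines of this exercise (the account name is a placeholder to adapt):

    #!/bin/bash -l
    #SBATCH --nodes=1
    #SBATCH --account=director100
    #SBATCH --job-name=arraytest
    #SBATCH --time=00:05:00
    #SBATCH --partition=workq
    #SBATCH --array=0-3
    #SBATCH --export=NONE

    echo "This is element $SLURM_ARRAY_TASK_ID of job $SLURM_ARRAY_JOB_ID"
    if [ "$SLURM_ARRAY_TASK_ID" -eq 2 ]; then
        sleep 60    # the third element (index 2) sleeps for 60 seconds
    fi

While it runs, squeue -u $USER lists the elements individually, using the jobid_index form described above.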
7. NODE & JOB STATUS

xtnodestat
• Provides current job and node status
• Textual GUI output
• Information is organised as physically installed

xtnodestat
pryan@magnus:~> xtnodestat
Current Allocation Status at Mon Sep 01 12:08:41 2014
[The output shows one block per cabinet (C0-0, C1-0, ... C7-0); within each block, rows are labelled by chassis and node (c0n0 to c2n3) and columns by slot (0-9, a-f). Each character gives the state of one node: '-' for idle (free batch) nodes, 'S' for service nodes, 'Z' for admin-down nodes, lower-case letters (e.g. 'a') for nodes allocated to a running job, and 'A' for allocated but idle nodes.]

Legend:
   nonexistent node                        S  service node
;  free interactive compute node           -  free batch compute node
A  allocated (idle) compute or ccm node    ?  suspect compute node
W  waiting or non-running job              X  down compute node
Y  down or admindown service node          Z  admindown compute node
xtnodestat (continued)
Available compute nodes:  0 interactive, 1134 batch

   Job ID   User   Size  Age    State  command line
-- -------  -----  ----  -----  -----  ------------
a  2097949  pryan  171   0h26m  run    pflotran

The final lines of the output give the count of available nodes and a per-job legend/detail: the job letter used in the map, the job ID, user, size, age, state and command line.

apstat
• Displays information on
  • Applications
  • Compute nodes
  • Resource reservations
  • Scheduler statistics
• Static information only
• Based on allocated resources

apstat
pryan@magnus:~> apstat
Compute node summary
  arch  config  up    resv  use  avail  down
  XT    1488    1488  342   171  1146   0

No pending applications are present

Total placed applications: 1
  Apid     ResId   User   PEs   Nodes  Age    State  Command
  2097949  325685  pryan  4096  171    2h02m  run    pflotran

No applications or reservations are being cleaned up
pryan@magnus:~>

(The summary shows the available nodes; the application list shows the application ID, the resources it holds in PEs and nodes, and the running command.)

apstat -n
pryan@magnus:~> apstat -n
  NID  Arch  State  CU  Rv  Pl  PgSz  Avl  Conf  Placed  PEs  Apids
[One line per compute node: the node ID and status, the number of cores (CU), the reserved cores including hyperthreads (Rv), page size and memory, the number of PEs (processes) and the number of cores placed, together with the application IDs using the node.]
Compute node summary
  arch  config  up    resv  use  avail  down
  XT    1488    1488  342   171  1146   0
pryan@magnus:~>

Exercise: Job Status
• Run the hyperthread job from before, but replace xthi with "sleep 300" (a sketch follows)
• Verify whether it is really using all the hardware resources, using apstat
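One possible way to set up this exercise, reusing the hyperthreading request from section 4 (the account name is a placeholder):

    #!/bin/bash -l
    #SBATCH --nodes=2
    #SBATCH --account=director100
    #SBATCH --time=00:10:00
    #SBATCH --partition=workq
    #SBATCH --export=NONE

    # 48 PEs per node with hyperthreading (-j 2); each PE simply sleeps
    aprun -n 96 -N 48 -j 2 sleep 300

While the job runs, apstat on a login node shows the reservation and the number of PEs and nodes it holds; apstat -n shows what was placed on each node.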
8. ACCOUNTING

sacct
• Displays job accounting information
• Information is updated in real time
• Output can be tailored to requirements

JobID         JobName   Partition  Account     AllocCPUS  State       ExitCode
------------  --------  ---------  ----------  ---------  ----------  --------
93037         hpl       workq      director1+  71136      FAILED      1:0
93037.batch   batch                director1+  1          FAILED      1:0
93038         hpl       workq      director1+  71136      FAILED      1:0
93038.batch   batch                director1+  1          FAILED      1:0
93039         hpl       workq      director1+  71136      CANCELLED+  0:0
93040         hpl       workq      director1+  71136      CANCELLED+  0:0
93041         hpl       workq      director1+  71136      CANCELLED+  0:0
93042         hpl       workq      director1+  71136      CANCELLED+  0:0

sacct
• Display format
  • export SACCT_FORMAT="jobid,jobname%20,..."
  • sacct --format="jobid,jobname%20,..."
• Filter options
  • Job name: --name
  • Node list: --nodelist
  • Start/end time: --starttime / --endtime

Exercise: sacct
• View the jobs you executed today
• Run a job which displays the nodes allocated, using sacct

sacct in a job script
    #!/bin/bash -l
    # 10 nodes, 24 MPI processes/node, 240 MPI processes in total
    #SBATCH --job-name="myjob"
    #SBATCH --time=02:00:00
    #SBATCH --ntasks=240
    #SBATCH --output=myjob.%j.o
    #SBATCH --error=myjob.%j.e
    #SBATCH --account=director100
    #======START=====
    echo "The current job ID is $SLURM_JOB_ID"
    echo "Running on $SLURM_JOB_NUM_NODES nodes"
    echo "Using $SLURM_NTASKS_PER_NODE tasks per node"
    echo "A total of $SLURM_NTASKS tasks is used"
    echo "Node list:"
    sacct --format=JobID,NodeList%100 -j $SLURM_JOB_ID
    aprun -n 240 ./a.out
    #=====END====

9. LUSTRE

What is Lustre?
• Distributed file system
• Fault tolerant and highly available
• Scalable and high performance
• POSIX compliant
• Supports locking
• Supports quotas

Lustre Architecture
• Clients
• MDS
  • The metadata server
  • open(), close(), stat(), unlink()
• MDT
  • The metadata target, i.e. the backend disk for the MDS
• OSS
  • The object storage server
  • write(), read(), seek(), flock()
• OST
  • The object storage target, i.e. the backend disks for the OSS

Lustre Architecture
[Diagram: clients connect over an InfiniBand interconnect to the MDS with its MDTs, and to four OSSes, each serving two of the eight OSTs.]

Lustre File systems
On Magnus:
Filesystem  Type    Size    User quota  Group quota  Flush policy  Backup
/scratch    Lustre  3 PB    -           -            30 days       No
/group      Lustre  750 TB  -           1 TB         -             No

• /scratch
  • Fast; run your jobs here under /scratch/projectid/userid
  • POSIX compliance and multi-node access
  • No quota; files will be deleted after 30 days (exceptions possible)
• /group
  • Share common files amongst a project group
  • Less performance than /scratch
  • Copy to/from /scratch as required

Exercise: What is your quota?
• lfs quota …? (one possible form is sketched below)
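One possible form of the command (a sketch; substitute your own username and project group):

    lfs quota -u $USER /group        # usage and limits for your user
    lfs quota -g projectid /group    # usage and limits for your project group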
Striping
• Striping distributes files across OSTs
• The default is not to stripe
• This is the user's responsibility
• Striping can improve performance
  • Small files should not be striped
  • For large files, set the stripe count between 1 and 4
  • The stripe size can also be tuned; the default is 1 MB
• A default stripe can be set for a file or a directory
  • Files and sub-directories inherit stripe settings from their parent
  • Only effective when a new file is created

Exercise: File striping
• Create a new directory in /scratch
• View the default settings
  • lfs getstripe
• Change the stripe settings
  • lfs setstripe
• Create new files
  • dd if=/dev/zero of=./file0 bs=1MB count=100
• View the stripe setting for the new file
  • lfs getstripe

10. DATA TRANSFER

Data Mover Nodes
• Dedicated servers for transferring large amounts of data
• /home, /scratch and /group are visible
• Externally visible scp/gridftp

Supercomputer           Hostname
Magnus / Zeus / Zythos  magnusdata.ivec.org
Galaxy                  galaxydata.ivec.org
Fornax                  fornaxdata.ivec.org

copyq
• The data mover nodes are available through the pawsey cluster (-M pawsey)
• Submit data transfer jobs to the respective partition
• Fornax uses existing mechanisms

Supercomputer    Partition  Time limit
Magnus / Zythos  magnusdm   2 days
Galaxy           galaxydm   2 days

Exercise: Copy some data
• Submit a batch script to magnusdm to get ftp://ftp.aarnet.edu.au/pub/centos/6.5/isos/x86_64/CentOS-6.5-x86_64-netinstall.iso (a sketch script is given at the end of this document)
• Monitor the progress of the job

For more help
http://portal.ivec.org
• Documentation
• iVEC-supported software list
• Maintenance logs
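Returning to the data-transfer exercise above, a minimal sketch of a copyq-style job script (the account name, time limit and /scratch path are placeholders, and wget is assumed to be available on the data mover nodes):

    #!/bin/bash -l
    #SBATCH --partition=magnusdm
    #SBATCH --ntasks=1
    #SBATCH --account=director100
    #SBATCH --time=06:00:00
    #SBATCH --export=NONE

    cd /scratch/projectid/username    # adjust to your own /scratch area
    wget ftp://ftp.aarnet.edu.au/pub/centos/6.5/isos/x86_64/CentOS-6.5-x86_64-netinstall.iso

Submit it to the data mover partition with sbatch -M pawsey and monitor it with squeue -M pawsey -u $USER.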