software:specfem [2018/03/05 16:05] wphase
====== SPECFEM3D_GLOBE ======
==== Running SPECFEM3D_GLOBE on the Strasbourg HPC cluster with GNU 4.8 and CUDA 7.5 ====

=== Set up the environment ===
<code>
module purge
module load batch/slurm
module load compilers/cuda-7.5
export CUDA_INC=/usr/local/cuda/cuda-7.5/include
export CUDA_LIB=/usr/local/cuda/cuda-7.5/lib64
export PATH=/rpriv/ipgs/zac/openmpi-1.10.7/bin:$PATH
export LD_LIBRARY_PATH=/rpriv/ipgs/zac/openmpi-1.10.7/lib:$LD_LIBRARY_PATH
</code>
Note that we use the default GNU compiler of the operating system:
<code>
$ gfortran --version
GNU Fortran (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
</code>
=== Compilation ===
Before compilation, make sure that the required modules are loaded and that the CUDA_LIB and CUDA_INC environment variables are set (see the previous section). Create a run directory containing the subdirectories ''DATABASES_MPI'', ''OUTPUT_FILES'', ''bin'' and ''DATA''.
In the directory ''DATA'', create the ''CMTSOLUTION'', ''Par_file'' and ''STATIONS'' files (cf. the SPECFEM3D_GLOBE documentation).
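As a concrete starting point, the run-directory layout described above can be created in one step (''run0001'' is a hypothetical run-directory name):

```shell
#!/bin/bash
# Create a run directory with the subdirectories SPECFEM3D_GLOBE expects
rundir=run0001                # hypothetical run-directory name
mkdir -p $rundir/DATABASES_MPI $rundir/OUTPUT_FILES $rundir/bin $rundir/DATA
# DATA/ must then receive the CMTSOLUTION, Par_file and STATIONS files
```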
<code>
ln -s DATA/topo_bathy $rundir/DATA/topo_bathy
</code>
+ | |||
+ | Below is a script doing all the configuration and compilation: | ||
+ | <code> | ||
+ | #!/bin/bash | ||
+ | |||
+ | # Load modules | ||
+ | module purge | ||
+ | module load batch/slurm | ||
+ | module load compilers/cuda-7.5 | ||
+ | module load mpi/openmpi-basic | ||
+ | export CUDA_LIB=/usr/local/cuda/cuda-7.5/lib64 | ||
+ | export CUDA_INC=/usr/local/cuda/cuda-7.5/include | ||
+ | |||
+ | # source directory | ||
+ | rootdir=/b/home/ipgs/cmorales/specfem3d_globe | ||
+ | |||
+ | # setting up run directory | ||
+ | currentdir=`pwd` | ||
+ | |||
+ | mkdir -p DATABASES_MPI | ||
+ | mkdir -p OUTPUT_FILES | ||
+ | |||
+ | rm -rf DATABASES_MPI/* | ||
+ | rm -rf OUTPUT_FILES/* | ||
+ | |||
+ | # configure and compile in the source directory | ||
+ | cd $rootdir | ||
+ | |||
+ | # configure | ||
+ | ./configure -with-cuda=cuda5 | ||
+ | |||
+ | # compiles for a forward simulation | ||
+ | cp $currentdir/DATA/Par_file DATA/Par_file | ||
+ | make clean | ||
+ | make all | ||
+ | |||
+ | # backup of constants setup | ||
+ | cp setup/* $currentdir/OUTPUT_FILES/ | ||
+ | cp DATA/Par_file $currentdir/OUTPUT_FILES/ | ||
+ | |||
+ | # Copy executables/Model in the current directory | ||
+ | cd $currentdir | ||
+ | |||
+ | # copy executables | ||
+ | mkdir -p bin | ||
+ | cp $rootdir/bin/xmeshfem3D ./bin/ | ||
+ | cp $rootdir/bin/xspecfem3D ./bin/ | ||
+ | |||
+ | # Links data necessary directories | ||
+ | # The example below is for s362ani... this part should be changed if another model is used | ||
+ | cd DATA/ | ||
+ | ln -s $rootdir/DATA/crust2.0 | ||
+ | ln -s $rootdir/DATA/s362ani | ||
+ | ln -s $rootdir/DATA/QRFSI12 | ||
+ | ln -s $rootdir/DATA/topo_bathy | ||
+ | cd ../ | ||
+ | </code> | ||
+ | |||
+ | |||
+ | === Run with CPU === | ||
+ | |||
+ | Example of slurm script (number of CPU cores should be adapted to NPROC_XI and NPROC_ETA: | ||
+ | <code> | ||
#!/bin/bash
#SBATCH -p grant -A g2016a68   # Partition / Account
#SBATCH -n 96                  # Number of CPU cores
#SBATCH --job-name=SPECFEM
#SBATCH -t 23:00:00            # Wall time

# Load modules
module purge
module load batch/slurm
module load mpi/openmpi-basic

echo Master on host `hostname`
echo Time is `date`

# Start time
begin=`date +"%s"`

# Run parameters from the Par_file
BASEMPIDIR=`grep LOCAL_PATH DATA/Par_file | cut -d = -f 2`
NPROC_XI=`grep NPROC_XI DATA/Par_file | cut -d = -f 2`
NPROC_ETA=`grep NPROC_ETA DATA/Par_file | cut -d = -f 2`
NCHUNKS=`grep NCHUNKS DATA/Par_file | cut -d = -f 2`
numcpus=$(( $NCHUNKS * $NPROC_XI * $NPROC_ETA ))

# numcpus should be consistent with the -n option above

mkdir -p OUTPUT_FILES
# back up the files used for this simulation
cp DATA/Par_file OUTPUT_FILES/
cp DATA/STATIONS OUTPUT_FILES/
cp DATA/CMTSOLUTION OUTPUT_FILES/

##
## mesh generation
##
sleep 2

echo
echo `date`
echo "starting MPI mesher on $numcpus processors"
echo

mpirun -np $numcpus bin/xmeshfem3D

echo " mesher done: `date`"
echo

# back up the important files addressing.txt and list*.txt
cp OUTPUT_FILES/*.txt $BASEMPIDIR/

##
## forward simulation
##

# set up addressing
#cp $BASEMPIDIR/addr*.txt OUTPUT_FILES/
#cp $BASEMPIDIR/list*.txt OUTPUT_FILES/

sleep 2

echo
echo `date`
echo starting run in current directory $PWD
echo

mpirun -np $numcpus bin/xspecfem3D
/bin/rm -rf DATABASES_MPI

echo "finished successfully"
echo `date`

# Print time after running
echo Time is `date`
echo Walltime : $(expr `date +"%s"` - $begin)          # Seconds
echo CPUtime : $(squeue -j $SLURM_JOBID -o "%M" -h)    # HH:MM:SS
</code>
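The ''grep''/''cut'' parsing used in the script above can be checked on its own; here is a minimal sketch with a hypothetical ''Par_file'' fragment (the parameter values are examples, not the ones of a real run):

```shell
#!/bin/bash
# Hypothetical Par_file fragment with the parameters read by the job script
cat > Par_file <<'EOF'
NCHUNKS                         = 6
NPROC_XI                        = 4
NPROC_ETA                       = 4
LOCAL_PATH                      = ./DATABASES_MPI
EOF

# Same parsing as in the Slurm script above
NPROC_XI=`grep NPROC_XI Par_file | cut -d = -f 2`
NPROC_ETA=`grep NPROC_ETA Par_file | cut -d = -f 2`
NCHUNKS=`grep NCHUNKS Par_file | cut -d = -f 2`
numcpus=$(( $NCHUNKS * $NPROC_XI * $NPROC_ETA ))
echo "numcpus=$numcpus"
```

The resulting value (here 6 x 4 x 4 = 96) must match the core count requested with ''#SBATCH -n''.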
+ | |||
+ | |||
+ | |||
+ | === Run with GPU === | ||
+ | |||
+ | Example of slurm script (number of nodes should be adapted to the number of GPU per nodes and to NPROC_XI and NPROC_ETA: | ||
+ | <code> | ||
#!/bin/bash
#SBATCH -p pri2015gpu -A eost
#SBATCH -N 3-3                 # Will use 3 nodes
#SBATCH --tasks-per-node 8     # 8 tasks per node
#SBATCH --gres=gpu:8           # We only want nodes with 8 GPUs
#SBATCH --job-name=SPECFEM
#SBATCH -t 12:00:00            # Wall time
#SBATCH --cpu_bind=verbose

# Load modules
module purge
module load batch/slurm
module load compilers/cuda-7.5
module load mpi/openmpi-basic

# ID of each GPU (should be adapted if using a different number of GPUs)
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

echo Master on host `hostname`
echo Time is `date`

# Start time
begin=`date +"%s"`

# Run parameters from the Par_file
BASEMPIDIR=`grep LOCAL_PATH DATA/Par_file | cut -d = -f 2`
NPROC_XI=`grep NPROC_XI DATA/Par_file | cut -d = -f 2`
NPROC_ETA=`grep NPROC_ETA DATA/Par_file | cut -d = -f 2`
NCHUNKS=`grep NCHUNKS DATA/Par_file | cut -d = -f 2`
numgpus=$(( $NCHUNKS * $NPROC_XI * $NPROC_ETA ))

mkdir -p OUTPUT_FILES

# back up the files used for this simulation
cp -pf DATA/Par_file OUTPUT_FILES/
cp -pf DATA/STATIONS OUTPUT_FILES/
cp -pf DATA/CMTSOLUTION OUTPUT_FILES/

##
## mesh generation
##
sleep 2

echo
echo `date`
echo "starting MPI mesher on $numgpus processors"
echo

mpirun -np $numgpus bin/xmeshfem3D

echo " mesher done: `date`"
echo

# back up the important files addressing.txt and list*.txt
cp OUTPUT_FILES/*.txt $BASEMPIDIR/

##
## forward simulation
##

# set up addressing
#cp $BASEMPIDIR/addr*.txt OUTPUT_FILES/
#cp $BASEMPIDIR/list*.txt OUTPUT_FILES/

sleep 2

echo
echo `date`
echo starting run in current directory $PWD
echo

mpirun -np $numgpus bin/xspecfem3D
/bin/rm -rf DATABASES_MPI

echo "finished successfully"
echo `date`

# Print time after running
echo Time is `date`
echo Walltime : $(expr `date +"%s"` - $begin)          # Seconds
echo CPUtime : $(squeue -j $SLURM_JOBID -o "%M" -h)    # HH:MM:SS
</code>
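For GPU runs, the same product NCHUNKS x NPROC_XI x NPROC_ETA also fixes the number of nodes to request; a small sketch, with hypothetical parameter values chosen to match the 3-node, 8-GPU example above:

```shell
#!/bin/bash
# Hypothetical values taken from DATA/Par_file
NCHUNKS=6
NPROC_XI=2
NPROC_ETA=2
gpus_per_node=8                                  # matches --gres=gpu:8

numgpus=$(( NCHUNKS * NPROC_XI * NPROC_ETA ))    # one GPU per MPI task
nodes=$(( (numgpus + gpus_per_node - 1) / gpus_per_node ))   # ceiling division
echo "request $nodes nodes for $numgpus MPI tasks"
```

With these values, 24 MPI tasks need 3 nodes, consistent with ''#SBATCH -N 3-3'' and ''--tasks-per-node 8''.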
+ | |||
+ | |||
+ | |||
+ | ==== Running simulations in parallel ==== | ||
+ | |||
+ | Some instructions to use custom scripts enabling parallel SEM simulations on the HPC cluster | ||
+ | |||
=== Preparing the input files ===

First, create an event list "Events.txt" with 3 columns:
  * 1st column: event_id (will also be the name of the run directory)
  * 2nd column: path to the ''CMTSOLUTION'' file for this event
  * 3rd column: path to the ''STATIONS'' file for this event (can be the same for all events)
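For illustration, a minimal ''Events.txt'' could look like this (hypothetical event IDs and paths, assuming whitespace-separated columns), together with the kind of loop a driver script might use to read it:

```shell
#!/bin/bash
# A hypothetical Events.txt with the three-column format described above
cat > Events.txt <<'EOF'
ev001 /path/to/CMTSOLUTION_ev001 /path/to/STATIONS
ev002 /path/to/CMTSOLUTION_ev002 /path/to/STATIONS
EOF

# Read the three columns, one event per line
while read -r event_id cmtfile stafile; do
  echo "event=$event_id cmt=$cmtfile stations=$stafile"
done < Events.txt
```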
+ | |||
+ | Then you must setup a ''Par_file'' (be careful to use a version of ''Par_file'' that is compatible with your SEM version) | ||
+ | |||
+ | Finally, you must setup hostfiles named ''nodelistN'' files where N=0,...,Np-1 (Np, the number of parallel SEM simulations). These files must specify host names and number of slots per node. Here is an example: | ||
+ | <code> | ||
+ | $ cat nodelist0 | ||
+ | hpc-n443 slots=8 | ||
+ | hpc-n444 slots=8 | ||
+ | hpc-n445 slots=8 | ||
+ | </code> | ||
+ | (see ''/b/home/eost/zac/jobs/specfem/parallelSEM/nodelist0'') | ||
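Rather than writing these hostfiles by hand, they can be generated from a list of reserved nodes; a minimal sketch, assuming hypothetical host names, 8 slots per node, and an even split of the nodes between the runs:

```shell
#!/bin/bash
# Generate one nodelist file per parallel simulation (hypothetical host names)
hosts=(hpc-n443 hpc-n444 hpc-n445 hpc-n446 hpc-n447 hpc-n448)
Np=2                                   # number of parallel SEM simulations
nodes_per_run=$(( ${#hosts[@]} / Np )) # assumes an even split

for (( n=0; n<Np; n++ )); do
  : > nodelist$n                       # truncate/create the hostfile
  for (( i=0; i<nodes_per_run; i++ )); do
    echo "${hosts[$(( n*nodes_per_run + i ))]} slots=8" >> nodelist$n
  done
done
```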
+ | |||
+ | === Running the simulations in parallel === | ||
+ | |||
+ | Parallel SEM simulations are handled using 3 scripts: | ||
+ | * ''parallelSEM.sh'': is the main script, that compiles the code and run the simulations | ||
+ | * ''run_gpu_nodelist.sh'' is the script used to run the mesher and solver | ||
+ | * ''sleep.slurm'' is a script to reserve the GPU nodes | ||
+ | All these scripts are available in ''/b/home/eost/zac/jobs/specfem/parallelSEM'' | ||
+ | |||
+ | Before running your job, make sure that the input parameters in ''parallelSEM.sh'' are consistent with the input parameters stated above (see ''INPUT PARAMS'' in the main script). Specifically: | ||
+ | * ''SPECFEMDIR'': path to SPECFEM3D_GLOBE directory | ||
+ | * ''Par_file'': path to the Par_file used in simulations | ||
+ | * ''Nparallel'': Number of SEM simulations in parallel (make sure enough GPUs are available) | ||
+ | * ''event_list'': List of events with the format given above | ||
+ | |||

Then run your simulations:
<code>
./parallelSEM.sh
</code>
The script will make sure that the GPU nodes are available before launching SPECFEM3D_GLOBE.