Simulation software scripts and tips
Warning
Please understand that we do not wish to provide general documentation on how to use a given tool (say, generic GROMACS concepts). That is left to the developers of the tools and we will try not to redo what already exists. That said, we wish to detail our machine's specificities and which options one should select to best exploit the hardware.
CINES software stack
Amber
GPU
Molecular dynamics jobs run fastest using pmemd.hip (which is single GPU). At the moment, Amber is found not to scale well on Adastra. That said, we obtain competitive performance on 1 GPU, which is what we recommend.
#!/bin/bash
#SBATCH --account=<account_to_charge>
#SBATCH --job-name="<job_name>"
#SBATCH --constraint=MI250
#SBATCH --nodes=1
# #SBATCH --exclusive # Shared!
#SBATCH --gpus-per-node=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --threads-per-core=1
#SBATCH --time=1:00:00
module purge
module load archive
module load GCC-GPU-2.1.0
module load amber
module list
NUM_THREADS=1
# Not meaningful on non-MPI (pmemd.hip vs pmemd.hip.mpi) jobs but required on
# multi-process ones (see the multi-GPU sketch after this script)!
# export MPICH_GPU_SUPPORT_ENABLED=1
srun --ntasks-per-node=1 --cpus-per-task="${NUM_THREADS}" --threads-per-core=1 --gpu-bind=closest --label \
-- pmemd.hip -O -i <input.0> -o mdout -p <input.1> -c <input.2>
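Should you nevertheless want to try the multi-GPU build (we observed poor scaling, as noted above), the sketch below shows the lines that change. The binary name (pmemd.hip.mpi vs pmemd.hip.MPI, depending on the installation) and the 8-GCD layout are assumptions to verify against your Amber install.
# Hypothetical multi-GPU variant: also set #SBATCH --gpus-per-node=8 and
# #SBATCH --ntasks-per-node=8, then enable GPU-aware MPI.
export MPICH_GPU_SUPPORT_ENABLED=1
srun --ntasks-per-node=8 --cpus-per-task=1 --threads-per-core=1 --gpu-bind=closest --label \
-- pmemd.hip.mpi -O -i <input.0> -o mdout -p <input.1> -c <input.2>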
A performance comparison of Amber on GPUs is shown below. It was established in October 2023 using Amber 22. We observe performance competitive with same-generation GPUs (H100).

CP2K
GPU
#!/bin/bash
#SBATCH --account=<account_to_charge>
#SBATCH --job-name="<job_name>"
#SBATCH --constraint=MI250
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --time=1:00:00
module purge
module load archive
module load GCC-GPU-2.1.0
module load cp2k/2023.1-mpi-elpa-omp-plumed-scalapack-python3
module list
# export OMP_DISPLAY_AFFINITY=TRUE
export OMP_NUM_THREADS=8
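# cp2k.psmp is the MPI + OpenMP binary; 8 ranks x 8 threads fills the
# 64 cores of an MI250 node, one rank per GCD.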
srun --ntasks-per-node=8 --cpus-per-task="${OMP_NUM_THREADS}" --threads-per-core=1 --gpu-bind=closest --label \
-- cp2k.psmp <case.inp>
CPU
#!/bin/bash
#SBATCH --account=<account_to_charge>
#SBATCH --job-name="<job_name>"
#SBATCH --constraint=GENOA
#SBATCH --nodes=2
#SBATCH --exclusive
#SBATCH --time=1:00:00
module purge
module load archive
module load GCC-CPU-2.1.0
module load cp2k
module list
# export OMP_DISPLAY_AFFINITY=TRUE
export OMP_PROC_BIND=CLOSE
export OMP_PLACES=THREADS
export OMP_NUM_THREADS=4
srun --ntasks-per-node=48 --cpus-per-task="${OMP_NUM_THREADS}" --threads-per-core=1 --label \
-- cp2k.psmp <case.inp>
DFTB+
CPU
#!/bin/bash
#SBATCH --account=<account_to_charge>
#SBATCH --job-name="<job_name>"
#SBATCH --constraint=GENOA
#SBATCH --nodes=1
# #SBATCH --exclusive # Shared, it DOES NOT SCALE past one core!
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --threads-per-core=1
#SBATCH --time=1:00:00
module purge
module load archive
module load CCE-CPU-2.1.0
module load dftbplus
module list
# export OMP_DISPLAY_AFFINITY=TRUE
export OMP_PROC_BIND=CLOSE
export OMP_PLACES=THREADS
export OMP_NUM_THREADS=1
srun --ntasks-per-node=1 --cpus-per-task="${OMP_NUM_THREADS}" --threads-per-core=1 --label \
-- dftb+
GROMACS
Note
You may find this website interesting for selecting the software and SLURM options.
Warning
Different versions may have significant differences in simulation methods and default parameters. Reproducing results of older versions with a newer version may not be straightforward.
Note
If you have used a combination of GROMACS options that work best for a specific use case, please let us know so we can collectively benefit from your work.
External documentation can be found in the official manual <https://manual.gromacs.org/documentation/>; this document <https://docs.alliancecan.ca/wiki/GROMACS> is also good at explaining the inefficiencies of GROMACS.
GPU
Warning
GROMACS is known not to scale well across multiple GPUs.
#!/bin/bash
#SBATCH --account=<account_to_charge>
#SBATCH --job-name="<job_name>"
#SBATCH --constraint=MI250
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --time=1:00:00
module purge
module load CCE-GPU-3.0.0
module load gromacs/2023_amd-mpi-omp-plumed-python3
module list
# export OMP_DISPLAY_AFFINITY=TRUE
export OMP_PROC_BIND=CLOSE
export OMP_PLACES=THREADS
export OMP_NUM_THREADS=8
# Depending on your case, you may want to activate or deactivate the
# variables below:
# export GMX_ENABLE_DIRECT_GPU_COMM=1
# export GMX_FORCE_GPU_AWARE_MPI=1
# export MPICH_GPU_SUPPORT_ENABLED=1
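# "-nb gpu", "-pme gpu" and "-bonded gpu" offload the respective kernels to
# the GPU, "-npme 1" dedicates one rank to PME, "-update auto" lets mdrun
# decide where to run the integration, and "-pin off" leaves thread affinity
# to SLURM.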
srun --ntasks-per-node=8 --cpus-per-task="${OMP_NUM_THREADS}" --threads-per-core=1 --gpu-bind=closest --label \
-- gmx_mpi mdrun \
-s <case.tpr> \
-nsteps 1000 \
-pin off \
-nb gpu \
-npme 1 \
-pme gpu \
-bonded gpu \
-update auto
CPU
#!/bin/bash
#SBATCH --account=<account_to_charge>
#SBATCH --job-name="<job_name>"
#SBATCH --constraint=GENOA
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --time=1:00:00
module purge
module load archive
module load CCE-CPU-2.1.0
module load gromacs
module list
# export OMP_DISPLAY_AFFINITY=TRUE
export OMP_PROC_BIND=CLOSE
export OMP_PLACES=THREADS
export OMP_NUM_THREADS=2
srun --ntasks-per-node=96 --cpus-per-task="${OMP_NUM_THREADS}" --threads-per-core=1 --label \
-- gmx_mpi mdrun \
-s <case.tpr> \
-plumed <plumed_configuration.dat> \
-nsteps <step_count> \
-pin off
LAMMPS
GPU
#!/bin/bash
#SBATCH --account=<account_to_charge>
#SBATCH --job-name="<job_name>"
#SBATCH --constraint=MI250
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --time=1:00:00
module purge
module load CCE-GPU-3.1.0
module load lammps
module list
export MPICH_GPU_SUPPORT_ENABLED=1
export NUM_THREADS=1
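# "-k on g 8" enables Kokkos with 8 GPUs per node, "-sf kk" applies the
# Kokkos suffix to the styles, "-i" selects the input script.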
srun --nodes=1 --ntasks-per-node=8 --cpus-per-task="${NUM_THREADS}" --threads-per-core=1 --gpu-bind=closest --label \
-- lmp -sf kk -k on g 8 -i in.lj
CPU
#!/bin/bash
#SBATCH --account=<account_to_charge>
#SBATCH --job-name="<job_name>"
#SBATCH --constraint=GENOA
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --time=1:00:00
module purge
module load archive
module load CCE-CPU-2.1.0
module load lammps
module list
# export OMP_DISPLAY_AFFINITY=TRUE
export OMP_PROC_BIND=CLOSE
export OMP_PLACES=THREADS
export OMP_NUM_THREADS=2
srun --ntasks-per-node=192 --cpus-per-task="${OMP_NUM_THREADS}" --threads-per-core=2 --label \
-- lmp -sf omp -i <case>
NAMD
Note
You may find this website interesting for selecting the software and SLURM options.
GPU
The NAMD GPU binary is single-node, multi-GPU.
#!/bin/bash
#SBATCH --account=<account_to_charge>
#SBATCH --job-name="<job_name>"
#SBATCH --constraint=MI250
#SBATCH --nodes=1 # Only supports single-node (non-MPI) computations.
#SBATCH --exclusive
#SBATCH --time=1:00:00
module purge
module load archive
module load CPE-22.11-gcc-11.2.0-softs
module load namd/3.0a9-mpi
module list
# As many cores as GPUs.
NUM_THREADS=8
srun --ntasks-per-node=1 --cpus-per-task="${NUM_THREADS}" --threads-per-core=1 --cpu-bind=none --mem-bind=none --label \
-- namd3 +p "${NUM_THREADS}" +setcpuaffinity --CUDASOAintegrate on +devices 0,1,2,3,4,5,6,7 <case>
CPU
#!/bin/bash
#SBATCH --account=<account_to_charge>
#SBATCH --job-name="<job_name>"
#SBATCH --constraint=GENOA
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --time=1:00:00
module purge
module load archive
module load GCC-CPU-2.1.0
module load namd
module list
NUM_THREADS=4
srun --ntasks-per-node=48 --cpus-per-task="${NUM_THREADS}" --threads-per-core=1 --label \
-- namd3 +p "${NUM_THREADS}" +setcpuaffinity <case>
Quantum ESPRESSO
CPU
#!/bin/bash
#SBATCH --account=<account_to_charge>
#SBATCH --job-name="<job_name>"
#SBATCH --constraint=GENOA
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --time=1:00:00
module purge
module load develop GCC-CPU-3.1.0
module load quantum-espresso
module list
# export OMP_DISPLAY_AFFINITY=TRUE
export OMP_PROC_BIND=CLOSE
export OMP_PLACES=THREADS
export OMP_NUM_THREADS=1
srun --ntasks-per-node=192 --cpus-per-task="${OMP_NUM_THREADS}" --threads-per-core=1 --label \
-- pw.x < <case>
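Note that pw.x also accepts the input file through a command-line flag (-inp, per the Quantum ESPRESSO documentation), which avoids relying on stdin redirection through srun:
srun --ntasks-per-node=192 --cpus-per-task="${OMP_NUM_THREADS}" --threads-per-core=1 --label \
-- pw.x -inp <case>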
Other simulation software
Boltz
One possible way to use Boltz on Adastra is to separate MSA generation from structure prediction, i.e. use compute_msa to generate MSA files for your sequences on the login node (or wherever you have internet access), specify those files in the input FASTA files, and run predictions without the --use_msa_server flag. Unless you do that, Boltz will try to download files from the internet while running on a compute node, which has no access to https://api.colabfold.com. You can ask further questions on the subject here: https://github.com/jwohlwend/boltz/issues/176#issuecomment-2646006610
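A minimal sketch of this two-step workflow is given below. The >CHAIN_ID|ENTITY_TYPE|MSA_PATH header convention comes from the Boltz README; the sequence, file names and output directory are placeholders, and the exact interface of compute_msa should be checked against your installed Boltz version.
# Step 1 (login node, internet access): generate the MSA files, e.g. with
# compute_msa or any ColabFold-compatible tooling, producing for instance
# ./msa/my_protein.a3m.
# Step 2: reference the precomputed MSA in the FASTA header.
cat > input.fasta << 'EOF'
>A|protein|./msa/my_protein.a3m
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
EOF
# Step 3 (compute node, no internet): predict without --use_msa_server.
boltz predict input.fasta --out_dir <output_directory>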
VASP
CPU
An example module environment is given below. This is the one we use for the compilation described below and also when running on the CPU partition:
$ module purge
$ module load cpe/24.07
$ module load craype-x86-genoa
$ module load PrgEnv-gnu
$ module load cray-hdf5 cray-fftw
$ module list
Compilation
CINES provides a makefile.include machine file for the environment defined above. The makefile.include file is to be placed in the VASP repository's directory (next to the makefile and README.md); its content is:
# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxGNU\" \
-DMPI -DMPI_BLOCK=8000 -Duse_collective \
-DscaLAPACK \
-DCACHE_SIZE=4000 \
-Davoidalloc \
-Dvasp6 \
-Duse_bse_te \
-Dtbdyn \
-Dfock_dblbuf \
-D_OPENMP \
-DMPI_INPLACE
CPP = cc -E -C -w $*$(FUFFIX) >$*$(SUFFIX) $(CPP_OPTIONS)
CC = cc -fopenmp
FC = ftn -fopenmp
FCL = ftn -fopenmp
FREE = -ffree-form -ffree-line-length-none
FFLAGS = -w -ffpe-summary=none
OFLAG = -O2 # Or O3
OFLAG_IN = $(OFLAG)
DEBUG = -O0
# NOTE: you may want to comment the three lines below if you want to build
# VASP 6.5
OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
OBJECTS_O1 += fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o
# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = $(FC)
CC_LIB = $(CC)
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB = $(FREE)
OBJECTS_LIB = linpack_double.o
# For the parser library
CXX_PARS = CC
# We need this because we link using `ftn`, which does not link the C++
# standard library (obviously).
LLIBS = -lstdc++
##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##
# When compiling on the target machine itself, change this to the
# relevant target when cross-compiling for another architecture
# Implicit with Cray wrappers:
# VASP_TARGET_CPU ?= -march=zen4
# For gcc-10 and higher (VASP is broken and non-standard)
FFLAGS += -fallow-argument-mismatch
# BLAS (mandatory)
# Implicit with Cray wrappers.
# LAPACK (mandatory)
# Implicit with Cray wrappers.
# scaLAPACK (mandatory)
# Implicit with Cray wrappers.
# HDF5-support (optional but strongly recommended)
CPP_OPTIONS+= -DVASP_HDF5
# Implicit with Cray wrappers.
# For the fftlib library (recommended)
CPP_OPTIONS += -Dsysv
FCL += fftlib.o
CXX_FFTLIB = CC -fopenmp -std=c++11 -DFFTLIB_THREADSAFE
INCS_FFTLIB = -I./include
LIBS += fftlib
LLIBS += -ldl
Then run the following commands:
$ rm -rf build
$ mkdir -p bin
$ make -j DEPS=1 VASP_BUILD_DIR=build all
The binaries will be placed in the bin directory inside the VASP repository's directory.
Warning
Before compiling, ensure you do not have older binaries in the bin directory; remove them, for instance, as sketched below.
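A minimal sketch (the binary names assume the standard vasp_std, vasp_gam and vasp_ncl build targets):
$ rm -f bin/vasp_std bin/vasp_gam bin/vasp_ncl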
Running
#!/bin/bash
#SBATCH --account=<account_to_charge>
#SBATCH --job-name="<job_name>"
#SBATCH --constraint=GENOA
#SBATCH --nodes=2
#SBATCH --exclusive
#SBATCH --time=1:00:00
module purge
module load cpe/24.07
module load craype-x86-genoa
module load PrgEnv-gnu
module load cray-hdf5 cray-fftw
module list
# export OMP_DISPLAY_AFFINITY=TRUE
export OMP_PROC_BIND=CLOSE
export OMP_PLACES=THREADS
export OMP_NUM_THREADS=1
srun --ntasks-per-node=192 --cpus-per-task="${OMP_NUM_THREADS}" --threads-per-core=1 --label \
-- vasp_std