Using Ant

Login

There is 1 login node:

Hostname    Node type
ant         Ant login node
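
To connect, use SSH with your cluster account. A minimal example, assuming the node is reachable as ant from your network (substitute the fully qualified hostname if needed):

ssh <username>@ant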

Host key fingerprints:

Algorithm   Fingerprint (SHA256)
RSA         SHA256:JOg7saslfaqZdPVy8sTv2qoWy/cCFgTIvADhzj6cHfw
ECDSA       SHA256:/fY0bZIwZ6O6+5CWAvgL79+AoxMlOelhdb71ecskKfE
ED25519     SHA256:luxPt965f5utw+7WkTvs9fJwMu93+vAktFFQA0WJHI8
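
On first connection, compare the fingerprint reported by your SSH client against this table before accepting the host key. One way to print the server's current key fingerprints (an illustrative command, assuming ant is directly reachable; adjust the hostname if needed):

ssh-keyscan ant 2>/dev/null | ssh-keygen -lf -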

Building software

ESPResSo

Release 4.3:

# last update: March 2024
module load spack/default gcc/12.3.0 cuda/12.3.0 openmpi/4.1.6 \
            fftw/3.3.10 boost/1.83.0 cmake/3.27.9 python/3.12.1

git clone --recursive --branch python --origin upstream \
    https://github.com/espressomd/espresso.git espresso-4.3
cd espresso-4.3
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -c "requirements.txt" numpy scipy vtk h5py setuptools cython==3.0.6
mkdir build
cd build
cp ../maintainer/configs/maxset.hpp myconfig.hpp
sed -i "/ADDITIONAL_CHECKS/d" myconfig.hpp  # disable the costly ADDITIONAL_CHECKS feature
cmake .. -D CMAKE_BUILD_TYPE=Release -D ESPRESSO_BUILD_WITH_CCACHE=OFF \
    -D ESPRESSO_BUILD_WITH_CUDA=ON -D CMAKE_CUDA_ARCHITECTURES="86" \
    -D CUDAToolkit_ROOT="${CUDA_HOME}" \
    -D ESPRESSO_BUILD_WITH_WALBERLA=ON -D ESPRESSO_BUILD_WITH_WALBERLA_AVX=ON \
    -D ESPRESSO_BUILD_WITH_SCAFACOS=OFF -D ESPRESSO_BUILD_WITH_HDF5=OFF
make -j 64
# make the build tree visible to the virtual environment via a .pth file
SITE_PACKAGES=$(python3 -c 'import sysconfig; print(sysconfig.get_path("platlib"))')
echo $(realpath ./src/python) > "${SITE_PACKAGES}/espresso.pth"
deactivate
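
As a quick sanity check (an optional step, not part of the build instructions above), confirm that the virtual environment picks up the freshly built module:

cd ..                     # back to the espresso-4.3 directory
source venv/bin/activate
python3 -c "import espressomd; print(espressomd.__file__)"
deactivate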

Loading software

With EESSI:

# last update: October 2024
[user@ant ~]$ source /cvmfs/software.eessi.io/versions/2023.06/init/bash
{EESSI 2023.06} [user@ant ~]$ module load ESPResSo/4.2.2-foss-2023b
{EESSI 2023.06} [user@ant ~]$ module load pyMBE/0.8.0-foss-2023b
{EESSI 2023.06} [user@ant ~]$ python3 -c "import pyMBE"
{EESSI 2023.06} [user@ant ~]$ python3 -c "import espressomd"
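
If you rely on EESSI regularly, you can source its init script from your shell startup file so it is available in every session (an optional convenience; adjust the path if you target a different EESSI version):

echo 'source /cvmfs/software.eessi.io/versions/2023.06/init/bash' >> ~/.bashrc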

With Spack:

# last update: October 2024
module load spack/default
module load gcc/12.3.0 openmpi/4.1.6 cuda/12.3.0
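
To confirm the toolchain is active after loading the modules (an optional check):

module list
mpicc --version
nvcc --version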

Submitting jobs

Caution

The default walltime for jobs on ant is set to 10 minutes. For longer jobs, explicitly set the walltime in your SLURM script. Similarly, the default RAM per allocated CPU is set to 2GB. Adapt your SLURM script if you require more memory!

#SBATCH --time=05:00:00  # for 5 hours
#SBATCH --mem-per-cpu=5G  # for 5GB per allocated CPU

Batch command:

sbatch --job-name="test" --nodes=1 --ntasks=4 --mem-per-cpu=2GB job.sh
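
Once submitted, the usual SLURM tools show the job state (illustrative commands; replace <jobid> with the ID printed by sbatch):

squeue -u $USER
sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS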

Job script:

#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --output %j.stdout
#SBATCH --error  %j.stderr
module load spack/default gcc/12.3.0 cuda/12.3.0 openmpi/4.1.6 \
            fftw/3.3.10 boost/1.83.0 python/3.12.1
source espresso-4.3/venv/bin/activate
srun --cpu-bind=cores python3 espresso-4.3/testsuite/python/particle.py
deactivate
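
If the job needs a GPU, also request one in the script header, using the same --gres syntax as in the benchmark below (a minimal sketch):

#SBATCH --gres=gpu:1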

Benchmarks

Multi-GPU job

Run mpi4py on multiple nodes, with one MPI rank per GPU, and make only one GPU visible to each rank.

Environment (assumes the gcc, openmpi and cuda modules from the Loading software section are loaded):

python3 -m venv venv
. venv/bin/activate
pip install mpi4py pycuda "numpy<2"

Launcher (gpu_vis_wrapper):

#!/bin/bash
# map each MPI rank to one GPU on its node, then run the wrapped command
CUDA_VISIBLE_DEVICES=$((${SLURM_PROCID} % ${SLURM_GPUS_ON_NODE})) "$@"

Executor (list_cuda.py):

from mpi4py import MPI
import pycuda.driver as cuda
import os

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
cuda_visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES", "")
host_name = os.environ.get("SLURMD_NODENAME", "")
print(f"{rank=} {host_name=} {cuda_visible_devices=}")
# creating a CUDA context here sees only the device selected by CUDA_VISIBLE_DEVICES
import pycuda.autoinit  # noqa: F401
cuda.init()
device_count = cuda.Device.count()
print(f"{rank=} {host_name=} Number of CUDA devices available: {device_count}")
for i in range(device_count):
    device = cuda.Device(i)
    print(f"{rank=} {host_name=} Device {i}: {device.name()} - Memory: {device.total_memory() // (1024**2)} MB")

Output:

$ srun --nodes=2 -J mpi4py --ntasks-per-node=2 --gres=gpu:2 --mem-per-cpu=100MB \
       --time=00:02:00 bash ./gpu_vis_wrapper python3 ./list_cuda.py
rank=0 host_name='compute02' cuda_visible_devices='0'
rank=0 host_name='compute02' Number of CUDA devices available: 1
rank=0 host_name='compute02' Device 0: NVIDIA L4 - Memory: 22478 MB
rank=1 host_name='compute02' cuda_visible_devices='1'
rank=1 host_name='compute02' Number of CUDA devices available: 1
rank=1 host_name='compute02' Device 0: NVIDIA L4 - Memory: 22478 MB
rank=2 host_name='compute03' cuda_visible_devices='0'
rank=2 host_name='compute03' Number of CUDA devices available: 1
rank=2 host_name='compute03' Device 0: NVIDIA L4 - Memory: 22478 MB
rank=3 host_name='compute03' cuda_visible_devices='1'
rank=3 host_name='compute03' Number of CUDA devices available: 1
rank=3 host_name='compute03' Device 0: NVIDIA L4 - Memory: 22478 MB