Using Ant

Login

There is 1 login node:

Hostname    Node type
ant         Ant login node
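
To connect, use SSH with your cluster account. A minimal example, assuming the node is reachable as ant from your network (substitute the fully qualified hostname if needed):

ssh <username>@ant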

Host key fingerprints:

Algorithm   Fingerprint (SHA256)
RSA         SHA256:JOg7saslfaqZdPVy8sTv2qoWy/cCFgTIvADhzj6cHfw
ECDSA       SHA256:/fY0bZIwZ6O6+5CWAvgL79+AoxMlOelhdb71ecskKfE
ED25519     SHA256:luxPt965f5utw+7WkTvs9fJwMu93+vAktFFQA0WJHI8
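
On first connection, compare the fingerprint reported by your SSH client against this table before accepting the host key. One way to print the server's current key fingerprints (an illustrative command, assuming ant is directly reachable; adjust the hostname if needed):

ssh-keyscan ant 2>/dev/null | ssh-keygen -lf -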

Building software

ESPResSo

Release 4.3:

# last update: March 2024
module load spack/default gcc/12.3.0 cuda/12.3.0 openmpi/4.1.6 \
            fftw/3.3.10 boost/1.83.0 cmake/3.27.9 python/3.12.1

git clone --recursive --branch python --origin upstream \
    https://github.com/espressomd/espresso.git espresso-4.3
cd espresso-4.3
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -c "requirements.txt" numpy scipy vtk h5py setuptools cython==3.0.6
mkdir build
cd build
cp ../maintainer/configs/maxset.hpp myconfig.hpp
sed -i "/ADDITIONAL_CHECKS/d" myconfig.hpp  # disable the costly ADDITIONAL_CHECKS feature
cmake .. -D CMAKE_BUILD_TYPE=Release -D ESPRESSO_BUILD_WITH_CCACHE=OFF \
    -D ESPRESSO_BUILD_WITH_CUDA=ON -D CMAKE_CUDA_ARCHITECTURES="86" \
    -D CUDAToolkit_ROOT="${CUDA_HOME}" \
    -D ESPRESSO_BUILD_WITH_WALBERLA=ON -D ESPRESSO_BUILD_WITH_WALBERLA_AVX=ON \
    -D ESPRESSO_BUILD_WITH_SCAFACOS=OFF -D ESPRESSO_BUILD_WITH_HDF5=OFF
make -j 64
# make the build tree visible to the virtual environment via a .pth file
SITE_PACKAGES=$(python3 -c 'import sysconfig; print(sysconfig.get_path("platlib"))')
echo $(realpath ./src/python) > "${SITE_PACKAGES}/espresso.pth"
deactivate
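
As a quick sanity check (an optional step, not part of the build instructions above), confirm that the virtual environment picks up the freshly built module:

cd ..                     # back to the espresso-4.3 directory
source venv/bin/activate
python3 -c "import espressomd; print(espressomd.__file__)"
deactivate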

Loading software

With EESSI:

# last update: October 2024
[user@ant ~]$ source /cvmfs/software.eessi.io/versions/2023.06/init/bash
{EESSI 2023.06} [user@ant ~]$ module load ESPResSo/4.2.2-foss-2023b
{EESSI 2023.06} [user@ant ~]$ module load pyMBE/0.8.0-foss-2023b
{EESSI 2023.06} [user@ant ~]$ python3 -c "import pyMBE"
{EESSI 2023.06} [user@ant ~]$ python3 -c "import espressomd"
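
If you rely on EESSI regularly, you can source its init script from your shell startup file so it is available in every session (an optional convenience; adjust the path if you target a different EESSI version):

echo 'source /cvmfs/software.eessi.io/versions/2023.06/init/bash' >> ~/.bashrc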

With Spack:

# last update: October 2024
module load spack/default
module load gcc/12.3.0 openmpi/4.1.6 cuda/12.3.0
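
To confirm the toolchain is active after loading the modules (an optional check):

module list
mpicc --version
nvcc --version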

Submitting jobs

Caution

The default walltime for jobs on ant is set to 10 minutes. For longer jobs, explicitly set the walltime in your SLURM script. Similarly, the default RAM per allocated CPU is set to 2GB. Adapt your SLURM script if you require more memory!

#SBATCH --time=05:00:00  # for 5 hours
#SBATCH --mem-per-cpu=5G  # for 5GB per allocated CPU

Batch command:

sbatch --job-name="test" --nodes=1 --ntasks=4 --mem-per-cpu=2GB job.sh
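
Once submitted, the usual SLURM tools show the job state (illustrative commands; replace <jobid> with the ID printed by sbatch):

squeue -u $USER
sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS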

Job script:

#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --output %j.stdout
#SBATCH --error  %j.stderr
module load spack/default gcc/12.3.0 cuda/12.3.0 openmpi/4.1.6 \
            fftw/3.3.10 boost/1.83.0 python/3.12.1
source espresso-4.3/venv/bin/activate
srun --cpu-bind=cores python3 espresso-4.3/testsuite/python/particle.py
deactivate
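
If the job needs a GPU, also request one in the script header, using the same --gres syntax as in the benchmark below (a minimal sketch):

#SBATCH --gres=gpu:1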

Benchmarks

Multi-GPU job

Run mpi4py on multiple nodes, with one MPI rank per GPU, and make only one GPU visible to each rank.

Environment (assumes the gcc, openmpi and cuda modules from the Loading software section are loaded):

python3 -m venv venv
. venv/bin/activate
pip install mpi4py pycuda "numpy<2"

Launcher (gpu_vis_wrapper):

#!/bin/bash
# map each MPI rank to one GPU on its node, then run the wrapped command
CUDA_VISIBLE_DEVICES=$((${SLURM_PROCID} % ${SLURM_GPUS_ON_NODE})) "$@"

Executor (list_cuda.py):

from mpi4py import MPI
import pycuda.driver as cuda
import os

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
cuda_visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES", "")
host_name = os.environ.get("SLURMD_NODENAME", "")
print(f"{rank=} {host_name=} {cuda_visible_devices=}")
# creating a CUDA context here sees only the device selected by CUDA_VISIBLE_DEVICES
import pycuda.autoinit  # noqa: F401
cuda.init()
device_count = cuda.Device.count()
print(f"{rank=} {host_name=} Number of CUDA devices available: {device_count}")
for i in range(device_count):
    device = cuda.Device(i)
    print(f"{rank=} {host_name=} Device {i}: {device.name()} - Memory: {device.total_memory() // (1024**2)} MB")

Output:

$ srun --nodes=2 -J mpi4py --ntasks-per-node=2 --gres=gpu:2 --mem-per-cpu=100MB \
       --time=00:02:00 bash ./gpu_vis_wrapper python3 ./list_cuda.py
rank=0 host_name='compute02' cuda_visible_devices='0'
rank=0 host_name='compute02' Number of CUDA devices available: 1
rank=0 host_name='compute02' Device 0: NVIDIA L4 - Memory: 22478 MB
rank=1 host_name='compute02' cuda_visible_devices='1'
rank=1 host_name='compute02' Number of CUDA devices available: 1
rank=1 host_name='compute02' Device 0: NVIDIA L4 - Memory: 22478 MB
rank=2 host_name='compute03' cuda_visible_devices='0'
rank=2 host_name='compute03' Number of CUDA devices available: 1
rank=2 host_name='compute03' Device 0: NVIDIA L4 - Memory: 22478 MB
rank=3 host_name='compute03' cuda_visible_devices='1'
rank=3 host_name='compute03' Number of CUDA devices available: 1
rank=3 host_name='compute03' Device 0: NVIDIA L4 - Memory: 22478 MB