.. _Using bwForCluster:

Using bwForCluster
==================

.. _bwForCluster login:

Login
-----

There are several gateways that redirect to any of the login nodes in a
load-balanced way:

======================================== ================================================================
Hostname                                   Node type
======================================== ================================================================
``helix.bwservices.uni-heidelberg.de``     login to one of the two Helix login nodes
``justus2.uni-ulm.de``                     login to one of the four JUSTUS 2 login nodes
``justus2-vis.uni-ulm.de``                 login to one of the two JUSTUS VNC visualization login nodes
======================================== ================================================================

Host key fingerprint for Helix:

========= ========================================================
Algorithm Fingerprint (SHA256)
========= ========================================================
RSA       ``SHA256:mFBTLtf0a4xSyrMh6x1A8Ah8FzAD0ZRHCo0mkivrYsU``
ECDSA     ``SHA256:yMNcTOEsgtUoglxiJSaXtqx+pJPo3Wc8zxdG0aZeNdA``
ED25519   ``SHA256:3+Kq1tHmhjOkuAYsDttaacGXNasWe5JsrwgSWhJcGdY``
========= ========================================================

Your username for the cluster will be your ICP ID with an ``st_`` prefix.
For example, if your ID is ``ac123456``, then your Helix username will be
``st_ac123456``. More details can be found in the
`wiki pages of the clusters `__.

.. _bwForCluster building dependencies:

Building dependencies
---------------------

Python
^^^^^^

.. code-block:: bash

    # last update: December 2024
    module load compiler/gnu/12.1 mpi/openmpi/4.1
    CLUSTER_PYTHON_VERSION=3.12.4
    curl -L https://www.python.org/ftp/python/${CLUSTER_PYTHON_VERSION}/Python-${CLUSTER_PYTHON_VERSION}.tgz | tar xz
    cd Python-${CLUSTER_PYTHON_VERSION}/
    ./configure --enable-optimizations --with-lto --prefix="${HOME}/bin/cpython-${CLUSTER_PYTHON_VERSION}"
    make -j 10
    make install
    make clean

Boost
^^^^^

.. code-block:: bash

    # last update: December 2024
    module load compiler/gnu/12.1 mpi/openmpi/4.1
    mkdir boost-build
    cd boost-build
    BOOST_VERSION=1.82.0
    BOOST_DOMAIN="https://boostorg.jfrog.io/artifactory/main"
    BOOST_ROOT="${HOME}/bin/boost_mpi_${BOOST_VERSION//./_}"
    mkdir -p "${BOOST_ROOT}"
    curl -sL "${BOOST_DOMAIN}/release/${BOOST_VERSION}/source/boost_${BOOST_VERSION//./_}.tar.bz2" | tar xj
    cd "boost_${BOOST_VERSION//./_}"
    echo 'using mpi ;' > tools/build/src/user-config.jam
    ./bootstrap.sh --with-libraries=filesystem,system,mpi,serialization,test
    ./b2 -j 4 install --prefix="${BOOST_ROOT}"

FFTW
^^^^

.. code-block:: bash

    # last update: December 2024
    module load compiler/gnu/12.1 mpi/openmpi/4.1
    mkdir fftw-build
    cd fftw-build
    FFTW3_VERSION=3.3.10
    FFTW3_ROOT="${HOME}/bin/fftw_${FFTW3_VERSION//./_}"
    curl -sL "https://www.fftw.org/fftw-${FFTW3_VERSION}.tar.gz" | tar xz
    cd "fftw-${FFTW3_VERSION}"
    for floating_point in "" "--enable-float"; do
        ./configure --enable-shared --enable-mpi --enable-threads --enable-openmp \
            --disable-fortran --enable-avx --prefix="${FFTW3_ROOT}" ${floating_point}
        make -j 10
        make install
        make clean
    done

CUDA
^^^^

.. code-block:: bash

    # last update: August 2023
    module load compiler/gnu/12.1 devel/cuda/12.1
    export CLUSTER_CUDA_ROOT="${HOME}/bin/cuda_12_1"
    mkdir -p "${CLUSTER_CUDA_ROOT}/lib"
    ln -s "${CUDA_HOME}/targets/x86_64-linux/lib/stubs/libcuda.so" "${CLUSTER_CUDA_ROOT}/lib/libcuda.so"
    ln -s "${CUDA_HOME}/targets/x86_64-linux/lib/stubs/libcuda.so" "${CLUSTER_CUDA_ROOT}/lib/libcuda.so.1"
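Before moving on to the software builds, it is worth checking that the
dependencies ended up where the recipes below expect them. The following is a
minimal sanity check, assuming the version numbers and ``${HOME}/bin`` install
prefixes used above; adjust the paths if you chose different versions or
locations.

.. code-block:: bash

    # minimal sanity check of the self-built toolchain; all paths assume
    # the versions and install prefixes from the recipes above
    "${HOME}/bin/cpython-3.12.4/bin/python3" --version
    ls "${HOME}/bin/boost_mpi_1_82_0/lib" | grep boost_mpi
    ls "${HOME}/bin/fftw_3_3_10/lib" | grep -E '^libfftw3f?\.so'
    ls -l "${HOME}/bin/cuda_12_1/lib"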
.. _bwForCluster building software:

Building software
-----------------

ESPResSo
^^^^^^^^

Release 4.2:

.. code-block:: bash

    # last update: August 2023
    module load compiler/gnu/12.1 mpi/openmpi/4.1 devel/cmake/3.24.1 devel/cuda/12.1
    CLUSTER_FFTW3_VERSION=3.3.10
    CLUSTER_BOOST_VERSION=1.82.0
    export BOOST_ROOT="${HOME}/bin/boost_mpi_${CLUSTER_BOOST_VERSION//./_}"
    export FFTW3_ROOT="${HOME}/bin/fftw_${CLUSTER_FFTW3_VERSION//./_}"
    export CUDA_HOME="${CUDA_PATH}"
    export CUDA_ROOT="${CUDA_PATH}"
    export LD_LIBRARY_PATH="${BOOST_ROOT}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH="${FFTW3_ROOT}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}${CUDA_HOME}/targets/x86_64-linux/lib/stubs"
    git clone --recursive --branch 4.2 --origin upstream \
        https://github.com/espressomd/espresso.git espresso-4.2
    cd espresso-4.2
    sed -ri 's/find_package\(PythonInterp 3\.[0-9] REQUIRED/find_package\(PythonInterp 3.6 REQUIRED/' CMakeLists.txt
    python3 -m pip install --user 'cython>=0.29.21,<3.0'
    python3 -m pip install --user -c "requirements.txt" setuptools numpy scipy vtk
    mkdir build
    cd build
    cp ../maintainer/configs/maxset.hpp myconfig.hpp
    sed -i "/ADDITIONAL_CHECKS/d" myconfig.hpp
    cmake .. -D CMAKE_BUILD_TYPE=Release -D WITH_CUDA=ON -D WITH_CCACHE=OFF -D WITH_SCAFACOS=OFF -D WITH_HDF5=OFF
    make -j 4

Release 4.3:

.. code-block:: bash

    # last update: December 2024
    module load compiler/gnu/12.1 mpi/openmpi/4.1 devel/cuda/12.1
    CLUSTER_FFTW3_VERSION=3.3.10
    CLUSTER_BOOST_VERSION=1.82.0
    CLUSTER_PYTHON_VERSION=3.12.4
    export BOOST_ROOT="${HOME}/bin/boost_mpi_${CLUSTER_BOOST_VERSION//./_}"
    export FFTW3_ROOT="${HOME}/bin/fftw_${CLUSTER_FFTW3_VERSION//./_}"
    export CUDA_HOME="${CUDA_PATH}"
    export CUDA_ROOT="${HOME}/bin/cuda_12_1"
    export LD_LIBRARY_PATH="${BOOST_ROOT}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH="${FFTW3_ROOT}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}${CUDA_HOME}/targets/x86_64-linux/lib/stubs"
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}${CUDA_ROOT}/lib"
    "${HOME}/bin/cpython-${CLUSTER_PYTHON_VERSION}/bin/python" -m venv "${HOME}/venv"
    source "${HOME}/venv/bin/activate"
    git clone --recursive --branch python --origin upstream \
        https://github.com/espressomd/espresso.git espresso-4.3
    cd espresso-4.3
    python3 -m pip install -c "requirements.txt" cython setuptools numpy scipy vtk cmake
    mkdir build
    cd build
    cp ../maintainer/configs/maxset.hpp myconfig.hpp
    sed -i "/ADDITIONAL_CHECKS/d" myconfig.hpp
    cmake .. -D CUDAToolkit_ROOT="${CUDA_HOME}" \
        -D CMAKE_BUILD_TYPE=Release -D ESPRESSO_BUILD_WITH_CUDA=ON \
        -D ESPRESSO_BUILD_WITH_CCACHE=OFF -D ESPRESSO_BUILD_WITH_WALBERLA=ON \
        -D ESPRESSO_BUILD_WITH_SCAFACOS=OFF -D ESPRESSO_BUILD_WITH_HDF5=OFF
    make -j 10
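A quick import test after the build catches missing runtime libraries early,
for example when ``LD_LIBRARY_PATH`` no longer contains the Boost or FFTW
directories. The following is a minimal sketch for the 4.3 build tree created
above (the same works for the 4.2 tree); ``pypresso`` is the wrapper script
that the ESPResSo build generates in the build directory.

.. code-block:: bash

    # run from the ESPResSo build directory created above; pypresso wraps the
    # Python interpreter so that the freshly built module can be imported
    cd "${HOME}/espresso-4.3/build"
    ./pypresso -c "import espressomd; print('import OK:', espressomd.__file__)"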
.. _bwForCluster submitting jobs:

Submitting jobs
---------------

Batch command:

.. code-block:: bash

    sbatch job.sh

Job script:

.. code-block:: bash

    #!/bin/bash
    #SBATCH --partition=cpu-single    # Helix offers a variety of partitions
    #SBATCH --job-name=test
    #SBATCH --ntasks=1
    #SBATCH --time=00:10:00
    #SBATCH --output=%j.stdout
    #SBATCH --error=%j.stderr

    # last update: December 2024
    module load compiler/gnu/12.1 mpi/openmpi/4.1 devel/cuda/12.1
    CLUSTER_FFTW3_VERSION=3.3.10
    CLUSTER_BOOST_VERSION=1.82.0
    export BOOST_ROOT="${HOME}/bin/boost_mpi_${CLUSTER_BOOST_VERSION//./_}"
    export FFTW3_ROOT="${HOME}/bin/fftw_${CLUSTER_FFTW3_VERSION//./_}"
    export CUDA_HOME="${CUDA_PATH}"
    export CUDA_ROOT="${HOME}/bin/cuda_12_1"
    export LD_LIBRARY_PATH="${BOOST_ROOT}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH="${FFTW3_ROOT}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}${CUDA_HOME}/targets/x86_64-linux/lib/stubs"
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}${CUDA_ROOT}/lib"
    export PYTHONPATH="${HOME}/espresso-4.3/build/src/python${PYTHONPATH:+:$PYTHONPATH}"
    source "${HOME}/venv/bin/activate"
    mpiexec --bind-to core --map-by core python3 script.py

The desired partition must be specified via the ``#SBATCH --partition``
directive; without it, your job will not be allocated any resources.
Helix has the following partitions available:

=============== ============================================ ======================================
Partition         Default Configuration                       Limit
=============== ============================================ ======================================
``devel``         ntasks=1, time=00:10:00, mem-per-cpu=2gb     nodes=2, time=00:30:00
``cpu-single``    ntasks=1, time=00:30:00, mem-per-cpu=2gb     nodes=1, time=120:00:00
``gpu-single``    ntasks=1, time=00:30:00, mem-per-cpu=2gb     nodes=1, time=120:00:00
``cpu-multi``     nodes=2, time=00:30:00                       nodes=32, time=48:00:00
``gpu-multi``     nodes=2, time=00:30:00                       nodes=8, time=48:00:00
=============== ============================================ ======================================

The Helix documentation recommends using the MPI-specific launcher, i.e.
``mpiexec`` or ``mpirun`` for OpenMPI, instead of SLURM's ``srun``. The number
of processes and the node information are passed to the launcher automatically.

When using ``srun`` instead of the MPI-specific launcher, and when the job
script loads Python via ``module load``, it is necessary to preload the SLURM
shared objects, like so:

.. code-block:: bash

    LD_PRELOAD=/usr/lib64/slurm/libslurmfull.so \
        sbatch --partition=devel --nodes=2 --ntasks-per-node=2 job.sh

Otherwise, the following fatal error is triggered:

.. code-block:: none

    python3: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/auth_munge.so): /usr/lib64/slurm/auth_munge.so: undefined symbol: slurm_conf
    python3: error: Couldn't load specified plugin name for auth/munge: Dlopen of plugin file failed
    python3: error: cannot create auth context for auth/munge
    python3: fatal: failed to initialize auth plugin

Refer to the `Helix Slurm documentation `__ for more details on submitting
job scripts on Helix.
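After submission, the standard SLURM client tools can be used to monitor the
job. The snippet below is a short reference; ``<jobid>`` stands for the
numeric job ID printed by ``sbatch``.

.. code-block:: bash

    # list your pending and running jobs
    squeue -u "${USER}"
    # show the full scheduling details of a specific job
    scontrol show job <jobid>
    # cancel a job that is no longer needed
    scancel <jobid>
    # resource usage of a finished job (requires SLURM accounting)
    sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,State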