# ecTrans Dwarf

This repository contains the installation procedure for the ecTrans dwarf that works with GPUs. It builds the following ecTrans repository by Daan Degrauwe internally: [https://github.com/ddegrauwe/ectrans](https://github.com/ddegrauwe/ectrans). The sections below guide you through the (altered) installation process and the execution of different configurations.

## Installation

Installation only requires running the installation script:

```bash
./install.sh
```

This can be done from a login node or a compute node. You can remove all directories and files created during the installation process by running the cleaning script:

```bash
./clean.sh
```

## Execution

There are two ways of executing the ecTrans dwarf on Lumi-G:

1. Through an SBATCH job.
2. Through an interactive node.

The subsections below elaborate on both.

### SBATCH execution

In the root of the repository, you will find the `run_sbatch.sh` script. This script allocates a node and sets up the CPU bindings in a GPU- and OpenMP-aware way. It then executes the specified model. Hence, submitting an ecTrans dwarf job is simply:

```bash
sbatch ./run_sbatch.sh
```

### Interactive node execution

In the root of the repository, you will find the `run_interactive.sh` script. To execute the ecTrans dwarf on an interactive node, you first need to allocate one. The script supports two ways of working with interactive nodes: through `salloc` or through `sbatch`. Both are described in the subsections below, but we advise you to use the `sbatch` approach, as it does not have the downsides of `salloc`. These subsections also contain examples of how to execute the ecTrans dwarf.

#### Interactive SALLOC node

You can allocate an interactive node through `salloc` and execute `bash` on top of the newly allocated node. Note, however, that executing `bash` on top of the node can only be done in the same terminal session.
If you close the terminal or lose the connection with Lumi, you lose access to the allocated node. This is why we advise using the `sbatch` method from the subsection below. To allocate a node, you can execute:

```bash
#!/usr/bin/env bash

# ------------------------------------------------------------------------------
# Allocates a node that can be accessed through bash.
# ------------------------------------------------------------------------------

JOB_NAME="ia_gpu_dev"
GPUS_PER_NODE=8
NODES=1
NTASKS=8
PARTITION="dev-g"
ACCOUNT="project_465000454"
TIME="3:00:00"

# Allocate interactive node with the set variables above.
salloc \
    --gpus-per-node="$GPUS_PER_NODE" \
    --exclusive \
    --nodes="$NODES" \
    --ntasks="$NTASKS" \
    --partition="$PARTITION" \
    --account="$ACCOUNT" \
    --time="$TIME" \
    --mem=0 \
    --job-name="$JOB_NAME"
```

Note that you can alter any setting as you please, including the partition. After the node is allocated, you can access it through:

```bash
srun --cpu_bind=none --nodes=1 --pty bash -i
```

Once you are in `bash` on the node, you can run the ecTrans dwarf by executing the script:

```bash
./run_interactive.sh
```

#### Interactive SBATCH node

You can allocate a node through `sbatch` and run the interactive script afterwards. To allocate a node, submit the following script with `sbatch`:

```bash
#!/usr/bin/env bash

# ------------------------------------------------------------------------------
# Creates a running job that sleeps and can be used for interactive runs.
# ------------------------------------------------------------------------------

#SBATCH --job-name=sia_gpu
#SBATCH --partition=dev-g
#SBATCH --exclusive
#SBATCH --mem=0
#SBATCH --account=project_465000454
#SBATCH --nodes=1
#SBATCH --gpus-per-node=8
#SBATCH --ntasks=8
#SBATCH --time=3:00:00

# Sleep to prevent SLURM process cancelation.
sleep "3h"
```

Note that you can alter any setting as you please, including the partition.
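
As a minimal sketch, the allocation script can be submitted and its job ID captured in one step. The file name `allocate_node.sh` is hypothetical (the script can be saved under any name), and `job_id_from_parsable` is an illustrative helper, not part of this repository. It relies on `sbatch --parsable`, which prints the job ID, optionally followed by `;` and a cluster name.

```bash
#!/usr/bin/env bash

# job_id_from_parsable: hypothetical helper (not part of this repository).
# Reads the output of `sbatch --parsable` from stdin, which is either
# "<jobid>" or "<jobid>;<cluster>", and prints only the job ID.
job_id_from_parsable() {
    cut -d';' -f1
}

# Example usage on a Slurm system (allocate_node.sh is an assumed file name):
#   JOB_ID=$(sbatch --parsable allocate_node.sh | job_id_from_parsable)
#   echo "Allocated job: $JOB_ID"
```

Keeping the parsing in a small function makes it possible to check the logic without access to a Slurm cluster.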

After the node is allocated, you can execute the ecTrans dwarf via the script by passing the acquired `$SLURM_JOB_ID` as an environment variable:

```bash
SLURM_JOB_ID=xxxxxxx ./run_interactive.sh
```

The `$SLURM_JOB_ID` will be printed after allocating the node, or can be found through the `squeue -u $USER` command.
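
If the job ID was not noted down, it can be recovered from `squeue` instead. The following is a sketch, assuming the job name `sia_gpu` from the allocation script. The helper name `pick_running_job` is hypothetical. It filters `squeue` output formatted as job ID, job name, and state (`%i %j %T`) and prints the first job with the given name that is in the `RUNNING` state.

```bash
#!/usr/bin/env bash

# pick_running_job: hypothetical helper (not part of this repository).
# Reads lines of the form "<jobid> <jobname> <state>" from stdin and prints
# the ID of the first job whose name equals $1 and whose state is RUNNING.
pick_running_job() {
    awk -v name="$1" '$2 == name && $3 == "RUNNING" { print $1; exit }'
}

# Example usage on a Slurm system (job name sia_gpu is assumed):
#   JOB_ID=$(squeue -u "$USER" --noheader --format="%i %j %T" | pick_running_job sia_gpu)
#   [ -n "$JOB_ID" ] && SLURM_JOB_ID="$JOB_ID" ./run_interactive.sh
```

An empty result means that no job with that name is currently running, for example because it is still pending or has already ended.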