
# ecTrans Dwarf

This repository contains scripts for installing and executing the GPU-aware 
ecTrans dwarf as defined by Daan Degrauwe's repository:
[https://github.com/ddegrauwe/ectrans/tree/alaro_ectrans](https://github.com/ddegrauwe/ectrans/tree/alaro_ectrans).
Note that this is not the official repository of ecTrans.

The installation procedure installs the following packages and their versions:

- ecBuild - Tag 3.8.2
- FIAT - Tag 1.2.0
- ecTrans - Custom `alaro_ectrans` version by Daan Degrauwe

The sections below guide you through the (altered) installation process and the execution of different configurations.

## Installation

Installation is quite easy, as you only need to run the installation script:

```bash
./install.sh
```

This can be done from a login node or a compute node.

You can remove all directories and files created during the installation process
by running the cleaning script:

```bash
./clean.sh
```

## Execution

There are two ways of executing the ecTrans dwarf on LUMI-G:

1. Through an SBATCH job.
2. Through an interactive node.

The subsections below will elaborate on both.

### SBATCH execution

In the root of the repository, you will find the `run_sbatch.sh` script.
This script allocates a node and sets up the CPU bindings in a GPU- and
OpenMP-aware way.
It then executes the specified model.  
Hence, submitting an ecTrans dwarf job is simply:

```bash
sbatch ./run_sbatch.sh
```
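
If you want to keep track of the submitted job, a small wrapper along the following lines may help. This is only a minimal sketch and not part of the repository: the `parse_job_id` helper is hypothetical, and it assumes that `sbatch` prints its usual `Submitted batch job <id>` confirmation line.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): extracts the job ID from
# the confirmation line that sbatch prints, e.g. "Submitted batch job 1234567".
parse_job_id() {
    local line="$1"
    # Keep only the last space-separated field and require it to be numeric.
    local id="${line##* }"
    if [[ "$id" =~ ^[0-9]+$ ]]; then
        printf '%s\n' "$id"
    else
        return 1
    fi
}

# Example usage on a system with Slurm available:
#   job_id=$(parse_job_id "$(sbatch ./run_sbatch.sh)")
#   squeue --jobs="$job_id"
```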

### Interactive node execution

In the root of the repository, you will find the `run_interactive.sh` script.
In order to execute the ecTrans dwarf on an interactive node, we first need to
allocate an interactive node.
The script supports two ways of working with interactive nodes: through `salloc`
or through `sbatch`.
Both are described in the subsections below, but we advise you to use the
`sbatch` approach as it does not have the downsides of `salloc`.
These subsections also contain examples of how to execute the ecTrans dwarf.

#### Interactive SALLOC node

You can allocate an interactive node through `salloc` and execute `bash` on top
of the newly allocated node.
Note, however, that executing bash on top of the node can only be done in the
same terminal session.
If you close the terminal or lose the connection with LUMI, you lose access to
the allocated node.
This is why we advise using the `sbatch` method from the subsection below.

In order to allocate a node, you can execute:

```bash
#!/usr/bin/env bash
# ------------------------------------------------------------------------------
# Allocates a node that can be accessed through bash.
# ------------------------------------------------------------------------------
JOB_NAME="ia_gpu_dev"
GPUS_PER_NODE=8
NODES=1
NTASKS=8
PARTITION="dev-g"
ACCOUNT="project_465000454"
TIME="3:00:00"

# Allocate interactive node with the set variables above.
salloc \
    --gpus-per-node=$GPUS_PER_NODE \
    --exclusive \
    --nodes=$NODES \
    --ntasks=$NTASKS \
    --partition=$PARTITION \
    --account=$ACCOUNT \
    --time=$TIME \
    --mem=0 \
    --job-name=$JOB_NAME
```

Note that you can alter any setting as you please, including the partition.
After the node is allocated, you can access it through:

```bash
srun --cpu_bind=none --nodes=1 --pty bash -i
```

When you are in bash on the node, you can run the ecTrans dwarf by simply
executing the script:

```bash
./run_interactive.sh
```
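
Because the script only makes sense inside an allocation, it can be useful to guard the call. The sketch below is not part of the repository; both function names are hypothetical. It relies only on the fact that Slurm sets `SLURM_JOB_ID` in the environment of a shell started through `srun`.

```bash
#!/usr/bin/env bash
# Hypothetical guard (not part of the repository): succeeds only when the
# current shell has a non-empty SLURM_JOB_ID, i.e. runs inside an allocation.
inside_allocation() {
    [[ -n "${SLURM_JOB_ID:-}" ]]
}

# Hypothetical wrapper: refuse to start the dwarf outside an allocation.
run_dwarf_if_allocated() {
    if inside_allocation; then
        ./run_interactive.sh
    else
        echo "No Slurm allocation found; allocate a node first." >&2
        return 1
    fi
}
```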

#### Interactive SBATCH node

You can allocate a node through `sbatch` and run the interactive script 
afterwards. In order to allocate a node, you `sbatch` the following script:

```bash
#!/usr/bin/env bash
# ------------------------------------------------------------------------------
# Creates a running job that sleeps and can be used for interactive runs.
# ------------------------------------------------------------------------------
#SBATCH --job-name=sia_gpu
#SBATCH --partition=dev-g
#SBATCH --exclusive
#SBATCH --mem=0
#SBATCH --account=project_465000454
#SBATCH --nodes=1
#SBATCH --gpus-per-node=8
#SBATCH --ntasks=8
#SBATCH --time=3:00:00

# Sleep to prevent SLURM process cancelation.
sleep "3h"
```

Note that you can alter any setting as you please, including the partition.
After the node is allocated, you can execute the ecTrans dwarf via the script by
passing the acquired `$SLURM_JOB_ID` as an environment variable:

```bash
SLURM_JOB_ID=xxxxxxx ./run_interactive.sh
```

The `$SLURM_JOB_ID` will be printed after allocating the node, or can be found
through the `squeue -u $USER` command.
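
The lookup can also be scripted. The following is a minimal sketch, assuming the job name `sia_gpu` from the script above was kept; the `first_job_id` helper is hypothetical and not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): reads job IDs, one per
# line, as printed by `squeue --noheader --format=%i`, and prints the first
# non-empty one with all whitespace removed.
first_job_id() {
    local line
    while IFS= read -r line; do
        line="${line//[[:space:]]/}"
        if [[ -n "$line" ]]; then
            printf '%s\n' "$line"
            return 0
        fi
    done
    return 1
}

# Example usage on a login node:
#   job_id=$(squeue -u "$USER" --name=sia_gpu --states=RUNNING \
#                   --noheader --format=%i | first_job_id)
#   SLURM_JOB_ID="$job_id" ./run_interactive.sh
```

If no matching job is running, `first_job_id` returns a non-zero status, so the lookup can be checked before the dwarf is started.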

## Loading ROCtx markers

First create the `hipinit.F90` and `roctx.F90` files in
`sources/ectrans/src/programs/`. Their contents should be:

hipinit.F90:

```fortran
module hipinit
    interface 
        function hipInit_(flags) bind(c, name="hipInit")
            use iso_c_binding, only: c_int
            implicit none

            ! hipInit returns a hipError_t, which is a C enum (C int).
            integer(c_int) :: hipInit_
            integer(c_int), value :: flags
        end function hipInit_
    end interface
end module hipinit
```

roctx.F90:

```fortran
module roctx
    use iso_c_binding
    implicit none
    
    integer, private, parameter :: ROCTX_MAX_LEN = 256
    
    interface roctxMarkA
        subroutine roctxMarkA(name) bind(C, name="roctxMarkA")
            use iso_c_binding
            character(kind=c_char) :: name(256)
        end subroutine roctxMarkA
    end interface roctxMarkA
    
    interface roctxRangeStartA
        subroutine roctxRangeStartA(name) bind(C, name='roctxRangeStartA')
            use iso_c_binding
            character(kind=c_char) :: name(256)
        end subroutine roctxRangeStartA
    end interface roctxRangeStartA
    
    interface roctxRangePushA
        subroutine roctxRangePushA(name) bind(C, name='roctxRangePushA')
            use iso_c_binding
            character(kind=c_char) :: name(256)
        end subroutine roctxRangePushA
    end interface roctxRangePushA
    
    interface roctxRangePop
        subroutine roctxRangePop() bind(C, name='roctxRangePop')
        end subroutine roctxRangePop
    end interface roctxRangePop
    
    contains

    subroutine roctxRangePush(name)
        character(kind=c_char,len=*) :: name
        
        call roctxRangePushA(formatString(name))
    end subroutine roctxRangePush
    
    subroutine roctxMark(name)
        character(kind=c_char,len=*) :: name
    
        call roctxMarkA(formatString(name))
    end subroutine roctxMark
    
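    ! Converts a Fortran string into a null-terminated C character array.
    function formatString(str)
        character(kind=c_char,len=*) :: str
        character(kind=c_char) :: c_str(ROCTX_MAX_LEN)
        integer :: i, str_len
        character(kind=c_char) :: formatString(ROCTX_MAX_LEN)

        ! Truncate long names so the terminating null character always fits.
        str_len = min(len_trim(str), ROCTX_MAX_LEN - 1)
        do i = 1, str_len
            c_str(i) = str(i:i)
        end do
        c_str(str_len+1) = C_NULL_CHAR

        formatString = c_str
    end function formatString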
end module roctx
```

Next, add the HIP and ROCtx modules to the build process by adding the sources
to the `CMakeLists.txt` file in `sources/ectrans/src/programs/`. You need to
add them for every occurrence of `ectrans-lam-benchmark`, so change the SOURCES
entries on lines 47, 94, and 108 into:

```CMake
SOURCES hipinit.F90 roctx.F90 ectrans-lam-benchmark.F90
```

You also need to link the ROCtx library by adding `roctx64` to the list of
libraries on lines 54, 103, and 117.
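
Since the same change has to be made in three places, a quick count can help catch a missed occurrence. This is a minimal sketch and not part of the repository; the `count_lines_containing` helper is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of the repository): counts the lines of a file
# that contain a fixed string. grep exits non-zero when nothing matches, so
# `|| true` keeps a count of 0 from aborting a script that uses `set -e`.
count_lines_containing() {
    local pattern="$1" file="$2"
    grep -cF -- "$pattern" "$file" || true
}

# Example usage from the repository root:
#   cmake_file=sources/ectrans/src/programs/CMakeLists.txt
#   count_lines_containing "hipinit.F90 roctx.F90" "$cmake_file"
#   count_lines_containing "roctx64" "$cmake_file"
```

If all three occurrences were edited as described, the first count should be 3, and the second should be at least 3, assuming `roctx64` ends up on a separate line for each target.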

Then link the roctx library in `toolchain_lumi.cmake` on lines 31, 32, 33:

```CMake
set( OpenACC_C_FLAGS "-hacc -lroctx64" )
set( OpenACC_CXX_FLAGS "-hacc -lroctx64" )
set( OpenACC_Fortran_FLAGS "-hacc -lroctx64 -h acc_model=deep_copy:no_fast_addr:auto_async_none" )
```

Then load these new modules in the `ectrans-lam-benchmark.F90` file by adding
the `use` statements on line 50:

```fortran
! Add the custom hipinit and roctx files.
use roctx          ! ROCTX Interface
use hipinit        ! Hip Init Interface
```

Within `ectrans-lam-benchmark.F90`, also create an integer for the HIP
initialization at the end of the definitions on line 208, and initialize the
HIP environment after the includes on line 223:

```fortran
integer ::   val        ! On line 208

val = hipInit_(0)       ! On line 224
```

Now you can add ROCtx markers where you want and create the traces.
Simply add a `call roctxRangePush(roctx_name)` statement with the name of your
choice at the beginning of the region, and close the region with
`call roctxRangePop()`.
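
The README does not prescribe a tracing tool. As one possible approach, the sketch below assembles a `rocprof` command line with its `--hip-trace` and `--roctx-trace` options, which record HIP API calls and ROCtx ranges respectively. The `rocprof_trace_cmd` helper is hypothetical and not part of the repository, and it assumes that `rocprof` from the ROCm installation is on your `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): prints a rocprof command
# line that enables HIP and ROCtx tracing for the given program and arguments.
rocprof_trace_cmd() {
    local cmd=(rocprof --hip-trace --roctx-trace "$@")
    # "${cmd[*]}" joins the array elements with single spaces.
    printf '%s\n' "${cmd[*]}"
}

# Example usage inside an allocation; replace the placeholder with the path to
# the benchmark binary produced by your build:
#   srun --ntasks=1 $(rocprof_trace_cmd <path-to-benchmark-binary>)
```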