## Parallel directives
|
|
|
|
|
|
|
|
These directives can be used to invoke GPU parallelization.
|
|
|
|
|
|
|
|
**target**
|
|
|
|
Declares a portion of code to be executed on the GPU.
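
A minimal sketch (the subroutine and variable names are just for illustration):

```fortran
! Everything inside the target region runs on the GPU. With no
! further directives it runs sequentially on a single device thread.
subroutine scale(x, n)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: x(n)
  integer :: i

  !$omp target map(tofrom: x)
  do i = 1, n
     x(i) = 2.0 * x(i)
  end do
  !$omp end target
end subroutine scale
```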
|
|
|
|
|
|
|
|
**teams**
|
|
|
|
With teams, the code runs on the GPU using multiple "teams".
|
|
|
|
|
|
|
|
- Invokes fork/join model
|
|
|
|
- No collective synchronization across teams! (i.e. no barriers)
|
|
|
|
- Must be within "omp target"
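
A minimal sketch, assuming an offload-capable compiler (the program and variable names are illustrative):

```fortran
program teams_demo
  use omp_lib
  implicit none
  integer :: n_teams

  n_teams = 0

  ! A league of teams is started on the GPU; each team's initial
  ! thread executes the region. There is no barrier across teams.
  !$omp target teams map(tofrom: n_teams)
  if (omp_get_team_num() == 0) n_teams = omp_get_num_teams()
  !$omp end target teams

  print *, 'ran with', n_teams, 'teams'
end program teams_demo
```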
|
|
|
|
|
|
|
|
**distribute**
|
|
|
|
Iterations of the loop below it are partitioned across the "teams".
|
|
|
|
Without it, only thread 0 of each team would be used.
|
|
|
|
|
|
|
|
- Loop partitioning (like omp do/for)
|
|
|
|
- Iterations are partitioned across "teams"
|
|
|
|
- NO implied barrier at end of loop!
|
|
|
|
- Best practice is to combine it with teams (target teams distribute), as in the sketch below
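
A minimal sketch of this combination (array name and size are illustrative):

```fortran
program distribute_demo
  implicit none
  integer, parameter :: n = 1024
  real :: x(n)
  integer :: i

  x = 1.0

  ! The loop iterations are partitioned across the teams; within
  ! each team, only the initial thread executes its share so far.
  !$omp target teams distribute map(tofrom: x)
  do i = 1, n
     x(i) = 2.0 * x(i)
  end do

  print *, 'x(1) =', x(1)
end program distribute_demo
```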
|
|
|
|
|
|
|
|
**simd**
|
|
|
|
Optimizes the loop with SIMD instructions where possible. The directive is ignored when this is not possible, so it can always be used.
|
|
|
|
On the GPU, it maps the work below it to GPU threads within the "teams" blocks.
|
|
|
|
|
|
|
|
- Used for two-level GPU parallelism
|
|
|
|
- "teams" maps to GPU threadblocks
|
|
|
|
- "simd" maps to GPU threads within the teams
|
|
|
|
- According to Oak Ridge, always use it in combination with "parallel do" (see the sketch after the note below).
|
|
|
|
|
|
|
|
! NOTE - "parallel" and "simd" are handled inconsistently across implementations !
|
|
|
|
|
|
|
|
- CCE-Classic maps "simd" to GPU threads and skips "parallel for"
|
|
|
|
- Clang maps "parallel for" to GPU threads and skips "simd"
|
|
|
|
- This will change in CCE 16, where "parallel" takes over the role that "simd" has in CCE 15.
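
A sketch of the portable combination recommended above; because the implementations disagree on which directive drives the GPU threads, spelling out the full chain keeps the code working on both:

```fortran
program saxpy_demo
  implicit none
  integer, parameter :: n = 1024
  real :: x(n), y(n)
  integer :: i

  x = 1.0
  y = 0.0

  ! "distribute" splits iterations across teams; "parallel do" and
  ! "simd" split them across the threads (or SIMD lanes) of each
  ! team, whichever of the two the compiler honours on the GPU.
  !$omp target teams distribute parallel do simd map(to: x) map(tofrom: y)
  do i = 1, n
     y(i) = y(i) + 2.0 * x(i)
  end do

  print *, 'y(1) =', y(1)
end program saxpy_demo
```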
|
|
|
|
|
|
|
|
**parallel do**
|
|
|
|
Maps the kernel to the available threads.
|
|
|
|
Most useful in combination with "teams" blocks.
|
|
|
|
Causes the work done in a loop inside a parallel region to be divided among threads.
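
Since it is most useful together with "teams", a minimal GPU sketch could be:

```fortran
program pardo_demo
  implicit none
  integer, parameter :: n = 1024
  real :: x(n)
  integer :: i

  x = 1.0

  ! Each team gets a chunk of iterations via "distribute";
  ! "parallel do" then divides that chunk among the team's threads.
  !$omp target teams distribute parallel do map(tofrom: x)
  do i = 1, n
     x(i) = x(i) + 1.0
  end do

  print *, 'x(1) =', x(1)
end program pardo_demo
```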
|
|
|
|
|
|
|
|
**loop**
|
|
|
|
Needs to be bound to the teams with bind(teams); this happens implicitly when it is used inside a "teams" region.
|
|
|
|
It appears to do a *parallel do simd* internally, but it is newer and leaves the exact mapping to the compiler.
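
A minimal sketch with "loop" (here it binds to the teams implicitly because it sits inside the "teams" region):

```fortran
program loop_demo
  implicit none
  integer, parameter :: n = 1024
  real :: x(n)
  integer :: i

  x = 1.0

  ! "loop" leaves the team/thread mapping of the iterations to the
  ! compiler; binding to the enclosing teams region is implicit.
  !$omp target teams loop map(tofrom: x)
  do i = 1, n
     x(i) = 2.0 * x(i)
  end do

  print *, 'x(1) =', x(1)
end program loop_demo
```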
|
|
|
|
|
|
## Memory directives
|
|
|
|
|