This shows you the differences between two versions of the page.
library:computing:ifs_impi_troubles [2017/08/11 14:30] mcastril [IFS @ MN4] |
library:computing:ifs_impi_troubles [2017/08/24 13:56] (current) mcastril |
||
---|---|---|---|
Line 2: | Line 2: | ||
===== IFS @ MN4 ===== | ===== IFS @ MN4 ===== | ||
+ | |||
+ | ==== Issue 1: IFS memory corrupted when activating AVX512 ==== | ||
**Environment: | **Environment: | ||
Line 7: | Line 9: | ||
* Intel 2017.4 & Intel MPI 2017.4 | * Intel 2017.4 & Intel MPI 2017.4 | ||
- | The problem was reported when using -O2 & -O3 optimization flags in conjunction with activation of [[http:// | + | The problem was reported when using -O2 & -O3 optimization flags in conjunction with activation of [[http:// |
- | **Problem: | + | **Problem: |
< | < | ||
Line 43: | Line 45: | ||
</ | </ | ||
- | The error was happening at different steps or processes depending on the run. | + | (The error was happening at different steps or in different |
- | However, if the model is run using __-O2__ (and also -xCORE-AVX512) in the compilation, | + | However, if the model is run using __-O2__ (and also -xCORE-AVX512) in the compilation, |
< | < | ||
Line 74: | Line 76: | ||
</ | </ | ||
- | **Actions taken:** We first debugged the regions of the code referred to in the error trace using Allinea DDT and though it is difficult to debug using -O2 or -O3 (because variables are optimized and their values hidden, and code lines do not always correspond to the ones actually being executed), we could see where the problem | + | **Actions taken:** We first debugged the regions of the code referred to in the error trace by using Allinea DDT and, though it is difficult to debug using -O2 or -O3 (because variables are optimized and their values hidden, and code lines do not always correspond to the ones actually being executed), we could see where the problem |
- | The part of the code that was exiting in the O3 mode is in the ludcmp routine: | + | The part of the code that was exiting in the -O3 mode was in the //ludcmp// routine: |
< | < | ||
Line 87: | Line 89: | ||
</ | </ | ||
- | Basically it is checking | + | Basically it is checking |
- | The part of the code that crashes in the -O2 mode is in the surfexcdriver_ctl_mod module: | + | The part of the code that crashes in the -O2 mode is in the //surfexcdriver_ctl_mod// module: |
< | < | ||
Line 102: | Line 104: | ||
</ | </ | ||
- | Here the code is assigning a weighted value to PSSRFLTI, | + | Here the code is assigning a weighted value to PSSRFLTI, |
- | The fact that using O2 provoked an "array index out of bounds" | + | The fact that using -O2 provoked an "array index out of bounds" |
- | In order to have more information | + | In order to have more information |
- | **Diagnosis: | + | **Diagnosis: |
- | **Solution: | + | **Solution: |
+ | |||
+ | The first fix is to use [[https:// | ||
+ | |||
+ | < | ||
+ | !DIR$ NOVECTOR | ||
+ | DO JTILE=1, | ||
+ | !DIR$ NOVECTOR | ||
+ | DO JL=KIDIA, | ||
+ | ! Disaggregate solar flux but limit to 700 W/m2 (due to inconsistency | ||
+ | ! with albedo) | ||
+ | PSSRFLTI(JL, | ||
+ | & (1.0_JPRB-ZALB(JL)))*PSSRFL(JL) | ||
+ | IF (PSSRFLTI(JL, | ||
+ | LLHISSR(JL)=.TRUE. | ||
+ | PSSRFLTI(JL, | ||
+ | ENDIF | ||
+ | </ | ||
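As a rough illustration of the first fix, the following standalone sketch places the Intel ''!DIR$ NOVECTOR'' directive in front of a loop that clamps values to a limit. It is not the IFS code: the program name, the arrays ''flux'' and ''clipped'', the array size and the input values are all hypothetical, and only the 700 W/m2 cap is taken from the comment in the snippet above. Compilers other than Intel's should treat the directive as an ordinary comment.

```fortran
program novector_sketch
  implicit none
  integer, parameter :: n = 8          ! hypothetical array length
  real, parameter :: cap = 700.0       ! limit in W/m2, as in the source comment
  real :: flux(n)                      ! hypothetical stand-in for the tiled flux
  logical :: clipped(n)                ! hypothetical stand-in for the clamp flag
  integer :: i

  ! Hypothetical input data: 200, 300, ..., 900.
  do i = 1, n
    flux(i) = 100.0 * real(i + 1)
  end do
  clipped = .false.

  ! Ask the Intel compiler not to vectorize the loop that follows.
  !DIR$ NOVECTOR
  do i = 1, n
    if (flux(i) > cap) then
      clipped(i) = .true.
      flux(i) = cap
    end if
  end do

  print *, 'max flux after clamp:', maxval(flux)
  print *, 'points clamped:', count(clipped)
end program novector_sketch
```

The directive applies only to the loop that immediately follows it, which is why the snippet from the model repeats it before each of the two nested loops.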
+ | |||
+ | |||
+ | The second is to move the conditional IF out of the loop and make two independent loops instead: | ||
+ | |||
+ | < | ||
+ | DO JTILE=1, | ||
+ | DO JL=KIDIA, | ||
+ | IF (LLHISSR(JL)) THEN | ||
+ | PSSRFLTI(JL, | ||
+ | ENDIF | ||
+ | ENDDO | ||
+ | |||
+ | DO JL=KIDIA, | ||
+ | ZSRFD(JL)=PSSRFLTI(JL, | ||
+ | ENDDO | ||
+ | </ | ||
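The idea behind the second fix can be sketched in isolation as follows. This is a minimal example under stated assumptions, not the model code: the arrays ''flux'', ''clipped'' and ''downstream'', the 0.5 factor and the input values are hypothetical, and the split into three loops is one possible arrangement. The point it shows is that the branch is confined to its own loop, so the loop that consumes the clamped values carries no conditional.

```fortran
program split_loops_sketch
  implicit none
  integer, parameter :: n = 8          ! hypothetical array length
  real, parameter :: cap = 700.0       ! limit in W/m2, as in the source comment
  real :: flux(n), downstream(n)       ! hypothetical arrays
  logical :: clipped(n)                ! hypothetical clamp flag
  integer :: i

  ! Hypothetical input data: 200, 300, ..., 900.
  do i = 1, n
    flux(i) = 100.0 * real(i + 1)
  end do

  ! Loop 1: only record which points exceed the cap.
  do i = 1, n
    clipped(i) = flux(i) > cap
  end do

  ! Loop 2: apply the clamp, driven by the flag computed above.
  do i = 1, n
    if (clipped(i)) flux(i) = cap
  end do

  ! Loop 3: use the clamped values in a separate, branch-free loop.
  ! The 0.5 factor is arbitrary and only stands in for later arithmetic.
  do i = 1, n
    downstream(i) = 0.5 * flux(i)
  end do

  print *, 'points clamped:', count(clipped)
  print *, 'max downstream value:', maxval(downstream)
end program split_loops_sketch
```

Splitting the work this way costs extra passes over the data, so it is worth checking the vectorization report to confirm that the compiler handles each loop as intended.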
+ | |||
+ | We could see in the vectorization report that, because some of the other loops in the same function were merged for optimization, | ||
+ | |||
+ | Both fixes work with both __-O2__ and __-O3__, so the matrix is no longer detected as singular in //ludcmp//. | ||
**More information: | **More information: | ||
+ | |||
+ | Intel® AVX-512 Instructions introduction: | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | Compiling for the Intel® Xeon Phi™ Processor and the Intel® Advanced Vector Extensions 512 ISA: | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | Quick Reference Guide to Optimization with Intel® C++ and Fortran Compilers v16: | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | Vectorization and Optimization Reports: | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | Generating a Vectorization Report: | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | General Compiler Directives: | ||
+ | |||
+ | [[https:// | ||
+ |