This shows you the differences between two versions of the page.
library:computing:ifs_impi_troubles [2017/08/11 11:30] mcastril created |
library:computing:ifs_impi_troubles [2017/08/24 13:56] (current) mcastril |
||
---|---|---|---|
Line 2: | Line 2: | ||
===== IFS @ MN4 ===== | ===== IFS @ MN4 ===== | ||
+ | |||
+ | ==== Issue 1: IFS memory corrupted when activating AVX512 ==== | ||
**Environment: | **Environment: | ||
Line 7: | Line 9: | ||
* Intel 2017.4 & Intel MPI 2017.4 | * Intel 2017.4 & Intel MPI 2017.4 | ||
- | The problem was reported when using -O2 & -O3 optimization flags in conjunction with activation of [[http:// | + | The problem was reported when using -O2 & -O3 optimization flags in conjunction with activation of [[http:// |
- | **Problem: | + | **Problem: |
+ | < | ||
+ | | ||
+ | MPL_ABORT: CALLED FROM PROCESSOR | ||
+ | | ||
+ | | ||
+ | Calling traceback from intel_trbk() | ||
+ | Image PC Routine | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | libc-2.22.so | ||
+ | ifsmaster-ecconf | ||
+ | </ | ||
+ | (The error was happening at different steps or in different processes, depending on the run) | ||
- | **Actions taken:** | + | However, if the model is run using __-O2__ (and also -xCORE-AVX512) in the compilation, |
- | **Diagnosis:** | + | < |
+ | forrtl: severe (154): array index out of bounds | ||
+ | Image PC Routine | ||
+ | ifsmaster-ecconf | ||
+ | libpthread-2.22.s | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | ifsmaster-ecconf | ||
+ | libc-2.22.so | ||
+ | ifsmaster-ecconf | ||
+ | </ | ||
- | **Solution: | + | **Actions taken:** We first debugged the regions of the code referenced in the error trace using Allinea DDT. Debugging with -O2 or -O3 is difficult (variables are optimized away and their values hidden, and source lines do not always correspond to the instructions actually being executed), but we could still see where the problem was coming from. |
+ | |||
+ | The part of the code that was exiting in the -O3 mode was in the //ludcmp// routine: | ||
+ | |||
+ | < | ||
+ | DO JL=KIDIA, | ||
+ | IF (ZAAMAX(JL) <= 0.0_JPRB) THEN | ||
+ | CALL ABOR1(' | ||
+ | ENDIF ! SINGULAR MATRIX | ||
+ | ZVV(JL,I) = 1.0_JPRB/ | ||
+ | ENDDO | ||
+ | </ | ||
+ | |||
+ | Basically, it checks the maximum value of each row of the matrix (stored in ZAAMAX), and if any maximum is less than or equal to 0, it declares the matrix singular, because it would contain a zero-vector, | ||
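The row-maximum test described above can be illustrated with a small stand-alone program. This is only a sketch under stated assumptions: it is not the IFS //ludcmp// routine, it handles a single 3x3 matrix instead of one matrix per grid point, and the names ZA and LLSING are hypothetical. It shows why a row made entirely of zeros (for example, after memory has been overwritten) is enough to trigger the singular-matrix branch.

```fortran
PROGRAM ROWMAX_CHECK
  ! Hypothetical stand-alone illustration; NOT the IFS ludcmp routine.
  IMPLICIT NONE
  INTEGER, PARAMETER :: JPRB = SELECTED_REAL_KIND(13, 300)
  INTEGER, PARAMETER :: N = 3
  REAL(KIND=JPRB) :: ZA(N, N)     ! matrix under test (assumed name)
  REAL(KIND=JPRB) :: ZAAMAX(N)    ! largest absolute value of each row
  REAL(KIND=JPRB) :: ZVV(N)       ! implicit scaling factor of each row
  LOGICAL :: LLSING               ! assumed flag, replaces the ABOR1 call
  INTEGER :: I

  ! Row 2 is a zero vector, so this matrix is singular.
  ZA(1, :) = (/ 2.0_JPRB, -1.0_JPRB,  0.5_JPRB /)
  ZA(2, :) = (/ 0.0_JPRB,  0.0_JPRB,  0.0_JPRB /)
  ZA(3, :) = (/ 1.0_JPRB,  4.0_JPRB, -3.0_JPRB /)

  LLSING = .FALSE.
  DO I = 1, N
    ZAAMAX(I) = MAXVAL(ABS(ZA(I, :)))
    IF (ZAAMAX(I) <= 0.0_JPRB) THEN
      ! A row whose maximum is zero means the matrix is singular.
      LLSING = .TRUE.
      ZVV(I) = 0.0_JPRB
    ELSE
      ZVV(I) = 1.0_JPRB / ZAAMAX(I)
    ENDIF
  ENDDO

  ! Self-check: the zero row must have been detected.
  IF (.NOT. LLSING) ERROR STOP 'zero row was not detected'
  PRINT *, 'singular matrix detected: ', LLSING
END PROGRAM ROWMAX_CHECK
```

In the real routine the check aborts the run instead of setting a flag, which is consistent with the MPL_ABORT trace shown for the -O3 build.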
+ | |||
+ | The part of the code that crashes in the -O2 mode is in // | ||
+ | |||
+ | < | ||
+ | DO JTILE=1, | ||
+ | DO JL=KIDIA, | ||
+ | IF (LLHISSR(JL)) THEN | ||
+ | PSSRFLTI(JL, | ||
+ | ENDIF | ||
+ | ZSRFD(JL)=PSSRFLTI(JL, | ||
+ | ENDDO | ||
+ | |||
+ | </ | ||
+ | |||
+ | Here the code assigns a weighted value to PSSRFLTI, proportional to its contribution to the ZSSRFL1 variable in a previous calculation. LLHISSR(JL) flags the points whose PSSRFLTI value exceeded 700 W/m2 in a previous loop and was therefore capped at 700. | ||
+ | |||
+ | The fact that using -O2 provoked an "array index out of bounds" | ||
+ | |||
+ | To get more information about the vectorization applied by the compiler in this situation, we generated // | ||
+ | |||
+ | **Diagnosis: | ||
+ | |||
+ | **Solution: | ||
+ | |||
+ | The first fix is to use [[https:// | ||
+ | |||
+ | < | ||
+ | !DIR$ NOVECTOR | ||
+ | DO JTILE=1, | ||
+ | !DIR$ NOVECTOR | ||
+ | DO JL=KIDIA, | ||
+ | ! Disaggregate solar flux but limit to 700 W/m2 (due to inconsistency | ||
+ | ! with albedo) | ||
+ | PSSRFLTI(JL, | ||
+ | & (1.0_JPRB-ZALB(JL)))*PSSRFL(JL) | ||
+ | IF (PSSRFLTI(JL, | ||
+ | LLHISSR(JL)=.TRUE. | ||
+ | PSSRFLTI(JL, | ||
+ | ENDIF | ||
+ | </ | ||
+ | |||
+ | |||
+ | The second is to split the loop so that the conditional IF and the ZSRFD assignment run in two independent loops: | ||
+ | |||
+ | < | ||
+ | DO JTILE=1, | ||
+ | DO JL=KIDIA, | ||
+ | IF (LLHISSR(JL)) THEN | ||
+ | PSSRFLTI(JL, | ||
+ | ENDIF | ||
+ | ENDDO | ||
+ | |||
+ | DO JL=KIDIA, | ||
+ | ZSRFD(JL)=PSSRFLTI(JL, | ||
+ | ENDDO | ||
+ | </ | ||
+ | |||
+ | We could see in the vectorization report that, because some of the other loops in the same function were merged for optimization, | ||
+ | |||
+ | Both fixes work with both __-O2__ and __-O3__, so the matrix is no longer detected as singular in //ludcmp//.
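As a sanity check on the second fix, the following stand-alone sketch compares a fused loop with its split counterpart. It is not the IFS code: the array names ZFLX1, ZFLX2, ZSUM1, ZSUM2 and LLMASK, the 0.5 scaling factor and the accumulation into ZSUM are all assumptions made for illustration, since the original statements are not fully shown here. It only demonstrates that splitting the loop should not change the results, so the transformation is safe from a numerical point of view.

```fortran
PROGRAM LOOP_FISSION
  ! Hypothetical illustration of loop splitting; names and formulas
  ! are assumptions, not the IFS source.
  IMPLICIT NONE
  INTEGER, PARAMETER :: JPRB = SELECTED_REAL_KIND(13, 300)
  INTEGER, PARAMETER :: KIDIA = 1, KFDIA = 8, KTILES = 3
  REAL(KIND=JPRB) :: ZFLX1(KFDIA, KTILES), ZFLX2(KFDIA, KTILES)
  REAL(KIND=JPRB) :: ZSUM1(KFDIA), ZSUM2(KFDIA)
  LOGICAL :: LLMASK(KFDIA)
  INTEGER :: JL, JTILE

  ! Fill the input with arbitrary values and flag every second point.
  DO JTILE = 1, KTILES
    DO JL = KIDIA, KFDIA
      ZFLX1(JL, JTILE) = 100.0_JPRB * REAL(JL, JPRB) + REAL(JTILE, JPRB)
    ENDDO
  ENDDO
  ZFLX2 = ZFLX1
  DO JL = KIDIA, KFDIA
    LLMASK(JL) = (MOD(JL, 2) == 0)
  ENDDO
  ZSUM1 = 0.0_JPRB
  ZSUM2 = 0.0_JPRB

  ! Fused version: conditional update and accumulation in one loop.
  DO JTILE = 1, KTILES
    DO JL = KIDIA, KFDIA
      IF (LLMASK(JL)) THEN
        ZFLX1(JL, JTILE) = 0.5_JPRB * ZFLX1(JL, JTILE)
      ENDIF
      ZSUM1(JL) = ZSUM1(JL) + ZFLX1(JL, JTILE)
    ENDDO
  ENDDO

  ! Split version: the conditional update and the accumulation
  ! run in two independent inner loops.
  DO JTILE = 1, KTILES
    DO JL = KIDIA, KFDIA
      IF (LLMASK(JL)) THEN
        ZFLX2(JL, JTILE) = 0.5_JPRB * ZFLX2(JL, JTILE)
      ENDIF
    ENDDO
    DO JL = KIDIA, KFDIA
      ZSUM2(JL) = ZSUM2(JL) + ZFLX2(JL, JTILE)
    ENDDO
  ENDDO

  ! Self-check: both versions must give identical results.
  IF (ANY(ZSUM1 /= ZSUM2) .OR. ANY(ZFLX1 /= ZFLX2)) THEN
    ERROR STOP 'fused and split loops differ'
  ENDIF
  PRINT *, 'fused and split loops agree'
END PROGRAM LOOP_FISSION
```

Each element goes through the same operations in the same order in both versions, so the comparison can be exact. Whether the compiler vectorizes either form differently still has to be confirmed in the vectorization report of the actual build.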
**More information: | **More information: | ||
+ | |||
+ | Intel® AVX-512 Instructions introduction: | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | Compiling for the Intel® Xeon Phi™ Processor and the Intel® Advanced Vector Extensions 512 ISA: | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | Quick Reference Guide to Optimization with Intel® C++ and Fortran Compilers v16: | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | Vectorization and Optimization Reports: | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | Generating a Vectorization Report: | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | General Compiler Directives: | ||
+ | |||
+ | [[https:// | ||
+ |