library:computing:ifs_impi_troubles

Differences

This shows you the differences between two versions of the page.

Next revision
Previous revision
library:computing:ifs_impi_troubles [2017/08/11 11:30]
mcastril created
library:computing:ifs_impi_troubles [2017/08/24 13:56] (current)
mcastril
Line 2: Line 2:
  
 ===== IFS @ MN4 =====
 +
 +==== Issue 1: IFS memory corrupted when activating AVX-512 ====
  
 **Environment:** IFS 36r4. This bug was observed using the following compilers and MPI libraries:
Line 7: Line 9:
   * Intel 2017.4 & Intel MPI 2017.4
  
-The problem was reported when using -O2 & -O3 optimization flags in conjunction with activation of [[http://example.com|AVX-512]], aka 512-bit SIMD or which is the same, 512-bit vector extensions.
 +The problem was reported when using the -O2 and -O3 optimization flags together with [[http://example.com|AVX-512]] (512-bit SIMD, that is, 512-bit vector extensions). The code was free of this issue when using AVX or AVX2.
  
-**Problem:** Running IFS with O3 (the highest optimization level) and activating AVX-512 makes the model exit due to a singular matrix (a matrix that does not have an inverse) that cannot be used to solve the system of equations in the physics part.
 +**Problem:** Running IFS compiled with __-O3__ (the highest optimization level) and -xCORE-AVX512, which enables 512-bit SIMD instructions, makes the model abort. A check in the physics solver detects a singular matrix (a matrix that does not have an inverse), which therefore cannot be used to solve the system of equations the solver needs.
  
 +<code>
 + ABORT!  381 LUDCMP: Singular matrix
 +MPL_ABORT: CALLED FROM PROCESSOR    381 THRD     1
 + MPL_ABORT: THRD             LUDCMP: Singular matrix
 + SDL_TRACEBACK: Calling INTEL_TRBK, THRD =            1
 +Calling traceback from intel_trbk()
 +Image              PC                Routine            Line        Source
 +ifsmaster-ecconf   0000000003C806AD  Unknown               Unknown  Unknown
 +ifsmaster-ecconf   000000000396934D  intel_trbk_                10  gentrbk.F90
 +ifsmaster-ecconf   0000000003939117  sdl_mod_mp_sdl_tr          66  sdl_mod.F90
 +ifsmaster-ecconf   000000000390DA77  mpl_abort_mod_mp_          35  mpl_abort_mod.F90
 +ifsmaster-ecconf   000000000393BDBA  abor1_                     31  abor1.F90
 +ifsmaster-ecconf   0000000002D50CF6  ludcmp_                    63  ludcmp.F90
 +ifsmaster-ecconf   0000000002BE1466  cloudsc_                 2703  cloudsc.F90
 +ifsmaster-ecconf   0000000002AFACB7  callpar_                 3449  callpar.F90
 +ifsmaster-ecconf   000000000272334A  ec_phys_                  795  ec_phys.F90
 +ifsmaster-ecconf   0000000001631661  ec_phys_drv_              403  ec_phys_drv.F90
 +ifsmaster-ecconf   00000000015F20F5  gp_model_                 553  gp_model.F90
 +ifsmaster-ecconf   0000000000A8D351  scan2m_                   548  scan2m.F90
 +ifsmaster-ecconf   0000000000A89E6B  scan2h_                   107  scan2h.F90
 +ifsmaster-ecconf   0000000000450D4A  stepo_                    373  stepo.F90
 +ifsmaster-ecconf   00000000004168D5  cnt4_                    1087  cnt4.F90
 +ifsmaster-ecconf   000000000041328C  cnt3_                     324  cnt3.F90
 +ifsmaster-ecconf   00000000004113A4  cnt2_                      76  cnt2.F90
 +ifsmaster-ecconf   00000000004111D1  cnt1_                     116  cnt1.F90
 +ifsmaster-ecconf   0000000000410C94  cnt0_                     154  cnt0.F90
 +ifsmaster-ecconf   00000000004102B8  MAIN__                     33  master.F90
 +ifsmaster-ecconf   000000000041021E  Unknown               Unknown  Unknown
 +libc-2.22.so       00002B9828CB56E5  __libc_start_main     Unknown  Unknown
 +ifsmaster-ecconf   0000000000410129  Unknown               Unknown  Unknown
 +</code>
  
 +(The error happened at different steps or in different processes, depending on the run.)
  
-**Actions taken:**
 +However, if the model is compiled with __-O2__ (again with -xCORE-AVX512), the error is different, and it is triggered in a part of the program that runs before the part that fails with -O3.
  
-**Diagnosis:**
 +<code>
 +forrtl: severe (154): array index out of bounds
 +Image              PC                Routine            Line        Source 
 +ifsmaster-ecconf   0000000003C88D29  Unknown               Unknown  Unknown 
 +libpthread-2.22.s  00002AC171F19B10  Unknown               Unknown  Unknown 
 +ifsmaster-ecconf   00000000036FBD3D  surfexcdriver_ctl         479  surfexcdriver_ctl_mod.F90 
 +ifsmaster-ecconf   00000000036E2096  surfexcdriver_            663  surfexcdriver.F90 
 +ifsmaster-ecconf   000000000338CEA2  vdfmain_                  608  vdfmain.F90 
 +ifsmaster-ecconf   0000000002DE21C4  vdfouter_                 618  vdfouter.F90 
 +ifsmaster-ecconf   0000000002AF0ABC  callpar_                 2526  callpar.F90 
 +ifsmaster-ecconf   000000000272334A  ec_phys_                  795  ec_phys.F90 
 +ifsmaster-ecconf   0000000001631661  ec_phys_drv_              403  ec_phys_drv.F90 
 +ifsmaster-ecconf   00000000015F20F5  gp_model_                 553  gp_model.F90 
 +ifsmaster-ecconf   0000000000A8D351  scan2m_                   548  scan2m.F90 
 +ifsmaster-ecconf   0000000000A89E6B  scan2h_                   107  scan2h.F90 
 +ifsmaster-ecconf   0000000000450D4A  stepo_                    373  stepo.F90 
 +ifsmaster-ecconf   00000000004168D5  cnt4_                    1087  cnt4.F90 
 +ifsmaster-ecconf   000000000041328C  cnt3_                     324  cnt3.F90 
 +ifsmaster-ecconf   00000000004113A4  cnt2_                      76  cnt2.F90 
 +ifsmaster-ecconf   00000000004111D1  cnt1_                     116  cnt1.F90 
 +ifsmaster-ecconf   0000000000410C94  cnt0_                     154  cnt0.F90 
 +ifsmaster-ecconf   00000000004102B8  MAIN__                     33  master.F90 
 +ifsmaster-ecconf   000000000041021E  Unknown               Unknown  Unknown 
 +libc-2.22.so       00002AC1724436E5  __libc_start_main     Unknown  Unknown 
 +ifsmaster-ecconf   0000000000410129  Unknown               Unknown  Unknown 
 +</code>
  
-**Solution:**
 +**Actions taken:** We first debugged the regions of the code referred to in the error traces using Allinea DDT. Debugging with -O2 or -O3 is difficult, because variables are optimized away and their values hidden, and source lines do not always correspond to the code actually being executed. Even so, we could see where the problem was coming from.
 + 
 +The part of the code that was exiting in the -O3 mode was in the //ludcmp// routine: 
 + 
 +<code> 
 +  DO JL=KIDIA,KFDIA 
 +    IF (ZAAMAX(JL) <= 0.0_JPRB) THEN 
 +      CALL ABOR1('LUDCMP: Singular matrix')
 +    ENDIF ! SINGULAR MATRIX  
 +    ZVV(JL,I) = 1.0_JPRB/ZAAMAX(JL) !SAVE THE SCALING.  
 +  ENDDO 
 +</code> 
 + 
 +This loop checks the maximum value of each row of the matrix (stored in ZAAMAX). If any maximum is less than or equal to 0, the matrix is declared singular, because it would contain a zero vector, and the code triggers an //Abort//.
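 +
 +As a rough illustration of this kind of check, the following stand-alone sketch scans a single small matrix for an all-zero row. It is not the IFS //ludcmp// routine: the program name, the matrix ZA, its size N and the use of absolute values (as in the textbook LU decomposition with implicit scaling) are assumptions made for the example.
 +
 +<code>
 +PROGRAM ROW_SCALING_CHECK
 +  ! Hypothetical stand-alone sketch; this is NOT the IFS LUDCMP routine.
 +  IMPLICIT NONE
 +  INTEGER, PARAMETER :: JPRB = SELECTED_REAL_KIND(13,300)
 +  INTEGER, PARAMETER :: N = 3          ! illustrative matrix size
 +  REAL(KIND=JPRB) :: ZA(N,N), ZAAMAX, ZVV(N)
 +  INTEGER :: I, J
 +
 +  ! Example matrix whose second row is all zeros, so it has no inverse.
 +  ZA(1,:) = (/ 2.0_JPRB, 1.0_JPRB, 0.0_JPRB /)
 +  ZA(2,:) = (/ 0.0_JPRB, 0.0_JPRB, 0.0_JPRB /)
 +  ZA(3,:) = (/ 1.0_JPRB, 3.0_JPRB, 4.0_JPRB /)
 +
 +  DO I = 1, N
 +    ! Largest absolute value in row I (assumed definition of the row maximum).
 +    ZAAMAX = 0.0_JPRB
 +    DO J = 1, N
 +      ZAAMAX = MAX(ZAAMAX, ABS(ZA(I,J)))
 +    ENDDO
 +    IF (ZAAMAX <= 0.0_JPRB) THEN
 +      PRINT *, 'Row ', I, ' is all zeros: singular matrix'
 +      STOP 1
 +    ENDIF
 +    ZVV(I) = 1.0_JPRB/ZAAMAX   ! save the scaling factor for this row
 +  ENDDO
 +
 +  PRINT *, 'No zero row found, scaling factors: ', ZVV
 +END PROGRAM ROW_SCALING_CHECK
 +</code>
 +
 +With the second row set to zero, the sketch should report that row and stop, which mirrors the condition under which //ludcmp// calls ABOR1. If other code has overwritten matrix entries with zeroes, the same condition is hit even though the input data were valid.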
 + 
 +The part of the code that crashes with -O2 is in the //surfexcdriver_ctl_mod// module:
 + 
 +<code> 
 + DO JTILE=1,KTILES 
 +  DO JL=KIDIA,KFDIA 
 +    IF (LLHISSR(JL)) THEN 
 +      PSSRFLTI(JL,JTILE)=PSSRFLTI(JL,JTILE)*PSSRFL(JL)/ZSSRFL1(JL) 
 +    ENDIF 
 +    ZSRFD(JL)=PSSRFLTI(JL,JTILE)/(1.0_JPRB-PALBTI(JL,JTILE)) 
 +  ENDDO 
 + 
 +</code> 
 + 
 +Here the code assigns a weighted value to PSSRFLTI, proportional to its contribution to the ZSSRFL1 variable in a previous calculation. LLHISSR(JL) flags the points whose PSSRFLTI values were bigger than 700 in a previous loop and were therefore capped at 700.
 + 
 +The fact that -O2 provoked an "array index out of bounds" error, and that the -O3 error occurred in a subsequent routine, made us think that this array access error could be corrupting the matrix values and filling the matrix with zeroes, so that it became singular. Therefore, we focused first on the //-O2 problem//.
 + 
 +To get more information about the vectorization applied by the compiler in this situation, we generated //optimization and vectorization reports//. The loop was indeed being vectorized, but there was not much difference between the -O2 and the -O3 case. We also generated the //assembly code// for both routines: although there were differences between -O2 and -O3, the structure was similar and the conditional was not ignored.
 + 
 +**Diagnosis:** Leaving aside the validity of the scientific algorithm, using conditionals inside loop structures is bad practice. It is therefore likely that a bug in the Intel compiler prevents it from handling this conditional correctly when the loop is vectorized. Evidence consistent with this hypothesis is that the compiler does not automatically merge the two resulting loops when we split them (see the solution for more detail on this).
 + 
 +**Solution:** We developed two working solutions for this issue. Both rely on small modifications to the IFS source. Obviously the best solution would be for Intel to fix the compiler, but our approach can work in the meantime.
 + 
 +The first fix is to use [[https://software.intel.com/en-us/node/692388|compiler directives]] to avoid the vectorization of the loop in //surfexcdriver_ctl_mod//: 
 + 
 +<code> 
 +!DIR$ NOVECTOR 
 +DO JTILE=1,KTILES 
 +  !DIR$ NOVECTOR 
 +  DO JL=KIDIA,KFDIA 
 +! Disaggregate solar flux but limit to 700 W/m2 (due to inconsistency 
 +!  with albedo) 
 +    PSSRFLTI(JL,JTILE)=((1.0_JPRB-PALBTI(JL,JTILE))/& 
 +   & (1.0_JPRB-ZALB(JL)))*PSSRFL(JL) 
 +    IF (PSSRFLTI(JL,JTILE) > 700._JPRB) THEN 
 +      LLHISSR(JL)=.TRUE. 
 +      PSSRFLTI(JL,JTILE)=700._JPRB 
 +    ENDIF 
 +</code> 
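 +
 +The placement of the directive can be tried outside IFS with a minimal stand-alone sketch such as the one below. The program name, array names, sizes and values are hypothetical; only the ''!DIR$ NOVECTOR'' directive and the 700 W/m2 cap are taken from the excerpt above. Compilers that do not recognise the directive should treat it as an ordinary comment.
 +
 +<code>
 +PROGRAM NOVECTOR_SKETCH
 +  ! Hypothetical stand-alone sketch; names and values are illustrative only.
 +  IMPLICIT NONE
 +  INTEGER, PARAMETER :: JPRB = SELECTED_REAL_KIND(13,300)
 +  INTEGER, PARAMETER :: KLON = 8                  ! illustrative vector length
 +  REAL(KIND=JPRB), PARAMETER :: ZCAP = 700.0_JPRB ! cap used in the excerpt
 +  REAL(KIND=JPRB) :: ZFLUX(KLON)
 +  LOGICAL :: LLCAPPED(KLON)
 +  INTEGER :: JL
 +
 +  LLCAPPED(:) = .FALSE.
 +  DO JL = 1, KLON
 +    ZFLUX(JL) = 150.0_JPRB*REAL(JL,JPRB)
 +  ENDDO
 +
 +  ! The directive applies to the DO loop that immediately follows it.
 +  !DIR$ NOVECTOR
 +  DO JL = 1, KLON
 +    IF (ZFLUX(JL) > ZCAP) THEN
 +      LLCAPPED(JL) = .TRUE.
 +      ZFLUX(JL) = ZCAP
 +    ENDIF
 +  ENDDO
 +
 +  PRINT *, 'Capped points: ', COUNT(LLCAPPED)
 +  PRINT *, 'Largest flux after capping: ', MAXVAL(ZFLUX)
 +END PROGRAM NOVECTOR_SKETCH
 +</code>
 +
 +Whether the loop was actually left scalar can then be checked in the compiler's vectorization report for this small file, which is easier to read than the report for the full routine.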
 + 
 + 
 +The second fix is to split the loop in two, so that the conditional IF sits in its own loop and the ZSRFD assignment in another:
 + 
 +<code> 
 +DO JTILE=1,KTILES 
 +  DO JL=KIDIA,KFDIA 
 +    IF (LLHISSR(JL)) THEN 
 +      PSSRFLTI(JL,JTILE)=PSSRFLTI(JL,JTILE)*PSSRFL(JL)/ZSSRFL1(JL) 
 +    ENDIF 
 +  ENDDO 
 + 
 +  DO JL=KIDIA,KFDIA 
 +    ZSRFD(JL)=PSSRFLTI(JL,JTILE)/(1.0_JPRB-PALBTI(JL,JTILE)) 
 +  ENDDO 
 +</code> 
 + 
 +The vectorization report showed that, although some other loops in the same function were merged for optimization, these two loops were not merged again. This can happen when the compiler considers merging risky or sub-optimal.
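 +
 +The loop split can also be checked in isolation. The sketch below runs a single loop with the conditional and the split version on the same made-up data and compares the results. All names, sizes and values are hypothetical, and the output is stored per tile only so that the two variants can be compared; with a correctly working compiler both variants are expected to agree to within rounding.
 +
 +<code>
 +PROGRAM LOOP_SPLIT_SKETCH
 +  ! Hypothetical stand-alone sketch; names and data are illustrative only.
 +  IMPLICIT NONE
 +  INTEGER, PARAMETER :: JPRB = SELECTED_REAL_KIND(13,300)
 +  INTEGER, PARAMETER :: KLON = 8, KTILES = 2     ! illustrative sizes
 +  REAL(KIND=JPRB) :: ZFLUX_A(KLON,KTILES), ZFLUX_B(KLON,KTILES)
 +  REAL(KIND=JPRB) :: ZALB(KLON,KTILES), ZSCALE(KLON)
 +  REAL(KIND=JPRB) :: ZOUT_A(KLON,KTILES), ZOUT_B(KLON,KTILES)
 +  LOGICAL :: LLFLAG(KLON)
 +  INTEGER :: JL, JTILE
 +
 +  ! Made-up input data; the albedo stays well below 1 to avoid division by zero.
 +  DO JTILE = 1, KTILES
 +    DO JL = 1, KLON
 +      ZFLUX_A(JL,JTILE) = 100.0_JPRB*REAL(JL,JPRB) + REAL(JTILE,JPRB)
 +      ZALB(JL,JTILE) = 0.1_JPRB*REAL(JTILE,JPRB)
 +    ENDDO
 +  ENDDO
 +  ZFLUX_B = ZFLUX_A
 +  DO JL = 1, KLON
 +    LLFLAG(JL) = (MOD(JL,2) == 0)
 +    ZSCALE(JL) = 0.5_JPRB + 0.05_JPRB*REAL(JL,JPRB)
 +  ENDDO
 +
 +  ! Variant A: conditional and division in the same inner loop.
 +  DO JTILE = 1, KTILES
 +    DO JL = 1, KLON
 +      IF (LLFLAG(JL)) THEN
 +        ZFLUX_A(JL,JTILE) = ZFLUX_A(JL,JTILE)*ZSCALE(JL)
 +      ENDIF
 +      ZOUT_A(JL,JTILE) = ZFLUX_A(JL,JTILE)/(1.0_JPRB-ZALB(JL,JTILE))
 +    ENDDO
 +  ENDDO
 +
 +  ! Variant B: the inner loop split in two independent loops.
 +  DO JTILE = 1, KTILES
 +    DO JL = 1, KLON
 +      IF (LLFLAG(JL)) THEN
 +        ZFLUX_B(JL,JTILE) = ZFLUX_B(JL,JTILE)*ZSCALE(JL)
 +      ENDIF
 +    ENDDO
 +    DO JL = 1, KLON
 +      ZOUT_B(JL,JTILE) = ZFLUX_B(JL,JTILE)/(1.0_JPRB-ZALB(JL,JTILE))
 +    ENDDO
 +  ENDDO
 +
 +  ! Compare with a small relative tolerance rather than bit for bit.
 +  IF (MAXVAL(ABS(ZOUT_A-ZOUT_B)) <= 1.0E-12_JPRB*MAXVAL(ABS(ZOUT_A))) THEN
 +    PRINT *, 'Both variants agree'
 +  ELSE
 +    PRINT *, 'Variants differ'
 +    STOP 1
 +  ENDIF
 +END PROGRAM LOOP_SPLIT_SKETCH
 +</code>
 +
 +Building such a reproducer with the same flags as the model (for example -O2 and -xCORE-AVX512) is one way to test whether the miscompilation depends on the surrounding IFS code; the sketch itself has not been shown to trigger the bug.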
 + 
 +Both fixes work with both __-O2__ and __-O3__, so the matrix is no longer detected as singular in //ludcmp//.
  
 **More information:**
 +
 +Intel® AVX-512 Instructions introduction:
 +
 +[[https://software.intel.com/en-us/blogs/2013/avx-512-instructions|https://software.intel.com/en-us/blogs/2013/avx-512-instructions]]
 +
 +Compiling for the Intel® Xeon Phi™ Processor and the Intel® Advanced Vector Extensions 512 ISA:  
 +
 +[[https://software.intel.com/en-us/articles/compiling-for-the-intel-xeon-phi-processor-and-the-intel-avx-512-isa|https://software.intel.com/en-us/articles/compiling-for-the-intel-xeon-phi-processor-and-the-intel-avx-512-isa]]
 +
 +Quick Reference Guide to Optimization with Intel® C++ and Fortran Compilers v16:  
 +
 +[[https://software.intel.com/sites/default/files/managed/12/f1/Quick-Reference-Card-Intel-Compilers-v16.pdf|https://software.intel.com/sites/default/files/managed/12/f1/Quick-Reference-Card-Intel-Compilers-v16.pdf]]
 +
 +Vectorization and Optimization Reports:  
 +
 +[[https://software.intel.com/en-us/articles/vectorization-and-optimization-reports|https://software.intel.com/en-us/articles/vectorization-and-optimization-reports]]
 +
 +Generating a Vectorization Report: 
 +
 +[[https://software.intel.com/en-us/node/590464|https://software.intel.com/en-us/node/590464]]
 +
 +General Compiler Directives:  
 +
 +[[https://software.intel.com/en-us/node/692388|https://software.intel.com/en-us/node/692388]]
 +
library/computing/ifs_impi_troubles.1502451028.txt.gz · Last modified: 2017/08/11 11:30 by mcastril