library:computing:ifs_impi_troubles

Differences

This shows you the differences between two versions of the page.

library:computing:ifs_impi_troubles [2017/08/11 14:45]
mcastril [IFS @ MN4]
library:computing:ifs_impi_troubles [2017/08/24 13:56] (current)
mcastril
Line 2: Line 2:
  
 ===== IFS @ MN4 =====
 +
 +==== Issue 1: IFS memory corrupted when activating AVX-512 ====
  
 **Environment:** IFS 36r4. This bug was observed using the following compilers and MPI libraries:
Line 9: Line 11:
 The problem was reported when using the -O2 and -O3 optimization flags together with [[http://example.com|AVX-512]], that is, the 512-bit SIMD vector extensions. The code was free of this issue when using AVX or AVX2.
  
-**Problem:** Running IFS with __-O3__ (the highest optimization level) and -xCORE-AVX512, to enable SIMD 512 instructions, makes the model exit due to a check on a matrix that declares it as a singular matrix (a matrix that does not have an inverse)that therefore cannot be used to solve the system of equations in the physics part.
+**Problem:** Running IFS with __-O3__ (the highest optimization level) and -xCORE-AVX512, which enables 512-bit SIMD instructions, makes the model exit: a check flags a matrix as singular (a matrix that has no inverse), so it cannot be used to solve the system of equations needed by the physics solver.
  
 <code>
Line 43: Line 45:
 </code>
  
-The error was happening at different steps or processes depending on the run.
+(The error happened at different steps or in different processes depending on the run.)
  
-However, if the model is run using __-O2__ (and also -xCORE-AVX512) in the compilation, the error is different, and it is triggered in a part of the code that is interpreted before the one occurring when the -O3 flag was used.
+However, if the model is compiled with __-O2__ (and also -xCORE-AVX512), the error is different, and it is triggered in a part of the program that is executed before the one that fails when the -O3 flag is used.
  
 <code>
Line 74: Line 76:
 </code>
  
-**Actions taken:** We first debugged the regions of the code referred in the error trace using Allinea DDT and though it is difficult to debug using -O2 or -O3 (because variables are optimized and its value hidden, and code lines does not always correspond to the ones actually being executed), we could see where the problem could come from.
+**Actions taken:** We first debugged the regions of the code referenced in the error trace using Allinea DDT. Although debugging with -O2 or -O3 is difficult (variables are optimized away and their values hidden, and source lines do not always correspond to the instructions actually being executed), we could see where the problem was coming from.
  
-The part of the code that was exiting in the -O3 mode is in the //ludcmp// routine:
+The part of the code that was exiting in the -O3 mode was in the //ludcmp// routine:
  
 <code>
Line 87: Line 89:
 </code>
  
-Basically it is checking if the maximum value for each row of a given matrix (stored in ZAAMAX) is bigger than 0, otherwise it declares the matrix as singular, because it contains a zero-vector, and exits.
+Basically, it checks the maximum value of each row of the matrix (stored in ZAAMAX). If any maximum is less than or equal to 0, it declares the matrix singular, because it would contain a zero vector, and the code triggers an //Abort//.
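
The row check described above can be sketched in C as follows. This is only an illustration, not the IFS Fortran source: it assumes the maximum is taken over absolute values, and the names `rows_are_nonzero` and `row_max` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the IFS routine.
 * Returns 1 if every row of the n x n row-major matrix `a` has a strictly
 * positive largest absolute value, and 0 if some row is entirely zero,
 * in which case the matrix is singular. */
int rows_are_nonzero(const double *a, size_t n)
{
    assert(a != NULL && n > 0);
    for (size_t i = 0; i < n; ++i) {
        double row_max = 0.0; /* plays the role of ZAAMAX in the text */
        for (size_t j = 0; j < n; ++j) {
            double v = a[i * n + j];
            if (v < 0.0)
                v = -v; /* absolute value (assumption, see lead-in) */
            if (v > row_max)
                row_max = v;
        }
        if (row_max <= 0.0)
            return 0; /* zero row: the real code aborts here */
    }
    return 1;
}
```

A caller would abort, as the model does, when the function returns 0. A row of zeroes written by a stray memory access elsewhere is enough to trip this check.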
  
 The part of the code that crashes in the -O2 mode is in the //surfexcdriver_ctl_mod// module:
Line 102: Line 104:
 </code>
  
-Here the code is assigning a weigthed value to PSSRFLTI, from its contribution to the ZSSRFL1 variable in a previous calculation. LLHISSR(JL) stores the indexes of PSSRFLTI values that were bigger than 700 and so were assigned the value of 700.
+Here the code assigns a weighted value to PSSRFLTI, proportional to its contribution to the ZSSRFL1 variable in a previous calculation. LLHISSR(JL) stores the indexes of the PSSRFLTI values that were bigger than 700 in a previous loop and were therefore set to 700.
  
-The fact that using O2 provoked an "array index out of bounds" error and that the error in O3 mode was in a subsequent routine, made as think that this array access error could be guilty of messing with the matrix values and filling it with zeroes so it became singular. Therefore, we focused first in the O2 problem.
+The fact that using -O2 provoked an "array index out of bounds" error, and that the error in -O3 mode was in a subsequent routine, made us think that this array access error could be responsible for corrupting the matrix values and filling the matrix with zeroes, so that it became singular. Therefore, we focused first on the //-O2 problem//.
  
-In order to have more information of the behavior of the vectorization applied by the compiler in this situation we generated optimization and vectorization reports, and indeed the loop was being vectorized, but there was not much difference between the O2 and O3 case. We also generated an output file with the assembly code for this routines, and beside there were differences between O2 and O3 the structure was similar and the conditional was not ignored.
+To get more information about the vectorization applied by the compiler in this situation, we generated //optimization and vectorization reports//. The loop was indeed being vectorized, but there was not much difference between the -O2 and the -O3 case. We also generated an output file with the //assembly code// for both routines; although there were differences between -O2 and -O3, the structure was similar and the conditional was not ignored.
  
-**Diagnosis:** Apart from the validity of the scientific algorithm, the fact is that using conditionals inside loop is a bad practice. So it is likely than at the moment of the vectorization, there is a bug in the Intel compiler that is not dealing with this conditional in the best way. A proof of the validity of this hypothesis is that the compiler does not automatically merge the two resulting loops when we splitted them (see the solution for more detail on this).
+**Diagnosis:** Leaving aside the validity of the scientific algorithm, using conditionals inside loop structures is bad practice. So it is likely that a bug in the Intel compiler prevents it from handling this conditional correctly when vectorizing. Evidence supporting this hypothesis is that the compiler does not automatically merge the two resulting loops after we split them (see the solution for more detail on this).
  
-**Solution:** We developed two working solutions from this issue. Both of them depend on modifications of the IFS source. The best solution would be that Intel fixes their compiler, but our approach can work in the meantime.
+**Solution:** We developed two working solutions for this issue. Both of them rely on small modifications in the IFS source. Obviously the best solution would be for Intel to fix their compiler, but our approach can work in the meantime.
  
 The first fix is to use [[https://software.intel.com/en-us/node/692388|compiler directives]] to avoid the vectorization of the loop in //surfexcdriver_ctl_mod//:
Line 130: Line 132:
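
As a rough C analogue of the first fix (the actual change is a directive in the Fortran source, not shown in this diff), the Intel C/C++ compilers accept `#pragma novector` in front of a loop. The loop body below is a hypothetical stand-in for the one in surfexcdriver_ctl_mod; the function name, its arguments and the rescaling formula are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the masked loop: entries flagged in `capped`
 * are rescaled by their share of `total`. Not the IFS formula. */
void rescale_capped_novector(double *flux, const double *total,
                             const int *capped, double cap, size_t n)
{
    assert(flux != NULL && total != NULL && capped != NULL);
    /* The pragma is Intel-specific, so it is guarded to keep other
     * compilers from warning about an unknown pragma. */
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
#pragma novector
#endif
    for (size_t i = 0; i < n; ++i) {
        if (capped[i])
            flux[i] = cap * flux[i] / total[i];
    }
}
```

The directive only changes how the loop is compiled, not what it computes, so the results should match the unannotated loop wherever the compiler behaves correctly.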
  
  
-The second one is to get the conditional IF out of the loop and make two independent loops instead:
+The second is to move the conditional IF out of the loop and make two independent loops instead:
  
 <code>
Line 145: Line 147:
 </code>
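
One possible reading of this loop split, sketched in C rather than in the original Fortran: the unconditional work stays in one loop and the conditional update moves to a second loop. The function names, arguments and loop bodies are hypothetical and only illustrate the transformation; the arrays are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Fused form: unconditional accumulation and conditional rescale
 * share a single loop. */
void update_fused(double *flux, double *accum, const double *total,
                  const int *capped, double cap, size_t n)
{
    assert(flux != NULL && accum != NULL && total != NULL && capped != NULL);
    for (size_t i = 0; i < n; ++i) {
        accum[i] += flux[i];
        if (capped[i])
            flux[i] = cap * flux[i] / total[i];
    }
}

/* Split form: the conditional is isolated in its own loop, so the first
 * loop is free of branches. */
void update_split(double *flux, double *accum, const double *total,
                  const int *capped, double cap, size_t n)
{
    assert(flux != NULL && accum != NULL && total != NULL && capped != NULL);
    for (size_t i = 0; i < n; ++i)
        accum[i] += flux[i];
    for (size_t i = 0; i < n; ++i) {
        if (capped[i])
            flux[i] = cap * flux[i] / total[i];
    }
}
```

Both forms compute the same values, because each `accum[i]` reads `flux[i]` before that element is rescaled in either version.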
  
 +We could see in the vectorization report that, although some of the other loops in the same function were merged for optimization, these two loops were not merged again. This can happen because the compiler considers it risky or sub-optimal to do so.
  
-Both fixes work with -O2 as also with -O3, so the matrix is no detected as singular in //ludcmp//.
+Both fixes work with both __-O2__ and __-O3__, so the matrix is no longer detected as singular in //ludcmp//.
  
 **More information:**
library/computing/ifs_impi_troubles.1502462742.txt.gz · Last modified: 2017/08/11 14:45 by mcastril