This shows you the differences between two versions of the page.
| library:computing:ifs_impi_troubles [2017/08/11 14:45] mcastril [IFS @ MN4] | library:computing:ifs_impi_troubles [2017/08/24 13:56] (current) mcastril | ||
|---|---|---|---|
| Line 2: | Line 2: | ||
| ===== IFS @ MN4 ===== | ===== IFS @ MN4 ===== | ||
| + | |||
| + | ==== Issue 1: IFS memory corrupted when activating AVX512 ==== | ||
| **Environment: | **Environment: | ||
| Line 9: | Line 11: | ||
| The problem was reported when using -O2 & -O3 optimization flags in conjunction with activation of [[http:// | The problem was reported when using -O2 & -O3 optimization flags in conjunction with activation of [[http:// | ||
| - | **Problem: | + | **Problem: |
| < | < | ||
| Line 43: | Line 45: | ||
| </ | </ | ||
| - | The error was happening at different steps or processes depending on the run. | + | (The error was happening at different steps or in different |
| - | However, if the model is run using __-O2__ (and also -xCORE-AVX512) in the compilation, | + | However, if the model is run using __-O2__ (and also -xCORE-AVX512) in the compilation, |
| < | < | ||
| Line 74: | Line 76: | ||
| </ | </ | ||
| - | **Actions taken:** We first debugged the regions of the code referenced in the error trace using Allinea DDT and though it is difficult to debug using -O2 or -O3 (because variables are optimized away and their values hidden, and code lines do not always correspond to the ones actually being executed), we could see where the problem | + | **Actions taken:** We first debugged the regions of the code referenced in the error trace by using Allinea DDT and, though it is difficult to debug using -O2 or -O3 (because variables are optimized away and their values hidden, and code lines do not always correspond to the ones actually being executed), we could see where the problem |
| - | The part of the code that was exiting in the -O3 mode is in the //ludcmp// routine: | + | The part of the code that was exiting in the -O3 mode was in the //ludcmp// routine: |
| < | < | ||
| Line 87: | Line 89: | ||
| </ | </ | ||
| - | Basically it is checking | + | Basically it is checking |
| The part of the code that crashes in the -O2 mode is in // | The part of the code that crashes in the -O2 mode is in // | ||
| Line 102: | Line 104: | ||
| </ | </ | ||
| - | Here the code is assigning a weighted value to PSSRFLTI, | + | Here the code is assigning a weighted value to PSSRFLTI, |
| - | The fact that using O2 provoked an "array index out of bounds" | + | The fact that using -O2 provoked an "array index out of bounds" |
| - | In order to have more information | + | In order to have more information |
| - | **Diagnosis: | + | **Diagnosis: |
| - | **Solution: | + | **Solution: |
| The first fix is to use [[https:// | The first fix is to use [[https:// | ||
| Line 130: | Line 132: | ||
| - | The second | + | The second is getting |
| < | < | ||
| Line 145: | Line 147: | ||
| </ | </ | ||
| + | We could see in the vectorization report that, with some of the other loops in the same function merged for optimization, | ||
| - | Both fixes work with -O2 as well as with -O3, so the matrix is no longer detected as singular in //ludcmp//. | + | Both fixes work with both __-O2__ and __-O3__, so the matrix is no longer detected as singular in //ludcmp//. |
| **More information: | **More information: | ||
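
As a side illustration of why corrupted input can surface as a "singular matrix" exit in an LU decomposition routine such as //ludcmp//, the sketch below shows the kind of row-scaling check that textbook implementations perform before pivoting. It is written in Python for brevity and is not the IFS code: the function name ''row_scaling'' and the error message are hypothetical, and the actual check in IFS may differ.

```python
def row_scaling(a):
    """Return the implicit-pivoting scale factor 1/max|a[i][j]| for each row.

    Hypothetical helper, not taken from IFS: a row made only of zeros
    (for example because its values were overwritten in memory) makes the
    matrix singular, so the routine stops instead of dividing by zero.
    """
    scales = []
    for i, row in enumerate(a):
        biggest = max(abs(x) for x in row)  # largest magnitude in this row
        if biggest == 0.0:
            raise ValueError(f"singular matrix: row {i} contains only zeros")
        scales.append(1.0 / biggest)
    return scales


if __name__ == "__main__":
    # A well-formed matrix yields one scale factor per row.
    print(row_scaling([[2.0, -4.0], [1.0, 0.5]]))
    # A zeroed row triggers the singularity error.
    try:
        row_scaling([[0.0, 0.0], [1.0, 2.0]])
    except ValueError as err:
        print(err)
```

Under this assumption, a check of this kind fails on whatever data it is handed, so a failure there can be a symptom of memory being overwritten earlier rather than a defect in the decomposition itself.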