This shows you the differences between two versions of the page.
library:computing:ifs_impi_troubles [2017/08/11 14:45] mcastril [IFS @ MN4] |
library:computing:ifs_impi_troubles [2017/08/24 13:56] (current) mcastril |
||
---|---|---|---|
Line 2: | Line 2: | ||
===== IFS @ MN4 ===== | ===== IFS @ MN4 ===== | ||
+ | |||
+ | ==== Issue 1: IFS memory corrupted when activating AVX512 ==== | ||
**Environment: | **Environment: | ||
Line 9: | Line 11: | ||
The problem was reported when using -O2 & -O3 optimization flags in conjunction with activation of [[http:// | The problem was reported when using -O2 & -O3 optimization flags in conjunction with activation of [[http:// | ||
- | **Problem: | + | **Problem: |
< | < | ||
Line 43: | Line 45: | ||
</ | </ | ||
- | The error was happening at different steps or processes depending on the run. | + | (The error was happening at different steps or in different |
- | However, if the model is run using __-O2__ (and also -xCORE-AVX512) in the compilation, | + | However, if the model is compiled with __-O2__ (and also -xCORE-AVX512), |
< | < | ||
Line 74: | Line 76: | ||
</ | </ | ||
- | **Actions taken:** We first debugged the regions of the code referred in the error trace using Allinea DDT and though it is difficult to debug using -O2 or -O3 (because variables are optimized and its value hidden, and code lines does not always correspond to the ones actually being executed), we could see where the problem | + | **Actions taken:** We first debugged the regions of the code referenced in the error trace by using Allinea DDT and, though it is difficult to debug with -O2 or -O3 (because variables are optimized away and their values hidden, and source lines do not always correspond to the ones actually being executed), we could see where the problem |
- | The part of the code that was exiting in the -O3 mode is in the //ludcmp// routine: | + | The part of the code that was exiting in the -O3 mode was in the //ludcmp// routine: |
< | < | ||
Line 87: | Line 89: | ||
</ | </ | ||
- | Basically it is checking | + | Basically it is checking |
The part of the code that crashes in the -O2 mode is in // | The part of the code that crashes in the -O2 mode is in // | ||
Line 102: | Line 104: | ||
</ | </ | ||
- | Here the code is assigning a weigthed value to PSSRFLTI, | + | Here the code is assigning a weighted value to PSSRFLTI, |
- | The fact that using O2 provoked an "array index out of bounds" | + | The fact that using -O2 provoked an "array index out of bounds" |
- | In order to have more information | + | In order to have more information |
- | **Diagnosis: | + | **Diagnosis: |
- | **Solution: | + | **Solution: |
The first fix is to use [[https:// | The first fix is to use [[https:// | ||
Line 130: | Line 132: | ||
- | The second | + | The second is getting |
< | < | ||
Line 145: | Line 147: | ||
</ | </ | ||
+ | We could see in the vectorization report that, because some of the other loops in the same function were merged for optimization, | ||
- | Both fixes work with -O2 as also with -O3, so the matrix is no detected as singular in //ludcmp//. | + | Both fixes work with both __-O2__ and __-O3__, so the matrix is no longer detected as singular in //ludcmp//. |
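To illustrate why corrupted memory can surface as a "singular matrix" exit, here is a minimal Python sketch of the row-scaling check found in a classic textbook-style LU decomposition. It is an assumption that the IFS //ludcmp// routine follows this scheme; the actual Fortran source is not reproduced on this page, and the name ''row_scales'' is hypothetical.

```python
def row_scales(a):
    """Return the implicit-pivoting scale factor 1/max|a_ij| for each row.

    Hypothetical helper: mirrors the first loop of a textbook ludcmp,
    which aborts when a row contains no nonzero entry.
    """
    scales = []
    for i, row in enumerate(a):
        # Largest absolute value in the row
        big = max(abs(x) for x in row)
        if big == 0.0:
            # A row of zeros (for example, one overwritten by a bad
            # memory access) makes the matrix look singular.
            raise ValueError(f"singular matrix: row {i} is all zeros")
        scales.append(1.0 / big)
    return scales


# A healthy matrix passes the check
print(row_scales([[2.0, 1.0], [1.0, 4.0]]))

# The same matrix with one row zeroed out is rejected
try:
    row_scales([[2.0, 1.0], [0.0, 0.0]])
except ValueError as err:
    print(err)
```

Under this assumption, the check itself can be correct and still fail: if an out-of-bounds write elsewhere zeroes a row of the input matrix, the routine reports a singular matrix even though the original data were well conditioned.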
**More information: | **More information: |