**Solution:**

==== Issue 3: MPI kills XIOS when writing model output ====

**Environment:**

**Actions taken:** A similar error was observed with NEMO standalone v3.6r6499. In that case, Operations advised us to use the //fabric// module, which selects //ofi// as the inter-node fabric, similar to the solution used on MN3 (see above). Loading this module solved the problem for NEMO standalone, although it had the side effect that jobs never finished. In coupled EC-Earth the module produced a deadlock, described below.
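
To make the workaround concrete, a minimal job-script sketch is shown below. It assumes the //fabric// module mentioned above works by setting Intel MPI's standard ''I_MPI_FABRICS'' variable; the explicit ''export'' lines are our assumption of what the module does, not its actual contents.

<code bash>
# Site-provided module that switches the inter-node fabric to ofi
# (name taken from the text above; assumed to be available on this machine)
module load fabric

# Assumed manual equivalent using standard Intel MPI variables:
# shm for intra-node communication, ofi between nodes
export I_MPI_FABRICS=shm:ofi

# Optional: make Intel MPI print the fabric it actually selected at startup,
# so the effect of the module can be verified in the job log
export I_MPI_DEBUG=5
</code>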

As an alternative, we tried increasing the number of XIOS servers in order to reduce the number of messages sent to the same process; for the moment this seems to be effective.
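
The sketch below shows where the number of XIOS servers enters, assuming XIOS runs in detached server mode (''using_server'' set to true in ''iodef.xml''): the server count is simply the number of MPI tasks assigned to ''xios_server.exe'' in the MPMD launch line. The task counts here are illustrative, not the ones we actually used.

<code bash>
# iodef.xml must enable detached XIOS servers:
#   <variable id="using_server" type="bool">true</variable>

# Before: a single XIOS server receives every client message
mpirun -np 960 ./nemo.exe : -np 1 ./xios_server.exe

# After: eight servers, so each one handles roughly 1/8 of the messages
mpirun -np 960 ./nemo.exe : -np 8 ./xios_server.exe
</code>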

**Diagnosis:**
+ | |||
+ | **Solution: | ||

About Intel Communication Fabrics control:
[[https://
+ | |||
+ | Ips_proto.c source code: | ||
+ | |||
+ | [[https:// | ||

==== Issue 4: ====