library:computing:xios_impi_troubles

Differences

This shows you the differences between two versions of the page.

library:computing:xios_impi_troubles [2017/08/03 12:45]
84.88.184.232 [Issue 3:]
library:computing:xios_impi_troubles [2024/06/18 11:15] (current)
84.88.52.107 old revision restored (2024/06/01 21:17)
Line 70: Line 70:
 ===== NEMO-XIOS @ MN4 =====
  
-==== Issue 1: ====
+==== Issue 1: NEMO fails to read input files ====
  
 **Environment:** Auto-EC-Earth 3.2.2_develop_MN4 (EC-Earth 3.2 r4063-runtime-unification).
Line 126: Line 126:
 </code>
  
-**Actions taken:** Operations had observed this error using NEMO standalone and NeTCDF 4.4.4.1, so they installed NetCDF 4.4.0 version. We could not reproduce the failure they reported with NEMO and NetCDF 4.4.4.1 but we got the same error when running EC-Earth. So we also moved to NetCDF 4.4.0. However, we got other XIOS errors when writing outputs (commented in following issues) and we asked for the same version we were using at MN3 to be installed. When operations installed 4.2 version we surprisingly got again the same error.
+**Actions taken:** Operations had observed this error when running standalone NEMO with NetCDF 4.4.4.1, so they installed NetCDF 4.4.0. We could not reproduce the failure they reported with NEMO and NetCDF 4.4.4.1, but we got the same error when running EC-Earth, so we also moved to NetCDF 4.4.0, and the error no longer appeared. However, we then got other XIOS errors when writing outputs (described in the following issues), so we asked for the version we had been using on MN3 to be installed. When operations installed version 4.2, we surprisingly got the same error again.
  
-**Diagnosis:**
+After looking for differences between the NetCDF 4.4.0 and NetCDF 4.2 configurations (using the nc-config and nf-config commands), we found that NetCDF 4.4.0 had been compiled without support for nc4 or P-NetCDF (a library that provides parallel I/O support for classic NetCDF files), while NetCDF 4.2 supported these features. Operations then recompiled NetCDF without linking P-NetCDF, and the error disappeared.
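The configuration comparison described above can be sketched in Python as follows. This is only an illustration, not the procedure operations actually used: the function names are hypothetical, and the parser assumes the `option -> value` layout printed by `nc-config --all`. The two executable paths are installation-specific and must be supplied by the user.

```python
import re
import subprocess
import sys


def parse_nc_config(text):
    """Parse 'nc-config --all' style output into {option: value}."""
    features = {}
    for line in text.splitlines():
        # Feature lines look like: "  --has-pnetcdf -> no"
        match = re.match(r"\s*(--[\w-]+)\s*->\s*(.*)$", line)
        if match:
            features[match.group(1)] = match.group(2).strip()
    return features


def diff_features(a, b):
    """Return {option: (value_a, value_b)} for options whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def read_nc_config(executable):
    """Run one nc-config executable and return its '--all' report."""
    result = subprocess.run(
        [executable, "--all"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (paths are placeholders): script.py /path/a/nc-config /path/b/nc-config
    first = parse_nc_config(read_nc_config(sys.argv[1]))
    second = parse_nc_config(read_nc_config(sys.argv[2]))
    for option, (va, vb) in diff_features(first, second).items():
        print(f"{option}: {va} != {vb}")
```

For the case described here, the options of interest would be `--has-nc4` and `--has-pnetcdf`; the same approach should work for `nf-config` if its report uses the same layout.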
  
-**Solution:**
+To learn more about the source of this bug, we compared the behavior of two NEMO executables, one compiled against NetCDF with P-NetCDF support and one against NetCDF without it. At runtime, both were linked with the NetCDF build without P-NetCDF support. The NEMO executable compiled with P-NetCDF did not run, so something was wrong in the NEMO binary itself.
 + 
+We compared the functions included in both binaries with the nm command and found them to be identical. We then did a more in-depth comparison of both binaries with objdump and found small differences, some of which pointed to an XIOS header file called netcdf.hpp. This header is responsible for including some NetCDF function definitions, and its behavior depends on the environment (preprocessor macros). To find out whether this file is responsible for the bug, we would have to compile NetCDF ourselves in debug mode (with the -g flag).
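A symbol-table comparison like the one done with nm can be sketched as below. This is a minimal illustration under assumptions: the helper names are hypothetical, the binary paths are placeholders, and the parser only handles the default `address type name` layout of `nm` (undefined symbols have no address column).

```python
import subprocess
import sys


def parse_nm(text):
    """Collect (type, name) pairs from default 'nm' output, ignoring addresses."""
    symbols = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            # Defined symbol: address, type letter, name
            symbols.add((parts[1], parts[2]))
        elif len(parts) == 2:
            # Undefined symbol: type letter, name (no address)
            symbols.add((parts[0], parts[1]))
    return symbols


def compare_symbols(a, b):
    """Return (only_in_a, only_in_b) as sorted lists of (type, name)."""
    return sorted(a - b), sorted(b - a)


def run_nm(binary):
    """Run nm on a binary and return its text output."""
    result = subprocess.run(
        ["nm", binary], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (paths are placeholders): script.py nemo_with_pnetcdf.exe nemo_without.exe
    only_a, only_b = compare_symbols(
        parse_nm(run_nm(sys.argv[1])), parse_nm(run_nm(sys.argv[2]))
    )
    print("only in first:", only_a)
    print("only in second:", only_b)
```

Because addresses are ignored, two binaries can produce identical results here and still differ in their generated code, which is consistent with what was observed and is why a disassembly comparison with objdump was the next step.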
 + 
+**Diagnosis:** What we know so far is that compiling NetCDF with P-NetCDF breaks the nc_open function so that NEMO cannot use it. We are not sure whether this problem is caused by NetCDF alone or by the netcdf.hpp header files included with XIOS 2.0. As stated above, we would need to compile NetCDF ourselves to learn more about the problem.
 + 
+**Solution:** Until we have more information, the best solution is to use a NetCDF build that does not have P-NetCDF support. In any case, XIOS uses NC4, which relies on HDF5 for parallel writes.
  
 **More information:**
library/computing/xios_impi_troubles.1501764301.txt.gz · Last modified: 2017/08/03 12:45 by 84.88.184.232