XIOS I/O server with Intel MPI

XIOS @ MN3

Environment: NEMO 3.6 stable, XIOS 1.0. This bug was documented using the following compilers and MPI libraries:

Intel 13.0.1 & Intel MPI 4.1.3.049
Intel 16.0.1 & Intel MPI 5.1.2.150

The problem was reported when using default optimization flags as well as using -O3 optimization flag.

Problem: When using more than 1.920 MPI processes (120 MN3 nodes), during the XIOS initialization, the simulation was falling into a dead lock:

Some of the NEMO clients remain stuck in client.cpp doing an MPI send:

MPI_Send(buff,buffer.count(),MPI_CHAR,serverLeader,1,CXios::globalComm) ;

XIOS master server (first XIOS process), remains in CServer::listenContext(void) routine at server.cpp, trying to dispatch all the messages:

MPI_Iprobe(MPI_ANY_SOURCE,1,CXios::globalComm, &flag, &status) ;

Actions taken: Prints were placed into the code, before and after the mentioned call. It could be seen that some NEMO processes were waiting in the MPI barrier (synchronous send), while XIOS master server was looping to infinite, trying to get all the messages (the total number of messages should be equal to the number of clients, or NEMO processes).

The error could be reproduced in a small code, in order to facilitate the debug and the error fixing.

Diagnosis: It seemed that some messages sent from the clients to the master server were lost, maybe because all of these messages were sent from all the nodes at the same time.

Solution: Our first workaround was to include a call to the sleep function before de MPI_Send in the clients' code, to interleave the outcome messages and avoid flooding buffers and network. This obviously is not the cleanest solution, because it introduces a total delay of 15 secs. in the execution, but it is an affordable approach, given that this code is only executed in the initialization.

char hostName[50];
gethostname(hostName, 50);

// Sleep Fix
sleep(rank%16);

MPI_Comm_create_errhandler(eh, &newerr);
MPI_Comm_set_errhandler(CXios::globalComm, newerr );

error_code = MPI_Send(buff,buffer.count(),MPI_CHAR,serverLeader,1,CXios::globalComm) ;

delete [] buff ;

BSC operations provided another solution: enabling the User Datagram protocol by using Intel's MPI environment variables (more information below). This alternative works and doesn't need any code modification, but it entails a penalty in performance: we observed that simulations using this option were increasingly slower (5%-20%) as the number of cores was aumented, in comparison with the reference ones.

I_MPI_DAPL_UD=on

More information:

This bug was reported in the XIOS portal:

http://forge.ipsl.jussieu.fr/ioserver/ticket/90

About Intel Communication Fabrics control:

https://software.intel.com/en-us/node/528821

DAPL UD-capable Network Fabrics Control:

https://software.intel.com/en-us/node/528824

XIOS @ MN4

Issue 1:

Environment: Auto-EC-Earth 3.2.2_develop_MN4 (EC-Earth 3.2 r4063-runtime-unification).

Compiler: Intel 2017.4
MPI: Intel 2017.3.196
NetCDF: 4.4.4.1 & 4.2
HDF5: 1.8.19
Flags: -O3 & -O0

Problem: NEMO crashes in the initialization when reading input files:

ocean.output:

 ===>>> : E R R O R
         ===========

 iom_nf90_check : NetCDF: Invalid argument
                     iom_nf90_open ~~~
                     iom_nf90_open ~~~ open existing file: ./weights_WOA13d1_2_o
 rca1_bilinear.nc in READ mode

 ===>>> : E R R O R
         ===========

 iom_nf90_check : NetCDF: Invalid argument
                     iom_nf90_open ~~~

 ===>>> : E R R O R
         ===========

     fld_weight : unable to read the file
                                                                                                                                                                                                                                                             879,6       Final

log.err:

forrtl: severe (408): fort: (7): Attempt to use pointer FLY_DTA when it is not associated with a target

Image              PC                Routine            Line        Source
nemo.exe           00000000027994F6  Unknown               Unknown  Unknown
nemo.exe           0000000000B3F219  fldread_mp_fld_in        1375  fldread.f90
nemo.exe           0000000000B15D4E  fldread_mp_fld_ge         614  fldread.f90
nemo.exe           0000000000B13A6B  fldread_mp_fld_in         413  fldread.f90
nemo.exe           0000000000B0A69B  fldread_mp_fld_re         175  fldread.f90
nemo.exe           0000000000978301  dtatsd_mp_dta_tsd         224  dtatsd.f90
nemo.exe           0000000000C312DF  istate_mp_istate_         196  istate.f90
nemo.exe           000000000043C33F  nemogcm_mp_nemo_i         326  nemogcm.f90
nemo.exe           000000000043A64D  nemogcm_mp_nemo_g         120  nemogcm.f90
nemo.exe           000000000043A606  MAIN__                     18  nemo.f90
nemo.exe           000000000043A5DE  Unknown               Unknown  Unknown
libc-2.22.so       00002B64D88596E5  __libc_start_main     Unknown  Unknown
nemo.exe           000000000043A4E9  Unknown               Unknown  Unknown

Actions taken: Operations had observed this error using NEMO standalone and NeTCDF 4.4.4.1, so they installed NetCDF 4.4.0 version. We could not reproduce it with NEMO and NetCDF 4.4.4.1 but we got the same error when running EC-Earth. So we also moved to NetCDF 4.4.0. However, we got some other XIOS errors when writting outputs (commented in following issues). When operations installed 4.2 version we were seeing again the same error.

Diagnosis:

Solution:

More information:

This bug was reported in Unidata Github:

https://github.com/Unidata/netcdf4-python/issues/170

Issue 2:

Environment: Auto-EC-Earth 3.2.2_develop_MN4 (EC-Earth 3.2 r4063-runtime-unification).

Compiler: Intel 2017.4
MPI: Intel 2017.3.196
NetCDF: 4.4.0
HDF5: 1.8.19
Flags: -O3

Problem: XIOS2 breaks when writing output files.

ocean.output:

Actions taken:

Diagnosis:

Solution:

Issue 3:

Environment:

Problem:

Actions taken:

Diagnosis:

Solution:

Issue 4:

Environment:

Problem:

Actions taken:

Diagnosis:

Solution:

Sidebar

Table of Contents

XIOS I/O server with Intel MPI

XIOS @ MN3

XIOS @ MN4

Issue 1:

Issue 2:

Issue 3:

Issue 4:

User Tools

Site Tools

Sidebar

Table of Contents

XIOS I/O server with Intel MPI

XIOS @ MN3

XIOS @ MN4

Issue 1:

Issue 2:

Issue 3:

Issue 4:

Page Tools