Differences

This shows you the differences between two versions of the page.

--- reproducibility [2015/11/12 10:05]
fmassonn [2 November 2015]
+++ reproducibility [2015/12/11 12:42]
fmassonn
@@ Line 145: / Line 145: @@
 ===== Summary of monthly meetings =====
-==== 13 May 2015 ====
+===== 13 May 2015 =====
 === Agenda ===
@@ Line 180: / Line 180: @@
 '''Paco''': Send an e-mail to Uwe to clarify his idea of experiment for MPP reproducibility.
-==== 17 June 2015 ====
+===== 17 June 2015 =====
 === Points Discussed ===
@@ Line 217: / Line 217: @@
   * '''Asif and François''': Make new experiments, new repro restarts, recompile the code on MN and ECMWF with the same distribution and configuration of processors, possibly new keys (fp_precise), and less aggressive optimization.
-==== 2 November 2015 ====
+===== 2 November 2015 =====
 === Points Discussed ===
@@ Line 228: / Line 228: @@
 **Asif and François** launch the reproducibility experiment from stabilized **i06c** on Ithaca and MareNostrum
-==== 12 November 2015 ====
+===== 9 November 2015 =====
+Stabilization of **i06c**. The plots {{stabilization_GMST.pdf|here}} and {{stabilization_ice.pdf|here}} show the stabilization of one of the members of i06c, which was extended to reach equilibrium. It was decided that the run was now in a sufficiently stable climate to perform the second stream of reproducibility experiments.
+===== 12 November 2015 =====
 There was a meeting involving François, Kim and Oriol to discuss the meaning and use of compilation flags/options in these experiments.
-As a reminder: here's the current status. We don't have reproducibility (look at discussion above) but the setup was not 100% perfect. There were two problems: 1) Problems linked to the differences in domain decomposition (number and distribution of processors) and 2) Problems linked to the differences in versions of compilers, the use of aggressive optimization levels and the absence of certain keys like fp-model strict/precise.
+As a reminder: here's the current status. We don't have reproducibility in EC-Earth3.1 (look at discussion above) but the setup was not 100% perfect. There were two problems: 1) Problems linked to the differences in domain decomposition (number and distribution of processors) and 2) Problems linked to the differences in versions of compilers, the use of aggressive optimization levels and the absence of certain keys like fp-model strict/precise.
 To distinguish between the two problems and following Oriol and Kim's suggestion, here is the updated plan.
+** 0) Talk to NEMO and IFS teams** to at least inform them of our plans. **François** sends an e-mail to Sébastien Masson and **Kim** to IFS people at ECMWF. We already know that different domain decompositions give different results (bit-wise) for NEMO. Whether the results can also differ climate-wise is what we want to test, but these teams may have obtained results we are unaware of.
+**1) Reproducibility on Ithaca**. To isolate the effect of the processor decomposition, we'll first run a reproducibility experiment on Ithaca, started from the equilibrated restart we have obtained after 60 years of simulation. We'll change only the domain decomposition (number and distribution of processors). Since all other things will be equal by construction, this will allow us to examine the sole effect of the decomposition on reproducibility. Questions to settle at this stage are:
+  * Can we risk this strategy given that we don't know when we will lose access to Ithaca?
+  * The reference decomposition is 72: (32+16+22). What could another decomposition be? I suggest 64: (32+12+20), though I have no idea whether this makes sense.
+  * Ideally, the compiler version, the MPI and LAPACK versions, and the SZIP-HDF5-NetCDF-GRIB versions should also be frozen now, if we then want to run other experiments on other platforms.
+  * Compilation flags should include the **-fp-model source** option, which favors reproducibility and portability (see the references below). Contrary to what we all might think, the **-fp-model precise** or **-fp-model strict** options allow for accuracy, but not necessarily for reproducibility. In fact, the two characteristics cannot both be achieved simultaneously. Thanks to Kim for raising that.
+  * Optimization flags should be set to **-O0**. This will likely increase the execution time, but we don't know by how much yet. I suggest starting the experiment anyway. If we realize that it will take too long to finish, we can revisit this choice.
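The bit-wise sensitivity to domain decomposition mentioned above comes from the fact that floating-point addition is not associative: a global sum built from per-subdomain partial sums depends on how the values are grouped. The following is a minimal Python sketch of that effect, not NEMO or IFS code. The function name `decomposed_sum`, the toy `field` values and the chunking scheme are assumptions made purely for illustration.

```python
def decomposed_sum(values, n_chunks):
    """Sum `values` as n_chunks contiguous partial sums, then combine them.

    Loosely mimics a global sum in which each MPI rank first reduces its
    own subdomain. Assumes len(values) is divisible by n_chunks.
    """
    chunk = len(values) // n_chunks
    partials = []
    for rank in range(n_chunks):
        local = 0.0
        # Explicit loop: plain left-to-right double-precision accumulation.
        for v in values[rank * chunk:(rank + 1) * chunk]:
            local += v
        partials.append(local)
    total = 0.0
    for p in partials:
        total += p
    return total


# Toy "field" (hypothetical values chosen to expose rounding).
field = [1e16, 1.0, -1e16, 1.0]

one_rank = decomposed_sum(field, 1)   # a single subdomain
two_ranks = decomposed_sum(field, 2)  # two subdomains of two points each

print(one_rank, two_ranks, one_rank == two_ranks)
```

With this input the single-chunk sum is 1.0 and the two-chunk sum is 0.0, while the exact result is 2.0. Real model fields are far less extreme, so the differences usually appear in the last bits only, but they are enough to break bit-for-bit reproducibility between decompositions.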
+**2) Reproducibility across machines**. The table prepared by Asif (above) shows that there are indeed differences in the compiler versions. We'll have to make sure all compiler versions are identical, at least as far as we can. As a reminder, the idea is to do everything in our power to make the experiments reproducible. For now only simulations on Ithaca and MareNostrum3 can be conducted.
+**3) We'll use the same diagnostics as we did earlier this year**. This part is ready, there is no reason why diagnostics should change.
+More about compilation options can be found {{https://software.intel.com/sites/default/files/Compiler_QRG_2013.pdf|here}}. A description of the tradeoffs in floating-point operations is {{https://software.intel.com/es-es/node/582224|here}} and {{https://software.intel.com/es-es/node/582223|here}}.
+===== 9 December 2015 =====
+We had a meeting with the usual people plus Klaus and Uwe (SMHI), who are also tracking this reproducibility issue and are interested in what we are doing. Please visit [[https://dev.ec-earth.org/boards/6/topics/375|this page]]. They mention an interesting paper by [[http://www.geosci-model-dev.net/8/2829/2015/gmd-8-2829-2015.pdf|Baker et al.]]. In this paper, a software tool is presented to track differences in the CESM model. The paper is not really about coupled GCMs; it deals mostly with the atmosphere and with short periods (1 year).
+The discussions were quite rich; here is the summary in a few bullet points:
+  * We have to be extremely **careful** when saying things like "EC-Earth is not reproducible". First because "reproducibility" is a loosely defined concept: bit-for-bit reproducibility is different from climate-for-climate reproducibility. Second because we //users// might offend //developers// who strive to make their models reproducible, and this could be seen as a lack of respect.
+  * SMHI is mostly interested in understanding under which configurations EC-Earth is reproducible, while the initial question we (at IC3) asked back then was: can we run EC-Earth on different platforms if we follow our common standards?
+  *
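The distinction between bit-for-bit and climate-for-climate reproducibility can be sketched in a few lines of Python. This is only an illustration and not the diagnostics used in these experiments: the numbers are made up, the names `bitwise_equal`, `welch_t`, `run_a` and `run_b` are hypothetical, and the cutoff of 2.0 on Welch's t statistic is a rough assumption rather than an agreed criterion.

```python
import math
import struct


def bitwise_equal(a, b):
    """True only if two floats have identical IEEE-754 double bit patterns."""
    return struct.pack("<d", a) == struct.pack("<d", b)


def welch_t(x, y):
    """Welch's t statistic for two samples with possibly unequal variances."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)  # sample variance of x
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)  # sample variance of y
    return (mx - my) / math.sqrt(vx / nx + vy / ny)


# Made-up annual global-mean temperatures (deg C) from two hypothetical
# five-member ensembles run on two platforms.
run_a = [14.10, 14.30, 13.90, 14.20, 14.00]
run_b = [14.15, 14.05, 14.25, 13.95, 14.10]

# Bit-for-bit: every member must match exactly.
same_bits = all(bitwise_equal(a, b) for a, b in zip(run_a, run_b))

# Climate-for-climate: the two ensembles must be statistically alike.
t_stat = welch_t(run_a, run_b)
same_climate = abs(t_stat) < 2.0  # assumed threshold, for illustration only

print(same_bits, same_climate)
```

In this toy case the two ensembles fail the bit-for-bit check but pass the statistical one, which is the situation where a blanket statement such as "the model is not reproducible" would be misleading.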
