This shows you the differences between two versions of the page.
reproducibility [2015/11/12 10:18] fmassonn [12 November 2015] |
reproducibility [2017/11/10 14:03] fmassonn |
||
---|---|---|---|
Line 145: | Line 145: | ||
===== Summary of monthly meetings ===== | ===== Summary of monthly meetings ===== | ||
- | ==== 13 May 2015 ==== | + | ===== 13 May 2015 ===== |
=== Agenda === | === Agenda === | ||
Line 180: | Line 180: | ||
''' | ''' | ||
- | ==== 17 June 2015 ==== | + | ===== 17 June 2015 ===== |
=== Points Discussed === | === Points Discussed === | ||
Line 217: | Line 217: | ||
* ''' | * ''' | ||
- | ==== 2 November 2015 ==== | + | ===== 2 November 2015 ===== |
=== Points Discussed === | === Points Discussed === | ||
Line 228: | Line 228: | ||
**Asif and François** launch the reproducibility experiment from stabilized **i06c** on Ithaca and MareNostrum | **Asif and François** launch the reproducibility experiment from stabilized **i06c** on Ithaca and MareNostrum | ||
- | ==== 12 November 2015 ==== | + | ===== 9 November 2015 ===== |
+ | Stabilization of **i06c**. The plots {{stabilization_GMST.pdf|here}} and {{stabilization_ice.pdf|here}} show the stabilization of one of the members of i06c, which was extended to reach equilibrium. It was decided that the run was now in a sufficiently stable climate to perform the second stream of reproducibility experiments. | ||
+ | |||
+ | ===== 12 November 2015 ===== | ||
There was a meeting involving François, Kim and Oriol, to discuss the meaning and the use of compilation flags/ | There was a meeting involving François, Kim and Oriol, to discuss the meaning and the use of compilation flags/ | ||
- | As a reminder: here's the current status. We don't have reproducibility (look at discussion above) but the setup was not 100% perfect. There were two problems: 1) Problems linked to the differences in domain decomposition (number and distribution of processors) and 2) Problems linked to the differences in versions of compilers, the use of aggressive optimization levels and the absence of certain keys like fp-model strict/ | + | As a reminder: here's the current status. We don't have reproducibility |
To distinguish between the two problems and following Oriol and Kim's suggestion, here is the updated plan. | To distinguish between the two problems and following Oriol and Kim's suggestion, here is the updated plan. | ||
- | ** 0) Talk to NEMO and IFS teams** to at least inform them of our plans. **François** sends an e-mail to Sébastien Masson and **Kim** to IFS contacts | + | ** 0) Talk to NEMO and IFS teams** to at least inform them of our plans. **François** sends an e-mail to Sébastien Masson and **Kim** to IFS people | ||
+ | |||
+ | **1) Reproducibility on Ithaca**. To isolate the effect of the decomposition of processors, we'll first run a reproducibility experiment on Ithaca, started from the equilibrated restart we have obtained after 60 years of simulation. We'll just change the domain decomposition (number and distribution). Since all other things will be equal by construction, | ||
+ | * Can we risk this strategy given that we don't know when we will lose access to Ithaca? | ||
+ | * The reference decomposition is 72: (32+16+22) . What can be another decomposition? | ||
+ | * Ideally, the compiler version, the MPI and LAPACK versions, and the SZIP-HDF5-NetCDF-GRIB versions should also be frozen now, if we want to run other experiments on other platforms later. | ||
+ | * Compilation flags should include the **-fp-model source** option, which favors reproducibility and portability (see the reference below). Contrary to what we might all think, the **-fp-model precise** or **-fp-model strict** options allow for accuracy, but not necessarily for reproducibility. In fact, the two characteristics cannot both be achieved simultaneously; see the reference below. Thanks to Kim for raising that. | ||
+ | * Optimization flags should be set to **-O0**. This will likely increase the execution time, but we don't know by how much yet. I would suggest starting the experiment. If we realize that it will take too long to finish, we can revisit this choice. | ||
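One reason why changing only the domain decomposition can already alter results is that floating-point addition is not associative: the order in which partial sums from the subdomains are combined matters. The sketch below is a toy illustration in Python, not EC-Earth code; the names `naive_sum` and `decomposed_sum` and the three-value field are made up for the example.

```python
def naive_sum(values):
    """Add values left to right, as a simple accumulation loop would."""
    total = 0.0
    for value in values:
        total += value
    return total


def decomposed_sum(values, boundaries):
    """Sum `values` as if they were split across subdomains.

    `boundaries` holds the index at which each new subdomain starts.
    Each subdomain computes a partial sum, then the partial sums are reduced.
    """
    edges = [0, *boundaries, len(values)]
    partials = [naive_sum(values[a:b]) for a, b in zip(edges, edges[1:])]
    return naive_sum(partials)


field = [0.1, 0.2, 0.3]              # hypothetical field values
sum_a = decomposed_sum(field, [2])   # subdomains: [0.1, 0.2] and [0.3]
sum_b = decomposed_sum(field, [1])   # subdomains: [0.1] and [0.2, 0.3]

print(sum_a == sum_b)                # prints False
print(abs(sum_a - sum_b))            # a difference at the level of rounding error
```

Both results are valid to within rounding error, yet they are not bit-identical, which is the kind of difference a change of decomposition alone can introduce.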
+ | |||
+ | **2) Reproducibility across machines**. The table prepared by Asif (above) shows that there are indeed differences in the compiler versions. We'll have to make sure all compiler versions are identical, as far as we can. As a reminder, the idea is to do everything in our power to make the experiments reproducible. For now, simulations can only be conducted on Ithaca and MareNostrum3. | ||
+ | |||
+ | **3) We'll use the same diagnostics as we did earlier this year**. This part is ready; there is no reason why the diagnostics should change. | ||
+ | |||
+ | More about compilation options can be found {{https:// | ||
+ | |||
+ | |||
+ | ===== 9 December 2015 ===== | ||
+ | We had a meeting with the usual people plus Klaus and Uwe (SMHI), who are also tracking this reproducibility issue and are interested in what we are doing. Please visit [[https:// | ||
+ | |||
+ | The discussions were quite rich, and here is the summary in a few bullet points | ||
+ | * We have to be extremely **careful** when saying things like " | ||
+ | * SMHI is mostly interested in understanding the configurations under which EC-Earth is reproducible, | ||
+ | * Assessing reproducibility of a whole system is different from assessing reproducibility of one particular variable (e.g., Antarctic sea ice extent in winter). A good point of the Barker et al. paper referenced above is that their test is multivariate, | ||
+ | |||
+ | The topic is becoming extremely complex and far-reaching, and the team looking into it is growing every month. On the other hand, it has been a long-standing issue (almost one year now) and we need to have insights for the next EC-Earth meeting and the upcoming CMIP6. Here is a suggestion for how to continue the work, which should be split into two tasks: | ||
+ | * **Developer aspect** - Xavi Yepes is now looking into the bit-for-bit reproducibility issue with EC-Earth 3.2 for short (3-month) runs. SMHI (Uwe, Klaus) and KNMI (Philippe Le Sager) are aware of this. He is running several tests: | ||
+ | - Changing the number of processors in NEMO, IFS, both. | ||
+ | - Setting optimization to -O2 or -O3 | ||
+ | - Setting the -fp-model to precise, strict, source | ||
+ | * **User aspect** - Asif, François will continue adopting the " | ||
+ | |||
+ | ===== 15 December 2015 ===== | ||
+ | The **User aspect** experiments are launched. Ithaca' | ||
+ | |||
+ | ===== 17 December 2015 ===== | ||
+ | François and Xavier agreed that it is necessary to perform several executions while changing technical aspects. Ideally, all of the following aspects would be evaluated, but this is not feasible because the number of combinations grows exponentially. The parameters to try are therefore: | ||
+ | |||
+ | * Compulsory: | ||
+ | * Code optimization: | ||
+ | * Regarding floating-point calculations: | ||
+ | * Usage of -xHost flag (best instructions according to host machine) | ||
+ | * Two processor combinations: | ||
+ | * IFS 320 and NEMO 288 | ||
+ | * IFS 128 and NEMO 64 | ||
+ | * Optional: | ||
+ | * Without -xHost | ||
+ | * Without -fp-model clause | ||
+ | * Try -fp-model source | ||
+ | * Explore more processor combinations | ||
+ | |||
+ | So, we should have 4 compulsory compilations: | ||
+ | * -O2 -fp-model precise -xHost | ||
+ | * -O2 -fp-model strict -xHost | ||
+ | * -O3 -fp-model precise -xHost | ||
+ | * -O3 -fp-model strict -xHost | ||
+ | |||
+ | And consequently, | ||
+ | |||
+ | Additional considerations: | ||
+ | |||
+ | * Use the latest EC-Earth 3.2beta release | ||
+ | * Enable key_mpp_rep | ||
+ | * 1 month, writing every day | ||
+ | * Use the optimization that avoids mpi_allgather at the north fold | ||
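The compulsory experiment matrix described above can be enumerated with a short script. This is only a sketch: it assumes that each of the four compulsory compilations is run with both processor combinations, and the variable names and dictionary layout are hypothetical, not part of any EC-Earth tooling.

```python
from itertools import product

# Compulsory settings taken from the list above.
optimization_levels = ["-O2", "-O3"]
fp_models = ["precise", "strict"]
processor_layouts = [
    {"IFS": 320, "NEMO": 288},
    {"IFS": 128, "NEMO": 64},
]

# One flag string per compulsory compilation (always with -xHost).
compilations = [
    f"{opt} -fp-model {fp} -xHost"
    for opt, fp in product(optimization_levels, fp_models)
]

# Assumption: every compilation is run with every processor layout.
runs = [
    {"flags": flags, **layout}
    for flags, layout in product(compilations, processor_layouts)
]

for run in runs:
    print(f"{run['flags']:32s} IFS={run['IFS']:4d} NEMO={run['NEMO']:4d}")

print(len(compilations), "compilations,", len(runs), "runs")
```

Adding any of the optional settings (for example dropping -xHost or trying -fp-model source) multiplies the number of runs, which is why they are kept out of the compulsory set.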
+ | |||
+ | ===== 4 February 2016 ===== | ||
+ | Javier García-Serrano and Mario Acosta presented some reproducibility results at the 2016 EC-Earth meeting. The community recommended that we finish the reproducibility experiments and publish the results. Some issues should be addressed first: | ||
+ | |||
+ | -Different combinations of flags for optimization and floating-point operations have been checked on MareNostrum3, | ||
+ | |||
+ | * Determine the best method to quantify differences between runs | ||
+ | * Propose a reference against which the remaining experiments can be compared. This reference could be used in the future to check runs on new platforms, the inclusion of new modules, etc. | ||
+ | * Use a statistical method to quantify the differences between runs and propose a minimum to achieve instead of bitwise precision in order to avoid critical restrictions in performance. | ||
+ | * Propose a method to decide which of two simulations with valid results is the better one. Some experiments using different compiler flags will obtain similar valid results (maybe with differences of only 1%). It would be useful to know which one gives better results (quality of the simulation results). | ||
+ | * Determine a combination of flags (Floating-point control and optimization) and additional optimization methods which achieve a balance between performance and accuracy & reproducibility. | ||
+ | * Suggest a combination of flags and/or implement some specific optimizations to achieve the best possible performance while keeping the differences below X% on a particular platform and below Y% across two different platforms with a similar architecture (with Y > X). | ||
+ | * If bit for bit reproducibility was achieved using ec-earth3.1, | ||
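As a rough illustration of the X% and Y% thresholds mentioned above, a tolerance check could look like the sketch below. It uses a plain maximum relative difference rather than a proper statistical test, the function names are hypothetical, and the numbers are made-up placeholders, not results from any experiment.

```python
def max_relative_difference(reference, candidate):
    """Largest |candidate - reference| / |reference| over all values, in percent."""
    return max(
        abs(c - r) / abs(r) * 100.0
        for r, c in zip(reference, candidate)
    )


def within_tolerance(reference, candidate, tolerance_percent):
    """True if the candidate run stays within the given tolerance of the reference."""
    return max_relative_difference(reference, candidate) <= tolerance_percent


# Placeholder values standing in for one diagnostic from two runs.
reference_run = [10.0, 12.0, 8.0]
candidate_run = [10.1, 12.0, 8.0]

x_percent = 0.5   # hypothetical same-platform tolerance (X)
y_percent = 2.0   # hypothetical cross-platform tolerance (Y > X)

print(within_tolerance(reference_run, candidate_run, x_percent))
print(within_tolerance(reference_run, candidate_run, y_percent))
```

A real assessment would compare ensembles rather than single values, so that the difference between two configurations can be judged against the internal variability of the model.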
+ | |||
+ | ===== 27th of May 2016 ===== | ||
+ | See the summarizing presentations of {{20160526_groupmeeting.pdf | François }} and {{20160526_EC-Earth3.2_MarioAcosta.pdf | Mario }}. A more general set of slides about climate-reproducibility is available {{ 20160526_EC-Earth3.1_FrancoisMassonnet.pdf | here }} and was also posted on the EC-Earth development portal issue {{https:// | ||
+ | |||
+ | Actions: | ||
+ | * Mario runs an experiment with **-fpe0** activated, on ECMWF. | ||
+ | * Mario/ | ||
+ | |||
+ | ===== 10th of November 2017 ===== | ||
+ | Martin and François have worked to make the scripts testing the reproducibility more universal. These can now be found in the following gitlab project: | ||
- | **1) Reproducibility on Ithaca**. To isolate the effect of the decomposition of processors, we'll first run a reproducibility experiment on Ithaca. We'll just change the domain decomposition (number and distribution). Since all other things will be equal by construction, | + | https:// |
- | 2) **Reproducibility across machines** | + | A draft of the paper has been created: |
- | More about compilation options can be found {{https://software.intel.com/sites/default/files/Compiler_QRG_2013.pdf|here}} | + | https://docs.google.com/document/d/1aMsdggygIGmbyiFmmEOEFIl6ZVe-EO7Jcd04B6ZP91A/edit |