High performance and scores
Currently, computation of scores and other diagnostics in s2dverification is slow. The functions need to be optimised in terms of:
a) Computing time, for example by using base R vectorised functions or by calling C functions.
b) Memory usage, for example by avoiding unneeded copies of arrays in the score functions.
c) Parallelism, in particular enabling them to run on clusters with thousands of cores.
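As a minimal sketch of points a) and b), the toy function below (names and data are invented for illustration, not taken from s2dverification) computes per-member anomalies first with explicit loops, where each element-wise assignment can trigger copies of the output array, and then with a single vectorised expression that lets base R do the work in C:

```r
# Hypothetical example: anomalies of a [member, time] matrix.
# Loop version: slow, and repeated subsetting/assignment can force copies.
anomaly_loop <- function(x) {
  out <- x
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(ncol(x))) {
      out[i, j] <- x[i, j] - mean(x[i, ])
    }
  }
  out
}

# Vectorised version: rowMeans() is computed once in C, and the
# subtraction recycles it along columns without an explicit loop.
anomaly_vec <- function(x) {
  x - rowMeans(x)
}

x <- matrix(rnorm(20), nrow = 4)
stopifnot(all.equal(anomaly_loop(x), anomaly_vec(x)))
```

The vectorised version avoids both the interpreted inner loop and the per-element writes into `out`, which addresses a) and b) at once for this simple case.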
@ncortesi has already programmed a set of scripts that allow running score functions in chunks on MareNostrum (or on an SMP machine, with fewer than 128 tasks) and has analysed their performance, providing a solution to c). More information can be found at https://earth.bsc.es/wiki/lib/exe/fetch.php?media=library:internal:20160615_running_diagnostics_on_marenostrum.pdf . This solution is, however, not ready to be used without some difficult configuration steps; @nmanubens is in contact with him to solve this issue. Additionally, in the long term, this solution should ideally be ported to R and made available through parameters in the usual score functions, or by evolving the veriApply() function in the easyVerification package.
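The chunking idea can be sketched on a single multicore machine with the base `parallel` package (the score function, data shapes, and chunk count below are invented for illustration; the MareNostrum scripts work differently and at much larger scale):

```r
library(parallel)

# Hypothetical per-gridpoint score: RMSE between forecast and observed
# time series. Real s2dverification score functions are richer.
rmse <- function(fc, obs) sqrt(mean((fc - obs)^2))

# Toy data with dimensions [gridpoint, time].
set.seed(1)
n_grid <- 1000; n_time <- 50
fc  <- matrix(rnorm(n_grid * n_time), nrow = n_grid)
obs <- matrix(rnorm(n_grid * n_time), nrow = n_grid)

# Split the gridpoint dimension into chunks and score each chunk in a
# separate worker; on a cluster each chunk would instead be one task.
# (mclapply forks, so mc.cores > 1 requires a Unix-like system.)
chunks <- split(seq_len(n_grid), cut(seq_len(n_grid), 4, labels = FALSE))
scores <- unlist(mclapply(chunks, function(idx) {
  vapply(idx, function(i) rmse(fc[i, ], obs[i, ]), numeric(1))
}, mc.cores = 2), use.names = FALSE)

stopifnot(length(scores) == n_grid)
```

Exposing something like the chunk count and core count as parameters of the score functions (or of veriApply()) is one way the ported solution could be surfaced to users.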
@nmanubens and @jginer are working on a compatibility break in s2dverification in which the data model used throughout the package will be revised and improved, taking into account existing libraries that, for example, offer abstractions of distributed arrays such that a simple sum of two arrays is launched transparently and automatically across multiple nodes. Furthermore, this model has to be agreed upon and shared with the one in downscaleR (discussion ongoing with Joaquín from UNICAN), and it should ideally also end up being the data model used in QA4Seas.
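To make the "transparent sum of distributed arrays" idea concrete, here is a purely conceptual sketch (all class and function names are invented) of an S3 class that stores an array in chunks and whose `+` operator works chunk by chunk; in a real distributed-array library each chunk operation would be shipped to a different node rather than run locally:

```r
# "chunked_array": a vector stored as a list of chunks.
chunked_array <- function(x, n_chunks = 2) {
  idx <- split(seq_along(x), cut(seq_along(x), n_chunks, labels = FALSE))
  structure(list(chunks = lapply(idx, function(i) x[i])),
            class = "chunked_array")
}

# Chunk-wise addition; a distributed backend would launch each chunk
# sum transparently on its own node.
"+.chunked_array" <- function(e1, e2) {
  structure(list(chunks = Map(`+`, e1$chunks, e2$chunks)),
            class = "chunked_array")
}

# Gather the chunks back into an ordinary vector.
as_vector <- function(ca) unlist(ca$chunks, use.names = FALSE)

a <- chunked_array(1:10)
b <- chunked_array(101:110)
stopifnot(identical(as_vector(a + b), 1:10 + 101:110))
```

The point for the new data model is that user code only ever writes `a + b`; where and how the chunks are combined is a backend decision hidden behind the abstraction.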
The Data & Diagnostics team is also investigating existing paradigms and techniques to deal with this kind of problem.