PROVENANCE for SUNSET (METACLIP vocabularies)
metaclipR is a package implementing the METACLIP Provenance Framework in the R environment. It is specially tailored to the specificities of climate4R, allowing an easier abstraction of command calls and data structures to the entities defined in the METACLIP vocabularies.
We've been studying the functions of metaclipR and their compatibility with SUNSET:
- Some functions can be used directly without a climate4R object. For example, metaclipR.Dataset(), which we previously used in the "mini project," can generate dataset provenance metadata.
- The data object handled by climate4R and metaclipR (netML file) has a different structure from the s2dv_cube that SUNSET works with. We're exploring whether it's feasible to transform an s2dv_cube into netML so that some metaclipR functions can run properly.
- Certain metaclipR functions require the processed data (netML) produced by a climate4R transformation, because they extract provenance directly from it.
- metaclipR functions rely heavily on the UDG repository datasource, especially when defining the dataset and dataset subset. Things become more complicated when using an "unknown" dataset.
- Fully describing the provenance of a climate product from SUNSET would require expanding the ontology defined by METACLIP. Some data objects, such as ds:DecadalHindcast, are not defined. We need to consider how to expand the ontology while ensuring the interpreter can still function effectively.
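As a rough illustration of the last point, the sketch below checks which classes used in a provenance description are missing from a vocabulary, so that gaps such as ds:DecadalHindcast show up before the graph reaches the interpreter. It is written in Python only to keep it language-neutral; it is not part of metaclipR or SUNSET. The KNOWN_CLASSES set is a placeholder, not the actual METACLIP class list, and find_undefined_classes is a hypothetical helper.

```python
# Placeholder vocabulary for illustration only: NOT the real METACLIP class list.
KNOWN_CLASSES = {"ds:Dataset", "ds:DatasetSubset"}


def find_undefined_classes(node_classes, known=KNOWN_CLASSES):
    """Return the classes missing from `known`, in order of first appearance."""
    missing = []
    for cls in node_classes:
        if cls not in known and cls not in missing:
            missing.append(cls)
    return missing


# Classes that a hypothetical SUNSET provenance description might reference.
used = ["ds:Dataset", "ds:DecadalHindcast", "ds:DatasetSubset", "ds:DecadalHindcast"]
undefined = find_undefined_classes(used)
print(undefined)  # ['ds:DecadalHindcast']
```

A check like this only reports the gap; how the missing classes are added to the ontology, and how the interpreter treats them, remains an open design question.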
We also tested implementing the provenance definition based entirely on the METACLIP ontology, using several metaclipR functions together with functions from the igraph package. The results looked promising, which makes developing our own provenance functions a viable alternative if direct implementation through metaclipR proves too complicated.
After encountering several issues with the metaclipR package, we decided to develop our own functions to describe provenance for SUNSET. First, metaclipR lacked functions to describe certain transformations executed in SUNSET, such as downscaling or unit transformation. Second, some functions were so specific to climate4R that they required modifications to work properly. Third, the way a provenance graph is defined in metaclipR makes it difficult to trace and identify the node names of all the represented steps, which complicates the provenance definition of complex relations. Our own functions give us more specificity in the provenance definitions and integrate easily with SUNSET, extracting information directly from the s2dv_cube at each step.
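The idea of explicit node names and per-step metadata extraction can be sketched as follows. This is a minimal illustration in Python, not the SUNSET implementation (which is in R): the ProvenanceGraph class, the record_step helper, the "ex:" class and edge names, and the dictionary standing in for an s2dv_cube (with assumed "variable" and "units" fields) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Tiny directed graph whose nodes are addressed by explicit, unique names."""
    nodes: dict = field(default_factory=dict)  # node name -> attributes
    edges: list = field(default_factory=list)  # (source, target, label) tuples

    def add_node(self, name, class_name, **attrs):
        if name in self.nodes:
            raise ValueError(f"duplicate node name: {name}")
        self.nodes[name] = {"class": class_name, **attrs}
        return name

    def add_edge(self, source, target, label):
        for node in (source, target):
            if node not in self.nodes:
                raise KeyError(f"unknown node: {node}")
        self.edges.append((source, target, label))


def record_step(graph, cube, step_name, class_name, input_node, label):
    """Add one processing step, reading its metadata from the cube-like dict."""
    node = graph.add_node(
        step_name, class_name, variable=cube["variable"], units=cube["units"]
    )
    graph.add_edge(input_node, node, label)
    return node


# Stand-in for an s2dv_cube; the field names are assumptions for this sketch.
cube = {"variable": "tas", "units": "K"}

g = ProvenanceGraph()
src = g.add_node("Dataset.1", "ds:Dataset")

# Pretend a unit transformation has just been applied to the cube.
cube["units"] = "degC"
step = record_step(g, cube, "UnitConversion.1", "ex:UnitConversion", src, "ex:wasInputTo")

print(g.nodes[step])
print(g.edges)
```

Because every node is created under a name chosen by the caller, later steps can refer to "Dataset.1" or "UnitConversion.1" directly when they need to link to an earlier stage of the workflow.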