Produce RO-Crate archive with Autosubmit

Merged Bruno de Paula Kinoshita requested to merge rocrate into master Feb 16, 2023

Add ro-crate-py as a dependency, and produce an initial RO-Crate ZIP file containing the metadata and the original workflow configuration.

To test it:

Install Autosubmit from this branch (rocrate) with pip install -e . (inside a venv or a conda/mamba environment, if you use one). See https://autosubmit.readthedocs.io/en/master/installation/index.html
Create an experiment for mHM following the installation instructions in the README: https://github.com/kinow/auto-mhm-test-domains/blob/fb57531b52b1e2a2dbd6a1c53a64c954fcca4e5d/README.md (use the branch from this PR instead of main: https://github.com/kinow/auto-mhm-test-domains/pull/12/files)
The command from the README (i.e. autosubmit expid ....) prints the experiment ID (hereafter <expid>)
You may also need to build the container used by the workflow; see the same README: https://github.com/kinow/auto-mhm-test-domains/blob/fb57531b52b1e2a2dbd6a1c53a64c954fcca4e5d/README.md
Prepare the Autosubmit experiment (this git-clones the remote repository): autosubmit create <expid>
Run the workflow: autosubmit run <expid>
Create the RO-Crate: autosubmit -lc DEBUG archive --rocrate <expid>
Extract the JSON from the ZIP to inspect the metadata: unzip -p ~/autosubmit/<expid>/rocrate.zip ro-crate-metadata.json > ~/autosubmit//ro-crate-metadata.json
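
The final inspection step can also be done from Python. The following is a minimal sketch using only the standard library. The helper name summarize_crate is illustrative and not part of Autosubmit, and it assumes the archive has ro-crate-metadata.json at its root, as in the unzip command above.

```python
import json
import zipfile
from collections import Counter


def summarize_crate(zip_path):
    """Return (root_entity, type_counts) for an RO-Crate ZIP archive.

    Hypothetical helper for illustration; not part of Autosubmit.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # The metadata file is expected at the root of the archive.
        with archive.open("ro-crate-metadata.json") as handle:
            metadata = json.load(handle)

    graph = metadata.get("@graph", [])
    # The root data entity conventionally has the identifier "./".
    root = next((entity for entity in graph if entity.get("@id") == "./"), None)

    type_counts = Counter()
    for entity in graph:
        types = entity.get("@type", [])
        # "@type" may be a single string or a list of strings.
        if isinstance(types, str):
            types = [types]
        type_counts.update(types)
    return root, type_counts
```

Calling summarize_crate with the path to rocrate.zip returns the root dataset entity and a count of entity types, which gives a quick view of what the crate describes before reading the full JSON.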

Progress:

Use ro-crate-py Python library
Unify the YAML configuration of Autosubmit and produce a single workflow.yml (prospective provenance)
Produce an RO-Crate zip file with the JSON metadata, and the workflow.yml (correctly linked)
- Add authors, license, keywords (from an external YAML like COMPSs)
- Fetch the experiment description from the DB
- Add logs as traces (retrospective provenance, all other items added below)
- Add plot if present
- Add databases
- Add pickle files
- Add inputs and outputs (use a convention, see https://github.com/ResearchObject/ro-crate-py/issues/148#issuecomment-1460482169)
- Check whether we can add system usage (energy, memory, nodes, etc.) as per this comment: https://github.com/ResearchObject/workflow-run-crate/issues/10#issuecomment-1456168053 (this might be easier in a follow-up, as that discussion is not closed yet)
Find a good public workflow to produce an RO-Crate and validate with the RO-Crate community
Validate the RO-Crate
Write tests
Write docs
Test archive & unarchive
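
To make the "ZIP file with the JSON metadata, and the workflow.yml (correctly linked)" item concrete, here is a rough sketch of the basic structure such an archive has under RO-Crate 1.1. It deliberately uses only the standard library rather than ro-crate-py, so it is not the code in this merge request. The function name, crate name, and workflow text are placeholders, and the root dataset omits properties a complete crate would carry (description, datePublished, license).

```python
import json
import zipfile

RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_PROFILE = "https://w3id.org/ro/crate/1.1"


def build_minimal_crate(zip_path, workflow_text, name):
    """Write a ZIP holding workflow.yml and a minimal ro-crate-metadata.json.

    Illustrative only; the merge request builds its crate with ro-crate-py.
    """
    metadata = {
        "@context": RO_CRATE_CONTEXT,
        "@graph": [
            {
                # Metadata descriptor: points at the root dataset.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": RO_CRATE_PROFILE},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: links to the workflow file it contains.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "hasPart": [{"@id": "workflow.yml"}],
                "mainEntity": {"@id": "workflow.yml"},
            },
            {
                # The unified workflow configuration (prospective provenance).
                "@id": "workflow.yml",
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": "workflow.yml",
            },
        ],
    }
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("workflow.yml", workflow_text)
        archive.writestr("ro-crate-metadata.json", json.dumps(metadata, indent=2))
    return metadata
```

"Correctly linked" here means that the identifier in hasPart and mainEntity matches both the @id of the file entity and the actual file name inside the ZIP. The remaining items in the list (logs, plots, databases, pickle files) would be further file entities added to hasPart in the same way.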

Edited Aug 22, 2023 by Bruno de Paula Kinoshita

Source branch: rocrate