From d2ae4d446794b9370cea448b4293342bf583f592 Mon Sep 17 00:00:00 2001 From: "Bruno P. Kinoshita" Date: Tue, 30 Apr 2024 15:55:46 +0200 Subject: [PATCH] Add the Traceability docs, linking from index, simplifying a bit Provenance --- docs/source/userguide/index.rst | 1 + docs/source/userguide/provenance.rst | 8 +- docs/source/userguide/traceability.rst | 189 +++++++++++++++++++++++++ 3 files changed, 192 insertions(+), 6 deletions(-) create mode 100644 docs/source/userguide/traceability.rst diff --git a/docs/source/userguide/index.rst b/docs/source/userguide/index.rst index f1ec91c9f..706c289f9 100644 --- a/docs/source/userguide/index.rst +++ b/docs/source/userguide/index.rst @@ -15,6 +15,7 @@ User Guide /userguide/variables /userguide/expids /userguide/provenance + /userguide/traceability Command list ============ diff --git a/docs/source/userguide/provenance.rst b/docs/source/userguide/provenance.rst index 073fc6a70..610d2a97c 100644 --- a/docs/source/userguide/provenance.rst +++ b/docs/source/userguide/provenance.rst @@ -11,12 +11,8 @@ Each Autosubmit experiment is assigned a :doc:`unique experiment ID ` (also called expid). It also provides a central database and utilities that permit experiments to be referenced. -Every Autosubmit command issued by a user generates a timestamped log -file in ``/tmp/ASLOGS/``. For example, when the user runs -``autosubmit create `` and ``autosubmit run ``, these -commands should create files like ``/tmp/ASLOGS/20230808_092350_create.log`` -and ``/tmp/ASLOGS/20230808_092400_run.log``, with the same content -that was displayed in the console output to the user running it. +The commands issued by users generate log files, and become part of the +Autosubmit experiment, as explained in the :doc:`Traceability section `. Users can :ref:`archive Autosubmit experiments `. 
These archives contain the complete logs and other files in the experiment directory, and can be later unarchived diff --git a/docs/source/userguide/traceability.rst b/docs/source/userguide/traceability.rst new file mode 100644 index 000000000..5e8f3f68d --- /dev/null +++ b/docs/source/userguide/traceability.rst @@ -0,0 +1,189 @@ +############ +Traceability +############ + +.. + TODO: Add diagrams to illustrate traceability + TODO: Add links to more information about each log + +Configuration +------------- + +An Autosubmit experiment starts with its creation using a version +of Autosubmit to issue the command ``autosubmit expid``. The generated +experiments contain minimal YAML configuration to bootstrap the +experiment. + +For an Autosubmit experiment of type ``Git``, the rest of the experiment +configuration is located at a location like ``/proj/git_project/`` (the +``proj`` part is constant, but the ``git_project`` is configurable) and imported +by Autosubmit. The ``/proj/git_project/`` subdirectory contains a clone +of a Git repository (i.e. there is a proj/git_project/.git). + +.. note:: + + Autosubmit combines multiple YAML files and generates a merged YAML + file at ``/conf/metadata/experiment_data.yml``. This file can be used to analyse the final configuration used for the run and compare with the information from trace files. + +The cloned repository may contain YAML configuration files in a location +such as ``/proj/git_project/conf``, for example, with settings +for models and applications, Autosubmit ``jobs``, as well as the template +scripts (e.g. under ``/proj/git_project/templates``, or anywhere +the user may choose). + +These configuration files, template scripts, and the Git information +from ``/proj/git_project/`` (and any Git submodules), are one part of +the traces used for provenance and reproducibility of the Autosubmit +experiments. 
The rest of the traces and the data produced by running +the experiment workflow jobs are explained in the following sections. + +Logs +---- + +Most of the Autosubmit commands that take an ``expid`` argument (``autosubmit create``, +``autosubmit setstatus``, ``autosubmit run``, etc.) write to log +files persisted on the computer where the command is issued, along with +the rest of the workflow configuration and other traces. The only exception +is ``autosubmit delete ``, which writes to the global log +path, as it deletes the experiment folder, along with its ``ASLOGS`` folder. + +By default, these command logs are saved under ``/tmp/ASLOGS``, include in their names +the timestamp of the command, and always come in pairs of "``.log``" and +"``_err.log``" files (one for the command standard output, and one for +the error output). + +If the user issuing the command is not the owner of the experiment, then +Autosubmit will try to write the log file in the ``ASLOGS`` folder first, +and should that fail, it will try to write to the ``tmp`` folder or to the +global log path, depending on the file system permissions for the user. + +.. note:: + + Autosubmit keeps ``10`` logs of each command, i.e. up to ``10`` logs of + ``autosubmit create``, ``10`` logs of ``autosubmit run``, etc., and then + removes older log files when new ones are created. + +Autosubmit commands that do not take an ``expid`` argument +(e.g. ``autosubmit expid``, ``autosubmit testcase``, ``autosubmit readme``, etc.) +write to the global log path, which can be configured in the ``.autosubmitrc`` +configuration file. + +.. note:: + + There are commands that do not produce any log, e.g. ``autosubmit delete`` as + +The logs of the workflow tasks are retrieved from remote platforms by Autosubmit +and written to ``/tmp/LOG_/``. They contain the output and errors, +as well as the trace output of the template script after parameter expansion +(done via the ``set -x`` mode in Bash). 
+ +The parent directory, ``/tmp``, contains other trace files: + +- ``.cmd`` files that are the scripts created by Autosubmit from the templates + and used to run each task (locally or on a remote platform with Slurm, for example); +- ``*_COMPLETED`` files that confirm a task was marked as completed by the platform; +- ``*_STAT`` files that contain the latest start and end date of the job; and +- ``*_TOTAL_STATS`` files that aggregate the information of all ``*_STAT`` files for + the current and previous jobs. + +Data +---- + +The Autosubmit experiment ID acts as a persistent identifier (**PID**), which +can be used to link data produced, traces, and configuration. + +For example, it is possible to use the experiment ID in directories or +as metadata for data written to remote file systems and databases. This way, +one can verify if the experiment produced the expected data, or what experiment +produced certain data. + +Users must decide on the policy to maintain experiments. Depending on the number +of experiments (thousands, millions) and storage limitations (user quota) it may +be necessary to remove experiments and any data in the experiment directory. + +It is possible to :ref:`archive Autosubmit experiments `, or delete +old experiments. Another possibility is to compress logs and traces generated by +experiments, keeping the experiments in the Autosubmit experiments directory. + +A practical example +------------------- + +Given an **experiment ID**, such as ``a001``, the experiment directory on a machine +could be something similar to ``/$HOME/a001/`` (configurable). For brevity, the +rest of this section will use relative directories like ``tmp/`` instead of +``/app/autosubmit/a001/tmp/``. + +The **YAML configuration** files of the experiments are stored in the ``conf/`` +subdirectory and may import other YAML files from ``proj/git_project/`` (where +``proj`` is a directory common to all Autosubmit experiments, but ``git_project`` +is configurable). 
+ +The complete YAML configuration used by Autosubmit, after all files have been +included by Autosubmit, is stored at ``conf/metadata/experiment_data.yml``. + +The ``autosubmit`` commands issued for the experiment ``a001`` will have access +to this YAML configuration, and will be logged to files in the platforms configured +(local or remote). The log files are later retrieved by Autosubmit automatically, +and saved to the machine where the ``autosubmit`` command was issued. The +**command logs** are stored in the directory ``tmp/ASLOGS``. + +Running ``autosubmit setstatus``, for example, would produce files that could be +stored as ``tmp/ASLOGS/20240319_141712_setstatus.log`` and +``tmp/ASLOGS/20240319_141712_setstatus_err.log``. These two files contain the +standard output and error output of the ``autosubmit setstatus`` command, issued on +``2024-03-19 at 14:17:12`` (computer time). The "``.log``" file contains the output +produced by Autosubmit, whereas the "``_err.log``" file would contain the error or +be empty if no error occurred. + +.. parsed-literal:: + :name: 20240319_141712_setstatus + + 2024-03-19 14:17:17,772 Autosubmit is running with **4.1.0** + 2024-03-19 14:17:17,782 Preparing .lock file to avoid multiple instances with same expid. + 2024-03-19 14:17:17,782 Exp ID: **a001** + 2024-03-19 14:17:17,782 Save: **False** + 2024-03-19 14:17:17,782 Final status: WAITING + 2024-03-19 14:17:17,782 List of jobs to change: **a001_20200101_fc0_285_SIM a001_20200101_fc0_284_SIM** + 2024-03-19 14:17:17,782 Chunks to change: None + 2024-03-19 14:17:17,782 Status of jobs to change: None + 2024-03-19 14:17:17,782 Sections to change: None + ... + +The **workflow task logs** are stored in the directory ``tmp/LOG_``, +``tmp/LOG_a001/`` in this example. The task logs are written on the remote +platforms used in the experiment configuration (e.g. a cloud server, or HPC). 
+These files are copied automatically by Autosubmit to the computer where the +``autosubmit`` command was issued. + +These log files, like the ``autosubmit`` command logs described before, also +come in pairs, "``.out``" and "``.err``". However, in this case the "``.err``" +file contains the workflow task script source with the Bash Shell script +generated by Autosubmit and the expanded parameters (produced with the Bash +Shell option ``-x``). The file name also contains a timestamp from when the +job was started. + +.. parsed-literal:: + :name: a001_20200101_fc0_337_SIM.20240327051605.err + + [INFO] JOBID=**6709774** + job_name_ptrn='/scratch/**a001**/LOG_a001/**a001_20200101_fc0_337_SIM**' + + job_name_ptrn=/scratch/a001/LOG_a001/a001_20200101_fc0_337_SIM + echo $(date +%s) > ${job_name_ptrn}_STAT + ++ date +%s + + echo 1711509353 + ... + +The ``.err`` and ``.out`` files both contain the ``JOBID`` data, which for +remote platforms like HPC batch systems (e.g. Slurm) represents the Job ID, +as well as any other output from the workflow task. + +Users can also access the jobs data stored by Autosubmit in +``/metadata/data/job_data_a001.db``, to query for information +from previous jobs: + +.. parsed-literal:: + :name: job_data_example + + $ sqlite3 ~/job_data_a001.db "select job_id from job_data where job_name = 'a001_20200101_fc0_337_SIM';" + 6709774 + $ # Use sacct, scontrol, etc. in the remote platform to query the Job information \ No newline at end of file -- GitLab
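
Reviewer note: the ``sqlite3`` query at the end of the new Traceability page could also be run from a script. The following is a minimal sketch, not part of the patch, using only the Python standard library. It assumes the ``job_data`` table and its ``job_id`` and ``job_name`` columns are exactly as in the command-line example; the function name ``find_job_ids`` is hypothetical and not an Autosubmit API.

```python
import sqlite3
from pathlib import Path


def find_job_ids(db_path, job_name):
    """Return every job_id stored for job_name in the job_data table.

    Assumption: the database has a ``job_data`` table with ``job_id`` and
    ``job_name`` columns, as in the sqlite3 command-line example.
    """
    # Open read-only so that a wrong path raises an error instead of
    # silently creating an empty database file.
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # A parameterised query avoids quoting problems in job names.
        rows = conn.execute(
            "SELECT job_id FROM job_data WHERE job_name = ?", (job_name,)
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Calling ``find_job_ids("~/job_data_a001.db", "a001_20200101_fc0_337_SIM")`` would return a list of the recorded job IDs, which can then be passed to ``sacct`` or ``scontrol`` on the remote platform.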