From 135174dc3d502805e0582ebc0d7822f0927f04f1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Manuel=20Gim=C3=A9nez=20de=20Castro?=
Date: Thu, 5 Sep 2024 04:40:25 +0200
Subject: [PATCH 1/3] clarify recovery command

---
 .../userguide/modifying_workflow/index.rst    | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/docs/source/userguide/modifying_workflow/index.rst b/docs/source/userguide/modifying_workflow/index.rst
index 332f8598..681ec303 100644
--- a/docs/source/userguide/modifying_workflow/index.rst
+++ b/docs/source/userguide/modifying_workflow/index.rst
@@ -3,11 +3,17 @@ How to restart the experiment
 =============================


-This procedure allows you to restart an experiment. Autosubmit looks for the COMPLETED file for jobs that are considered active (SUBMITTED, QUEUING, RUNNING), UNKNOWN or READY.
+The recovery command Autosubmit will reset (i.e set to WAITING) the state of all the tasks that are not COMPLETED, UNKNOWN, or READY.

-.. warning:: You can only restart the experiment if there are not active jobs. You can use -f flag to cancel running jobs automatically.
+.. warning:: You can only reset tasks that are not active (i.e. SUBMITTED, QUEUEING, or RUNNING).
+
+
+With the -f we specify to also reset the status of those tasks which are active. Autosubmit will make sure to kill them in remote.
+
+.. warning:: without the -s Autosubmit will only perform a dry-run (i.e. it will not take effect) of the command
+
+We must execute:

-You must execute:
 ::

     autosubmit recovery EXPID
@@ -45,14 +51,13 @@ Options:
   -f, --force           Cancel active jobs
   -v, --update_version  Update experiment version

-Example:
+Examples:
 ::
-
+    # to recover all tasks that are not active and actually save them
     autosubmit recovery cxxx -s

 In order to understand more the grouping options, which are used for visualization purposes, please check :ref:`grouping`.

-
 .. hint:: When we are satisfied with the results we can use the parameter -s, which will save the change to the pkl file and rename the update file.

 The --all flag is used to synchronize all jobs of our experiment locally with the information available on the remote platform
--
GitLab


From 6d64a6becc52ef76f31a1341b85f9efaf0b20803 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Manuel=20Gim=C3=A9nez=20de=20Castro?=
Date: Thu, 5 Sep 2024 09:56:24 +0200
Subject: [PATCH 2/3] add a restart section

---
 .../userguide/modifying_workflow/index.rst    | 22 ++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/docs/source/userguide/modifying_workflow/index.rst b/docs/source/userguide/modifying_workflow/index.rst
index 681ec303..0ba8e56c 100644
--- a/docs/source/userguide/modifying_workflow/index.rst
+++ b/docs/source/userguide/modifying_workflow/index.rst
@@ -3,12 +3,27 @@ How to restart the experiment
 =============================


-The recovery command Autosubmit will reset (i.e set to WAITING) the state of all the tasks that are not COMPLETED, UNKNOWN, or READY.
+How to do a full restart of the experiment
+------------------------------------------

-.. warning:: You can only reset tasks that are not active (i.e. SUBMITTED, QUEUEING, or RUNNING).
+By default, Autosubmit will assume you want to recover from a failed run.

+If that is not the case, and you want to start from scratch we must issue the following commands:

-With the -f we specify to also reset the status of those tasks which are active. Autosubmit will make sure to kill them in remote.
+::
+    # clears all COMPLETE files
+    autosubmit create cxxx
+    # sets all tasks to WAITING
+    autosubmit recovery cxxx --all -s
+
+How to recover an experiment
+----------------------------
+
+The recovery command will change the state of all the tasks that are (or can be) in READY status to COMPLETED if a completed file for that task is found. A complete file indicates that the job ran successfully at least once.
+
+.. warning:: You can only recover when the workflow is not running (i.e. there no QUEUING, SUBMITTED, or RUNNING tasks)
+
+To recover a running workflow, we must issue the recovery command with -f, so Autosubmit kills the tasks in remote before resetting their status.

 .. warning:: without the -s Autosubmit will only perform a dry-run (i.e. it will not take effect) of the command

@@ -52,6 +67,7 @@ Options:
   -v, --update_version  Update experiment version

 Examples:
+
 ::
     # to recover all tasks that are not active and actually save them
     autosubmit recovery cxxx -s
--
GitLab


From 289aaec81b896ef001cba6649f3b0c1d75cfb75c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Manuel=20Gim=C3=A9nez=20de=20Castro?=
Date: Fri, 6 Sep 2024 02:27:41 +0200
Subject: [PATCH 3/3] unify to plural and add comments on commands

---
 .../userguide/modifying_workflow/index.rst    | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/docs/source/userguide/modifying_workflow/index.rst b/docs/source/userguide/modifying_workflow/index.rst
index 0ba8e56c..e4104084 100644
--- a/docs/source/userguide/modifying_workflow/index.rst
+++ b/docs/source/userguide/modifying_workflow/index.rst
@@ -6,11 +6,12 @@ How to restart the experiment
 How to do a full restart of the experiment
 ------------------------------------------

-By default, Autosubmit will assume you want to recover from a failed run.
+By default, Autosubmit will assume we want to recover from a failed run.

-If that is not the case, and you want to start from scratch we must issue the following commands:
+If that is not our case, and we want to start from scratch we must issue the following commands:

 ::
+
     # clears all COMPLETE files
     autosubmit create cxxx
     # sets all tasks to WAITING
@@ -27,13 +28,6 @@ To recover a running workflow, we must issue the recovery command with -f, so Au

 .. warning:: without the -s Autosubmit will only perform a dry-run (i.e. it will not take effect) of the command

-We must execute:
-
-::
-
-    autosubmit recovery EXPID
-
-*EXPID* is the experiment identifier.

 Options:
 ::
@@ -69,7 +63,9 @@ Options:
 Examples:

 ::
-    # to recover all tasks that are not active and actually save them
+    # performs a dry run of tasks and outputs their potential states to .txt file
+    autosubmit recovery cxxx
+    # sets all ready tasks with completition file present to COMPLETE and saves
     autosubmit recovery cxxx -s

 In order to understand more the grouping options, which are used for visualization purposes, please check :ref:`grouping`.
@@ -80,8 +76,9 @@ The --all flag is used to synchronize all jobs of our experiment locally with th
 (i.e.: download the COMPLETED files we may not have). In case new files are found, the ``pkl`` will be updated.

 Example:
-::

+::
+    # fetches all the
     autosubmit recovery cxxx --all -s

 How to rerun a part of the experiment
--
GitLab