diff --git a/docs/source/userguide/modifying_workflow/index.rst b/docs/source/userguide/modifying_workflow/index.rst
index 332f8598a021cebd1aa08941c252969493e9eddc..e41040841010d78231d7974c1c76af85bb7e45e5 100644
--- a/docs/source/userguide/modifying_workflow/index.rst
+++ b/docs/source/userguide/modifying_workflow/index.rst
@@ -3,16 +3,31 @@ How to restart the experiment
 =============================
 
-This procedure allows you to restart an experiment. Autosubmit looks for the COMPLETED file for jobs that are considered active (SUBMITTED, QUEUING, RUNNING), UNKNOWN or READY.
+How to do a full restart of the experiment
+------------------------------------------
 
-.. warning:: You can only restart the experiment if there are not active jobs. You can use -f flag to cancel running jobs automatically.
+By default, Autosubmit assumes that we want to recover from a failed run.
+
+If that is not the case and we want to start from scratch, we must issue the following commands:
 
-You must execute:
 ::
 
-    autosubmit recovery EXPID
+    # clears all COMPLETED files
+    autosubmit create cxxx
+    # sets all tasks to WAITING
+    autosubmit recovery cxxx --all -s
+
+How to recover an experiment
+----------------------------
+
+The recovery command changes the state of every task that is (or can be) in READY status to COMPLETED if a COMPLETED file for that task is found. A COMPLETED file indicates that the job ran successfully at least once.
+
+.. warning:: You can only recover when the workflow is not running (i.e. there are no QUEUING, SUBMITTED, or RUNNING tasks).
+
+To recover a running workflow, we must issue the recovery command with -f, so that Autosubmit cancels the tasks on the remote platform before resetting their status.
+
+.. warning:: Without the -s flag, Autosubmit only performs a dry run of the command (i.e. the changes do not take effect).
 
-*EXPID* is the experiment identifier.
 
 Options:
 ::
 
@@ -45,22 +60,25 @@ Options:
     -f, --force           Cancel active jobs
     -v, --update_version  Update experiment version
 
-Example:
-::
+Examples:
+::
 
+    # performs a dry run and writes the potential state of each task to a .txt file
+    autosubmit recovery cxxx
+    # sets all READY tasks that have a COMPLETED file to COMPLETED and saves the changes
     autosubmit recovery cxxx -s
 
 In order to understand more the grouping options, which are used for visualization purposes, please check :ref:`grouping`.
 
-
 .. hint:: When we are satisfied with the results we can use the parameter -s, which will save the change to the pkl file and rename the update file.
 
 The --all flag is used to synchronize all jobs of our experiment locally with the information available on the remote platform (i.e.: download the COMPLETED files we may not have). In case new files are found, the ``pkl`` will be updated.
 
 Example:
-::
+::
 
+    # fetches the COMPLETED files from the remote platform and saves the changes
     autosubmit recovery cxxx --all -s
 
 How to rerun a part of the experiment
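The dry-run-then-save workflow described in this patch (run `autosubmit recovery cxxx`, inspect the result, then repeat with `-s`) can be scripted. The following is a minimal sketch, not part of Autosubmit: the `recover_experiment` function and the `AUTOSUBMIT_BIN` override are hypothetical, and the script only calls `autosubmit recovery` with the `-s` flag documented above.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not shipped with Autosubmit):
# preview a recovery, then save it only after confirmation.
set -eu

# Hypothetical override so the commands can be replaced, e.g. by `echo`.
AUTOSUBMIT_BIN="${AUTOSUBMIT_BIN:-autosubmit}"

recover_experiment() {
    expid="$1"
    # Dry run: without -s the new states are only reported, not saved.
    "$AUTOSUBMIT_BIN" recovery "$expid"
    # Prompt on stderr so stdout only contains the command output.
    printf 'Save these changes for %s? [y/N] ' "$expid" >&2
    read -r answer
    if [ "$answer" = "y" ]; then
        # Repeat the recovery with -s so the changes are saved to the pkl file.
        "$AUTOSUBMIT_BIN" recovery "$expid" -s
    fi
}

# Example: recover_experiment cxxx
```

Setting `AUTOSUBMIT_BIN=echo` prints the commands instead of running them, which is a simple way to check the script before pointing it at a real experiment.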