... | ... | @@ -3,90 +3,84 @@ The way to use Autosubmit in the Earth Sciences network is by loading the corres |
|
|
This means that the **default** version has been **proven** to be robust and is the recommended option for production experiments. However, some features and improvements are only available in development versions, which we recommend testing so they can be improved further and become operational, always under the **user's responsibility**.
|
|
|
|
|
|
The aim of this section is to provide useful **guidelines** about the features that the default and development versions provide, as well as recommendations to run experiment configurations that may not be fully supported at the moment:
|
|
|
|
|
|
|
|
|
* **3.12.0** - This version is already outdated, and users should run 3.13.0.
|
|
|
* In this version **vertical** and **horizontal** **wrappers** are fully supported.
|
|
|
* **Horizontal-vertical** wrappers are supported too. They were first developed in this Autosubmit version and they have been used in production together.
|
|
|
* Due to **technical limitations**, we don't recommend running experiments with many startdates/members (increased concurrency) or very large wrappers with 3.12.0b. As a rule of thumb, experiments with more than 10-20 members in total or wrappers with more than 50 jobs (the user can always reduce the wrapper size) may experience delays in the Autosubmit refresh cycle and when generating the monitor views.
|
|
|
* In this version, inner jobs inside QUEUING wrappers show a SUBMITTED status. This is fixed in 3.13.0.
|
|
|
* Autosubmit **migrate** is not safe for big experiments, so we recommend backing up the experiment (on the remote platforms) first. We encourage using only the 3.13.0 `migrate` command.
|
|
|
|
|
|
|
|
|
* **3.13.0** - This is the current default version and provides a lot of improvements, especially in terms of efficiency and stability. It also brings a full refactor of the wrapper module.
|
|
|
* **Multi-threaded** wrappers were introduced in this version.
|
|
|
* It provides the possibility to specify **multiple hosts** for the same platform (in a list) so it is more robust against connection issues/login failures.
|
|
|
* In general, **big experiments**, with **many startdates/members** or featuring very **big wrappers** run much more efficiently with 3.13.0.
|
|
|
* A completely new implementation of remote dependencies (**PreSubmission**) was introduced in this version. It helps to speed up the jobs in a Slurm platform by sending the next 10 Waiting jobs in advance to the queues.
|
|
|
* Workflows have more **flexibility** by the inclusion of a new way to define dependencies for specific chunks.
|
|
|
* Changes were made to the algorithm that handles the maximum active jobs by platform. From this version, **wrapped jobs** count as a **single job** for Autosubmit, and the maximum number of inner jobs can be defined with new wrapper parameters.
|
|
|
* New **POLICY** option that tunes how wrapper jobs are created (more greedy, more conservative, or more balanced).
|
|
|
* Wrappers have a new option, **QUEUE**, that allows putting the wrapper job in a different queue than the single jobs.
|
|
|
* There is a new **log** (.err, .out, COMPLETED, STAT files) **recovery system** that retries (in background threads) the transfer of log files from the remote platforms in case of failure.
|
|
|
* The user can specify a `datetime` or `time` to trigger the **experiment start** by sending the `-st` flag (plus the right format) using the `autosubmit run` command.
|
|
|
* The user can specify an **experiment dependency** by providing the `-sa` (plus the right expid format) flag to the `autosubmit run` command. The experiment will start when the experiment specified in the `-sa` flag finishes.
|
|
|
* When the user **quits** Autosubmit by using the `CTRL+C` keys, Autosubmit will make sure all threads are finished correctly before closing.
|
|
|
* Job lifecycle information is stored in an **external database** that will allow users to visualize job historical information. This information is gathered in a way that does not interfere with the normal workflow (even if the information gathering or any of its components fails). Furthermore, threading is implemented to prevent unnecessary delays.
|
|
|
* **Specific members** can be selected to run by using the `-rm` flag with `autosubmit run`. Autosubmit will only run jobs belonging to the specified members. Jobs already running will be monitored and properly completed.
|
|
|
* The `git clone` operation (`Autosubmit create`) now implements a **backup procedure** that will prevent loss of information in case of wrong configuration or network error.
|
|
|
* **Security** has been improved: all commands that could change the workflow are now locked by an owner-only mechanism, e.g. create, refresh and run.
|
|
|
* New `autosubmit dbfix expid` command that allows users to fix the `database malformed` error.
|
|
|
* Custom **shebang** (header of the script templates) so it is possible to use Python or R templates with a specific Python/R version dependency.
|
|
|
* Only `create` and `run` commands can **update** the workflow configuration and structure information. In the case of `run`, they will only be updated if a change is detected before the starting of the main run loop.
|
|
|
* Increased **robustness**. AS will try to prevent as many errors as possible at the beginning of the run and will handle other delicate operations before run time.
|
|
|
* Allows **prioritizing** a list of jobs to be run before the rest of the workflow, via the `Two_step_start` variable set in expdef.conf.
|
|
|
* Allows **skipping** jobs of the same section if their last queuing member/chunk is higher than that of others in queuing/waiting/ready status.
|
|
|
* Reworked **migrate** command, with improvements in robustness and security.
|
|
|
* New `pklfix` command to restore a corrupted local database.
|
|
|
* New `updatedescrip` command to modify the experiment's description.
|
|
|
* Added Nord3 support.
|
|
|
|
|
|
* **3.14.0** - This is the latest development version of Autosubmit 3 and provides new functionalities, especially regarding workflow flexibility.
|
|
|
|
|
|
* Workflows now have increased flexibility, which includes:
|
|
|
* Improved select_chunks and added select_members.
|
|
|
* A new way of setting up dependencies, weak dependencies, allows some of a job's dependencies to fail (dependencies marked with the '?' char).
|
|
|
* Added an Exclusion parameter that allows disabling the creation of a job for a given member/chunk.
|
|
|
* Reintroduced the re-run mechanism with improvements to the job selection and now it allows re-running any job.
|
|
|
* Workflow behaviour configuration:
|
|
|
* Added an extensible wall clock in cases where the internal retrial mechanism is triggered.
|
|
|
* Added the possibility of delaying job retrials.
|
|
|
* Now Proj.conf supports dynamic variables (%_%).
|
|
|
* Added a chunk-dependent wallclock time.
|
|
|
* Standardized the meaning of task, total_jobs, and threads across all platforms, and added the tasks parameter to the Slurm scheduler and hyperthreading for cca.
|
|
|
* Improvements to wrappers:
|
|
|
* Added the possibility of running different types of wrappers under the same experiment.
|
|
|
* Added an internal retrial mechanism for vertical wrappers, bypassing the resubmission of the wrapper.
|
|
|
* Experiment stats improvements:
|
|
|
* Job historical database is improved.
|
|
|
* Redesigned dbfix command to restore the historical job database in case of corruption.
|
|
|
* Now, AS can detect and recover the corruption of the historical job database in real time.
|
|
|
* The autosubmit stats command is more detailed.
|
|
|
* Homogeneous locale (LANG) enforced across all autosubmit iterations (wrapper, code, runtime and experiments).
|
|
|
* Project data/code management
|
|
|
* Depth and githooks are now supported.
|
|
|
* Submodules of submodules are now supported.
|
|
|
* Better project transfer when used in local mode.
|
|
|
* Security changes
|
|
|
* Security increased based on owner-only writing permissions for Autosubmit folders, having only tmp accessible for other users.
|
|
|
* Redesigned Allowed/Denied hosts: authorized/forbidden mechanism for specific hosts and commands.
|
|
|
* Users who do not own a given experiment can monitor others' experiments without changing the experiment tree (except the temp folder).
|
|
|
* Added security and a cancel mechanism to Autosubmit recovery (it will detect existing active jobs).
|
|
|
* Notifications changes:
|
|
|
* Now Autosubmit users will receive an e-mail notification if there are issues with any remote platform.
|
|
|
* Now Autosubmit users will always receive their e-mails with the proper timestamp.
|
|
|
* Improvements across all recovery procedures.
|
|
|
* New I/O recovery.
|
|
|
* New scheduler errors recovery.
|
|
|
* Log retrieval features a better recovery mechanism.
|
|
|
* Improved expid commands regarding the expid deletion, database storage and messages provided.
|
|
|
* The improved migrate command now supports bigger file sizes and faster transfers.
|
|
|
* New installation methods are supported (pip, conda).
|
|
|
* Docs structure and content are improved.
|
|
|
* Implements a set of new optimizations that address some issues regarding memory and performance.
|
|
|
|
|
|
* **4.0.0b** - This latest beta version features a complete overhaul of the configuration management alongside the porting to Python 3.7 and 3.9.
|
|
|
* Written in Python 3.
|
|
|
* TODO
|
|
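The 3.13.0 note that **wrapped jobs** count as a **single job** against the maximum active jobs per platform can be illustrated with a minimal sketch. This is not Autosubmit's implementation: the `Job` class, its `wrapper` field, and the `active_slots` and `can_submit` helpers are hypothetical names used only to show the counting rule.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; not an Autosubmit class."""
    name: str
    wrapper: Optional[str] = None  # name of the wrapper holding this job, if any


def active_slots(jobs: Iterable[Job]) -> int:
    """Count one slot per distinct wrapper plus one slot per unwrapped job."""
    jobs = list(jobs)
    wrappers = {job.wrapper for job in jobs if job.wrapper is not None}
    singles = sum(1 for job in jobs if job.wrapper is None)
    return len(wrappers) + singles


def can_submit(jobs: Iterable[Job], max_active: int) -> bool:
    """Return True while the platform still has a free slot under this rule."""
    return active_slots(jobs) < max_active
```

Under this assumed rule, two jobs inside the same wrapper plus one single job occupy two slots rather than three, so the limit on inner jobs has to be set separately through the wrapper parameters.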
|
|
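The weak dependencies introduced in 3.14.0 (marked with the '?' char) can be sketched as follows. This is an illustration under assumptions, not Autosubmit's code: the space-separated dependency string, the placement of `?` right after the job name, the status names, and both helper functions are hypothetical.

```python
from typing import Dict, List, Tuple


def split_dependencies(spec: str) -> Tuple[List[str], List[str]]:
    """Split a string such as 'INI SIM?' into (strong, weak) job names."""
    strong, weak = [], []
    for token in spec.split():
        if token.endswith("?"):
            weak.append(token[:-1])  # assumed syntax: trailing '?' marks a weak dependency
        else:
            strong.append(token)
    return strong, weak


def is_ready(spec: str, status: Dict[str, str]) -> bool:
    """Strong dependencies must complete; weak ones only need to have finished."""
    strong, weak = split_dependencies(spec)
    strong_ok = all(status.get(name) == "COMPLETED" for name in strong)
    weak_ok = all(status.get(name) in ("COMPLETED", "FAILED") for name in weak)
    return strong_ok and weak_ok
```

In this sketch a job depending on `INI SIM?` can start once `INI` has completed and `SIM` has either completed or failed, while a failure of `INI` still blocks it.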
|
|
Further information about all these functionalities and commands can be found at the [Autosubmit documentation](https://autosubmit.readthedocs.io/en/latest/introduction.html).