3.13.0 - This has been the Autosubmit development version for many months, so it provides a lot of improvements, especially in terms of efficiency and stability. It also brings a full refactor of the wrapper module.
It provides the possibility to specify multiple hosts for the same platform (in a list) so it is more robust against connection issues/login failures.
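The fallback idea can be pictured as trying each host in the list until one accepts the connection. The following is a minimal Python sketch of that pattern, not Autosubmit's actual implementation; connect_first_available, fake_connect and the host names are hypothetical and exist only for illustration.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def connect_first_available(hosts: Sequence[str], connect: Callable[[str], T]) -> T:
    """Try each host in order and return the first successful connection.

    `connect` is a hypothetical callable that raises OSError on failure.
    """
    errors = []
    for host in hosts:
        try:
            return connect(host)
        except OSError as exc:
            # Remember the failure and fall through to the next host.
            errors.append(f"{host}: {exc}")
    raise ConnectionError("all hosts failed: " + "; ".join(errors))


def fake_connect(host: str) -> str:
    # Stand-in for a real login; only "login2" is reachable in this sketch.
    if host != "login2":
        raise OSError("connection refused")
    return f"session@{host}"
```

With the list ["login1", "login2", "login3"], the sketch skips the unreachable first host and returns the session opened on the second one.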
In general, big experiments, with many startdates/members or very big wrappers, run much more efficiently with 3.13.0.
A completely new implementation of remote dependencies (PreSubmission) was introduced in this version. It helps to speed up the jobs in a Slurm platform by sending the next 10 Waiting jobs in advance to the queues.
Workflows have more flexibility by the inclusion of a new way to define dependencies for specific chunks.
Changes were made to the algorithm that handles the maximum active jobs by platform. From this version, wrapped jobs count as a single job for Autosubmit, and the maximum number of inner jobs can be defined with new wrapper parameters.
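The counting rule can be sketched as follows: every entry in the active list takes one slot, whether it is a single job or a wrapper, while a separate limit bounds the number of inner jobs. This is an illustrative Python sketch only; the class names and the max_active and max_inner arguments are hypothetical and are not the names of the real wrapper parameters.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Union


@dataclass
class Job:
    name: str


@dataclass
class Wrapper:
    name: str
    inner_jobs: List[Job] = field(default_factory=list)


Entry = Union[Job, Wrapper]


def count_active(entries: Sequence[Entry]) -> int:
    """Each entry, wrapped or not, occupies exactly one active-job slot."""
    return len(entries)


def can_submit(entries: Sequence[Entry], candidate: Entry,
               max_active: int, max_inner: int) -> bool:
    """Check the platform limit and, for wrappers, the inner-job limit."""
    if isinstance(candidate, Wrapper) and len(candidate.inner_jobs) > max_inner:
        return False
    return count_active(entries) + 1 <= max_active
```

Under this rule a wrapper holding three inner jobs still counts as one active job against the platform limit.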
New POLICY option that allows tuning the behaviour for creating wrapper jobs (a more greedy one, a more conservative one, and a more balanced one).
Wrappers have a new option, QUEUE, that allows putting the wrapper job in a different queue than the single jobs.
There is a new log (.err, .out, COMPLETED, STAT files) recovering system, that performs re-tries (in background threads) of the log files transfer from the remote platforms in case of failure.
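The retry-in-background pattern described above can be sketched in Python as follows. This is an assumption-laden illustration rather than Autosubmit's code: retrieve_with_retries and the fetch callable are hypothetical, and the real system may use different retry counts and delays.

```python
import threading
import time
from typing import Callable


def retrieve_with_retries(fetch: Callable[[], None], retries: int = 3,
                          delay: float = 0.0) -> threading.Thread:
    """Run `fetch` in a background thread, retrying when it fails.

    `fetch` is a hypothetical callable that copies the log files and
    raises OSError when the transfer fails.
    """
    def worker() -> None:
        for attempt in range(1, retries + 1):
            try:
                fetch()
                return  # transfer succeeded, stop retrying
            except OSError:
                if attempt < retries:
                    time.sleep(delay)  # wait before the next attempt

    # A daemon thread never blocks the caller, so the main loop keeps running.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The caller gets the thread back and can join it later, for example at shutdown, without delaying job submission in the meantime.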
The user can specify a datetime or a time at which the experiment should start by passing the -st flag (with the right format) to the autosubmit run command.
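To illustrate how a value that may be either a full datetime or a plain time can be turned into a waiting period, here is a small Python sketch. The two formats used ('YYYY-MM-DD HH:MM:SS' and 'HH:MM:SS') are assumptions made for this example and are not taken from the Autosubmit documentation; seconds_until_start is a hypothetical helper.

```python
from datetime import datetime


def seconds_until_start(value: str, now: datetime) -> float:
    """Return how long to wait before starting, in seconds.

    Assumed formats (illustrative only):
    'YYYY-MM-DD HH:MM:SS' for a full datetime, 'HH:MM:SS' for a time today.
    """
    try:
        target = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Not a full datetime: interpret the value as a time on the current day.
        clock = datetime.strptime(value, "%H:%M:%S").time()
        target = datetime.combine(now.date(), clock)
    # A start time in the past means "start immediately".
    return max((target - now).total_seconds(), 0.0)
```

Passing the current time explicitly keeps the helper deterministic and easy to test.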
The user can specify an experiment dependency by providing the -sa (plus the right expid format) flag to the autosubmit run command. The experiment will start when the experiment specified in the -sa flag finishes.
When the user quits Autosubmit by using the CTRL+C keys, Autosubmit will make sure all threads are finished correctly before closing.
Job lifecycle information is stored in an external database that allows users to visualize historical job information. This information is gathered in a way that does not interfere with the normal workflow, even if the information gathering or any of its components fails. Furthermore, threading is implemented to prevent unnecessary delays.
Specific members can be selected to run by using the -rm flag with autosubmit run. Autosubmit will only run jobs belonging to the specified members. Jobs already running will be monitored and properly completed.
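The selection rule can be sketched as a simple filter: keep the jobs of the requested members, plus any job that is already running so it can still be monitored to completion. The Python below is only an illustration; the Job tuple, the jobs_to_handle helper, and the member and job names are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional


class Job(NamedTuple):
    name: str
    member: Optional[str]  # None for jobs that do not belong to a member
    status: str


def jobs_to_handle(jobs: Iterable[Job], members: Iterable[str]) -> List[Job]:
    """Keep jobs of the selected members and jobs that are already running."""
    wanted = set(members)
    return [j for j in jobs if j.member in wanted or j.status == "RUNNING"]
```

In this sketch a running job of a non-selected member stays in the list, while waiting jobs of non-selected members are left out.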
The git clone operation (Autosubmit create) now implements a backup procedure that will prevent loss of information in case of wrong configuration or network error.
Security has been improved: all commands that could change the workflow are now locked by an owner-only mechanism, e.g. create, refresh and run.
New autosubmit dbfix expid command that allows users to fix the database malformed error.
Custom shebang (header of the script templates) is supported, so it is possible to use Python or R templates with a specific Python/R version dependency.
Only create and run commands can update the workflow configuration and structure information. In the case of run, they will only be updated if a change is detected before the starting of the main run loop.
Increased robustness. AS will try to prevent as many errors as possible at the beginning of the run and will handle other delicate operations before run time.
Allows prioritizing a list of jobs to be run before the rest of the workflow, via the Two_step_start variable set in expdef.conf.
Allows skipping jobs of the same section if their last queuing member/chunk is higher than others in queuing/waiting/ready status.
Reworked migrate command, with improvements in robustness and security.
New pklfix command to restore a corrupted local database.
New updatedescrip command to modify the experiment's description.
3.12.0 - In this version vertical and horizontal wrappers are fully supported.
Horizontal-vertical wrappers are supported too. They were first developed in this Autosubmit version and they have been used in production together.
Due to technical limitations, we do not recommend running experiments with many startdates/members (increased concurrency) or very large wrappers with 3.12.0b. As a rule of thumb, experiments with more than 10-20 members in total or wrappers with more than 50 jobs (the user can always reduce the wrapper size) may experience delays in the Autosubmit refresh cycle and when generating the monitor views.
In this version, inner jobs inside QUEUING wrappers show a SUBMITTED status. This is fixed in 3.13.0.
Autosubmit migrate is not secured for big experiments, so it is recommended to back up the offered experiment (in the remote platforms) first. We encourage using only the 3.13.0 migrate.
In this version vertical and horizontal wrappers are fully supported.
Horizontal-vertical wrappers are supported too. They were first developed in this Autosubmit version and they have been used in production together.
Due to technical limitations, we do not recommend running experiments with many startdates/members (increased concurrency) or very large wrappers with 3.12.0b. As a rule of thumb, experiments with more than 10-20 members in total or wrappers with more than 50 jobs (the user can always reduce the wrapper size) may experience delays in the Autosubmit refresh cycle and when generating the monitor views.
In this version, inner jobs inside QUEUING wrappers show a SUBMITTED status. This is fixed in 3.12.1b.
3.11.1
Fix minor issues
Added new command to describe an experiment
3.11.0
Included %m% in the list of exceptions
Wrapper major refactoring
- WrapperJob, WrapperBuilder and WrapperFactory
- 2 types of hybrid wrapper: vertical-horizontal and horizontal-vertical
- Machinefiles for horizontal and hybrid
- Reduced submitting time by merging commands (rm and find) into one
Fixed stats plot for wrapped jobs
Checks for necessary configuration when defining wrapper in autosubmit.conf: MAX_WALLCLOCK, MAX_PROCESSORS, PROCESSORS_PER_NODE
Wrapper regression tests for mn4
Flag option not to do transitive reduction
Added the reasons for QUEUING (Reason) as returned by SLURM
Added master as default branch for git projects (otherwise clone failed if empty)
Some bug fixes:
- Bug fix for changing status of synchronized jobs
- Bug fix related to grouping
- Bug fix related to variable substitution %%
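For context on the transitive reduction flag listed above: transitive reduction removes dependency edges that are already implied by a longer path, which keeps the workflow graph smaller without changing the execution order. The following Python sketch shows the general technique on a small graph; it is not Autosubmit's implementation, and the function and job names are hypothetical.

```python
from typing import Dict, Set


def transitive_reduction(graph: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return a copy of a DAG without edges implied by longer paths.

    `graph` maps each node to the set of its direct successors.
    """
    def reachable(start: str) -> Set[str]:
        # Depth-first search collecting every node reachable from `start`.
        seen: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reduced: Dict[str, Set[str]] = {}
    for node, children in graph.items():
        # An edge node -> child is redundant when the child can also be
        # reached through another direct successor of the same node.
        indirect: Set[str] = set()
        for child in children:
            indirect |= reachable(child)
        reduced[node] = set(children) - indirect
    return reduced
```

For a graph where "ini" points to both "sim" and "post", and "sim" points to "post", the direct edge from "ini" to "post" is dropped because the path through "sim" already enforces it.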
3.10.0
Vertical wrapper allowing mixed job types and additional constraints
Job grouping in the visualization graph
Txt output status for autosubmit monitor
Setstatus allowing multiple job types
Host whitelist option in .autosubmitrc for autosubmit run
DELAY and SPLITS options for job configuration
Setting blank value for absent variables in project configuration
Minor bug fixes
3.9.0
Custom directives for the HPC resource manager headers can be added on platforms and jobs configuration files ~ only paramiko (LSF, SLURM and PBS)
First version with migrate experiments (to another user)
On CCA, TASKS and THREADS can be expressed in lots (e.g. 127:1)
Some bug fixes:
- QUEUE on slurm specified on directive qos instead of partition
- Variable expansion on CCA (ECMWF) headers
3.8.1
First version with job packages ~ only paramiko (LSF, SLURM and PBS)
- Vertical
- Horizontal
- With dependencies ~ only for vertical
Python wrapper for CCA (ECMWF)
On submission template checking
Some UX improvements
Other minor bug fixes
First version with LSF arrays:
- Includes all the bug fixes & features from 3.7.7
- Does NOT include the bug fixes from 3.7.8
3.7.8
Some bug fixes:
- Database persistence
- Delete command
- Unarchive command
- CHUNKINI option
- Paramiko permissions
- Paramiko non-existing remote copy
3.7.7
Some improvements for Slurm platforms
Geo-definition of processors
New configuration variables:
- CHUNKINI
- MEMORY
- MEMORY_PER_TASK
- HYPERTHREADING
Other minor bug fixes
3.7.6
Fixed refresh
Fixed recovery for ECMWF
Local logs copy can be disabled
Some UX improvements
3.7.4
Forward dependencies
Performance improvements
Log files copied into LOCAL platform
PROCESSORS_PER_NODE/TASKS now optional
Exclusivity for MN3 (with Paramiko)
THREADS optional for ECMWF
3.7.3
Fixed error with logs directives (err & out were swapped)
Added new option for MN3: SCRATCH_FREE_SPACE
PROCESSORS_PER_NODE/TASKS now available with Paramiko
Regression test suite improved
Solved some problems with paramiko & ECMWF platform
3.7.1
Fixed issue in setstatus
3.7.0
Big improvements on memory consumption
Added new configuration variables (default job type, number of members, etc.)
Added an alternative method to configure autosubmit without dialog library
UX improved (logs fixed, exceptions handled)
Fixed error with COMPLETED jobs shown as FAILED
Fixed error with LSF schedulers by default
Fixed bug on stats feature
Fixed some bugs with Git and SVN
3.6.1
Fixed an incompatibility with recent versions of radical.utils (saga)
3.6.0
Added multi-library communications support: SAGA & Paramiko
UX improved on some error cases
Fixed permission backwards incompatibility
Fixed authorization problems on SAGA implementation
3.5.0
Added another mechanism for SAGA errors prevention
Added no-plot option to setstatus
Added exclusivity and processes per host support for MN
Check method fixed (not working since 3.2)
3.4.1
Hot-fix ECMWF binary (bash, R, python)
Hot-fix Mail Notifications
3.4.0
Added email notifications support
Added mechanisms for incoherence prevention
Added mechanisms for SAGA pty errors prevention
3.3.0
Added filters in monitor
Added support for Python jobs
Added support for R jobs
Added unitary test suite
Synchronize job param
Fixed recovery issue
Other minor bugs fixed
3.2.0
Changed WAIT default
Recovery without -s
Group permissions to log files
Reservation support for MN
Fixed retrials bug
Fixed rerun bug
Fixed stats bug
Other minor bugs fixed
3.2.0b3
SAGA related bug fixes
Minor bug fixes
Now using SAGA for connection and queue management, adding support for more queue types
Stats revamped to provide more information and make it available earlier.
3.1.7
Fix issue StatsSnippet (job_st typo)
Fix issue recovery platforms to test
Fix issue s_rt SGE directive
Fix issue when creating, no option FILE_JOBS_CONF
3.1.5
Connect fixed to use Proxy Command
Fixes in documentation
3.1.4
Documentation for Variables
Minor bug fixes
3.1.3
Minor bug fixes, mostly related to SLURM
3.0.6
Fixed bug in setstatus.
Change in test.
3.0.5
Fixed bug in recovery.
Fixed bug in platform headers.
Fixed bug in delete.
Added readme and changelog commands.
MAX_WAITING_JOBS and TOTAL_JOBS now defined by platform.
Simplified console output of run sub command.
Fixed bug in expid test.
Fixed bug in config.
Restructure layout.