Home · Changes

Page history
dmanubens created page: home authored Nov 07, 2014 by Domingo Manubens-Gil's avatar Domingo Manubens-Gil
home.markdown
View page @ a63be1bf
......@@ -77,67 +77,113 @@ scheduler, monitoring the number of jobs submitted or queuing and would
submit a new one as soon as a space in the queue would appear until the
entire sequence of jobs is submitted.
This concept has been kept for the current Python version of Autosubmit,
with a few capabilities added. The most interesting one is that
Autosubmit can now deal with dependencies between jobs (i.e. it can wait
for a particular job to finish before launching the next one).
Autosubmit can manage different types of jobs with different templates.
It can also restart a failed job, or stop the submission process and
restart from where it left off. The Python code has been refactored
around a new object-oriented design, and a new module creates
experiments from scratch and stores a small amount of information in a
SQLite database. Thanks to this, it is also possible to create, manage
and monitor different types of experiments (currently EC-Earth, NEMO and
IFS) and to work with different queue schedulers (such as PBS, SGE and
SLURM).
![](Scheduler.png "Scheduler.png")
#### What is a Job?
In HPC jargon, a job is a program submitted to the queue system. It can
be serial or multi-threaded, use different types of queues and carry any
of the directives that the scheduler of the HPC system provides. Within
Autosubmit a Job class has been created, and in the rest of the
documentation the term "Job" refers to the Python object of that class.
A job has several attributes:
- job.name: This name must be unique if several jobs are created.
- job.id: This job ID is 0 by construction and is set by the scheduler,
  hence it is only unique once the job has been submitted.
- job.status: The status is updated regularly and tells Autosubmit
  whether a job is ready to be submitted, completed, queuing, etc.
- job.type: Each job type has a different template, so multi-processor
  and serial jobs, for example, can be treated differently.
- job.failcount: This counter keeps track of the number of times a job
  has failed. At the moment, if a job fails more than 4 times it is
  cancelled and not resubmitted.

The dependency between jobs is handled through the concept of
inheritance. Each Job has two more attributes:
- job.Children: This is a list of dependent jobs. Those children can
  only be launched once this job is completed.
- job.Parents: This is the list of jobs whose completion this job has to
  wait for. A job can be submitted only when this list is empty.
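The attributes above can be pictured with a minimal sketch. It is not the actual Autosubmit Job class: the `Status` names, the method names and the use of Python 3 dataclasses are assumptions made for illustration. Only the attribute names and the limit of 4 failures come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical status names; the text only mentions ready, queuing and completed.
    WAITING = "waiting"
    READY = "ready"
    QUEUING = "queuing"
    COMPLETED = "completed"
    FAILED = "failed"


MAX_FAILURES = 4  # a job that fails more than 4 times is not resubmitted


@dataclass(eq=False)  # eq=False: jobs are compared by identity, not field by field
class Job:
    name: str
    type: str = "serial"
    id: int = 0  # 0 by construction, set by the scheduler on submission
    status: Status = Status.WAITING
    failcount: int = 0
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def add_child(self, child):
        """Register a dependency: child waits for this job."""
        self.children.append(child)
        child.parents.append(self)

    def mark_completed(self):
        """Complete this job and release children whose parent list becomes empty."""
        self.status = Status.COMPLETED
        for child in self.children:
            child.parents.remove(self)
            if not child.parents and child.status is Status.WAITING:
                child.status = Status.READY

    def mark_failed(self):
        """Record one more failure."""
        self.failcount += 1
        self.status = Status.FAILED

    def can_resubmit(self):
        """True while the job has not failed more than MAX_FAILURES times."""
        return self.failcount <= MAX_FAILURES
```

For example, after `sim.add_child(post)` the job `post` stays in `WAITING` until `sim.mark_completed()` empties its parent list and moves it to `READY`.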
#### What is a JobList?
The JobList module groups all the functions needed to manage a list of
jobs. A joblist object can be sorted by status, type, jobid or name, and
sublists can be created from it. The updateJobList() function is called
at every loop of Autosubmit and refreshes the status of the jobs in the
list. The status of a job is therefore only guaranteed to be accurate
directly after that call. The SaveJobList() function saves the joblist
in a pickle file, which can then be reloaded, for example for a restart.
Other functions like updateGenealogy() are only called once, after a
joblist is created. When the joblist is created, the dependencies
between jobs can only be expressed with job names; the updateGenealogy()
function replaces the children and parent names with the corresponding
job objects.
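The name-to-object replacement and the pickle round trip can be sketched as follows. The function names, the stand-in `Job` class and the file handling are assumptions for illustration, not the actual JobList code.

```python
import pickle


class Job:
    """Stand-in job whose parents/children start out as lists of job names."""

    def __init__(self, name, parents=(), children=()):
        self.name = name
        self.parents = list(parents)
        self.children = list(children)


def update_genealogy(jobs):
    """Replace parent and child names by the job objects carrying those names."""
    by_name = {job.name: job for job in jobs}
    for job in jobs:
        job.parents = [by_name[name] for name in job.parents]
        job.children = [by_name[name] for name in job.children]


def save_job_list(jobs, path):
    """Write the job list to a pickle file so it can be reloaded for a restart."""
    with open(path, "wb") as handle:
        pickle.dump(jobs, handle)


def load_job_list(path):
    """Read back a job list written by save_job_list()."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Looking names up through a dictionary keeps the replacement linear in the number of jobs, and it relies on job names being unique, as required for job.name.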
#### General HPCQueue
Autosubmit needs to interact with the queue system regularly to know how
many jobs are in the queue and thus how many jobs can be submitted. The
HPCQueue abstract class provides all the functions needed to communicate
with the scheduler, so that a job can be checked, cancelled or submitted
at any time and the state of the queue assessed.
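The link between the number of jobs in the queue and the number that can still be submitted is simple arithmetic. The sketch below assumes a configured upper limit (`max_jobs`, a hypothetical name) and is not taken from the Autosubmit source.

```python
def free_slots(max_jobs, jobs_in_queue):
    """Number of jobs that can still be submitted without exceeding max_jobs."""
    return max(0, max_jobs - jobs_in_queue)


def select_for_submission(ready_jobs, max_jobs, jobs_in_queue):
    """Pick as many ready jobs as there are free slots, keeping their order."""
    return ready_jobs[:free_slots(max_jobs, jobs_in_queue)]
```

With a limit of 4 jobs and 2 already queued, two of the ready jobs would be selected on this pass and the rest would wait for the next loop.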
#### Concrete HPCQueue
A concrete queue is a specialization of an HPCQueue: it inherits all the
functions common to a general queue and adds the attributes and methods
specific to one queue system. Autosubmit currently has concrete modules
that wrap the queue commands of the MareNostrum machines, the Ithaca
cluster and the Lindgren machines (MnQueue, ItQueue and LgQueue). A
concrete queue has several attributes:
- queue.host: This is the host name or the IP address used to set up
  connections.
- queue.job\_status: Each job status has a different code depending on
  the queue scheduler, so the responses of each concrete HPCQueue can be
  treated differently.
- queue.submit\_cmd: This is the concrete command to submit jobs.
- queue.checkjob\_cmd: This is the concrete command to check a job
  status.
- queue.cancel\_cmd: This is the concrete command to cancel jobs.
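The split between the general and the concrete queue can be sketched with an abstract base class. This is an illustration only: `SlurmQueue`, the host name and the method names are hypothetical and are not one of the MnQueue, ItQueue or LgQueue modules. The sketch only builds command strings and parses replies; it does not open any connection.

```python
from abc import ABC, abstractmethod


class HPCQueue(ABC):
    """General queue: behaviour shared by every scheduler."""

    host = ""
    submit_cmd = ""
    checkjob_cmd = ""
    cancel_cmd = ""
    job_status = {}  # scheduler-specific code -> common status name

    def build_submit(self, script):
        return f"ssh {self.host} {self.submit_cmd} {script}"

    def build_check(self, job_id):
        return f"ssh {self.host} {self.checkjob_cmd} {job_id}"

    def build_cancel(self, job_id):
        return f"ssh {self.host} {self.cancel_cmd} {job_id}"

    def normalize_status(self, code):
        """Translate a scheduler-specific code into a common status name."""
        return self.job_status.get(code.strip(), "UNKNOWN")

    @abstractmethod
    def parse_job_id(self, output):
        """Extract the job id from the reply of the submit command."""


class SlurmQueue(HPCQueue):
    """Hypothetical concrete queue for a SLURM scheduler."""

    host = "hpc.example.org"  # placeholder host, an assumption
    submit_cmd = "sbatch"
    checkjob_cmd = "squeue -h -o %t -j"
    cancel_cmd = "scancel"
    job_status = {"PD": "QUEUING", "R": "RUNNING", "CD": "COMPLETED", "F": "FAILED"}

    def parse_job_id(self, output):
        # sbatch normally replies "Submitted batch job <id>"
        return int(output.split()[-1])
```

Supporting a new scheduler then amounts to adding one subclass with its own commands and status codes, while the submission loop keeps working with the common interface.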
![](Queues.png "Queues.png")
#### Monitoring the experiment
Additional functionality to monitor an experiment has been added to
Autosubmit. From the joblist, it is possible to create a "tree" to
visualize the status of the joblist. Each status has its own color:
green = running, red = failed, etc.
![](JobListTree.png "JobListTree.png")
#### Job Wrapper
Supercomputers are rapidly increasing their number of cores, but the
rules for using them are also becoming stricter (e.g. a minimum of 2000
cores per job). This is not feasible with the current state of EC-Earth,
which is difficult to scale beyond a few hundred cores.
In order to provide a solution to the climate community, we have been
running some tests with a job wrapper. The idea is to run several
ensemble members at the same time under the control of a Python script.
We upload the script for each ensemble member we want to run. The
wrapper has to allocate resources for all of the scripts (e.g. if each
script requires 45 CPUs and we want to run 10 of them, that is 450
CPUs). The wrapping Python script creates a thread for every ensemble
member and runs them.
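The wrapper idea can be sketched as below: one thread per ensemble member, each thread launching that member's script as a subprocess. The function names are hypothetical and the actual wrapper used in the tests may differ, for instance in how it places each member on its share of the allocated cores.

```python
import subprocess
import threading


def total_cores(cores_per_member, members):
    """Cores the wrapper job must request, e.g. 45 * 10 = 450."""
    return cores_per_member * members


def run_member(command, results, index):
    """Run one ensemble member and store its exit code."""
    results[index] = subprocess.run(command).returncode


def run_ensemble(commands):
    """Start one thread per member, wait for all of them, return the exit codes."""
    results = [None] * len(commands)
    threads = [
        threading.Thread(target=run_member, args=(command, results, index))
        for index, command in enumerate(commands)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Threads are sufficient here because each one spends its time waiting on an external process rather than computing in Python.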
Further information:
1. International Conference on Computational Science (Cairns,
Australia, June 10 - 12, 2014), Impact of I/O and Data Management in
......@@ -151,79 +197,133 @@ job to finish before launching the next one)
10.1016/j.procs.2014.05.221](http://www.sciencedirect.com/science/article/pii/S1877050914003986)
(SPECS, IS-ENES2, INCITE).
\<!--===== Lindgren =====
![](lindgren-test1-1.png "fig:lindgren-test1-1.png")
![](lindgren-test1-2.png "fig:lindgren-test1-2.png")
![](lindgren-test1-3.png "fig:lindgren-test1-3.png")
![](lindgren-test1-4.png "fig:lindgren-test1-4.png")
##### Jaguar
![](jaguar-test1-1.png "fig:jaguar-test1-1.png")
![](jaguar-test1-2.png "fig:jaguar-test1-2.png")
![](jaguar-test1-3.png "fig:jaguar-test1-3.png")
![](jaguar-test1-4.png "fig:jaguar-test1-4.png")
![](jaguar-test2-1.png "fig:jaguar-test2-1.png")
![](jaguar-test2-2.png "fig:jaguar-test2-2.png")
![](jaguar-test2-3.png "fig:jaguar-test2-3.png")
![](jaguar-test2-4.png "fig:jaguar-test2-4.png") --\>
#### IS-ENES 2
##### A CNRM-CM6 monitoring using Autosubmit
A few members of a seasonal forecast experiment using CNRM-CM6 on the
ECMWF IBM Power 7 have been run under Autosubmit monitoring. A
collaboration of a few days at IC3 was sufficient to adapt the existing
CNRM workflow script to Autosubmit's non-intrusive requirements.
Nevertheless, more comprehensive work would be necessary to fully
exploit Autosubmit's capabilities to monitor and control the full
workflow (from compiling) on any kind of supercomputer platform.
The technical report describing the work is available here:
<http://www.cerfacs.fr/globc/publication/technicalreport/2014/autosubmit_cnrm-cm.pdf>
Requirements
------------
### How to deploy/setup Autosubmit (v2)
Autosubmit has been tested with the following operating systems:
- Linux Debian

and on the following HPCs/clusters:
- Ithaca (IC3 machine)
- MareNostrum (BSC machine)
- MareNostrum3 (BSC machine)
- HECToR (EPCC machine)
- Lindgren (PDC machine)
- C2A (ECMWF machine)
- ARCHER (EPCC machine)
Prerequisites: These packages (python2, python-argparse,
python-dateutil, python-pydot, python-matplotlib, sqlite3) must be
available on the local machine. The machine must also be able to access
the HPCs/clusters via password-less ssh.
Create a repository for experiments: for example "/cfu/autosubmit".
Then edit the repository path in src/dir\_config.py, src/expid.py and
conf/autosubmit.conf.
Create a blank database: for example "autosubmit.db" in the repository
created above, and thereafter:
`> cd /cfu/autosubmit`\
`> sqlite3 autosubmit.db`\
`sqlite> .read ../../src/autosubmit.sql`\
`> chmod 777 autosubmit.db`
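
The same database setup can also be scripted. The sketch below uses only the Python standard library; the function name and the way the paths are passed in are assumptions, and the schema file is expected to be the src/autosubmit.sql file mentioned above.

```python
import os
import sqlite3


def create_blank_database(db_path, schema_path):
    """Create the SQLite file at db_path and load the schema from schema_path."""
    with open(schema_path) as handle:
        schema = handle.read()
    connection = sqlite3.connect(db_path)  # creates the file if it is missing
    try:
        connection.executescript(schema)  # same effect as ".read" in the sqlite3 shell
        connection.commit()
    finally:
        connection.close()
    os.chmod(db_path, 0o777)  # mirrors the "chmod 777" step above
```

A call such as `create_blank_database("/cfu/autosubmit/autosubmit.db", "src/autosubmit.sql")` would reproduce the manual steps, assuming those paths exist on the local machine.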
Use
---
- Autosubmit 2.4.1 [documentation](http://autosubmit.ic3.cat)
- Autosubmit 2.4.1 CFU presentation (Dmanubens, 17:27, 4 July 2014
  CEST): ![](AS241.pdf "fig:AS241.pdf")
- Autosubmit 2.4.0
[documentation](http://autosubmit.ic3.cat/autosubmit2.4.0)
- Autosubmit 2.3
[documentation](http://autosubmit.ic3.cat/autosubmit2.3)
- Autosubmit 2.2
[documentation](http://autosubmit.ic3.cat/autosubmit2.2)
- Autosubmit 2.1
[documentation](http://autosubmit.ic3.cat/autosubmit2.1)
Repository
----------
To check out a working copy of autosubmit from the CFU network:
`git clone https://dev.cfu.local/autosubmit.git your_path_to_working_copy`
Contact
-------
The coordinator of this project is Domingo Manubens Gil
\<domingo.manubens@ic3.cat\>
Domingo Manubens Gil \<domingo.manubens@ic3.cat\>, Oriol Mula-Valls
\<oriol.mula-valls@ic3.cat\>, Muhammad Asif \<muhammad.asif@ic3.cat\>,
Pierre-Antoine Bretonnière \<pierre-antoine.bretonniere@ic3.cat\>
As a new user, please subscribe to this mailing list:
<http://autosubmit-users.ic3.cat/mailman/listinfo/autosubmit-users>
You will then have access to the archive of all the emails sent to
users, which present the functions and their available options.
Development
-----------
### SCRUM Framework
- [SCRUM Framework](Tools/SCRUM "wikilink")
### GIT branching scheme
- Since Autosubmit 2.2, templates and postp have been moved to new GIT
projects. See the following presentations for better understanding:
- Autosubmit and GIT: new projects
![](ASandGIT.pdf "fig:ASandGIT.pdf")
- Autosubmit 2.3 and GIT ![](AS23andGIT.pdf "fig:AS23andGIT.pdf")
See the following page to check the current branching scheme used within
the GIT project 'autosubmit': [Git branching
scheme](Computing/Git#GIT_branching_scheme "wikilink")
Style Guide
-----------
You can check the style guide for Autosubmit
[here](Tools/StyleGuides/Python "wikilink")
\ No newline at end of file