Domingo Manubens-Gil · cbb57d14
--- a/home.md
+++ b/home.md
+Objective
+---------
+
+Autosubmit launches and monitors experiments on any platform used at
+CFU. This page provides a general description of a typical climate
+forecast experiment and the goals of Autosubmit, a more technical
+description of its architecture and how it works, installation
+instructions, the user's manual and documentation, and the Autosubmit
+developers' page.
+
+-   [List of experiments](http://enterprise:8000/autosubmit_v2) (only
+    accessible from inside the CFU network)
+
+[Autosubmit-triptic-01-2016.pdf](https://earth.bsc.es/gitlab/es/autosubmit/uploads/c3c51d514d45406efeb06a388de88f89/Autosubmit-triptic-01-2016.pdf)
+
+Description
+-----------
+
+### General Description
+
+#### Introduction
+
+A typical climate forecast experiment is a run of a climate model on a
+supercomputer, with a forecast length ranging from a few months to a
+few years. An experiment may have one or more start-dates, and every
+start-date may comprise one or many members. The full forecast period
+of the experiment can be divided into a number of chunks of fixed
+forecast length by exploiting the model's restart options. Furthermore,
+in terms of computing operations, every chunk can have two big
+sections: a parallel section, where the actual model run is performed
+on the computing cores of the supercomputer, and one or more serial
+sections for other necessary operations such as post-processing the
+model output, archiving it and cleaning the disk space so that the
+experiment can proceed smoothly.
+![](Experiment_new.png "fig:Experiment_new.png")
+
+The sample experiment consists of 10 start-dates from 1960 to 2005,
+one every 5 years, each independent of the others. Each start-date
+comprises 5 members. Every member is also independent and is divided
+into 10 chunks, which depend on each other. Suppose that the forecast
+length of each chunk is one year and that every chunk comprises three
+types of jobs: a simulation (Sim), a post-processing (Post) and an
+archiving and cleaning job (Clean). In this example, one start-date
+with one member comprises 30 jobs, and 1500 jobs will be run in total
+to complete the experiment. In short, a system is needed to automate
+this type of experiment and optimize the use of resources.
+
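The job count quoted above can be checked with a few lines of Python. This is only an arithmetic sketch of the sample experiment; the variable names are illustrative and are not taken from Autosubmit.

```python
# Sizes of the sample experiment described in the text.
start_dates = list(range(1960, 2006, 5))  # 1960, 1965, ..., 2005
members = 5
chunks = 10
job_types = ("Sim", "Post", "Clean")

# Jobs needed for one start-date with one member.
jobs_per_member = chunks * len(job_types)

# Jobs needed for the whole experiment.
total_jobs = len(start_dates) * members * jobs_per_member

print(len(start_dates), jobs_per_member, total_jobs)
```

The three printed values correspond to the 10 start-dates, the 30 jobs per member and the 1500 jobs in total mentioned in the text.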
+#### Goal
+
+Autosubmit is a tool to manage and monitor climate forecasting
+experiments that run remotely on supercomputers. It aims to achieve the
+following goals:
+
+-   Efficient handling of highly dependent jobs
+-   Optimum utilization of available computing resources
+-   Ease of starting, stopping and live monitoring of experiments
+-   Automatic restart of the experiment, or of part of it, in case of
+    failure
+-   Use of a database for experiment creation and automatic assignment
+    of experiment identifiers
+-   Ability to reproduce completed experiments fully or partially
+
+![](Autosubmit24.png "Autosubmit24.png")
+
+### Technical Description
+
+#### Introduction
+
+Originally, Autosubmit consisted of a single Perl script (written by
+Xavi Abellan\*) that could submit a sequence of jobs with different
+parameters to the queue. All the jobs shared a common template, which
+Autosubmit would fill with different parameter values before submitting
+the jobs to the queue. Autosubmit acted as a wrapper around the
+scheduler: it monitored the number of jobs submitted or queuing and
+submitted a new one as soon as a slot in the queue became free, until
+the entire sequence of jobs had been submitted.
+
+This concept has been kept in the current Python version of Autosubmit,
+with a few capabilities added. The most interesting one is that
+Autosubmit can now deal with dependencies between jobs (i.e. it can
+wait for a particular job to finish before launching the next one).
+Autosubmit can manage different types of jobs with different templates.
+It can also restart a failed job, or stop the submission process and
+resume where it left off. The Python code has been refactored with a
+new object-oriented design, and there is now a new module to create
+experiments from scratch and store a small amount of information in an
+SQLite database. Thanks to this, it is also possible to create, manage
+and monitor different types of experiments (currently EC-Earth, NEMO
+and IFS) and to work with different queue schedulers (such as PBS, SGE
+and SLURM).
+
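The original submission loop described above (submit a new job whenever a slot in the queue frees up) can be sketched as follows. This is a simplified simulation, not Autosubmit code: the function name `run_sequence` is hypothetical, and the scheduler is replaced by the assumption that the oldest queued job finishes on every polling cycle.

```python
from collections import deque


def run_sequence(job_names, max_in_queue):
    """Submit jobs in order, never keeping more than max_in_queue queued."""
    pending = deque(job_names)
    in_queue = []
    finished = []
    peak = 0
    while pending or in_queue:
        # Fill every free slot in the queue with the next pending job.
        while pending and len(in_queue) < max_in_queue:
            in_queue.append(pending.popleft())
        peak = max(peak, len(in_queue))
        # Stand-in for polling the scheduler: the oldest job finishes.
        finished.append(in_queue.pop(0))
    return finished, peak


finished, peak = run_sequence([f"job_{i}" for i in range(5)], max_in_queue=2)
print(finished, peak)
```

In a real setting the inner step would query the scheduler instead of assuming that a job finishes on each cycle.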
+![](Scheduler.png "Scheduler.png")
+
+#### What is a Job?
+
+A job, in HPC jargon, is a program submitted to the queue system. It
+can be serial or multi-threaded, use different types of queues and
+carry any of the directives that the scheduler of the HPC system
+provides. Within Autosubmit a Job class has been created, and in the
+rest of the documentation the term "Job" refers to the Python object of
+that class. A job has several attributes:
+
+-   job.name: This name must be unique if several jobs are created.
+-   job.id: This job ID is 0 by construction and is set by the
+    scheduler, so it is only unique once the job has been submitted.
+-   job.status: The status is updated regularly and tells Autosubmit
+    whether a Job is ready to be submitted, completed, queuing, etc.
+-   job.type: Each job type has a different template, so that
+    multi-processor and serial jobs, for example, can be treated
+    differently.
+-   job.failcount: This counter keeps track of the number of times a
+    job has failed. At the moment, if a job fails more than 4 times, it
+    is cancelled and not resubmitted.
+
+The dependency between jobs is handled through the concept of
+inheritance. Each Job has two more attributes:
+
+-   job.Children: The list of dependent jobs. These children can only
+    be launched once this job is completed.
+-   job.Parents: The list of jobs whose completion this job has to wait
+    for. A job can only be submitted when this list is empty.
+
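The attributes above can be illustrated with a minimal sketch. It is not the actual Autosubmit Job class: the `Status` values, the `MAX_FAILS` constant, the lower-case attribute names and the methods `add_child`, `complete` and `fail` are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical status names, chosen for this sketch only.
    WAITING = "waiting"
    READY = "ready"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


MAX_FAILS = 4  # a job failing more than 4 times is not resubmitted


@dataclass(eq=False)  # compare jobs by identity, since they reference each other
class Job:
    name: str
    type: str = "serial"
    id: int = 0  # 0 until the scheduler assigns a real job ID
    status: Status = Status.WAITING
    failcount: int = 0
    children: list = field(default_factory=list)
    parents: list = field(default_factory=list)

    def add_child(self, child):
        """Declare that `child` must wait for this job."""
        self.children.append(child)
        child.parents.append(self)

    def is_ready(self):
        """A job can be submitted only when it has no parents left."""
        return not self.parents

    def complete(self):
        """Mark the job completed and release its children."""
        self.status = Status.COMPLETED
        for child in self.children:
            child.parents.remove(self)
            if child.is_ready():
                child.status = Status.READY

    def fail(self):
        """Record a failure; return True if the job may be resubmitted."""
        self.failcount += 1
        if self.failcount > MAX_FAILS:
            self.status = Status.CANCELLED
            return False
        self.status = Status.READY
        return True


sim, post = Job("sim_chunk1", type="parallel"), Job("post_chunk1")
sim.add_child(post)
sim.complete()
print(post.status)
```

Removing a completed job from its children's parent lists is one simple way to obtain the "empty list means ready" behaviour described above.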
+#### What is a JobList?
+
+The JobList module groups all the functions needed to manage a list of
+jobs. A joblist object can be sorted by status, type, jobid or name,
+and sublists can be created from it. The updateJobList() function is
+called on every loop of Autosubmit and, as its name suggests, updates
+the job list. The status of a job is therefore only guaranteed to be
+accurate directly after a call to that function. The SaveJobList()
+function saves the joblist in a pickle file, which can then be
+reloaded, for example for a restart. Other functions, such as
+updateGenealogy(), are only called once, after a joblist is created.
+When the joblist is created, the dependencies between jobs can only be
+expressed with job names. The updateGenealogy() function replaces the
+children and parent names with the corresponding job objects.
+
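The name-to-object replacement performed by updateGenealogy() can be sketched as below. This is an illustration under assumptions, not the Autosubmit implementation: the simplified `Job` class and the function name `update_genealogy` are hypothetical.

```python
class Job:
    """Simplified job holding its parents and children, first as names."""

    def __init__(self, name, parents=(), children=()):
        self.name = name
        self.parents = list(parents)
        self.children = list(children)


def update_genealogy(jobs):
    """Replace parent and child names with the matching Job objects."""
    by_name = {job.name: job for job in jobs}
    for job in jobs:
        job.parents = [by_name[name] for name in job.parents]
        job.children = [by_name[name] for name in job.children]
    return jobs


# Dependencies are first declared with names only.
jobs = update_genealogy([
    Job("sim_1", children=["post_1"]),
    Job("post_1", parents=["sim_1"], children=["clean_1"]),
    Job("clean_1", parents=["post_1"]),
])
print(jobs[0].children[0].name)
```

Resolving names once, right after the list is built, means later code can follow `parents` and `children` directly without any lookup.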
+#### General HPCQueue
+
+Autosubmit needs to interact with the queue system regularly to know how
+many jobs are in the queue and thus how many jobs can be submitted. The
+HPCQueue abstract class provides all the functions necessary to
+communicate with the scheduler, so that a job can be checked, cancelled
+or submitted at any time and the state of the queue assessed.
+
+#### Concrete HPCQueue
+
+A concrete queue is a specialization of an HPCQueue that inherits all
+the functions common to a general queue and has the concrete attributes
+and methods of a particular queue system. Autosubmit currently has
+concrete modules to wrap the queue commands of the MareNostrum
+machines, the Ithaca cluster and the Lindgren machines (MnQueue,
+ItQueue and LgQueue). A concrete queue has several attributes:
+
+-   queue.host: The host name or the IP used to set up connections.
+-   queue.job\_status: Each job status has a different code depending
+    on the queue scheduler, so that the responses of each concrete
+    HPCQueue can be treated differently.
+-   queue.submit\_cmd: The concrete command to submit jobs.
+-   queue.checkjob\_cmd: The concrete command to check a job status.
+-   queue.cancel\_cmd: The concrete command to cancel jobs.
+
+![](Queues.png "Queues.png")
+
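The split between the general and the concrete queue can be sketched with Python's `abc` module. This is only an illustration: `DemoQueue`, its host name, its commands and its status codes are made up for the example and do not correspond to MnQueue, ItQueue or LgQueue, and the method names are assumptions.

```python
from abc import ABC, abstractmethod


class HPCQueue(ABC):
    """General queue: shared logic, with details left to subclasses."""

    host = ""
    submit_cmd = ""
    checkjob_cmd = ""
    cancel_cmd = ""
    job_status = {}  # scheduler-specific code -> common status name

    @abstractmethod
    def parse_job_id(self, output):
        """Extract the job ID from the scheduler's submit output."""

    def build_submit(self, script):
        return f"ssh {self.host} {self.submit_cmd} {script}"

    def build_check(self, job_id):
        return f"ssh {self.host} {self.checkjob_cmd} {job_id}"

    def build_cancel(self, job_id):
        return f"ssh {self.host} {self.cancel_cmd} {job_id}"

    def normalize_status(self, code):
        return self.job_status.get(code, "UNKNOWN")


class DemoQueue(HPCQueue):
    """Hypothetical concrete queue with placeholder commands."""

    host = "demo-login"
    submit_cmd = "demo_submit"
    checkjob_cmd = "demo_check"
    cancel_cmd = "demo_cancel"
    job_status = {"R": "RUNNING", "Q": "QUEUING", "C": "COMPLETED"}

    def parse_job_id(self, output):
        # Assumes the ID is the last word of the submit output.
        return int(output.split()[-1])


queue = DemoQueue()
print(queue.build_submit("job.sh"))
print(queue.normalize_status("Q"))
```

Only the command strings are built here; nothing is executed, so the sketch runs without access to any cluster.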
+#### Monitoring the experiment
+
+Additional functionality to monitor an experiment has been added to
+Autosubmit. From the joblist, it is possible to create a "tree" to
+visualize the status of the joblist. Each status has a different color:
+green = running, red = failed, etc.
+
+![](JobListTree.png "JobListTree.png")
+
+#### Job Wrapper
+
+Supercomputers are currently increasing their number of cores rapidly,
+but the rules for using them are also becoming stricter (e.g. a
+minimum of 2000 cores per job). This is not feasible with the current
+state of EC-Earth, which is difficult to scale beyond a few hundred
+cores.
+
+In order to provide a solution to the climate community we have been
+running some tests with a job wrapper. The idea is to run several
+ensemble members at the same time under the control of a Python
+script. We upload the script for each ensemble member we want to run.
+The wrapper has to allocate resources for every script it runs (e.g.
+if each script requires 45 CPUs and we want to run 10 of them, that
+makes 450 CPUs). The wrapping Python script creates a thread for every
+ensemble member and runs them.
+
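The one-thread-per-member idea can be sketched with the standard `threading` module. This is a toy illustration, not the wrapper used in the tests: `run_member` is a hypothetical placeholder for launching a member's script, and the sizes are those of the example above.

```python
import threading

CPUS_PER_MEMBER = 45
MEMBERS = 10

# Resources the wrapper job has to request in total.
total_cpus = CPUS_PER_MEMBER * MEMBERS

results = {}
lock = threading.Lock()


def run_member(member):
    """Placeholder for launching one ensemble member's script."""
    outcome = f"member {member} ran on {CPUS_PER_MEMBER} CPUs"
    with lock:  # protect the shared dictionary
        results[member] = outcome


# One thread per ensemble member, all started together.
threads = [threading.Thread(target=run_member, args=(m,)) for m in range(MEMBERS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(total_cpus, len(results))
```

Joining every thread before the script exits keeps the wrapper job alive until all members have finished.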
+Further information:
+
+1.  International Conference on Computational Science (Cairns,
+    Australia, June 10 - 12, 2014), Impact of I/O and Data Management in
+    Ensemble Large Scale Climate Forecasting Using EC-Earth3.
+    ![]( Poster_Masif_ICCS_2014.pdf  "fig: Poster_Masif_ICCS_2014.pdf ")
+2.  [Asif](:File: masif_procs_2014.pdf "wikilink"), M., A. Cencerrado,
+    O. Mula-Valls, D. Manubens, F.J. Doblas-Reyes and A. Cortés (2014).
+    Impact of I/O and data management in ensemble large scale climate
+    forecasting using EC-Earth3. [Procedia Computer Science, 29,
+    2370-2379,
+    10.1016/j.procs.2014.05.221](http://www.sciencedirect.com/science/article/pii/S1877050914003986)
+    (SPECS, IS-ENES2, INCITE).
+
+\<!--===== Lindgren =====
+![](lindgren-test1-1.png "fig:lindgren-test1-1.png")
+![](lindgren-test1-2.png "fig:lindgren-test1-2.png")
+![](lindgren-test1-3.png "fig:lindgren-test1-3.png")
+![](lindgren-test1-4.png "fig:lindgren-test1-4.png")
+
+##### Jaguar
+
+![](jaguar-test1-1.png "fig:jaguar-test1-1.png")
+![](jaguar-test1-2.png "fig:jaguar-test1-2.png")
+![](jaguar-test1-3.png "fig:jaguar-test1-3.png")
+![](jaguar-test1-4.png "fig:jaguar-test1-4.png")
+
+![](jaguar-test2-1.png "fig:jaguar-test2-1.png")
+![](jaguar-test2-2.png "fig:jaguar-test2-2.png")
+![](jaguar-test2-3.png "fig:jaguar-test2-3.png")
+![](jaguar-test2-4.png "fig:jaguar-test2-4.png") --\>
+
+#### IS-ENES 2
+
+##### A CNRM-CM6 monitoring using Autosubmit
+
+A few members of a seasonal forecast experiment using CNRM-CM6 on the
+ECMWF IBM Power 7 have been run under Autosubmit monitoring. A
+collaboration of a few days at IC3 was sufficient to adapt the existing
+CNRM workflow script to Autosubmit's non-intrusive requirements.
+Nevertheless, more comprehensive work would be necessary to fully
+exploit Autosubmit's capabilities to monitor and control the full
+workflow (from compiling) on any kind of supercomputer platform.
+
+The technical report describing the work is available here:
+<http://www.cerfacs.fr/globc/publication/technicalreport/2014/autosubmit_cnrm-cm.pdf>
+
+Requirements
+------------
+
+### How to deploy/setup Autosubmit (v2)
+
+Autosubmit has been tested with the following operating systems:
+
+-   Linux Debian
+
+and on the following HPCs/clusters:
+
+-   Ithaca (IC3 machine)
+-   MareNostrum (BSC machine)
+-   MareNostrum3 (BSC machine)
+-   HECToR (EPCC machine)
+-   Lindgren (PDC machine)
+-   C2A (ECMWF machine)
+-   ARCHER (EPCC machine)
+
+Prerequisites: the following packages (python2, python-argparse,
+python-dateutil, python-pydot, python-matplotlib, sqlite3) must be
+available on the local machine. The machine must also be able to access
+the HPCs/clusters via password-less ssh.
+
+Create a repository for experiments, for example "/cfu/autosubmit",
+then set the repository path in src/dir\_config.py, src/expid.py and
+conf/autosubmit.conf.
+
+Create a blank database, for example "autosubmit.db", in the repository
+created above:
+
+`> cd /cfu/autosubmit`\
+`> sqlite3 autosubmit.db`\
+`sqlite3>.read ../../src/autosubmit.sql`\
+`> chmod 777 autosubmit.db`
+
+Use
+---
+
+-   Autosubmit 2.4.1 [documentation](http://autosubmit.ic3.cat)
+    -   --[Dmanubens](User:Dmanubens "wikilink")
+        ([talk](User talk:Dmanubens "wikilink")) 17:27, 4 July 2014
+        (CEST) - Autosubmit 2.4.1 CFU presentation
+        ![](AS241.pdf "fig:AS241.pdf")
+-   Autosubmit 2.4.0
+    [documentation](http://autosubmit.ic3.cat/autosubmit2.4.0)
+-   Autosubmit 2.3
+    [documentation](http://autosubmit.ic3.cat/autosubmit2.3)
+-   Autosubmit 2.2
+    [documentation](http://autosubmit.ic3.cat/autosubmit2.2)
+-   Autosubmit 2.1
+    [documentation](http://autosubmit.ic3.cat/autosubmit2.1)
+
+Repository
+----------
+
+To check out a working copy of Autosubmit from the CFU network:
+
+`> git clone https://dev.cfu.local/autosubmit.git your_path_to_working_copy`
+
+Contact
+-------
+
+The coordinator of this project is Domingo Manubens Gil
+\<domingo.manubens@ic3.cat\>
+
+Domingo Manubens Gil \<domingo.manubens@ic3.cat\>, Oriol Mula-Valls
+\<oriol.mula-valls@ic3.cat\>, Muhammad Asif \<muhammad.asif@ic3.cat\>,
+Pierre-Antoine Bretonnière \<pierre-antoine.bretonniere@ic3.cat\>
+
+As a new user, please register on this mailing list:
+<http://autosubmit-users.ic3.cat/mailman/listinfo/autosubmit-users>
+You will then have access to the history of all the emails sent to
+users, which present the functions and their available options.
+
+Development
+-----------
+
+### SCRUM Framework
+
+-   [ SCRUM Framework](Tools/SCRUM "wikilink")
+
+### GIT branching scheme
+
+-   Since Autosubmit 2.2, templates and postp have been moved to new GIT
+    projects. See the following presentations for better understanding:
+    -   Autosubmit and GIT: new projects
+        ![](ASandGIT.pdf "fig:ASandGIT.pdf")
+    -   Autosubmit 2.3 and GIT ![](AS23andGIT.pdf "fig:AS23andGIT.pdf")
+
+See the following page to check the current branching scheme used within
+the GIT project 'autosubmit': [ Git branching
+scheme](Computing/Git#GIT_branching_scheme "wikilink")
+
+Style Guide
+-----------
+
+You can check the style guide for Autosubmit [ here
+](Tools/StyleGuides/Python "wikilink")
+
+```bash
+$ autosubmit expid -H HPCname -d Description