Autosubmit is a tool to create, manage and monitor experiments on
configured computing clusters, HPCs and supercomputers, accessed remotely via ssh.
Muhammad Asif committed

HOW TO DEPLOY/SETUP AUTOSUBMIT FRAMEWORK
========================================
- Autosubmit has been tested:
  with the following Operating Systems:
   * Linux Debian
  on the following HPCs/clusters:
   * Ithaca (IC3's Cluster)
   * MareNostrum (Barcelona Supercomputing Center)
   * HECToR (UK Based supercomputer)
   * Lindgren (Swedish machine)
- Prerequisites: These packages (python2, python-argparse, python-dateutil,
  python-pydot, python-matplotlib, sqlite3) must be available on the local machine.
  The local machine must also be able to access the HPCs/clusters via passwordless ssh.
- Create a repository for experiments, for example "/cfu/autosubmit", then
  set the repository path in src/dir_config.py, src/expid.py and conf/autosubmit.conf.
- Create a blank database, for example "autosubmit.db", in the repository created above:
   > cd /cfu/autosubmit
   > sqlite3 autosubmit.db
   sqlite> .read ../../src/autosubmit.sql
   sqlite> .quit
   > chmod 777 autosubmit.db
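
The database step above can also be scripted. The following is a minimal
sketch, not part of Autosubmit itself: the function name create_database is
hypothetical, and it assumes autosubmit.sql contains only plain SQL statements
that Python's sqlite3 module can run with executescript().

```python
import os
import sqlite3


def create_database(db_path, schema_path):
    """Create db_path and run the SQL statements stored in schema_path."""
    with open(schema_path) as handle:
        schema_sql = handle.read()
    connection = sqlite3.connect(db_path)
    try:
        # executescript() runs every statement in the file, similar to
        # ".read" in the sqlite3 shell (dot-commands are not supported here).
        connection.executescript(schema_sql)
        connection.commit()
    finally:
        connection.close()
    # Mirror the "chmod 777" from the manual steps.
    os.chmod(db_path, 0o777)
```

With the example paths used above, the call would be
create_database("/cfu/autosubmit/autosubmit.db", "../../src/autosubmit.sql"),
run from /cfu/autosubmit.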

HOW TO USE AUTOSUBMIT
=====================
> python expid.py --new ecearth --HPC ithaca --description "experiment is about..."

The system automatically generates a 4-character experiment identifier (expid),
written here as "cxxx". The first character "c" identifies the platform: "i" for ithaca,
"b" for bsc, "h" for hector, "l" for lindgren, "e" for ecmwf, "m" for marenostrum3, etc.
The remaining three characters "xxx" form a unique alphanumeric identifier for the experiment.
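
The naming rule above can be expressed as a small lookup. This is an
illustrative sketch only: describe_expid and PLATFORMS are hypothetical names,
not part of Autosubmit, and the table holds only the letters listed above.

```python
# Platform letters as listed in the text. The text ends the list with "etc.",
# so other letters may exist; they are reported as "unknown" here.
PLATFORMS = {
    "i": "ithaca",
    "b": "bsc",
    "h": "hector",
    "l": "lindgren",
    "e": "ecmwf",
    "m": "marenostrum3",
}


def describe_expid(expid):
    """Split a 4-character expid into (platform name, unique part)."""
    if len(expid) != 4 or not expid.isalnum():
        raise ValueError("expected 4 alphanumeric characters: %r" % (expid,))
    platform = PLATFORMS.get(expid[0], "unknown")
    return platform, expid[1:]
```

For instance, describe_expid("i001") returns ("ithaca", "001").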

> vi /cfu/autosubmit/cxxx/conf/expdef_cxxx.conf
> vi /cfu/autosubmit/cxxx/conf/autosubmit_cxxx.conf
> python create_exp.py cxxx
> ./setupexp.sh -e cxxx # not needed if SETUP = TRUE in expdef_cxxx.conf
> nohup python autosubmit.py cxxx >& cxxx_01.log &
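
Both configuration files edited above live under the experiment's conf
directory. As a rough sketch (the helper names are hypothetical, and the
default repository path is only the example used in this README), the paths
can be built and checked before launching autosubmit.py:

```python
import os


def experiment_conf_files(expid, repository="/cfu/autosubmit"):
    """Return the two per-experiment configuration file paths."""
    conf_dir = os.path.join(repository, expid, "conf")
    return [
        os.path.join(conf_dir, "expdef_%s.conf" % expid),
        os.path.join(conf_dir, "autosubmit_%s.conf" % expid),
    ]


def missing_conf_files(expid, repository="/cfu/autosubmit"):
    """Return the configuration files that do not exist yet."""
    return [path for path in experiment_conf_files(expid, repository)
            if not os.path.isfile(path)]
```

An empty list from missing_conf_files("cxxx") means both files are in place.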

HOW TO MONITOR EXPERIMENT
=========================
> python monitor.py -e cxxx -j job_list -o pdf
> python monitor.py -e cxxx -j job_list -o png
The generated plots, named with a date and time stamp, can be found at:
/cfu/autosubmit/cxxx/plot/cxxx_date_time.pdf
/cfu/autosubmit/cxxx/plot/cxxx_date_time.png
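
Because each plot name carries a date and time stamp, several plots can
accumulate per experiment. A minimal sketch for picking the most recent one
is given here. latest_plot is a hypothetical helper, and it relies on file
modification times because the exact format of the date and time stamp is
not specified above.

```python
import glob
import os


def latest_plot(expid, extension="pdf", repository="/cfu/autosubmit"):
    """Return the newest plot file for expid, or None if there is none."""
    pattern = os.path.join(repository, expid, "plot",
                           "%s_*.%s" % (expid, extension))
    plots = glob.glob(pattern)
    if not plots:
        return None
    # Newest by modification time, not by parsing the file name.
    return max(plots, key=os.path.getmtime)
```

For example, latest_plot("cxxx", "png") looks under /cfu/autosubmit/cxxx/plot.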

HOW TO RESTART EXPERIMENT
=========================
> python recovery.py -e cxxx -j job_list -g # fetch completed files
> python recovery.py -e cxxx -j job_list -s # save the pickle file
> nohup python autosubmit.py cxxx >& cxxx_02.log &
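
The log files in this README are numbered by run (cxxx_01.log for the first
start, cxxx_02.log after a restart). A small sketch for choosing the next free
number follows. next_log_name is a hypothetical helper, and it assumes the
logs are kept in a single directory and follow the expid_NN.log pattern.

```python
import os
import re


def next_log_name(expid, directory="."):
    """Return the next unused log file name for expid in directory."""
    pattern = re.compile(r"^%s_(\d+)\.log$" % re.escape(expid))
    numbers = []
    for name in os.listdir(directory):
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1)))
    # Start at 01 when no earlier log exists, else continue the sequence.
    next_number = max(numbers) + 1 if numbers else 1
    return "%s_%02d.log" % (expid, next_number)
```

The returned name can then be used as the redirect target of the nohup command.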


HOW TO RERUN/EXTEND EXPERIMENT
==============================

> cd src

> vi /cfu/autosubmit/cxxx/conf/expdef_cxxx.conf # modify RERUN, CHUNKLIST
> python create_exp.py cxxx
> nohup python autosubmit.py cxxx >& cxxx_03.log &
> python monitor.py -e cxxx -j rerun_job_list -o pdf
> python recovery.py -e cxxx -j rerun_job_list -g 
> python recovery.py -e cxxx -j rerun_job_list -s