Autosubmit is a tool to create, manage and monitor experiments by using
configured computing clusters, HPCs and supercomputers remotely via ssh.
HOW TO DEPLOY/SETUP AUTOSUBMIT FRAMEWORK
========================================
- Autosubmit has been tested:
with the following Operating Systems:
* Linux Debian
on the following HPCs/clusters:
* Ithaca (IC3's Cluster)
* MareNostrum (Barcelona Supercomputing Center)
* HECToR (UK Based supercomputer)
* Lindgren (Swedish machine)
- Prerequisites: The following packages must be available on the local machine:
python2, python-argparse, python-dateutil, python-pydot, python-matplotlib, sqlite3.
The local machine must also be able to access the HPCs/clusters via password-less ssh.
- Create a repository for experiments: for example "/cfu/autosubmit". Then
set the repository path in src/dir_config.py, src/expid.py and conf/autosubmit.conf.
- Create a blank database: for example "autosubmit.db" in the repository created
above, and then run:
> cd /cfu/autosubmit
> sqlite3 autosubmit.db
sqlite3>.read ../../src/autosubmit.sql
> chmod 777 autosubmit.db
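The database step above can also be scripted. The following is a minimal sketch, assuming the SQL file only contains statements that Python's standard sqlite3 module can execute; the function name create_database and its arguments are hypothetical and not part of Autosubmit.

```python
import os
import sqlite3


def create_database(db_path, schema_path):
    """Create db_path and run every statement found in schema_path against it."""
    with open(schema_path) as handle:
        script = handle.read()
    connection = sqlite3.connect(db_path)
    try:
        # executescript runs all statements in the file, much like ".read"
        # does in the sqlite3 shell.
        connection.executescript(script)
        connection.commit()
    finally:
        connection.close()
    # Same permissions as the "chmod 777" step above.
    os.chmod(db_path, 0o777)


# Example call (paths are assumptions based on the examples in this README):
# create_database("/cfu/autosubmit/autosubmit.db", "src/autosubmit.sql")
```

Opening permissions to 777 mirrors the manual step; a more restrictive mode may be preferable where only one user runs experiments.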
HOW TO USE AUTOSUBMIT
=====================
> cd src
> python expid.py -h
> python expid.py --new ecearth --HPC ithaca --description "experiment is about..."
For example, "chex" is a 4-character expid generated by the system.
The first character ("c") represents the platform, such as "i" for ithaca, "b" for
bsc, "h" for hector, etc. The remaining three characters represent a
unique hexadecimal number for the experiment.
> vi /cfu/autosubmit/chex/conf/expdef_chex.conf
> vi /cfu/autosubmit/chex/conf/autosubmit_chex.conf
> python create_exp.py chex
> nohup python autosubmit.py chex >& chex_01.log &
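The expid layout described above (one platform character followed by three hexadecimal characters) can be illustrated with a small helper. This is only a sketch: parse_expid is a hypothetical name, the prefix table contains just the three platforms named in this README, and lowercase hexadecimal digits are assumed. Note that the placeholder "chex" itself is not a valid identifier under these rules.

```python
# Platform prefixes named in this README; other prefixes are reported as None.
PLATFORM_PREFIXES = {"i": "ithaca", "b": "bsc", "h": "hector"}
# Assumption: the hexadecimal part uses lowercase digits only.
HEX_DIGITS = set("0123456789abcdef")


def parse_expid(expid):
    """Split a 4-character expid into (platform name or None, hexadecimal part)."""
    if len(expid) != 4:
        raise ValueError("expid must be exactly 4 characters: %r" % (expid,))
    prefix, number = expid[0], expid[1:]
    if not set(number) <= HEX_DIGITS:
        raise ValueError("last three characters must be hexadecimal: %r" % (expid,))
    # dict.get returns None for a prefix this sketch does not know about.
    return PLATFORM_PREFIXES.get(prefix), number
```

For instance, parse_expid("i00a") yields the platform "ithaca" and the hexadecimal part "00a".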
HOW TO MONITOR EXPERIMENT
=========================
> cd src
> python monitor.py -h
> python monitor.py -e chex -j job_list -o pdf
or
> python monitor.py -e chex -j job_list -o png
The generated plot, named with a date and time stamp, can be found at:
/cfu/autosubmit/chex/plot/chex_date_time.pdf
or
/cfu/autosubmit/chex/plot/chex_date_time.png
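Because each plot carries its own time stamp, several files can accumulate in the plot directory. The sketch below picks the most recently modified one. It assumes the directory layout shown above and that plot names start with the expid followed by an underscore; latest_plot is a hypothetical helper, and the exact date and time format in the file name is not relied upon.

```python
import glob
import os


def latest_plot(expid, extension="pdf", repository="/cfu/autosubmit"):
    """Return the most recently modified plot for expid, or None if none exists."""
    pattern = os.path.join(repository, expid, "plot",
                           "%s_*.%s" % (expid, extension))
    plots = glob.glob(pattern)
    if not plots:
        return None
    # Modification time is used instead of parsing the stamp in the file name.
    return max(plots, key=os.path.getmtime)
```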
HOW TO RESTART EXPERIMENT
=========================
> cd src
> python recovery.py -h
> python recovery.py -e chex -j job_list -g # getting/fetching completed files
> python recovery.py -e chex -j job_list -s # saving the pickle file
> nohup python autosubmit.py chex >& chex_02.log &
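The two recovery steps above always run in the same order, so they lend themselves to a small wrapper. The following sketch only uses the recovery.py options shown in this README (-e, -j, -g, -s); the helper names restart_commands and run_recovery, and the src_dir argument, are hypothetical. Launching autosubmit.py afterwards is left to the nohup command shown above.

```python
import subprocess


def restart_commands(expid, job_list="job_list"):
    """Return the two recovery.py invocations used before a restart, in order."""
    base = ["python", "recovery.py", "-e", expid, "-j", job_list]
    # First fetch the completed files (-g), then save the pickle file (-s).
    return [base + ["-g"], base + ["-s"]]


def run_recovery(expid, job_list="job_list", src_dir="src"):
    """Run both recovery steps from src_dir, stopping at the first failure."""
    for command in restart_commands(expid, job_list):
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(command, cwd=src_dir)
```

Passing job_list="rerun_job_list" gives the same two commands for a rerun.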
HOW TO RERUN/EXTEND EXPERIMENT
==============================
> cd src
> vi /cfu/autosubmit/chex/conf/expdef_chex.conf # modify RERUN, CHUNKLIST
> python create_exp.py chex
> nohup python autosubmit.py chex >& chex_03.log &
Monitor for RERUN
-----------------
> python monitor.py -e chex -j rerun_job_list -o pdf
Recovery for RERUN
------------------
> python recovery.py -e chex -j rerun_job_list -g
> python recovery.py -e chex -j rerun_job_list -s