Autosubmit is a tool to create, manage and monitor experiments by using
configured computing clusters, HPCs and supercomputers remotely via ssh.
HOW TO DEPLOY/SETUP AUTOSUBMIT FRAMEWORK
========================================
- Autosubmit has been tested:
with the following Operating Systems:
* Linux Debian
on the following HPCs/clusters:
* Ithaca (IC3 machine)
* MareNostrum (BSC machine)
* MareNostrum3 (BSC machine)
* HECToR (EPCC machine)
* Lindgren (PDC machine)
* C2A (ECMWF machine)
* ARCHER (EPCC machine)
- Prerequisites: The following packages (bash, python2, sqlite3, git-scm > 1.8.2, subversion) must be available on the
local machine. The following packages (argparse, dateutil, pyparsing, numpy, pydotplus, matplotlib) must be available
to the Python runtime. The local machine must also be able to access the HPCs/clusters via password-less ssh.
- Install Autosubmit
> pip install autosubmit
or download, unpack and run "python setup.py install"
- Create a repository for experiments, for example "/cfu/autosubmit", then
set the repository path (LOCAL_ROOT_DIR) in autosubmit/config/dir_config.py
- Create a blank database, for example "autosubmit.db", in the repository created above:
> cp autosubmit/database/data/autosubmit.sql /cfu/autosubmit/
> cd /cfu/autosubmit
> sqlite3 autosubmit.db
sqlite> .read autosubmit.sql
sqlite> .quit
> chmod 775 autosubmit.db
then set the database file path and name (DB_DIR, DB_FILE, DB_NAME) in autosubmit/config/dir_config.py
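Two of the setup steps above, checking the prerequisites and creating the blank database, can be sketched in Python. This is a minimal standalone helper, not part of Autosubmit. It assumes Python 3 (for shutil.which), and that git-scm and subversion are installed under the executable names "git" and "svn". The function names are hypothetical, and the git version requirement (> 1.8.2) is not checked.

```python
import importlib.util
import os
import shutil
import sqlite3

# Executable names assumed for the packages listed above
# (git-scm provides "git", subversion provides "svn").
REQUIRED_COMMANDS = ["bash", "python2", "sqlite3", "git", "svn"]
REQUIRED_MODULES = ["argparse", "dateutil", "pyparsing", "numpy",
                    "pydotplus", "matplotlib"]


def missing_commands(commands):
    """Return the commands that cannot be found on PATH."""
    return [name for name in commands if shutil.which(name) is None]


def missing_modules(modules):
    """Return the top-level modules the running interpreter cannot locate."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]


def create_database(sql_path, db_path, mode=0o775):
    """Create db_path from the SQL script at sql_path, then set permissions."""
    with open(sql_path) as handle:
        script = handle.read()
    connection = sqlite3.connect(db_path)
    try:
        # Counterpart of ".read autosubmit.sql" in the sqlite3 shell.
        connection.executescript(script)
        connection.commit()
    finally:
        connection.close()
    os.chmod(db_path, mode)  # counterpart of "chmod 775 autosubmit.db"


if __name__ == "__main__":
    print("missing commands:", missing_commands(REQUIRED_COMMANDS))
    print("missing modules:", missing_modules(REQUIRED_MODULES))
```

With the example paths used above, the database step would be create_database("/cfu/autosubmit/autosubmit.sql", "/cfu/autosubmit/autosubmit.db"). Note that missing_modules inspects the interpreter running the sketch, which may differ from the python2 runtime used by Autosubmit.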
HOW TO USE AUTOSUBMIT
=====================
To run Autosubmit experiments at CFU, a production environment is set up on the local virtual machine "enterprise".
> python expid.py -h
> python expid.py --new --HPC ithaca --description "experiment is about..."
For example, "cxxx" is a 4-character expid generated automatically by the system.
The first character "c" represents the platform, such as "i" for ithaca, "b" for bsc,
"h" for hector, "l" for lindgren, "e" for ecmwf and "m" for marenostrum3. The remaining
three characters "xxx" form a unique alphanumeric identifier for the experiment.
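The expid layout described above can be illustrated with a short Python sketch. The prefix table is copied from the text; the function name parse_expid and the choice to return None for an unlisted prefix are assumptions made for illustration, not Autosubmit's own code.

```python
# Platform prefixes as listed in the text above.
PLATFORM_PREFIXES = {
    "i": "ithaca",
    "b": "bsc",
    "h": "hector",
    "l": "lindgren",
    "e": "ecmwf",
    "m": "marenostrum3",
}


def parse_expid(expid):
    """Split a 4-character expid into (platform, identifier).

    The platform is None when the first character is not in the table,
    because the list in the text is not exhaustive.
    """
    if len(expid) != 4 or not expid.isalnum():
        raise ValueError("expid must be 4 alphanumeric characters: %r" % expid)
    prefix, identifier = expid[0], expid[1:]
    return PLATFORM_PREFIXES.get(prefix), identifier


if __name__ == "__main__":
    print(parse_expid("i001"))
```

Here "i001" is a made-up expid: the "i" prefix maps to ithaca and "001" is the experiment's unique part.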
Cautions:
- Before launching autosubmit, check that password-less login works:
> ssh ithaca # for example; check the other HPCs in the same way where password-less ssh is feasible
- After launching autosubmit, be aware of the login expiry limit and policy (if applicable for any HPC)
and renew the login access accordingly (using a token, key, etc.) before it expires.
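The password-less ssh check above can also be scripted. The sketch below is one possible approach, not something Autosubmit provides: it runs the OpenSSH client with BatchMode=yes, which makes ssh fail instead of prompting for a password, and treats exit status 0 as success. The function names and the default timeout are assumptions.

```python
import subprocess


def ssh_check_command(host, timeout=10):
    """Build an ssh command that fails instead of prompting for a password."""
    return ["ssh",
            "-o", "BatchMode=yes",               # never ask for a password
            "-o", "ConnectTimeout=%d" % timeout,  # give up on unreachable hosts
            host, "true"]                         # run a no-op on the remote side


def can_login_without_password(host, timeout=10):
    """Return True if a non-interactive ssh login to host succeeds."""
    try:
        result = subprocess.run(ssh_check_command(host, timeout),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout + 5)
    except (OSError, subprocess.TimeoutExpired):
        # ssh client not installed, or the connection hung.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for host in ["ithaca"]:  # extend with the other configured HPCs
        print(host, can_login_without_password(host))
```

A False result only says that a non-interactive login failed; it does not distinguish an expired key or token from an unreachable host.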
HOW TO MONITOR EXPERIMENT
=========================
> python monitor.py -h
The generated plot, with a date and time stamp, can be found at:
HOW TO RESTART EXPERIMENT
=========================
> python recovery.py -h
> python recovery.py -e cxxx -j job_list -g # getting/fetching completed files
> python recovery.py -e cxxx -j job_list -s # saving the pickle file
HOW TO RERUN/EXTEND EXPERIMENT
==============================
> vi /cfu/autosubmit/cxxx/conf/expdef_cxxx.conf # modify RERUN, CHUNKLIST
Monitor for RERUN
-----------------
> python monitor.py -e cxxx -j rerun_job_list -o pdf
Recovery for RERUN
------------------