Autosubmit is a tool to create, manage and monitor experiments remotely, via ssh, on configured computing clusters, HPCs and supercomputers.

HOW TO DEPLOY/SETUP AUTOSUBMIT FRAMEWORK
========================================

- Autosubmit has been tested:

  with the following operating systems:
   * Linux Debian

  on the following HPCs/clusters:
   * Ithaca (IC3's cluster)
   * MareNostrum (Barcelona Supercomputing Center)
   * HECToR (UK-based supercomputer)
   * Lindgren (Swedish machine)

- Prerequisites: these packages (python2, python-argparse, python-dateutil, python-pydot, python-matplotlib, sqlite3) must be available on the local machine. The machine must also be able to access the HPCs/clusters via password-less ssh.

- Create a repository for experiments: for example "/cfu/autosubmit", then set the repository path in src/dir_config.py, src/expid.py and conf/autosubmit.conf.

- Create a blank database: for example "autosubmit.db" in the repository created above, and then:

  > cd /cfu/autosubmit
  > sqlite3 autosubmit.db
  sqlite3>.read ../../src/autosubmit.sql
  > chmod 777 autosubmit.db

HOW TO USE AUTOSUBMIT
=====================

> cd src
> python expid.py -h
> python expid.py --new ecearth --HPC ithaca --description "experiment is about..."

In the commands that follow, "cxxx" stands for the 4-character expid that the system generates automatically. The first character "c" represents the platform, such as "i" for ithaca, "b" for bsc, "h" for hector, "l" for lindgren, "e" for ecmwf and "m" for marenostrum3 etc. The remaining three characters "xxx" form a unique alphanumeric identifier for the experiment.
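The expid naming scheme can be illustrated with a minimal sketch. The names describe_expid and PLATFORM_BY_LETTER are hypothetical and not part of Autosubmit. The letter table covers only the platforms listed in this README, and because case handling is not stated here, the sketch assumes lower-case letters exactly as written.

```python
# Hypothetical helper, not part of Autosubmit: splits an expid into its
# platform letter and its unique three-character identifier.

# Platform letters as listed in this README. The list ends with "etc.",
# so other letters may exist; they are reported as None below.
PLATFORM_BY_LETTER = {
    "i": "ithaca",
    "b": "bsc",
    "h": "hector",
    "l": "lindgren",
    "e": "ecmwf",
    "m": "marenostrum3",
}


def describe_expid(expid):
    """Return (platform_name_or_None, unique_id) for a 4-character expid."""
    if len(expid) != 4 or not expid.isalnum():
        raise ValueError("expid must be 4 alphanumeric characters: %r" % (expid,))
    platform = PLATFORM_BY_LETTER.get(expid[0])  # None if the letter is not listed
    return platform, expid[1:]


if __name__ == "__main__":
    print(describe_expid("i001"))
```

A letter outside the table, such as the placeholder "c" in "cxxx", yields None for the platform rather than an error, since the README leaves the list open.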

> vi /cfu/autosubmit/cxxx/conf/expdef_cxxx.conf
> vi /cfu/autosubmit/cxxx/conf/autosubmit_cxxx.conf
> python create_exp.py cxxx
> ./setupexp.sh -e cxxx    (not needed if SETUP = TRUE in expdef_cxxx.conf)
> nohup python autosubmit.py cxxx >& cxxx_01.log &

Cautions:
- Before launching autosubmit, check that password-less ssh works:
  > ssh ithaca    # for example; similarly check the other HPCs reached via password-less ssh
- After launching autosubmit, be aware of the login expiry limit and policy (if any HPC applies one) and renew the login access (using a token, key, etc.) before it expires.

HOW TO MONITOR EXPERIMENT
=========================

> cd src
> python monitor.py -h
> python monitor.py -e cxxx -j job_list -o pdf
or
> python monitor.py -e cxxx -j job_list -o png

The generated plot, with a date and time stamp, can be found at:

/cfu/autosubmit/cxxx/plot/cxxx_date_time.pdf
or
/cfu/autosubmit/cxxx/plot/cxxx_date_time.png

HOW TO RESTART EXPERIMENT
=========================

> cd src
> python recovery.py -h
> python recovery.py -e cxxx -j job_list -g    # get/fetch the completed files
> python recovery.py -e cxxx -j job_list -s    # save the pickle file
> nohup python autosubmit.py cxxx >& cxxx_02.log &

HOW TO RERUN/EXTEND EXPERIMENT
==============================

> cd src
> vi /cfu/autosubmit/cxxx/conf/expdef_cxxx.conf    # modify RERUN, CHUNKLIST
> python create_exp.py cxxx
> nohup python autosubmit.py cxxx >& cxxx_03.log &

Monitor for RERUN
-----------------
> python monitor.py -e cxxx -j rerun_job_list -o pdf

Recovery for RERUN
------------------
> python recovery.py -e cxxx -j rerun_job_list -g
> python recovery.py -e cxxx -j rerun_job_list -s
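The two recovery steps differ only in their final flag, so they can be generated from the experiment id and the job list name. What follows is a minimal sketch that assumes only the flags shown in this README. The name recovery_commands is hypothetical and not part of Autosubmit, and the function only builds the command lines without running them.

```python
# Hypothetical helper, not part of Autosubmit: builds the two recovery.py
# command lines shown in this README, as argument lists.


def recovery_commands(expid, job_list="job_list"):
    """Return the two recovery.py invocations as argv lists."""
    base = ["python", "recovery.py", "-e", expid, "-j", job_list]
    return [
        base + ["-g"],  # get/fetch the completed files
        base + ["-s"],  # save the pickle file
    ]


if __name__ == "__main__":
    # Print the commands for a rerun, using the job list name from this README.
    for argv in recovery_commands("cxxx", job_list="rerun_job_list"):
        print(" ".join(argv))
```

Each list could then be passed to subprocess.check_call with cwd set to the src directory. That step is left out so the sketch has no side effects.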