Earth Sciences / autosubmit / Issues / #1062
Issue created Jun 20, 2023 by Eric Ferrer (@eferre1, Developer)

Platforms.conf settings conflict

Hello @dbeltran and @bdepaula,

Autosubmit Version

v3.14.0

Expid affected

a3pg

Which task has issues? Where is the log (if applicable)?

  • Full_name: a3pg_REMOTE_SETUP

  • Log_Path: no job log; the job cannot run

Summary

Configuring platforms.conf to run vertical and horizontal wrappers requires these parameters:

MAX_WALLCLOCK = 48:00
MAX_PROCESSORS = 2400
PROCESSORS_PER_NODE = 48

However, when Autosubmit generates the .cmd files, it adds the following lines to the header (the conflicting ones are marked):

#SBATCH --cpus-per-task=1
#SBATCH -n 4                      <-- from jobs.conf remote_setup section
#SBATCH --tasks-per-node=48       <-- comes from the PROCESSORS_PER_NODE
#SBATCH -t 2:00:00
#SBATCH -J a3pg_REMOTE_SETUP
#SBATCH --output=/gpfs/scratch/bsc32/bsc32627/a3pg/LOG_a3pg/a3pg_REMOTE_SETUP.cmd.out.0
#SBATCH --error=/gpfs/scratch/bsc32/bsc32627/a3pg/LOG_a3pg/a3pg_REMOTE_SETUP.cmd.err.0
#SBATCH -p interactive            <-- from jobs.conf remote_setup section

As a result, the job cannot start: it requests 48 tasks per node, while the interactive queue used for compiling is restricted to 4 processors, so the job stays pending with the reason QOSMaxCpuPerJobLimit:

bsc32627@login3:~> squeue
  JOBID      PARTITION PRIORITY NAME                           QOS     NOD TIME     TIME_LIMIT ST NODELIST(REASON)
  29032099   interacti 33327    a3pg_REMOTE_SETUP              interac 1   0:00     2:00:00    PD (QOSMaxCpuPerJobLimit)
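
The mismatch in the generated header can be checked mechanically. The following is a minimal sketch, not part of Autosubmit: `parse_sbatch` and `find_conflicts` are hypothetical helper names, the header text is copied from the .cmd file above with the annotations removed, and the limit of 4 CPUs is the interactive-queue restriction described in this issue.

```python
import re

# Directives copied from the generated a3pg_REMOTE_SETUP header
# (annotations removed; only the lines relevant to the conflict).
HEADER = """\
#SBATCH --cpus-per-task=1
#SBATCH -n 4
#SBATCH --tasks-per-node=48
#SBATCH -p interactive
"""


def parse_sbatch(header):
    """Return a dict mapping each #SBATCH option to its value (as a string)."""
    options = {}
    for line in header.splitlines():
        match = re.match(r"#SBATCH\s+(-{1,2}[\w-]+)(?:[=\s]+(\S+))?", line)
        if match:
            options[match.group(1)] = match.group(2)
    return options


def find_conflicts(options, queue_cpu_limit):
    """List the ways --tasks-per-node disagrees with the rest of the request.

    queue_cpu_limit is an assumption supplied by the caller (4 in this issue).
    """
    problems = []
    total = int(options.get("-n", 0))
    per_node = int(options.get("--tasks-per-node", 0))
    if total and per_node > total:
        problems.append(f"tasks-per-node ({per_node}) exceeds total tasks ({total})")
    if per_node > queue_cpu_limit:
        problems.append(f"tasks-per-node ({per_node}) exceeds queue limit ({queue_cpu_limit})")
    return problems


options = parse_sbatch(HEADER)
for problem in find_conflicts(options, queue_cpu_limit=4):
    print(problem)
```

For this header both checks fire, which matches the behaviour reported here: the job itself only needs 4 tasks, but the per-node value inherited from the platform is larger than both the job size and the queue limit.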

Steps to reproduce

Submit a job to the interactive queue on a platform configured to run horizontal wrappers; the issue should then appear. Alternatively, copy the a3pg experiment configuration to reproduce it.

What is the current bug behavior?

The REMOTE_SETUP job gets stuck in the queue because it is submitted requesting more CPUs than the queue allows.

What is the expected correct behavior?

The PROCESSORS_PER_NODE parameter from the platform configuration shouldn't affect all jobs. Should it apply only to the horizontal wrappers?
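
One possible reading of this expected behaviour, sketched under assumptions: the `build_header` function below is hypothetical and is not Autosubmit's header generator. It only illustrates emitting `--tasks-per-node` when the job is a horizontal wrapper, so that a small standalone job such as REMOTE_SETUP keeps just its own `-n` request. The wrapper job name used in the example is made up for illustration.

```python
def build_header(job_name, processors, partition=None,
                 processors_per_node=None, horizontal_wrapper=False):
    """Build a list of #SBATCH lines for one job (hypothetical sketch).

    processors_per_node stands for the platform-level PROCESSORS_PER_NODE
    setting; it is applied only when horizontal_wrapper is True.
    """
    lines = [
        "#SBATCH --cpus-per-task=1",
        f"#SBATCH -n {processors}",
    ]
    # Only horizontal wrappers inherit the platform-wide per-node setting.
    if horizontal_wrapper and processors_per_node:
        lines.append(f"#SBATCH --tasks-per-node={processors_per_node}")
    lines.append(f"#SBATCH -J {job_name}")
    if partition:
        lines.append(f"#SBATCH -p {partition}")
    return lines


# Standalone job from this issue: 4 processors on the interactive queue.
setup_header = build_header("a3pg_REMOTE_SETUP", 4, partition="interactive",
                            processors_per_node=48)

# Hypothetical horizontal wrapper using the platform limits from platforms.conf.
wrapper_header = build_header("example_horizontal_wrapper", 2400,
                              processors_per_node=48, horizontal_wrapper=True)

print("\n".join(setup_header))
print()
print("\n".join(wrapper_header))
```

With this rule the REMOTE_SETUP header no longer contains a `--tasks-per-node` line, while the wrapper header still requests 48 tasks per node. Whether wrappers are the only jobs that should receive the setting is the open question raised above.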
