Platforms.conf settings conflict
Hello @dbeltran and @bdepaula,
Autosubmit Version
v3.14.0
Expid affected
a3pg
Which task has issues? Where is the log (if applicable)?
- Full_name: a3pg_REMOTE_SETUP
- Log_Path: no job log, it cannot run
Summary
Setting up platforms.conf to run with vertical and horizontal wrappers needs these parameters:
MAX_WALLCLOCK = 48:00
MAX_PROCESSORS = 2400
PROCESSORS_PER_NODE = 48
However, when Autosubmit generates the .cmd files, it adds the following lines to the header (the conflicting ones are marked):
#SBATCH --cpus-per-task=1
#SBATCH -n 4 <-- from jobs.conf remote_setup section
#SBATCH --tasks-per-node=48 <-- comes from the PROCESSORS_PER_NODE
#SBATCH -t 2:00:00
#SBATCH -J a3pg_REMOTE_SETUP
#SBATCH --output=/gpfs/scratch/bsc32/bsc32627/a3pg/LOG_a3pg/a3pg_REMOTE_SETUP.cmd.out.0
#SBATCH --error=/gpfs/scratch/bsc32/bsc32627/a3pg/LOG_a3pg/a3pg_REMOTE_SETUP.cmd.err.0
#SBATCH -p interactive <-- from jobs.conf remote_setup section
This prevents the job from running: it stays pending because it requests 48 tasks per node, while the interactive queue used for compiling is restricted to 4 processors:
bsc32627@login3:~> squeue
JOBID PARTITION PRIORITY NAME QOS NOD TIME TIME_LIMIT ST NODELIST(REASON)
29032099 interacti 33327 a3pg_REMOTE_SETUP interac 1 0:00 2:00:00 PD (QOSMaxCpuPerJobLimit)
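To make the mismatch easier to spot, here is a minimal Python sketch that scans a generated header for the two conflicting directives. It is not part of Autosubmit: the function name find_conflict and the HEADER sample (copied from the header above, without the annotations) are only illustrative, and it does not model how Slurm itself accounts for CPUs against the QOS limit.

```python
import re

# Sample header taken from this report (annotations removed).
HEADER = """\
#SBATCH --cpus-per-task=1
#SBATCH -n 4
#SBATCH --tasks-per-node=48
#SBATCH -t 2:00:00
#SBATCH -p interactive
"""


def find_conflict(header):
    """Hypothetical check: return a message when the per-node task count
    is larger than the total number of tasks requested, else None."""
    ntasks = re.search(
        r"^#SBATCH\s+(?:-n|--ntasks)[=\s]+(\d+)", header, re.MULTILINE
    )
    per_node = re.search(
        r"^#SBATCH\s+--n?tasks-per-node[=\s]+(\d+)", header, re.MULTILINE
    )
    if ntasks is None or per_node is None:
        return None  # one of the directives is absent, nothing to compare
    n, tpn = int(ntasks.group(1)), int(per_node.group(1))
    if tpn > n:
        return f"--tasks-per-node={tpn} exceeds -n {n}"
    return None


print(find_conflict(HEADER))
```

Running it against the REMOTE_SETUP header flags the same two lines marked above.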
Steps to reproduce
Use the interactive queue on a platform configured to run horizontal wrappers; the issue should appear. Alternatively, you can copy the a3pg experiment configuration to reproduce it.
What is the current bug behavior?
The REMOTE_SETUP job stays pending in the queue because it is submitted requesting more CPUs than the queue allows.
What is the expected correct behavior?
The PROCESSORS_PER_NODE parameter from platforms.conf shouldn't affect all the jobs. Should it apply only to the horizontal wrappers?
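One possible reading of the expected behavior, as a rough Python sketch. This is an assumption about how it could work, not Autosubmit's actual code: the helper tasks_per_node_directive and its parameters are hypothetical names introduced only for illustration.

```python
def tasks_per_node_directive(processors_per_node, in_horizontal_wrapper):
    """Hypothetical helper: return the #SBATCH line for tasks per node,
    or None when the directive should be left out of the header."""
    if processors_per_node is None:
        return None  # PROCESSORS_PER_NODE not set for the platform
    if not in_horizontal_wrapper:
        return None  # standalone jobs (e.g. REMOTE_SETUP) keep only their own -n
    return f"#SBATCH --tasks-per-node={processors_per_node}"


# Standalone REMOTE_SETUP job: no directive would be emitted.
print(tasks_per_node_directive(48, in_horizontal_wrapper=False))
# Job inside a horizontal wrapper: the directive would be kept.
print(tasks_per_node_directive(48, in_horizontal_wrapper=True))
```

With something along these lines, the REMOTE_SETUP header would contain only the -n 4 taken from jobs.conf.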