- 04 Apr, 2019 6 commits
-
-
Morris Jette authored
-
Morris Jette authored
gres needs to locally keep the mps_table size rather than use node_record_count, which gets reset to zero at shutdown.
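A minimal sketch of the pattern described above, with hypothetical names (not the actual Slurm code): the table size is captured when the table is built so cleanup does not depend on node_record_count, which may already be zero at shutdown.

    #include <stdlib.h>

    typedef struct { char *data; } mps_entry_t;   /* illustrative only */

    static mps_entry_t *mps_table = NULL;
    static int mps_table_size = 0;                 /* locally kept size */

    static void mps_table_build(int node_cnt)
    {
        mps_table_size = node_cnt;                 /* capture the size now */
        mps_table = calloc(mps_table_size, sizeof(*mps_table));
    }

    static void mps_table_free(void)
    {
        for (int i = 0; i < mps_table_size; i++)   /* not node_record_count */
            free(mps_table[i].data);
        free(mps_table);
        mps_table = NULL;
        mps_table_size = 0;
    }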
-
Morris Jette authored
Check for out of range node index. Not observed, but prevents possible invalid memory reference.
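A hedged illustration of this kind of defensive check, with hypothetical names (not the actual Slurm code):

    #include <stddef.h>

    /* Hypothetical helper: return NULL instead of indexing out of bounds,
     * avoiding a possible invalid memory reference. */
    static void *node_entry_get(void **table, int table_cnt, int node_inx)
    {
        if (node_inx < 0 || node_inx >= table_cnt)
            return NULL;        /* out-of-range node index */
        return table[node_inx];
    }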
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 03 Apr, 2019 22 commits
-
-
Morris Jette authored
Copied array without including the array size pointer, so it did not get freed.
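A hedged sketch of the bug class described above, with hypothetical names: if the element count is not carried over when the array is duplicated, a free routine that loops over that count frees nothing and the copy leaks.

    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        int    cnt;      /* number of elements; must be copied too */
        char **elems;
    } str_array_t;

    static str_array_t *str_array_dup(const str_array_t *src)
    {
        str_array_t *dst = calloc(1, sizeof(*dst));
        dst->cnt = src->cnt;                       /* the piece that was missing */
        dst->elems = calloc(dst->cnt, sizeof(char *));
        for (int i = 0; i < dst->cnt; i++)
            dst->elems[i] = src->elems[i] ? strdup(src->elems[i]) : NULL;
        return dst;
    }

    static void str_array_free(str_array_t *a)
    {
        for (int i = 0; i < a->cnt; i++)  /* frees nothing if cnt was not copied */
            free(a->elems[i]);
        free(a->elems);
        free(a);
    }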
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
It was failing due to an Epilog, but could also fail when run in parallel with other jobs.
-
Morris Jette authored
This includes information about how to get a clean HWLOC report.
-
Morris Jette authored
Without this change I was able to fairly consistently cause "scontrol shutdown" to NOT cause the slurmd to exit:
1. Start slurmd and slurmctld
2. Immediately execute "scontrol reconfig" and "scontrol shutdown"
-
Morris Jette authored
Format changes only
-
Morris Jette authored
log message and comment format changes
-
Alejandro Sanchez authored
Bug 5851.
-
Danny Auble authored
# Conflicts:
#	slurm/slurm.h.in
-
Danny Auble authored
-
Alejandro Sanchez authored
This prevents rebuilding a job's dependency string when it has at least one invalid (never satisfiable) dependency, regardless of whether that invalid dependency has already been purged (after MinJobAge) or not. This can be useful to track down the culprit invalid dependencies even after they are gone from slurmctld's in-memory job list. The flag is cleared upon a successful job dependency update, or after another job in the dependency list is satisfied when the list is composed with the '?' symbol (OR'ed). Bug 5851.
-
Alejandro Sanchez authored
Job dependencies separated by "?" (OR'ed) should make the dependent job become independent as soon as any one of the dependencies is satisfied. Without this patch, if an invalid (non-satisfiable) dependency was resolved before a satisfiable one, the dependent job would never become independent, even after the satisfiable one was eventually resolved. Bug 5851.
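A hedged sketch (hypothetical types, not the actual Slurm code) of the OR semantics described above: one satisfied entry in a '?'-separated list is enough, and an invalid entry must not block the job when another entry is satisfied.

    #include <stdbool.h>

    typedef enum { DEP_PENDING, DEP_SATISFIED, DEP_INVALID } dep_state_t;

    /* Return true if an OR'ed ('?'-separated) dependency list is satisfied. */
    static bool or_deps_satisfied(const dep_state_t *deps, int dep_cnt)
    {
        for (int i = 0; i < dep_cnt; i++) {
            if (deps[i] == DEP_SATISFIED)
                return true;    /* any satisfied entry makes the job independent */
        }
        return false;           /* pending or invalid entries alone do not */
    }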
-
Alejandro Sanchez authored
No functional change, just preparation for a following commit with the actual fix. Bug 5851.
-
Felip Moll authored
The response to the XCC raw command is always 16 bytes; log the unexpected length and return if the answer is not of that size. Bug 6743.
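A hedged sketch of that length check, with hypothetical names (not the actual plugin code):

    #include <stdio.h>
    #include <stddef.h>

    #define XCC_RAW_RESP_LEN 16   /* the XCC raw command always returns 16 bytes */

    /* Hypothetical helper: reject a short or oversized response instead of
     * parsing it, logging the unexpected length for debugging. */
    static int xcc_check_resp_len(size_t resp_len)
    {
        if (resp_len != XCC_RAW_RESP_LEN) {
            fprintf(stderr, "xcc: unexpected response length %zu (expected %d)\n",
                    resp_len, XCC_RAW_RESP_LEN);
            return -1;
        }
        return 0;
    }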
-
Morris Jette authored
-
Morris Jette authored
If GRES configuration data is unavailable from gres.conf, then use the node's "Gres=" information from slurm.conf. This will eliminate or minimize the gres.conf file in many situations. Bug 6761.
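An illustrative (assumed) configuration showing the idea: with GRES counts declared on the node line in slurm.conf, a gres.conf may be reduced or omitted for simple setups. Node names, counts, and device paths below are examples only.

    # slurm.conf (illustrative)
    GresTypes=gpu
    NodeName=tux[1-4] Gres=gpu:4 CPUs=32 RealMemory=128000 State=UNKNOWN

    # gres.conf can then be omitted or minimized; previously it would have
    # needed explicit entries such as:
    #   NodeName=tux[1-4] Name=gpu File=/dev/nvidia[0-3]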
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
They were a bit too verbose for my taste
-
- 02 Apr, 2019 6 commits
-
-
Felip Moll authored
In 0e149092, not setting the variable when the job was not requesting any GRES was considered a bug. The CUDA API will use all devices if the variable is not set; if it is set to some unknown or empty value, it will use no devices. This variable should be used only for testing purposes; ConstrainDevices=yes in cgroup.conf is recommended instead. Bug 6412.
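A small sketch (illustrative, not Slurm code) of the environment-variable semantics described above:

    #include <stdlib.h>

    /* CUDA_VISIBLE_DEVICES semantics, per the commit message:
     *   unset            -> the CUDA API exposes all devices
     *   "" or unknown id -> the CUDA API exposes no devices
     *   "0,1"            -> only the listed devices are exposed */
    static void demo_gpu_env(void)
    {
        unsetenv("CUDA_VISIBLE_DEVICES");           /* all devices visible */
        setenv("CUDA_VISIBLE_DEVICES", "", 1);      /* no devices visible */
        setenv("CUDA_VISIBLE_DEVICES", "0,1", 1);   /* only GPUs 0 and 1 */
    }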
-
Felip Moll authored
GRES plugins will set up the environment for every GRES in the system even if the job has not requested it. Bug 6412.
-
Felip Moll authored
"... than one GRES of the same name but different type" (title truncated). This reverts f7fca7ba. Bug 6412.
-
Morris Jette authored
Initial work needed for Bug 6761 support.
-
Morris Jette authored
Comment format changes and some log message updates.
-
Morris Jette authored
This problem was triggered with a configuration of PrologFlags=Alloc,Contain.
-
- 01 Apr, 2019 2 commits
-
-
Morris Jette authored
This eliminates a slurmctld error message when a job shrinks to size zero. There is no need to re-compute the CPU count when the job_resources node_bitmap is empty. The logic works fine without this change if the job size shrinks, but not when it shrinks to size zero. Bug 6472.
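For reference, a job is shrunk with "scontrol update"; the case fixed here is shrinking all the way to zero nodes (the job id below is illustrative, matching the example in the next commit message):

    scontrol update JobId=43565 NumNodes=0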
-
Morris Jette authored
When a job size was reset to zero, this error message was printed:
    slurm_allocation_lookup: Job/step already completing or completed
which may lead the user to believe the operation failed when it worked as planned. Now it prints this:
    To reset Slurm environment variables, execute
      For bash or sh shells:  . ./slurm_job_43565_resize.sh
      For csh shells:         source ./slurm_job_43565_resize.csh
Where the reset scripts contain zero node count information:
    export SLURM_NODELIST=""
    export SLURM_JOB_NODELIST=""
    export SLURM_NNODES=0
    export SLURM_JOB_NUM_NODES=0
    export SLURM_JOB_CPUS_PER_NODE=""
    unset SLURM_NPROCS
    unset SLURM_NTASKS
    unset SLURM_TASKS_PER_NODE
-
- 31 Mar, 2019 4 commits
-
-
Brian Christiansen authored
-
Brian Christiansen authored
Continuation of 2764f3fd. Bug 6589.
-
Brian Christiansen authored
Continuation of 9a243a1a. Bug 6592.
-
Brian Christiansen authored
-