- 07 Jul, 2015 16 commits
-
-
Trey Dockendorf authored
-
Trey Dockendorf authored
Add job record qos field and partition record allow_qos field.
-
Trey Dockendorf authored
-
Morris Jette authored
-
Trey Dockendorf authored
This patch moves the QOS update of an existing job to be before the partition update. This ensures the new QOS value is the one used when validating against settings such as a partition's AllowQOS and DenyQOS. Currently, if two partitions have AllowQOS lists that do not share any QOS, the order of updates prevents a job from being moved from one partition to the other using something like the following: scontrol update job=<jobID> partition=<new part> qos=<new qos>
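The ordering fix can be sketched as follows. This is a minimal illustration, not Slurm's actual code: qos_allowed and update_job are hypothetical helpers, and the AllowQOS list is modeled as a plain string array.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: is this QOS in the partition's AllowQOS list? */
bool qos_allowed(const char *qos, const char *const allow[], int n_allow) {
    for (int i = 0; i < n_allow; i++)
        if (strcmp(qos, allow[i]) == 0)
            return true;
    return false;
}

/* Sketch of the ordering fix: apply the QOS update first, then validate
 * the (possibly new) partition against that new QOS value. */
int update_job(const char **job_qos, const char **job_part,
               const char *new_qos, const char *new_part,
               const char *const allow[], int n_allow) {
    if (new_qos)
        *job_qos = new_qos;                    /* QOS update happens first */
    if (new_part) {
        if (!qos_allowed(*job_qos, allow, n_allow))
            return -1;                         /* fails AllowQOS check */
        *job_part = new_part;
    }
    return 0;
}
```

With the old ordering, the AllowQOS check would run against the job's previous QOS and reject the combined partition+QOS update.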
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
David Bigagli authored
-
Danny Auble authored
-
Danny Auble authored
value for booleans.
-
Danny Auble authored
Conflicts: src/common/slurm_step_layout.c
-
Morris Jette authored
Correct task layout with the CR_Pack_Node option and more than one CPU per task. Previous logic would place one task per CPU and launch too few tasks. Bug 1781.
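As a rough sketch of the per-node arithmetic (a simplification, not Slurm's real layout code; tasks_on_node is a hypothetical helper):

```c
/* Hypothetical sketch: with CR_Pack_Node, the number of tasks packed onto
 * a node must account for --cpus-per-task rather than assuming one task
 * per CPU. */
int tasks_on_node(int cpus_on_node, int cpus_per_task) {
    if (cpus_per_task <= 0)
        cpus_per_task = 1;                 /* default: one CPU per task */
    return cpus_on_node / cpus_per_task;   /* not cpus_on_node tasks */
}
```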
-
Danny Auble authored
-
- 06 Jul, 2015 5 commits
-
-
Nathan Yee authored
-
Nathan Yee authored
-
Morris Jette authored
Backfill scheduler now considers the OverTimeLimit and KillWait configuration parameters to estimate when running jobs will exit. Initially the job's end time is estimated based upon its time limit. After the time limit is reached, the end time estimate is based upon the OverTimeLimit and KillWait configuration parameters. Bug 1774.
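A minimal sketch of that estimate; the exact way OverTimeLimit and KillWait combine here is an assumption for illustration, not Slurm's precise formula:

```c
#include <time.h>

/* Hypothetical sketch: estimate when a running job will end.  Before the
 * time limit is reached, use start + limit; afterwards, assume the job
 * survives roughly OverTimeLimit + KillWait past "now". */
time_t estimate_end(time_t now, time_t start, time_t limit,
                    time_t over_time_limit, time_t kill_wait) {
    time_t end = start + limit;
    if (end <= now)                          /* limit already exceeded */
        end = now + over_time_limit + kill_wait;
    return end;
}
```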
-
Morris Jette authored
Backfill scheduler: The configured backfill_interval value (default 30 seconds) is now interpreted as a maximum run time for the backfill scheduler. Once reached, the scheduler will build a new job queue and start over, even if not all jobs have been tested. Bug 1774.
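The time budget can be sketched like this; backfill_scan and the per-job cost parameter are hypothetical, used only to make the cutoff deterministic:

```c
/* Hypothetical sketch: test jobs until the accumulated scan time reaches
 * max_runtime; the real scheduler would then rebuild the job queue and
 * start over on its next pass. */
int backfill_scan(int njobs, long max_runtime, long cost_per_job) {
    long elapsed = 0;
    int tested = 0;
    for (int i = 0; i < njobs; i++) {
        if (elapsed >= max_runtime)
            break;                 /* budget exhausted; stop early */
        tested++;
        elapsed += cost_per_job;   /* stand-in for wall-clock time */
    }
    return tested;
}
```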
-
Jason Coverston authored
of the size returned. This check is redundant, though, since getgrouplist() returns -1 if it tries to use more than ngroups_max groups. Perhaps we should just take the check out.
-
- 03 Jul, 2015 1 commit
-
-
Morris Jette authored
-
- 02 Jul, 2015 18 commits
-
-
Morris Jette authored
No change in logic
-
Morris Jette authored
The original patch from LLNL assumed Munge was installed, which resulted in a build error when the Munge development package was not present.
-
Adam Moody authored
-
Adam Moody authored
-
Adam Moody authored
-
Adam Moody authored
-
Don Lipari authored
If a user runs sreport and specifies a time period that stretches into the future, this patch sets the end of the period to the current time.
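The clamp itself is simple; a sketch (clamp_end is a hypothetical name, not sreport's actual function):

```c
#include <time.h>

/* Hypothetical sketch: never let a reporting period extend past "now". */
time_t clamp_end(time_t period_end, time_t now) {
    return (period_end > now) ? now : period_end;
}
```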
-
Mark A. Grondona authored
In order to successfully remove the freezer cgroup at the end of a job step, the slurmstepd process itself must first be moved outside of the cgroup, or removal will always fail. This fix moves the slurmstepd back to the root cgroup just before the rmdir operations are attempted.
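The constraint is the same one rmdir(2) imposes on any non-empty directory. As an illustrative analogy (not Slurm's cgroup code), a directory with a remaining member must be emptied before removal can succeed:

```c
#include <errno.h>
#include <stdio.h>      /* for the usage example below */
#include <sys/stat.h>
#include <unistd.h>

/* Analogy sketch: like a cgroup that still contains the slurmstepd task,
 * a directory with a member cannot be removed; move the member out
 * (here: unlink it) and the rmdir then succeeds. */
int remove_after_emptying(const char *dir, const char *member) {
    if (rmdir(dir) == 0)
        return 0;                    /* already empty */
    if (errno != ENOTEMPTY && errno != EEXIST)
        return -1;                   /* unexpected failure */
    if (unlink(member) < 0)          /* the "move the task out" step */
        return -1;
    return rmdir(dir);
}
```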
-
Morris Jette authored
-
Mark A. Grondona authored
If the job prolog is running we can't send a signal to job step tasks, so return SLURM_FAILURE instead of ESLURM_INVALID_JOB_ID. This should cause the caller to retry, instead of assuming the job step is not running on the node.
-
Mark A. Grondona authored
If a job step request comes in while the slurm prolog is running, slurmd will happily launch the job step. This means that a user could run code before the prolog is complete, which could cause strange errors or, in some cases, a security issue. Instead, return an error (EINPROGRESS for now) and do not allow job steps to run during the prolog.
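A minimal sketch of the guard (launch_step is hypothetical; the real check lives in slurmd's request handling):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical sketch: refuse to launch a job step while the prolog is
 * still running, so user code cannot start before the prolog completes. */
int launch_step(bool prolog_running) {
    if (prolog_running) {
        errno = EINPROGRESS;   /* caller is expected to retry later */
        return -1;
    }
    return 0;                  /* proceed with the normal step launch */
}
```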
-
Mark A. Grondona authored
Create a new function, _prolog_is_running(), to determine if a job prolog is currently running from within slurmd, and use this new function in place of list_find_first in _wait_for_job_running_prolog.
-
Mark A. Grondona authored
Some cgroup code would retry continuously if a cgroup did not exist when xcgroup_delete() was called (see, for instance, proctrack/cgroup's slurm_container_plugin_wait()). To fix this for all callers, return SUCCESS from xcgroup_delete() if rmdir(2) fails with ENOENT, since the cgroup is already deleted.
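The fix reduces to one errno check; a sketch (xcgroup_delete_sketch is a hypothetical stand-in for the real function):

```c
#include <errno.h>
#include <sys/stat.h>   /* mkdir, used in the usage example below */
#include <unistd.h>

/* Hypothetical sketch: deleting a cgroup directory that is already gone
 * counts as success, not an error, so callers stop retrying on ENOENT. */
int xcgroup_delete_sketch(const char *path) {
    if (rmdir(path) < 0 && errno != ENOENT)
        return -1;   /* real failure, e.g. EBUSY or ENOTEMPTY */
    return 0;        /* removed, or already deleted */
}
```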
-
Nathan Yee authored
Test of gres.conf configuration options (test 913).
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Add association usage information to "scontrol show cache" command output.
-