- 07 Jul, 2015 3 commits
-
-
Danny Auble authored
Conflicts: src/common/slurm_step_layout.c
-
Morris Jette authored
Correct task layout with CR_Pack_Node option and more than 1 CPU per task. Previous logic would place one task per CPU and launch too few tasks. bug 1781
-
Danny Auble authored
-
- 06 Jul, 2015 5 commits
-
-
Nathan Yee authored
-
Nathan Yee authored
-
Morris Jette authored
Backfill scheduler now considers OverTimeLimit and KillWait configuration parameters to estimate when running jobs will exit. Initially the job's end time is estimated based upon its time limit. After the time limit is reached, the end time estimate is based upon the OverTimeLimit and KillWait configuration parameters. bug 1774
-
Morris Jette authored
Backfill scheduler: The configured backfill_interval value (default 30 seconds) is now interpreted as a maximum run time for the backfill scheduler. Once reached, the scheduler will build a new job queue and start over, even if not all jobs have been tested. bug 1774
-
Jason Coverston authored
of the size returned. This check is redundant though since getgrouplist will return -1 if it tries to use more than ngroups_max. Perhaps we should just take the check out.
-
- 03 Jul, 2015 1 commit
-
-
Morris Jette authored
-
- 02 Jul, 2015 18 commits
-
-
Morris Jette authored
No change in logic
-
Morris Jette authored
Original patch from LLNL assumed Munge was installed, which would result in a build error if the Munge development package was not installed
-
Adam Moody authored
-
Adam Moody authored
-
Adam Moody authored
-
Adam Moody authored
-
Don Lipari authored
If a user runs sreport and specifies a time period that stretches into the future, this patch sets the end of the period to the current time.
-
Mark A. Grondona authored
In order to successfully remove the freezer cgroup at the end of a job step, the slurmstepd process itself must first be moved outside of the cgroup, or removal will always fail. This fix moves the slurmstepd back to the root cgroup just before the rmdir operations are attempted.
-
Morris Jette authored
-
Mark A. Grondona authored
If the job prolog is running we can't send a signal to job step tasks, so return SLURM_FAILURE instead of ESLURM_INVALID_JOB_ID. This should cause the caller to retry, instead of assuming the job step is not running on the node.
-
Mark A. Grondona authored
If a job step request comes in while the slurm prolog is running, slurmd will happily launch the job step. This means that a user could run code before the prolog is complete, which could cause strange errors or in some cases a security issue. Instead return an error (EINPROGRESS for now) and do not allow job steps to run during prolog.
-
Mark A. Grondona authored
Create a new function, _prolog_is_running(), to determine if a job prolog is currently running from within slurmd, and use this new function in place of list_find_first in _wait_for_job_running_prolog.
-
Mark A. Grondona authored
Some cgroup code would retry continuously if somehow a cgroup didn't exist when xcgroup_delete() was called (see for instance proctrack/cgroup's slurm_container_plugin_wait()). To fix this for all callers, return SUCCESS from xcgroup_delete() if rmdir(2) returns ENOENT, since the cgroup is already deleted.
-
Nathan Yee authored
Test of gres.conf configuration options, test 913
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Add association usage information to "scontrol show cache" command output.
-
- 01 Jul, 2015 8 commits
-
-
Dorian Krause authored
Dear all, we noticed that debugX() and error() calls in srun mess up the output if the --pty flag is used. The reason for this behavior is that srun sets the terminal in raw mode and disables output processing. This is fine for the output forwarded from the remote daemon, but not for local printf()s by the srun process. This problem can be circumvented by re-enabling OPOST after the cfmakeraw() call in srun(). A patch is pasted at the bottom of the mail. It works nicely on Linux, but I am not 100% sure it may not have some undesirable side effects on other platforms. Best regards, Dorian

Example with current HEAD:
[snip]
srun: jobid 10: nodes(1):`node1', cpu counts: 1(x1)
srun: launching 10.0 on host node1, 1 tasks: 0
srun: route default plugin loaded
srun: Node node1, 1 tasks started
srun: Received task exit notification for 1 task (status=0x0100).
srun: error: node1: task 0: Exited with exit code 1

Example with the patch applied:
[snip]
srun: jobid 11: nodes(1):`node1', cpu counts: 1(x1)
srun: launching 11.0 on host node1, 1 tasks: 0
srun: route default plugin loaded
srun: Node node1, 1 tasks started
srun: Received task exit notification for 1 task (status=0x0100).
srun: error: node1: task 0: Exited with exit code 1
-
Morris Jette authored
Add job, step and reservation TRES information to sview command
-
Morris Jette authored
-
Morris Jette authored
Prevent test failure if the compute node does not permit user control over CPU frequency (no "userspace" governor).
-
Morris Jette authored
-
Brian Christiansen authored
When submitting a job with srun -n#, the job may be allocated more CPUs than # because the job was given a whole core or socket (e.g. CR_CORE, CR_SOCKET). sacct showed only what the step used, not the allocation. This commit shows both the job and the step if job and step CPU counts differ.
-
Thomas Cadeau authored
man page.
-
Morris Jette authored
Major re-write of the sreport command to support --tres job option and permit users to select specific trackable resources to generate reports for. For most reports, each TRES is listed on a separate line of output with its name. The default TRES type is "cpu" to minimize changes to output.
-
- 30 Jun, 2015 5 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-