- 01 Mar, 2016 5 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Morris Jette authored
Fixes a bug, introduced in commit 52fe3de1, that occurs when the fork() call fails in slurmstepd.
-
Morris Jette authored
Ensure that a job is completely launched before trying to suspend it. The previous logic would start the suspend logic early in the life of the slurmstepd process, after its listening socket was open but before the tasks were launched. This change defers the suspend logic until all prologs and setup have completed and the tasks are launched. This matters for gang scheduling, in which newly launched jobs can be suspended immediately. Bug 2494
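The deferral described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the type step_state_t and the functions request_suspend() and mark_launched() are hypothetical names, not slurmstepd code, and the real daemon may block or synchronize differently.

```c
#include <stdbool.h>

/* Hypothetical per-step state; not the slurmstepd data structure. */
typedef struct {
	bool launched;        /* prologs, setup and task launch are complete */
	bool suspend_pending; /* a suspend arrived before launch finished */
	bool suspended;       /* tasks are currently suspended */
} step_state_t;

/*
 * Handle a suspend request. Returns true if the suspend was applied
 * immediately, false if it was deferred until launch completes.
 */
bool request_suspend(step_state_t *s)
{
	if (!s->launched) {
		s->suspend_pending = true; /* remember it, act later */
		return false;
	}
	s->suspended = true;
	return true;
}

/* Called once all tasks are launched; applies any deferred suspend. */
void mark_launched(step_state_t *s)
{
	s->launched = true;
	if (s->suspend_pending) {
		s->suspend_pending = false;
		s->suspended = true;
	}
}
```

With this shape, a suspend that arrives while the listening socket is open but the tasks are not yet running is recorded rather than acted on, and takes effect as soon as mark_launched() runs.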
-
Morris Jette authored
-
- 29 Feb, 2016 1 commit
-
-
Danny Auble authored
Bug 1976
-
- 26 Feb, 2016 5 commits
-
-
Danny Auble authored
-
Tim Wickberg authored
Add note to slurm.conf man page about setting "--cpu_bind=no" as part of SallocDefaultCommand if a TaskPlugin is in use.
-
Maksym Planeta authored
-
Bjørn-Helge Mevik authored
Test 14.10 in the test suite (of slurm 15.08.8, at least) uses $sinfo -tidle -h -o%n to find idle nodes. This only works if NodeHostname == NodeName on the nodes. The following should work regardless of this: $scontrol show hostnames \$($sinfo -tidle -h -o%N)
-
Tim Wickberg authored
-
- 25 Feb, 2016 3 commits
-
-
Tim Wickberg authored
Since the function is inlined, the single definition let GCC build everything properly, but debug builds (which disable inlining) resulted in: slurmstepd: [465.0]: symbol lookup error: (trimmed path)/task_cgroup.so: undefined symbol: val_to_char when running srun --cpu_bind=v. task/affinity had this definition already; task/cgroup didn't.
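A general way to avoid this class of failure is sketched below. Under C99 inline semantics, a plain inline definition does not by itself provide an external symbol, so a build that does not inline the call needs a real definition from somewhere. Declaring the helper static inline gives every translation unit its own copy. The function body here is a hypothetical stand-in, and this is not necessarily how the commit resolved the problem.

```c
/*
 * Hypothetical stand-in for a small helper that maps 0..15 to a
 * hexadecimal digit. "static inline" means each file including this
 * definition gets a private copy, so no external symbol has to be
 * resolved at load time, whether or not the compiler inlines the call.
 */
static inline char val_to_char(int v)
{
	if (v >= 0 && v < 10)
		return (char) ('0' + v);
	if (v >= 10 && v < 16)
		return (char) ('a' + (v - 10));
	return '\0'; /* out of range: assumption made for this sketch */
}
```

The alternative is to keep a single non-static definition and make sure every plugin that calls it is linked against, or contains, that definition.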
-
Morris Jette authored
Reported by valgrind when running test7.2, but it shouldn't cause any real problem.
-
Danny Auble authored
was also given.
-
- 24 Feb, 2016 9 commits
-
-
Danny Auble authored
a partition.
-
Danny Auble authored
This also reverts most of commit fa331e30 as well as commit bd9fa830 which would try to set the pn_min_cpus every time a job was updated. If a job didn't request node counts then they were hosed. This commit takes away the magic which was screwing things up. Now the person gets what they asked for without magic changing things. Bug 2302 Bug 2742 Bug 2478
-
Danny Auble authored
erroneously.
-
Morris Jette authored
Failure has never been observed, but initialize the used variable before calling the function so we don't re-use old data if the function returns an error.
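The defensive pattern described above can be illustrated as follows. This is a minimal sketch: get_used_cpus() and total_used() are hypothetical names invented for the example, and the failure condition is artificial.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical query: on success stores a value in *used and returns 0;
 * on failure returns -1 and leaves *used untouched.
 */
static int get_used_cpus(int node_id, uint16_t *used)
{
	if (node_id < 0)
		return -1;
	*used = (uint16_t) (node_id * 2);
	return 0;
}

/* Sum the used CPUs over a list of nodes. */
unsigned total_used(const int *node_ids, size_t n)
{
	unsigned total = 0;
	uint16_t used;

	for (size_t i = 0; i < n; i++) {
		/* Reset before every call so a failed call cannot leave
		 * the previous node's value in "used". */
		used = 0;
		(void) get_used_cpus(node_ids[i], &used);
		total += used;
	}
	return total;
}
```

Without the reset inside the loop, a failed call for one node would silently add the previous node's count a second time.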
-
Morris Jette authored
Rename an improperly named variable in the logic scontrol uses to print node information ("total_used" was really "idle_cpus"), so the logic looks the same as that used in sinfo to determine node state.
-
Morris Jette authored
Include warning for Cray simulation as reminder for developers to change code as needed.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 23 Feb, 2016 2 commits
-
-
Morris Jette authored
-
Danny Auble authored
This whole process could probably be done better by keeping track of old values and new values and only calling one function instead of a pre and post function, but that can probably wait for future generations of the code as it works now and is probably adequate for the time being. Bug 2352
-
- 19 Feb, 2016 8 commits
-
-
Tim Wickberg authored
-
Gennaro Oliva authored
Consistently use American English: existant -> existent, assocation -> association. Correct some typos and one grammatical mistake.
-
Morris Jette authored
BurstBuffer/cray - Defer job cancellation or time limit handling while a "pre-run" operation is in progress, to avoid an inconsistent state caused by multiple calls to job termination functions. Bug 2454
-
Tim Wickberg authored
Otherwise fclose(NULL) is called iff the ClusterName is not set and the clustername file does not exist. This should not happen in production. Coverity #67041.
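Passing a null pointer to fclose() is undefined behavior, so the usual guard is to check the result of fopen() before using or closing the stream. The sketch below shows that guard in isolation; read_first_line() is a hypothetical helper written for this example, not the function touched by the commit.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of "path" into buf (without the trailing newline).
 * Returns 0 on success, -1 if the file cannot be opened or is empty.
 * Hypothetical helper for illustration only.
 */
int read_first_line(const char *path, char *buf, size_t len)
{
	FILE *fp;
	int rc = -1;

	if (path == NULL || buf == NULL || len == 0)
		return -1;

	fp = fopen(path, "r");
	if (fp == NULL)
		return -1; /* nothing was opened, so nothing to close */

	if (fgets(buf, (int) len, fp) != NULL) {
		buf[strcspn(buf, "\n")] = '\0';
		rc = 0;
	}
	fclose(fp); /* fp is known to be non-NULL here */
	return rc;
}
```

Returning early on a failed fopen() keeps the single fclose() call on a path where the stream is guaranteed to be valid.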
-
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
- 18 Feb, 2016 7 commits
-
-
Danny Auble authored
-
Danny Auble authored
a new account and making it a default all at once. Bug 2428
-
Alejandro Sanchez authored
Match acct_gather_energy/rapl plugin. Bug 2397.
-
Tim Wickberg authored
Control whether the scheduler will continue to try to run jobs in a partition if a higher priority job is stuck due to an association limit. Can cause starvation for larger jobs, but will improve throughput and utilization for systems that have extensively divvyed up their resources through association/QOS limits. Bug 2388 and 2452.
-
Danny Auble authored
Bug 2453
-
Morris Jette authored
This should have no effect, but is a belt-and-suspenders approach to checking node state.
-
Jeff White authored
-