- 03 Dec, 2013 13 commits
-
-
Morris Jette authored
This is a correction in the logic of commit 3f4b2d51 on launch failures.
-
jette authored
Make it work for more system types. Do not delete files if the build fails.
-
Danny Auble authored
here instead of cp please change this :).
-
Danny Auble authored
-
Danny Auble authored
-
David Gloe authored
-
David Gloe authored
-
Morris Jette authored
Conflicts: src/slurmctld/job_scheduler.c
-
Morris Jette authored
Correct logic returning remaining job dependencies in job information reported by scontrol and squeue. Eliminates vestigial descriptors with no job ID values (e.g. "afterany"). As dependencies were removed, the job ID values were removed from the strings, but not the descriptors. This change removes both. It also checks the full job ID to make sure we do not remove "afterany:1234" when job "123" completes.
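The behavior described in this message can be sketched roughly as follows. This is an illustration in Python, not the code from the commit (Slurm itself is written in C); the function name `remove_dependency` is hypothetical, and the sketch assumes every clause has the form descriptor:id[:id...], so descriptors that legitimately take no job IDs are out of scope.

```python
def remove_dependency(dependency: str, job_id: int) -> str:
    """Return the dependency string with one completed job removed.

    Hypothetical helper: assumes clauses look like 'afterany:123:456',
    separated by commas.
    """
    done = str(job_id)
    kept = []
    for clause in dependency.split(","):
        if not clause:
            continue
        descriptor, *ids = clause.split(":")
        # Compare whole job IDs, so "123" never matches "1234".
        remaining = [i for i in ids if i != done]
        if remaining:
            kept.append(":".join([descriptor, *remaining]))
        # A descriptor left with no job IDs is dropped entirely.
    return ",".join(kept)
```

Splitting on ":" before comparing is what avoids the prefix problem mentioned above; a plain substring removal of "123" would also damage "afterany:1234".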
-
Danny Auble authored
-
Danny Auble authored
handle any job_fail calls after the fact since it will result in deadlock otherwise.
-
Danny Auble authored
-
Danny Auble authored
-
- 02 Dec, 2013 22 commits
-
-
Morris Jette authored
-
Morris Jette authored
Conflicts: NEWS doc/man/man5/cgroup.conf.5
-
Morris Jette authored
Add a check to make sure that the job completion RPC from a slurmstepd matches the node that the batch job is running on. This would not be the case for a job started on a node if that node's slurmd fails but the slurmstepd keeps running. The job could then be requeued and generate a completion RPC from both slurmstepd daemons (one per node). This logic will ignore the job complete RPC from the node NOT currently running the batch job.
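A minimal sketch of the check described here, written in Python for illustration only. The names `BatchJob`, `batch_host`, and `accept_completion` are assumptions made for this example and are not taken from the Slurm source.

```python
from dataclasses import dataclass


@dataclass
class BatchJob:
    job_id: int
    batch_host: str  # node currently running the batch script (assumed field)


def accept_completion(job: BatchJob, rpc_job_id: int, rpc_node: str) -> bool:
    """Accept a completion RPC only from the node now running the batch job."""
    return rpc_job_id == job.job_id and rpc_node == job.batch_host
```

With a job requeued from node1 to node2, a late completion RPC from the slurmstepd still running on node1 would be ignored, while the one from node2 would be processed.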
-
Morris Jette authored
-
David Bigagli authored
-
Morris Jette authored
Fix race condition on batch job termination that could result in a job exit code of 0xfffffffe if the slurmd on node zero registers its active jobs at the same time that slurmstepd is recording the job's exit code. bug 535
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
David Bigagli authored
-
David Bigagli authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Jason Sollom authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
This is needed primarily for native Cray systems to avoid hitting their LDAP or NIS server from every node in their system. This can also provide a more scalable model for other systems as well.
-
Morris Jette authored
-
- 29 Nov, 2013 5 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Conflicts: src/plugins/proctrack/cgroup/proctrack_cgroup.c
-
Morris Jette authored
There was already cgroup locking in the version 14.03 code base using different variable names and slightly different logic from that in commit 3f6d9e36. This commit is a variant of that commit in order to make the logic in version 2.6 match that of our next release (logic which is already pretty well tested). bug 447
-
Morris Jette authored
proctrack/cgroup - Add locking to prevent a race condition where one job step is ending for a user or job at the same time that another job step is starting, and the user or job container is deleted from under the starting job step. bug 447
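The race described here can be illustrated with a small Python sketch. The commit message does not say how the lock is implemented, and the real job steps run in separate processes, so the in-process `threading.Lock` and the `UserContainer` class below are stand-ins chosen for this example only.

```python
import threading


class UserContainer:
    """Hypothetical stand-in for a per-user (or per-job) container."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._steps = set()
        self.exists = False

    def step_start(self, step_id: int) -> None:
        # Creating the container and registering the step happen under one
        # lock, so a concurrent step_end cannot delete it in between.
        with self._lock:
            self.exists = True
            self._steps.add(step_id)

    def step_end(self, step_id: int) -> None:
        with self._lock:
            self._steps.discard(step_id)
            if not self._steps:
                self.exists = False  # delete only when no step still uses it
```

Without the lock, an ending step could see an empty container and delete it just after a starting step had decided to reuse it.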
-