- 15 Apr, 2017 7 commits
Morris Jette authored
jobs have a priority that is lower than the selected job. The previous logic permitted other jobs with equal priority (it only required that no job have a higher priority). Bug 3650
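The commit text above is truncated, so what the "selected job" is used for is not stated. This minimal sketch only contrasts the old and new checks (`<=` versus `<`). The `Job` type and both helper names are hypothetical, not Slurm's scheduler code.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    priority: int


def others_strictly_lower(selected: Job, others: list) -> bool:
    """New rule: every other job must have a strictly lower priority."""
    return all(o.priority < selected.priority
               for o in others if o.job_id != selected.job_id)


def others_not_higher(selected: Job, others: list) -> bool:
    """Old rule: other jobs with equal priority also passed the check."""
    return all(o.priority <= selected.priority
               for o in others if o.job_id != selected.job_id)
```

With two jobs tied at the same priority, the old rule accepts the selected job while the new rule rejects it.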
-
Tim Wickberg authored
-
Dominik Bartkiewicz authored
circumstances.
-
Tim Shaw authored
array element per line. Bug 3573
-
Bill Brophy authored
If the depend_list is NULL or has zero elements, the string should be cleared as well. Bug 3651.
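A minimal sketch of the fix, assuming a job record that caches its dependencies as a single string. The `JobRecord` class, the `dependency` field, the helper name and the comma separator are all hypothetical illustrations, not Slurm's actual data structures.

```python
from typing import Optional, Sequence


class JobRecord:
    """Hypothetical stand-in for a job record holding a dependency string."""

    def __init__(self) -> None:
        self.dependency = ""


def update_dependency_string(job: JobRecord, depend_list: Optional[Sequence[str]]) -> None:
    # NULL (None) or an empty list: clear the string instead of leaving a stale value.
    if not depend_list:
        job.dependency = ""
        return
    # Otherwise rebuild it from the list (comma separator is an assumption).
    job.dependency = ",".join(depend_list)
```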
-
Thomas Opfer authored
The field needs to have its own copy, otherwise the pointer will become invalid when xfree()'d by a separate array task. Bug 3665.
-
Alejandro Sanchez authored
So that it is the same max length as in src/common/env.c. Used for explicitly laying out tasks on large CPU count nodes (e.g., KNL). Bug 3675.
-
- 14 Apr, 2017 12 commits
Brian Christiansen authored
For use in federated environments.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
in any case.
-
Dong Ahn authored
As specified in the MPIR debug interface (https://www.mpi-forum.org/docs/mpir-specification-10-11-2010.pdf), the presence of the MPIR_partial_attach_ok symbol informs the debugger that the initial startup synchronization is implemented in such a way that the tool need not attach to, nor continue, MPI processes that the user is not interested in controlling. To implement this, SLURM sends SIGCONT to the processes that are not attached by the debugger. However, the old code did not reliably detect whether a process is being traced by the debugger, and this has led to various side effects. On some systems (e.g., TOSS2), the old code sends SIGCONT to all of the target processes, including those attached by the debugger. On newer systems (e.g., TOSS3), it does not send SIGCONT to the target processes at all. One reason for this undefined behavior appears to be the use of CLONE_PTRACE: @grondo found no documentation indicating that CLONE_PTRACE applies to a process being attached by a debugger. More importantly, this code matches clone(2) flags against proc(5) process flags, which are not the same thing, since task->flags are PF_* flags defined in the kernel source include/linux/sched.h. This patch fixes these problems by replacing the old detection logic with logic based on the TracerPid field in /proc/<pid>/status. From proc(5), TracerPid: PID of process tracing this process (0 if not being traced).
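A minimal, Linux-only sketch of TracerPid-based detection, assuming the policy described above (resume only the tasks no debugger has attached to). The helper names are hypothetical; Slurm's actual implementation is in C and is not reproduced here.

```python
import os
import signal


def tracer_pid(status_text: str) -> int:
    """Return TracerPid from the text of /proc/<pid>/status (0 = not traced)."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("TracerPid field not found")


def is_traced(pid: int) -> bool:
    with open(f"/proc/{pid}/status") as f:
        return tracer_pid(f.read()) != 0


def continue_untraced(pids) -> list:
    """Send SIGCONT only to tasks that are not being traced; return those PIDs."""
    resumed = []
    for pid in pids:
        if not is_traced(pid):
            os.kill(pid, signal.SIGCONT)
            resumed.append(pid)
    return resumed
```

Unlike the old flag matching, this reads the kernel's own answer to "is anyone tracing this process?", so it does not depend on how clone(2) or PF_* flags happen to line up.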
-
Thomas Opfer authored
Improve job scheduling sort: after sorting by priority, we now sort by submit time and then by job ID. Previously, submit time was not considered. This handles the case where job IDs have rolled over or we are doing federated scheduling. Bug 3524
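The sort order described above can be sketched as a composite key. `PendingJob` is a hypothetical record type, not Slurm's job structure.

```python
from dataclasses import dataclass


@dataclass
class PendingJob:
    job_id: int
    priority: int
    submit_time: int  # e.g. seconds since the epoch


def schedule_order(jobs: list) -> list:
    # Highest priority first; ties broken by earliest submit time, then lowest job ID.
    return sorted(jobs, key=lambda j: (-j.priority, j.submit_time, j.job_id))
```

Putting submit time ahead of the job ID means a job whose ID wrapped around to a small number still queues behind older jobs of equal priority.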
-
Morris Jette authored
All problems introduced in the course of changing un/pack logic required for removing pack jobs logic
-
Morris Jette authored
-
Morris Jette authored
bug 926
-
Morris Jette authored
bug 926
-
Morris Jette authored
bug 926
-
Brian Christiansen authored
Display with -Ocluster. Sort with -S[+|-]cluster.
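The `-S[+|-]cluster` syntax follows the common convention of a leading `+` for ascending and `-` for descending order. A hypothetical sketch of applying such a spec to rows that carry a `cluster` field; treating a spec with no sign as ascending is an assumption here, and the helper names are not Slurm's.

```python
def parse_sort_spec(spec: str) -> tuple:
    """Split '+cluster' / '-cluster' / 'cluster' into (field, descending)."""
    if spec and spec[0] in "+-":
        return spec[1:], spec[0] == "-"
    return spec, False  # assumption: no prefix means ascending


def sort_rows(rows: list, spec: str) -> list:
    field, descending = parse_sort_spec(spec)
    return sorted(rows, key=lambda row: row[field], reverse=descending)
```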
-
- 13 Apr, 2017 21 commits
Morris Jette authored
bug 926
-
Danny Auble authored
-
Danny Auble authored
least an admin for clearer code.
-
Danny Auble authored
We can use the authorized key elsewhere.
-
Danny Auble authored
-
Danny Auble authored
but don't give them admin privileges.
-
Tim Wickberg authored
-
Brian Christiansen authored
Add missing --local option. Add note about -M implying --local.
-
Brian Christiansen authored
-
Brian Christiansen authored
environment variable and indentation.
-
Brian Christiansen authored
They could either be running on another cluster or not active on the cluster.
-
Brian Christiansen authored
-
Brian Christiansen authored
Continuation of 3a9970c0
-
Morris Jette authored
-
Morris Jette authored
If a task in a parallel job fails and it was launched with the --kill-on-bad-exit option, then terminate the remaining tasks using the SIGCONT, SIGTERM and SIGKILL signals rather than just sending SIGKILL.
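A minimal sketch of this escalating shutdown, assuming a fixed grace period between SIGTERM and SIGKILL (the delay value and helper name are hypothetical). SIGCONT goes first, presumably so that stopped tasks are running again and can act on SIGTERM; SIGKILL then removes anything that is left.

```python
import os
import signal
import time


def terminate_escalating(pids, grace_seconds: float = 1.0) -> None:
    """Send SIGCONT and SIGTERM to each task, wait, then send SIGKILL."""
    for sig in (signal.SIGCONT, signal.SIGTERM):
        for pid in pids:
            try:
                os.kill(pid, sig)
            except ProcessLookupError:
                pass  # task already exited
    time.sleep(grace_seconds)  # give tasks a chance to clean up
    for pid in pids:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
```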
-
Morris Jette authored
Sinfo command with support for federated clusters
-
Brian Christiansen authored
-
Brian Christiansen authored
The job was completing before the test could see that it was running. The job's script ran for only 10 seconds, and wait_for_fed_job can back off to checking only every 10 seconds.
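The race can be illustrated with a poller that backs off geometrically up to a 10-second cap. Only the 10-second cap comes from the text above; the initial delay, the doubling factor and the helper names are assumptions.

```python
def backoff_check_times(horizon: float, initial: float = 1.0,
                        factor: float = 2.0, cap: float = 10.0) -> list:
    """Times (seconds) at which a backing-off poller checks the job state."""
    times, t, delay = [], 0.0, initial
    while t <= horizon:
        times.append(t)
        t += delay
        delay = min(delay * factor, cap)
    return times


def saw_running(check_times: list, start: float, end: float) -> bool:
    """True if any check lands while the job is running (start <= t < end)."""
    return any(start <= t < end for t in check_times)
```

Once the interval reaches 10 seconds, a job that runs for 10 seconds or less can start just after one check and finish before the next, so the poller never sees it running.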
-
Brian Christiansen authored
Can't use the revoked state to determine if the job is pending or not since an origin job could be revoked if it doesn't have an active job on itself.
-
Brian Christiansen authored
-
Morris Jette authored
Coverity CID 45347
-