1. 14 Apr, 2017 8 commits
    • Fix MPIR_partial_attach_ok issues for parallel debuggers. · 18e3d6fb
      Dong Ahn authored
      As specified in the MPIR debug interface
      (https://www.mpi-forum.org/docs/mpir-specification-10-11-2010.pdf),
      the presence of the MPIR_partial_attach_ok symbol
      should inform the debugger that the initial startup synchronization
      is implemented in such a way that the tool need neither attach to
      nor continue MPI processes that the user is not interested in controlling.
      
      To implement this, SLURM chose to send SIGCONT to those processes that are
      not attached by the debugger.
      
      However, the old code does not reliably detect the condition
      in which a process is traced by the debugger, and this
      has led to various side effects.
      
      On some systems (e.g., TOSS2), the old code sends SIGCONT to
      all of the target processes including those attached by the debugger.
      On newer systems (e.g., TOSS3), it does not send SIGCONT
      to the target processes at all.
      
      It seems that one of the reasons for such undefined behavior
      is the use of CLONE_PTRACE.
      @grondo found no documentation that indicates
      CLONE_PTRACE is for the case where the process is being attached
      by a debugger.
      More importantly, this code matches clone(2) flags against
      proc(5) process flags, which are not the same: task->flags holds
      PF_* flags, defined in the kernel source include/linux/sched.h.
      
      This patch fixes these problems by replacing
      the old detection logic with one based on the TracerPid field
      in /proc/<pid>/status.
      
      From proc(5), TracerPid: PID of process tracing this process (0 if not
      being traced).
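      A minimal user-space sketch of the TracerPid check described above,
      assuming a Linux /proc filesystem. The functions parse_tracer_pid,
      read_tracer_pid and continue_untraced are hypothetical names, not the
      code in this commit; they only illustrate reading TracerPid and sending
      SIGCONT to processes the debugger has not attached to.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: scan the text of /proc/<pid>/status line by line
 * and return the TracerPid value (0 if not traced), or -1 if absent. */
long parse_tracer_pid(const char *status_text)
{
    const char *line = status_text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, "TracerPid:", 10) == 0)
            return strtol(line + 10, NULL, 10); /* strtol skips the tab */
        line = strchr(line, '\n');
        if (line != NULL)
            line++; /* move to the start of the next line */
    }
    return -1;
}

/* Hypothetical helper: read /proc/<pid>/status and extract TracerPid.
 * Returns -1 if the file cannot be opened or the field is missing. */
long read_tracer_pid(pid_t pid)
{
    char path[64];
    char buf[4096];
    FILE *fp;
    size_t n;

    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    return parse_tracer_pid(buf);
}

/* Hypothetical helper: send SIGCONT only to processes that no tracer
 * (debugger) is attached to, i.e. whose TracerPid is 0. */
void continue_untraced(const pid_t *pids, size_t npids)
{
    for (size_t i = 0; i < npids; i++) {
        if (read_tracer_pid(pids[i]) == 0)
            kill(pids[i], SIGCONT);
    }
}
```

      Unlike the old flag matching, this relies only on a field that proc(5)
      documents, so it behaves the same on any kernel that exposes TracerPid.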
    • Include submit_time when doing the sort for job scheduling. · 030d9d4b
      Thomas Opfer authored
      Improve the job scheduling sort: after sorting by priority, we now sort
      by submit time and then by job ID. Previously, submit time was not
      considered. This handles the case where job IDs have rolled over or
      we are doing federation scheduling.
      
      Bug 3524
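      A minimal sketch of the three-key ordering described above, written as
      a qsort comparator. The struct job_sketch type and job_sort_cmp function
      are hypothetical simplifications, not SLURM's job record or sort code,
      and the sketch assumes a higher priority value is scheduled first.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical, simplified job record (not SLURM's struct job_record). */
struct job_sketch {
    uint32_t job_id;
    uint32_t priority;   /* assumption: higher value is scheduled first */
    time_t   submit_time;
};

/* Order by priority (descending), then submit time (ascending),
 * then job ID (ascending) as the final tie-breaker. */
int job_sort_cmp(const void *a, const void *b)
{
    const struct job_sketch *ja = a;
    const struct job_sketch *jb = b;

    if (ja->priority != jb->priority)
        return (ja->priority > jb->priority) ? -1 : 1;
    if (ja->submit_time != jb->submit_time)
        return (ja->submit_time < jb->submit_time) ? -1 : 1;
    if (ja->job_id != jb->job_id)
        return (ja->job_id < jb->job_id) ? -1 : 1;
    return 0;
}
```

      With equal priorities, a job whose ID has rolled over to a small number
      still sorts after an older job with a larger ID, because submit time is
      compared before job ID.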
    • Fix problems reported in latest coverity report · a93b6a07
      Morris Jette authored
      All problems were introduced in the course of changing the un/pack
      logic required for removing the pack jobs logic.
    • Merge branch 'unpack' · 0cba10d4
      Morris Jette authored
    • Revert commit 133a4249 · 1fc38b96
      Morris Jette authored
      bug 926
    • Revert commit 1b010388 · 41749bf9
      Morris Jette authored
      bug 926
    • Revert commits c6e3bc97 and ece58780 · ae3b2e78
      Morris Jette authored
      bug 926
    • Display job's cluster name in squeue · 53e03c8a
      Brian Christiansen authored
      Display with -Ocluster
      Sort with -S[+|-]cluster
  2. 13 Apr, 2017 32 commits