1. 14 Jan, 2011 9 commits
  2. 13 Jan, 2011 3 commits
  3. 12 Jan, 2011 4 commits
  4. 11 Jan, 2011 11 commits
  5. 10 Jan, 2011 5 commits
    • Moe Jette
      31776ea0
    • Moe Jette
      -- Add scancel --reservation option to cancel all jobs associated with a
      specific reservation.
      Moe Jette authored
      2aba39da
    • Moe Jette
    • Moe Jette
      fec68ac8
    • Moe Jette
      Patch from Gerrit: 01_salloc-Bug-Fix-nested-terminal-foreground-process.diff
      Moe Jette authored
      
      salloc: notify terminal foreground process
      
      This fixes another bug observed in salloc child process cleanup. I found
      that some shells, e.g. zsh, do not forward all signals to their children.
      
      The patch handles the case in which
       * command_pid is still active but does not equal tpgid,
       * tpgid is not the same as salloc's process group, and
       * tpgid is very unlikely to belong to another process, since we block
         the suspend (TSTP) signal.
      Signalling command_pid does not automatically signal the active terminal
      foreground process group as well, hence send a HUP to that group to
      signify "death of controlling process".
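      A minimal C sketch of the decision described above, assuming the foreground
      process group is read with tcgetpgrp() on salloc's terminal and signalled
      with killpg(). The helper names need_foreground_hup() and
      notify_foreground_group() are hypothetical; this is not the actual salloc code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: decide whether the terminal foreground process
 * group needs its own SIGHUP, following the conditions listed above. */
bool need_foreground_hup(bool command_alive, pid_t command_pid,
                         pid_t tpgid, pid_t own_pgid)
{
    if (!command_alive)         /* only the "command_pid still active" case */
        return false;
    if (tpgid <= 0)             /* no terminal or no foreground group */
        return false;
    if (tpgid == command_pid)   /* signalling command_pid's group covers it */
        return false;
    if (tpgid == own_pgid)      /* salloc itself is in the foreground */
        return false;
    return true;
}

/* Hypothetical helper: after command_pid has been signalled, send SIGHUP
 * ("death of controlling process") to a different foreground group. */
int notify_foreground_group(int tty_fd, pid_t command_pid)
{
    assert(tty_fd >= 0);
    /* kill(pid, 0) only probes whether command_pid still exists */
    bool alive = (kill(command_pid, 0) == 0);
    pid_t tpgid = tcgetpgrp(tty_fd);   /* -1 on error */

    if (!need_foreground_hup(alive, command_pid, tpgid, getpgrp()))
        return 0;
    return killpg(tpgid, SIGHUP);      /* 0 on success, -1 on error */
}
```

      With the values from the "Before" listing (command_pid 21266 for zsh, tpgid
      21597, salloc's process group 21260), need_foreground_hup() returns true, so
      the aprun group 21597 would receive SIGHUP.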
      
      This fixed the problem on zsh. I then tested a more complex setup:
      
      Before:
      -------
      palu2:0 ~>ps  f -o pid,pgid,tpgid,ppid,stat,tty,cmd
        PID  PGID TPGID  PPID STAT TT       CMD
      21117 21117 21597 21116 Ss   pts/9    -bash
      21260 21260 21597 21117 Sl   pts/9     \_ ./slurm_build/git/src/salloc/salloc -v --time=00:01:00 -N17 zsh
      21266 21266 21597 21260 S    pts/9         \_ zsh
      21323 21323 21597 21266 S    pts/9             \_ /bin/bash
      21397 21397 21597 21323 S    pts/9                 \_ -bin/tcsh
      21526 21526 21597 21397 S    pts/9                     \_ /bin/sh
      21597 21597 21597 21526 S+   pts/9                         \_ aprun -N1 -n17 sleep 12345
      21601 21597 21597 21597 S+   pts/9                             \_ aprun -N1 -n17 sleep 12345
      
      After the timeout:
      ------------------
      palu2:0 ~>ps  f -o pid,pgid,tpgid,ppid,stat,tty,cmd
        PID  PGID TPGID  PPID STAT TT       CMD
      21323 21323 21117     1 S    pts/9    /bin/bash
      21397 21397 21117 21323 S    pts/9     \_ -bin/tcsh
      21526 21526 21117 21397 S    pts/9         \_ /bin/sh
      
      ==> The 'dangerous' aprun terminal foreground process group 21597 has been removed, while the child
          subprocess groups 21323, 21397, and 21526 now exist as orphaned groups, to be cleaned up by init.
      69e1d108
  6. 07 Jan, 2011 2 commits
  7. 06 Jan, 2011 3 commits
  8. 03 Jan, 2011 2 commits
  9. 29 Dec, 2010 1 commit