- 12 Jan, 2011 2 commits
- 11 Jan, 2011 15 commits
-
-
-
Danny Auble authored
BLUEGENE - better checking of small blocks in dynamic mode to determine whether a full midplane job could run.
-
Moe Jette authored
running to be more clear and only print when --verbose option is used.
-
Moe Jette authored
-
Moe Jette authored
-
Danny Auble authored
-
Moe Jette authored
its partition is configured "Shared=EXCLUSIVE" (which is redundant).
-
Moe Jette authored
wrong order for slurm v2.2+
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
-
Moe Jette authored
after a job completed.
-
Moe Jette authored
-
-
- 10 Jan, 2011 6 commits
-
-
Moe Jette authored
size get queued).
-
Moe Jette authored
specific reservation.
-
https://eris.llnl.gov/svn/slurm/branches/v2.3-frontend
Moe Jette authored
-- Added support for more than one front-end node to run slurmd on architectures where the slurmd does not execute on the compute nodes (e.g. BlueGene). New configuration parameters FrontendNode and FrontendAddr added. See "man slurm.conf" for more information.
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
salloc: notify terminal foreground process

This fixes another bug observed in salloc child process cleanup. I found that some shells, e.g. zsh, do not forward all signals to their children. The patch fixes the problem that
 * command_pid is still active but does not equal tpgid,
 * tpgid is not the same as salloc's process group,
 * tpgid is very unlikely to come from another process, since we block the suspend/TSTP signal,
 * signalling command_pid does not automatically imply that the active terminal foreground process is also signalled,
 * hence send a HUP to signify "death of controlling process".

This setup fixed the problem on zsh. I then went and tested a more complex setup:

Before:
-------
    palu2:0 ~>ps f -o pid,pgid,tpgid,ppid,stat,tty,cmd
      PID  PGID TPGID  PPID STAT TT       CMD
    21117 21117 21597 21116 Ss   pts/9    -bash
    21260 21260 21597 21117 Sl   pts/9     \_ ./slurm_build/git/src/salloc/salloc -v --time=00:01:00 -N17 zsh
    21266 21266 21597 21260 S    pts/9         \_ zsh
    21323 21323 21597 21266 S    pts/9             \_ /bin/bash
    21397 21397 21597 21323 S    pts/9                 \_ -bin/tcsh
    21526 21526 21597 21397 S    pts/9                     \_ /bin/sh
    21597 21597 21597 21526 S+   pts/9                         \_ aprun -N1 -n17 sleep 12345
    21601 21597 21597 21597 S+   pts/9                             \_ aprun -N1 -n17 sleep 12345

After the timeout:
------------------
    palu2:0 ~>ps f -o pid,pgid,tpgid,ppid,stat,tty,cmd
      PID  PGID TPGID  PPID STAT TT       CMD
    21323 21323 21117     1 S    pts/9    /bin/bash
    21397 21397 21117 21323 S    pts/9     \_ -bin/tcsh
    21526 21526 21117 21397 S    pts/9         \_ /bin/sh

==> The 'dangerous' aprun terminal foreground process group 21597 has been removed, while the child subprocess groups 21323, 21397, and 21526 now exist as orphaned groups, to be cleaned up by init.

01_salloc-Bug-Fix-nested-terminal-foreground-process.diff
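The decision described in this commit message can be sketched roughly as follows. This is an illustrative Python sketch, not the salloc C code: the function names foreground_group_to_hangup and hangup_terminal_foreground are hypothetical, and the checks are inferred from the bullet points of the message (tpgid differs from command_pid, tpgid differs from salloc's own process group, then send HUP to the terminal foreground group). Only os.tcgetpgrp, os.getpgrp, os.killpg and signal.SIGHUP are real library calls.

```python
import os
import signal


def foreground_group_to_hangup(tpgid, command_pid, own_pgid):
    """Return the process group that should get SIGHUP, or None.

    Hypothetical helper: mirrors the conditions listed in the commit
    message, not the actual salloc implementation.
    """
    if tpgid <= 0:
        return None   # no terminal foreground process group known
    if tpgid == command_pid:
        return None   # signalling command_pid already reaches it
    if tpgid == own_pgid:
        return None   # never hang up our own process group
    return tpgid


def hangup_terminal_foreground(command_pid, tty_fd=0):
    """Send SIGHUP to the terminal foreground group if it qualifies.

    Assumes tty_fd refers to the controlling terminal; returns the
    signalled process group, or None if nothing was signalled.
    """
    try:
        tpgid = os.tcgetpgrp(tty_fd)
    except OSError:
        return None   # tty_fd is not a terminal we control
    target = foreground_group_to_hangup(tpgid, command_pid, os.getpgrp())
    if target is None:
        return None
    try:
        os.killpg(target, signal.SIGHUP)
    except OSError:
        return None   # group vanished or we may not signal it
    return target
```

With the process IDs from the "Before" listing, assuming command_pid is the zsh child 21266 and salloc's process group is 21260, foreground_group_to_hangup(21597, 21266, 21260) selects the aprun foreground group 21597, which is the group reported as removed after the timeout.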
-
- 08 Jan, 2011 1 commit
-
-
Moe Jette authored
-
- 07 Jan, 2011 10 commits
- 06 Jan, 2011 6 commits
-
-
Moe Jette authored
I noticed that the S_{JOB,STEP}_ALLOC_* comments in spank.h don't document the return-by-value type as is done for the other spank_item enumerations.
-
Moe Jette authored
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-