- 11 Jan, 2011 4 commits
- 10 Jan, 2011 6 commits
-
-
Moe Jette authored
size get queued).
-
Moe Jette authored
specific reservation.
-
https://eris.llnl.gov/svn/slurm/branches/v2.3-frontend
Moe Jette authored
-- Added support for more than one front-end node to run slurmd on architectures where the slurmd does not execute on the compute nodes (e.g. BlueGene). New configuration parameters FrontendNode and FrontendAddr added. See "man slurm.conf" for more information.
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
salloc: notify terminal foreground process

This fixes another bug observed in salloc child process cleanup. I found that some shells, e.g. zsh, do not forward all signals to their children. The patch fixes the problem that
 * command_pid is still active but does not equal tpgid,
 * tpgid is not the same as salloc's process group,
 * tpgid is very unlikely to come from another process, since we block the suspend/TSTP signal,
 * signalling command_pid does not automatically imply that the active terminal foreground process is also signalled,
 * hence send a HUP to signify "death of controlling process".

This setup fixed the problem on zsh. I then went and tested a more complex setup:

Before:
-------
palu2:0 ~>ps f -o pid,pgid,tpgid,ppid,stat,tty,cmd
  PID  PGID TPGID  PPID STAT TT       CMD
21117 21117 21597 21116 Ss   pts/9    -bash
21260 21260 21597 21117 Sl   pts/9     \_ ./slurm_build/git/src/salloc/salloc -v --time=00:01:00 -N17 zsh
21266 21266 21597 21260 S    pts/9         \_ zsh
21323 21323 21597 21266 S    pts/9             \_ /bin/bash
21397 21397 21597 21323 S    pts/9                 \_ -bin/tcsh
21526 21526 21597 21397 S    pts/9                     \_ /bin/sh
21597 21597 21597 21526 S+   pts/9                         \_ aprun -N1 -n17 sleep 12345
21601 21597 21597 21597 S+   pts/9                             \_ aprun -N1 -n17 sleep 12345

After the timeout:
------------------
palu2:0 ~>ps f -o pid,pgid,tpgid,ppid,stat,tty,cmd
  PID  PGID TPGID  PPID STAT TT       CMD
21323 21323 21117     1 S    pts/9    /bin/bash
21397 21397 21117 21323 S    pts/9     \_ -bin/tcsh
21526 21526 21117 21397 S    pts/9         \_ /bin/sh

==> The 'dangerous' aprun terminal foreground process group 21597 has been removed, while the child subprocess groups 21323, 21397, and 21526 now exist as orphaned groups, to be cleaned up by init.

01_salloc-Bug-Fix-nested-terminal-foreground-process.diff
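The conditions listed in this commit message can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the actual salloc source: the helper names `should_hup_foreground` and `hup_foreground_group` are hypothetical, and only the standard POSIX calls `tcgetpgrp()`, `getpgrp()` and `kill()` are used.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the salloc source): decide whether the
 * terminal foreground process group needs its own SIGHUP, following the
 * conditions described in the commit message.
 */
bool should_hup_foreground(pid_t command_pid, pid_t own_pgid, pid_t tpgid)
{
	if (tpgid <= 0)            /* tcgetpgrp() failed or no foreground group */
		return false;
	if (tpgid == command_pid)  /* signalling command_pid already covers it */
		return false;
	if (tpgid == own_pgid)     /* never send HUP to our own process group */
		return false;
	return true;
}

/*
 * Hypothetical helper: send SIGHUP to the foreground process group of the
 * terminal open on tty_fd when the checks above say it is needed.
 * Returns 1 if a signal was sent, 0 if none was needed, -1 on error.
 */
int hup_foreground_group(int tty_fd, pid_t command_pid)
{
	assert(tty_fd >= 0);

	pid_t tpgid = tcgetpgrp(tty_fd);   /* -1 if tty_fd is not our terminal */

	if (!should_hup_foreground(command_pid, getpgrp(), tpgid))
		return 0;

	/* A negative pid makes kill() address the whole process group. */
	return kill(-tpgid, SIGHUP) == 0 ? 1 : -1;
}
```

With the process IDs from the "Before" listing (command_pid 21266 for zsh, salloc's process group 21260, tpgid 21597 for aprun), the check returns true, so the aprun group would receive the HUP. A caller might invoke `hup_foreground_group(STDIN_FILENO, command_pid)` after signalling `command_pid` itself.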
-
- 08 Jan, 2011 1 commit
-
-
Moe Jette authored
-
- 07 Jan, 2011 10 commits
- 06 Jan, 2011 6 commits
-
-
Moe Jette authored
I noticed that the S_{JOB,STEP}_ALLOC_* comments in spank.h don't document the return-by-value type as is done for the other spank_item enumerations.
-
Moe Jette authored
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 05 Jan, 2011 3 commits
-
-
Danny Auble authored
Added the TrackSlurmctldDown flag to slurmdbd.conf. If set, idle resources on a cluster are marked as down when its slurmctld disconnects or is no longer reachable.
-
Moe Jette authored
all compute nodes, and add update functionality
-
Moe Jette authored
-
- 04 Jan, 2011 7 commits
- 03 Jan, 2011 3 commits