Fix for srun abort on SIGSTOP+SIGCONT
Avoid possibly aborting an srun that receives simultaneous SIGSTOP+SIGCONT while creating the job step. When that happens, the signal handler gets an argument (the signal received) of zero.

Here's a log.

Window 1:
    $ srun hostname
    srun: Job step creation temporarily disabled, retrying
    srun: I Got signal 18
    srun: I Got signal 18
    srun: I Got signal 18
    srun: I Got signal 18
    srun: I Got signal 18
    srun: I Got signal 18
    srun: I Got signal 18
    srun: I Got signal 18
    srun: I Got signal 18
    srun: I Got signal 18
    srun: I Got signal 18
    srun: I Got signal 18
    srun: I Got signal 0
    srun: Cancelled pending job step

Window 2:
    $ kill -STOP 18696 ; kill -CONT 18696
    $ kill -STOP 18696 ; kill -CONT 18696
    $ kill -STOP 18696 ; kill -CONT 18696
    ....

bug 2494
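To illustrate the failure mode, here is a minimal sketch of one way a handler could guard against a zero signal number. It is not the code from this commit: the names `cancel_requested`, `signal_should_cancel`, and `pending_step_handler` are hypothetical, and the policy (ignore a zero or negative value and SIGCONT, treat everything else as a cancel request) is an assumption made for the example.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flag polled by the step-creation retry loop;
 * not srun's actual variable. */
static volatile sig_atomic_t cancel_requested = 0;

/* Hypothetical helper: decide whether a received signal number
 * should cancel the pending job step. */
static bool signal_should_cancel(int signo)
{
    if (signo <= 0)          /* spurious value, e.g. the 0 seen in the log */
        return false;
    if (signo == SIGCONT)    /* resuming after SIGSTOP is not a cancel request */
        return false;
    return true;
}

/* Handler sketch: only record a cancel for signals that warrant one. */
static void pending_step_handler(int signo)
{
    if (signal_should_cancel(signo))
        cancel_requested = 1;
}
```

With a guard like this, the "signal 0" case from the log would be ignored and the retry loop would continue, while a real SIGINT or SIGTERM would still cancel the pending step.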