Commit 1ed38f26 authored by Morris Jette

Fix for srun abort on SIGSTOP+SIGCONT

Avoid possibly aborting srun when it receives simultaneous SIGSTOP+SIGCONT while
    creating the job step. In that case the signal handler gets an
    argument (the signal received) of zero.

Here's a log, window 1:
$ srun hostname
srun: Job step creation temporarily disabled, retrying
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 0
srun: Cancelled pending job step

Window 2:
$  kill -STOP 18696 ; kill -CONT 18696
$  kill -STOP 18696 ; kill -CONT 18696
$  kill -STOP 18696 ; kill -CONT 18696
....
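
For illustration only, here is a minimal sketch of how a handler could guard against the zero argument seen in the log above. It is not the actual change in this commit; the names destroy_step, signal_cancels_step and signal_while_creating_step are hypothetical, and treating SIGCONT (signal 18 on Linux x86) as a non-cancelling signal is an assumption.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flag that a step-creation retry loop would poll. */
static volatile sig_atomic_t destroy_step = 0;

/* Decide whether a signal number should cancel the pending job step. */
static bool signal_cancels_step(int signo)
{
    if (signo <= 0)        /* spurious call: no valid signal number */
        return false;
    if (signo == SIGCONT)  /* assumption: a resume is not a cancel request */
        return false;
    return true;
}

/* Handler: does only async-signal-safe work (sets a sig_atomic_t flag). */
static void signal_while_creating_step(int signo)
{
    if (signal_cancels_step(signo))
        destroy_step = 1;
}
```

With a guard like this, a call with signo == 0 would leave destroy_step untouched, so the retry loop would keep waiting instead of printing "Cancelled pending job step".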

bug 2494
parent dd2324a7