Commit db7002d1 authored by Christopher J. Morrone

Fix for bug reported by Jim Garlick:

  "srun output overflow ("Need to rewind" in srun/_do_output_line)"

When srun's stdout is consuming data slowly, srun can receive notice that
the job has terminated before the output stream has been fully written.

The IO thread receives a SIGHUP to kick it out of its blocking poll.
However, in the slow-stdout situation the SIGHUP can interrupt the
fflush.  When the fflush is interrupted, it appears to clear the stream
buffer even though the data wasn't written out to the file descriptor,
and we see data loss on stdout.

To avoid this situation, this change notifies the IO thread by writing
to a pipe rather than by sending a signal.  Also, some extra return
code checking is done in io.c:_do_output_line().
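
The pipe-based wakeup described above can be sketched roughly as follows. This is an illustration of the general "self-pipe" technique, not the code from this commit: the struct and function names (wakeup, wakeup_init, wakeup_signal, wakeup_wait) are hypothetical, and the real IO thread presumably polls many more descriptors than this one. The idea is that the IO thread includes the read end of a pipe in its poll set, and another thread writes one byte to the write end to wake it, so no signal can interrupt an fflush in progress.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical type: one pipe used only to wake the IO thread. */
struct wakeup {
    int rfd;  /* read end, added to the IO thread's poll set */
    int wfd;  /* write end, used by whichever thread wants to wake it */
};

/* Create the pipe. Returns 0 on success, -1 on failure. */
int wakeup_init(struct wakeup *w)
{
    int fds[2];

    assert(w != NULL);
    if (pipe(fds) < 0)
        return -1;
    w->rfd = fds[0];
    w->wfd = fds[1];
    return 0;
}

/* Wake the IO thread by writing one byte (used in place of a signal).
 * Returns 0 on success, -1 on failure. */
int wakeup_signal(struct wakeup *w)
{
    char c = 0;
    ssize_t n;

    do {
        n = write(w->wfd, &c, 1);
    } while (n < 0 && errno == EINTR);
    return (n == 1) ? 0 : -1;
}

/* IO thread side: block in poll() until woken or until timeout_ms
 * elapses. Returns 1 if woken, 0 on timeout, -1 on error. */
int wakeup_wait(struct wakeup *w, int timeout_ms)
{
    struct pollfd pfd = { .fd = w->rfd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0)
        return rc;
    if (pfd.revents & POLLIN) {
        char c;
        /* Consume the wakeup byte so the next poll blocks again. */
        if (read(w->rfd, &c, 1) < 0)
            return -1;
        return 1;
    }
    return -1;
}
```

In this sketch the thread that learns of job termination would call wakeup_signal(), and the IO thread would treat a readable rfd as its cue to finish flushing output and exit.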
parent e2f39fe7