Track scripts run by slurmctld
Store the pids of running scripts in a list so that they can be killed when slurmctld shuts down. Killing a script makes the thread that ran it end, so that thread is joined as well, but only for a specified timeout: if the process forked by the calling thread is stuck in I/O, an unbounded join would leave slurmctld stuck too.

We also need to differentiate a SIGKILL sent by a human or the OOM killer from one sent by slurmctld itself, so the jobid of the ctld_script_rec_t is set to -1 when slurmctld is the one killing the process.

Remaining work includes adding this logic to the burst_buffer functions that use run_command(). Locks may also be needed for access to ctld_script_rec_t records.

Bug 5913