slurmstepd: Fix race in run_script_as_user
As reported by Sam Lang on slurm-dev, task_epilog scripts are not held before exec. This leaves a race between the launch of the task_epilog and slurmstepd's call to slurm_container_add(): in that window the task_epilog script could either run to completion or launch other processes that escape any job container defined by configuration.

Use the new "exec_wait" API to have the child wait before exec, just as is done in fork_all_tasks.

Based on an original idea by Sam Lang <samlang@gmail.com>.
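The hold-before-exec idea can be illustrated with a minimal standalone sketch. This is not the slurmstepd code and does not use the real "exec_wait" API: run_held(), register_fn and demo_register() are hypothetical names invented for the example, and demo_register() merely stands in for the point where slurmstepd would call slurm_container_add(). The child blocks on a pipe until the parent has registered its pid, so it cannot exec (or fork anything) before registration completes.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical registration hook; NOT the signature of slurm_container_add(). */
typedef int (*register_fn)(pid_t pid);

static pid_t last_registered = -1;

/* Demo hook: records the pid where a real caller would add it to a container. */
static int demo_register(pid_t pid)
{
    last_registered = pid;
    return 0;
}

/*
 * Fork a child that waits on a pipe before calling exec. The parent
 * registers the child first and only then releases it.
 * Returns the child's exit code, or -1 on error or abnormal termination.
 */
static int run_held(char *const argv[], register_fn reg)
{
    int fd[2], status;
    pid_t pid;
    char c;
    ssize_t n;

    assert(argv && argv[0]);
    if (pipe(fd) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: block here until the parent writes one byte. */
        close(fd[1]);
        do {
            n = read(fd[0], &c, 1);
        } while (n < 0 && errno == EINTR);
        if (n != 1)
            _exit(126);   /* EOF: parent failed to register us */
        close(fd[0]);
        execvp(argv[0], argv);
        _exit(127);       /* exec failed */
    }

    /* Parent: register the child BEFORE letting it exec. */
    close(fd[0]);
    if (reg(pid) == 0 && write(fd[1], "x", 1) != 1) {
        /* Release failed; closing the pipe below makes the child exit. */
    }
    close(fd[1]);

    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If registration fails, the parent closes the pipe without writing, the child sees EOF and exits without ever running the script, so nothing can escape the container in that case either.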