- 15 Sep, 2017 33 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Also fix a wrong type use in sizeof calculation fed to xmalloc.
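A wrong type inside a sizeof that sizes an allocation typically looks like the sketch below. It is illustrative only and is not the code from this commit: `job_rec_t` and the three functions are hypothetical, and `calloc` stands in for Slurm's `xmalloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record type, for illustration only. */
typedef struct {
	uint32_t job_id;
	uint32_t step_id;
	uint64_t mem_bytes;
} job_rec_t;

/* Buggy pattern: sizeof is applied to the pointer type, not the record. */
size_t wrong_alloc_size(size_t count)
{
	return count * sizeof(job_rec_t *);
}

/* Fixed pattern: sizeof is applied to the object the pointer refers to. */
size_t right_alloc_size(size_t count)
{
	job_rec_t *recs = NULL;

	/* sizeof does not evaluate its operand, so NULL is never dereferenced. */
	return count * sizeof(*recs);
}

/* Allocate a zeroed array; calloc stands in for xmalloc (assumption). */
job_rec_t *alloc_job_recs(size_t count)
{
	job_rec_t *recs;

	assert(count > 0);
	recs = calloc(count, sizeof(*recs));
	return recs;
}
```

Writing `sizeof(*recs)` rather than naming a type keeps the size correct even if the declared type of `recs` changes later.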
-
Tim Wickberg authored
Also fix a wrong type use in sizeof calculation fed to xmalloc.
-
Tim Wickberg authored
-
Tim Wickberg authored
Also fix a wrong type use in sizeof calculation fed to xmalloc.
-
Tim Wickberg authored
-
Tim Wickberg authored
Also fix a wrong type use in sizeof calculation fed to xmalloc.
-
Tim Wickberg authored
Also fix a wrong type use in sizeof calculation fed to xmalloc.
-
Tim Wickberg authored
From the comment block in checkpoint_poe.c: This is based upon checkpoint support of poe in the 2005 time frame for the ASCI Purple computer. It does not work with current versions of POE. From Gary Mincher (IBM, Sept 6 2012): "Checkpoint/restart on Linux is only supported for user-space parallel jobs with a maximum of 512 tasks that are run on Power 775 nodes, but jobs that use a resource manager or scheduler other than LoadLeveler are not supported." N
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Also change the signature of io_thread_start() and clean up the error handling in the one calling location.
-
Tim Wickberg authored
-
Tim Wickberg authored
All are detached threads; both the slurm_attr_init and pthread_attr_setdetachstate calls are in _cancel_jobs and apply throughout.
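A minimal sketch of the pattern described here: one attribute object is initialized once, marked detached, reused for every thread, and destroyed afterwards. It uses plain POSIX calls; `pthread_attr_init` stands in for Slurm's `slurm_attr_init`, and `worker`, `spawn_detached`, and `wait_for_workers` are hypothetical names, not functions from the source tree.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Counts workers that have run, so completion can be observed without join. */
static atomic_int workers_done;

static void *worker(void *arg)
{
	(void) arg;
	atomic_fetch_add(&workers_done, 1);
	return NULL;
}

/*
 * Start count detached threads that share one attribute object.
 * Returns 0 on success, or the first pthread error code encountered.
 */
int spawn_detached(int count)
{
	pthread_attr_t attr;
	int rc;

	assert(count >= 0);
	rc = pthread_attr_init(&attr);
	if (rc)
		return rc;

	rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	for (int i = 0; !rc && i < count; i++) {
		pthread_t tid;
		rc = pthread_create(&tid, &attr, worker, NULL);
	}

	/* Destroy the attribute object on every path once it is initialized. */
	pthread_attr_destroy(&attr);
	return rc;
}

/* Poll until count workers have run or roughly timeout_ms has elapsed. */
int wait_for_workers(int count, int timeout_ms)
{
	struct timespec tick = { 0, 1000000 };	/* 1 ms */

	for (int waited = 0; waited < timeout_ms; waited++) {
		if (atomic_load(&workers_done) >= count)
			return 1;
		nanosleep(&tick, NULL);
	}
	return atomic_load(&workers_done) >= count;
}
```

Detached threads cannot be joined, which is why the sketch observes completion through a counter instead.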
-
Tim Wickberg authored
-
Tim Wickberg authored
Which fixes yet another missing slurm_attr_destroy.
-
Tim Wickberg authored
Change the function signature, as the only error path out of here was if the thread creation failed. Clean up code in fed_mgr.c that was trying to manage that event. (The one other calling path just ignored the error anyway.)
-
Tim Wickberg authored
And another missing slurm_attr_destroy call.
-
Tim Wickberg authored
Interestingly enough, there was a missing slurm_attr_destroy() call here.
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
While here, fix two function signatures, as the only error path was if pthread_create failed, and the calling locations ignored the return code anyway.
-
Tim Wickberg authored
Bug 4163.
-
Tim Wickberg authored
Set to fatal on pthread_create failure, as this is the lowest common denominator of all existing behaviors. If something were stopping thread creation from succeeding, you could expect some code path to trip that eventually. Rather than gamble on when or if that is triggered, just die immediately. Bug 4163.
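The fail-fast behavior described above can be sketched as a small wrapper. This is an assumption-laden illustration, not the macro or function the commit introduces: `thread_create_or_die`, `set_flag`, and `run_one_thread` are hypothetical, and `abort()` stands in for Slurm's `fatal()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: create a thread or terminate the process. */
void thread_create_or_die(pthread_t *tid, void *(*func)(void *), void *arg)
{
	int rc = pthread_create(tid, NULL, func, arg);

	if (rc) {
		/* pthread_create returns the error number; it does not set errno. */
		fprintf(stderr, "fatal: pthread_create: %s\n", strerror(rc));
		abort();	/* stands in for fatal() (assumption) */
	}
}

static void *set_flag(void *arg)
{
	*(int *) arg = 1;
	return NULL;
}

/* Usage: callers no longer need an error path for thread creation. */
int run_one_thread(void)
{
	int flag = 0;
	pthread_t tid;
	int rc;

	thread_create_or_die(&tid, set_flag, &flag);
	rc = pthread_join(tid, NULL);
	assert(rc == 0);
	return flag;
}
```

Because the wrapper never returns on failure, it can have a `void` return type, which matches the signature simplifications mentioned in the neighboring commits.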
-
Dominik Bartkiewicz authored
-
Tim Wickberg authored
CID 44819.
-
- 14 Sep, 2017 7 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
A second PMI2_Init() within the same step is invalid, and cannot succeed. Return an error code back to the client end, and close the fd to force the step to terminate immediately. Due to a bug in our libpmi code, just returning a cmd=response_to_init with an appropriate rc number will not tear down the connection properly, so send back something else that will trigger the error path. Bug 3520.
-
Tim Wickberg authored
Need to make a local copy of the user_name field, otherwise you'll xfree() the cache's version and eventually crash or fatal() slurmctld.
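The ownership issue can be sketched as follows, assuming a cache entry that owns its `user_name` string. The type and both functions are hypothetical, and `malloc`/`free` stand in for Slurm's `xmalloc`/`xfree`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry that owns its user_name string. */
typedef struct {
	unsigned int uid;
	char *user_name;
} user_cache_entry_t;

/* Return a heap copy of str that the caller owns (NULL on failure). */
char *copy_string(const char *str)
{
	size_t len = strlen(str) + 1;
	char *dup = malloc(len);

	if (dup)
		memcpy(dup, str, len);
	return dup;
}

/*
 * Hand out a private copy, so the caller can free it later without
 * releasing the string the cache still points to.
 */
char *get_user_name_copy(const user_cache_entry_t *entry)
{
	assert(entry && entry->user_name);
	return copy_string(entry->user_name);
}
```

Returning `entry->user_name` directly would make a later free by the caller release the cache's own string, which is the failure this commit describes.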
-
Morris Jette authored
-
Morris Jette authored
An error message was changed to report the job ID. Modify the output parsing to avoid confusing the job ID with a task ID label.
-
Morris Jette authored
A request to cancel a pack step leader will result in that step being cancelled on all pack job components. Needed by MPI.
-
Tim Wickberg authored
CID 45178.
-