- 10 Aug, 2017 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 09 Aug, 2017 4 commits
-
-
Morris Jette authored
This is needed to run a single MPI_COMM_WORLD spread across multiple job allocations. The work is incomplete, but well underway in this commit (much of the code is in place and the test suite runs successfully).
-
Morris Jette authored
If the system configuration prevents the test's job from starting, terminate the test with a warning.
-
Morris Jette authored
If the submit failed, an attempt was made to cancel job ID 0.
-
Morris Jette authored
Systems with fewer than 3 CPUs per node would fail the test.
-
- 07 Aug, 2017 1 commit
-
-
Morris Jette authored
-
- 04 Aug, 2017 7 commits
-
-
Morris Jette authored
Logic recently introduced would cancel an entire job allocation if a step launch failed, even if the srun command did not create the allocation (running under salloc or sbatch).
-
Morris Jette authored
-
Morris Jette authored
The srun message changed for pack jobs and the test failed. Tweak both message and test for better clarity.
-
Morris Jette authored
Modify the signal handling and termination functions to operate on a list of job steps (pack job) rather than a single record. Add job ID to some srun log messages for pack jobs.
-
Morris Jette authored
Add the pack count argument to the slurm_step_launch() function in the test suite and Perl API. The argument was added in commit 71a34f56.
-
Morris Jette authored
Modify the launch/slurm plugin to signal all components of a pack job rather than just one, by using a list of step context records.
-
Morris Jette authored
If prolog is running when attempting to signal a step, then return EAGAIN and retry rather than simply returning SLURM_ERROR and aborting.
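The retry-on-EAGAIN behavior described above can be illustrated with a minimal sketch. This is not Slurm's code: `signal_step_with_retry`, `make_fake_sender`, and the retry limit are hypothetical names and values chosen for illustration, and the real change lives in Slurm's C sources.

```python
import errno
import time


def signal_step_with_retry(send_signal, max_retries=5, delay_s=0.0):
    """Call send_signal() until it returns something other than EAGAIN.

    send_signal is a hypothetical callable returning 0 on success or an
    errno value on failure. EAGAIN stands for "prolog still running".
    """
    for attempt in range(max_retries + 1):
        rc = send_signal()
        if rc != errno.EAGAIN:
            # Success or a hard error: stop retrying and report it.
            return rc
        if attempt < max_retries:
            time.sleep(delay_s)
    # Still busy after all retries: surface EAGAIN to the caller.
    return errno.EAGAIN


def make_fake_sender(busy_calls):
    """Build a stand-in sender that reports EAGAIN for the first busy_calls calls."""
    state = {"remaining": busy_calls}

    def send():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return errno.EAGAIN
        return 0

    return send
```

For example, `signal_step_with_retry(make_fake_sender(2))` returns 0 after two busy responses, whereas a sender that stays busy longer than `max_retries` allows yields `errno.EAGAIN` rather than a generic error.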
-
- 03 Aug, 2017 11 commits
-
-
Morris Jette authored
Fix I/O race condition on step termination for srun launching multiple pack job groups. Without this change application output might be lost and/or the srun command might hang after some tasks exit.
-
Morris Jette authored
-
Morris Jette authored
All of these were pre-existing Coverity errors, but I changed nearby code, variable names, etc. so they looked like new errors.
-
Morris Jette authored
-
Morris Jette authored
Coverity reported problem, CID 45194
-
Morris Jette authored
CID 44936
-
Morris Jette authored
-
Morris Jette authored
Coverity CID 171494
-
Morris Jette authored
-
Morris Jette authored
-
- 02 Aug, 2017 13 commits
-
-
Tim Wickberg authored
Bug 3956.
-
Tim Shaw authored
Add translation code for the RPCs as well. Bug 3956.
-
Morris Jette authored
-
Morris Jette authored
Add pack_job_id and pack_job_offset to accounting database. Modified sacct to accept pack job ID specification using "#+#" notation. Modified sstat to accept pack job ID specification using "#+#" notation.
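As a rough illustration of the "#+#" notation, the sketch below splits a specification such as `1234+2` into a pack job ID and a pack job offset. It is an assumption-laden example, not the parser used by sacct or sstat: `parse_pack_job_id` is a hypothetical helper, and step suffixes or other job ID forms are deliberately not handled.

```python
def parse_pack_job_id(spec):
    """Split 'JOBID+OFFSET' into (job_id, offset).

    Hypothetical helper: offset is None when spec is a plain job ID.
    Raises ValueError if either part is not a non-negative integer.
    """
    job_part, sep, offset_part = spec.partition("+")
    if not job_part.isdigit():
        raise ValueError(f"invalid job ID in {spec!r}")
    if not sep:
        # No '+' present: an ordinary job ID with no pack offset.
        return int(job_part), None
    if not offset_part.isdigit():
        raise ValueError(f"invalid pack job offset in {spec!r}")
    return int(job_part), int(offset_part)
```

Under these assumptions, `parse_pack_job_id("1234+2")` gives `(1234, 2)`, a plain `"1234"` gives `(1234, None)`, and a malformed value such as `"1234+"` raises `ValueError`.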
-
Morris Jette authored
-
Tim Wickberg authored
-
Dominik Bartkiewicz authored
NULL is returned if the token is not found, so testing against '\0' is wrong (although it does work in older compilers). Fixes a new GCC 7.1 warning.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
was matching more than expected.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Danny Auble authored
error messages that aren't really errors. Related to Bug 3997
-
- 01 Aug, 2017 1 commit
-
-
Morris Jette authored
Without this change, each component would generate separate email at job begin, end, etc.
-