- 15 Aug, 2017 2 commits
-
-
Morris Jette authored
This ensures that the batch script can have appropriate environment variables set for all components (e.g. the node list, CPU count, etc. for all pack groups).
-
Morris Jette authored
The previous logic did not properly set env vars for pack jobs. In some cases the env vars were being set in slurmctld even when the parameter was not set on the sbatch command line.
-
- 14 Aug, 2017 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
When the "scontrol update jobid=#" command specifies a pack job leader, modify all components of the pack job. To modify only the pack job leader, specify "scontrol update jobid=#+0".
-
- 12 Aug, 2017 1 commit
-
-
Morris Jette authored
Modify scontrol job hold/release and update to operate with heterogeneous job id specification (e.g. "scontrol hold 123+4").
-
- 11 Aug, 2017 20 commits
-
-
Morris Jette authored
-
Morris Jette authored
If scancel is called only with job filter options (e.g. --user=..) and that results in an attempt to cancel a pending pack job component then do not log the ESLURM_NOT_PACK_WHOLE error code.
-
Morris Jette authored
The prior logic would report "No error" rather than "already complete".
-
Morris Jette authored
-
Morris Jette authored
Doing so would break the current scheduling logic.
-
Morris Jette authored
Coverity CID 44817
-
Morris Jette authored
-
Morris Jette authored
Coverity CID 44731
-
Morris Jette authored
Coverity CID 45279, 44884
-
Morris Jette authored
-
Morris Jette authored
Coverity CID 44976
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Coverity CID 45198
-
Morris Jette authored
Coverity CID 44820
-
Morris Jette authored
Coverity CID 45161
-
Morris Jette authored
Coverity CID 45166
-
Morris Jette authored
Coverity CID 44847
-
Morris Jette authored
Properly set up debugger symbols if an srun command tries to launch multiple, different applications in different components of a heterogeneous job (i.e. a different application in each pack job component).
-
Morris Jette authored
Modify srun logic so each pack job component can have a different executable program and arguments.
-
- 10 Aug, 2017 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 09 Aug, 2017 4 commits
-
-
Morris Jette authored
This is needed to run a single MPI_COMM_WORLD spread across multiple job allocations. The work is incomplete, but well underway in this commit (much of the code is in place and the test suite runs successfully).
-
Morris Jette authored
If the system configuration prevents the test job from starting, terminate the test with a warning.
-
Morris Jette authored
If the submit failed, the test tried to cancel job ID 0.
-
Morris Jette authored
Systems with fewer than 3 CPUs per node would fail the test.
-
- 07 Aug, 2017 1 commit
-
-
Morris Jette authored
-
- 04 Aug, 2017 6 commits
-
-
Morris Jette authored
Logic recently introduced would cancel an entire job allocation if a step launch failed, even if the srun command did not create the allocation (running under salloc or sbatch).
-
Morris Jette authored
-
Morris Jette authored
The srun message changed for pack jobs and the test failed. Tweak both message and test for better clarity.
-
Morris Jette authored
Modify the signal handling and termination functions to operate on a list of job steps (pack job) rather than a single record. Add job ID to some srun log messages for pack jobs.
-
Morris Jette authored
Add the pack count argument to the slurm_step_launch() function in the test suite and Perl API. The argument was added in commit 71a34f56.
-
Morris Jette authored
Modify launch/slurm plugin to signal all components of a pack job rather than just the one (modify to use a list of step context records).
-