- 14 Jan, 2013 9 commits
-
-
Morris Jette authored
-
Morris Jette authored
Correction to CPU allocation count logic for cores without hyperthreading.
-
Hongjia Cao authored
When jobs launched directly with srun end abnormally, each node prints a step-killed message (slurmd[cn123]: *** 1234.0 KILLED AT ... WITH SIGNAL 9 ***), and/or a task-exit message (srun: error: task[0-1]: Terminated) is printed for each node. For large-scale jobs these messages become tedious and bury the other error messages. The two attached patches (for slurm-2.5.1) introduce two environment variables to control this output: SLURM_STEP_KILLED_MSG_NODE_ID: if set, only the specified node prints the step-killed message; SLURM_SRUN_REDUCE_TASK_EXIT_MSG: if set and non-zero, successive task-exit messages with the same exit code are printed only once.
-
Hongjia Cao authored
When jobs launched directly with srun end abnormally, each node prints a step-killed message (slurmd[cn123]: *** 1234.0 KILLED AT ... WITH SIGNAL 9 ***), and/or a task-exit message (srun: error: task[0-1]: Terminated) is printed for each node. For large-scale jobs these messages become tedious and bury the other error messages. The two attached patches (for slurm-2.5.1) introduce two environment variables to control this output: SLURM_STEP_KILLED_MSG_NODE_ID: if set, only the specified node prints the step-killed message; SLURM_SRUN_REDUCE_TASK_EXIT_MSG: if set and non-zero, successive task-exit messages with the same exit code are printed only once.
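The commit message describes the two variables' semantics only in prose. The Python sketch below models that behavior as stated: with SLURM_STEP_KILLED_MSG_NODE_ID set, only the matching node prints; with SLURM_SRUN_REDUCE_TASK_EXIT_MSG set and non-zero, consecutive task-exit messages sharing an exit code collapse to one. It is not SLURM's C implementation. The function names, the `(task_range, exit_code)` tuple representation, and treating a malformed value as unset are assumptions made for illustration.

```python
import os


def should_print_step_killed(node_id, env=None):
    """Hypothetical helper: decide whether this node prints the step-killed message.

    If SLURM_STEP_KILLED_MSG_NODE_ID is unset, every node prints (the old
    behavior). If it is set, only the node whose ID matches prints.
    Assumption: a value that is not an integer is treated as unset.
    """
    env = os.environ if env is None else env
    target = env.get("SLURM_STEP_KILLED_MSG_NODE_ID")
    if target is None:
        return True
    try:
        return int(target) == node_id
    except ValueError:
        return True


def reduce_task_exit_msgs(exits, env=None):
    """Hypothetical helper: filter task-exit messages in arrival order.

    `exits` is a list of (task_range, exit_code) tuples. When
    SLURM_SRUN_REDUCE_TASK_EXIT_MSG is set and non-zero, a message is dropped
    if its exit code equals that of the message immediately before it.
    Assumption: a non-integer value disables the reduction.
    """
    env = os.environ if env is None else env
    try:
        reduce = int(env.get("SLURM_SRUN_REDUCE_TASK_EXIT_MSG", "0")) != 0
    except ValueError:
        reduce = False

    kept = []
    last_code = None  # exit code of the most recent message seen
    for tasks, code in exits:
        if reduce and code == last_code:
            continue  # same code as the previous message: suppress it
        kept.append((tasks, code))
        last_code = code
    return kept
```

For example, with the reduction enabled, the sequence `[("0-1", 9), ("2-3", 9), ("4", 1), ("5", 9)]` keeps `("0-1", 9)`, `("4", 1)` and `("5", 9)`. The second signal-9 message is suppressed because it directly follows another signal-9 message, while the last one survives because an exit code of 1 intervened. "Successive" is read here as "consecutive", not "anywhere earlier".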
-
Morris Jette authored
-
Morris Jette authored
-
Yair Yarom authored
-
Morris Jette authored
-
Morris Jette authored
-
- 11 Jan, 2013 10 commits
-
-
https://github.com/SchedMD/slurm
jette authored
-
jette authored
User root or SlurmUser does not need a valid sbcast credential.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
This can be useful for testing purposes
-
Morris Jette authored
-
jette authored
-
jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 10 Jan, 2013 15 commits
-
-
jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Specifies the communication protocol to use for ALPS/BASIL.
-
Morris Jette authored
-
Morris Jette authored
-
jette authored
-
Danny Auble authored
-
jette authored
-
jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
- 09 Jan, 2013 6 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Nathan Yee authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-