- 27 Jun, 2011 1 commit
-
Morris Jette authored
Change the step launch RPC to use a copy of the step data structure rather than a pointer to it, which ensures that the step data does not change while the RPC message is being built. No problems have been observed, but this is safer.
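A minimal sketch of the copy-before-pack idea, assuming a single mutex guards the live step record. The names `step_info_t`, `step_lock`, `pack_step()` and `build_launch_msg()` are hypothetical and do not come from the SLURM sources; the text packer stands in for real RPC serialization.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified step record (not SLURM's actual structure). */
typedef struct {
    uint32_t job_id;
    uint32_t step_id;
    uint32_t node_cnt;
} step_info_t;

/* Hypothetical lock protecting the live step record. */
static pthread_mutex_t step_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in packer: writes a text form of the step into buf and returns
 * the snprintf() result (bytes needed, excluding the terminating NUL). */
static size_t pack_step(const step_info_t *step, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%u.%u nodes=%u",
                     (unsigned) step->job_id, (unsigned) step->step_id,
                     (unsigned) step->node_cnt);
    return (n < 0) ? 0 : (size_t) n;
}

/* Copy the live record while holding the lock, then pack from the private
 * copy, so concurrent updates to *live cannot alter a half-built message. */
size_t build_launch_msg(step_info_t *live, char *buf, size_t len)
{
    step_info_t snapshot;

    pthread_mutex_lock(&step_lock);
    memcpy(&snapshot, live, sizeof(snapshot));
    pthread_mutex_unlock(&step_lock);

    return pack_step(&snapshot, buf, len);
}
```

Note that `memcpy()` makes a shallow copy: it protects scalar fields only, so any pointer members of a real step record would need their own copies.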
-
- 25 Jun, 2011 3 commits
-
Morris Jette authored
"scontrol show config" was reporting both per-CPU and per-node memory limits as per-CPU limits. Change it to report per-node limits under the proper key name.
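A sketch of choosing the key name from how the limit is encoded. It assumes a per-CPU limit is marked by a high-order flag bit (called `MEM_PER_CPU` here) and an unflagged value is a per-node limit; the function name and the simplified output format are hypothetical, while `DefMemPerCPU` and `DefMemPerNode` are the slurm.conf parameter names.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: this bit marks a per-CPU memory limit; without it the value
 * is a per-node limit in megabytes. */
#define MEM_PER_CPU 0x80000000u

/* Hypothetical formatter: pick the key name that matches the encoding
 * instead of always labelling the value as per-CPU. */
int format_def_mem(uint32_t def_mem, char *buf, size_t len)
{
    if (def_mem & MEM_PER_CPU)
        return snprintf(buf, len, "DefMemPerCPU = %u",
                        (unsigned) (def_mem & ~MEM_PER_CPU));
    return snprintf(buf, len, "DefMemPerNode = %u", (unsigned) def_mem);
}
```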
-
-
Morris Jette authored
Correct the values maintained for the suspended job count (sus_job_cnt) by node and the running job count (job_cnt_run) by front-end node when the slurmctld daemon is reconfigured while there are suspended jobs on a front-end architecture.
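One robust way to keep such counters correct across a reconfigure is to zero them and rebuild them from the job list rather than adjusting them incrementally. The sketch below assumes an illustrative counting rule (suspended jobs count toward their node's `sus_job_cnt`, running jobs toward their front end's `job_cnt_run`); the structures and `rebuild_counts()` are hypothetical simplifications, not slurmctld's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified records. */
enum job_state { JOB_RUNNING, JOB_SUSPENDED, JOB_COMPLETE };

typedef struct {
    enum job_state state;
    int node_inx;       /* index of the node the job uses */
    int front_end_inx;  /* index of the job's front-end node */
} job_t;

typedef struct { uint16_t sus_job_cnt; } node_t;
typedef struct { uint16_t job_cnt_run; } front_end_t;

/* Zero both counters, then recount from the job list so that stale values
 * left over from before the reconfigure cannot survive. */
void rebuild_counts(const job_t *jobs, size_t njobs,
                    node_t *nodes, size_t nnodes,
                    front_end_t *fes, size_t nfes)
{
    for (size_t i = 0; i < nnodes; i++)
        nodes[i].sus_job_cnt = 0;
    for (size_t i = 0; i < nfes; i++)
        fes[i].job_cnt_run = 0;

    for (size_t i = 0; i < njobs; i++) {
        if (jobs[i].state == JOB_SUSPENDED)
            nodes[jobs[i].node_inx].sus_job_cnt++;
        else if (jobs[i].state == JOB_RUNNING)
            fes[jobs[i].front_end_inx].job_cnt_run++;
    }
}
```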
-
- 24 Jun, 2011 13 commits
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
scheduling
-
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Add select_jobinfo to the task launch RPC so that all nodes, not just the head node, have access to the information. Based upon patch by Andriy Grytsenko (Massive Solutions Limited).
-
Morris Jette authored
-
Danny Auble authored
Needed for gang scheduling.
-
Morris Jette authored
Fix possible invalid memory reference in sched/backfill. Patch by Andriy Grytsenko (Massive Solutions Limited).
-
Morris Jette authored
-
Morris Jette authored
Add flag to the select APIs for job suspend/resume indicating if the action is for gang scheduling or an explicit job suspend/resume by the user. Only an explicit job suspend/resume will reset the job's priority and make resources exclusively held by the job available to other jobs. This change is also needed for Cray systems with ALPS.
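A sketch of how such a flag can gate the side effects of a suspend. `select_job_suspend()`, `job_rec_t` and the `explicit_susp` parameter are hypothetical names, and modelling "reset the job's priority" as setting it to zero is an assumption made only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified job record. */
typedef struct {
    uint32_t priority;
    bool     holds_exclusive;  /* resources held exclusively by this job */
} job_rec_t;

/* Hypothetical select-plugin hook. explicit_susp is true for a suspend
 * requested by a user or administrator and false when gang scheduling is
 * merely time-slicing the job. */
void select_job_suspend(job_rec_t *job, bool explicit_susp)
{
    if (!explicit_susp)
        return;  /* gang scheduling: keep priority and resources */

    job->priority = 0;             /* illustrative reset value (assumption) */
    job->holds_exclusive = false;  /* let other jobs use those resources */
}
```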
-
- 23 Jun, 2011 4 commits
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
Based upon recent experience with smap/sview, we are reverting a change that would display each node at a separate location even if they share a common XYZ coordinate.
-
Danny Auble authored
-
- 22 Jun, 2011 10 commits
-
Morris Jette authored
Add squeue support to display a job's license information. Patch by Andy Roosen (University of Delaware).
-
Morris Jette authored
If an salloc allocation is revoked and the job is in a suspended state, send the child processes a SIGCONT before sending SIGHUP or SIGTERM so that the processes can terminate immediately.
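A minimal sketch of the ordering, using only POSIX `kill()`. A stopped process leaves a SIGTERM or SIGHUP pending until it is continued, so sending SIGCONT first lets the termination signal take effect right away. `terminate_child()` is a hypothetical helper, not salloc's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wake a possibly stopped child with SIGCONT, then send the termination
 * signal (e.g. SIGHUP or SIGTERM). Returns 0 on success, -1 on error. */
int terminate_child(pid_t pid, int term_sig)
{
    if (kill(pid, SIGCONT) != 0)
        return -1;
    return kill(pid, term_sig);
}
```

Sending SIGCONT to a process that is already running is harmless, so the helper does not need to know whether the job was suspended.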
-
Morris Jette authored
For front-end architectures on which job steps are run (emulated Cray and BlueGene systems only), fix a bug that would free memory still in use.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
The processes to suspend and resume are identified by process group ID and parent process ID, so some processes may be missed. Since salloc runs as a normal user, its ability to identify the processes associated with a job is limited.
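A sketch of why such a test can miss processes, under one plausible reading of the rule: a process belongs to the job if it is in the job's process group or is a direct child of salloc. `proc_t` and `proc_in_job()` are hypothetical; a grandchild that started its own process group (for example with `setsid()`) satisfies neither condition.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical process descriptor, e.g. filled in from a process table. */
typedef struct { pid_t pid, ppid, pgid; } proc_t;

/* Match by process group or by direct parentage only; descendants that
 * moved to another process group are not recognized. */
bool proc_in_job(const proc_t *p, pid_t job_pgid, pid_t salloc_pid)
{
    return p->pgid == job_pgid || p->ppid == salloc_pid;
}
```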
-
- 21 Jun, 2011 9 commits
-
Morris Jette authored
On Cray systems only, suspend the salloc command's child processes when suspending the job. This is required to prevent additional aprun commands from being spawned.
-
Danny Auble authored
-
Danny Auble authored
-
Moe Jette authored
Modify srun_job_suspend() to return a status indicating whether the message was sent.
-
Moe Jette authored
-
Moe Jette authored
Modify the slurmctld logic that sends SRUN_REQUEST_SUSPEND so that it does not wait for a reply from salloc or srun.
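A sketch of a fire-and-forget notification that also reports whether a message went out. Every name here (`suspend_req_t`, `MSG_SRUN_REQUEST_SUSPEND`, `send_no_reply_fn`, `log_send()`, `srun_job_suspend_sketch()`) is hypothetical, and the transport is a stand-in that only prints; it is not SLURM's RPC layer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical message type value, not SLURM's definition. */
enum { MSG_SRUN_REQUEST_SUSPEND = 1 };

typedef struct {
    int      msg_type;
    uint32_t job_id;
    bool     suspend;  /* true = suspend, false = resume */
} suspend_req_t;

/* Transport hook: send msg to addr without waiting for a reply.
 * Returns 0 on success. */
typedef int (*send_no_reply_fn)(const char *addr, const suspend_req_t *msg);

/* Stand-in transport that only logs the send. */
static int log_send(const char *addr, const suspend_req_t *msg)
{
    printf("type=%d job=%u %s -> %s (no reply expected)\n", msg->msg_type,
           (unsigned) msg->job_id, msg->suspend ? "suspend" : "resume", addr);
    return 0;
}

/* Notify the job's salloc/srun, if one is registered, and return whether
 * a message was sent. The caller never blocks waiting for a response. */
bool srun_job_suspend_sketch(const char *resp_addr, uint32_t job_id,
                             bool suspend, send_no_reply_fn send)
{
    if (resp_addr == NULL)  /* no salloc/srun to notify */
        return false;
    suspend_req_t msg = { MSG_SRUN_REQUEST_SUSPEND, job_id, suspend };
    return send(resp_addr, &msg) == 0;
}
```

Not waiting for a reply keeps slurmctld from stalling on a slow or unresponsive client. The returned status still tells the caller whether anyone was notified.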
-
Moe Jette authored
This initializes the job_suspend callback function in srun.
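A sketch of why the callback must be initialized. A dispatcher typically skips events whose handler pointer is NULL, so an unset `job_suspend` entry means srun never reacts to suspend messages. `alloc_callbacks_t`, `dispatch_suspend()` and `make_srun_callbacks()` are hypothetical names, loosely modelled on a client-side callback table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table; field names are illustrative. */
typedef struct {
    void (*job_complete)(uint32_t job_id);
    void (*job_suspend)(uint32_t job_id, bool suspended);
} alloc_callbacks_t;

static bool job_is_suspended = false;

/* srun-side handler: record the new state (a real client would also stop
 * or continue its tasks here). */
static void on_job_suspend(uint32_t job_id, bool suspended)
{
    (void) job_id;
    job_is_suspended = suspended;
}

/* Dispatcher: a NULL handler means the event is silently ignored. */
void dispatch_suspend(const alloc_callbacks_t *cb, uint32_t job_id,
                      bool suspended)
{
    if (cb->job_suspend != NULL)
        cb->job_suspend(job_id, suspended);
}

/* Build srun's table with the job_suspend hook filled in. */
alloc_callbacks_t make_srun_callbacks(void)
{
    alloc_callbacks_t cb = { NULL, on_job_suspend };
    return cb;
}
```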
-
Moe Jette authored
Improve the efficiency of the select/linear plugin when the topology/tree plugin is configured. Patch by Andriy Grytsenko (Massive Solutions Limited).
-