- 23 May, 2011 6 commits
-
-
Moe Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Describe how to get the Dakota application to work with SLURM and add a contributor to the SLURM team list.
-
Morris Jette authored
Improve how the Cray srun/aprun wrapper handles the ntasks and nnodes options to better distribute tasks over the allocated nodes. Support for these options is imperfect for resource allocations in which the number of tasks per node is not uniform, but that case cannot be handled properly due to differences between srun and aprun.
-
- 20 May, 2011 3 commits
-
-
Moe Jette authored
Add optimal starting point and block length to bluegene geometry data structures used for job placement logic.
-
Danny Auble authored
-
Danny Auble authored
BLUEGENE - added a system bitmap so that the ba_geo_tables can be used instead of the brute-force method of finding blocks.
-
- 19 May, 2011 14 commits
-
-
Moe Jette authored
Fix bug in GraceTime support for preempted jobs that prevented proper operation when more than one job was being preempted. Based on patch from Bill Brophy, Bull.
-
Moe Jette authored
Add optional argument to srun's --kill-on-bad-exit so that user can set its value to zero and override a SLURM configuration parameter of KillOnBadExit.
-
Danny Auble authored
-
Danny Auble authored
-
Moe Jette authored
Conflicts: NEWS auxdir/x_ac_munge.m4
-
Moe Jette authored
Add support for multiple sets of DEFAULT node, partition, and frontend specifications in slurm.conf. New DEFAULT options overwrite old options, but those not explicitly changed are preserved.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
Added logic so that you can emulate a Cray system without having to enforce everything required on a real one.
-
Moe Jette authored
-
Moe Jette authored
Transfer all of the node fields to reorder them by node_rank. Old logic only transferred some fields, which caused problems on heterogeneous clusters.
-
Moe Jette authored
Fix some typos and improve technical content of the SLURM design documents for job launch and gres support.
-
Moe Jette authored
Added a web page to describe the job launch and termination process and made a minor enhancement to the GRES design document.
-
- 18 May, 2011 17 commits
-
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
Conflicts: src/slurmctld/node_mgr.c src/slurmctld/node_scheduler.c
-
Moe Jette authored
Patch from Andriy Grytsenko (Massive Solutions Limited).
-
Moe Jette authored
Synchronize power-save module better with scheduler. Without this change, returning a node to service was typically delayed longer than necessary. Patch from Andriy Grytsenko (Massive Solutions Limited).
-
Moe Jette authored
Have scontrol report a job's PreemptTime=None rather than PreemptTime=NO_VAL if not set. Patch from Bill Brophy, Bull.
-
Moe Jette authored
-
Moe Jette authored
This expands the description of how to build slurm using a git repository.
-
Moe Jette authored
Modify job expansion logic to support licenses, generic resources, and currently running job steps in the job which is expanding.
-
Danny Auble authored
-
Morris Jette authored
This improves the initial configuration code:
a) Better handling of DownNodes lines. The previous basil_geometry() would set the node Reason field on failure, irrespective of whether that node had been marked using a DownNodes line.
b) Check all cases of nodes being invisible to ALPS. Up until now basil_geometry() had to be fixed each time a new source of discrepancy between ALPS and SDB state was discovered (the most recent case was NULL coordinates when taking out a blade). Depending on ALPS interface changes, there may be other possibilities. Instead of fixing the SLURM code for each new case, it is better to check whether SLURM and ALPS agree. The price is a tiny delay at SLURM initialisation time (since each node is first looked up in the ALPS inventory), but it pays off well because it eases system administration by pointing to the source of the error. Any node that has suddenly disappeared from the ALPS horizon will now show up in the logs and also be marked down in sinfo.
c) At initialisation time, give a summary of how many ALPS nodes are online.
d) Turn the ALPS-node-invisibility error into a warning message, since such nodes may already have been covered in a DownNodes statement.
By merging basil_get_initial_state() into basil_geometry(), the previously separate knowledge about system state (database state, ALPS inventory) is combined, making it easier to identify sources of failure.
Patch from Gerrit Renker, CSCS.
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Danny Auble authored
-
Danny Auble authored
-