- 21 Jun, 2011 1 commit
-
-
Moe Jette authored
Modify smap and sview to display all nodes even if multiple nodes exist at each coordinate.
-
- 20 Jun, 2011 3 commits
-
-
Moe Jette authored
Cray systems: Add support to suspend/resume salloc command to ensure that aprun does not get initiated when the job is suspended.
-
moe authored
With regard to forthcoming Accelerator support in Basil 1.2/Alps 4.0, this adds interface support for passing the following Accelerator parameters: accelerator type (currently only "GPU" is supported), model/rank information (uninterpreted "family" string), and amount of on-board memory in MB. 02_Cray-Accelerator-params.diff. Patch from Gerrit Renker and Stephen Trofinoff, CSCS.
-
moe authored
This adds support to parse Basil 1.2/Alps 4.0 per-node accelerator information. 01_Cray-Accelerator-basic-support.diff. Patch from Gerrit Renker and Stephen Trofinoff, CSCS.
-
- 17 Jun, 2011 3 commits
-
-
Moe Jette authored
-
Moe Jette authored
NOTE: THERE HAS BEEN A NEW FIELD ADDED TO THE CONFIGURATION RESPONSE RPC AS SHOWN BY "SCONTROL SHOW CONFIG". THIS FUNCTION WILL ONLY WORK WHEN THE SERVER AND CLIENT ARE BOTH RUNNING SLURM VERSION 2.3.0.pre6
-
Moe Jette authored
Fix bug in layout of job step with --nodelist option plus node count. Old code could allocate too few nodes by double counting some nodes.
-
- 16 Jun, 2011 1 commit
-
-
Danny Auble authored
-
- 15 Jun, 2011 1 commit
-
-
Moe Jette authored
The original logic had a problem if you shrank a job and later grew it. Nodes previously released would reappear when the job grew, but with zero CPUs associated with them. The problem was due to the original node list of a job being preserved in the job_resources data structure. The new logic confirms that those nodes are still in the job's allocation before rebuilding the job_resources data structure.
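The check described above can be pictured with a minimal sketch. It is not the slurmctld code: `rebuild_job_resources`, the dictionaries, and the node names are all hypothetical, and the real job_resources structure is a C struct with bitmaps rather than a Python dict.

```python
def rebuild_job_resources(preserved, allocation, added):
    """Rebuild a per-node CPU map for a job that is being grown.

    preserved  -- hypothetical {node: cpus} map kept from the original job
    allocation -- set of nodes the job currently holds
    added      -- hypothetical {node: cpus} map for the newly added nodes
    """
    # Keep a preserved node only if it is still in the job's allocation,
    # so nodes released by an earlier shrink do not reappear.
    rebuilt = {node: cpus for node, cpus in preserved.items()
               if node in allocation}
    rebuilt.update(added)
    return rebuilt


# Job started on n1..n3, was shrunk to n1, and is now grown with n4.
preserved = {"n1": 4, "n2": 4, "n3": 4}
result = rebuild_job_resources(preserved, {"n1", "n4"}, {"n4": 8})
```

In this example `result` holds only `n1` and `n4`; the released nodes `n2` and `n3` are filtered out instead of coming back with no CPUs.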
-
- 14 Jun, 2011 2 commits
-
-
Danny Auble authored
UMBC.
-
Moe Jette authored
Prevent background salloc disconnecting terminal at termination. Patch by Don Albert, Bull.
-
- 10 Jun, 2011 1 commit
-
-
Moe Jette authored
-
- 09 Jun, 2011 2 commits
- 08 Jun, 2011 2 commits
-
-
Moe Jette authored
Avoid clearing a node's Arch, OS, BootTime and SlurmdStartTime when "scontrol reconfig" is run. Patch from Martin Perry, Bull.
-
Morris Jette authored
-
- 07 Jun, 2011 2 commits
-
-
Danny Auble authored
-
Moe Jette authored
Added scontrol ability to increment or decrement a job or step time limit.
-
- 06 Jun, 2011 1 commit
-
-
Danny Auble authored
would not be set correctly in the added child association.
-
- 02 Jun, 2011 1 commit
-
-
Moe Jette authored
With default configuration on non-Cray systems, enable salloc to be spawned as a background process. Based upon work by Don Albert (Bull) and Gerrit Renker (CSCS).
-
- 01 Jun, 2011 3 commits
-
-
Moe Jette authored
Add support to salloc for a new environment variable SALLOC_KILL_CMD, which is equivalent to the -K/--kill-command option.
-
Moe Jette authored
This fixes a bug reported by Don Albert. The problem is that whenever salloc exits with a child process in stopped state (suspended or stopped on terminal input/output), a zombie process is generated, since this case is not caught by the code evaluating the child status. This patch adds the missing case. It uses SIGKILL, which is the only signal that changes the state of a stopped process. It was decided not to try to re-awaken the process using SIGCONT, since (a) this happens during session clean-up and (b) if the condition is due to SIGTTIN, the process immediately becomes stopped again. Patch from Gerrit Renker, CSCS.
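The stopped-child case can be illustrated with a short POSIX sketch. It is written in Python for brevity and is not the salloc code (which is C); `reap_child` and `demo` are hypothetical names, and the child here stops itself with SIGSTOP only to reproduce the condition.

```python
import os
import signal


def reap_child(pid):
    """Wait for a child; if it is reported as stopped, kill and reap it."""
    # WUNTRACED makes waitpid() also report children that have stopped.
    _, status = os.waitpid(pid, os.WUNTRACED)
    if os.WIFSTOPPED(status):
        # A stopped process still acts on SIGKILL, so send that and
        # then collect its exit status so no zombie is left behind.
        os.kill(pid, signal.SIGKILL)
        _, status = os.waitpid(pid, 0)
    return status


def demo():
    """Fork a child that stops itself, then reap it from the parent."""
    pid = os.fork()
    if pid == 0:
        os.kill(os.getpid(), signal.SIGSTOP)  # child is now stopped
        os._exit(0)
    status = reap_child(pid)
    return os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL
```

Without the `WIFSTOPPED` branch, a parent that only handles exited or signaled children would leave the stopped child behind, which is the situation the commit message describes.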
-
Moe Jette authored
Treat the specification of multiple cluster names as a fatal error.
-
- 31 May, 2011 3 commits
- 28 May, 2011 3 commits
-
-
Moe Jette authored
Improve accuracy of REQUEST_JOB_WILL_RUN start time with respect to higher priority pending jobs.
-
Moe Jette authored
Expand explanation of multiple DEFAULT values in slurm.conf
-
Moe Jette authored
Propagate DebugFlags changes by scontrol to the various plugins and other modules. DebugFlags is cached in some places and the changes cause the cached value to be reset as needed.
-
- 27 May, 2011 2 commits
- 26 May, 2011 2 commits
-
-
Don Lipari authored
-
Danny Auble authored
association/wckey to be set incorrectly as a default if the new object was added after an original default object already existed. Previously, slurmctld had to be restarted to fix the issue.
-
- 25 May, 2011 1 commit
-
-
Moe Jette authored
-
- 23 May, 2011 2 commits
-
-
Moe Jette authored
If job's TMPDIR environment is not set or is not usable, reset to "/tmp". Patch from Andriy Grytsenko (Massive Solutions Limited).
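As a rough illustration of the fallback, the sketch below picks a TMPDIR value from an environment mapping. The function name `usable_tmpdir` is hypothetical, and treating "usable" as "an existing directory that is writable and searchable" is an assumption; the commit does not say which checks Slurm performs.

```python
import os


def usable_tmpdir(env, fallback="/tmp"):
    """Return env's TMPDIR if it looks usable, otherwise the fallback."""
    path = env.get("TMPDIR")
    # Assumed meaning of "usable": an existing directory that the
    # caller can write to and traverse.
    if path and os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK):
        return path
    return fallback
```

For example, `usable_tmpdir({})` and `usable_tmpdir({"TMPDIR": "/no/such/dir"})` both fall back to "/tmp".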
-
Danny Auble authored
Regression from pre5.
-
- 19 May, 2011 4 commits
-
-
Moe Jette authored
Fix bug in GraceTime support for preempted jobs that prevented proper operation when more than one job was being preempted. Based on patch from Bill Brophy, Bull.
-
Moe Jette authored
Add optional argument to srun's --kill-on-bad-exit so that user can set its value to zero and override a SLURM configuration parameter of KillOnBadExit.
-
Danny Auble authored
-
Moe Jette authored
Add support for multiple sets of DEFAULT node, partition, and frontend specifications in slurm.conf. New DEFAULT options overwrite old options, but those not explicitly changed are preserved.
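The overwrite-and-preserve behaviour can be modelled as a dictionary merge. This is only a sketch of the semantics stated above, not the slurm.conf parser; `apply_default` is a hypothetical helper and the option values are made up for illustration.

```python
def apply_default(current, new_options):
    """Return the defaults in effect after a later DEFAULT line is read."""
    merged = dict(current)       # options not mentioned again are preserved
    merged.update(new_options)   # options given again overwrite old values
    return merged


defaults = {}
# First DEFAULT line: applies to the node lines that follow it.
defaults = apply_default(
    defaults, {"Sockets": "2", "CoresPerSocket": "4", "RealMemory": "16000"})
first_group = dict(defaults)

# Second DEFAULT line changes only RealMemory.
defaults = apply_default(defaults, {"RealMemory": "32000"})
second_group = dict(defaults)
```

Here `second_group` keeps `Sockets` and `CoresPerSocket` from the first DEFAULT line and takes the new `RealMemory`, while `first_group` is unaffected.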
-