- 15 Jun, 2011 1 commit
-
-
Moe Jette authored
The original logic had a problem if you shrank a job and later grew it. Nodes previously released would reappear when the job grew, but with zero CPUs associated with them. The problem was caused by the job's original node list being preserved in the job_resources data structure. The new logic confirms that those nodes are still in the job's allocation before rebuilding the job_resources data structure.
-
- 14 Jun, 2011 2 commits
-
-
Danny Auble authored
UMBC.
-
Moe Jette authored
Prevent background salloc disconnecting terminal at termination. Patch by Don Albert, Bull.
-
- 10 Jun, 2011 1 commit
-
-
Moe Jette authored
-
- 09 Jun, 2011 2 commits
- 08 Jun, 2011 2 commits
-
-
Moe Jette authored
Avoid clearing a node's Arch, OS, BootTime and SlurmdStartTime when "scontrol reconfig" is run. Patch from Martin Perry, Bull.
-
Morris Jette authored
-
- 07 Jun, 2011 2 commits
-
-
Danny Auble authored
-
Moe Jette authored
Added scontrol ability to increment or decrement a job or step time limit.
-
- 06 Jun, 2011 1 commit
-
-
Danny Auble authored
would not be set correctly in the added child association.
-
- 02 Jun, 2011 1 commit
-
-
Moe Jette authored
With default configuration on non-Cray systems, enable salloc to be spawned as a background process. Based upon work by Don Albert (Bull) and Gerrit Renker (CSCS).
-
- 01 Jun, 2011 3 commits
-
-
Moe Jette authored
Add support to salloc for a new environment variable SALLOC_KILL_CMD, which is equivalent to the -K/--kill-command option.
-
Moe Jette authored
This fixes a bug reported by Don Albert. Whenever salloc exits with a child process in the stopped state (suspended or stopped on terminal input/output), a zombie process is generated, since this case is not caught by the code evaluating the child status. This patch adds the missing case. It uses SIGKILL, which is the only signal that changes the state of a stopped process. It was decided not to try to reawaken the process using SIGCONT, since (a) this happens during session clean-up and (b) if the condition is due to SIGTTIN, the process immediately becomes stopped again. Patch from Gerrit Renker, CSCS.
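
The missing case described above can be sketched roughly as follows. This is not the salloc code itself: `reap_child` is a hypothetical helper name, and the sketch assumes a POSIX system where `waitpid()` with `WUNTRACED` also reports stopped children.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from salloc): wait for child `pid`.
 * If the child is reported as stopped rather than terminated, send
 * SIGKILL and wait again so that it is reaped and no zombie is left.
 * Returns 0 once the child has been reaped, -1 on error.
 */
static int reap_child(pid_t pid, int *status)
{
    pid_t rc;

    /* WUNTRACED makes waitpid() return for stopped children as well. */
    do {
        rc = waitpid(pid, status, WUNTRACED);
    } while (rc == -1 && errno == EINTR);
    if (rc == -1)
        return -1;

    if (WIFSTOPPED(*status)) {
        /* The case the commit message calls missing: child is stopped. */
        if (kill(pid, SIGKILL) == -1)
            return -1;
        do {
            rc = waitpid(pid, status, 0);
        } while (rc == -1 && errno == EINTR);
        if (rc == -1)
            return -1;
    }
    return 0;
}
```

In this sketch a child that stopped itself ends up reaped with a termination status showing SIGKILL, instead of lingering after the parent exits.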
-
Moe Jette authored
Treat the specification of multiple cluster names as a fatal error.
-
- 31 May, 2011 3 commits
- 28 May, 2011 3 commits
-
-
Moe Jette authored
Improve accuracy of REQUEST_JOB_WILL_RUN start time with respect to higher priority pending jobs.
-
Moe Jette authored
Expand explanation of multiple DEFAULT values in slurm.conf.
-
Moe Jette authored
Propagate DebugFlags changes by scontrol to the various plugins and other modules. DebugFlags is cached in some places and the changes cause the cache value to be reset as needed.
-
- 27 May, 2011 2 commits
- 26 May, 2011 2 commits
-
-
Don Lipari authored
-
Danny Auble authored
association/wckey to be set incorrectly as a default if the new object was added after an original default object already existed. Previously, the slurmctld would need to be restarted to fix the issue.
-
- 25 May, 2011 1 commit
-
-
Moe Jette authored
-
- 23 May, 2011 2 commits
-
-
Moe Jette authored
If job's TMPDIR environment is not set or is not usable, reset to "/tmp". Patch from Andriy Grytsenko (Massive Solutions Limited).
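
A minimal sketch of such a fallback, for illustration only. The helper name `ensure_tmpdir` is hypothetical, and "usable" is assumed here to mean an existing directory that the process can write to and search; the actual patch may apply a different test.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the SLURM implementation): keep TMPDIR if it
 * names a writable, searchable directory; otherwise reset it to "/tmp".
 * Returns 0 if TMPDIR was left alone, 1 if it was reset, -1 on error.
 */
static int ensure_tmpdir(void)
{
    const char *dir = getenv("TMPDIR");
    struct stat st;

    /* Assumed definition of "usable": set, non-empty, a directory,
     * and accessible for writing and searching. */
    if (dir != NULL && dir[0] != '\0' &&
        stat(dir, &st) == 0 && S_ISDIR(st.st_mode) &&
        access(dir, W_OK | X_OK) == 0)
        return 0;

    /* Overwrite (third argument 1) any existing unusable value. */
    return setenv("TMPDIR", "/tmp", 1) == 0 ? 1 : -1;
}
```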
-
Danny Auble authored
Regression from pre5.
-
- 19 May, 2011 5 commits
-
-
Moe Jette authored
Fix bug in GraceTime support for preempted jobs that prevented proper operation when more than one job was being preempted. Based on patch from Bill Brophy, Bull.
-
Moe Jette authored
Add optional argument to srun's --kill-on-bad-exit so that user can set its value to zero and override a SLURM configuration parameter of KillOnBadExit.
-
Danny Auble authored
-
Moe Jette authored
Add support for multiple sets of DEFAULT node, partition, and frontend specifications in slurm.conf. New DEFAULT options overwrite old options, but those not explicitly changed are preserved.
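
The overwrite-but-preserve behaviour can be pictured with a small sketch. The struct, its three fields, and the use of 0 to mean "not specified on this DEFAULT line" are all assumptions made for illustration; they do not reflect how the slurm.conf parser stores its defaults.

```c
/*
 * Hypothetical subset of node defaults. A value of 0 is assumed to mean
 * "this option was not given on the DEFAULT line being applied".
 */
struct node_defaults {
    int cpus;
    int real_memory;
    int tmp_disk;
};

/*
 * Apply a later DEFAULT line on top of the current defaults: options it
 * sets explicitly overwrite the old values, all others are preserved.
 */
static void merge_defaults(struct node_defaults *cur,
                           const struct node_defaults *update)
{
    if (update->cpus != 0)
        cur->cpus = update->cpus;
    if (update->real_memory != 0)
        cur->real_memory = update->real_memory;
    if (update->tmp_disk != 0)
        cur->tmp_disk = update->tmp_disk;
}
```

With current defaults of 8 CPUs and 16000 MB of memory, applying a second DEFAULT that only sets memory to 32000 leaves the CPU count at 8.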
-
Danny Auble authored
-
- 18 May, 2011 4 commits
-
-
Moe Jette authored
Patch from Andriy Grytsenko (Massive Solutions Limited).
-
Moe Jette authored
Synchronize power-save module better with scheduler. Without this change, returning a node to service was typically delayed longer than necessary. Patch from Andriy Grytsenko (Massive Solutions Limited).
-
Moe Jette authored
Modify job expansion logic to support licenses, generic resources, and currently running job steps in the job which is expanding.
-
Moe Jette authored
-
- 16 May, 2011 1 commit
-
-
Danny Auble authored
BLUEGENE - Fix issue where accounting wasn't updated correctly when a block that had gone into an error state was resumed.
-
- 13 May, 2011 2 commits
-
-
Danny Auble authored
When enforcing accounting, fix polling for unknown uids for users after the slurmctld started. Previously one would have to issue a reconfigure to the slurmctld to have it look for new uids.
-
Moe Jette authored
-