Commits · a73012bcd3e69da3c1a5fe4d1094c1bdd10ed9e6 · Manuel G. Marciani / ces_slurm_simulator

13 May, 2014 7 commits

More gracefully handle batch launch failure · a73012bc

Morris Jette authored May 13, 2014

If a batch job launch request can not be built (the script file
is missing, a credential can not be created, or the user does
not exist on the selected compute node), then cancel the job
in a graceful fashion. Previously, the bad RPC would be sent to
the compute node and that node DRAINED.
see bug 807

a73012bc

Correct CR_LLN with node selection by job · 899561b1

Morris Jette authored May 13, 2014

Correct SelectTypeParameters=CR_LLN with job selection of specific nodes.
Previous logic would in most instances allocate resources on all nodes
to the job.

899561b1

Correct squeue job node & CPU counts on requeue · 4f97cae8

Morris Jette authored May 13, 2014

Correct squeue's job node and CPU counts for requeued jobs.
Previously, when a job was requeued, the CPU count reported
was that of its previous execution. When combined with the
--ntasks-per-node option, squeue would compute the expected
node count. If the --exclusive option is also used, the node
count reported by squeue could be off by a large margin (e.g.
"sbatch --exclusive --ntasks-per-node=1 -N1 .." on requeue
would use the number of CPUs on the allocated node to recompute
the expected node count).
bug 756

4f97cae8
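
The node-count arithmetic behind 4f97cae8 can be illustrated with a minimal sketch. The function name `expected_node_count`, the ceiling-division formula and the 16-CPU node are assumptions for illustration only. This is not squeue's actual code, just a model of how a stale CPU count inflates the estimate.

```python
def expected_node_count(cpu_count: int, ntasks_per_node: int, cpus_per_task: int = 1) -> int:
    """Hypothetical estimate: nodes needed to hold cpu_count CPUs when each
    node runs ntasks_per_node tasks using cpus_per_task CPUs each."""
    cpus_per_node = ntasks_per_node * cpus_per_task
    # Ceiling division: a partially filled node still counts as a whole node.
    return -(-cpu_count // cpus_per_node)

# "sbatch --exclusive --ntasks-per-node=1 -N1" on an assumed 16-CPU node:
# the first run is given the whole node, so its recorded CPU count is 16.
stale_cpus = 16  # CPU count carried over from the previous execution
fresh_cpus = 1   # one task with one CPU, as originally requested

print(expected_node_count(stale_cpus, ntasks_per_node=1))  # 16 (inflated)
print(expected_node_count(fresh_cpus, ntasks_per_node=1))  # 1
```

With the stale count, the estimate grows with the size of the previously allocated node rather than with the job's request, which is why the error could be large.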

Fix issue where batch cpuset wasn't looked at correctly in jobacct_gather/cgroup · c5728294
Danny Auble authored May 13, 2014
c5728294
Support non-standard slurm.conf path · 3bf2adcd
Morris Jette authored May 13, 2014
```
Support SLURM_CONF path which does not have "slurm.conf" as the file name.
bug 803
```
3bf2adcd
Expand log messages · 0f457b94
Morris Jette authored May 13, 2014

0f457b94
Add limits hierarchy documentation · b2cbe311
Morris Jette authored May 13, 2014

b2cbe311

12 May, 2014 7 commits
- Retry step create if node not responding · ffad3102
  Morris Jette authored May 12, 2014
```
If a job has non-responding node, retry job step create rather than
returning with DOWN node error.
bug 734
```
  ffad3102
- Cosmetic mods to NEWS · e17ffc1b
  Morris Jette authored May 12, 2014
  
  e17ffc1b
- Merge branch 'slurm-2.6' into slurm-14.03 · f2372034
  Morris Jette authored May 12, 2014
  
  f2372034
- Fix support for job --profile=none option · 043e1b08
  Puenlap Lee authored May 12, 2014
```
Also correct related documentation
```
  043e1b08
- Make test suite more robust · 6e0ac7dd
  Nathan Yee authored May 12, 2014
```
Add force option to all file removals ("rm ..." to "rm -f ...").
bug 673
```
  6e0ac7dd
- Merge branch 'slurm-2.6' into slurm-14.03 · 455f94f4
  Morris Jette authored May 12, 2014
  
  455f94f4
- fix of comp nodes causing backfill to end early · d508ea95
  Hongjia Cao authored May 12, 2014
```
Completing nodes are now removed when calling _try_sched() for a job,
as is already the case in select_nodes(). Previously, if _try_sched()
concluded the job could run now but select_nodes() returned
ESLURM_NODES_BUSY, the backfill loop would end.
```
  d508ea95
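
The mismatch fixed in d508ea95 can be modelled with a small sketch: if the trial check counts completing nodes as usable but the real allocation skips them, the trial says "run now" while the allocation reports busy. `Node`, `trial_can_run` and `real_allocate` are hypothetical stand-ins for _try_sched() and select_nodes(); the real Slurm functions consider far more than node counts.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    idle: bool
    completing: bool = False


def usable(nodes, skip_completing):
    # Candidate nodes: idle and, if requested, not still completing a job.
    return [n for n in nodes if n.idle and not (skip_completing and n.completing)]


def trial_can_run(nodes, needed, skip_completing):
    # Hypothetical model of the backfill trial check (_try_sched()).
    return len(usable(nodes, skip_completing)) >= needed


def real_allocate(nodes, needed):
    # Hypothetical model of the real allocation (select_nodes()),
    # which always excludes completing nodes.
    candidates = usable(nodes, skip_completing=True)
    if len(candidates) < needed:
        return None  # stands in for ESLURM_NODES_BUSY
    return [n.name for n in candidates[:needed]]


nodes = [Node("n1", idle=True), Node("n2", idle=True, completing=True)]

# Before the fix: the trial counts n2, the real allocation does not.
print(trial_can_run(nodes, 2, skip_completing=False), real_allocate(nodes, 2))  # True None
# After the fix: the trial agrees that the job cannot start yet.
print(trial_can_run(nodes, 2, skip_completing=True))  # False
```

Keeping both checks on the same node filter is the point of the design: a disagreement is what ended the backfill loop early.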
09 May, 2014 5 commits
- Merge remote-tracking branch 'origin/slurm-2.6' into slurm-14.03 · 476a97dc
  Danny Auble authored May 09, 2014
  
  476a97dc
- CRAY - make job_container/cncu default when running on a Cray natively · dbf03e40
  Danny Auble authored May 08, 2014
  
  dbf03e40
- If an invalid assoc_ptr comes in don't use the id to verify it. · 2261d393
  Danny Auble authored May 09, 2014
  
  2261d393
- Fix two memory leaks in jobacct_gather_cgroup.c · 2a0197cb
  Martin Perry authored May 09, 2014
  
  2a0197cb
- Document default SelectTypeParameter value · ad2826a9
  Morris Jette authored May 09, 2014
```
Related to bug 795
```
  ad2826a9
08 May, 2014 4 commits
- Add init/fini function defs to plugin web pages · 03af10f0
  jette authored May 08, 2014
  
  03af10f0
- Fix sinfo -R to print each node once · b5ace9a8
  Morris Jette authored May 07, 2014
```
Fix sinfo -R to print each down/drained node once, rather than once per
partition. This was broken in the sinfo change to process each partition's
information in a separate pthread.
```
  b5ace9a8
- Merge branch 'slurm-2.6' into slurm-14.03 · b3ef449c
  Morris Jette authored May 07, 2014
```
Conflicts:
	src/sinfo/sort.c
```
  b3ef449c
- Correct sinfo sort fields options · ff518ad1
  Morris Jette authored May 07, 2014
```
Correct sinfo --sort fields to match documentation: E -> Reason,
H -> Reason Time (new), R -> Partition Name, u/U -> Reason user (new)
```
  ff518ad1
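
The sinfo -R fix in b5ace9a8 amounts to de-duplicating reason records that are gathered separately for each partition. This is a minimal sketch, assuming each partition's worker returns a list of (node, reason) pairs. The data layout and the `merge_reasons` name are hypothetical and do not reflect sinfo's internal structures.

```python
def merge_reasons(per_partition):
    """Merge (node, reason) lists from every partition, keeping one row per node.

    A node that belongs to several partitions appears in several lists;
    the first occurrence wins and first-seen order is preserved.
    """
    seen = set()
    rows = []
    for records in per_partition:
        for node, reason in records:
            if node not in seen:
                seen.add(node)
                rows.append((node, reason))
    return rows


# Node "n3" is drained and belongs to both partitions.
debug = [("n3", "bad disk")]
batch = [("n3", "bad disk"), ("n7", "memory errors")]
for node, reason in merge_reasons([debug, batch]):
    print(f"{reason:<15} {node}")
```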
07 May, 2014 8 commits
- Fix slurm.spec for SLES11 SP3 · 73192e1e
  David Gloe authored May 07, 2014
```
It turns out SLES 11 SP3 (at least) defines it with a newline, so
this will be a problem for anyone building RPMs on that OS.
```
  73192e1e
- enforce job preemption GraceTime · b8d55249
  Morris Jette authored May 07, 2014
```
Without this patch, jobs with an infinite time limit would have
their preemption GraceTime ignored.
```
  b8d55249
- Add signal option to job submit --help · 5008145c
  Morris Jette authored May 07, 2014
```
For the salloc, sbatch and srun commands, report usage of the
--signal option when the user requests command help.
```
  5008145c
- Disable time limit reset for job being preempted · 52de11ac
  Morris Jette authored May 07, 2014
```
related to bug 789
```
  52de11ac
- CRAY - make switch/cray default when running on a Cray natively · 1c2200db
  Danny Auble authored May 07, 2014
  
  1c2200db
- Fix issue where not enforcing QOS but a partition either allows or denies them · b6333a12
  Danny Auble authored May 06, 2014
  
  b6333a12
- Remove vestigial xassert · e5fde679
  Morris Jette authored May 07, 2014
  
  e5fde679
- fix squeue job array combine logic · 1b02b5a1
  Morris Jette authored May 07, 2014
```
commit 8ddadea5 combined all
pending jobs, even if they had the special exit flag set. This
treats pending and special_exit state jobs differently. Only those
jobs with state pending (and NOT special_exit) are combined in
squeue.
```
  1b02b5a1
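
The rule in 1b02b5a1 (combine only array tasks that are pending and not in the special-exit state) can be expressed as a small filter. In this sketch, special exit is modelled as a boolean flag, and the `Task` type, the `combine` function and the `100_[1,2]` row format are assumptions for illustration. They are not squeue's data structures or its exact output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Task:
    array_job_id: int
    task_id: int
    state: str                 # e.g. "PENDING", "RUNNING"
    special_exit: bool = False


def combine(tasks):
    """Group combinable array tasks into one row; list all others individually."""
    groups = defaultdict(list)
    rows = []
    for t in tasks:
        if t.state == "PENDING" and not t.special_exit:
            groups[t.array_job_id].append(t.task_id)
        else:
            rows.append(f"{t.array_job_id}_{t.task_id}")
    for job_id, ids in groups.items():
        rows.append(f"{job_id}_[{','.join(map(str, sorted(ids)))}]")
    return rows


tasks = [
    Task(100, 1, "PENDING"),
    Task(100, 2, "PENDING"),
    Task(100, 3, "PENDING", special_exit=True),  # kept on its own row
    Task(100, 4, "RUNNING"),
]
print(combine(tasks))  # ['100_3', '100_4', '100_[1,2]']
```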
06 May, 2014 9 commits
- Start NEWS for v14.03.4 · b4f3f38d
  Morris Jette authored May 06, 2014
  
  b4f3f38d
- Update META for v14.03.3-2 tag · 26c17558
  Morris Jette authored May 06, 2014
  
  26c17558
- Improve documentation for prolog/epilog env vars · fadc5610
  Morris Jette authored May 06, 2014
  
  fadc5610
- update news for tag · 70d1e809
  Danny Auble authored May 06, 2014
  
  70d1e809
- Merge remote-tracking branch 'origin/slurm-2.6' into slurm-14.03 · 8b782849
  Danny Auble authored May 06, 2014
  
  8b782849
- BGQ - Fix issue with uninitialized variable. · 950a3fd6
  Danny Auble authored May 06, 2014
  
  950a3fd6
- No need to have this here, it is NULL · c3736a1b
  Danny Auble authored May 06, 2014
  
  c3736a1b
- Start NEWS for v14.03.4 · 3e95dc32
  Morris Jette authored May 05, 2014
  
  3e95dc32
- Update META for v14.03.3 · 67d83e66
  Morris Jette authored May 05, 2014
  
  67d83e66