Commits · 4f97cae845a71305e2d3fc3116b0350288f6919c · Manuel G. Marciani / ces_slurm_simulator

13 May, 2014 3 commits

Correct squeue job node & CPU counts on requeue · 4f97cae8

Morris Jette authored May 13, 2014

Correct squeue's job node and CPU counts for requeued jobs.
Previously, when a job was requeued, the CPU count reported for it
was that of its previous execution. When combined with the
--ntasks-per-node option, squeue would compute the expected
node count. If the --exclusive option is also used, the node
count reported by squeue could be off by a large margin (e.g.
"sbatch --exclusive --ntasks-per-node=1 -N1 .." on requeue
would use the number of CPUs on the allocated node to recompute
the expected node count).
bug 756

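The miscount can be sketched with a simplified model. It assumes squeue derives the expected node count by dividing the reported CPU count by --ntasks-per-node and rounding up (one CPU per task). The function name and the 16-CPU node size are hypothetical, not squeue's actual code:

```python
def expected_node_count(cpu_count: int, ntasks_per_node: int) -> int:
    """Hypothetical estimate: nodes needed for cpu_count tasks when each node
    runs ntasks_per_node tasks (one CPU per task assumed). Rounds up."""
    return -(-cpu_count // ntasks_per_node)


# "sbatch --exclusive --ntasks-per-node=1 -N1 ..." as first submitted: 1 CPU.
fresh = expected_node_count(cpu_count=1, ntasks_per_node=1)

# After a requeue, the stale CPU count is every CPU of the exclusively
# allocated node from the previous run (assumed here to be 16).
stale = expected_node_count(cpu_count=16, ntasks_per_node=1)

print(fresh, stale)  # 1 16
```

Reporting the CPU count of the pending request, rather than that of the previous run, keeps the estimate at one node.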

Fix issue where batch cpuset wasn't looked at correctly in jobacct_gather/cgroup. · c5728294
Danny Auble authored May 13, 2014
Support non-standard slurm.conf path · 3bf2adcd
Morris Jette authored May 13, 2014
```
Support SLURM_CONF path which does not have "slurm.conf" as the file name.
bug 803
```
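
A minimal sketch of the idea. It assumes the configured path is used to locate companion files (for example cgroup.conf) in the same directory as the file SLURM_CONF names. The helper name is hypothetical. It takes the directory part of the path instead of relying on a "slurm.conf" file name:

```python
import posixpath


def extra_conf_path(slurm_conf: str, conf_name: str) -> str:
    """Hypothetical helper: path of conf_name in the same directory as the
    file named by SLURM_CONF, regardless of that file's own name."""
    directory = posixpath.dirname(slurm_conf)  # "" if the path has no slash
    return posixpath.join(directory, conf_name) if directory else conf_name


print(extra_conf_path("/opt/site/cluster-a.conf", "cgroup.conf"))
# /opt/site/cgroup.conf
```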

12 May, 2014 4 commits

Retry step create if node not responding · ffad3102

Morris Jette authored May 12, 2014

If a job has a non-responding node, retry job step creation rather than
returning with a DOWN node error.
bug 734


Cosmetic mods to NEWS · e17ffc1b
Morris Jette authored May 12, 2014

Fix support for job --profile=none option · 043e1b08
Puenlap Lee authored May 12, 2014
```
Also correct related documentation
```

fix of comp nodes causing backfill to end early · d508ea95

Hongjia Cao authored May 12, 2014

Completing nodes are removed when calling _try_sched() for a job, as is
the case in select_nodes(). Otherwise, if _try_sched() determines that the
job can run now but select_nodes() returns ESLURM_NODES_BUSY, the backfill
loop ends early.

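A simplified model of why the two functions must agree. It assumes both reduce to "are enough usable nodes left?". The state strings and function signatures here are hypothetical stand-ins for _try_sched() and select_nodes():

```python
def usable_nodes(nodes: dict) -> list:
    """Nodes that can take new work now; COMPLETING nodes are excluded."""
    return [name for name, state in nodes.items() if state != "COMPLETING"]


def try_sched(nodes: dict, needed: int, drop_completing: bool = True) -> bool:
    # Trial placement, as backfill does before committing to a start.
    pool = usable_nodes(nodes) if drop_completing else list(nodes)
    return len(pool) >= needed


def select_nodes(nodes: dict, needed: int) -> bool:
    # Real placement never counts COMPLETING nodes.
    return len(usable_nodes(nodes)) >= needed


nodes = {"n1": "IDLE", "n2": "COMPLETING"}
# Old behavior: the trial says "run now" but the real selection fails
# (the ESLURM_NODES_BUSY case that ended the backfill loop).
print(try_sched(nodes, 2, drop_completing=False), select_nodes(nodes, 2))
# Fixed behavior: both see the same node set and agree.
print(try_sched(nodes, 2), select_nodes(nodes, 2))
```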

09 May, 2014 2 commits
- CRAY - make job_container/cncu default when running on a Cray natively · dbf03e40
  Danny Auble authored May 08, 2014
  
- If an invalid assoc_ptr comes in don't use the id to verify it. · 2261d393
  Danny Auble authored May 09, 2014
  
08 May, 2014 2 commits

Fix sinfo -R to print each node once · b5ace9a8

Morris Jette authored May 07, 2014

Fix sinfo -R to print each down/drained node once, rather than once per
partition. This was broken in the sinfo change to process each partition's
information in a separate pthread.

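A simplified model of the corrected output. It assumes each partition's worker yields (node, reason) records, so a node in several partitions appears once per partition. Keeping a set of nodes already printed emits each node once. The data layout is hypothetical, not sinfo's internals:

```python
def reasons_once(per_partition: dict) -> list:
    """Merge per-partition (node, reason) records, keeping each node once.

    per_partition maps a partition name to the records its worker produced;
    partitions are visited in insertion order."""
    seen = set()
    merged = []
    for records in per_partition.values():
        for node, reason in records:
            if node not in seen:
                seen.add(node)
                merged.append((node, reason))
    return merged


records = {
    "debug": [("n01", "disk failure")],
    "batch": [("n01", "disk failure"), ("n07", "maintenance")],
}
print(reasons_once(records))
```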

Correct sinfo sort fields options · ff518ad1

Morris Jette authored May 07, 2014

Correct sinfo --sort fields to match documentation: E -> Reason,
H -> Reason Time (new), R -> Partition Name, u/U -> Reason User (new).


07 May, 2014 4 commits
- enforce job preemption GraceTime · b8d55249
  Morris Jette authored May 07, 2014
```
Without this patch, jobs with an infinite time limit would have
their preemption GraceTime ignored.
```
- Disable time limit reset for job being preempted · 52de11ac
  Morris Jette authored May 07, 2014
```
related to bug 789
```
- CRAY - make switch/cray default when running on a Cray natively · 1c2200db
  Danny Auble authored May 07, 2014
  
- Fix issue where not enforcing QOS but a partition either allows or denies them. · b6333a12
  Danny Auble authored May 06, 2014
06 May, 2014 5 commits
- Start NEWS for v14.03.4 · b4f3f38d
  Morris Jette authored May 06, 2014
  
- update news for tag · 70d1e809
  Danny Auble authored May 06, 2014
  
- BGQ - Fix issue with uninitialized variable. · 950a3fd6
  Danny Auble authored May 06, 2014
  
- Start NEWS for v14.03.4 · 3e95dc32
  Morris Jette authored May 05, 2014
  
- in slurm.spec, remove cray-mysql-devel requirement · f85e362c
  Morris Jette authored May 05, 2014
```
In slurm.spec file, replace "Requires cray-MySQL-devel-enterprise" with
"Requires mysql-devel" per David Gloe.
```
05 May, 2014 5 commits
- Fix perlapi to compile correctly with perl 5.18 · 21ebf585
  Danny Auble authored May 05, 2014
  
- Handle node ranges better when dealing with accounting max node limits. · d849aadb
  Danny Auble authored May 05, 2014
  
- BGQ - Move code to only start job on a block after limits are checked. · 3a4246cc
  Danny Auble authored May 05, 2014
```
Related to bug 771
```
- Correct default batch job output file name · 4334ab7d
  Morris Jette authored May 05, 2014
```
In version 14.03.2 the default name was "slurm_<jobid>_4294967294.out" due to
an error in the job array logic.
```
- BGQ - Fix issue where limits were checked on midplane counts instead of cnode counts. · 836b654f
  Danny Auble authored May 05, 2014
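
The "Correct default batch job output file name" entry above involves a specific value. The stray 4294967294 is 0xfffffffe, which matches Slurm's NO_VAL sentinel. In other words, an unset array task ID was formatted as if it were a real one. A minimal sketch of the intended default naming follows. It uses the sbatch defaults of slurm-%j.out for ordinary jobs and slurm-%A_%a.out for job array tasks. The function itself is hypothetical:

```python
NO_VAL = 0xFFFFFFFE  # 4294967294, Slurm's "value not set" sentinel


def default_output_name(job_id: int, array_job_id: int, array_task_id: int) -> str:
    """Hypothetical sketch of the default batch output file name.

    Ordinary jobs carry array_task_id == NO_VAL and get "slurm-%j.out";
    job array tasks get "slurm-%A_%a.out"."""
    if array_task_id == NO_VAL:
        return f"slurm-{job_id}.out"
    return f"slurm-{array_job_id}_{array_task_id}.out"
```

For example, `default_output_name(1234, 0, NO_VAL)` gives `slurm-1234.out`, while task 4 of array 1230 gives `slurm-1230_4.out`.
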
02 May, 2014 4 commits
- Update NEWS for next version · 87080f15
  Danny Auble authored May 02, 2014
  
- Handle node ranges better when dealing with accounting max node limits. · c6833796
  Danny Auble authored May 02, 2014
```
This is for bug 775
```
- BGQ - Temp fix issue where job could be left on job_list after it finished. · e4f1a099
  Danny Auble authored May 02, 2014
  
- Fix issue where user is requesting --acctg-freq=0 and no memory limits. · 17e4e2ac
  Danny Auble authored May 02, 2014
  
01 May, 2014 4 commits
- Fix allowgroup on bad group seg fault with the controller. · 76846134
  Danny Auble authored May 01, 2014
```
regression from 2a674aee
```
- Temporary fix for handling our typemap for the perl api with newer perl. · bffdc7e2
  Danny Auble authored May 01, 2014
  
- Fix issue with GrpCPURunMins if a job's timelimit is altered while the job is running. · 98de72e4
  Danny Auble authored Apr 30, 2014
- Fix issue where user is requesting --acctg-freq=0 and no memory limits. · 0018cdf4
  Danny Auble authored Apr 30, 2014
  
30 Apr, 2014 3 commits

Correct squeue to not merge jobs with state pending and completing together. · 8ddadea5
David Bigagli authored Apr 30, 2014

switch/nrt - CAU and RDMA tracking correction · 6f66fdef

Morris Jette authored Apr 30, 2014

Switch/nrt - Properly track usage of CAU and RDMA resources with multiple
tasks per compute node. Previous logic would allocate resources once per
task and then deallocate once per node, leaking CAU and RDMA resources
and preventing their use by future jobs.

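A toy model of the leak. It assumes each task takes one unit of an adapter resource, used here as a generic stand-in for CAU or RDMA. The class and functions are hypothetical, not the switch/nrt plugin's code. Releasing one unit per node after allocating one per task strands the difference:

```python
from collections import Counter


class AdapterPool:
    """Hypothetical per-node pool of adapter resource units."""

    def __init__(self, units_per_node: dict):
        self.free = Counter(units_per_node)

    def allocate(self, node: str, count: int) -> None:
        if self.free[node] < count:
            raise RuntimeError(f"{node}: only {self.free[node]} units free")
        self.free[node] -= count

    def release(self, node: str, count: int) -> None:
        self.free[node] += count


def run_step(pool: AdapterPool, nodes: list, tasks_per_node: int,
             per_node_release: bool = False) -> None:
    # Allocation is per task: tasks_per_node units on every node.
    for node in nodes:
        pool.allocate(node, tasks_per_node)
    # Release must mirror allocation; releasing once per node leaks.
    for node in nodes:
        pool.release(node, 1 if per_node_release else tasks_per_node)


good = AdapterPool({"n1": 4, "n2": 4})
run_step(good, ["n1", "n2"], tasks_per_node=4)
print(dict(good.free))  # {'n1': 4, 'n2': 4}

leaky = AdapterPool({"n1": 4, "n2": 4})
run_step(leaky, ["n1", "n2"], tasks_per_node=4, per_node_release=True)
print(dict(leaky.free))  # {'n1': 1, 'n2': 1}: a second 4-task step cannot start
```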

ignore prio reset on held jobs · cbcea672

Morris Jette authored Apr 30, 2014

If a job is held, release it only with the "scontrol release <jobid>"
command rather than with a simple reset of the job's priority. This is
needed to better support job arrays; otherwise, a priority reset of a job
array would free all requeued/held jobs in that job array rather than
leaving them held.

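A minimal model of the intended behavior. It assumes a hypothetical job record with an explicit held flag. A priority reset applied across a job array leaves held members alone. Only an explicit release, the role played by "scontrol release <jobid>", clears the hold:

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; held jobs are not eligible to run."""
    job_id: int
    priority: int
    held: bool = False


def reset_priority(jobs: list, priority: int) -> None:
    """Priority reset across, e.g., every task of a job array."""
    for job in jobs:
        if not job.held:  # a reset must not implicitly release a held job
            job.priority = priority


def release(job: Job, priority: int) -> None:
    """Explicit release, the only way a held job becomes eligible again."""
    job.held = False
    job.priority = priority


array = [Job(101, 500), Job(102, 0, held=True), Job(103, 500)]
reset_priority(array, 800)
print([(j.job_id, j.priority, j.held) for j in array])
```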

28 Apr, 2014 3 commits
- Fix segfault of sacct -c if spaces are in the variables. · 61641594
  Danny Auble authored Apr 28, 2014
  
- Fix sacct -c when using jobcomp/filetxt to read variables that were added in 2.0 :) · d6ab20b7
  Danny Auble authored Apr 28, 2014
- Honor partition priorities over job priorities. · b36f83cf
  Morris Jette authored Apr 28, 2014
```
Previously partition priority was only considered when used as a
component of a job's priority with the priority/multifactor plugin.
Now the partition priority is considered first, as documented,
and the job priority is considered second.
bug 764
```
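
For the "Honor partition priorities over job priorities" entry above, the new ordering amounts to a two-level sort key. This sketch assumes a higher value means higher priority at both levels. The tuple layout is hypothetical:

```python
def scheduling_order(jobs: list) -> list:
    """jobs: (job_id, partition_priority, job_priority) tuples.

    Partition priority is compared first; job priority only breaks ties."""
    ordered = sorted(jobs, key=lambda job: (-job[1], -job[2]))
    return [job_id for job_id, _, _ in ordered]


jobs = [(1, 1, 900), (2, 10, 5), (3, 10, 50)]
print(scheduling_order(jobs))  # [3, 2, 1]
```

Sorting on job priority alone would instead give [1, 3, 2].
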
26 Apr, 2014 1 commit
- Add --priority to job submit commands · 71aca8a8
  Stuart Midgley authored Apr 25, 2014
```
Add --priority option to the salloc, sbatch and srun commands.
```