- 12 May, 2014 5 commits
- Morris Jette authored
- Morris Jette authored: If a job has a non-responding node, retry job step creation rather than returning a DOWN node error. Bug 734.
- Morris Jette authored
- Puenlap Lee authored: Also correct related documentation.
- Hongjia Cao authored: Completing nodes are removed when calling _try_sched() for a job, as is the case in select_nodes(). If _try_sched() thinks the job can run now but select_nodes() returns ESLURM_NODES_BUSY, the backfill loop will end.
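The backfill interaction in the last commit above can be sketched roughly as follows. All names and the error value here are illustrative stand-ins, not SLURM's actual internals:

```c
#include <stdbool.h>

/* Illustrative stand-in only; not SLURM's real error code. */
#define ESLURM_NODES_BUSY 2022

/* _try_sched() masks out completing nodes, so it can report a job
 * runnable even though those nodes are still busy completing. */
static bool try_sched_stub(bool nodes_completing)
{
	(void) nodes_completing;	/* completing nodes ignored here */
	return true;			/* job appears runnable */
}

/* select_nodes() does see the completing nodes and reports them busy. */
static int select_nodes_stub(bool nodes_completing)
{
	return nodes_completing ? ESLURM_NODES_BUSY : 0;
}

/* Returns true if this mismatch ends the backfill loop early. */
static bool backfill_ends_early(bool nodes_completing)
{
	if (!try_sched_stub(nodes_completing))
		return false;	/* job skipped; loop continues */
	return select_nodes_stub(nodes_completing) == ESLURM_NODES_BUSY;
}
```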
- 09 May, 2014 3 commits
- Danny Auble authored
- Danny Auble authored
- Morris Jette authored: Do not resume a job with specialized cores on a node running another job with specialized cores (only one can run at a time). Bug 792.
- 08 May, 2014 2 commits
- Morris Jette authored: Fix sinfo -R to print each down/drained node once, rather than once per partition. This was broken by the sinfo change that processes each partition's information in a separate pthread.
- Morris Jette authored: Correct sinfo --sort fields to match the documentation: E => Reason, H => Reason Time (new), R => Partition Name, u/U => Reason user (new).
- 07 May, 2014 5 commits
- Morris Jette authored: Without this patch, jobs with an infinite time limit would have their preemption GraceTime ignored.
- Morris Jette authored: Related to bug 789.
- Danny Auble authored
- Danny Auble authored: them.
- Morris Jette authored: Added the ChosLoc configuration parameter in slurm.conf (Chroot OS tool location). Bug 685.
- 06 May, 2014 6 commits
- Morris Jette authored
- Danny Auble authored
- Danny Auble authored
- Morris Jette authored
- Morris Jette authored: In the slurm.spec file, replace "Requires cray-MySQL-devel-enterprise" with "Requires mysql-devel" per David Gloe.
- Morris Jette authored: Permit job steps full control over cpu_bind options if specialized cores are included in the job allocation. Bug 782.
- 05 May, 2014 5 commits
- Danny Auble authored
- Danny Auble authored
- Danny Auble authored: Related to bug 771.
- Morris Jette authored: Version 14.03.2 was using "slurm_<jobid>_4294967294.out" due to an error in the job array logic.
- Danny Auble authored: cnode counts.
- 02 May, 2014 4 commits
- Danny Auble authored
- Danny Auble authored: This is for bug 775.
- Danny Auble authored
- Danny Auble authored
- 01 May, 2014 6 commits
- Danny Auble authored: Regression from 2a674aee.
- Danny Auble authored: is running.
- Danny Auble authored
- Danny Auble authored
- Danny Auble authored: is running.
- Danny Auble authored
- 30 Apr, 2014 3 commits
- David Bigagli authored: together.
- Morris Jette authored: Switch/nrt: properly track usage of CAU and RDMA resources with multiple tasks per compute node. The previous logic would allocate resources once per task and then deallocate once per node, leaking CAU and RDMA resources and preventing their use by future jobs.
- Morris Jette authored: If a job is held, then only release it with the "scontrol release <jobid>" command rather than a simple reset of the job's priority. This is needed to better support job arrays. Otherwise a priority reset of a job array would free all requeued/held jobs from that job array rather than leaving them held.
- 29 Apr, 2014 1 commit
- Morris Jette authored: Modify slurmd to keep track of which jobs have already been launched. If the launch is complete, then process suspend requests immediately. Previously the suspend request was always delayed by one second, which adversely impacts gang scheduling performance. If the job can't be found (say, after a slurmd restart), then delay the suspend by up to three seconds, but only once.
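The suspend-delay policy in that commit can be sketched as below; the struct and function names are hypothetical, not slurmd's actual code:

```c
#include <stdbool.h>

/* Hypothetical sketch of the suspend-delay policy; not slurmd code. */
struct suspend_state {
	bool launch_complete;	/* this job's launch already finished */
	bool delayed_once;	/* we already waited for this job once */
};

/* Returns how many seconds to wait before honoring a suspend request:
 * 0 if the launch is known complete (act immediately), otherwise up
 * to 3 seconds, but only for the first request (e.g. when the job
 * record is missing after a slurmd restart). */
static int suspend_delay_secs(struct suspend_state *s)
{
	if (s->launch_complete)
		return 0;	/* launch done: suspend immediately */
	if (s->delayed_once)
		return 0;	/* never delay the same job twice */
	s->delayed_once = true;
	return 3;
}
```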