- 14 May, 2014 4 commits
-
Morris Jette authored
Conflicts: src/slurmctld/job_scheduler.c
-
Morris Jette authored
Run EpilogSlurmctld if a job is killed during slurmctld reconfiguration. bug 806
-
Morris Jette authored
-
Morris Jette authored
A job will be hidden by default only if ALL of its partitions are hidden. bug 812
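A minimal C sketch of that default-visibility rule; the struct and helper names are illustrative stand-ins, not the actual slurmctld code:

    #include <stdbool.h>

    struct part_record {
        bool hidden;    /* partition configured with Hidden=YES */
    };

    /* Hide a job by default only when every partition it may run in
     * is hidden; a single visible partition keeps the job visible. */
    static bool job_hidden_by_default(struct part_record **parts, int cnt)
    {
        for (int i = 0; i < cnt; i++) {
            if (!parts[i]->hidden)
                return false;
        }
        return true;
    }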
-
- 13 May, 2014 12 commits
-
Morris Jette authored
If a batch job launch request cannot be built (the script file is missing, a credential cannot be created, or the user does not exist on the selected compute node), then cancel the job in a graceful fashion. Previously, the bad RPC would be sent to the compute node and that node DRAINED. See bug 807.
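A hedged sketch of the new control flow; every name here (build_batch_launch_rpc, kill_job_gracefully, send_rpc_to_node) is a hypothetical stand-in for the real slurmctld helpers:

    #include <stddef.h>

    struct job_rec; struct node_rec; struct launch_rpc;

    /* hypothetical stand-ins for the real slurmctld functions */
    struct launch_rpc *build_batch_launch_rpc(struct job_rec *job);
    void kill_job_gracefully(struct job_rec *job);
    void send_rpc_to_node(struct node_rec *node, struct launch_rpc *rpc);

    static void launch_batch_job(struct job_rec *job, struct node_rec *node)
    {
        struct launch_rpc *rpc = build_batch_launch_rpc(job);
        if (rpc == NULL) {
            /* Script missing, credential creation failed, or user
             * unknown on the node: cancel the job gracefully instead
             * of sending a bad RPC that would leave the node DRAINED. */
            kill_job_gracefully(job);
            return;
        }
        send_rpc_to_node(node, rpc);
    }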
-
Morris Jette authored
-
Morris Jette authored
Correct SelectTypeParameters=CR_LLN with job selection of specific nodes. Previous logic would in most instances allocate resources on all nodes to the job.
-
Morris Jette authored
Correct squeue's job node and CPU counts for requeued jobs. Previously, when a job was requeued, the CPU count reported was that of its previous execution. When combined with the --ntasks-per-node option, squeue would compute the expected node count from that stale value. If the --exclusive option was also used, the node count reported by squeue could be off by a large margin (e.g. "sbatch --exclusive --ntasks-per-node=1 -N1 .." on requeue would use the number of CPUs on the previously allocated node to recompute the expected node count). bug 756
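A worked example of the miscount, assuming a hypothetical 16-CPU node; the arithmetic mirrors the recomputation described above:

    #include <stdio.h>

    int main(void)
    {
        unsigned cpus_prev_run   = 16;  /* --exclusive: whole 16-CPU node */
        unsigned ntasks_per_node = 1;   /* from --ntasks-per-node=1 */
        /* ceil(cpus / ntasks_per_node), recomputed from the stale count */
        unsigned nodes = (cpus_prev_run + ntasks_per_node - 1) /
                         ntasks_per_node;
        printf("expected nodes = %u (job requested -N1)\n", nodes); /* 16 */
        return 0;
    }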
-
David Gloe authored
req.c: In function '_launch_complete_rm':
req.c:5372: error: array subscript is above array bounds
req.c: In function '_launch_complete_add':
req.c:5328: error: array subscript is above array bounds

The offending line in each function is "if (job_id != active_job_id[j]) {" after the for loop. If no match is found in the loop, j will be JOB_STATE_CNT and overflow the array by one.
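A minimal sketch of the off-by-one pattern and its fix; only the names active_job_id and JOB_STATE_CNT come from the message above, the array size and function name are illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    #define JOB_STATE_CNT 16                    /* size is illustrative */
    static uint32_t active_job_id[JOB_STATE_CNT];

    static bool _launch_complete_test(uint32_t job_id)
    {
        int j;
        for (j = 0; j < JOB_STATE_CNT; j++) {
            if (active_job_id[j] == job_id)
                break;
        }
        /* Buggy: if no match was found, j == JOB_STATE_CNT here, so
         *     if (job_id != active_job_id[j])
         * reads one element past the end of the array.
         * Fixed: test the loop index before indexing. */
        if ((j >= JOB_STATE_CNT) || (job_id != active_job_id[j]))
            return false;
        return true;
    }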
-
Morris Jette authored
-
Danny Auble authored
jobacct_gather/cgroup.
-
Morris Jette authored
Support a SLURM_CONF path which does not have "slurm.conf" as the file name. bug 803
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
For a nested batch job (i.e. within an salloc session, running "sbatch --jobid=$SLURM_JOBID ..."), report the completing node rank as 0 rather than -1.
-
Morris Jette authored
-
- 12 May, 2014 13 commits
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
If a job has a non-responding node, retry job step creation rather than returning a DOWN node error. bug 734
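A hedged sketch of the retry behavior; the return codes, helper name, and retry interval are hypothetical stand-ins for the real step-creation path:

    #include <unistd.h>

    enum step_rc { STEP_OK, STEP_NODE_NOT_RESPONDING, STEP_FATAL };

    enum step_rc create_job_step(void);     /* hypothetical stand-in */

    static int create_step_with_retry(void)
    {
        for (;;) {
            enum step_rc rc = create_job_step();
            if (rc == STEP_NODE_NOT_RESPONDING) {
                /* Retry rather than failing with a DOWN node error;
                 * the node may recover or be replaced. */
                sleep(10);                  /* interval is illustrative */
                continue;
            }
            return (rc == STEP_OK) ? 0 : -1;
        }
    }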
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Puenlap Lee authored
Also correct related documentation
-
Morris Jette authored
-
Nathan Yee authored
Add force option to all file removals ("rm ..." to "rm -f ..."). bug 673
-
Morris Jette authored
-
Morris Jette authored
-
Hongjia Cao authored
Completing nodes are removed when calling _try_sched() for a job, which is the case in select_nodes(). If _try_sched() thinks the job can run now but select_nodes() returns ESLURM_NODES_BUSY, the backfill loop will be ended early.
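A sketch of the corrected backfill loop; _try_sched(), select_nodes() and ESLURM_NODES_BUSY are the names from the message above, while the loop body, signatures, and error-code value are illustrative:

    struct job_rec;

    /* stand-ins for the real scheduler entry points */
    int _try_sched(struct job_rec *job);    /* nonzero if job looks runnable */
    int select_nodes(struct job_rec *job);  /* 0 on success, else ESLURM_* */

    #define ESLURM_NODES_BUSY 2015          /* value is illustrative */

    static void backfill_pass(struct job_rec **jobs, int job_cnt)
    {
        for (int i = 0; i < job_cnt; i++) {
            if (!_try_sched(jobs[i]))
                continue;
            if (select_nodes(jobs[i]) == ESLURM_NODES_BUSY) {
                /* Nodes were still completing: skip this job and keep
                 * scanning rather than ending the backfill loop. */
                continue;
            }
            /* job started; bookkeeping omitted */
        }
    }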
-
- 09 May, 2014 10 commits
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
Martin Perry authored
-
Morris Jette authored
Do not resume a job with specialized cores on a node running another job with specialized cores (only one such job can run on a node at a time). bug 792
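A minimal sketch of that resume guard; the types and names are illustrative, not the actual slurmctld code:

    #include <stdbool.h>
    #include <stdint.h>

    struct job_rec {
        uint16_t core_spec_cnt;  /* specialized cores requested, 0 if none */
        bool     running;
    };

    /* A job with specialized cores may not resume on a node where
     * another job with specialized cores is already running. */
    static bool can_resume_core_spec_job(struct job_rec *job,
                                         struct job_rec **node_jobs,
                                         int job_cnt)
    {
        if (job->core_spec_cnt == 0)
            return true;
        for (int i = 0; i < job_cnt; i++) {
            if (node_jobs[i]->running && node_jobs[i]->core_spec_cnt)
                return false;
        }
        return true;
    }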
-
Morris Jette authored
Fix dead initialization and memory leak in nonstop
-
Morris Jette authored
Conflicts: doc/man/man5/slurm.conf.5
-
Morris Jette authored
Related to bug 795
-
- 08 May, 2014 1 commit
-
jette authored
-