Commits · 9cddcaf9eaa12ddf647d74a49ef6a4896eab47d6 · Manuel G. Marciani / ces_slurm_simulator

23 Aug, 2013 2 commits

Clarify equivalent sacct options in man page · 9cddcaf9
Morris Jette authored Aug 23, 2013

9cddcaf9

Correct value of min_nodes returned by loading job info · 98e24b0d

Morris Jette authored Aug 23, 2013

This corrects a bug introduced in commit
https://github.com/SchedMD/slurm/commit/ac44db862c8d1f460e55ad09017d058942ff6499
That commit eliminated squeue's need to read node state information,
for performance reasons (mostly for large parallel systems in which the
Prolog ran squeue, generating many simultaneous RPCs and slowing down
the job launch process). It also assumed 1 CPU per node, so if a
pending job specified a node count of 1 and a task count larger than
one, squeue reported the job's node count as equal to its task count.
This patch moves the calculation of a pending job's minimum node count
into slurmctld, so squeue still does not need to read the node
information but can report the correct node count for pending jobs
with minimal overhead.

98e24b0d
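
The effect of the 1-CPU-per-node assumption described in 98e24b0d can be illustrated with a small sketch. This is not Slurm's actual code: `est_min_nodes` and its parameters are hypothetical names, and the rule of one task per CPU is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not taken from Slurm): estimate a pending job's
 * minimum node count from the requested node count, the task count, and
 * an assumed number of CPUs per node, at one task per CPU. */
uint32_t est_min_nodes(uint32_t req_nodes, uint32_t num_tasks,
                       uint32_t cpus_per_node)
{
    assert(cpus_per_node > 0);

    /* Nodes needed to place every task (ceiling division). */
    uint32_t by_tasks = (num_tasks + cpus_per_node - 1) / cpus_per_node;

    /* Never report fewer nodes than the job explicitly requested. */
    return (by_tasks > req_nodes) ? by_tasks : req_nodes;
}
```

With 1 node and 16 tasks requested, `est_min_nodes(1, 16, 1)` yields 16, which matches the misreported count, while `est_min_nodes(1, 16, 16)` yields 1.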

22 Aug, 2013 8 commits
- Avoid same name in 2 different header files. run_backup is defined · 0f94cba0
  Danny Auble authored Aug 22, 2013
```
in slurmctld.h which is included by slurm_accounting_storage.h which is
included by slurmdbd.c which would cause confusion at the very least.
```
  0f94cba0
- BackupController - Make sure we have a connection to the DBD first thing · 8e3ab25f
  Danny Auble authored Aug 22, 2013
```
to avoid it thinking we don't have a cluster name.
```
  8e3ab25f
- Merge branch 'slurm-2.6' of https://github.com/SchedMD/slurm into slurm-2.6 · 4b027a88
  jette authored Aug 22, 2013
  
  4b027a88
- Enable faster termination of backup controller on SIGTERM · 86baeefb
  jette authored Aug 22, 2013
```
Previously there was a sleep(5) during which the backup controller
was unresponsive while starting up or returning from primary
mode.
```
  86baeefb
- Clear some pthread values after join · c962ef6e
  jette authored Aug 22, 2013
```
This prevents possible confusion in the backup controller
when it switches from primary back to backup mode, since those
pthread IDs are no longer valid. Note that thread_id_rpc could be
used by the backup controller after returning to backup mode.
```
  c962ef6e
- Add description of SSSD use with Slurm on FAQ · 6c339b2c
  Morris Jette authored Aug 21, 2013
  
  6c339b2c
- News for last update · 7da8e149
  Danny Auble authored Aug 21, 2013
  
  7da8e149
- Allow users who are coordinators to update their own limits in the accounts · 673e5f40
  Danny Auble authored Aug 21, 2013
```
they are coordinators over.
```
  673e5f40
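
The idea behind c962ef6e (clearing pthread values after a join) can be sketched as follows. This is a minimal illustration, not the slurmctld code: the `ctl_threads` struct, its fields, and the helper functions are hypothetical, and an explicit validity flag is used here because `pthread_t` is an opaque type.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-in for a controller's thread bookkeeping. */
struct ctl_threads {
    pthread_t rpc_thread;   /* meaningful only while rpc_thread_valid != 0 */
    int rpc_thread_valid;
};

/* Placeholder worker that exits immediately. */
static void *rpc_worker(void *arg)
{
    (void) arg;
    return NULL;
}

/* Start the worker; refuse if a live ID is already recorded. */
int ctl_start_rpc(struct ctl_threads *t)
{
    assert(t != NULL);
    if (t->rpc_thread_valid)
        return -1;
    if (pthread_create(&t->rpc_thread, NULL, rpc_worker, NULL) != 0)
        return -1;
    t->rpc_thread_valid = 1;
    return 0;
}

/* Join the worker, then clear the stored ID so later code (for example
 * after a switch of controller mode) cannot act on a stale handle.
 * Returns 1 if a thread was joined, 0 if there was nothing to join. */
int ctl_join_rpc(struct ctl_threads *t)
{
    assert(t != NULL);
    if (!t->rpc_thread_valid)
        return 0;
    if (pthread_join(t->rpc_thread, NULL) != 0)
        return -1;
    memset(&t->rpc_thread, 0, sizeof(t->rpc_thread));
    t->rpc_thread_valid = 0;
    return 1;
}
```

Calling `ctl_join_rpc` a second time is then harmless, because the cleared state marks the ID as no longer usable.
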
21 Aug, 2013 2 commits

Fix of wrong node/job state problem after reconfig · d80c8667

Hongjia Cao authored Aug 21, 2013

If there are completing jobs, a reconfigure sets the wrong job/node
state: all nodes of the completing job are set to allocated, and the
job is not removed even after the completing nodes are released. The
state can only be restored by restarting slurmctld after the completing
nodes are released.

d80c8667

Add links to showq command from TACC · 934cbbbf
Morris Jette authored Aug 21, 2013

934cbbbf

20 Aug, 2013 4 commits
- Fix issue with reconfig and GrpCPURunMins · 6d793189
  Danny Auble authored Aug 20, 2013
  
  6d793189
- Remove unneeded checks (code was removed a while ago) · 6e22ae26
  Danny Auble authored Aug 20, 2013
  
  6e22ae26
- Remove some trailing spaces on web page · 770fb392
  Morris Jette authored Aug 19, 2013
  
  770fb392
- Add link for Alan Orth's interactive script · 896608ad
  jette authored Aug 19, 2013
  
  896608ad
19 Aug, 2013 1 commit
- Add upgrade section to FAQ · 4be8a9ca
  jette authored Aug 18, 2013
  
  4be8a9ca
18 Aug, 2013 1 commit
- Major updates to admin guide for upgrades · c594a2b0
  jette authored Aug 18, 2013
  
  c594a2b0
17 Aug, 2013 5 commits
- Start NEWS for v2.6.2 · 9f334c91
  Morris Jette authored Aug 16, 2013
  
  9f334c91
- Update META for v2.6.1 tag · fe46ab52
  Morris Jette authored Aug 16, 2013
  
  fe46ab52
- Add FAQ about use of FOSS · 3e497a6f
  Morris Jette authored Aug 16, 2013
  
  3e497a6f
- Correct name in publication web page · 90da197b
  Morris Jette authored Aug 16, 2013
  
  90da197b
- Slightly better debugging · 124f9aaa
  Danny Auble authored Aug 16, 2013
  
  124f9aaa
16 Aug, 2013 5 commits
- Protocol compatibility fix. · 94c67627
  David Bigagli authored Aug 16, 2013
  
  94c67627
- Sanity check · d1056efb
  Danny Auble authored Aug 16, 2013
  
  d1056efb
- Minor spelling corrections · 07a4c03c
  Danny Auble authored Aug 16, 2013
  
  07a4c03c
- Fix issue with last patch dealing with deadlock from 2.5 slurmstepd · 890b6d38
  Danny Auble authored Aug 16, 2013
```
-> 2.6 slurmd
```
  890b6d38
- Fix issue with a 2.5 slurmstepd locking up when talking to a 2.6 slurmd. · e804c9bb
  Danny Auble authored Aug 15, 2013
  
  e804c9bb
15 Aug, 2013 5 commits
- CRAY - Fix issue with accelerators on a Cray when parsing BASIL 1.3 XML. · c30fe1b3
  Danny Auble authored Aug 15, 2013
  
  c30fe1b3
- Fix issue with potentially referencing past an array in parse_time() · 2833c19a
  Danny Auble authored Aug 15, 2013
  
  2833c19a
- Fix in accounting_storage/filetxt to correct start times which sometimes · 9eba4384
  Danny Auble authored Aug 15, 2013
```
could end up before the job started. Bug 371
```
  9eba4384
- Fixed deadlock issue with new priority fix · 9dd3e445
  Danny Auble authored Aug 15, 2013
  
  9dd3e445
- Fix CPURunMins if a job is requeued from a failed launch. · 8aaa817e
  Danny Auble authored Aug 14, 2013
  
  8aaa817e
14 Aug, 2013 7 commits
- Merge branch 'slurm-2.6' of https://github.com/SchedMD/slurm into slurm-2.6 · 27bf1bbd
  jette authored Aug 14, 2013
  
  27bf1bbd
- Change test due to accounting frequency change · 2f65854c
  jette authored Aug 14, 2013
```
We now reject jobs with an invalid accounting frequency at
submit time rather than launch time, so the error is slightly
different and the test needs to change for that.
```
  2f65854c
- Fairly major update to Consumable Resources web page · d97bd588
  Morris Jette authored Aug 14, 2013
  
  d97bd588
- Validate a job's accounting frequency at submission time · 26560fa5
  Morris Jette authored Aug 14, 2013
```
This avoids waiting for the job's initiation to fail.
```
  26560fa5
- Do not drain a node if a job's accounting frequency is bad · df4507cc
  Morris Jette authored Aug 14, 2013
```
Only cancel the job.
```
  df4507cc
- Fix job state recovery logic for accounting frequency · 6d878aa7
  Morris Jette authored Aug 14, 2013
```
Fix job state recovery logic in which a job's accounting frequency was
not set. This would result in a value of 65534 seconds being used (the
equivalent of NO_VAL in uint16_t), which could result in the job being
requeued or aborted.
```
  6d878aa7
- Merge branch 'slurm-2.5' into slurm-2.6 · aece6880
  Morris Jette authored Aug 14, 2013
  
  aece6880
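
The value 65534 mentioned in 6d878aa7 is what Slurm's 32-bit `NO_VAL` sentinel (0xfffffffe) becomes when truncated to a `uint16_t`. The sketch below shows one way a recovered value could be checked against that sentinel. `recover_acct_freq` and `DEFAULT_ACCT_FREQ` are hypothetical names, and the fallback to a default is an assumption, not a description of the actual fix.

```c
#include <assert.h>
#include <stdint.h>

#define NO_VAL 0xfffffffeU       /* Slurm's "unset" sentinel for 32-bit fields */
#define DEFAULT_ACCT_FREQ 30     /* hypothetical default, in seconds */

/* Hypothetical helper: map an unset accounting frequency read back from
 * saved job state to a default, instead of using it as 65534 seconds. */
uint16_t recover_acct_freq(uint16_t saved)
{
    /* Truncating NO_VAL to 16 bits gives 0xfffe, i.e. 65534. */
    assert((uint16_t) NO_VAL == 65534);

    if (saved == (uint16_t) NO_VAL)
        return DEFAULT_ACCT_FREQ;
    return saved;
}
```

Under these assumptions, `recover_acct_freq(65534)` returns the default, and any other value passes through unchanged.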