- 16 Jul, 2015 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
Under some conditions, if an attempt to schedule the last task of a job array (the meta-record of the job array) fails, its task ID will be changed from the appropriate value to NO_VAL. bug 1790
-
Morris Jette authored
-
- 15 Jul, 2015 7 commits
-
-
Morris Jette authored
-
Nathan Yee authored
-
Nathan Yee authored
-
Nathan Yee authored
Bug 1798
-
Morris Jette authored
If a job can only be started by preempting other jobs, the old logic could report the error: "cons_res: sync loop not progressing, holding job #" due to the usable CPUs and GRES needed by the pending job not matching. This change prevents the error message and job hold when job preemption logic is being used. The error message and job hold still take place for job scheduling outside of preemption, which will match CPUs and GRES at the beginning. bug 1750
-
Morris Jette authored
Under some conditions the select/cons_res plugin will hold a job, setting its priority to zero and its reason to HELD. The logic in slurmctld's main scheduling loop previously kept the priority at zero, but changed the reason from HELD to RESOURCES. This change leaves the job state as set by the select plugin. This may be related to bug 1750
-
Morris Jette authored
The backfill scheduler will periodically release locks so other actions can proceed. If a job was held while the locks were released, that job could still be scheduled by the backfill scheduler (i.e. it failed to check for a job with a priority of zero). This could be a root cause of bug 1750
-
- 14 Jul, 2015 3 commits
-
-
Danny Auble authored
-
Morris Jette authored
Previous logic could fail to update some tasks of a job array for some fields. bug 1777
-
Morris Jette authored
Add level to switch table information logged by select plugin
-
- 13 Jul, 2015 3 commits
-
-
Morris Jette authored
Old logic could purge a job record for a job that was in completing state (if there were also a lot of agent threads). This change prevents purging job records for completing jobs.
-
Morris Jette authored
Fix to job array update logic that can result in a task ID of 4294967294. To reproduce:
$ sbatch --exclusive -a 1,3,5 tmp
Submitted batch job 11825
$ scontrol update jobid=11825_[3,4,5] timelimit=3
$ squeue
  JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
11825_3     debug   tmp jette PD  0:00     1 (None)
11825_4     debug   tmp jette PD  0:00     1 (None)
11825_5     debug   tmp jette PD  0:00     1 (None)
  11825     debug   tmp jette PD  0:00     1 (Resources)
A new job array entry was created for task ID 4, and the "master" job array record now has a task ID of 4294967294. The logic with the bug was using the wrong variable in a test. bug 1790
-
Gene Soudlenkov authored
Bug 1799
-
- 10 Jul, 2015 4 commits
-
-
Morris Jette authored
Remove new capabilities added in commit ad9c2413. Leave the new logic only in version 15.08, which has related performance improvements in the slurmctld agent code; see commit 53534f49
-
Morris Jette authored
Correct "sdiag" backfill cycle time calculation if it yields locks. A microsecond value was being treated as a second value, resulting in an overflow in the calculation. bug 1788
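As a hedged, generic sketch of the bug class described above (this is not the actual Slurm code), elapsed-time arithmetic over gettimeofday() samples must keep the tv_sec and tv_usec fields in consistent units; treating the microsecond field as seconds inflates or overflows the result:

```c
#include <assert.h>
#include <sys/time.h>

/* Illustrative only: elapsed time between two struct timeval samples,
 * accumulated entirely in microseconds.  tv_sec is scaled to usec before
 * it is combined with tv_usec, so the two fields never mix units. */
static long elapsed_usec(struct timeval start, struct timeval end)
{
	return (end.tv_sec - start.tv_sec) * 1000000L +
	       (end.tv_usec - start.tv_usec);
}
```

A cycle-time accumulator built on a helper like this stays correct even when the usec difference is negative, since the borrowed second is already accounted for in the first term.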
-
Morris Jette authored
-
Morris Jette authored
-
- 09 Jul, 2015 1 commit
-
-
Morris Jette authored
The slurmctld logic throttles some RPCs so that only one of them can execute at a time, in order to reduce contention for the job, partition and node locks (only one of the affected RPCs can execute at any time anyway, and this lets other RPC types run). While an RPC is stuck in the throttle function, do not count that thread against the slurmctld thread limit. bug 1794
-
- 08 Jul, 2015 7 commits
-
-
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 07 Jul, 2015 6 commits
-
-
Danny Auble authored
-
Trey Dockendorf authored
This patch moves the QOS update of an existing job to be before the partition update. This ensures the new QOS value is the one used when validating against things like a partition's AllowQOS and DenyQOS. Currently, if two partitions have AllowQOS values that do not share any QOS, the order of updates prevents a job from being moved from one partition to another using something like the following:
scontrol update job=<jobID> partition=<new part> qos=<new qos>
-
Danny Auble authored
-
Danny Auble authored
-
David Bigagli authored
-
Morris Jette authored
Correct task layout with the CR_Pack_Node option and more than 1 CPU per task. Previous logic would place one task per CPU and launch too few tasks. bug 1781
-
- 06 Jul, 2015 4 commits
-
-
Nathan Yee authored
-
Nathan Yee authored
-
Morris Jette authored
Backfill scheduler now considers the OverTimeLimit and KillWait configuration parameters to estimate when running jobs will exit. Initially a job's end time is estimated based upon its time limit. After the time limit is reached, the end time estimate is based upon the OverTimeLimit and KillWait configuration parameters. bug 1774
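For context, OverTimeLimit and KillWait are set in slurm.conf; the values below are illustrative, not recommendations:

```
# slurm.conf (illustrative values)
OverTimeLimit=2   # minutes a job may run past its time limit before being killed
KillWait=30       # seconds between SIGTERM and SIGKILL when a job hits its limit
```

With settings like these, the backfill scheduler's end-time estimate for an over-limit job extends past the nominal time limit by roughly the grace these parameters allow.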
-
Morris Jette authored
Backfill scheduler: The configured backfill_interval value (default 30 seconds) is now interpreted as a maximum run time for the backfill scheduler. Once reached, the scheduler will build a new job queue and start over, even if not all jobs have been tested. bug 1774
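The backfill interval is configured through SchedulerParameters in slurm.conf; a minimal sketch, assuming the bf_interval option name used by sched/backfill:

```
# slurm.conf (illustrative)
SchedulerType=sched/backfill
SchedulerParameters=bf_interval=30   # seconds; now also caps a single backfill pass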
-
- 03 Jul, 2015 1 commit
-
-
Morris Jette authored
-
- 30 Jun, 2015 1 commit
-
-
Thomas Cadeau authored
Bug 1745
-