  1. 15 Jul, 2015 4 commits
    • 9d7f1507
      Nathan Yee authored
    • Preemption logic could hold job · 93efb1ec
      Morris Jette authored
      If a job can only be started by preempting other jobs, the old logic
        could report the error:
        "cons_res: sync loop not progressing, holding job #"
        due to the usable CPUs and GRES needed by the pending job not
        matching. This change prevents the error message and job hold
        when job preemption logic is being used. The error message and
        job hold still take place for job scheduling outside of preemption,
        which will match CPUs and GRES at the beginning.
      bug 1750
      93efb1ec
    • Prevent changing job HOLD reason set by select plugin · 8e8d80b3
      Morris Jette authored
      Under some conditions the select/cons_res plugin will hold a job,
        setting its priority to zero and reason to HELD. The logic in
        slurmctld's main scheduling loop previously kept its priority
        at zero, but changed the reason from HELD to RESOURCES. This
        change leaves the proper job state as set by the select plugin.
      This may be related to bug 1750
      8e8d80b3
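The behavior described in this commit can be illustrated with a minimal sketch. The names `struct job`, `enum wait_reason`, and `note_start_failure` are hypothetical stand-ins invented for this example; they are not taken from the slurmctld source. The sketch assumes only what the commit message states: a held job has priority zero, and the scheduler should not overwrite its HELD reason with RESOURCES.

```c
#include <stdint.h>

/* Hypothetical stand-ins for job state reasons (not Slurm's real enum). */
enum wait_reason { WAIT_NONE, WAIT_RESOURCES, WAIT_HELD };

/* Hypothetical, minimal job record; the real record has many more fields. */
struct job {
    uint32_t priority;        /* 0 means the job is held */
    enum wait_reason reason;  /* why the job is not running */
};

/*
 * Called when the scheduling loop fails to start a job.
 * A held job (priority 0) keeps the reason that was already set,
 * for example HELD as set by a select plugin. Any other job is
 * marked as waiting for resources.
 */
void note_start_failure(struct job *job)
{
    if (job->priority == 0)
        return;               /* leave the existing reason untouched */
    job->reason = WAIT_RESOURCES;
}
```

With this guard, a job whose reason is `WAIT_HELD` and whose priority is zero keeps that reason after a failed scheduling attempt, while a job with nonzero priority is marked `WAIT_RESOURCES`.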
    • Prevent backfill scheduler overriding job hold · 54b258ec
      Morris Jette authored
      The backfill scheduler will periodically release locks for other
        actions. If a job is held during the time that locks were released,
        that job might still have been scheduled by the backfill scheduler
        (i.e. it failed to check for a job with a priority of zero).
      Could be a root cause of bug 1750
      54b258ec
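The race described in this commit follows a common pattern: state checked before a lock is released must be checked again after the lock is re-acquired. The sketch below shows that pattern under stated assumptions. `struct bf_job`, `backfill_try_start`, and the `yield_fn` hook are hypothetical names for illustration only, and the hook merely simulates the window in which another thread could hold the job.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal job record for this sketch. */
struct bf_job {
    uint32_t priority;  /* 0 means the job is held */
    bool started;
};

/* Hypothetical hook standing in for "release locks, let others run,
 * re-acquire locks". Another thread could modify the job here. */
typedef void (*yield_fn)(struct bf_job *job);

/* Try to start a job, re-checking the hold state after yielding locks. */
bool backfill_try_start(struct bf_job *job, yield_fn yield_locks)
{
    if (job->priority == 0)
        return false;           /* already held before we began */

    if (yield_locks != NULL)
        yield_locks(job);       /* locks dropped and re-acquired */

    if (job->priority == 0)
        return false;           /* held while the locks were released */

    job->started = true;
    return true;
}

/* Example hook: simulates the job being held during the yield. */
void hold_during_yield(struct bf_job *job)
{
    job->priority = 0;
}
```

Calling `backfill_try_start(&job, hold_during_yield)` on a job with nonzero priority returns `false` and leaves `started` unset, which is the outcome the fix aims for.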
  2. 14 Jul, 2015 3 commits
  3. 13 Jul, 2015 3 commits
    • Don't purge completing job · c7226213
      Morris Jette authored
      Old logic could purge a job record for a job that was in
        completing state (if there were also many agent threads).
        This change prevents purging job records for completing jobs.
      c7226213
    • job array update results in bad task ID · 29a52f60
      Morris Jette authored
      Fix to job array update logic that can result in a task ID of 4294967294.
      To reproduce:
      $ sbatch --exclusive -a 1,3,5 tmp
      Submitted batch job 11825
      $ scontrol update jobid=11825_[3,4,5] timelimit=3
      $ squeue
                   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 11825_3     debug      tmp    jette PD       0:00      1 (None)
                 11825_4     debug      tmp    jette PD       0:00      1 (None)
                 11825_5     debug      tmp    jette PD       0:00      1 (None)
                   11825     debug      tmp    jette PD       0:00      1 (Resources)
      A new job array entry was created for task ID 4 and the "master" job
      array record now has a task ID of 4294967294.
      The logic with the bug was using the wrong variable in a test.
      bug 1790
      29a52f60
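The task ID 4294967294 reported above equals 0xfffffffe. That appears to match the "no value" sentinel Slurm uses for unset 32-bit fields, which would explain why the master record prints as plain 11825 in the squeue output. The sketch below makes the arithmetic concrete. `SKETCH_NO_VAL` and `format_job_id` are hypothetical names, and the sentinel value is an assumption that should be confirmed against slurm.h.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed sentinel for "task ID not set"; verify against slurm.h. */
#define SKETCH_NO_VAL 0xfffffffeU

/*
 * Format a job ID for display. When the task ID is the sentinel,
 * only the base job ID is printed; otherwise "jobid_taskid".
 * Returns the number of characters written (as snprintf does).
 */
int format_job_id(char *buf, size_t len, uint32_t job_id, uint32_t task_id)
{
    if (task_id == SKETCH_NO_VAL)
        return snprintf(buf, len, "%" PRIu32, job_id);
    return snprintf(buf, len, "%" PRIu32 "_%" PRIu32, job_id, task_id);
}
```

Under this assumption, a record whose task ID was overwritten with the sentinel is displayed without a task suffix, while a real task such as 4 is displayed as `11825_4`.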
    • Fix segfault when updating timelimit on jobarray task. · 0560d8b2
      Gene Soudlenkov authored
      Bug 1799
      0560d8b2
  4. 10 Jul, 2015 4 commits
  5. 09 Jul, 2015 1 commit
    • Change slurmctld threads count against limit · ad9c2413
      Morris Jette authored
      The slurmctld logic throttles some RPCs so that only one of them
      can execute at a time in order to reduce contention for the job,
      partition and node locks (only one of the affected RPCs can execute
      at any time anyway and this lets other RPC types run). While an
      RPC is stuck in the throttle function, do not count that thread
      against the slurmctld thread limit.
      bug 1794
      ad9c2413
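The accounting change described in this commit can be sketched with a mutex and condition variable. This is an illustration under stated assumptions, not the slurmctld code: `throttle_start`, `throttle_fini`, `throttle_busy`, and `active_threads` are hypothetical names, and a single mutex protects both the throttle flag and the thread counter for simplicity.

```c
#include <pthread.h>

static pthread_mutex_t throttle_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  throttle_cond  = PTHREAD_COND_INITIALIZER;

int throttle_busy  = 0;  /* 1 while a throttled RPC is executing */
int active_threads = 0;  /* threads counted against the thread limit */

/*
 * Enter the throttled section. The caller is assumed to have been
 * counted in active_threads already. While it waits for its turn,
 * it is removed from the count so that other RPC types can be served.
 */
void throttle_start(void)
{
    pthread_mutex_lock(&throttle_mutex);
    active_threads--;                /* a waiting thread does not count */
    while (throttle_busy)
        pthread_cond_wait(&throttle_cond, &throttle_mutex);
    throttle_busy = 1;
    active_threads++;                /* running again, so count it */
    pthread_mutex_unlock(&throttle_mutex);
}

/* Leave the throttled section and wake one waiting thread. */
void throttle_fini(void)
{
    pthread_mutex_lock(&throttle_mutex);
    throttle_busy = 0;
    pthread_cond_signal(&throttle_cond);
    pthread_mutex_unlock(&throttle_mutex);
}
```

Each throttled RPC handler would call `throttle_start()` before taking the job, partition, and node locks and `throttle_fini()` after releasing them. The net effect on `active_threads` is zero once the thread is running, but the count drops for as long as the thread is blocked in the wait loop.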
  6. 08 Jul, 2015 7 commits
  7. 07 Jul, 2015 6 commits
  8. 06 Jul, 2015 4 commits
  9. 03 Jul, 2015 1 commit
  10. 30 Jun, 2015 3 commits
  11. 29 Jun, 2015 1 commit
  12. 25 Jun, 2015 3 commits