1. 05 Aug, 2014 5 commits
      step record purge fix · daa1ccf9
      Morris Jette authored
      This corrects logic added yesterday in commit
      6f89dc9d, which introduced a double
      free of step records, at least on job requeue.
      bug 1012
      Added comments · 00d66a2a
      Morris Jette authored
      Describe restrictions on specific job and step record purging functions
      with respect to the "cleaning" flag used by the Node Health Check on Cray systems.
      call select_g_step_finish() even for finished jobs · 6f89dc9d
      Morris Jette authored
      Always call select_g_step_finish() when terminating a job step,
      even if the job is also being terminated. This is needed for Cray
      systems.
      bug 1012
      requeue state mode · d040244d
      Morris Jette authored
      When a job is requeued, call deallocate_nodes() with a job state
      of COMPLETING. Previously it was called with a state of JOB_REQUEUE,
      which could be problematic for step complete function calls (which
      I am working on fixing now).
      Refactor step complete logic · 6765d317
      Morris Jette authored
      Remove some duplicate code. No change in functionality.
  2. 04 Aug, 2014 6 commits
      Simple purge of step list with job · 6fe300dd
      Morris Jette authored
      When a job record is purged, simply purge the step list rather than possibly invoking a node health check on Cray systems.
      Add function that purges step list · fc2cc171
      Morris Jette authored
      No checking or other operations are performed on this list, just a purge.
      Re-use of active job ID error · 2f399247
      Morris Jette authored
      If an attempt is made to submit a job explicitly using a job ID that already exists, do not try to purge and re-use it; return an error instead. The slow clean-up of job steps on Cray systems due to the node health check makes me wary of preserving the existing code. Returning an error seems the safer option.
      refactor job step delete logic · 4f2b7d3d
      Morris Jette authored
      Call delete_step_records() before clearing the job's JOB_COMPLETING
      state flag. This would make a difference for jobs automatically
      requeued based upon their exit code, but probably not in other cases.
      Also, in the select plugins, check not only for a job state of JOB_COMPLETING
      but also for FINISHED states. In either case, we are not in a position to
      gracefully clean up the step.
      Purely cosmetic mods, comments, etc. · 67fd6876
      Morris Jette authored
      CPU frequency set race condition fix · 760d94a5
      Morris Jette authored
      Fix a race condition between CPU frequency setting and job preemption.
      When the preemptor job completed, it would notify srun, which
      would notify the slurmctld, which could resume a preempted job.
      That preempted job could reset the CPU frequency before the
      preemptor did. With this change, the slurmstepd resets a job's
      CPU frequency before notifying srun of completion, which
      eliminates the race condition.
      bug 1011
  3. 02 Aug, 2014 1 commit
  4. 01 Aug, 2014 11 commits
  5. 31 Jul, 2014 5 commits
  6. 30 Jul, 2014 9 commits
  7. 29 Jul, 2014 3 commits
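The ordering change in commit 760d94a5 ("CPU frequency set race condition fix") can be illustrated with a small C sketch. This is not Slurm code: `step_complete()`, `reset_cpu_freq()`, `notify_srun()` and the event log are hypothetical stand-ins for the slurmstepd behavior the commit message describes. The only thing it shows is the ordering: the CPU frequency is restored before completion is reported, so nothing triggered by the notification (srun, then slurmctld resuming a preempted job) can run ahead of the reset.

```c
#include <assert.h>

/* Hypothetical event codes, used only to record the order of operations. */
enum step_event { EV_RESET_CPU_FREQ, EV_NOTIFY_SRUN };

#define MAX_EVENTS 8
static enum step_event events[MAX_EVENTS];
static int n_events = 0;

static void log_event(enum step_event ev)
{
	assert(n_events < MAX_EVENTS);
	events[n_events++] = ev;
}

/* Hypothetical stand-in: restore the CPU frequency the step had changed. */
static void reset_cpu_freq(void)
{
	log_event(EV_RESET_CPU_FREQ);
}

/* Hypothetical stand-in: tell srun the step has finished. In the scenario
 * from the commit message, this starts the chain (srun, then slurmctld)
 * that can resume a preempted job, which may then touch the frequency. */
static void notify_srun(void)
{
	log_event(EV_NOTIFY_SRUN);
}

/* Ordering after the fix: reset first, then notify, so the reset can no
 * longer land after a resumed job has started using the CPUs. */
void step_complete(void)
{
	reset_cpu_freq();
	notify_srun();
}
```

After one call to `step_complete()`, the log holds the reset event followed by the notify event. Before the fix the reset could happen after the notification, which is exactly the window the commit closes.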
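Commits fc2cc171 and 6fe300dd add a plain purge of a job's step list, and daa1ccf9 fixes a double free of step records on requeue. A common generic way to make such a purge safe to call from more than one clean-up path is to free each record once and then clear the list head. The C sketch below shows that technique on hypothetical, simplified `job_record`/`step_record` structs with a hand-rolled linked list. It is an assumption for illustration, not the actual Slurm data structures (which use Slurm's own list type) and not necessarily how daa1ccf9 fixed the bug.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified records for illustration only. */
struct step_record {
	int step_id;
	struct step_record *next;
};

struct job_record {
	int job_id;
	struct step_record *steps;	/* head of the step list, or NULL */
};

static int steps_freed = 0;	/* counts frees, for illustration only */

/* Add a step at the head of the job's step list. Returns 0 on success. */
int add_step(struct job_record *job, int step_id)
{
	struct step_record *step = malloc(sizeof(*step));

	if (!step)
		return -1;
	step->step_id = step_id;
	step->next = job->steps;
	job->steps = step;
	return 0;
}

/* Free every step record, performing no other checks, then detach the
 * list from the job. Because the head pointer is cleared, a second call
 * frees nothing instead of freeing the same records twice. */
void purge_step_list(struct job_record *job)
{
	struct step_record *step;

	assert(job);
	step = job->steps;
	while (step) {
		struct step_record *next = step->next;

		free(step);
		steps_freed++;
		step = next;
	}
	job->steps = NULL;
}
```

With two steps added, the first `purge_step_list()` call frees both records and leaves `job->steps` as NULL. A second call is a no-op.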
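The policy in commit 2f399247 ("Re-use of active job ID error") can be sketched in C as a simple check at submit time. The job ID table, `submit_with_job_id()` and the `submit_rc` codes are hypothetical and are not Slurm's API or error values. The sketch only shows the decision: an explicitly requested ID that is already in use is rejected, and the existing record is neither purged nor re-used.

```c
#include <stddef.h>

/* Hypothetical return codes, not Slurm's actual error values. */
enum submit_rc {
	SUBMIT_OK = 0,
	SUBMIT_DUPLICATE_JOB_ID = 1,
	SUBMIT_TABLE_FULL = 2
};

#define MAX_JOBS 16

/* Hypothetical in-memory table of job IDs currently known. */
static int job_ids[MAX_JOBS];
static size_t n_jobs = 0;

static int job_id_exists(int job_id)
{
	for (size_t i = 0; i < n_jobs; i++) {
		if (job_ids[i] == job_id)
			return 1;
	}
	return 0;
}

/* Submit a job with an explicitly requested ID. An ID that already exists
 * is rejected outright; the existing record is left untouched. */
enum submit_rc submit_with_job_id(int job_id)
{
	if (job_id_exists(job_id))
		return SUBMIT_DUPLICATE_JOB_ID;
	if (n_jobs >= MAX_JOBS)
		return SUBMIT_TABLE_FULL;
	job_ids[n_jobs++] = job_id;
	return SUBMIT_OK;
}
```

Submitting ID 100 twice succeeds the first time and returns `SUBMIT_DUPLICATE_JOB_ID` the second time. Rejecting early avoids having to wait for a slow step clean-up (such as the Cray node health check) before the ID could be re-used.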