1. 16 Mar, 2016 2 commits
    • Send burst buffer teardown immediately · d85cdcc7
      Morris Jette authored
      Generate burst buffer use completion email immediately after teardown
          completes rather than at job purge time (likely minutes later).
      bug 2539
    • Modify burst buffer stage out message · fae4c3d3
      Morris Jette authored
      Change burst buffer use completion message from
          "SLURM Job_id=1360353 Name=tmp Staged Out, StageOut time 00:01:47" to
          "SLURM Job_id=1360353 Name=tmp StageOut/Teardown time 00:01:47"
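      The new message wording can be illustrated with a small C sketch. The
      function name `bb_completion_msg` and its parameters are hypothetical
      (not Slurm's actual code), and the elapsed time is assumed to be a plain
      seconds count, spanning stage-out through teardown, rendered as HH:MM:SS.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not Slurm's API): format the burst buffer
 * completion message with the new "StageOut/Teardown" wording.
 * elapsed_secs is assumed to cover stage-out through teardown;
 * this sketch renders hours only, with no days field. */
static int bb_completion_msg(char *buf, size_t len, unsigned int job_id,
                             const char *name, long elapsed_secs)
{
    long hours = elapsed_secs / 3600;
    long mins = (elapsed_secs % 3600) / 60;
    long secs = elapsed_secs % 60;

    return snprintf(buf, len,
                    "SLURM Job_id=%u Name=%s StageOut/Teardown time %02ld:%02ld:%02ld",
                    job_id, name, hours, mins, secs);
}
```

      Called with job ID 1360353, name `tmp`, and 107 seconds, the sketch
      produces the example message quoted in the commit.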
  2. 15 Mar, 2016 2 commits
  3. 14 Mar, 2016 2 commits
  4. 11 Mar, 2016 1 commit
  5. 10 Mar, 2016 1 commit
  6. 09 Mar, 2016 2 commits
    • cray job requeue bug · fec5e03b
      Morris Jette authored
      Fix Cray NHC spawning on job requeue. Previously, nodes allocated to a
      requeued job were left unusable when the job terminated.
      
      Specifically, each job has a "cleaning/cleaned" flag. Once a job
      terminates, the flag is set to "cleaning"; after the job's node health
      check completes, it is set to "cleaned". If the job is requeued, then
      on its second (or subsequent) termination the select/cray plugin is
      called to launch the NHC. The plugin sees that the "cleaned" flag is
      already set and logs:
      error: select_p_job_fini: Cleaned flag already set for job 1283858, this should never happen
      It then returns without launching the NHC. Since termination of the
      job NHC triggers the release of job resources (CPUs, memory, and GRES),
      those resources are never released for use by other jobs.
      
      Bug 2384
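      The failure sequence above can be modeled in a few lines of C,
      assuming a three-state cleanup flag. All names (`sketch_job_fini`,
      `sketch_nhc_done`, `sketch_job_requeue`, and so on) are hypothetical.
      Resetting the flag on requeue is shown as one way to let the second
      termination launch the NHC again; that is an assumption for
      illustration, not necessarily what commit fec5e03b does.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-job cleanup state modeled on the
 * "cleaning/cleaned" flag described in the commit message. */
typedef enum { CLEAN_NONE, CLEAN_CLEANING, CLEAN_CLEANED } clean_state_t;

typedef struct {
    unsigned int job_id;
    clean_state_t clean;
    int nhc_launches;      /* number of times the NHC was started */
    bool resources_held;   /* CPUs, memory, and GRES still allocated */
} sketch_job_t;

/* Job starts running: resources are allocated to it. */
static void sketch_job_start(sketch_job_t *job)
{
    job->resources_held = true;
}

/* Job terminates: launch the NHC unless cleanup already finished. */
static int sketch_job_fini(sketch_job_t *job)
{
    if (job->clean == CLEAN_CLEANED) {
        fprintf(stderr, "error: Cleaned flag already set for job %u\n",
                job->job_id);
        return -1;   /* NHC never runs, so resources are never released */
    }
    job->clean = CLEAN_CLEANING;
    job->nhc_launches++;
    return 0;
}

/* NHC finished: mark the job cleaned and release its resources. */
static void sketch_nhc_done(sketch_job_t *job)
{
    job->clean = CLEAN_CLEANED;
    job->resources_held = false;
}

/* Requeue the job. reset_flag = true models the assumed fix;
 * false reproduces the old behavior. */
static void sketch_job_requeue(sketch_job_t *job, bool reset_flag)
{
    if (reset_flag)
        job->clean = CLEAN_NONE;
}
```

      Without the reset, the second call to `sketch_job_fini` returns -1 and
      `resources_held` stays true, mirroring the stranded CPUs, memory, and
      GRES described above.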
    • Correctly parse nids in slurmconfgen_smw.py · 88ccc111
      David Gloe authored
      An error in slurmconfgen_smw.py caused it to parse the nic as the nid.
      On some systems those values differ, causing the generated slurm.conf file to
      be incorrect.
      
      Bug 2532.
  7. 08 Mar, 2016 2 commits
  8. 05 Mar, 2016 1 commit
  9. 04 Mar, 2016 1 commit
  10. 03 Mar, 2016 4 commits
  11. 02 Mar, 2016 2 commits
  12. 01 Mar, 2016 2 commits
    • Update NEWS as well. · a058ff4a
      Tim Wickberg authored
    • Defer suspend until launch completes · 52fe3de1
      Morris Jette authored
      Ensure that a job is completely launched before trying to suspend it.
      Previously, the suspend logic could start early in the life of the
      slurmstepd process, after its listening socket was open but before
      the tasks were launched. This defers the suspend logic until after
      all prologs and setup complete and the tasks are launched. This is
      important for gang scheduling, in which newly launched jobs can be
      suspended immediately.
      bug 2494
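      One way to express this deferral is a pending-suspend flag that is
      applied once launch completes. The following is a hypothetical,
      single-threaded sketch: names such as `step_request_suspend` and
      `step_launch_complete` are invented, real slurmstepd code would also
      need locking, and the actual change may instead block until launch
      completes.

```c
#include <stdbool.h>

/* Hypothetical step state (not slurmstepd's real structures). */
typedef struct {
    bool launched;          /* prologs and setup done, tasks running */
    bool suspend_pending;   /* suspend requested before launch finished */
    bool suspended;         /* tasks are currently suspended */
} step_state_t;

/* Suspend request, e.g. from gang scheduling. Before launch there are
 * no tasks to signal, so the request is only recorded. */
static void step_request_suspend(step_state_t *st)
{
    if (!st->launched) {
        st->suspend_pending = true;
        return;
    }
    st->suspended = true;   /* stand-in for signaling the tasks */
}

/* Resume request: cancels a deferred suspend or resumes running tasks. */
static void step_request_resume(step_state_t *st)
{
    st->suspend_pending = false;
    st->suspended = false;
}

/* Called once all prologs and setup complete and the tasks are launched;
 * applies any suspend request that arrived too early. */
static void step_launch_complete(step_state_t *st)
{
    st->launched = true;
    if (st->suspend_pending) {
        st->suspend_pending = false;
        st->suspended = true;
    }
}
```

      Clearing the pending flag on resume matters under gang scheduling: a
      suspend/resume pair that arrives entirely before launch should leave
      the job running.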
  13. 26 Feb, 2016 2 commits
  14. 25 Feb, 2016 1 commit
  15. 24 Feb, 2016 5 commits
  16. 23 Feb, 2016 1 commit
    • Fix issue with resizing jobs and limits not being tracked correctly. · 92ac0dcd
      Danny Auble authored
      This whole process could probably be done better by tracking the old
      and new values and calling a single function instead of separate pre
      and post functions, but that can wait for a future revision of the
      code; the current approach works and is adequate for the time being.
      
      Bug 2352
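      The pre/post pattern and the single-call alternative mentioned above can
      be compared with a sketch that tracks one hypothetical counter, CPUs used
      against a group limit. The struct and function names are invented for
      illustration and do not correspond to Slurm's accounting code.

```c
#include <stdint.h>

/* Hypothetical usage counter for a group CPU limit. */
typedef struct { uint32_t grp_cpus_used; } assoc_usage_t;
typedef struct { uint32_t cpus; } resize_job_t;

/* Pre/post approach: remove the job's old footprint before resizing... */
static void resize_limits_pre(assoc_usage_t *u, const resize_job_t *job)
{
    u->grp_cpus_used -= job->cpus;
}

/* ...then add the new footprint once the job has its new size. */
static void resize_limits_post(assoc_usage_t *u, const resize_job_t *job)
{
    u->grp_cpus_used += job->cpus;
}

/* Single-call alternative: apply the old-to-new delta in one step.
 * Assumes usage already includes old_cpus, so the subtraction cannot wrap. */
static void resize_limits_delta(assoc_usage_t *u, uint32_t old_cpus,
                                uint32_t new_cpus)
{
    u->grp_cpus_used = u->grp_cpus_used - old_cpus + new_cpus;
}
```

      Shrinking a 4-CPU job to 2 CPUs from a usage of 10 leaves 8 either way.
      The pre/post version is only correct if both calls bracket the resize,
      while the delta version needs the old value saved before the job
      record changes.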
  17. 19 Feb, 2016 2 commits
  18. 18 Feb, 2016 5 commits
  19. 17 Feb, 2016 2 commits