1. 10 May, 2016 1 commit
  2. 09 May, 2016 2 commits
  3. 06 May, 2016 1 commit
    • Fix for slurmstepd segfault · db0fe22e
      John Thiltges authored
      With slurm-15.08.10, we're seeing occasional segfaults in slurmstepd. The logs point to the following line: slurm-15.08.10/src/slurmd/slurmstepd/mgr.c:2612
      
      On that line, _get_primary_group() is accessing the results of getpwnam_r():
          *gid = pwd0->pw_gid;
      
      If getpwnam_r() cannot find a matching password record, it sets the result (pwd0) to NULL but still returns 0. Dereferencing that NULL pointer then causes the segfault.
      
      Checking the result variable (pwd0) to determine success should fix the issue.
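The check described above can be sketched as follows. This is a minimal illustration, not the code from the commit: `lookup_primary_gid` is a hypothetical helper, and the fallback buffer size is an assumption. It treats a NULL result from `getpwnam_r()` as a failure even when the return code is 0.

```c
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not Slurm's _get_primary_group()): look up the
 * primary group ID of user_name. Returns 0 on success, -1 if the lookup
 * failed or no matching password record exists. */
static int lookup_primary_gid(const char *user_name, gid_t *gid)
{
	struct passwd pwd;
	struct passwd *result = NULL;
	long buf_size;
	char *buf;
	int rc;

	if (user_name == NULL || gid == NULL)
		return -1;

	buf_size = sysconf(_SC_GETPW_R_SIZE_MAX);
	if (buf_size <= 0)
		buf_size = 16384;	/* assumed fallback size */
	buf = malloc((size_t) buf_size);
	if (buf == NULL)
		return -1;

	/* Retry if the call was interrupted by a signal. */
	do {
		rc = getpwnam_r(user_name, &pwd, buf, (size_t) buf_size,
				&result);
	} while (rc == EINTR);

	/* rc == 0 only means "no error"; a missing user is reported by
	 * result == NULL, so both conditions must be checked. */
	if (rc != 0 || result == NULL) {
		free(buf);
		return -1;
	}

	*gid = result->pw_gid;
	free(buf);
	return 0;
}
```

A caller would test the return value before using the group ID, for example `if (lookup_primary_gid(name, &gid) < 0) { /* handle unknown user */ }`. The sketch does not grow the buffer on ERANGE; it simply reports failure.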
  4. 05 May, 2016 2 commits
  5. 03 May, 2016 4 commits
  6. 29 Apr, 2016 4 commits
  7. 28 Apr, 2016 3 commits
  8. 27 Apr, 2016 2 commits
  9. 26 Apr, 2016 2 commits
  10. 23 Apr, 2016 1 commit
  11. 20 Apr, 2016 1 commit
    • burst_buffer/cray - fix create/destroy buffer only · 1391d29a
      Morris Jette authored
      burst_buffer/cray - Don't call the DataWarp "paths" function if the script only
          creates or destroys a persistent burst buffer. Some versions of the DataWarp
          software return an error for such scripts, causing the job to be held.
      bug 2624
  12. 13 Apr, 2016 2 commits
  13. 12 Apr, 2016 2 commits
  14. 11 Apr, 2016 4 commits
  15. 09 Apr, 2016 1 commit
    • backfill scheduling enhancement · e62a9270
      Morris Jette authored
      When determining when a pending job will be able to start, rather
        than testing after removing each running job and trying to schedule
        the pending jobs, remove multiple jobs that all end about the
        same time before testing. This reduces the number of calls to
        the job placement logic, which is time consuming.
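The batching idea can be illustrated with a small sketch. It is not taken from the Slurm backfill code: the function name, the sorted-array input, and the `window_secs` parameter are all assumptions made for illustration. It counts how many placement tests are needed when running jobs that end within a window of each other are removed together.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical sketch: end_times holds the expected end times of the
 * running jobs in ascending order. Jobs ending within window_secs of the
 * first job in a batch are removed together, and one placement test is
 * performed per batch. Returns the number of placement tests. */
static size_t count_placement_tests(const time_t *end_times, size_t n,
				    time_t window_secs)
{
	size_t tests = 0;
	size_t i = 0;

	while (i < n) {
		time_t batch_start = end_times[i];

		/* Remove every job that ends at about the same time. */
		while (i < n && end_times[i] - batch_start <= window_secs)
			i++;

		tests++;	/* one call to the placement logic per batch */
	}
	return tests;
}
```

With six jobs ending at 100, 110, 115, 400, 405 and 900 seconds and a 30 second window, this sketch performs three placement tests instead of six.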
  16. 07 Apr, 2016 2 commits
  17. 06 Apr, 2016 5 commits
  18. 05 Apr, 2016 1 commit
    • Fix backfill scheduler race condition · d8b18ff8
      Morris Jette authored
      Fix backfill scheduler race condition that could cause invalid pointer in
          select/cons_res plugin. Bug introduced in 15.08.9, commit:
          efd9d35e
      
      The scenario is as follows:
      1. Backfill scheduler is running, then releases locks
      2. Main scheduling loop starts a job "A"
      3. Backfill scheduler resumes, finds job "A" in its queue and
         resets its partition pointer.
      4. Job "A" completes and tries to remove resource allocation record
         from select/cons_res data structure, but fails to find it because
         it is looking in the table for the wrong partition.
      5. Job "A" record gets purged from slurmctld
      6. Select/cons_res plugin attempts to operate on resource allocation
         data structure, finds pointer into the now purged data structure
         of job "A" and aborts or gets SEGV
      Bug 2603
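One general way to guard against this class of race is to re-check a job's state after the locks are reacquired, before touching its partition pointer. The sketch below illustrates that idea only; it is not necessarily what the commit does, and every type and function name in it is hypothetical rather than taken from Slurm.

```c
#include <stddef.h>

/* All names are hypothetical; they do not mirror Slurm's structures. */
enum job_state { JOB_PENDING, JOB_RUNNING, JOB_COMPLETE };

struct part_record {
	const char *name;
};

struct job_record {
	enum job_state state;
	struct part_record *part_ptr;
};

/* Called by a simulated backfill pass after it reacquires its locks.
 * Returns 1 if the partition pointer was changed, 0 if the job was
 * skipped because it is no longer pending. */
static int backfill_set_partition(struct job_record *job,
				  struct part_record *part)
{
	/* The job may have been started while the locks were released
	 * (step 2 of the scenario). Leave its partition pointer alone so
	 * that later cleanup looks in the right partition's table. */
	if (job->state != JOB_PENDING)
		return 0;

	job->part_ptr = part;
	return 1;
}
```

In this sketch a job that was started in the meantime keeps the partition it was allocated in, so step 3 of the scenario can no longer redirect the cleanup in step 4 to the wrong table.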