  1. 04 Aug, 2017 6 commits
  2. 02 Aug, 2017 4 commits
  3. 01 Aug, 2017 4 commits
  4. 31 Jul, 2017 1 commit
  5. 28 Jul, 2017 5 commits
  6. 27 Jul, 2017 2 commits
    • Fix bug when tracking multiple simultaneous spawned ping cycles · f7463ef5
      Alejandro Sanchez authored
      When more than one ping cycle is spawned simultaneously (for instance
      REQUEST_PING + REQUEST_NODE_REGISTRATION_STATUS for the selected nodes),
      we do not track a separate ping_start time for each cycle. When ping_begin()
      is called, the information about the previous ping cycle is lost. Then, when
      ping_end() is called for the first of the two cycles, we set ping_start=0.
      That zero value is then wrongly used to check whether the last cycle ran for
      more than PING_TIMEOUT seconds (100s), which incorrectly triggers this error:
      
       error("Node ping apparently hung, many nodes may be DOWN or configured "
             "SlurmdTimeout should be increased");
      
      Bug 3914
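The failure mode described in this commit message can be illustrated with a minimal sketch. It is not the code from commit f7463ef5: every name below (`sketch_ping_begin`, `active_cycles`, `oldest_start`, and so on) is hypothetical, and the clock is passed in as a parameter purely to keep the example deterministic. The assumed idea is to count the cycles in flight so that ending one cycle cannot zero the start time while another cycle is still running.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define SKETCH_PING_TIMEOUT 100 /* seconds, the value quoted in the message */

/* Hypothetical state; these names are illustrative, not Slurm's. */
static int    active_cycles = 0; /* cycles begun but not yet ended */
static time_t window_start  = 0; /* 0 means "no cycle is running" */

/* Record the start of one ping cycle. */
void sketch_ping_begin(time_t now)
{
	if (active_cycles == 0)
		window_start = now; /* first cycle opens the timing window */
	active_cycles++;
}

/* Record the end of one ping cycle. */
void sketch_ping_end(time_t now)
{
	assert(active_cycles > 0);
	active_cycles--;
	if (active_cycles == 0)
		window_start = 0;   /* nothing left to time */
	else
		window_start = now; /* unknown which cycle ended: restart the window */
}

/* True only if some cycle is running and the window exceeds the timeout. */
bool sketch_ping_hung(time_t now)
{
	return active_cycles > 0 &&
	       difftime(now, window_start) > SKETCH_PING_TIMEOUT;
}
```

Restarting the window when one of several cycles ends is a conservative choice in this sketch: it may delay hang detection, but it avoids comparing the current time against a zeroed start value, which is the false alarm the commit message describes.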
    • Tim Shaw
      04b431b4
  7. 26 Jul, 2017 5 commits
  8. 25 Jul, 2017 1 commit
  9. 24 Jul, 2017 3 commits
  10. 21 Jul, 2017 3 commits
  11. 19 Jul, 2017 4 commits
  12. 18 Jul, 2017 1 commit
    • Fix issue with multiple jobs from an array to start. · b40bd8d3
      Dominik Bartkiewicz authored
      By removing the real locks we can get into a race condition where the prolog
      starts and finishes before we get here and then we end up waiting forever.
      
      Making the mutex static seemed to help in many cases, but didn't
      completely close the window.  Changing slurm_cond_wait to
      slurm_cond_timedwait fixed the scenario where we would hit the window,
      without degrading the performance gain the original commit provides.
      
      There were also spots where, if the job or step didn't exist, the
      condition variable was never signaled, which was another place where
      this could get stuck and never start the job.
      
      Fix regression from commit 52ce3ff0
      
      Bug 3977
  13. 14 Jul, 2017 1 commit