1. 09 May, 2018 12 commits
  2. 08 May, 2018 3 commits
    • Prevent slurmd from launching steps if prolog fails · 3b029021
      Brian Christiansen authored
      Bug 5146
    • Fix issue with invalid protocol_version when using srun on ppc64. · 77d65f4f
      Tim Wickberg authored
      Caused by the slurmstepd receiving a corrupted protocol_version
      value: a uint16_t cannot be safely written to and read from the
      pipe as if it were an int.
      
      Regression caused by commit 90b116c2.
      
      Bug 5133.
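The width mismatch described in 77d65f4f can be illustrated with a minimal sketch. This is not the Slurm patch: `send_version`, `recv_version`, `roundtrip_version`, and `misread_as_int` are hypothetical helpers, and the example assumes a 4-byte `int`. It shows one plausible way the value gets corrupted: the two bytes of a `uint16_t` occupy the lowest-addressed bytes of an `int`, which are the low-order bytes on little-endian hosts but the high-order bytes on big-endian hosts such as ppc64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Send a 16-bit version using exactly sizeof(uint16_t) bytes. */
int send_version(int fd, uint16_t version)
{
	ssize_t n = write(fd, &version, sizeof(version));
	return (n == (ssize_t) sizeof(version)) ? 0 : -1;
}

/* Receive into a variable of the same type and width the writer used. */
int recv_version(int fd, uint16_t *version)
{
	assert(version != NULL);
	ssize_t n = read(fd, version, sizeof(*version));
	return (n == (ssize_t) sizeof(*version)) ? 0 : -1;
}

/* Round-trip a value through a real pipe; returns 0 on any failure. */
uint16_t roundtrip_version(uint16_t version)
{
	int fds[2];
	uint16_t out = 0;

	if (pipe(fds) != 0)
		return 0;
	if (send_version(fds[1], version) != 0 ||
	    recv_version(fds[0], &out) != 0)
		out = 0;
	close(fds[0]);
	close(fds[1]);
	return out;
}

/*
 * Hypothetical illustration of the mismatch: treat the two bytes of a
 * uint16_t as the start of a zeroed int. The result equals `version`
 * on little-endian hosts and `version << 16` on big-endian hosts
 * (assuming a 4-byte int).
 */
unsigned int misread_as_int(uint16_t version)
{
	unsigned int wide = 0;

	memcpy(&wide, &version, sizeof(version));
	return wide;
}
```

Keeping the type and the byte count identical on both ends of the pipe makes the round trip independent of host byte order, which is why the mismatch can go unnoticed on x86 and only surface on a big-endian platform.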
    • Fix checkpointing requeued jobs in a bad state · f9f395af
      Brian Christiansen authored
      Requeued jobs are marked as PENDING|COMPLETING until the epilog checks
      in. If job_set_alloc_tres() gets called while the job is in the
      PENDING|COMPLETING state, the job's tres_alloc_str is freed. If the
      job is then checkpointed in this state (PENDING|COMPLETING with no
      tres_alloc_str), the controller crashes on startup because it expects
      a job in the COMPLETING state to have a tres_alloc_str/cnt. This could
      be triggered by starting the controller without the dbd up: when the
      dbd comes up, the assoc_cache_mgr calls _update_job_tres(), which
      calls job_set_alloc_tres(). It could also be triggered by adding new
      TRES.
      
      This most likely started happening in 17.11.5 because of commit
      865b672f which introduced calling _update_job_tres() on each job
      after the dbd comes up.
      
      Bugs 5137, 4522
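The state combination described in f9f395af can be sketched as a base state plus a flag bit. Everything here is hypothetical: the `EX_*` constants, `struct ex_job`, and both functions are illustrative names and values, not Slurm's definitions, and the check shows only one way a state loader could tolerate a requeued job whose TRES string was freed. It is not the actual fix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: low byte is the base state, high bit is a flag. */
#define EX_STATE_BASE 0x00ffu
#define EX_PENDING    0x0000u
#define EX_RUNNING    0x0001u
#define EX_COMPLETING 0x8000u

struct ex_job {
	uint32_t state;              /* base state OR'ed with flags */
	const char *tres_alloc_str;  /* NULL once the allocation is freed */
};

/* A requeued job waiting on its epilog: PENDING base, COMPLETING flag. */
bool ex_is_requeued_completing(const struct ex_job *job)
{
	return ((job->state & EX_STATE_BASE) == EX_PENDING) &&
	       ((job->state & EX_COMPLETING) != 0);
}

/* Decide whether a saved job record is safe to load. */
bool ex_loadable(const struct ex_job *job)
{
	if ((job->state & EX_COMPLETING) == 0)
		return true;   /* no TRES requirement outside COMPLETING */
	if (ex_is_requeued_completing(job))
		return true;   /* TRES string may legitimately be gone */
	return job->tres_alloc_str != NULL;
}
```

The point of the sketch is that COMPLETING alone does not imply an allocation exists; the base state has to be consulted as well.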
  3. 04 May, 2018 2 commits
  4. 03 May, 2018 6 commits
  5. 02 May, 2018 6 commits
  6. 01 May, 2018 2 commits
  7. 30 Apr, 2018 7 commits
  8. 28 Apr, 2018 2 commits
    • 5ab8ab6f
      Brian Christiansen authored
    • Set node->last_idle to 0 when in power_save state · 242c7406
      Brian Christiansen authored
      In conjunction with the previous commit (recognizing nodes being
      powered up out of band), set the node's last_idle to 0 when the node
      is in a power_save state. A last_idle of 0 additionally means that
      the node isn't booted.
      
      Partially reverts da722a89. Checking for (last_idle > 0) when in
      power_save state isn't necessary because if the node is already in
      power_save state the node won't be resumed unless
      (node_ptr->last_idle > (now - SuspendTime)). And with the previous
      change, the node's last_idle time will be set when the node registers.
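The resume condition quoted in 242c7406 can be written out as a small sketch. `struct ex_node` and `ex_should_resume` are hypothetical names, and `suspend_time` stands in for the configured SuspendTime in seconds; only the comparison itself comes from the commit message. It shows why a last_idle of 0 makes the extra (last_idle > 0) check redundant.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, reduced view of a node record. */
struct ex_node {
	bool power_save;   /* node is currently in a power_save state */
	time_t last_idle;  /* 0 while powered down; set on registration */
};

/*
 * A node in power_save is only a resume candidate if it went idle
 * more recently than (now - suspend_time). With last_idle == 0 this
 * is false for any realistic clock value, so no separate
 * (last_idle > 0) test is needed.
 */
bool ex_should_resume(const struct ex_node *node, time_t now,
		      time_t suspend_time)
{
	return node->power_save &&
	       (node->last_idle > (now - suspend_time));
}
```

Once the node registers and its last_idle is set to a recent timestamp, the same comparison starts returning true.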