1. 23 Jan, 2017 8 commits
    • Fix for backfill launch job with reboot · d72b13f2
      Morris Jette authored
      This bug was likely the root cause of bug 3366. If the backfill scheduler
        allocates resources for a batch job and a node reboot is required, the
        batch launch RPC would be sent to the agent. At that point, there is a
        race condition between the agent and the job_time_limit() function
        testing for boot completion. If the job_time_limit() function ran
        first, it would trigger a second launch RPC request getting sent to
        the agent.
      bug 3366
    • Cleaner job configuring logic · f9804256
      Morris Jette authored
      Clean up logic to test if job is configuring
      bug 3366
    • Avoid launching batch step while configuring · e3a7bdcc
      Morris Jette authored
      Do not launch a batch step while the job is configuring. Previous
        logic checked for the PrologSlurmctld running, but not nodes
        booting. Checking the job's CONFIGURING state flag will validate
        both.
      bug 3366
    • Avoid duplicate configuration complete logic · db6acb8f
      Morris Jette authored
      Add check to prevent step allocation logic from executing job
        configuration completion logic multiple times (check if job
        is configuring before clearing flag and resetting time limit).
      bug 3366
    • fix slurmctld/agent race condition · 53784477
      Morris Jette authored
      slurmctld/agent race condition fix: Prevent job launch while PrologSlurmctld
          daemon is running or node boot in progress.
      bug 3366
    • job write lock added to agent_retry() · 379007b8
      Morris Jette authored
      This is required to manage the configuration completion.
      bug 3366
    • Move agent_retry to separate pthread · ce9a2d79
      Morris Jette authored
      This will be required to lock the job structure
      bug 3366
    • Remove return value from agent_retry() · bb94c6ce
      Morris Jette authored
      Remove the return value from the agent_retry() function. It is not
        used anywhere and needs to be removed to run as a pthread.
      bug 3366
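The bug 3366 commits above share one idea: whichever thread notices that configuration (PrologSlurmctld or node boot) has finished should test the job's configuring flag while holding the job write lock, and only then clear it and launch. The following is a minimal Python model of that check-then-clear pattern, not Slurm's C code. The `Job` class, the flag value, and the function names are hypothetical and exist only for this sketch.

```python
import threading

# Hypothetical flag bit for this sketch; not Slurm's actual value.
JOB_CONFIGURING = 0x1


class Job:
    """Toy stand-in for a job record."""

    def __init__(self):
        self.state_flags = JOB_CONFIGURING
        self.launch_count = 0
        # Stands in for the job write lock taken by the retry thread.
        self.lock = threading.Lock()


def complete_configuration(job):
    """Clear the configuring flag and launch, at most once per job.

    Returns True only for the caller that actually performed the launch.
    """
    with job.lock:
        if not (job.state_flags & JOB_CONFIGURING):
            # Another thread already finished configuration: do nothing.
            return False
        job.state_flags &= ~JOB_CONFIGURING
        job.launch_count += 1  # models sending the batch launch request
        return True


def race(job, n_threads=8):
    """Run several competing callers and collect their return values."""
    results = []

    def worker():
        results.append(complete_configuration(job))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Without the flag test inside the lock, two callers could both see a configuring job and both launch it, which is the duplicate launch request described in d72b13f2.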
  2. 21 Jan, 2017 2 commits
  3. 20 Jan, 2017 1 commit
    • Fix multicluster options to work with newer ctlds · 8b430b6a
      Brian Christiansen authored
      When a lower-version client tried to communicate with a higher-version
      controller, the dbd returned the controller's version and the client
      used that version to talk to the controller. When the controller
      responded, the client did not know how to unpack the higher-version
      message.
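The failure described in 8b430b6a can be illustrated with a small Python sketch. It assumes protocol versions are comparable integers and that a receiver can unpack any message at or below its own version. Both function names are hypothetical, and the commit message does not say how the fix is implemented; capping at the lower of the two versions is simply one common approach.

```python
def pick_protocol_version(client_version: int, controller_version: int) -> int:
    """Choose a version both sides understand (assumed approach: the lower one)."""
    return min(client_version, controller_version)


def can_unpack(receiver_version: int, msg_version: int) -> bool:
    """Assumption: a receiver only understands messages up to its own version."""
    return msg_version <= receiver_version
```

With an older client at version 2 and a controller at version 3 (made-up numbers), a reply packed at version 3 fails `can_unpack(2, 3)`, while a reply packed at `pick_protocol_version(2, 3)` succeeds.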
  4. 19 Jan, 2017 4 commits
  5. 18 Jan, 2017 4 commits
  6. 17 Jan, 2017 5 commits
  7. 15 Jan, 2017 1 commit
  8. 12 Jan, 2017 4 commits
  9. 11 Jan, 2017 2 commits
  10. 10 Jan, 2017 1 commit
  11. 09 Jan, 2017 7 commits
  12. 05 Jan, 2017 1 commit