1. 18 Jan, 2017 2 commits
    • ensure stepd -> srun client socket fully shutdown · c08f8922
      Aaron Knister authored
      Ensure eio objects are explicitly shut down when
      eio_handle_mainloop exits. Currently, whether that happens
      depends on the order in which eio_handle_mainloop and
      eio_signal_shutdown are called relative to each other.
      
      When stepd is instructed to shut down the socket, use
      SHUT_RDWR instead of SHUT_RD. Using only SHUT_RD can
      cause srun to receive ECONNRESET if there is outstanding
      data that has been sent to stepd but that the task has not
      read.
      
      bug 3166
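The difference between the two shutdown modes can be sketched with the standard POSIX `shutdown()` call. This is a minimal illustration, not the code from the commit: `shutdown_both_directions()` and `demo_peer_sees_eof()` are hypothetical helpers, and the demo uses a local `socketpair()` rather than the real stepd/srun connection, so it only shows that the peer sees a clean end-of-file after a full shutdown. It does not reproduce the ECONNRESET case described above.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not a Slurm function): disable both receiving and
 * sending on a connected socket. Returns 0 on success, -1 on failure. */
static int shutdown_both_directions(int fd)
{
    /* SHUT_RD would only disable further receives on this end;
     * SHUT_RDWR disables receives and sends. */
    return shutdown(fd, SHUT_RDWR);
}

/* Self-check on a local socket pair: after one end is fully shut down,
 * a read on the other end returns 0 bytes (end-of-file).
 * Returns 1 if that is what happened, 0 otherwise. */
static int demo_peer_sees_eof(void)
{
    int sv[2];
    char buf[8];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;

    int ok = (shutdown_both_directions(sv[0]) == 0);
    ssize_t n = read(sv[1], buf, sizeof(buf));
    ok = ok && (n == 0);

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

Calling `demo_peer_sees_eof()` should return 1 on a POSIX system; the file descriptors still have to be closed separately, since `shutdown()` does not release them.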
    • Prevent job timeout on node power up · 4114e6ce
      Morris Jette authored
      bug 3099
  2. 17 Jan, 2017 5 commits
  3. 15 Jan, 2017 1 commit
  4. 14 Jan, 2017 1 commit
    • Add infrastructure to report estimated boot time · e6f5cfc8
      Morris Jette authored
      Add BootTime configuration parameter to knl.conf file to optimize resource
          allocations with respect to required node reboots.
      Add node_features_p_boot_time() to node_features plugin to optimize
          scheduling with respect to node reboots.
      bug 3360
  5. 13 Jan, 2017 3 commits
  6. 12 Jan, 2017 3 commits
  7. 11 Jan, 2017 4 commits
    • CRAY - Fix deadlock issue when updating accounting in the slurmctld and scheduling a Datawarp job. · 69567910
      Danny Auble authored
      
      The assoc_mgr lock must be acquired before bb_state.bb_mutex. One place
      where this could cause deadlock is src/slurmctld/controller.c
      _accounting_cluster_ready(), which calls clusteracct_storage_g_cluster_tres,
      which in turn calls bb_g_job_set_tres_cnt, which calls bb_p_job_set_tres_cnt,
      which locks bb_mutex after the assoc_mgr is already locked.
      
      Bug 3389
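The lock-ordering rule in this message can be sketched with two plain pthread mutexes. This is an illustration under stated assumptions, not Slurm code: `assoc_mgr_lock` and `bb_mutex` here are ordinary stand-in mutexes, and `update_accounting()` and `schedule_burst_buffer_job()` are hypothetical names for the two code paths. The point is only that every path takes the two locks in the same order, so two threads can never each hold one lock while waiting for the other.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-ins for the two locks named in the commit message. */
static pthread_mutex_t assoc_mgr_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bb_mutex = PTHREAD_MUTEX_INITIALIZER;

static int tres_updates = 0;
static int jobs_scheduled = 0;

/* Hypothetical accounting path: assoc_mgr first, then bb_mutex. */
static void update_accounting(void)
{
    pthread_mutex_lock(&assoc_mgr_lock);
    pthread_mutex_lock(&bb_mutex);
    tres_updates++;
    pthread_mutex_unlock(&bb_mutex);
    pthread_mutex_unlock(&assoc_mgr_lock);
}

/* Hypothetical scheduling path: uses the SAME order. If this function
 * took bb_mutex first and assoc_mgr_lock second, it could deadlock
 * against update_accounting() running in another thread. */
static void schedule_burst_buffer_job(void)
{
    pthread_mutex_lock(&assoc_mgr_lock);
    pthread_mutex_lock(&bb_mutex);
    jobs_scheduled++;
    pthread_mutex_unlock(&bb_mutex);
    pthread_mutex_unlock(&assoc_mgr_lock);
}
```

Unlocking in the reverse order of locking is a convention rather than a requirement; the acquisition order is what prevents the deadlock.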
    • Make it so bitstrings are handled as 64 bits instead of 32. · 80c7da32
      Danny Auble authored
      Bug 3331
    • Improve performance of cr_sort_part_rows. · dc6a5220
      Dominik Bartkiewicz authored
      Cache results of bit_set_count() calls.
      
      Bug 3393.
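The caching idea can be sketched as follows: count the set bits of each row once, store the result, and let the sort comparator read the stored value instead of recounting on every comparison. This is a simplified illustration, not the actual cr_sort_part_rows code. `struct row`, `popcount64()`, and `sort_rows()` are hypothetical, a single 64-bit word stands in for a Slurm bitstring, and the descending sort order is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical row: one 64-bit word stands in for a full bitstring. */
struct row {
    uint64_t bits;
    int set_count; /* cached number of set bits, filled in by sort_rows() */
};

/* Portable population count: clears the lowest set bit each iteration. */
static int popcount64(uint64_t v)
{
    int n = 0;
    while (v) {
        v &= v - 1;
        n++;
    }
    return n;
}

/* Comparator that uses only the cached counts (descending order). */
static int cmp_rows_desc(const void *a, const void *b)
{
    const struct row *ra = a;
    const struct row *rb = b;
    return (rb->set_count > ra->set_count) - (rb->set_count < ra->set_count);
}

/* Count bits once per row, then sort. Without the cache, the comparator
 * would recount both rows on each of the O(n log n) comparisons. */
static void sort_rows(struct row *rows, size_t n)
{
    for (size_t i = 0; i < n; i++)
        rows[i].set_count = popcount64(rows[i].bits);
    qsort(rows, n, sizeof(rows[0]), cmp_rows_desc);
}
```

The saving grows with the cost of a single count, so it matters most when each bitstring spans many words.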
    • Fix srun/sattach race condition · 38089f2b
      Morris Jette authored
      The old logic would result in test16.4 failing some of the time.
      The failure was caused by the sattach command attaching to a
      job step before the original srun command received a
      RESPONSE_LAUNCH_TASKS message. That message would then be sent
      to the salloc command. Since srun never got the message, it
      would hang. This change does not mark the job step as RUNNING
      until after the original srun has been sent the RESPONSE_LAUNCH_TASKS
      message, and sattach requests are blocked until that time.
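The ordering described in this message can be sketched as a small state gate. This is only an illustration of the idea, not the Slurm implementation: `struct step`, `send_launch_response()`, and `attach_allowed()` are hypothetical names, and a real fix also has to deal with locking and with how blocked requests are retried.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical step states for the sketch. */
enum step_state { STEP_STARTING, STEP_RUNNING };

struct step {
    enum step_state state;
    bool launch_response_sent;
};

/* The step becomes RUNNING only once the launch response has been sent
 * to the original launcher, never before. */
static void send_launch_response(struct step *s)
{
    s->launch_response_sent = true;
    s->state = STEP_RUNNING;
}

/* Attach requests are refused (to be retried later) until the step is
 * marked RUNNING, so an attacher cannot slip in ahead of the response. */
static bool attach_allowed(const struct step *s)
{
    return s->state == STEP_RUNNING;
}
```

Tying the RUNNING state to the moment the response is sent gives attach requests a single condition to check.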
  8. 09 Jan, 2017 6 commits
  9. 06 Jan, 2017 1 commit
  10. 05 Jan, 2017 2 commits
  11. 04 Jan, 2017 5 commits
  12. 03 Jan, 2017 3 commits
  13. 29 Dec, 2016 3 commits
  14. 28 Dec, 2016 1 commit