1. 15 Sep, 2017 2 commits
  2. 14 Sep, 2017 2 commits
    • Prevent a second PMI2_Init call from leaving a hung slurmstepd process. · b2aa25d5
      Tim Wickberg authored
      A second PMI2_Init() within the same step is invalid, and cannot succeed.
      
      Return an error code back to the client end, and close the fd to force the
      step to terminate immediately.
      
      Due to a bug in our libpmi code, just returning a cmd=response_to_init with
      an appropriate rc number will not tear down the connection properly, so
      send back something else that will trigger the error path.
      
      Bug 3520.
    • Pack step cancel work · 20e660e1
      Morris Jette authored
      A request to cancel a pack step leader will result in that step
        being cancelled on all pack job components. Needed by MPI.
  3. 13 Sep, 2017 3 commits
  4. 12 Sep, 2017 7 commits
    • Fix default location for cgroup_allowed_devices_file.conf to use correct · 1e78c111
      Danny Auble authored
      default path.
      
      This makes it so you don't always have to put AllowedDevicesFile in your
      cgroup.conf file if your etc dir is anything other than /etc/slurm.
    • Pack job debugger synchronization · a8d2a04f
      Morris Jette authored
      Don't flag a job as "SPAWNED" for debugger until all process
        information is available for all pack job components.
    • Fix autoconf test for libcurl when clang is the compiler. · d670de2d
      Tim Wickberg authored
      Adding a newline prevents this error:
      conftest.c:154:8: error: if statement has empty body [-Werror,-Wempty-body]
    • If creating/altering a core based reservation with scontrol/sview on a · 3b3e67e1
      Alejandro Sanchez authored
      remote cluster correctly determine the select type.
      
      Bug 2329
    • Modify srun --mpi=list output · 83058057
      Morris Jette authored
      Modify "srun --mpi=list" output to match valid option input by removing the
          "mpi/" prefix on each line of output.
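The prefix removal described above can be sketched as follows. This is a minimal illustration, not the code from the commit: `strip_mpi_prefix` is a hypothetical helper name, and the sketch assumes each plugin type arrives as a string such as "mpi/pmi2".

```c
#include <string.h>

/*
 * Hypothetical helper (not taken from the Slurm source): return the plugin
 * name without a leading "mpi/" so the printed value matches what a user
 * would pass to --mpi=. Strings without the prefix are returned unchanged.
 */
static const char *strip_mpi_prefix(const char *plugin_type)
{
    static const char prefix[] = "mpi/";
    size_t len = sizeof(prefix) - 1;    /* length without the trailing NUL */

    if (strncmp(plugin_type, prefix, len) == 0)
        return plugin_type + len;
    return plugin_type;
}
```

Under these assumptions, `strip_mpi_prefix("mpi/pmi2")` yields "pmi2", which is the form accepted as option input.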
    • Disable heterogeneous steps by default · fe2daac7
      Morris Jette authored
      Enable them only with SchedulerParameters=enable_hetero_jobs OR
        MPI type is "none"
    • Speedup arbitrary distribution algorithm · 011db71c
      Brian Christiansen authored
      Do pointer comparisons rather than strcmps.
      ~80x speedup
      Bug 3529
      
      e.g.
      1000 nodes
      8000 tasks
      [Sep 11 14:24:15.873639 20992 srvcn        0x7f8c1cdda700] _task_layout_hostfile: hostfile processing took usec=2152678 (orig)
      [Sep 11 14:27:46.173424 20992 srvcn        0x7f8c1c6d3700] _task_layout_hostfile: hostfile processing took usec=2142997 (orig)
      [Sep 11 14:32:32.245420  4037 srvcn        0x7f12de4e4700] _task_layout_hostfile: hostfile processing took usec=26198 (node ptrs)
      [Sep 11 14:36:12.88769   4037 srvcn        0x7f12de6e6700] _task_layout_hostfile: hostfile processing took usec=25515 (node ptrs)
      [Sep 11 14:41:38.339162  4037 srvcn        0x7f132c8d5700] _task_layout_hostfile: hostfile processing took usec=27459 (node ptrs)
      [Sep 11 15:16:59.575189  1874 srvcn        0x7f3dae3f0700] _task_layout_hostfile: hostfile processing took usec=30129 (node ptrs)
      [Sep 11 15:20:50.365004  1874 srvcn        0x7f3dc8b34700] _task_layout_hostfile: hostfile processing took usec=29884 (node ptrs)
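The idea behind the speedup can be sketched as below: resolve each hostname to a node record once, then compare record pointers in the hot loop instead of calling strcmp() on names. This is only an illustration under assumptions. `struct node`, `find_node` and `count_tasks_on` are hypothetical names, not the structures or functions used in `_task_layout_hostfile`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node record; not Slurm's actual node structure. */
struct node {
    const char *name;
};

/* One-time lookup by name: the only place a string comparison is paid for. */
static struct node *find_node(struct node *nodes, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(nodes[i].name, name) == 0)
            return &nodes[i];
    return NULL;
}

/*
 * Count the tasks assigned to `target`. The task list already holds
 * resolved node pointers, so each test is a single pointer comparison.
 */
static unsigned count_tasks_on(struct node *const *task_nodes, size_t ntasks,
                               const struct node *target)
{
    unsigned cnt = 0;

    for (size_t i = 0; i < ntasks; i++)
        if (task_nodes[i] == target)
            cnt++;
    return cnt;
}
```

With many tasks spread over many nodes, the inner comparison runs a very large number of times, so replacing a byte-by-byte string compare with a pointer compare is where a gain of the kind reported above would come from.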
  5. 11 Sep, 2017 2 commits
  6. 08 Sep, 2017 6 commits
  7. 07 Sep, 2017 2 commits
  8. 05 Sep, 2017 1 commit
  9. 04 Sep, 2017 1 commit
    • Fix to test job mem against MaxMemPer[CPU|Node] limits at scheduling time. · 24365514
      Alejandro Sanchez authored
      Initially job mem limits were tested at submission time through
      _validate_min_mem_partition() -> _valid_pn_min_mem(), but not tested
      again at scheduling time, thus leading to jobs incorrectly being scheduled
      against partitions where the job exceeded their MaxMemPer* limit
      (which can in turn be inherited from the system wide limit too).
      
      NOTE: New WAIT_PN_MEM_LIMIT job_state_reason enum component added to support
      this new waiting reason.
      
      Bug 2291.
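A rough sketch of the scheduling-time check described above, under simplifying assumptions: all values are in MB and of the same kind (per CPU or per node), and a limit of 0 means "unset". Only the name WAIT_PN_MEM_LIMIT comes from the commit message. The structs, the other enum value, and the function names are hypothetical and do not reflect Slurm's real data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the job state reason and the records involved. */
enum wait_reason { WAIT_NO_REASON, WAIT_PN_MEM_LIMIT };

struct job  { uint64_t pn_min_mem_mb; enum wait_reason reason; };
struct part { uint64_t max_mem_mb; };   /* 0: no partition-level limit */

/* A partition without its own limit inherits the system-wide one. */
static uint64_t effective_max_mem(const struct part *p, uint64_t sys_max_mb)
{
    return p->max_mem_mb ? p->max_mem_mb : sys_max_mb;
}

/*
 * Re-test the job against the partition at scheduling time. Returns true
 * if the job fits; otherwise records the waiting reason and returns false
 * so the scheduler can skip this partition.
 */
static bool mem_limit_ok(struct job *j, const struct part *p,
                         uint64_t sys_max_mb)
{
    uint64_t max = effective_max_mem(p, sys_max_mb);

    if (max != 0 && j->pn_min_mem_mb > max) {
        j->reason = WAIT_PN_MEM_LIMIT;
        return false;
    }
    return true;
}
```

Running the same test at both submission and scheduling time covers the case the commit describes, where a job passed validation at submission but is later considered against a partition whose limit it exceeds.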
  10. 02 Sep, 2017 1 commit
  11. 01 Sep, 2017 4 commits
  12. 31 Aug, 2017 1 commit
  13. 30 Aug, 2017 3 commits
  14. 29 Aug, 2017 5 commits