1. 29 Mar, 2016 5 commits
  2. 28 Mar, 2016 5 commits
    • Danny Auble · 8ee976b4
    • When a stepd is about to shut down and send its response to srun · ea470f71
      Danny Auble authored
      Make the wait to return data take effect only beyond 500 nodes, and make
      the wait time configurable based on the TcpTimeout value.
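      A minimal sketch of the gating described above; the threshold constant
      and function names here are illustrative, not Slurm's actual
      identifiers:

        #include <stdbool.h>

        #define STEPD_WAIT_NODE_MIN 500    /* only wait beyond this node count */

        /* Decide whether stepd should delay its shutdown response to srun,
         * and for how long, based on step size and slurm.conf TcpTimeout. */
        static bool stepd_shutdown_wait(int node_cnt, int tcp_timeout,
                                        int *wait_secs)
        {
                if (node_cnt <= STEPD_WAIT_NODE_MIN)
                        return false;      /* small steps: respond at once */
                *wait_secs = tcp_timeout;  /* configurable via TcpTimeout */
                return true;
        }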
    • Merge branch 'slurm-15.08' · 2d70778b
      Morris Jette authored
    • task/cgroup - Fix task binding to CPUs bug · ddf6d9a4
      Morris Jette authored
      There was a subtle bug in how tasks were bound to CPUs which could result
      in an "infinite loop" error. The problem was that various socket/core/thread
      calculations were based upon the resources allocated to a step rather than
      all resources on the node, so rounding errors could occur. Consider for
      example a node with 2 sockets, 6 cores per socket and 2 threads per core.
      On the idle node, a job requesting 14 CPUs is submitted. That job would
      be allocated 4 cores on the first socket and 3 cores on the second socket.
      The old logic would get the number of sockets for the job at 2 and the
      number of cores at 7, then calculate the number of cores per socket at
      7/2 or 3 (rounding down to an integer). The logic laying out tasks
      would bind the first 3 cores on each socket to the job, then not find any
      remaining cores, report the "infinite loop" error to the user, and run
      the job without one of the expected cores. The problem gets even worse
      when there are already allocated cores on a node. In a more extreme case,
      a job might be allocated 6 cores on one socket and 1 core on a second
      socket. In that case, 3 of that job's cores would be unused.
      bug 2502
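      The rounding error is easy to reproduce in isolation; this self-contained
      example (not Slurm code) walks through the arithmetic from the message
      above:

        #include <stdio.h>

        int main(void)
        {
                /* Allocation from the example above: 4 cores on socket 0
                 * plus 3 cores on socket 1 of a 2-socket, 6-core-per-socket,
                 * 2-thread-per-core node. */
                int job_sockets = 2;
                int job_cores = 4 + 3;                          /* 7 cores */
                int cores_per_socket = job_cores / job_sockets; /* 7/2 == 3 */

                /* Binding 3 cores per socket covers only 6 of the 7
                 * allocated cores, so the layout logic came up one core
                 * short and reported the "infinite loop" error. */
                printf("derived cores per socket: %d\n", cores_per_socket);
                return 0;
        }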
    • Fix for srun signal handling threading problem · c8d36dba
      Morris Jette authored
      This is a revision to commit 1ed38f26
      The root problem is that a pthread is passed an argument which is
      a pointer to a variable on the stack. If that variable is over-written,
      the signal number received will be garbage, and that bad signal
      number will be interpreted by srun to possibly abort the request.
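      A generic illustration of that pthread pitfall and the usual fix,
      passing the signal number by value inside the pointer argument; this
      is not the actual srun code:

        #include <pthread.h>
        #include <stdint.h>
        #include <stdio.h>

        static void *_sig_thread(void *arg)
        {
                int signo = (int)(intptr_t)arg; /* value, not a pointer */
                printf("handling signal %d\n", signo);
                return NULL;
        }

        static void spawn_sig_thread(int signo)
        {
                pthread_t tid;

                /* Buggy pattern: passing &signo hands the thread a pointer
                 * into this stack frame, which may be overwritten before
                 * the thread runs. Casting the value through intptr_t
                 * avoids the dangling pointer entirely. */
                pthread_create(&tid, NULL, _sig_thread,
                               (void *)(intptr_t)signo);
                pthread_detach(tid);
        }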
  3. 26 Mar, 2016 5 commits
  4. 25 Mar, 2016 6 commits
  5. 24 Mar, 2016 4 commits
  6. 23 Mar, 2016 15 commits
    • Merge branch 'slurm-15.08' · 3028cfea
      Morris Jette authored
      Conflicts:
      	src/plugins/select/cons_res/job_test.c
    • gang scheduling bug fix · 5f1e78f6
      Morris Jette authored
      Fix gang scheduling resource selection bug which could prevent multiple
      jobs from being allocated the same resources. Bug was introduced in
      15.08.6, commit 44f491b8.
    • Tim Wickberg · 498624df
    • Cleanup Coverity errors from file_bcast work. · 54c9ac31
      Tim Wickberg authored
      Also ensure empty (0-length) files are handled properly.
      Remove a stray exit(1) call from _rpc_file_bcast() to avoid
      slurmd exiting on malformed data.
    • Danny Auble
    • task/cgroup: Fix for task binding anomaly · efa83a02
      Morris Jette authored
      Here's how to reproduce on smd-server, with 2 sockets, 6 cores per
      socket and 2 threads per core: just run the following command line
      3 times in quick succession (all active at the same time):
      srun --cpus-per-task=4 -m block sleep 30
      What was happening is the first job would be allocated cores 0+1,
      and the second job cores 2+3. The third job would test use of cores
      0-3, then stop because the job only needs 4 CPUs. The resulting core
      binding would include NO CPUs. The new logic tests that the core
      being considered for use actually has some resources available to
      the job before updating the counter which is being tested against
      the needed CPU counter.
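      A rough sketch of the kind of availability check described, assuming a
      simple per-core availability mask; identifiers here do not match the
      task/cgroup code:

        #include <stdbool.h>

        static int pick_cores(const bool *core_avail, int core_cnt,
                              int threads_per_core, int cpus_needed)
        {
                int cpus_found = 0, cores_used = 0;

                for (int c = 0; c < core_cnt && cpus_found < cpus_needed; c++) {
                        if (!core_avail[c])
                                continue;  /* the fix: skip cores with no
                                            * resources left for this job
                                            * before bumping the counter */
                        cpus_found += threads_per_core;
                        cores_used++;
                }
                return cores_used;
        }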
    • task/cgroup: Fix for task layout logic with disabled resources. · 6c14b969
      Morris Jette authored
      Specifically add the HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM flag when
      loading configuration from HWLOC library. Previous logic in
      task/cgroup did not do this, which was different behaviour from
      how slurmd gets configuration information. Here's the HWLOC
      documentation:
      HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM
      Detect the whole system, ignore reservations and offline settings.
      Gather all resources, even if some were disabled by the administrator.
      For instance, ignore Linux Cpusets and gather all processors and memory
      nodes, and ignore the fact that some resources may be offline.
      
      Without this flag, I would on rare occasions observe a bad core count,
      which resulted in the logic laying out tasks incorrectly and generating
      an error:
      task/cgroup: task[0] infinite loop broken while trying to provision compute elements using cyclic
      
      bug 2502
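      For reference, loading a topology with that flag looks like this with
      the hwloc 1.x API (a standalone example, not the task/cgroup code):

        #include <hwloc.h>
        #include <stdio.h>

        int main(void)
        {
                hwloc_topology_t topo;

                hwloc_topology_init(&topo);
                /* Gather every core, even ones hidden by cpusets or
                 * offlining, so counts match what slurmd reports. */
                hwloc_topology_set_flags(topo,
                                         HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
                hwloc_topology_load(topo);

                printf("cores visible: %d\n",
                       hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE));

                hwloc_topology_destroy(topo);
                return 0;
        }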
    • Danny Auble · e7f12058
    • Danny Auble · f1ef24e6
    • Revert "Fix expect tests that expect -lz for compilation." · 04c395f5
      Tim Wickberg authored
      With bcast split into its own directory, -lz should not be
      required throughout.
      
      This reverts commit e7981406.
    • Send file_size across as part of the RPC, will be needed for mmap. · ee826b96
      Tim Wickberg authored
      Remove unused struct and macro from file_bcast.h.
      
      Free file_bcast_info_t to prevent leak.
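      Knowing the total file size up front is what makes an mmap-based writer
      possible; a minimal sketch of that pattern follows (a hypothetical
      helper, not the eventual implementation):

        #include <string.h>
        #include <sys/mman.h>
        #include <sys/types.h>
        #include <unistd.h>

        /* Write one block at its offset in a destination file of known
         * size. A real implementation would ftruncate() and mmap() once
         * per file at registration time, not once per block. */
        static int write_block(int fd, off_t file_size, off_t offset,
                               const char *data, size_t len)
        {
                if (ftruncate(fd, file_size) < 0)
                        return -1;
                char *map = mmap(NULL, file_size, PROT_WRITE, MAP_SHARED,
                                 fd, 0);
                if (map == MAP_FAILED)
                        return -1;
                memcpy(map + offset, data, len); /* any offset works */
                return munmap(map, file_size);
        }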
    • Danny Auble · 7e2e8f88
    • Danny Auble · 2ab694cb
    • Restructure file_bcast mechanism to fork only on first block. · 7bac612c
      Tim Wickberg authored
      1) Add a new global file_bcast_list to store info on in-progress file
         transfers, cache FD there rather than reopening the file for every block.
      2) Restructure security mechanisms. First block will fork() and open the
         file, and pass the FD back to the thread (see the FD-passing sketch
         after this list). Thread then registers this file transfer in the
         file_bcast_list. Split fork() stuff into _file_bcast_register_file
         to keep _rpc_file_bcast readable.
      3) Successive blocks are handled within the thread. Security is handled by
         matching uid and file name to existing file transfer.
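      Passing an open FD from the forked child back to the thread is
      typically done with SCM_RIGHTS over a socketpair(2); a generic sketch
      of the sending side, not the Slurm code itself:

        #include <string.h>
        #include <sys/socket.h>
        #include <sys/uio.h>

        /* Send an open file descriptor across a UNIX-domain socket. */
        static int send_fd(int sock, int fd)
        {
                char byte = 0;
                union {                    /* aligned control buffer */
                        char buf[CMSG_SPACE(sizeof(int))];
                        struct cmsghdr align;
                } u;
                struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
                struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                                      .msg_control = u.buf,
                                      .msg_controllen = sizeof(u.buf) };
                struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

                cmsg->cmsg_level = SOL_SOCKET;
                cmsg->cmsg_type = SCM_RIGHTS;
                cmsg->cmsg_len = CMSG_LEN(sizeof(int));
                memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
                return (sendmsg(sock, &msg, 0) == 1) ? 0 : -1;
        }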
      
      TODO:
      
      1) Write transfer cleanup function to remove stalled transfers.
      2) Use mmap for file output.
      3) Allow for parallel block transfer. Current code assumes blocks will always
         arrive in order. Out of order blocks will result in corrupted output.
         (sbcast currently prevents this by requiring each message to be ack'd
         before continuing, but at a likely severe performance penalty.)
      4) Add stats on receive side.
    • Revert "Handle sbcast output within the RPC thread instead of fork()'ing." · 90206f27
      Tim Wickberg authored
      This reverts commit 8c8c3407488fe3f0a552d2359ef5b487330ee8ba.
      
      Thread-only isn't portable; we need to use fork() on the first block to
      ensure file security and containers are handled correctly.