1. 05 Mar, 2016 2 commits
  2. 04 Mar, 2016 3 commits
  3. 03 Mar, 2016 5 commits
    • Defer slurmd registration until NodeHealthCheck · 7fb0c981
      Thomas Hamel authored
      We want to introduce a new behavior in the way slurmd uses the
      HealthCheckProgram. The idea is to avoid a race condition between the
      first HealthCheckProgram run and the node accepting jobs. The slurmd
      daemon will initialize and then loop on HealthCheckProgram execution
      before registering with slurmctld. It will stay in this loop until
      the HealthCheckProgram returns successfully (the node remains DOWN
      in the meantime).
      
      On our clusters we use NHC as the HealthCheckProgram. NHC drains
      the node if a check fails and removes the drain if the checks
      succeed, which fits our purpose well. This behavior lets us start
      slurmd at boot without setting up a complex boot sequence in the init
      system: slurmd just waits for the node to be ready before registering.
      
      The HealthCheckProgram is not run during slurmd startup if
      HealthCheckInterval is 0.
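The startup loop described in this commit can be sketched as follows. This is a minimal illustration, not the code from the commit: `wait_until_healthy`, `retry_delay_sec`, and `max_attempts` are hypothetical names, and the attempt limit exists only so the sketch can terminate (the commit describes looping until the check succeeds).

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `cmd` through the shell; return 1 if it exited with status 0. */
static int health_check_passes(const char *cmd)
{
	int status = system(cmd);

	return status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/*
 * Hypothetical helper: block until the health check program succeeds.
 *
 * health_check_interval: stands in for HealthCheckInterval; 0 skips the
 *                        check entirely, as the commit message states.
 * retry_delay_sec:       assumed pause between failed attempts.
 * max_attempts:          assumed safety limit; <= 0 means retry forever.
 *
 * Returns 0 if the check was skipped, the attempt number that succeeded,
 * or -1 if max_attempts was exhausted.
 */
int wait_until_healthy(const char *cmd, unsigned int health_check_interval,
		       unsigned int retry_delay_sec, int max_attempts)
{
	if (health_check_interval == 0)
		return 0;

	for (int attempt = 1; max_attempts <= 0 || attempt <= max_attempts;
	     attempt++) {
		if (health_check_passes(cmd))
			return attempt;	/* safe to register now */
		sleep(retry_delay_sec);
	}
	return -1;
}
```

A caller would run this after initialization and only proceed to registration once it returns a non-negative value.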
    • Danny Auble · 72f13426
    • Brian Christiansen · 5c43d754
    • Increase step GRES variable size · 7f0bdc84
      Morris Jette authored
      Step GRES value changed from type "int" to "int64_t" to support larger
      values. Previous logic could fail for step allocation values that do
      not fit in 32 bits. Other GRES values are already 64-bit.
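To illustrate why the wider type matters, here is a small sketch that checks whether a 64-bit count would survive being stored in a plain `int`. The function name `fits_in_int` and the sample value are assumptions for illustration and do not come from the Slurm source.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if `value` can be stored in a plain int without loss.
 * On platforms where int is 32 bits wide, any count above INT_MAX
 * (2147483647) fails this check.
 */
bool fits_in_int(int64_t value)
{
	return value >= INT_MIN && value <= INT_MAX;
}
```

For example, a count of 8589934592 (2^33) does not fit in a 32-bit `int`, so storing it in the old variable type would lose information.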
    • Force close on exec on first 256 file descriptors when launching a slurmstepd to close potential open ones. · f502f1e5
      Danny Auble authored
      
      It was pointed out that slurmd, when using acct_gather_energy/ipmi,
      links to freeipmi, which could open /dev/ipmi0 as root without the
      close-on-exec flag set. While launching a step, that descriptor could
      then be left open in the user's application.
      
      This change sets the flag on the first 256 file descriptors to
      mitigate the concern.
      
      Reported by Maksym Planeta.
      
      Bug 2506
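The mitigation can be sketched with the POSIX `fcntl` interface. This is an illustrative helper, not the code from the commit: the name `set_cloexec_range` is hypothetical, and the commit message does not say whether descriptors 0 to 2 are included in the range.

```c
#include <fcntl.h>

/*
 * Set FD_CLOEXEC on every open descriptor in [first, last).
 * Descriptors that are not open are skipped. Returns the number of
 * descriptors on which the flag was set.
 */
int set_cloexec_range(int first, int last)
{
	int flagged = 0;

	for (int fd = first; fd < last; fd++) {
		int flags = fcntl(fd, F_GETFD);

		if (flags == -1)
			continue;	/* descriptor is not open */
		if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == 0)
			flagged++;
	}
	return flagged;
}
```

Calling something like `set_cloexec_range(0, 256)` before the exec would cover the range the commit title mentions. Descriptors flagged this way are closed automatically in the new program image, so a library-opened device file is not inherited by the user's tasks.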
  4. 02 Mar, 2016 4 commits
  5. 01 Mar, 2016 4 commits
    • Remove BEGIN_C_DECLS and END_C_DECLS macros. · 1434364d
      Tim Wickberg authored
      src/common/mapping.h was the only place outside of slurm/*.h that used
      these macros, so remove them from there.
      
      Replace the macros with #ifdef __cplusplus guards in slurm/*.h in case
      anyone is linking C++ against libslurm.
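The guard pattern that replaces the macros typically looks like the sketch below. The function `example_add` is a hypothetical placeholder, not part of libslurm; only the `#ifdef __cplusplus` structure is the point.

```c
/* Header portion: give the declarations C linkage when seen by C++. */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical API function used only for illustration. */
int example_add(int a, int b);

#ifdef __cplusplus
}
#endif

/* Implementation portion, compiled as C. */
int example_add(int a, int b)
{
	return a + b;
}
```

A C compiler never defines `__cplusplus`, so it sees only the plain declaration, while a C++ compiler sees it wrapped in `extern "C"` and links against the unmangled symbol name.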
    • Remove PARAMS macro from function definitions. · 6ad00816
      Tim Wickberg authored
      The macro has not been used consistently for more than three years,
      and it guards against compilation by non-ANSI C compilers, which has
      not been a concern for quite some time. Clean up the formatting of
      function declarations while here.
      
      No change to logic.
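For readers unfamiliar with this idiom, the sketch below contrasts a declaration written with a `PARAMS`-style macro and the plain ANSI form. The macro definition shown is the conventional one and is an assumption here, since the commit message does not quote Slurm's definition. The function names are hypothetical.

```c
/*
 * Assumed conventional definition: on an ANSI C compiler the macro
 * simply passes the prototype through. Pre-ANSI builds would define
 * it as `()` to drop the parameter list.
 */
#define PARAMS(protos) protos

/* Old style: note the doubled parentheses the macro requires. */
int add_old_style PARAMS((int a, int b));

/* New style after the cleanup: an ordinary ANSI C prototype. */
int add_new_style(int a, int b);

int add_old_style(int a, int b)
{
	return a + b;
}

int add_new_style(int a, int b)
{
	return a + b;
}
```

Both declarations compile to the same prototype on an ANSI compiler, which is consistent with the commit's note that removing the macro does not change logic.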
    • Update NEWS as well. · a058ff4a
      Tim Wickberg authored
    • Defer suspend until launch completes · 52fe3de1
      Morris Jette authored
      Ensure that a job is completely launched before trying to suspend it.
      Previous logic would start suspend logic early in the life of the
      slurmstepd process, after its listening socket was open but before
      the tasks were launched. This defers the suspend logic until after
      all prologs and setup complete and the tasks are launched. This is
      important in the case of gang scheduling, in which newly launched
      jobs can be immediately suspended.
      Bug 2494
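One way to picture the deferral is a small state record that remembers a suspend request until launch has finished. This is only a sketch under assumptions: the type and function names are hypothetical, and the commit message does not say whether slurmstepd queues the request like this or blocks the requester until launch completes.

```c
#include <stdbool.h>

/* Hypothetical per-step state used only for this illustration. */
typedef struct {
	bool launch_complete;	/* prologs, setup, and task launch done */
	bool suspend_pending;	/* suspend arrived before launch finished */
	bool suspended;		/* tasks are currently suspended */
} step_state_t;

/*
 * Handle a suspend request. Returns true if the suspend was applied
 * immediately, false if it was deferred until launch completes.
 */
bool request_suspend(step_state_t *step)
{
	if (!step->launch_complete) {
		step->suspend_pending = true;
		return false;
	}
	step->suspended = true;
	return true;
}

/* Called once all tasks are launched; applies any deferred suspend. */
void mark_launch_complete(step_state_t *step)
{
	step->launch_complete = true;
	if (step->suspend_pending) {
		step->suspend_pending = false;
		step->suspended = true;
	}
}
```

With gang scheduling, a suspend can arrive right after the step starts. In this sketch the request is recorded and takes effect only once `mark_launch_complete` runs.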
  6. 29 Feb, 2016 1 commit
  7. 27 Feb, 2016 1 commit
  8. 26 Feb, 2016 5 commits
  9. 25 Feb, 2016 2 commits
  10. 24 Feb, 2016 5 commits
  11. 23 Feb, 2016 1 commit
    • Fix issue with resizing jobs and limits not being kept track of correctly. · 92ac0dcd
      Danny Auble authored
      This whole process could probably be done better by keeping track of
      old values and new values and only calling one function instead of a
      pre and post function, but that can probably wait for future generations
      of the code as it works now and is probably adequate for the time being.
      
      Bug 2352
  12. 22 Feb, 2016 1 commit
  13. 19 Feb, 2016 2 commits
  14. 18 Feb, 2016 4 commits