1. 03 Mar, 2016 2 commits
    • Increase step GRES variable size · 7f0bdc84
      Morris Jette authored
      Step GRES value changed from type "int" to "int64_t" to support larger
      values. Previous logic could fail for step allocation values over 32 bits.
      Other GRES values are already 64-bit.
    • Force close on exec on first 256 file descriptors when launching a · f502f1e5
      Danny Auble authored
      slurmstepd to close potential open ones.
      
      It was pointed out that slurmd, when using acct_gather_energy/ipmi, links
      to freeipmi, which could open /dev/ipmi0 as root without the close-on-exec
      flag set while launching a step, leaving it open in the user's application.
      
      This sets the flag on the first 256 file descriptors to mitigate the concern.
      
      Reported by Maksym Planeta.
      
      Bug 2506
  2. 02 Mar, 2016 2 commits
  3. 01 Mar, 2016 2 commits
    • Update NEWS as well. · a058ff4a
      Tim Wickberg authored
    • Defer suspend until launch completes · 52fe3de1
      Morris Jette authored
      Ensure that a job is completely launched before trying to suspend it.
      Previous logic would start suspend logic early in the life of the
      slurmstepd process, after its listening socket was open but before
      the tasks were launched. This defers the suspend logic until after
      all prologs and setup complete and the tasks are launched. This is
      important in the case of gang scheduling, in which newly launched
      jobs can be immediately suspended.
      bug 2494
  4. 26 Feb, 2016 2 commits
  5. 25 Feb, 2016 1 commit
  6. 24 Feb, 2016 5 commits
  7. 23 Feb, 2016 1 commit
    • Fix issue with resizing jobs and limits not being tracked correctly. · 92ac0dcd
      Danny Auble authored
      This whole process could probably be done better by keeping track of
      old values and new values and only calling one function instead of a
      pre and post function, but that can probably wait for future generations
      of the code as it works now and is probably adequate for the time being.
      
      Bug 2352
  8. 19 Feb, 2016 2 commits
  9. 18 Feb, 2016 5 commits
  10. 17 Feb, 2016 2 commits
  11. 16 Feb, 2016 2 commits
  12. 12 Feb, 2016 1 commit
  13. 10 Feb, 2016 3 commits
  14. 09 Feb, 2016 2 commits
  15. 08 Feb, 2016 1 commit
  16. 04 Feb, 2016 1 commit
  17. 03 Feb, 2016 2 commits
  18. 02 Feb, 2016 3 commits
    • Fix build for sh5util on ppc64 by replacing printf formatters · b717bf5c
      Tim Wickberg authored
      Use PRIu64 instead of %ld for uint64_t types (libsh5util_old),
      %zu instead of PRIu64 for size_t.
    • update NEWS to mention FreeBSD fixes · e9982fa4
      Tim Wickberg authored
    • Fix support for AuthInfo in slurmdbd.conf · fa4222ec
      Didier GAZEN authored
      Support AuthInfo in slurmdbd.conf that is different from the value in
      slurm.conf.
      There is a possible bug in the slurm_get_auth_info function (src/common/slurm_protocol_api.c) that can cause the slurmdbd daemon to look for the AuthInfo parameter in slurm.conf instead of slurmdbd.conf when the auth/munge authentication method is used (AuthType=auth/munge).
      
      Here is the slurmdbd log revealing the problem (debug5() print statements were added to the sources):
      
      slurmdbd: slurmdbd version 15.08.7 started
      slurmdbd: debug2: running rollup at Tue Feb 02 14:20:14 2016
      slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3601)
      slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1732)
      slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
      slurmdbd: debug:  Reading slurm.conf file: /usr/local/slurm-15-08-7-1/etc/slurm.conf
      slurmdbd: error: s_p_parse_file: unable to status file /usr/local/slurm-15-08-7-1/etc/slurm.conf: No such file or directory, retrying in 1sec up to 60sec
      ...
      
      Then 60 seconds later, the auth_info value returned by slurm_get_auth_info is NULL:
      
      slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=(null)
      
      and slurmdbd continues without crashing, but I am not sure it is in a safe state.
      
      After applying this patch:
      
      diff --git a/src/common/slurm_protocol_api.c b/src/common/slurm_protocol_api.c
      index c5db879..be1dab6 100644
      --- a/src/common/slurm_protocol_api.c
      +++ b/src/common/slurm_protocol_api.c
      @@ -1703,9 +1703,13 @@ extern char *slurm_get_auth_info(void)
              char *auth_info;
              slurm_ctl_conf_t *conf;
      
      -       conf = slurm_conf_lock();
      -       auth_info = xstrdup(conf->authinfo);
      -       slurm_conf_unlock();
      +       if (slurmdbd_conf) {
      +                auth_info = xstrdup(slurmdbd_conf->auth_info);
      +        } else {
      +               conf = slurm_conf_lock();
      +               auth_info = xstrdup(conf->authinfo);
      +               slurm_conf_unlock();
      +       }
      
              return auth_info;
       }
      
      the auth_info value is now valid and consistent with the slurmdbd.conf setting:
      
      slurmdbd: slurmdbd version 15.08.7 started
      slurmdbd: debug2: running rollup at Tue Feb 02 14:47:37 2016
      slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3600)
      slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1731)
      slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
      slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=socket=/var/run/munge/munge_dbd.socket.2
  19. 01 Feb, 2016 1 commit