1. 02 Mar, 2016 2 commits
  2. 01 Mar, 2016 2 commits
    • Update NEWS as well. · a058ff4a
      Tim Wickberg authored
      a058ff4a
    • Defer suspend until launch completes · 52fe3de1
      Morris Jette authored
      Ensure that a job is completely launched before trying to suspend it.
      Previous logic would start suspend logic early in the life of the
      slurmstepd process, after its listening socket was open but before
      the tasks were launched. This defers the suspend logic until after
      all prologs and setup complete and the tasks are launched. This is
      important in the case of gang scheduling, in which newly launched
      jobs can be immediately suspended.
      bug 2494
      52fe3de1
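The deferral described in this commit can be illustrated with a small gate built on a mutex and a condition variable. This is only a sketch under stated assumptions: `launch_gate_t` and the `gate_*` functions are hypothetical names invented for illustration and are not the slurmstepd implementation.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical gate tracking launch state; not slurmstepd's real data structure. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool launched;   /* set once prologs, setup, and task launch are done */
    bool suspended;  /* set once a suspend request has been applied */
} launch_gate_t;

static void gate_init(launch_gate_t *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->launched = false;
    g->suspended = false;
}

/* Called by the launch path after the tasks are running. */
static void gate_mark_launched(launch_gate_t *g)
{
    pthread_mutex_lock(&g->lock);
    g->launched = true;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

/* Blocking variant: a suspend request waits here until launch completes. */
static void gate_suspend(launch_gate_t *g)
{
    pthread_mutex_lock(&g->lock);
    while (!g->launched)
        pthread_cond_wait(&g->cond, &g->lock);
    g->suspended = true;
    pthread_mutex_unlock(&g->lock);
}

/* Non-blocking variant: refuse the suspend if launch is still in progress. */
static bool gate_try_suspend(launch_gate_t *g)
{
    bool ok;

    pthread_mutex_lock(&g->lock);
    ok = g->launched;
    if (ok)
        g->suspended = true;
    pthread_mutex_unlock(&g->lock);
    return ok;
}
```

In this sketch a suspend request that arrives right after the listening socket opens either waits (`gate_suspend`) or is refused (`gate_try_suspend`) until the launch path calls `gate_mark_launched`.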
  3. 26 Feb, 2016 2 commits
  4. 25 Feb, 2016 1 commit
  5. 24 Feb, 2016 5 commits
  6. 23 Feb, 2016 1 commit
    • Fix issue with resizing jobs and limits not being tracked correctly. · 92ac0dcd
      Danny Auble authored
      This whole process could probably be done better by keeping track of
      old values and new values and only calling one function instead of a
      pre and post function, but that can probably wait for future generations
      of the code as it works now and is probably adequate for the time being.
      
      Bug 2352
      92ac0dcd
  7. 19 Feb, 2016 2 commits
  8. 18 Feb, 2016 5 commits
  9. 17 Feb, 2016 2 commits
  10. 16 Feb, 2016 2 commits
  11. 12 Feb, 2016 1 commit
  12. 10 Feb, 2016 3 commits
  13. 09 Feb, 2016 2 commits
  14. 08 Feb, 2016 1 commit
  15. 04 Feb, 2016 1 commit
  16. 03 Feb, 2016 2 commits
  17. 02 Feb, 2016 3 commits
    • Fix build for sh5util on ppc64 by replacing printf formatters · b717bf5c
      Tim Wickberg authored
      Use PRIu64 instead of %ld for uint64_t types (libsh5util_old),
      %zu instead of PRIu64 for size_t.
      b717bf5c
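The formatter choice in this commit can be shown with a minimal sketch. The `PRIu64` macro from `<inttypes.h>` expands to the correct conversion for `uint64_t` on each platform, and `%zu` is the standard conversion for `size_t`. The helper name `format_counts` and its field labels are made up for illustration and do not come from sh5util.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a uint64_t and a size_t without assuming how wide "long" is.
 * Returns whatever snprintf() returns.
 */
static int format_counts(char *buf, size_t len, uint64_t samples, size_t nbytes)
{
    /* PRIu64 is a string literal, so it is concatenated into the format. */
    return snprintf(buf, len, "samples=%" PRIu64 " bytes=%zu",
                    samples, nbytes);
}
```

Calling `format_counts(buf, sizeof(buf), UINT64_C(7), sizeof(buf))` fills `buf` the same way regardless of the width of `long` on the target.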
    • update NEWS to mention FreeBSD fixes · e9982fa4
      Tim Wickberg authored
      e9982fa4
    • Fix support for AuthInfo in slurmdbd.conf · fa4222ec
      Didier GAZEN authored
      Support AuthInfo in slurmdbd.conf that is different from the value in
      slurm.conf.
      There is a possible bug in the slurm_get_auth_info function (src/common/slurm_protocol_api.c) that can cause the slurmdbd daemon to look for the AuthInfo parameter in slurm.conf instead of slurmdbd.conf when the auth/munge authentication method is used (AuthType=auth/munge).
      
      Here is the slurmdbd log revealing the problem (debug5() calls were added to the sources):
      
      slurmdbd: slurmdbd version 15.08.7 started
      slurmdbd: debug2: running rollup at Tue Feb 02 14:20:14 2016
      slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3601)
      slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1732)
      slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
      slurmdbd: debug:  Reading slurm.conf file: /usr/local/slurm-15-08-7-1/etc/slurm.conf
      slurmdbd: error: s_p_parse_file: unable to status file /usr/local/slurm-15-08-7-1/etc/slurm.conf: No such file or directory, retrying in 1sec up to 60sec
      ...
      
      Then 60 seconds later, the auth_info value returned by slurm_get_auth_info is NULL:
      
      slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=(null)
      
      and slurmdbd continues without crashing, but I am not sure it is in a safe state.
      
      After applying this patch:
      
      diff --git a/src/common/slurm_protocol_api.c b/src/common/slurm_protocol_api.c
      index c5db879..be1dab6 100644
      --- a/src/common/slurm_protocol_api.c
      +++ b/src/common/slurm_protocol_api.c
      @@ -1703,9 +1703,13 @@ extern char *slurm_get_auth_info(void)
              char *auth_info;
              slurm_ctl_conf_t *conf;
      
      -       conf = slurm_conf_lock();
      -       auth_info = xstrdup(conf->authinfo);
      -       slurm_conf_unlock();
      +       if (slurmdbd_conf) {
      +                auth_info = xstrdup(slurmdbd_conf->auth_info);
      +        } else {
      +               conf = slurm_conf_lock();
      +               auth_info = xstrdup(conf->authinfo);
      +               slurm_conf_unlock();
      +       }
      
              return auth_info;
       }
      
      the auth_info value is now valid and consistent with the slurmdbd.conf setting:
      
      slurmdbd: slurmdbd version 15.08.7 started
      slurmdbd: debug2: running rollup at Tue Feb 02 14:47:37 2016
      slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3600)
      slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1731)
      slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
      slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=socket=/var/run/munge/munge_dbd.socket.2
      fa4222ec
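The lookup order introduced by the patch can be sketched in isolation. The structs and function below are simplified stand-ins, not Slurm's types: `dbd_conf_sketch_t`, `ctl_conf_sketch_t`, and `get_auth_info_sketch` are hypothetical names, plain `malloc` replaces `xstrdup`, and the configuration locking in the real code is omitted.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the two configuration structures (assumptions). */
typedef struct { const char *auth_info; } dbd_conf_sketch_t;  /* slurmdbd.conf */
typedef struct { const char *authinfo;  } ctl_conf_sketch_t;  /* slurm.conf */

/* Duplicate a string, passing NULL through unchanged. */
static char *dup_or_null(const char *s)
{
    size_t n;
    char *copy;

    if (!s)
        return NULL;
    n = strlen(s) + 1;
    copy = malloc(n);
    if (copy)
        memcpy(copy, s, n);
    return copy;
}

/*
 * Prefer the slurmdbd configuration when it is loaded (non-NULL), as the
 * daemon would; otherwise fall back to the controller configuration.
 * The caller owns the returned string.
 */
static char *get_auth_info_sketch(const dbd_conf_sketch_t *dbd,
                                  const ctl_conf_sketch_t *ctl)
{
    if (dbd)
        return dup_or_null(dbd->auth_info);
    return dup_or_null(ctl ? ctl->authinfo : NULL);
}
```

The point of the ordering is that a process holding a slurmdbd configuration never has to read slurm.conf just to learn its AuthInfo value.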
  18. 01 Feb, 2016 1 commit
  19. 29 Jan, 2016 1 commit
  20. 28 Jan, 2016 1 commit
    • Don't relocate multi-node core reservations · a801d264
      Morris Jette authored
      Do not automatically relocate an advanced reservation for individual cores
      that spans multiple nodes when nodes in that reservation go down (e.g.
      a 1 core reservation on node "tux1" will be moved if node "tux1" goes
      down, but a reservation containing 2 cores on node "tux1" and 3 cores on
      "tux2" will not be moved if node "tux1" goes down). Advanced reservations for
      whole nodes will be moved by default for down nodes.
      bug 2326
      a801d264
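The relocation rule stated in this commit message reduces to a small predicate. The sketch below only restates that rule. `resv_sketch_t` and `resv_moves_on_node_down` are hypothetical names, not slurmctld's reservation record or API, and any reservation flags that change the default behavior are ignored.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of a reservation (assumption for illustration). */
typedef struct {
    bool whole_nodes;  /* true: reserves whole nodes; false: individual cores */
    int  node_cnt;     /* number of nodes the reservation touches */
} resv_sketch_t;

/* Should the reservation be relocated when one of its nodes goes down? */
static bool resv_moves_on_node_down(const resv_sketch_t *r)
{
    if (r->whole_nodes)
        return true;            /* whole-node reservations move by default */
    return r->node_cnt == 1;    /* core reservations move only if single-node */
}
```

With these definitions, a 1 core reservation on "tux1" gives `true`, while 2 cores on "tux1" plus 3 cores on "tux2" gives `false`.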