  1. 02 Feb, 2016 3 commits
    • Fix build for sh5util on ppc64 by replacing printf formatters · b717bf5c
      Tim Wickberg authored
      Use PRIu64 instead of %ld for uint64_t types (libsh5util_old),
      %zu instead of PRIu64 for size_t.
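The formatter rule in this commit message can be illustrated with a minimal sketch. The helper name `format_counts` and its parameters are hypothetical and are not taken from sh5util; the only point is that `PRIu64` from `<inttypes.h>` matches `uint64_t` and `%zu` matches `size_t` regardless of how wide `long` is on the target platform.

```c
#include <assert.h>
#include <inttypes.h>   /* PRIu64 */
#include <stddef.h>     /* size_t */
#include <stdint.h>     /* uint64_t */
#include <stdio.h>      /* snprintf */

/*
 * Hypothetical helper (not part of sh5util): format a 64-bit byte count
 * and an item count using formatters that match the argument types
 * exactly, so the call is portable across ABIs.
 */
int format_counts(char *buf, size_t buflen, uint64_t bytes, size_t nitems)
{
	assert(buf != NULL);
	/* PRIu64 expands to the correct conversion for uint64_t;
	 * %zu is the standard conversion for size_t. */
	return snprintf(buf, buflen, "bytes=%" PRIu64 " items=%zu",
			bytes, nitems);
}
```

Calling `format_counts(buf, sizeof(buf), UINT64_C(5000000000), 3)` writes `bytes=5000000000 items=3` into `buf`. Using `%ld` for the first argument would only be correct where `long` happens to be the type underlying `uint64_t`.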
    • update NEWS to mention FreeBSD fixes · e9982fa4
      Tim Wickberg authored
    • Fix support for AuthInfo in slurmdbd.conf · fa4222ec
      Didier GAZEN authored
      Support AuthInfo in slurmdbd.conf that is different from the value in
          slurm.conf.
      There is a possible bug in the slurm_get_auth_info function (src/common/slurm_protocol_api.c) that can cause the slurmdbd daemon to look for the AuthInfo parameter in slurm.conf instead of slurmdbd.conf when the auth/munge authentication method is used (AuthType=auth/munge).
      
      Here is the slurmdbd log revealing the problem (debug5() print statements were added to the sources):
      
      slurmdbd: slurmdbd version 15.08.7 started
      slurmdbd: debug2: running rollup at Tue Feb 02 14:20:14 2016
      slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3601)
      slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1732)
      slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
      slurmdbd: debug:  Reading slurm.conf file: /usr/local/slurm-15-08-7-1/etc/slurm.conf
      slurmdbd: error: s_p_parse_file: unable to status file /usr/local/slurm-15-08-7-1/etc/slurm.conf: No such file or directory, retrying in 1sec up to 60sec
      ...
      
      Then 60 seconds later, the auth_info value returned by slurm_get_auth_info is NULL:
      
      slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=(null)
      
      and slurmdbd continues without crashing, but I am not sure it is in a safe state.
      
      After applying this patch:
      
      diff --git a/src/common/slurm_protocol_api.c b/src/common/slurm_protocol_api.c
      index c5db879..be1dab6 100644
      --- a/src/common/slurm_protocol_api.c
      +++ b/src/common/slurm_protocol_api.c
      @@ -1703,9 +1703,13 @@ extern char *slurm_get_auth_info(void)
              char *auth_info;
              slurm_ctl_conf_t *conf;
      
      -       conf = slurm_conf_lock();
      -       auth_info = xstrdup(conf->authinfo);
      -       slurm_conf_unlock();
      +       if (slurmdbd_conf) {
      +                auth_info = xstrdup(slurmdbd_conf->auth_info);
      +        } else {
      +               conf = slurm_conf_lock();
      +               auth_info = xstrdup(conf->authinfo);
      +               slurm_conf_unlock();
      +       }
      
              return auth_info;
       }
      
      the auth_info value is now valid and consistent with the slurmdbd.conf setting:
      
      slurmdbd: slurmdbd version 15.08.7 started
      slurmdbd: debug2: running rollup at Tue Feb 02 14:47:37 2016
      slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3600)
      slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1731)
      slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
      slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=socket=/var/run/munge/munge_dbd.socket.2
  2. 01 Feb, 2016 1 commit
  3. 29 Jan, 2016 1 commit
  4. 28 Jan, 2016 5 commits
    • Don't relocate multi-node core reservations · a801d264
      Morris Jette authored
      Do not automatically relocate an advanced reservation for individual cores
          that spans multiple nodes when nodes in that reservation go down (e.g.
          a 1 core reservation on node "tux1" will be moved if node "tux1" goes
          down, but a reservation containing 2 cores on node "tux1" and 3 cores on
          "tux2" will not be moved if node "tux1" goes down). Advanced reservations for
          whole nodes will be moved by default for down nodes.
      bug 2326
    • srun - check that found file is not a directory · 15c4bcf1
      Tim Wickberg authored
      Avoid attempting to execve() a directory whose name
      happens to match that of the desired command. bug 2392.
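The kind of check this commit describes can be sketched as follows. This is not the code from srun, which is not shown here; `is_executable_file` is a hypothetical helper that uses only standard POSIX calls (`stat`, `S_ISDIR`, `S_ISREG`, `access`) to reject a directory before its path is ever handed to execve().

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>   /* stat, S_ISDIR, S_ISREG */
#include <unistd.h>     /* access, X_OK */

/*
 * Hypothetical helper (not the srun implementation): return true only
 * if `path` names a regular file that the caller may execute.
 */
bool is_executable_file(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) != 0)
		return false;           /* missing or inaccessible */
	if (S_ISDIR(st.st_mode))
		return false;           /* a directory can have its x bit set */
	return S_ISREG(st.st_mode) && (access(path, X_OK) == 0);
}
```

The explicit directory test matters because a directory normally carries the execute (search) permission bit, so an `access(path, X_OK)` check alone would accept it.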
    • Ignore a reservation's jobs when changing · b77666b5
      Morris Jette authored
      Allow an existing reservation with running jobs to be modified without
          Flags=IGNORE_JOBS.
      bug 2389
    • burst_buffer/cray - avoid overflow · 214b3abe
      Morris Jette authored
      burst_buffer/cray - Increase size of intermediate variable used to store
          buffer byte size read from DW instance from 32 to 64 bits to avoid overflow
          and reporting invalid buffer sizes.
      bug 2378
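The overflow this commit describes can be reproduced in isolation. The sketch below is illustrative only: the function names and the units-times-unit-size calculation are assumptions, not the burst_buffer/cray code. It shows how a byte size computed in a 32-bit intermediate wraps, while the same computation widened to 64 bits does not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: byte size computed in 32 bits. Unsigned arithmetic
 * wraps modulo 2^32, so large buffers report a wrong size. */
uint32_t bytes_narrow(uint32_t units, uint32_t unit_bytes)
{
	assert(unit_bytes != 0);
	return units * unit_bytes;
}

/* Same computation with a 64-bit intermediate: one operand is widened
 * before the multiply, so the full product is kept. */
uint64_t bytes_wide(uint32_t units, uint32_t unit_bytes)
{
	assert(unit_bytes != 0);
	return (uint64_t)units * unit_bytes;
}
```

With 8 units of 1 GiB each (1073741824 bytes), the true size is 2^33 bytes: `bytes_narrow` returns 0, while `bytes_wide` returns 8589934592.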
    • GRES - Fix minor typecast issues. · 6f94bb7f
      Danny Auble authored
  5. 27 Jan, 2016 5 commits
  6. 25 Jan, 2016 2 commits
  7. 22 Jan, 2016 1 commit
  8. 21 Jan, 2016 7 commits
  9. 20 Jan, 2016 3 commits
  10. 17 Jan, 2016 1 commit
  11. 15 Jan, 2016 3 commits
  12. 14 Jan, 2016 2 commits
  13. 13 Jan, 2016 2 commits
    • backfill scheduling with group limits fix · 3ee1632f
      Morris Jette authored
      Backfill scheduling fix: If a job can't be started due to a "group" resource
          limit, rather than reserve resources for it when the next job ends, don't
          reserve any resources for it. The problem with the original logic is that
          if a lot of resources are reserved for such pending jobs, then jobs further
          down the queue may be deferred when they really can and should be started. An
          ideal solution would track all of the TRES resources through time as jobs
          start and end, but we don't have that logic in the backfill scheduler and
          don't want that extra overhead in the backfill scheduler.
      bugs 2326 and 2282
    • Add more partition info to "scontrol write config" · f428705b
      Alejandro Sanchez authored
      bug 2303
  14. 12 Jan, 2016 4 commits