1. 10 Feb, 2016 1 commit
    • Revert "Disable Cray NHC by default" · 06e49a30
      Brian Gilmer authored
      This reverts commit 6c21c441.
      
      # Conflicts:
      #	NEWS
      
      This is being reverted because we have been informed turning off NHC by
      default is not what Cray wants.  We hope for a day when this will be
      the case and Slurm can run NHC on the compute node but until that day
      this is the way we will handle it.
  2. 04 Feb, 2016 1 commit
  3. 03 Feb, 2016 2 commits
  4. 02 Feb, 2016 3 commits
    • Fix build for sh5util on ppc64 by replacing printf formatters · b717bf5c
      Tim Wickberg authored
      Use PRIu64 instead of %ld for uint64_t types (libsh5util_old),
      %zu instead of PRIu64 for size_t.
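      A minimal sketch of the formatter choice described above, not the actual
      sh5util code. The helper name format_counts() and its output layout are
      assumptions made for illustration. %ld assumes the argument is a long,
      which need not match the underlying type of uint64_t on every platform,
      whereas PRIu64 and %zu are defined to match uint64_t and size_t.

```c
#include <inttypes.h>  /* PRIu64 */
#include <stddef.h>    /* size_t */
#include <stdint.h>    /* uint64_t */
#include <stdio.h>     /* snprintf */

/*
 * Hypothetical helper (not part of Slurm): format a 64-bit byte counter
 * and an item count into buf. Returns the snprintf() result.
 */
int format_counts(char *buf, size_t buflen, uint64_t bytes, size_t nitems)
{
	/* PRIu64 expands to the correct conversion for uint64_t;
	 * %zu is the standard conversion for size_t. */
	return snprintf(buf, buflen, "bytes=%" PRIu64 " items=%zu",
			bytes, nitems);
}
```

      Calling format_counts(buf, sizeof(buf), UINT64_MAX, 3) writes the full
      20-digit counter without relying on the width of long.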
    • update NEWS to mention FreeBSD fixes · e9982fa4
      Tim Wickberg authored
    • Fix support for AuthInfo in slurmdbd.conf · fa4222ec
      Didier GAZEN authored
      Support AuthInfo in slurmdbd.conf that is different from the value in
          slurm.conf.
      There is a possible bug in the slurm_get_auth_info function (src/common/slurm_protocol_api.c) that can cause the slurmdbd daemon to look for the AuthInfo parameter in slurm.conf instead of slurmdbd.conf when the auth/munge authentication method is used (AuthType=auth/munge).
      
      Here is the slurmdbd log revealing the problem (debug5() print statements were added to the sources):
      
      slurmdbd: slurmdbd version 15.08.7 started
      slurmdbd: debug2: running rollup at Tue Feb 02 14:20:14 2016
      slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3601)
      slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1732)
      slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
      slurmdbd: debug:  Reading slurm.conf file: /usr/local/slurm-15-08-7-1/etc/slurm.conf
      slurmdbd: error: s_p_parse_file: unable to status file /usr/local/slurm-15-08-7-1/etc/slurm.conf: No such file or directory, retrying in 1sec up to 60sec
      ...
      
      Then 60 seconds later, the auth_info value returned by slurm_get_auth_info is NULL:
      
      slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=(null)
      
      and slurmdbd continues without crashing, but I am not sure it is in a safe state.
      
      When applying this patch:
      
      diff --git a/src/common/slurm_protocol_api.c b/src/common/slurm_protocol_api.c
      index c5db879..be1dab6 100644
      --- a/src/common/slurm_protocol_api.c
      +++ b/src/common/slurm_protocol_api.c
      @@ -1703,9 +1703,13 @@ extern char *slurm_get_auth_info(void)
              char *auth_info;
              slurm_ctl_conf_t *conf;
      
      -       conf = slurm_conf_lock();
      -       auth_info = xstrdup(conf->authinfo);
      -       slurm_conf_unlock();
      +       if (slurmdbd_conf) {
      +               auth_info = xstrdup(slurmdbd_conf->auth_info);
      +       } else {
      +               conf = slurm_conf_lock();
      +               auth_info = xstrdup(conf->authinfo);
      +               slurm_conf_unlock();
      +       }
      
              return auth_info;
       }
      
      the auth_info value is now valid and consistent with the slurmdbd.conf setting:
      
      slurmdbd: slurmdbd version 15.08.7 started
      slurmdbd: debug2: running rollup at Tue Feb 02 14:47:37 2016
      slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3600)
      slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1731)
      slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
      slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=socket=/var/run/munge/munge_dbd.socket.2
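      The selection logic in the patch above can be sketched in isolation as
      follows. This is a simplified illustration, not Slurm code: the struct
      types, the globals dbd_conf_ptr and ctl_conf_data, and the helper
      dup_or_null() are hypothetical stand-ins, and the configuration locking
      done by slurm_conf_lock()/slurm_conf_unlock() is omitted.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the slurmdbd.conf and slurm.conf structures. */
struct dbd_conf { const char *auth_info; };
struct ctl_conf { const char *authinfo; };

/* Assumed to be non-NULL only when running inside the slurmdbd daemon. */
static struct dbd_conf *dbd_conf_ptr = NULL;
static struct ctl_conf ctl_conf_data = { NULL };

/* Return a heap copy of s, or NULL if s is NULL or allocation fails. */
static char *dup_or_null(const char *s)
{
	size_t len;
	char *copy;

	if (!s)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Prefer the daemon's own configuration when it is loaded; otherwise
 * fall back to the controller configuration. Caller frees the result. */
char *get_auth_info(void)
{
	if (dbd_conf_ptr)
		return dup_or_null(dbd_conf_ptr->auth_info);
	return dup_or_null(ctl_conf_data.authinfo);
}
```

      The point of the branch is that a process with its own configuration
      loaded never needs to open the other configuration file at all.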
  5. 01 Feb, 2016 2 commits
  6. 29 Jan, 2016 2 commits
  7. 28 Jan, 2016 5 commits
    • Don't relocate multi-node core reservations · a801d264
      Morris Jette authored
      Do not automatically relocate an advanced reservation for individual cores
          that spans multiple nodes when nodes in that reservation go down (e.g.
          a 1 core reservation on node "tux1" will be moved if node "tux1" goes
          down, but a reservation containing 2 cores on node "tux1" and 3 cores on
          "tux2" will not be moved if node "tux1" goes down). Advanced reservations for
          whole nodes will be moved by default for down nodes.
      bug 2326
    • srun - check that found file is not a directory · 15c4bcf1
      Tim Wickberg authored
      Avoid attempting to execve() a directory whose name
      happens to match that of the desired command. bug 2392.
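      A minimal sketch of this kind of check using POSIX stat() and access().
      The function name is_executable_file() is hypothetical, and the actual
      srun path search may be structured differently.

```c
#include <stdbool.h>
#include <sys/stat.h>  /* stat, S_ISDIR */
#include <unistd.h>    /* access, X_OK */

/*
 * Hypothetical helper: return true only if path exists, is not a
 * directory, and is executable by the calling process.
 */
bool is_executable_file(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;	/* missing or unreadable path */
	if (S_ISDIR(st.st_mode))
		return false;	/* directories often have the x bit set */
	return access(path, X_OK) == 0;
}
```

      Checking the execute bit alone is not enough, because directories
      normally carry it as their search permission.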
    • Ignore a reservation's jobs when changing · b77666b5
      Morris Jette authored
      Allow an existing reservation with running jobs to be modified without
          Flags=IGNORE_JOBS.
      bug 2389
    • burst_buffer/cray - avoid overflow · 214b3abe
      Morris Jette authored
      burst_buffer/cray - Increase size of intermediate variable used to store
          buffer byte size read from DW instance from 32 to 64 bits to avoid overflow
          and reporting invalid buffer sizes.
      bug 2378
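      The effect of the narrower intermediate can be illustrated as below.
      This is a sketch under assumptions: the byte size is taken to arrive as
      a decimal string, and both function names are hypothetical. The real
      plugin may obtain the value in a different form.

```c
#include <stdint.h>
#include <stdlib.h>  /* strtoull */

/* Keep the full value: a 64-bit intermediate holds sizes of 4 GiB and up. */
uint64_t parse_bytes64(const char *text)
{
	return (uint64_t) strtoull(text, NULL, 10);
}

/* Narrowing to 32 bits silently reduces the value modulo 2^32. */
uint32_t parse_bytes32(const char *text)
{
	return (uint32_t) strtoull(text, NULL, 10);
}
```

      With an 8 GiB buffer ("8589934592"), the 64-bit version returns the
      exact size while the 32-bit version wraps around to 0.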
    • GRES - Fix minor typecast issues. · 6f94bb7f
      Danny Auble authored
  8. 27 Jan, 2016 5 commits
  9. 26 Jan, 2016 2 commits
    • Add slurmd option to report node reboot · b31d4c33
      Morris Jette authored
      Add slurmd "-b" option to report node rebooted at daemon start time. Used
          for testing purposes.
    • cleanup output routines in job_info and node_info.c · 0f826c0b
      Tim Wickberg authored
      reduce reliance on fixed-sized buffers for output, helps reduce
      warnings from coverity et al.
      
      split up key/value pairs in preparation for JSON output work.
      xstrfmtcat exists and is cleaner than snprintf followed by xstrcat.
      
      use a consistent line ending rather than repeat conditional block.
      
      output format should be unchanged, and has been tested to match
      on common cases and passes all relevant regression tests.
  10. 25 Jan, 2016 3 commits
  11. 22 Jan, 2016 3 commits
  12. 21 Jan, 2016 7 commits
  13. 20 Jan, 2016 3 commits
  14. 18 Jan, 2016 1 commit