1. 02 Feb, 2016 2 commits
    • reservation relocation fix · 31ac01ce
      Morris Jette authored
      This fixes a bug introduced in commit a801d264:
      whole node resource allocations with the REPLACE option were not working.
      Detected by a test3.14 failure.
    • Fix support for AuthInfo in slurmdbd.conf · fa4222ec
      Didier GAZEN authored
      Support AuthInfo in slurmdbd.conf that is different from the value in
          slurm.conf.
      There is a possible bug in the slurm_get_auth_info function (src/common/slurm_protocol_api.c) that can cause the slurmdbd daemon to look for the AuthInfo parameter in slurm.conf instead of slurmdbd.conf when the auth/munge authentication method is used (AuthType=auth/munge).
      
      Here is the slurmdbd log showing the problem (debug5() calls were added to the sources):
      
      slurmdbd: slurmdbd version 15.08.7 started
      slurmdbd: debug2: running rollup at Tue Feb 02 14:20:14 2016
      slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3601)
      slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1732)
      slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
      slurmdbd: debug:  Reading slurm.conf file: /usr/local/slurm-15-08-7-1/etc/slurm.conf
      slurmdbd: error: s_p_parse_file: unable to status file /usr/local/slurm-15-08-7-1/etc/slurm.conf: No such file or directory, retrying in 1sec up to 60sec
      ...
      
      Then 60 seconds later, the auth_info value returned by slurm_get_auth_info is NULL:
      
      slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=(null)
      
      and slurmdbd continues without crashing, but I am not sure it is in a safe state.
      
      After applying this patch:
      
      diff --git a/src/common/slurm_protocol_api.c b/src/common/slurm_protocol_api.c
      index c5db879..be1dab6 100644
      --- a/src/common/slurm_protocol_api.c
      +++ b/src/common/slurm_protocol_api.c
      @@ -1703,9 +1703,13 @@ extern char *slurm_get_auth_info(void)
              char *auth_info;
              slurm_ctl_conf_t *conf;
      
      -       conf = slurm_conf_lock();
      -       auth_info = xstrdup(conf->authinfo);
      -       slurm_conf_unlock();
      +       if (slurmdbd_conf) {
      +               auth_info = xstrdup(slurmdbd_conf->auth_info);
      +       } else {
      +               conf = slurm_conf_lock();
      +               auth_info = xstrdup(conf->authinfo);
      +               slurm_conf_unlock();
      +       }
      
              return auth_info;
       }
      
      the auth_info value is now valid and consistent with the slurmdbd.conf setting:
      
      slurmdbd: slurmdbd version 15.08.7 started
      slurmdbd: debug2: running rollup at Tue Feb 02 14:47:37 2016
      slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3600)
      slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
      slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1731)
      slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
      slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=socket=/var/run/munge/munge_dbd.socket.2
  2. 01 Feb, 2016 1 commit
  3. 29 Jan, 2016 2 commits
    • Fix double free of memory in slurmctld background mode · b1d88c7b
      Morris Jette authored
      When the slurmctld is in background mode, it will issue double
        free calls on the incoming message buffers, likely leading to
        an abort.
    • Fix for possibly invalid xfree · 4b5ecf40
      Morris Jette authored
      If an invalid trigger message is received by slurmctld, it could
        result in a non-zero array counter and a NULL element array.
        If the element array is NULL, then clear the counter to avoid
        xfree calls on bad pointers.
  4. 28 Jan, 2016 5 commits
    • Don't relocate multi-node core reservations · a801d264
      Morris Jette authored
      Do not automatically relocate an advanced reservation for individual cores
          that spans multiple nodes when nodes in that reservation go down (e.g.
          a 1 core reservation on node "tux1" will be moved if node "tux1" goes
          down, but a reservation containing 2 cores on node "tux1" and 3 cores on
          "tux2" will not be moved if node "tux1" goes down). Advanced reservations for
          whole nodes will be moved by default for down nodes.
      bug 2326
    • srun - check that found file is not a directory · 15c4bcf1
      Tim Wickberg authored
      Avoid attempting to execve() a directory whose name
      happens to match that of the desired command. bug 2392.
    • Ignore a reservation's jobs when changing it · b77666b5
      Morris Jette authored
      Allow an existing reservation with running jobs to be modified without
          Flags=IGNORE_JOBS.
      bug 2389
    • burst_buffer/cray - avoid overflow · 214b3abe
      Morris Jette authored
      burst_buffer/cray - Increase size of intermediate variable used to store
          buffer byte size read from DW instance from 32 to 64 bits to avoid overflow
          and the reporting of invalid buffer sizes.
      bug 2378
    • GRES - Fix minor typecast issues. · 6f94bb7f
      Danny Auble authored
  5. 27 Jan, 2016 6 commits
  6. 26 Jan, 2016 1 commit
  7. 25 Jan, 2016 4 commits
  8. 22 Jan, 2016 1 commit
  9. 21 Jan, 2016 11 commits
  10. 20 Jan, 2016 7 commits