- 02 Feb, 2016 7 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
$((10#$SLURM_API_MAJOR)) is bash-specific. Replace it with the portable ${SLURM_API_MAJOR#0}, which accomplishes the same thing. The first form forces bash to treat the value as base-10 even with a leading zero; the second, portable form strips a leading zero.
-
Tim Wickberg authored
Also remove checks for sys/termios.h from build system. Slurm directly includes the POSIX-required <termios.h> already, and the one use of this conditional is being removed here. Fixes one of several build errors on FreeBSD.
-
Morris Jette authored
-
Morris Jette authored
This fixes a bug introduced in commit a801d264. Whole node resource allocations with the REPLACE option were not working. Detected by test3.14 failure.
-
Didier GAZEN authored
Support AuthInfo in slurmdbd.conf that is different from the value in slurm.conf.

There is a possible bug in the slurm_get_auth_info function (src/common/slurm_protocol_api.c) that can cause the slurmdbd daemon to look for the AuthInfo parameter in slurm.conf instead of slurmdbd.conf when the auth/munge authentication method is used (AuthType=auth/munge).

Here is the slurmdbd log revealing the problem (debug5() print statements were added to the sources):

    slurmdbd: slurmdbd version 15.08.7 started
    slurmdbd: debug2: running rollup at Tue Feb 02 14:20:14 2016
    slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
    slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3601)
    slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
    slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1732)
    slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
    slurmdbd: debug: Reading slurm.conf file: /usr/local/slurm-15-08-7-1/etc/slurm.conf
    slurmdbd: error: s_p_parse_file: unable to status file /usr/local/slurm-15-08-7-1/etc/slurm.conf: No such file or directory, retrying in 1sec up to 60sec
    ...

Then, 60 seconds later, the auth_info value returned by slurm_get_auth_info is NULL:

    slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=(null)

and slurmdbd continues without crashing, but I am not sure it is in a safe state.

When applying this patch:

    diff --git a/src/common/slurm_protocol_api.c b/src/common/slurm_protocol_api.c
    index c5db879..be1dab6 100644
    --- a/src/common/slurm_protocol_api.c
    +++ b/src/common/slurm_protocol_api.c
    @@ -1703,9 +1703,13 @@ extern char *slurm_get_auth_info(void)
     	char *auth_info;
     	slurm_ctl_conf_t *conf;

    -	conf = slurm_conf_lock();
    -	auth_info = xstrdup(conf->authinfo);
    -	slurm_conf_unlock();
    +	if (slurmdbd_conf) {
    +		auth_info = xstrdup(slurmdbd_conf->auth_info);
    +	} else {
    +		conf = slurm_conf_lock();
    +		auth_info = xstrdup(conf->authinfo);
    +		slurm_conf_unlock();
    +	}

     	return auth_info;
     }

the auth_info value is now valid and consistent with the slurmdbd.conf setting:

    slurmdbd: slurmdbd version 15.08.7 started
    slurmdbd: debug2: running rollup at Tue Feb 02 14:47:37 2016
    slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
    slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3600)
    slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
    slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1731)
    slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
    slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=socket=/var/run/munge/munge_dbd.socket.2
-
Morris Jette authored
Reserve node weight value of INFINITE for nodes which require reboot. Avoid scheduling on nodes requiring reboot that are not IDLE (more work needed for backfill and will_run RPC).
-
- 01 Feb, 2016 8 commits
-
-
Tim Wickberg authored
parse_time - strlcpy not strncpy
slurm_protocol_api - set tree width to 1 as a default, 0 leads coverity to warn about potential div/0
pmi2/setup.c - avoid strncpy entirely with a small rearrangement
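The preference for strlcpy over strncpy comes down to termination: strncpy leaves the destination without a terminating NUL when the source does not fit, while strlcpy always terminates (for a non-zero buffer size) and returns the source length so truncation can be detected. Since strlcpy is not available in every C library, the sketch below uses a hypothetical helper, `demo_strlcpy`, that mimics those semantics; it is an illustration only and not the code from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper with strlcpy-like semantics (not Slurm code):
 * copies at most size - 1 bytes, always NUL-terminates when size > 0,
 * and returns strlen(src) so the caller can detect truncation.
 */
static size_t demo_strlcpy(char *dst, const char *src, size_t size)
{
	size_t src_len;

	assert(dst != NULL && src != NULL);
	src_len = strlen(src);
	if (size > 0) {
		size_t n = (src_len < size - 1) ? src_len : size - 1;
		memcpy(dst, src, n);
		dst[n] = '\0';	/* strncpy would skip this on truncation */
	}
	return src_len;
}

/* Returns 1 if the copy was truncated, 0 if the whole string fit. */
static int demo_copy_truncated(char *dst, const char *src, size_t size)
{
	return demo_strlcpy(dst, src, size) >= size;
}
```

A caller that compares the return value against the buffer size, as `demo_copy_truncated` does, can tell a complete copy from a truncated one without scanning the destination.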
-
David Gloe authored
contribs/cray/csm/slurmconfgen_smw.py - avoid erroneously including repurposed compute nodes in the list of nodes to start slurmd.
-
Danny Auble authored
-
Morris Jette authored
Added support for node features with or without counts
-
Tim Wickberg authored
$((10#$SLURM_API_MAJOR)) is bash-specific. Replace it with the portable ${SLURM_API_MAJOR#0}, which accomplishes the same thing. The first form forces bash to treat the value as base-10 even with a leading zero; the second, portable form strips a leading zero.
-
Tim Wickberg authored
-
Morris Jette authored
Only do if job reboot requested
-
Morris Jette authored
-
- 30 Jan, 2016 1 commit
-
-
Morris Jette authored
Including setting of default MCDRAM and NUMA modes
-
- 29 Jan, 2016 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
When slurmctld is in background mode, it will issue double free calls on the incoming message buffers, likely leading to an abort.
-
Morris Jette authored
If an invalid trigger message is received by slurmctld, it could result in a non-zero array counter and a NULL element array. If the element array is NULL, then clear the counter to avoid xfree calls of bad pointers.
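The guard described above can be sketched as follows. The struct and function names (`demo_trigger_msg_t`, `demo_sanitize_trigger_msg`, `demo_free_trigger_msg`) are hypothetical stand-ins rather than Slurm's actual types, and plain `free` is used in place of `xfree`; the point is only that a counter left non-zero next to a NULL array would make the cleanup loop dereference bad pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an unpacked trigger message (not Slurm's struct). */
typedef struct {
	uint32_t record_count;	/* number of entries claimed by the sender */
	void **trigger_array;	/* entries, or NULL if unpacking failed */
} demo_trigger_msg_t;

/*
 * If the element array is NULL, clear the counter so later cleanup does
 * not walk a non-existent array. Returns 1 if the message was repaired.
 */
static int demo_sanitize_trigger_msg(demo_trigger_msg_t *msg)
{
	assert(msg != NULL);
	if (msg->trigger_array == NULL && msg->record_count != 0) {
		msg->record_count = 0;
		return 1;
	}
	return 0;
}

/* Frees every entry and the array itself; returns the number of entries freed. */
static uint32_t demo_free_trigger_msg(demo_trigger_msg_t *msg)
{
	uint32_t freed = 0;

	assert(msg != NULL);
	for (uint32_t i = 0; i < msg->record_count; i++) {
		free(msg->trigger_array[i]);
		freed++;
	}
	free(msg->trigger_array);	/* free(NULL) is a no-op */
	msg->trigger_array = NULL;
	msg->record_count = 0;
	return freed;
}
```

Calling the sanitize step before the free step means an invalid message (count of 3, NULL array) is cleaned up without touching any element pointers.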
-
Morris Jette authored
-
- 28 Jan, 2016 10 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Morris Jette authored
Conflicts: doc/html/reservations.shtml
-
Morris Jette authored
Do not automatically relocate an advanced reservation for individual cores that spans multiple nodes when nodes in that reservation go down (e.g. a 1 core reservation on node "tux1" will be moved if node "tux1" goes down, but a reservation containing 2 cores on node "tux1" and 3 cores on "tux2" will not be moved if node "tux1" goes down). Advanced reservations for whole nodes will be moved by default for down nodes. bug 2326
-
Alejandro Sanchez authored
-
Tim Wickberg authored
Avoid attempting to execve() a directory whose name happens to match that of the desired command. bug 2392.
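One way to avoid this kind of mistake when searching for a command is to accept a candidate path only if it is a regular file with execute permission. The sketch below uses the POSIX calls stat() and access(); the helper name `demo_is_executable_file` is hypothetical, and this is an assumption about the general technique, not the change made in this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Slurm code): returns true only when path names
 * a regular file that the caller may execute. Directories usually carry the
 * execute (search) bit too, so an access(X_OK) check alone is not enough.
 */
static bool demo_is_executable_file(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) != 0)
		return false;		/* missing or unreadable path */
	if (!S_ISREG(st.st_mode))
		return false;		/* directory, device, socket, ... */
	return access(path, X_OK) == 0;
}
```

A path search loop would call such a check on each candidate and skip those that fail, so a directory named like the command is passed over rather than handed to execve().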
-
Morris Jette authored
Allow an existing reservation with running jobs to be modified without Flags=IGNORE_JOBS. bug 2389
-
Morris Jette authored
burst_buffer/cray - Increase size of intermediate variable used to store buffer byte size read from DW instance from 32 to 64 bits to avoid overflow and reporting of invalid buffer sizes. bug 2378
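The effect of the narrower intermediate can be illustrated with a small sketch. It assumes, purely for illustration, that the byte count arrives as a decimal string; the function names are hypothetical and this is not the plugin's code. Any size of 4 GiB or more wraps modulo 2^32 when stored in a 32-bit variable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: parses a byte count but stores it in 32 bits (lossy). */
static uint32_t demo_parse_bytes_32(const char *text)
{
	assert(text != NULL);
	/* Conversion to uint32_t keeps only the low 32 bits. */
	return (uint32_t) strtoull(text, NULL, 10);
}

/* Hypothetical: same parse, but the intermediate is 64 bits wide. */
static uint64_t demo_parse_bytes_64(const char *text)
{
	assert(text != NULL);
	return (uint64_t) strtoull(text, NULL, 10);
}
```

For example, a 5 GiB buffer (5368709120 bytes) parses correctly with the 64-bit variant, while the 32-bit variant reports 1073741824 bytes (1 GiB), which is the kind of invalid size the commit message refers to.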
-
Danny Auble authored
-
- 27 Jan, 2016 10 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
gres types without a File.
-
Danny Auble authored
-
Danny Auble authored
to debug3 when trying to find the correct association. A continuation of commit 87d9370f.
-
Alejandro Sanchez authored
-
Morris Jette authored
This enables node power save logic to be used in conjunction with node reboot (e.g. "sbatch --reboot ...")
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-