- 10 Feb, 2016 2 commits
Morris Jette authored
Add new PowerParameters options get_timeout and set_timeout. The default set_timeout was increased from 5 seconds to 30 seconds. Also re-read the current power caps periodically or after any failed "set" operation. bug 2332
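The re-read policy in this commit (refresh the cached power caps periodically, and always after a failed "set") can be sketched as a small decision function. This is a minimal illustration, not the power plugin's actual code: the struct, its fields and the refresh interval are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical cache of power-cap state; not an actual Slurm structure. */
struct power_cap_cache {
	time_t last_read;       /* when caps were last read from hardware */
	bool   last_set_failed; /* did the most recent "set" operation fail? */
};

/* Hypothetical refresh period, in seconds. */
#define CAP_REFRESH_INTERVAL 300

/* Return true if the cached caps should be re-read now: either the
 * last "set" failed (so the cache may not match the hardware) or the
 * refresh interval has elapsed since the last read. */
bool need_cap_reread(const struct power_cap_cache *c, time_t now)
{
	assert(c != NULL);
	if (c->last_set_failed)
		return true;
	return (now - c->last_read) >= CAP_REFRESH_INTERVAL;
}
```

Treating a failed "set" as a cache invalidation means the controller never keeps acting on caps it could not confirm were applied.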
-
Tim Wickberg authored
The build should now work on non-glibc distributions thanks to better POSIX conformance.
-
- 09 Feb, 2016 2 commits
Danny Auble authored
structure.
-
Alejandro Sanchez authored
-
- 08 Feb, 2016 1 commit
Danny Auble authored
-
- 04 Feb, 2016 1 commit
Brian Christiansen authored
Bug 2406. Continuation of commits a0bb9c1e and 4b0fc9e8.
-
- 03 Feb, 2016 2 commits
Yu Watanabe authored
-
Danny Auble authored
-
- 02 Feb, 2016 3 commits
Tim Wickberg authored
Use PRIu64 instead of %ld for uint64_t types (libsh5util_old) and %zu instead of PRIu64 for size_t.
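The format-string rule above can be illustrated with a small, self-contained example. format_sizes is a hypothetical helper, not a Slurm function; it only shows which conversion belongs to which type.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Format a uint64_t and a size_t portably into buf.
 * %ld is a signed conversion that is only right where uint64_t happens
 * to be "long"; PRIu64 expands to the correct unsigned conversion on
 * every platform. size_t has its own length modifier, 'z', so %zu is
 * used for it rather than PRIu64. */
int format_sizes(char *buf, size_t len, uint64_t bytes, size_t count)
{
	assert(buf != NULL);
	return snprintf(buf, len, "bytes=%" PRIu64 " count=%zu", bytes, count);
}
```

Note that PRIu64 is a string-literal macro, so it is concatenated with the surrounding format string rather than placed inside the quotes.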
-
Tim Wickberg authored
-
Didier GAZEN authored
Support AuthInfo in slurmdbd.conf that is different from the value in slurm.conf. There is a possible bug in the slurm_get_auth_info function (src/common/slurm_protocol_api.c) that can cause the slurmdbd daemon to look for the AuthInfo parameter in slurm.conf instead of slurmdbd.conf when the auth/munge authentication method is used (AuthType=auth/munge). Here is the slurmdbd log revealing the problem (debug5() calls were added to the sources):

    slurmdbd: slurmdbd version 15.08.7 started
    slurmdbd: debug2: running rollup at Tue Feb 02 14:20:14 2016
    slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
    slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3601)
    slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
    slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1732)
    slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
    slurmdbd: debug: Reading slurm.conf file: /usr/local/slurm-15-08-7-1/etc/slurm.conf
    slurmdbd: error: s_p_parse_file: unable to status file /usr/local/slurm-15-08-7-1/etc/slurm.conf: No such file or directory, retrying in 1sec up to 60sec
    ...

Then, 60 seconds later, the auth_info value returned by slurm_get_auth_info is NULL:

    slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=(null)

slurmdbd continues without crashing, but I am not sure it is in a safe state.

After applying this patch:

    diff --git a/src/common/slurm_protocol_api.c b/src/common/slurm_protocol_api.c
    index c5db879..be1dab6 100644
    --- a/src/common/slurm_protocol_api.c
    +++ b/src/common/slurm_protocol_api.c
    @@ -1703,9 +1703,13 @@ extern char *slurm_get_auth_info(void)
     	char *auth_info;
     	slurm_ctl_conf_t *conf;
     
    -	conf = slurm_conf_lock();
    -	auth_info = xstrdup(conf->authinfo);
    -	slurm_conf_unlock();
    +	if (slurmdbd_conf) {
    +		auth_info = xstrdup(slurmdbd_conf->auth_info);
    +	} else {
    +		conf = slurm_conf_lock();
    +		auth_info = xstrdup(conf->authinfo);
    +		slurm_conf_unlock();
    +	}
     
     	return auth_info;
     }

the auth_info value is now valid and consistent with the slurmdbd.conf setting:

    slurmdbd: slurmdbd version 15.08.7 started
    slurmdbd: debug2: running rollup at Tue Feb 02 14:47:37 2016
    slurmdbd: debug5: in ../../../src/slurmdbd/slurmdbd.c, _send_slurmctld_register_req (line 690)
    slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_send_node_msg (line 3600)
    slurmdbd: debug5: in ../../../../../src/plugins/auth/munge/auth_munge.c, slurm_auth_create (line 217)
    slurmdbd: debug5: in ../../../src/common/slurm_protocol_api.c, slurm_get_auth_ttl (line 1731)
    slurmdbd: debug5: Entering ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info
    slurmdbd: debug5: Leaving ../../../src/common/slurm_protocol_api.c, slurm_get_auth_info, auth_info=socket=/var/run/munge/munge_dbd.socket.2
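The selection logic in the patch from this commit (prefer the daemon-specific configuration when it is loaded, otherwise fall back to the cluster-wide one) can be shown in isolation. This is a minimal sketch: the two structs are hypothetical stand-ins for slurmdbd_conf and slurm_ctl_conf_t, plain malloc/memcpy replaces xstrdup, and the locking done by slurm_conf_lock() is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the two configuration sources. */
struct dbd_conf { const char *auth_info; };  /* e.g. from slurmdbd.conf */
struct ctl_conf { const char *authinfo; };   /* e.g. from slurm.conf */

/* Return a heap copy of AuthInfo, preferring the daemon-specific
 * configuration when present (dbd != NULL) and falling back to the
 * cluster-wide configuration otherwise. Returns NULL if the selected
 * value is unset or allocation fails; the caller frees the result. */
char *get_auth_info(const struct dbd_conf *dbd, const struct ctl_conf *ctl)
{
	const char *src;
	char *copy;
	size_t n;

	assert(dbd != NULL || ctl != NULL);
	src = dbd ? dbd->auth_info : ctl->authinfo;
	if (!src)
		return NULL;
	n = strlen(src) + 1;
	copy = malloc(n);
	if (copy)
		memcpy(copy, src, n);
	return copy;
}
```

Checking for the daemon's own configuration first means slurmdbd never needs to open slurm.conf, which is exactly the file lookup that failed in the log above.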
-
- 01 Feb, 2016 1 commit
David Gloe authored
contribs/cray/csm/slurmconfgen_smw.py - avoid erroneously including repurposed compute nodes in the list of nodes to start slurmd.
-
- 29 Jan, 2016 1 commit
Morris Jette authored
When slurmctld is in background mode, it will issue double free calls on the incoming message buffers, likely leading to an abort.
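A common C idiom for preventing this class of bug is to give each buffer a single owner and to clear the pointer as soon as it is freed, so a second release degrades to free(NULL). The sketch below is generic and hypothetical (msg_buf and msg_buf_release are not Slurm names); it is not the actual slurmctld fix.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical message buffer; not an actual Slurm type. */
struct msg_buf {
	char  *data;
	size_t size;
};

/* Release the buffer's payload and clear the pointer, so that a
 * second call (e.g. from another cleanup path) passes NULL to free(),
 * which the C standard defines as a no-op. */
void msg_buf_release(struct msg_buf *buf)
{
	assert(buf != NULL);
	free(buf->data);
	buf->data = NULL;
	buf->size = 0;
}
```

This only protects callers that share the same struct; two independent copies of the raw pointer can still double-free, which is why clear ownership matters as much as nulling.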
-
- 28 Jan, 2016 5 commits
Morris Jette authored
Do not automatically relocate an advanced reservation for individual cores that spans multiple nodes when nodes in that reservation go down (e.g. a 1 core reservation on node "tux1" will be moved if node "tux1" goes down, but a reservation containing 2 cores on node "tux1" and 3 cores on "tux2" will not be moved if node "tux1" goes down). Advanced reservations for whole nodes will be moved by default for down nodes. bug 2326
-
Tim Wickberg authored
Avoid attempting to execve() a directory whose name happens to match that of the desired command. bug 2392.
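One way to skip such directories while searching for a command is to require a regular file before trying to execute it. is_executable_file is a hypothetical helper for illustration; the commit does not say how the actual check is written.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return true only if path names a regular file the caller may
 * execute. On a directory the execute bit means "search", so
 * access(path, X_OK) alone would accept it; S_ISREG rejects it.
 * stat() follows symlinks, so a link to a binary is accepted. */
bool is_executable_file(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) != 0)
		return false;
	if (!S_ISREG(st.st_mode))
		return false;
	return access(path, X_OK) == 0;
}
```

A PATH search would call this for each candidate and continue to the next directory on false, instead of handing the first name match to execve().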
-
Morris Jette authored
Allow an existing reservation with running jobs to be modified without Flags=IGNORE_JOBS. bug 2389
-
Morris Jette authored
burst_buffer/cray - Increase the size of the intermediate variable used to store the buffer byte size read from the DW instance from 32 to 64 bits to avoid overflow and the reporting of invalid buffer sizes. bug 2378
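The overflow is easy to reproduce with a standalone example. This sketch assumes the size arrives as a decimal string; parse_buffer_bytes is a hypothetical helper, not the plugin's parser.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdlib.h>

/* Parse a byte count reported as a decimal string. A 64-bit
 * intermediate keeps values above 4 GiB intact; storing the same
 * result in a uint32_t would silently wrap modulo 2^32. */
uint64_t parse_buffer_bytes(const char *s)
{
	assert(s != NULL);
	return (uint64_t) strtoull(s, NULL, 10);
}
```

For example, a 6 GiB buffer (6442450944 bytes) narrowed to 32 bits reads back as 2147483648, i.e. 2 GiB.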
-
Danny Auble authored
-
- 27 Jan, 2016 5 commits
Danny Auble authored
-
Danny Auble authored
gres types without a File.
-
Danny Auble authored
-
Danny Auble authored
to debug3 when trying to find the correct association. A continuation of commit 87d9370f.
-
Alejandro Sanchez authored
-
- 25 Jan, 2016 2 commits
Morris Jette authored
Previously, under some conditions, that boot completion was ignored and the job remained pending.
-
Sergey Meirovich authored
-
- 22 Jan, 2016 1 commit
Danny Auble authored
-
- 21 Jan, 2016 7 commits
Danny Auble authored
Bug 2364
-
Danny Auble authored
Commit fa331e30 fixes this. The logic was bad to begin with:

    uint32_t new_cpus = detail_ptr->num_tasks / detail_ptr->cpus_per_task;

The / should have been * this whole time. This was the reason we found this in the first place.
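The corrected arithmetic, pulled out into a standalone function for illustration (cpus_for_tasks is a hypothetical name; the inputs correspond to detail_ptr->num_tasks and detail_ptr->cpus_per_task):

```c
#include <assert.h>
#include <stdint.h>

/* CPUs needed by a job: every task uses cpus_per_task CPUs, so the
 * total is the product. Dividing (the old bug) under-counts, e.g.
 * 8 tasks x 4 CPUs needs 32 CPUs, not 8 / 4 = 2. The assert guards
 * against the product overflowing 32 bits. */
uint32_t cpus_for_tasks(uint32_t num_tasks, uint32_t cpus_per_task)
{
	assert(cpus_per_task == 0 || num_tasks <= UINT32_MAX / cpus_per_task);
	return num_tasks * cpus_per_task;
}
```

With cpus_per_task equal to 1, division and multiplication agree, which helps explain how the bug could go unnoticed for so long.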
-
Morris Jette authored
If scancel is operating on a large number of jobs and RPC responses from the slurmctld daemon are slow, then introduce a delay in sending the cancel job requests from scancel in order to reduce the load on slurmctld. bug 2256
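The commit does not spell out scancel's exact heuristic, so the following is only an illustrative back-off policy: when the last RPC was slow, pause before the next cancel request in proportion to the slowdown, up to a cap. The thresholds and the function name are hypothetical.

```c
#include <assert.h>

/* Hypothetical tuning values, not taken from scancel. */
#define SLOW_RPC_USEC  1000000L  /* response time considered "slow" */
#define MAX_DELAY_USEC 2000000L  /* cap on the inserted delay */

/* Choose a pause to insert before the next cancel request, based on
 * how long the previous RPC took. Fast responses add no delay; slow
 * ones add half the observed response time, capped at MAX_DELAY_USEC. */
long cancel_delay_usec(long last_rpc_usec)
{
	long delay;

	assert(last_rpc_usec >= 0);
	if (last_rpc_usec < SLOW_RPC_USEC)
		return 0;
	delay = last_rpc_usec / 2;
	return delay > MAX_DELAY_USEC ? MAX_DELAY_USEC : delay;
}
```

A caller would apply this only when cancelling many jobs, so that single cancellations stay immediate while large batches back off as slurmctld slows down.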
-
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
Backfill scheduling is now properly synchronized with Cray Node Health Check. Prior logic could result in the highest priority job getting improperly postponed. bug 2350
-
Danny Auble authored
-
- 20 Jan, 2016 3 commits
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Properly account for memory, CPUs and GRES when slurmctld is reconfigured while there is a suspended job. Previous logic would add the CPUs, but not memory or GPUs. This would result in underflow/overflow errors in select cons_res plugin. bug 2353
-
- 17 Jan, 2016 1 commit
jette authored
Fix backfill scheduling bug which could postpone the scheduling of jobs due to avoidance of nodes in COMPLETING state. bug 2350
-
- 15 Jan, 2016 3 commits
Brian Christiansen authored
Bug 2255
-
Morris Jette authored
-
Brian Christiansen authored
Bug 2343
-