- 14 Mar, 2018 5 commits
-
-
Danny Auble authored
(slurmdbd was down when slurmctld was started). Bug 4864
-
Danny Auble authored
Bug 4864
-
Morris Jette authored
Set scontrol exit code to 1 if attempting to update a node to DRAIN or DOWN without specifying a reason. Bug 4920.
-
Morris Jette authored
bug 4717
-
Justin Lecher authored
GCC-7 with -Wformat warns about:
node_features_knl_cray.c:2869:33: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size 8 [-Wformat-truncation=]
  snprintf(buf, sizeof(buf), "%d", i);
  ^~
node_features_knl_cray.c:2869:32: note: directive argument in the range [0, 2147483647]
  snprintf(buf, sizeof(buf), "%d", i);
  ^~~~
In file included from /usr/include/stdio.h:862:0,
  from ../../../../slurm/slurm.h:68,
  from node_features_knl_cray.c:66:
/usr/include/bits/stdio2.h:64:10: note: ‘__builtin___snprintf_chk’ output between 2 and 11 bytes into a destination of size 8
  return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  __bos (__s), __fmt, __va_arg_pack ());
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Increasing the buffer to 12 for this. Bug 4900.
Signed-off-by: Justin Lecher <jlec@gentoo.org>
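The buffer-size reasoning in this commit can be illustrated with a minimal sketch. It assumes a 32-bit int, whose longest decimal form, "-2147483648", is 11 characters plus the terminating NUL. The helper name format_int and the macro INT_DEC_BUF_SIZE are hypothetical and do not come from the Slurm source.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Assumption: int is 32 bits, so "-2147483648" (11 chars) plus NUL needs 12 bytes. */
#define INT_DEC_BUF_SIZE 12

/*
 * Hypothetical helper (not part of Slurm): format i as decimal into buf.
 * Returns 1 if the whole number fit, 0 if snprintf truncated or failed.
 */
static int format_int(char *buf, size_t size, int i)
{
    int n;

    assert(buf != NULL && size > 0);
    /* snprintf returns the length it wanted to write, excluding the NUL. */
    n = snprintf(buf, size, "%d", i);
    return n >= 0 && (size_t)n < size;
}
```

With an 8-byte buffer, as in the warning, format_int reports truncation for INT_MAX; with INT_DEC_BUF_SIZE bytes it succeeds for every int value under the stated assumption.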
-
- 13 Mar, 2018 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
when slurmctld was started) you could lose QOS usage information. Bug 4865
-
Danny Auble authored
To be used in a future commit for Bug 4865
-
- 12 Mar, 2018 3 commits
-
-
Felip Moll authored
6bc5dc30 6d4518fe
-
Felip Moll authored
set to not be shown as runaway. Bug 4847
-
Felip Moll authored
Otherwise an incorrect status could be registered in the database, and completed jobs could be seen as pending, running, and so on. Bug 4847
-
- 08 Mar, 2018 2 commits
-
-
Dominik Bartkiewicz authored
-
Alejandro Sanchez authored
Looks like a regression accidentally introduced in d8770a66. Bug 4769.
-
- 07 Mar, 2018 6 commits
-
-
Doug Jacobsen authored
Bug 4549
-
Doug Jacobsen authored
Bug 4549
-
Doug Jacobsen authored
Bug 4549
-
Doug Jacobsen authored
Bug 4549
-
Doug Jacobsen authored
Bug 4549
-
Tim Wickberg authored
-
- 05 Mar, 2018 1 commit
-
-
Morris Jette authored
-
- 02 Mar, 2018 1 commit
-
-
Brian Christiansen authored
Could happen if SuspendTimeout was shorter than ResumeTimeout. Bug 4863. Continuation of 7d246784.
-
- 01 Mar, 2018 1 commit
-
-
Morris Jette authored
bug 4849
-
- 28 Feb, 2018 5 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Update slurm.spec and slurm.spec-legacy as well
-
Isaac Hartung authored
Also throws spurious errors of: "slurmd: error: Domain socket directory /var/spool/slurmd: No such file or directory" if your SlurmdSpoolDir is located elsewhere. Bug 4289.
-
Dominik Bartkiewicz authored
Add additional protection in slurmctld as well. Bug 4826.
-
Alejandro Sanchez authored
job_limits_check() uses the job desc to call _valid_pn_min_mem(). This second function might adjust the following values (as of this commit):
  cpus_per_task
  pn_min_memory
  min_cpus
  max_cpus
  pn_min_cpus
If the function returns success, these adjusted members need to be copied back to the job_record. It turns out pn_min_cpus wasn't copied back, so the logs claimed that pn_min_cpus had been automatically increased, but the job record wasn't actually modified and the select plugin tried to allocate the wrong amount of resources. Bug 4823.
-
- 27 Feb, 2018 1 commit
-
-
Tim Wickberg authored
No longer needed, and will cause errors on FreeBSD systems built with WITHOUT_KERBEROS. Bug 4805.
-
- 23 Feb, 2018 1 commit
-
-
Morris Jette authored
Bug 4783
-
- 22 Feb, 2018 7 commits
-
-
Morris Jette authored
"#DW destroy_persistent" directives available in Cray CLE6.0UP06. This will be supported in Slurm version 18.08. Use "#BB" directives until then. Bug 4302
-
Felip Moll authored
Only a single io_timeout_thread should be created for each sls struct. Creating multiple, while seemingly harmless in operation, can lead to fatal() messages when srun shuts down by destroying mutex locks that are in use by threads that srun doesn't expect to still have running. Regression caused by a1185f04. Bug 4596
-
Morris Jette authored
Bug 4806.
-
Felip Moll authored
This patch fixes features going unrecognized when a node features plugin is active and features are also defined for nodes in slurm.conf. It also preserves KNL node features, including active and available modes, when slurmctld daemons are reconfigured. Features not belonging to the node features plugin are reset to what is in slurm.conf when restarting or reconfiguring. Bug 4734
-
Alejandro Sanchez authored
The _resv_overlap function only checked the flags of the updated reservation, not those of the other existing reservations. As a result, whether the overlap allowed by these flags applied depended on the update order. Bug 4806.
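The order-independence idea can be sketched as below. This is an assumption-laden illustration, not the actual _resv_overlap code: the struct, the flag SK_FLAG_ALLOW_OVERLAP, and the rule that a flag on either reservation permits the overlap are all hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical flag standing in for whichever flags permit overlap. */
#define SK_FLAG_ALLOW_OVERLAP (1u << 0)

/* Hypothetical, simplified reservation record. */
struct resv_sketch {
    time_t   start;
    time_t   end;
    uint32_t flags;
};

/* Two half-open time windows [start, end) intersect. */
static bool times_overlap(const struct resv_sketch *a,
                          const struct resv_sketch *b)
{
    return a->start < b->end && b->start < a->end;
}

/*
 * Returns true if the updated reservation conflicts with any existing one.
 * The flags of BOTH reservations are consulted, so the answer does not
 * depend on which of the two happens to be the one being updated.
 */
static bool resv_conflicts(const struct resv_sketch *updated,
                           const struct resv_sketch *others, int count)
{
    assert(updated != NULL && (others != NULL || count == 0));
    for (int i = 0; i < count; i++) {
        if (!times_overlap(updated, &others[i]))
            continue;
        if ((updated->flags | others[i].flags) & SK_FLAG_ALLOW_OVERLAP)
            continue;   /* overlap permitted by either side's flag */
        return true;
    }
    return false;
}
```

Checking the combined flags makes the result symmetric: swapping which reservation is "updated" and which is "existing" gives the same answer.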
-
Alejandro Sanchez authored
After commit b31fa177, we do not defer slurmd node registration if HealthCheckProgram fails. So at slurmd startup, slurmd executes:
  run_script_health_check();
  _spawn_registration_engine();
and does not keep spinning if NHC fails. Now, if there are nodes managed by the Power Save logic and they are requested to POWER_UP because a job is allocated resources, NHC is executed at slurmd startup before the node registers. The problem comes when this NHC execution fails: if the NHC program decides to update the node to DRAIN, the job was already allocated before this update, so the job will attempt to start RUNNING but might fail since NHC detected that something is wrong. This change detects DRAIN/FAIL node update requests, checks whether the node is ALLOC/MIXED and POWER_[SAVE|UP], and if so forces a requeue, so that the job doesn't start on a failed node. Bug 4689.
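The decision described in this commit reduces to a three-part condition, sketched here. The SK_* bit values and the function name should_requeue_on_update are hypothetical; Slurm's real node state encoding differs, and this only mirrors the logic stated in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state bits; not Slurm's actual node state encoding. */
enum {
    SK_ALLOC      = 1u << 0,
    SK_MIXED      = 1u << 1,
    SK_DRAIN      = 1u << 2,
    SK_FAIL       = 1u << 3,
    SK_POWER_SAVE = 1u << 4,
    SK_POWER_UP   = 1u << 5
};

/*
 * Returns true when a node update request should force a requeue:
 * the request asks for DRAIN or FAIL, the node currently has a job
 * (ALLOC or MIXED), and the node is under power management
 * (POWER_SAVE or POWER_UP).
 */
static bool should_requeue_on_update(uint32_t current_state,
                                     uint32_t requested_state)
{
    bool wants_drain_or_fail =
        (requested_state & (SK_DRAIN | SK_FAIL)) != 0;
    bool has_job =
        (current_state & (SK_ALLOC | SK_MIXED)) != 0;
    bool power_managed =
        (current_state & (SK_POWER_SAVE | SK_POWER_UP)) != 0;

    assert((requested_state & ~0x3Fu) == 0); /* only known bits expected */
    return wants_drain_or_fail && has_job && power_managed;
}
```

A node that is allocated but not power managed, or power managed but idle, is left alone under this sketch; only the combination named in the commit triggers the requeue.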
-
Felip Moll authored
Can frequently throw scary-sounding messages on short-lived processes that disappear while the stats are collected. Bug 4759.
-
- 21 Feb, 2018 4 commits
-
-
Brian Christiansen authored
-
Chris Samuel authored
Bug 4793
-
Brian Christiansen authored
Bug 4504
-
Brian Christiansen authored
submission option. Bug 4548
-