- 18 Jul, 2018 3 commits
-
-
Dominik Bartkiewicz authored
As reported by Avalon Johnson on slurm-users https://groups.google.com/forum/#!topic/slurm-users/BsMQ7Uk1PLw Bug 5287.
-
Brian Christiansen authored
srun was already fixed in b7053bda (Bug 3294). Bug 5126
-
Brian Christiansen authored
-
- 17 Jul, 2018 3 commits
-
-
Felip Moll authored
When printing arrays in squeue and setting the SLURM_BITSTR_LEN variable to 0 or to NULL, the length of the output defaulted to 64, when the documentation says it will default to "unlimited". This patch fixes this situation. Bug 5440
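The intended behaviour described above can be sketched roughly as follows. This is an illustrative Python sketch, not the squeue C code: the helper names `bitstr_len` and `format_array`, the handling of non-numeric values, and the trailing "..." on truncated output are all assumptions made for the example.

```python
def bitstr_len(env):
    """Return the maximum printed length, or None for "unlimited".

    An unset variable or a value of 0 means unlimited, which is the
    behaviour the commit message says the documentation promises.
    """
    raw = env.get("SLURM_BITSTR_LEN")
    if raw is None:
        return None
    try:
        value = int(raw)
    except ValueError:
        # Assumption: treat an unparsable value like an unset one.
        return None
    return value if value > 0 else None


def format_array(indices, env):
    """Join task indices, truncating only when a positive limit is set."""
    text = ",".join(str(i) for i in indices)
    limit = bitstr_len(env)
    if limit is not None and len(text) > limit:
        # Assumption: mark truncated output with a trailing "...".
        return text[:limit] + "..."
    return text
```

Passing the environment as a plain mapping keeps the sketch easy to exercise; a real caller would pass `os.environ`.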
-
Marshall Garey authored
The documentation and the code both indicate that the node lock is needed, but the job locks were incorrectly set instead. Bug 5394.
-
Dominik Bartkiewicz authored
Needs the job write lock, as it may change job status not just node status. Especially after commit 33e352a6. Bug 5406.
-
- 13 Jul, 2018 1 commit
-
-
Isaac Hartung authored
Add errno to info message in the SlurmDBD log, and pass the actual errno back to the sacctmgr process so the user can see it. Bug 5152.
-
- 12 Jul, 2018 3 commits
-
-
Boris Karasev authored
Avoid `abort()` when a collective fails, and add logging of the collective's details for failure cases. Bug 5067
-
Danny Auble authored
Note, this is setting it up so we can use defunct functions. It will probably need to be properly fixed in a future version so we don't do this.
-
Dominik Bartkiewicz authored
with preemption or when a job requests a specific list of hosts. Bug 5293.
-
- 09 Jul, 2018 1 commit
-
-
Danny Auble authored
-
- 06 Jul, 2018 1 commit
-
-
Marshall Garey authored
Continuation of 923c9b37. There is a delay in the cgroup system when moving a PID from one cgroup to another. It is usually short, but if we don't wait for the PID to move before removing cgroup directories the PID previously belonged to, we could leak cgroups. This was previously fixed in the cpuset and devices subsystems. This uses the same logic to fix the freezer subsystem. Bug 5082.
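The wait-before-remove idea can be sketched as below. This is a hedged Python illustration of the general technique only; the function names, the timeout and polling interval, and the injected `read_pids`/`rmdir` callables are hypothetical and do not reflect the actual Slurm cgroup plugin code.

```python
import time


def wait_for_pid_to_leave(read_pids, pid, timeout=1.0, interval=0.01,
                          sleep=time.sleep):
    """Poll until `pid` is no longer listed in the old cgroup.

    `read_pids` is a callable returning the PIDs currently in the cgroup
    (for example by reading its process list file). Returns True once
    the PID is gone, or False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while pid in read_pids():
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
    return True


def remove_cgroup(read_pids, pid, rmdir, **wait_kwargs):
    """Remove the cgroup directory only after the PID has moved out."""
    if not wait_for_pid_to_leave(read_pids, pid, **wait_kwargs):
        # Leave the directory in place rather than attempt a removal
        # that would fail while the PID is still attached.
        return False
    rmdir()
    return True
```

Injecting the reader and the removal step keeps the sketch independent of any particular cgroup mount layout.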
-
- 04 Jul, 2018 1 commit
-
-
Morris Jette authored
So that multiple node changes will be reported on one line rather than one line per node. Otherwise this could lead to performance issues when reloading slurmctld on big systems. Bug 4980
-
- 03 Jul, 2018 1 commit
-
-
Brian Christiansen authored
Currently, no caller checks the return code. Bug 5164
-
- 26 Jun, 2018 4 commits
-
-
Dominik Bartkiewicz authored
Some job fields can change in the course of scheduling. This patch reinitializes previously adjusted job fields to their original value when validating the job memory in multi-partition requests. Bug 4895.
-
Alejandro Sanchez authored
This reverts commit bf4cb0b1. Bug 5240, Bug 4895 and Bug 4976.
-
Felip Moll authored
When one asks for an inactive feature and also specifies the node with the -w flag, the node will be rebooted even though it may contain running jobs. Bug 4821
-
Tim Wickberg authored
and avoid race condition calling task before proctrack can introduce. Bug 5319
-
- 25 Jun, 2018 1 commit
-
-
Morris Jette authored
delayed until the first job completes execution and its burst buffer stage-out is completed. Bug 4675
-
- 22 Jun, 2018 1 commit
-
-
Dominik Bartkiewicz authored
Bug 5159.
-
- 20 Jun, 2018 1 commit
-
-
Alejandro Sanchez authored
Previously the function only tested against the first partition in the job_record. Now it detects whether the job request is multi-partition and, if so, loops through all of the partitions until the job will run in one of them or the end of the list is reached, returning the error code from the last partition if the job won't run in any of them. Bug 5185
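The loop described above might look roughly like this. It is a minimal Python sketch of the control flow only; `will_run_any`, the `will_run_in` callback and the `SUCCESS` constant are hypothetical names, and the real function operates on Slurm's job_record in C.

```python
SUCCESS = 0


def will_run_any(partitions, will_run_in):
    """Test each partition in order until the job can run in one.

    Returns SUCCESS as soon as any partition accepts the job; otherwise
    returns the error code reported for the last partition tried.
    """
    rc = None
    for part in partitions:
        rc = will_run_in(part)
        if rc == SUCCESS:
            return SUCCESS
    if rc is None:
        # Assumption: a job always has at least one partition.
        raise ValueError("job has no partitions")
    return rc
```

For example, with partitions tried in the order `debug`, `batch`, `long`, the loop stops at the first one that returns `SUCCESS` and never tests the rest.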
-
- 19 Jun, 2018 2 commits
-
-
Isaac Hartung authored
When requesting specific jobids with sacct, the starttime of the request is 0, which will cause the time range to be outside of the MaxQueryTimeRange range -- if specified. When requesting specific jobids, sacct should be able to find the job whenever it started -- unless confined to a smaller range with -S and/or -E. Bug 5009
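One way to picture the rule is the following Python sketch. It is an assumption-laden illustration, not the SlurmDBD implementation: `query_allowed` and its parameters are hypothetical, times are plain epoch seconds, and an unset start is modelled as 0 to mirror the "starttime of the request is 0" case in the message.

```python
import time


def query_allowed(start, end, max_range, job_ids=(), now=None):
    """Decide whether a query fits within a MaxQueryTimeRange-style limit.

    `start` and `end` are epoch seconds, or None when -S / -E were not
    given. `max_range` is the limit in seconds, or None when no limit
    is configured.
    """
    if max_range is None:
        return True
    if job_ids and start is None and end is None:
        # Specific job IDs with no explicit window: exempt from the
        # limit so the job can be found whenever it started.
        return True
    now = time.time() if now is None else now
    begin = 0 if start is None else start
    finish = now if end is None else end
    return (finish - begin) <= max_range
```

Under this sketch a job ID lookup that also passes -S and/or -E is still checked against the limit, matching the "unless confined to a smaller range" wording.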
-
Felip Moll authored
-
- 18 Jun, 2018 1 commit
-
-
Danny Auble authored
Specifically due to SELECT ... FOR UPDATE ones. Bug 5086.
-
- 15 Jun, 2018 2 commits
-
-
Marshall Garey authored
Bug 5270.
-
Tim Wickberg authored
Instead of unintentionally rejecting the update from a non-Administrator if the job_submit plugin modified that field. Bug 5306.
-
- 12 Jun, 2018 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
Bug 5286
-
Tim Wickberg authored
RHEL 6 (and related distributions) use lua as the package name, so test whether that package exists with a version >= 5.1 if the other tests have already failed. Bug 5263.
-
- 08 Jun, 2018 2 commits
-
-
Tim Wickberg authored
And do not list each individual sensor reading but just the computed sum of each one grouped by key. Bug 5274
-
Morris Jette authored
This is in anticipation of an upcoming change to the cgroup hierarchy on a future CLE release. Bug 5145.
-
- 06 Jun, 2018 1 commit
-
-
Brian Christiansen authored
which were marked down due to ResumeTimeout. If a cloud node was marked down due to not responding by ResumeTimeout, the code inadvertently added the node back to the avail_node_bitmap -- after being cleared by set_node_down_ptr(). The scheduler would then attempt to allocate the node again, which would cause a loop of hitting ResumeTimeout and allocating the downed node again. Bug 5264
-
- 05 Jun, 2018 1 commit
-
-
Killian authored
Bug 5206.
-
- 31 May, 2018 1 commit
-
-
Alejandro Sanchez authored
There were two code paths building an allocation response, each calling its own static _build_alloc_msg() function:
1. src/slurmctld/proc_req.c
2. src/slurmctld/srun_comm.c
These two functions diverged, and each left unfilled some members that the other filled in. This patch changes the signature of the one in proc_req.c to make it extern, and srun_comm.c now calls this common function. It also adds the cpu_freq_[min|max|gov] members to the common function, since these were the only members missing from the proc_req.c version (the one in srun_comm.c was missing more members, such as all the ntasks_per*, account, qos and resv_name). Bug 4999.
-
- 30 May, 2018 6 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Marshall Garey authored
Only trust MUNGE signed values, unless the RPC was signed by SlurmUser or root. CVE-2018-10995.
-
Tim Wickberg authored
Do not defer until later, and do not potentially miss out on proper validation of the user_name field which can lead to improper authentication handling. CVE-2018-10995.
-
Dominik Bartkiewicz authored
Bug 5038.
-
Tim Wickberg authored
Caused by pthread_cancel cleanup by commit e5f03971 in 17.11.6. Bug 5181.
-