- 30 May, 2018 9 commits
- Tim Wickberg authored
- Tim Wickberg authored
- Tim Wickberg authored
  Value of 2113 is where it fits in with 17.11, so pin it here.
- Morris Jette authored
- Morris Jette authored
- Morris Jette authored
- Michael Hinton authored
- Tim Wickberg authored
  Caused by the pthread_cancel cleanup in commit e5f03971 in 17.11.6. Bug 5181.
- Tim Wickberg authored
  The race condition was introduced in a7c8964e in 17.11.6 when the (unsafe) pthread_cancel code handling thread termination was removed. Bug 5164.
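The pattern behind these pthread_cancel fixes can be sketched generically: pthread_cancel() can interrupt a thread at an arbitrary point, leaking locks or half-updated state, whereas a cooperative shutdown flag plus pthread_join() lets the thread exit only at a safe point. A minimal illustration, not Slurm's actual code; the names here are invented:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Cooperative-shutdown sketch: the worker polls a flag between units
 * of work instead of being forcibly cancelled mid-operation. */
static atomic_bool shutdown_requested = false;

static void *worker(void *arg)
{
    int *iterations = arg;
    while (!atomic_load(&shutdown_requested)) {
        /* ... one unit of work; all locks are released between units ... */
        (*iterations)++;
        if (*iterations >= 1000)
            break;          /* keep this sketch from spinning forever */
    }
    return NULL;            /* thread exits with cleanup complete */
}

/* Request termination, then wait for the thread to reach a safe point. */
static int stop_worker(pthread_t tid)
{
    atomic_store(&shutdown_requested, true);
    return pthread_join(tid, NULL);
}
```

Because the worker only checks the flag between units of work, it always terminates with its state consistent, which is what the cancellation-based code could not guarantee.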
- 24 May, 2018 1 commit
- Brian Christiansen authored
  Commits f18390e8 and eed76f85 modified the stepd so that if it encountered an unkillable step timeout it would simply exit. If the stepd is a batch step, it replies to the controller with a non-zero exit code, which drains the node. But if an srun allocation/step got into the unkillable step code, the stepd would not let the waiting srun or the controller know the step was going away -- leaving a hung srun and job. This patch enables the stepd to notify the waiting sruns and the slurmctld when the stepd is done, and drains the node for srun'ed allocations and/or steps. Bug 5164.
- 21 May, 2018 1 commit
- Dominik Bartkiewicz authored
  g_qos_count and g_qos_max_priority must be accessed under the QOS write lock. Bug 5159.
- 19 May, 2018 2 commits
- Brian Christiansen authored
  Display correct path.
- Bjørn-Helge Mevik authored
  Bug 5151.
- 18 May, 2018 2 commits
- Brian Christiansen authored
  Commits 4454316e and 76706b51 adjusted the priority-update logic so that when a non-authorized user modifies the priority, the change is only temporary -- in most cases the user will never see it. Bug 5151.
- Marshall Garey authored
  Clarification of c2c06468. Bug 5150.
- 17 May, 2018 1 commit
- Danny Auble authored
  PriorityFlags=ACCRUE_ALWAYS is set. Bug 5186.
- 16 May, 2018 2 commits
- Alejandro Sanchez authored
  Bug 5174.
- Dan Barke authored
  Since having 'nocreate' would override the following option: 'create 640 slurm root'. Bug 5174.
- 15 May, 2018 3 commits
- Morris Jette authored
  If ReturnToService=2 is configured, the test could generate an error when changing node state to resume after setting it to down. The reason is that if the node communicates with slurmctld, its state will automatically be changed from down to idle, and resuming an idle node triggers an error.
- Alejandro Sanchez authored
  Bug 5168.
- Alejandro Sanchez authored
  Previously the default paths continued to be tested even when new ones were requested. As a consequence, if any of the new paths matched one of the defaults (i.e. /usr or /usr/local), the configure script incorrectly errored out, reporting that a version of PMIx had already been found in a previous path. Bug 5168.
- 11 May, 2018 2 commits
- Morris Jette authored
  Gracefully fail if salloc does not get a job allocation.
- Alejandro Sanchez authored
  Introduced in bf4cb0b1.
- 10 May, 2018 2 commits
- Morris Jette authored
- Alejandro Sanchez authored
  The first issue was identified on multi-partition requests. job_limits_check() was overriding the original memory requests, so the next partition Slurm validated limits against was not using the original values. The solution consists in adding three members to the job_details struct to preserve the original requests. This issue is reported in bug 4895. The second issue was that memory enforcement behaved differently depending on whether the job request was issued against a reservation or not. The third issue had to do with the automatic adjustments Slurm made underneath when the memory request exceeded the limit. These adjustments included increasing pn_min_cpus (even incorrectly beyond the number of cpus available on the nodes) or different tricks increasing cpus_per_task and decreasing mem_per_cpu. The fourth issue was identified when requesting the special case of 0 memory, which was handled inside the select plugin after the partition validations and could thus be used to incorrectly bypass the limits. Issues 2-4 were identified in bug 4976. The patch also includes an entire refactor of how and when job memory is set to default values (if not requested initially) and how and when limits are validated. Co-authored-by: Dominik Bartkiewicz <bart@schedmd.com>
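The "preserve the original request" idea behind the first fix can be sketched as follows. This is a hypothetical reduction, not Slurm's actual job_details struct; the field and function names are invented:

```c
#include <assert.h>
#include <stdint.h>

/* Keep the user's original request separate from the working copy that
 * per-partition limit checks may clamp, so every partition is validated
 * against the original values (names invented for this sketch). */
struct job_details_sketch {
    uint64_t pn_min_memory;        /* working value, may be adjusted */
    uint64_t orig_pn_min_memory;   /* preserved original request */
};

/* Validate one partition's limit without losing the original request. */
static int check_partition(struct job_details_sketch *d,
                           uint64_t part_max_mem)
{
    /* Always restart from the original request, never a prior clamp. */
    d->pn_min_memory = d->orig_pn_min_memory;
    return d->pn_min_memory <= part_max_mem;   /* 1 = fits, 0 = exceeds */
}
```

Without the `orig_` copy, a first partition that clamped the working value would silently change what every later partition validated, which is the bug described above.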
- 09 May, 2018 15 commits
- Morris Jette authored
  If running without AccountingStorageEnforce but with the DBD, and the DBD isn't up when the slurmctld starts, you could get into a corner case where you don't have a QOS list in the assoc_mgr, and thus no usage to transfer. Bug 5156.
- Tim Wickberg authored
- Tim Wickberg authored
  Update slurm.spec and slurm.spec-legacy as well.
- Tim Wickberg authored
  Clang warns about a possible null dereference of job_part_ptr if the !job_ptr->priority_array part of the conditional is taken. Remove that part of the conditional, as it doesn't matter here whether that is set. The job's eligibility on one vs. multiple partitions is not determined by that, but by the status of part_ptr_list and part_ptr. Bug 5136.
- Morris Jette authored
- Brian Christiansen authored
- Felip Moll authored
- Morris Jette authored
  Try to fill up each socket completely before moving into additional sockets. This will minimize the number of sockets needed, improving packing, especially alongside MaxCPUsPerNode. Bug 4995.
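The packing policy described above can be illustrated with a small self-contained sketch. This is not the select plugin's code; the function name and signature are invented. The idea is simply to fill each socket to capacity before touching the next one:

```c
#include <assert.h>

/* Greedy socket packing (illustrative sketch): returns the number of
 * sockets used and records per-socket CPU counts in cpus_per_socket[],
 * which the caller sizes to nsockets and zero-initializes. */
static int pack_sockets(int ncpus_wanted, int nsockets,
                        int cores_per_socket, int cpus_per_socket[])
{
    int used = 0;
    for (int s = 0; s < nsockets && ncpus_wanted > 0; s++) {
        /* Take as much of this socket as the remaining request needs. */
        int take = ncpus_wanted < cores_per_socket ?
                   ncpus_wanted : cores_per_socket;
        cpus_per_socket[s] = take;
        ncpus_wanted -= take;
        used++;
    }
    return used;
}
```

For a 10-CPU request on 4 sockets of 8 cores each, this yields 8 + 2 on two sockets rather than spreading 10 CPUs thinly across all four, which is the minimization the commit describes.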
- Tim Wickberg authored
  My mistake in commit 602817c8. Bug 4922.
- Felip Moll authored
  Without this, gang scheduling would incorrectly kick in for these jobs since active_resmap had not been updated appropriately. Bug 4922.
- Tim Wickberg authored
  Code for this was removed in 2012. Bug 5126.
- Marshall Garey authored
  Bug 5026.
- Tim Wickberg authored
  Otherwise this will return the error message back to the next job submitter. Bug 5106.
- Tim Wickberg authored
  Bug 5106.
- Tim Wickberg authored
  Link to CRIU as well. Bug 4293.