- 09 May, 2018 27 commits
-
-
Tim Wickberg authored
-
Danny Auble authored
It turns out that if we are running cache, we need to call set_cluster_tres() again, since the pointers in the list have changed and no longer have any counts. I opted to modify the existing locks and call it inside the lock instead of having set_cluster_tres() grab the locks.
-
Tim Wickberg authored
-
Tim Wickberg authored
Update slurm.spec and slurm.spec-legacy as well
-
Tim Wickberg authored
Clang warns about a possible null dereference of job_part_ptr if the !job_ptr->priority_array part of the conditional is taken. Remove that part of the conditional, as it doesn't matter whether that is set here. The job's eligibility on one vs. multiple partitions is not determined by that, but by the status of part_ptr_list and part_ptr. Bug 5136.
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
advent of persistent connections.
-
Brian Christiansen authored
-
Felip Moll authored
-
Morris Jette authored
Try to fill up each socket completely before moving into additional sockets. This will minimize the number of sockets needed, improving packing especially alongside MaxCPUsPerNode. Bug 4995.
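
The packing idea described above can be illustrated with a small greedy sketch. This is not the actual select plugin code; `pack_sockets`, its parameters, and the flat per-socket arrays are hypothetical names chosen for illustration, and real allocation logic has many more constraints (cores per task, MaxCPUsPerNode, and so on).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical illustration: take as many CPUs as possible from each
 * socket, in order, before touching the next one.
 *
 * avail[s]  - CPUs currently free on socket s (input)
 * alloc[s]  - CPUs taken from socket s (output)
 * Returns the number of sockets used, or -1 if the request cannot be met.
 */
int pack_sockets(const int *avail, int *alloc, size_t nsockets, int needed)
{
	int used = 0;

	assert(avail && alloc);

	for (size_t s = 0; s < nsockets; s++)
		alloc[s] = 0;

	for (size_t s = 0; s < nsockets && needed > 0; s++) {
		/* Drain this socket before moving on to the next. */
		int take = (avail[s] < needed) ? avail[s] : needed;

		if (take <= 0)
			continue;
		alloc[s] = take;
		needed -= take;
		used++;
	}

	return (needed > 0) ? -1 : used;
}
```

With three sockets of 4 free CPUs each and a request for 6 CPUs, this sketch yields the allocation {4, 2, 0} and uses 2 sockets, whereas an even spread would touch all three.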
-
Tim Wickberg authored
My mistake on commit 602817c8. Bug 4922.
-
Felip Moll authored
Without this, gang scheduling would incorrectly kick in for these jobs since active_resmap has not been updated appropriately. Bug 4922.
-
Tim Wickberg authored
-
Tim Wickberg authored
Code for this was removed in 2012. Bug 5126.
-
Jessica Nettelblad authored
Bug 4563.
-
Marshall Garey authored
Bug 5026.
-
Tim Wickberg authored
Otherwise this will return the error message back to the next job submitter. Bug 5106.
-
Tim Wickberg authored
Bug 5106.
-
Tim Wickberg authored
Link to CRIU as well. Bug 4293.
-
Tim Wickberg authored
-
Tim Wickberg authored
Related to fix from bug 4155.
-
Josh Samuelson authored
Bug 4155.
-
Tim Wickberg authored
Made obsolete by structural changes in 17.11. Bug 4953.
-
Alejandro Sanchez authored
job_ptr->part_ptr is NULL if the partition has been deleted. Crash only happens with PriorityFlags=CALCULATE_RUNNING enabled. Bug 5136.
-
Tim Wickberg authored
Partition is deleted immediately, not flagged. Tangentially related to bug 5136.
-
- 08 May, 2018 13 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Bug 5133.
-
Brian Christiansen authored
Bug 5146.
-
Tim Wickberg authored
-
Tim Wickberg authored
Caused by a corrupted protocol_version field value being received by the slurmstepd, as we cannot safely write/read a uint16_t across the pipe as if it was an int. Regression caused by commit 90b116c2. Bug 5133.
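
A minimal sketch of the underlying pitfall, assuming nothing about the actual slurmd/slurmstepd code: both ends of a pipe must agree on the exact width of the value being transferred. The helper names below (`write_all`, `read_all`, `send_protocol_version`, `recv_protocol_version`, `roundtrip_protocol_version`) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t) n;
	}
	return 0;
}

/* Read exactly len bytes, retrying on short reads and EINTR. */
static int read_all(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);
		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)	/* error or unexpected EOF */
			return -1;
		p += n;
		len -= (size_t) n;
	}
	return 0;
}

/* Sender: transfer sizeof(uint16_t) bytes, not sizeof(int). */
int send_protocol_version(int fd, uint16_t version)
{
	return write_all(fd, &version, sizeof(version));
}

/* Receiver: read into a uint16_t with the matching width. */
int recv_protocol_version(int fd, uint16_t *version)
{
	assert(version);
	return read_all(fd, version, sizeof(*version));
}

/* Round-trip a value through a pipe; returns it, or -1 on failure. */
int roundtrip_protocol_version(uint16_t version)
{
	int fds[2], rc = -1;
	uint16_t out = 0;

	if (pipe(fds) != 0)
		return -1;
	if (send_protocol_version(fds[1], version) == 0 &&
	    recv_protocol_version(fds[0], &out) == 0)
		rc = out;
	close(fds[0]);
	close(fds[1]);
	return rc;
}
```

If one side instead used `sizeof(int)` on a `uint16_t` variable, it would read or write bytes beyond the variable, which is the kind of mismatch that can corrupt the received value.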
-
Danny Auble authored
-
Brian Christiansen authored
since it's not allocated anymore. Found while investigating Bugs 5137, 4522.
-
Danny Auble authored
Regression caused in commit fa3a8ff1. Coverity issue 182984
-
Brian Christiansen authored
Requeued jobs are marked as PENDING|COMPLETING until the epilog checks in. The issue is that if job_set_alloc_tres gets called while in the PENDING|COMPLETING state, the job's alloc_tres_str will be freed. If this job then gets checkpointed in this state (PENDING|COMPLETING + no tres_alloc_str), the controller would crash on startup because it expected the job to have a tres_alloc_str/cnt when in the COMPLETING state. This could be triggered by starting the controller without the dbd up. When the dbd comes up, the assoc_cache_mgr calls _update_job_tres(), which calls job_set_alloc_tres. It could also be triggered by adding new TRES. This most likely started happening in 17.11.5 because of commit 865b672f, which introduced calling _update_job_tres() on each job after the dbd comes up. Bugs 5137, 4522.
-
Morris Jette authored
Coverity CID 185507
-
Morris Jette authored
Coverity CID 185506
-
Morris Jette authored
Coverity CID 185503
-
Morris Jette authored
Coverity CID 185505
-