- 09 May, 2018 23 commits
-
-
Danny Auble authored
It turns out that if we are running from cache, we need to call set_cluster_tres() again, since the pointers in the list have changed and no longer carry any counts. I opted to modify the existing locks and call it inside the lock rather than having set_cluster_tres() grab the locks.
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
advent of persistent connections.
-
Brian Christiansen authored
-
Felip Moll authored
-
Morris Jette authored
Try to fill up each socket completely before moving into additional sockets. This will minimize the number of sockets needed, improving packing especially alongside MaxCPUsPerNode. Bug 4995.
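The packing idea described above can be sketched as a simple greedy loop. This is only an illustration under assumed inputs, not the scheduler's actual code: `pack_sockets`, `free_cpus`, and `alloc` are hypothetical names, and real node selection has many more constraints.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: take CPUs from each socket until it is exhausted
 * before touching the next one, so that as few sockets as possible are used.
 * free_cpus[i] is the number of idle CPUs on socket i; alloc[i] receives the
 * number of CPUs taken from socket i.
 * Returns the number of sockets used, or -1 if the request cannot be met.
 */
static int pack_sockets(const int *free_cpus, int *alloc,
			size_t nsockets, int needed)
{
	int used = 0;

	for (size_t i = 0; i < nsockets; i++) {
		alloc[i] = 0;
		if (needed > 0 && free_cpus[i] > 0) {
			/* Drain this socket before moving to the next. */
			alloc[i] = (free_cpus[i] < needed) ? free_cpus[i] : needed;
			needed -= alloc[i];
			used++;
		}
	}
	return (needed == 0) ? used : -1;
}
```

With three sockets of four idle CPUs each, a six-CPU request lands on two sockets (4 + 2) instead of being spread across all three.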
-
Tim Wickberg authored
My mistake on commit 602817c8. Bug 4922.
-
Felip Moll authored
Without this, gang scheduling would incorrectly kick in for these jobs since active_resmap has not been updated appropriately. Bug 4922.
-
Tim Wickberg authored
-
Tim Wickberg authored
Code for this was removed in 2012. Bug 5126.
-
Jessica Nettelblad authored
Bug 4563.
-
Marshall Garey authored
Bug 5026.
-
Tim Wickberg authored
Otherwise this will return the error message back to the next job submitter. Bug 5106.
-
Tim Wickberg authored
Bug 5106.
-
Tim Wickberg authored
Link to CRIU as well. Bug 4293.
-
Tim Wickberg authored
-
Tim Wickberg authored
Related to fix from bug 4155.
-
Josh Samuelson authored
Bug 4155.
-
Tim Wickberg authored
Made obsolete by structural changes in 17.11. Bug 4953.
-
Alejandro Sanchez authored
job_ptr->part_ptr is NULL if the partition has been deleted. Crash only happens with PriorityFlags=CALCULATE_RUNNING enabled. Bug 5136.
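The kind of guard this fix implies can be sketched as follows. The struct and function names are hypothetical stand-ins for illustration and do not reproduce Slurm's real definitions or the actual patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the partition and job records. */
struct demo_part {
	uint16_t priority_job_factor;
};

struct demo_job {
	struct demo_part *part_ptr;	/* NULL once the partition is deleted */
};

/*
 * Return the partition's priority factor for a job, or 0 when the job no
 * longer has a partition. Checking part_ptr first avoids the NULL
 * dereference described above.
 */
static uint16_t demo_job_part_factor(const struct demo_job *job_ptr)
{
	if (!job_ptr || !job_ptr->part_ptr)
		return 0;
	return job_ptr->part_ptr->priority_job_factor;
}
```

Returning 0 for an orphaned job is an assumption of this sketch; what the real code should do for a running job whose partition is gone is a policy decision outside its scope.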
-
Tim Wickberg authored
Partition is deleted immediately, not flagged. Tangentially related to bug 5136.
-
- 08 May, 2018 14 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Bug 5133.
-
Brian Christiansen authored
Bug 5146.
-
Tim Wickberg authored
-
Tim Wickberg authored
Caused by a corrupted protocol_version field value being received by the slurmstepd, as we cannot safely write/read a uint16_t across the pipe as if it was an int. Regression caused by commit 90b116c2. Bug 5133.
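A minimal sketch of the width-matching approach, assuming a plain POSIX pipe: both ends transfer exactly sizeof(uint16_t) bytes. The helper names `send_version` and `recv_version` are hypothetical and are not the functions touched by the actual fix.

```c
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helpers. Writing a uint16_t with sizeof(int), or reading
 * sizeof(int) bytes into one, moves the wrong number of bytes and corrupts
 * the value or the data that follows it on the pipe.
 */

/* Write the value using exactly sizeof(uint16_t) bytes. */
static int send_version(int fd, uint16_t version)
{
	ssize_t n = write(fd, &version, sizeof(version));

	return (n == (ssize_t) sizeof(version)) ? 0 : -1;
}

/* Read it back into a variable of the same width. */
static int recv_version(int fd, uint16_t *version)
{
	ssize_t n = read(fd, version, sizeof(*version));

	return (n == (ssize_t) sizeof(*version)) ? 0 : -1;
}
```

Both helpers treat a short read or write as an error rather than retrying, which keeps the sketch small; production code would normally loop until all bytes are transferred.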
-
Danny Auble authored
-
Brian Christiansen authored
since it's not allocated anymore. Found while investigating Bugs 5137, 4522.
-
Danny Auble authored
Regression introduced in commit fa3a8ff1. Coverity issue 182984.
-
Brian Christiansen authored
Requeued jobs are marked as PENDING|COMPLETING until the epilog checks in. The issue is that if job_set_alloc_tres gets called while the job is in the PENDING|COMPLETING state, the job's tres_alloc_str will be freed. If the job is then checkpointed in this state (PENDING|COMPLETING with no tres_alloc_str), the controller would crash on startup because it expects a job in the COMPLETING state to have a tres_alloc_str/cnt. This could be triggered by starting the controller without the dbd up: when the dbd comes up, the assoc_cache_mgr calls _update_job_tres(), which calls job_set_alloc_tres. It could also be triggered by adding new TRES. This most likely started happening in 17.11.5 because of commit 865b672f, which introduced calling _update_job_tres() on each job after the dbd comes up. Bugs 5137, 4522.
-
Morris Jette authored
Coverity CID 185507
-
Morris Jette authored
Coverity CID 185506
-
Morris Jette authored
Coverity CID 185503
-
Morris Jette authored
Coverity CID 185505
-
Morris Jette authored
Coverity CID 185504
-
- 07 May, 2018 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-