- 02 May, 2018 4 commits
-
-
Tim Wickberg authored
Can lead to deadlock within malloc depending on the exact timing. Rework thread startup and shutdown code so pthread_cancel is not needed. Bug 5119, 5103.
-
Tim Wickberg authored
happens. Bug 5108
-
Danny Auble authored
This reverts commit de5a4da2.
-
Danny Auble authored
happens. Bug 5108
-
- 01 May, 2018 2 commits
-
-
Danny Auble authored
Turns out the partition's billing TRES was working off the sum of the node_ptrs, which contain the max of all partitions they are in. This isn't correct since each partition's billing can be different. Set it correctly here.
-
Tim Wickberg authored
No functional change.
-
- 30 Apr, 2018 7 commits
-
-
Tim Wickberg authored
The use in _watch_tasks needs to be removed, as the switch from pthread_cancel to pthread_cond_signal means this will not get interrupted and would keep the step alive for at least a second, potentially harming throughput. Since the call to _poll_data() happens after the first timer expires, this delay turns out to be unnecessary, so we won't be replacing it with a pthread_cond_timedwait() construct. The use in jobacct_gather_stat_task() is unnecessary since the two locations where this can happen take place after _fork_all_tasks() has set up the tasks, so the delay should not be needed. Bug 5103.
-
Tim Wickberg authored
These functions are not async-cancel-safe, and cannot safely be cancelled. This leads to potential deadlock, either between our own locks, or deep inside glibc when the thread held a malloc arena lock when cancelled. Replace with pthread_cond_signal on the appropriate cond to wake threads up at the appropriate time instead. Bug 5103.
-
Danny Auble authored
This will make it easier in a future commit to avoid the async pthread_cancel. Bug 5103
-
Alejandro Sanchez authored
Bug 5110.
-
Marshall Garey authored
Remove partition MaxTime limit at the beginning of the test, run the rest of the test, then restore the partition configuration with scontrol reconfigure. Bug 4994.
-
Marshall Garey authored
Otherwise the extern step will disappear after 11.5 days. Bug 5000.
-
Dominik Bartkiewicz authored
to be sure if it is created under job write lock. Bug 4901
-
- 28 Apr, 2018 4 commits
-
-
Brian Christiansen authored
-
Brian Christiansen authored
In conjunction with the previous commit (recognizing nodes being powered up out of band), set the node's last_idle to 0 when the node is in a power_save state. A last_idle of 0 additionally means that the node isn't booted. Partially reverts da722a89. Checking for (last_idle > 0) when in power_save state isn't necessary, because if the node is already in power_save state it won't be resumed unless (node_ptr->last_idle > (now - SuspendTime)). And with the previous change, the node's last_idle time will be set when the node registers.
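The reasoning above rests on one comparison, which can be sketched in isolation. This is an illustration only: the `node_t` type and `resume_check_passes()` function are hypothetical, and only the expression `last_idle > (now - SuspendTime)` is taken from the commit message.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the node record; not the Slurm struct. */
typedef struct {
	time_t last_idle;	/* 0 while the node is in a power_save state */
} node_t;

/* Mirrors the condition quoted in the commit message. */
bool resume_check_passes(const node_t *node, time_t now, time_t suspend_time)
{
	return node->last_idle > (now - suspend_time);
}
```

With last_idle set to 0, the comparison is false for any realistic `now`, which is why a separate (last_idle > 0) test adds nothing.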
-
Brian Christiansen authored
Bug 5053
-
Brian Christiansen authored
This allows the suspend script to be triggered even if Slurm has the node(s) in a power_save state. Bug 5053
-
- 27 Apr, 2018 1 commit
-
-
Danny Auble authored
-
- 26 Apr, 2018 2 commits
-
-
Morris Jette authored
The test was failing solidly on a Cray with NHC configured
-
Morris Jette authored
Disable the tests as needed
-
- 25 Apr, 2018 2 commits
-
-
Danny Auble authored
by hwloc_obj_type_snprintf. You will only see this if you have _DEBUG set to 1.
-
Danny Auble authored
-
- 24 Apr, 2018 3 commits
-
-
Morris Jette authored
The included lightweight corefile description is no longer valid, and misleading at best.
-
Christopher Bottoms authored
-
Isaac Hartung authored
-
- 23 Apr, 2018 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
When any of these --exclusive modes couldn't be satisfied, Slurm was returning an incorrect ESLURM_NODE_NOT_AVAIL, which caused the scheduling problems described in the bug. The fix sets the error code properly to ESLURM_NODES_BUSY, which also fixes the scheduling problems, and works over the correct share_node_bitmap. Continuation of commits from bug 4932: e2a14b8d fc4e5ac9. Bug 5047.
-
- 19 Apr, 2018 4 commits
-
-
Marshall Garey authored
Fix an issue in the bit manipulation logic introduced in commit 892ffa89. Bug 4997.
-
Isaac Hartung authored
And related KillOnBadExit setting in slurm.conf. These only affect an individual job step, not the entire job. Bug 5023.
-
Tim Wickberg authored
Replace select_p_select_jobinfo_sprint() with the same NO-OP that the other plugins (except alps and bluegene) have implemented. Bug 5077.
-
Isaac Hartung authored
Bug 5049
-
- 17 Apr, 2018 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
1. Identifies nodes which are unavailable to a specific job, adding a call to filter_by_node_owner() in select_nodes() where the node list is generated. 2. Removes the "unavail_node_str" argument to select_nodes() as it is no longer useful. This string was originally generated once at the start of the job scheduling logic for all jobs, but since each job can have a different set of unavailable nodes (dedicated to user, group, etc.), the same string for all jobs can be misleading. Bug 4932.
-
Dominik Bartkiewicz authored
Prevent _pick_best_nodes from wrongly returning ESLURM_NODE_NOT_AVAIL when some jobs are using "--exclusive=user". Bug 4932.
-
- 16 Apr, 2018 4 commits
-
-
Tim Wickberg authored
-
Thomas HAMEL authored
Improve performance of 'squeue -u' when PrivateData=jobs is enabled by moving the UID filter code ahead of the more expensive PrivateData=jobs checks. Bug 5056.
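The reordering described above can be sketched as follows. This is an illustration under stated assumptions, not the Slurm implementation: `job_rec_t`, `visible_to()`, and `pack_jobs()` are hypothetical names, and the visibility check is a trivial stand-in for the real PrivateData logic. A counter records how often the expensive check runs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical job record; not the Slurm struct. */
typedef struct {
	uid_t user_id;
	unsigned job_id;
} job_rec_t;

/* Counts calls to the expensive check, for illustration only. */
static size_t expensive_checks;

/* Stand-in for the costly visibility check (assumption). */
static bool visible_to(const job_rec_t *job, uid_t requester)
{
	expensive_checks++;
	return job->user_id == requester;
}

/* Copies matching job ids into out[] and returns how many were copied. */
size_t pack_jobs(const job_rec_t *jobs, size_t n, uid_t filter_uid,
		 uid_t requester, unsigned *out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		/* Cheap UID filter first: skips most records immediately. */
		if (jobs[i].user_id != filter_uid)
			continue;
		/* Expensive check runs only for records that survive. */
		if (!visible_to(&jobs[i], requester))
			continue;
		out[count++] = jobs[i].job_id;
	}
	return count;
}
```

The result is the same in either order; only the number of expensive checks changes, which is where the saving comes from when the filter excludes most jobs.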
-
Dominik Bartkiewicz authored
See commit 0dabf4e7. Bug 4932.
-
Dominik Bartkiewicz authored
regression from ef1f3e73. Bug 4885.
-
- 14 Apr, 2018 2 commits
-
-
Michael Hinton authored
-
Michael Hinton authored
-