- 03 May, 2018 13 commits
-
-
Brian Christiansen authored
-
Brian Christiansen authored
Bug 4274
-
Brian Christiansen authored
Verify that a tres weight is an integer.
-
Brian Christiansen authored
This allows job limits to be enforced at submission when the QOS DenyOnLimit flag is set. Note that the values could be expanded at schedule time (e.g. a job requests one task but gets all CPUs on a core). The expanded values are considered when scheduling.
-
Brian Christiansen authored
-
Isaac Hartung authored
-
Isaac Hartung authored
-
Brian Christiansen authored
Cont. Bug 4274
-
Isaac Hartung authored
Cont. Bug 4274
-
Brian Christiansen authored
being able to be set/updated on a partition. Bug 4274
-
Isaac Hartung authored
Bug 4274
-
Isaac Hartung authored
-
Isaac Hartung authored
-
- 02 May, 2018 9 commits
-
-
Danny Auble authored
Bug 5103
-
Danny Auble authored
-
Danny Auble authored
Bug 5103
-
Danny Auble authored
Bug 5103
-
Danny Auble authored
# Conflicts: # src/slurmctld/job_mgr.c
-
Tim Wickberg authored
Can lead to deadlock within malloc depending on the exact timing. Rework thread startup and shutdown code so pthread_cancel is not needed. Bug 5119, 5103.
-
Tim Wickberg authored
happens. Bug 5108
-
Danny Auble authored
This reverts commit de5a4da2.
-
Danny Auble authored
happens. Bug 5108
-
- 01 May, 2018 7 commits
-
-
Danny Auble authored
It turns out the partition's billing TRES was computed from the sum of the node_ptrs, which contain the max of all partitions they are in. This isn't correct since each partition's billing can be different. Set it correctly here.
-
Tim Wickberg authored
Noticed while auditing use of pthread_cancel.
-
Danny Auble authored
This is a followup to the last commit.
-
Danny Auble authored
thread. Otherwise you could get into a race where it is not running when the registration response is sent back, which now leaves us without any TRES to send to the slurmstepd.
-
Danny Auble authored
This will give us debug output earlier than we had it before. I see no reason to delay it until later.
-
Danny Auble authored
We found that on most systems we would only need to wait < 20000 usecs for this to happen. This is much shorter than the previous 1 sec. We also found that we almost always (100% of the time in my testing) had to wait for the step to finish in the first place, though.
-
Tim Wickberg authored
No functional change.
-
- 30 Apr, 2018 10 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
The use in _watch_tasks needs to be removed, as the switch from pthread_cancel to pthread_cond_signal means this will not get interrupted and would keep the step alive for at least a second, potentially harming throughput. Since the call to _poll_data() happens after the first timer expires, this delay turns out to be unnecessary, so we won't be replacing it with a pthread_cond_timedwait() construct. The use in jobacct_gather_stat_task() is unnecessary since the two locations where this can happen take place after _fork_all_tasks() has set up the tasks, thus the delay should not be necessary. Bug 5103.
-
Tim Wickberg authored
These functions are not async-cancel-safe, and cannot safely be cancelled. This leads to potential deadlock, either between our own locks, or deep inside glibc when the thread held a malloc arena lock at the time it was cancelled. Replace with pthread_cond_signal on the appropriate cond to wake threads up at the appropriate time instead. Bug 5103.
-
Danny Auble authored
This will make it easier in a future commit to avoid the async pthread_cancel. Bug 5103
-
Alejandro Sanchez authored
Bug 5110.
-
Danny Auble authored
# Conflicts: # src/slurmctld/job_mgr.c
-
Marshall Garey authored
Remove partition MaxTime limit at the beginning of the test, run the rest of the test, then restore the partition configuration with scontrol reconfigure. Bug 4994.
-
Marshall Garey authored
Otherwise the extern step will disappear after 11.5 days. Bug 5000.
-
Dominik Bartkiewicz authored
to be sure if it is created under job write lock. Bug 4901
-
- 28 Apr, 2018 1 commit
-
-
Tim Wickberg authored
-