- 17 Mar, 2017 21 commits
-
-
Brian Christiansen authored
-
Brian Christiansen authored
Add and remove siblings jobs based off new cluster features.
-
Brian Christiansen authored
-
Brian Christiansen authored
Viable siblings (siblings on which the sibling job could run, e.g. after the requested clusters and cluster features are applied) are now distinguished from active siblings. The remote sibling jobs only need to know about the viable siblings, not the active siblings. This simplifies things a little: the remote sibling jobs no longer need updating when the active siblings change (e.g. a cluster rejects the submission), only when the viable siblings change (scontrol update clusterfeatures).
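A minimal C sketch of the distinction. It assumes each federation cluster is represented by one bit in a mask. The type and function names are hypothetical and do not come from the Slurm source:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one bit per federation cluster (bit i = cluster i). */
typedef uint64_t sib_mask_t;

/* Viable siblings: clusters the job may run on after applying the
 * requested cluster list and the requested cluster features. */
sib_mask_t compute_viable(sib_mask_t requested, sib_mask_t has_features)
{
	return requested & has_features;
}

/* Active siblings: viable siblings that actually accepted the job. */
sib_mask_t compute_active(sib_mask_t viable, sib_mask_t rejected)
{
	return viable & ~rejected;
}

/* Remote sibling jobs only track the viable set, so they need an update
 * only when that set changes (e.g. the cluster features were updated),
 * not when a cluster rejects the submission. */
bool remote_update_needed(sib_mask_t old_viable, sib_mask_t new_viable)
{
	return old_viable != new_viable;
}
```

A rejection changes only the active mask, so `remote_update_needed()` stays false until the viable mask itself changes.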
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
To catch locks going negative while in development.
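As an illustration only, here is a development-time check for a lock count going negative. The counter struct and functions are hypothetical and are not Slurm's lock code:

```c
#include <assert.h>

/* Hypothetical read-lock counter used only to show the check. */
typedef struct {
	int readers;
} rw_count_t;

void read_lock(rw_count_t *l)
{
	l->readers++;
}

void read_unlock(rw_count_t *l)
{
	l->readers--;
	/* A negative count means an unlock without a matching lock. In a
	 * development build this aborts at the faulty call site; defining
	 * NDEBUG compiles the check away for production builds. */
	assert(l->readers >= 0);
}
```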
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
Don't need to check for the cluster pointer.
-
Brian Christiansen authored
For copying one char_list into another.
-
Brian Christiansen authored
And reset the iterator so that fb,fb doesn't get added twice.
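A sketch of copying one string list into another without duplicates, as described in the two commits above. It uses a hypothetical fixed-size list in place of Slurm's char_list. The duplicate check restarts from the head of the destination for every item, which is what resetting the iterator achieves:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ITEMS 16

/* Hypothetical fixed-size string list standing in for a char_list. */
typedef struct {
	const char *items[MAX_ITEMS];
	size_t count;
} char_list_t;

static bool list_contains(const char_list_t *l, const char *s)
{
	/* Scan from index 0 every time (the "reset iterator"). Resuming
	 * from where an earlier scan stopped would miss entries already
	 * added and let "fb,fb" end up in the list twice. */
	for (size_t i = 0; i < l->count; i++)
		if (strcmp(l->items[i], s) == 0)
			return true;
	return false;
}

/* Copy every string in src into dst, skipping duplicates.
 * Returns the number of strings actually added. */
size_t char_list_copy(char_list_t *dst, const char_list_t *src)
{
	size_t added = 0;

	for (size_t i = 0; i < src->count && dst->count < MAX_ITEMS; i++) {
		if (list_contains(dst, src->items[i]))
			continue;
		dst->items[dst->count++] = src->items[i];
		added++;
	}
	return added;
}
```

Only the pointers are copied here. A list that owns its strings would duplicate each one instead.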
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
I found today that SIGTERM'ed jobs show as OOM'ed in the database. This comes from commit 2a75b72d: a job terminating with SIG_TERM (and others) will incorrectly report the job termination state as Out Of Memory.
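A sketch of keeping the two outcomes apart. It assumes the OOM condition comes from a separate flag (for example a memory cgroup OOM event) rather than being inferred from the exit signal. The end-state enum and function are hypothetical. The kernel OOM killer sends SIGKILL, so a job that ended on SIGTERM was not killed by it:

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/* Hypothetical subset of job end states, for illustration only. */
typedef enum { END_COMPLETE, END_FAILED, END_SIGNALED, END_OOM } end_state_t;

/* wait_status is a status as filled in by waitpid(). oom_detected is
 * assumed to come from an independent source; the wait status alone
 * cannot tell that the OOM killer was involved. */
end_state_t classify_exit(int wait_status, bool oom_detected)
{
	if (oom_detected)
		return END_OOM;
	if (WIFSIGNALED(wait_status))
		return END_SIGNALED; /* e.g. SIGTERM: not an OOM */
	if (WIFEXITED(wait_status) && WEXITSTATUS(wait_status) == 0)
		return END_COMPLETE;
	return END_FAILED;
}
```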
-
- 16 Mar, 2017 19 commits
-
-
Danny Auble authored
-
Danny Auble authored
# Conflicts:
#	src/slurmctld/acct_policy.c
-
Danny Auble authored
-
Danny Auble authored
Association. This reverts commits 92d2c645 and 37be42ec. They caused incorrect behavior; the original code was correct. This also corrects the documentation additions in commit 4cfe6bde. The reverted code made the first clause never true, so if you were over the limit you would reach the third clause, which reported a huge number available where it should have been a negative number. The first clause should have been triggered and handled correctly.
-
Danny Auble authored
This reverts commit af52111c.
-
Josh Samuelson authored
Association. This reverts commits 92d2c645 and 37be42ec. They caused incorrect behavior; the original code was correct. This also corrects the documentation additions in commit 4cfe6bde. The reverted code made the first clause never true, so if you were over the limit you would reach the third clause, which reported a huge number available where it should have been a negative number. The first clause should have been triggered and handled correctly.
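One reading of the three clauses, assuming the "huge number" is unsigned wraparound when usage exceeds the limit. The function, enum, and sentinel below are hypothetical and are not the acct_policy.c code:

```c
#include <stdint.h>

/* Hypothetical sentinel for "no limit set". */
#define INFINITE_LIMIT UINT64_MAX

typedef enum { LIMIT_OVER, LIMIT_NONE, LIMIT_UNDER } limit_state_t;

/* *avail receives the remaining amount, negative when usage exceeds the
 * limit. Values are assumed to fit in int64_t. */
limit_state_t check_limit(uint64_t limit, uint64_t used, int64_t *avail)
{
	if ((limit != INFINITE_LIMIT) && (used >= limit)) {
		/* First clause: at or over the limit. Subtracting as signed
		 * values gives a negative result, whereas limit - used on
		 * uint64_t would wrap around to a huge number. */
		*avail = (int64_t)limit - (int64_t)used;
		return LIMIT_OVER;
	} else if (limit == INFINITE_LIMIT) {
		/* Second clause: no limit configured. */
		*avail = INT64_MAX;
		return LIMIT_NONE;
	}
	/* Third clause: under the limit, so the unsigned subtraction is safe. */
	*avail = (int64_t)(limit - used);
	return LIMIT_UNDER;
}
```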
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
conversion process easier.
-
Danny Auble authored
-
Danny Auble authored
a SUM of the steps to the job table.
-
Danny Auble authored
-
Danny Auble authored
This removes the join on the step table during rollups. It turns out this join was very costly when the tables were fairly large. We can now grab the data directly from the job_table, which already contains it in the tres_alloc column.
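The rollup itself runs against database tables. As a C sketch of the idea, summing one TRES straight from each job's tres_alloc value (with no step lookup) looks like this. It assumes the TRES string layout of comma-separated id=count pairs, and the function names are hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>

/* Return the count for tres_id in a string such as "1=4,2=8000",
 * or 0 if the id is absent or the string is malformed. */
uint64_t tres_get(const char *tres_str, unsigned tres_id)
{
	const char *p = tres_str;

	while (p && *p) {
		char *end;
		unsigned long id = strtoul(p, &end, 10);

		if (*end != '=')
			return 0; /* malformed entry: give up */
		uint64_t val = strtoull(end + 1, &end, 10);
		if (id == tres_id)
			return val;
		p = (*end == ',') ? end + 1 : NULL;
	}
	return 0;
}

/* Sum one TRES across job records using only each job's tres_alloc. */
uint64_t rollup_tres(const char *const *job_tres_alloc, size_t njobs,
		     unsigned tres_id)
{
	uint64_t total = 0;

	for (size_t i = 0; i < njobs; i++)
		total += tres_get(job_tres_alloc[i], tres_id);
	return total;
}
```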
-
Danny Auble authored
at the end of a job.
-
Danny Auble authored
We use the last step in the allocation (this works for everything but salloc) to get the data to the job. We update the job as steps finish and send a new tres_alloc_str to the database, which updates the job record with the info. We could not use the job_comp message for anything but salloc, since there is no guarantee it will arrive after the last step does.
-
Danny Auble authored
to report the end amount of energy used by all steps. It is not meant to be used to evaluate limits at the moment; if it were, we would need to change this. As this wasn't a consideration when this was written, we felt this was the easiest way to get the total energy for the job.
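A sketch of accumulating step energy into the job as each step finishes, then rebuilding the string sent to the database. The struct, the function, and the TRES ids used (1 for CPU, 3 for energy) are assumptions for this example:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical job record; TRES ids 1 (cpu) and 3 (energy) are assumed. */
typedef struct {
	uint64_t cpus;
	uint64_t energy_joules; /* running sum over finished steps */
} job_rec_t;

/* Called each time a step finishes: add its energy to the job total and
 * rebuild the tres_alloc string that would be sent to the database.
 * Returns the snprintf() result (length of the full string). */
int step_finished(job_rec_t *job, uint64_t step_energy, char *buf, size_t len)
{
	job->energy_joules += step_energy;
	return snprintf(buf, len, "1=%" PRIu64 ",3=%" PRIu64,
			job->cpus, job->energy_joules);
}
```

Because every finished step resends the running total, the record written after the last step holds the job's full energy, whichever message arrives last.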
-
Danny Auble authored
-
Danny Auble authored
-