- 24 Jan, 2017 4 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Could be used in the bit_ffs and bit_fls functions instead of the existing for loops.
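The log drops this commit's title, so the exact helper is not shown. A minimal sketch of the idea, assuming GCC/Clang's __builtin_ctzll and __builtin_clzll and a hypothetical bitmap stored as plain 64-bit words (not Slurm's actual bitstr_t layout), compares a per-bit loop with a per-word builtin scan:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitmap: nwords 64-bit words, bit i lives in word i / 64. */

/* Loop version: test every bit in turn. Returns -1 if no bit is set. */
static long ffs_loop(const uint64_t *words, size_t nwords)
{
	for (size_t i = 0; i < nwords * 64; i++)
		if (words[i / 64] & ((uint64_t)1 << (i % 64)))
			return (long)i;
	return -1;
}

/* Builtin version: skip zero words, then count trailing zeros.
 * __builtin_ctzll is undefined for 0, so it is only called on nonzero words. */
static long ffs_builtin(const uint64_t *words, size_t nwords)
{
	for (size_t w = 0; w < nwords; w++)
		if (words[w])
			return (long)(w * 64 + __builtin_ctzll(words[w]));
	return -1;
}

/* Find the last set bit: scan words from the top, count leading zeros. */
static long fls_builtin(const uint64_t *words, size_t nwords)
{
	for (size_t w = nwords; w-- > 0;)
		if (words[w])
			return (long)(w * 64 + 63 - __builtin_clzll(words[w]));
	return -1;
}
```

The builtin version does one test per 64-bit word instead of one per bit, which is where the speedup over the plain loops would come from.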
-
Tim Wickberg authored
-
Tim Wickberg authored
-
- 23 Jan, 2017 15 commits
-
-
Tim Wickberg authored
-
Danny Auble authored
Bug 1599
-
Danny Auble authored
-
Morris Jette authored
Add new knl.conf parameters to the capmc_suspend and capmc_resume programs. They are not used by those programs, but they must be recognized to prevent an error if those new parameters are set.
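The pattern here is "recognize but ignore": a program that shares a config file must know every key in it, even keys it never acts on. A minimal sketch, using placeholder option names (UsedOption, NewOption) because the log does not say which parameter was added, and a hypothetical classify_key() helper rather than Slurm's real parser:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder option names: the log does not name the new knl.conf
 * parameter, so these are illustrative only. */
static const char *const used_keys[]    = { "UsedOption" };
static const char *const ignored_keys[] = { "NewOption" };

enum key_result { KEY_USED, KEY_IGNORED, KEY_UNKNOWN };

static int in_list(const char *key, const char *const *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(key, list[i]) == 0)
			return 1;
	return 0;
}

/* Keys this program does not use are still recognized, so a shared
 * config file containing them parses without an "unknown key" error. */
static enum key_result classify_key(const char *key)
{
	if (in_list(key, used_keys, sizeof(used_keys) / sizeof(used_keys[0])))
		return KEY_USED;
	if (in_list(key, ignored_keys,
		    sizeof(ignored_keys) / sizeof(ignored_keys[0])))
		return KEY_IGNORED;
	return KEY_UNKNOWN;
}
```

Only KEY_UNKNOWN would be treated as an error by the caller; KEY_IGNORED entries are parsed and dropped.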
-
Morris Jette authored
-
Morris Jette authored
Reset a job's memory limit based upon what is available after a node reboot, which can change on a KNL if the MCDRAM mode is changed on reboot.
-
Morris Jette authored
This bug was likely the root cause of bug 3366. If the backfill scheduler allocates resources for a batch job and a node reboot is required, the batch launch RPC is sent to the agent. At that point, there is a race condition between the agent and the job_time_limit() function, which tests for boot completion. If job_time_limit() ran first, it would cause a second launch RPC to be sent to the agent. bug 3366
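The race above ends with two code paths each deciding to send the same launch RPC. One generic way to make such a decision single-shot is a compare-and-swap on a per-job flag. This is a minimal sketch using C11 atomics and a hypothetical struct job; it illustrates the technique only and is not the fix Slurm applied:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-job record; only the field needed here. */
struct job {
	atomic_bool launch_queued; /* set once the batch launch RPC is queued */
};

/* Returns true only for the first caller; every later caller finds the
 * flag already set and must not queue a second launch RPC. */
static bool claim_batch_launch(struct job *job)
{
	bool expected = false;
	return atomic_compare_exchange_strong(&job->launch_queued,
					      &expected, true);
}
```

Whichever of the two paths (agent or time-limit check) calls claim_batch_launch() first wins; the loser sees false and skips the RPC.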
-
Morris Jette authored
Clean up the logic that tests whether a job is configuring. bug 3366
-
Morris Jette authored
Do not launch a batch step while the job is configuring. The previous logic checked whether PrologSlurmctld was running, but not whether nodes were booting. Checking the job's CONFIGURING state flag covers both. bug 3366
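Because both conditions (PrologSlurmctld running, nodes booting) leave the job in the CONFIGURING state, a single bit test replaces two separate checks. A minimal sketch with hypothetical flag values (EX_JOB_RUNNING, EX_JOB_CONFIGURING); the real bit definitions live in Slurm's headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state bits for illustration only. */
#define EX_JOB_RUNNING     0x0001u
#define EX_JOB_CONFIGURING 0x4000u

/* One flag covers both cases: a running PrologSlurmctld and nodes still
 * booting both leave the job CONFIGURING, so test only that bit. */
static bool batch_step_launchable(uint32_t job_state)
{
	return (job_state & EX_JOB_CONFIGURING) == 0;
}
```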
-
Morris Jette authored
Add a check to prevent the step allocation logic from executing the job configuration completion logic multiple times (check whether the job is configuring before clearing the flag and resetting the time limit). bug 3366
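The check described above makes configuration completion idempotent: test the flag first, and do the clear-and-reset work only if it is still set. A minimal sketch with a hypothetical struct job, flag value, and finish_job_configuring() helper (none of these are Slurm's actual names or fields):

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define EX_JOB_CONFIGURING 0x4000u /* hypothetical bit value */

struct job {
	uint32_t state;
	time_t   end_time;
	time_t   time_limit_secs;
	int      completions; /* how many times completion logic actually ran */
};

/* Run configuration-completion logic at most once: if the flag is already
 * clear, an earlier caller finished configuration, so do nothing. */
static void finish_job_configuring(struct job *job, time_t now)
{
	if (!(job->state & EX_JOB_CONFIGURING))
		return;
	job->state &= ~EX_JOB_CONFIGURING;
	/* Start the time limit from the moment configuration finished. */
	job->end_time = now + job->time_limit_secs;
	job->completions++;
}
```

A second call (for example from step allocation after the job already finished configuring) leaves end_time untouched instead of pushing it out again.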
-
Brian Christiansen authored
-
Morris Jette authored
slurmctld/agent race condition fix: Prevent job launch while the PrologSlurmctld daemon is running or a node boot is in progress. bug 3366
-
Morris Jette authored
This is required to manage the configuration completion. bug 3366
-
Morris Jette authored
This will be required to lock the job structure. bug 3366
-
Morris Jette authored
Remove the return value from the agent_retry() function. It is not used anywhere and needs to be removed to run as a pthread. bug 3366
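pthread_create() accepts only a start routine of type void *(*)(void *), so a function with an unused return value is easier to run as a thread once that value is gone. A minimal sketch with hypothetical names (agent_retry_sketch, agent_retry_thread, run_retry_in_thread); the real agent_retry() signature is not shown in the log:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static int retries_done; /* lets the example observe that the work ran */

/* Hypothetical worker with no return value, matching the change above. */
static void agent_retry_sketch(int min_wait)
{
	(void)min_wait; /* a real retry loop would use this */
	retries_done++;
}

/* Adapter with the signature pthread_create() requires. */
static void *agent_retry_thread(void *arg)
{
	agent_retry_sketch(*(int *)arg);
	return NULL;
}

/* Passing &min_wait is safe only because we join before returning. */
static int run_retry_in_thread(int min_wait)
{
	pthread_t tid;

	if (pthread_create(&tid, NULL, agent_retry_thread, &min_wait) != 0)
		return -1;
	return pthread_join(tid, NULL);
}
```

Link with -pthread on systems where the threads library is separate from libc.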
-
- 21 Jan, 2017 3 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Reasonable NFS systems do not need a minute to propagate changes.
-
- 20 Jan, 2017 18 commits
-
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
In favor of just using the -a option to show federated tracking jobs. This allows scontrol -a show jobs to show the tracking jobs as well.
-
Brian Christiansen authored
-
Brian Christiansen authored
to indicate whether the job was requeue held or not. This enables the federation to act on whether the job was requeue held or not.
-
Brian Christiansen authored
So that the origin job can tell a remote cluster to cancel the job but mark the job as requeued in the database. See the note about the KILL_* flags actually using 12 bits instead of the documented 8 bits.
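The note about the KILL_* flags matters because any field sized for 8 bits silently drops flags in bits 8 through 11. A minimal sketch with placeholder bit names (EX_KILL_BIT_11, EX_KILL_MASK); the log gives only the 12-bit width, not the real flag values:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bits: the log only says the KILL_* flags span 12 bits. */
#define EX_KILL_BIT_11 (1u << 11)        /* highest bit: 12 bits in all */
#define EX_KILL_MASK   ((1u << 12) - 1)  /* 0xFFF */

/* Compile-time reminder that the highest flag does not fit in 8 bits. */
_Static_assert(EX_KILL_BIT_11 > UINT8_MAX, "kill flags need more than 8 bits");

/* Store the flags in a 16-bit field; a uint8_t would lose bits 8-11. */
static uint16_t pack_kill_flags(uint32_t flags)
{
	return (uint16_t)(flags & EX_KILL_MASK);
}
```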
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
Follows pattern from c5ace562
-
Brian Christiansen authored
-