- 07 Mar, 2016 1 commit
-
-
Dominik Bartkiewicz authored
Added new job dependency type "aftercorr", which starts a task of a job array after the corresponding task of another job array completes. Bug 2460
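The per-task pairing described above can be pictured with a small toy model. This is only a sketch of the semantics, not Slurm code: the function name `runnable_tasks`, the state strings, and the assumption that a task counts as complete once its state is "COMPLETED" are all hypothetical.

```python
def runnable_tasks(upstream_states, downstream_ids):
    """Return downstream array task IDs whose matching upstream task is done.

    upstream_states maps an upstream task ID to a state string (hypothetical
    values); a downstream task is released only by the upstream task that
    shares its ID, not by the upstream array as a whole.
    """
    return [
        task_id
        for task_id in downstream_ids
        if upstream_states.get(task_id) == "COMPLETED"
    ]


# Task 1 of the upstream array is still running, so only tasks 0 and 2
# of the dependent array are released.
upstream = {0: "COMPLETED", 1: "RUNNING", 2: "COMPLETED"}
print(runnable_tasks(upstream, [0, 1, 2]))
```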
-
- 05 Mar, 2016 7 commits
-
-
Morris Jette authored
Fix some timing issues with respect to rebooting a node, especially a KNL node needing a reboot to change configuration.
-
Danny Auble authored
would only track gres/gpu; now it will track both gres/gpu and gres/gpu:tesla as separate GRES if configured like AccountingStorageTRES=gres/gpu,gres/gpu:tesla
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
--gres=gpu:tesla; before, you needed to give a count (--gres=gpu:tesla:1). Now both should work.
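One way to picture the optional count is a parser that defaults it to 1 when the last field is not a number. This is an illustrative sketch only: `parse_gres` is a hypothetical helper, not Slurm's parser, and it assumes a plain integer count with no unit suffix.

```python
def parse_gres(spec):
    """Split a GRES spec into (name, type, count), defaulting count to 1.

    Hypothetical helper for illustration; assumes the count, when present,
    is the last colon-separated field and is a plain integer.
    """
    parts = spec.split(":")
    count = 1
    if len(parts) > 1 and parts[-1].isdigit():
        count = int(parts.pop())
    name = parts[0]
    gres_type = parts[1] if len(parts) > 1 else None
    return name, gres_type, count


# With and without an explicit count, the two specs resolve the same way.
print(parse_gres("gpu:tesla"))
print(parse_gres("gpu:tesla:1"))
```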
-
- 04 Mar, 2016 9 commits
-
-
Danny Auble authored
-
Danny Auble authored
Step GRES value changed from type "int" to "int64_t" to support larger values. Signed-off-by: Danny Auble <da@schedmd.com>
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
These changes apply to both the main scheduling logic and the backfill scheduler. If some SchedulerParameters value was configured, slurmctld started, the value then completely removed, and slurmctld reconfigured, the value would not be reset to its default; the originally configured value would persist until slurmctld restarted.
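The general shape of such a fix is to start from the defaults on every (re)configuration rather than only overwriting the keys that are present. The sketch below is a simplified model, not the slurmctld code: `load_scheduler_params` and the `DEFAULTS` table (including its values) are assumptions made for illustration.

```python
# Illustrative defaults; the names mirror SchedulerParameters options but the
# table itself is an assumption of this sketch.
DEFAULTS = {"bf_interval": 30, "bf_window": 1440}


def load_scheduler_params(config_value):
    """Parse a comma-separated option string, resetting to defaults first."""
    params = dict(DEFAULTS)  # reset every option before applying the config
    for item in config_value.split(","):
        key, sep, value = item.strip().partition("=")
        if sep and key in params:
            params[key] = int(value)
    return params


# First configuration sets bf_interval; after the option is removed and the
# daemon is reconfigured, the value falls back to the default.
print(load_scheduler_params("bf_interval=60"))
print(load_scheduler_params(""))
```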
-
Brian Christiansen authored
Continuation of 31225a82
-
Morris Jette authored
Harden code to not fail if node_bitmap passed to _update_node_gres() has no bits set.
-
Brian Christiansen authored
-
Brian Christiansen authored
Bug 2430
-
- 03 Mar, 2016 14 commits
-
-
Morris Jette authored
This may be helpful for timing purposes. Added by Cray request.
-
Morris Jette authored
Unless a job is running in --multi-prog mode, modify the logic to resolve the job's path once rather than once for each task. This may slightly improve performance (requested by Cray).
-
Danny Auble authored
of its very close version.
-
Thomas Hamel authored
We want to introduce a new behavior in the way slurmd uses the HealthCheckProgram. The idea is to avoid a race condition between the first HealthCheckProgram run and the node accepting jobs. The slurmd daemon will initialize and then loop on HealthCheckProgram execution before registering with slurmctld. It will stay in this loop until the HealthCheckProgram returns successfully (the node is still DOWN). On our clusters we use NHC as the HealthCheckProgram. NHC drains the node if it fails and removes the drain if it is successful; this behavior fits our purpose well. This lets us start slurmd at boot without setting up a complex boot sequence in the init system: slurmd just waits for the node to be ready before registering. The HealthCheckProgram is not run during slurmd startup if HealthCheckInterval is 0.
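The startup loop described above could be sketched roughly as follows. This is a simplified model rather than the slurmd implementation: the function `wait_until_healthy`, its parameters, and the choice to sleep for the configured interval between attempts are assumptions of this sketch.

```python
import subprocess
import time


def wait_until_healthy(program, interval, run=None, sleep=time.sleep):
    """Loop on a health check until it exits 0; return the attempt count.

    An interval of 0 skips the check entirely, mirroring the rule that the
    program is not run at startup in that case. The `run` and `sleep` hooks
    exist only so the loop can be exercised without a real program.
    """
    if interval == 0:
        return 0
    if run is None:
        run = lambda: subprocess.run([program]).returncode
    attempts = 0
    while True:
        attempts += 1
        if run() == 0:
            return attempts  # healthy: the caller may now register
        sleep(interval)  # assumed pacing between attempts


# Simulated check that fails twice and then succeeds.
results = iter([1, 1, 0])
print(wait_until_healthy("health-check", 5, run=lambda: next(results),
                         sleep=lambda seconds: None))
```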
-
Danny Auble authored
-
Danny Auble authored
-
Brian Christiansen authored
Bug 2507
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Step GRES value changed from type "int" to "int64_t" to support larger values. Previous logic could fail for step allocation values exceeding 32 bits. Other GRES values are 64-bit.
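The failure mode can be reproduced outside Slurm by storing a large value in a 32-bit integer. The sketch below uses Python's `ctypes` fixed-width types as stand-ins and assumes the common case of a 32-bit C `int`; the helper names and the example value are hypothetical.

```python
import ctypes


def store_as_int32(value):
    """Store a value in a 32-bit signed integer, wrapping like a typical C int."""
    return ctypes.c_int32(value).value


def store_as_int64(value):
    """Store a value in a 64-bit signed integer (the int64_t case)."""
    return ctypes.c_int64(value).value


# A hypothetical GRES count larger than 2**31 - 1 is silently altered by the
# 32-bit store and preserved by the 64-bit one.
requested = 5_000_000_000
print(store_as_int32(requested), store_as_int64(requested))
```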
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Danny Auble authored
slurmstepd to close potential open ones. It was pointed out that slurmd using acct_gather_energy/ipmi links to freeipmi, which could open /dev/ipmi0 as root without the close-on-exec flag set while launching a step, leaving it open in the user's app. This sets the flag on the first 256 file descriptors to mitigate the concern. Reported by Maksym Planeta. Bug 2506
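The mitigation amounts to walking the low-numbered descriptors and setting FD_CLOEXEC on each one that is open. The following sketch shows the idea with Python's `fcntl` module; it is not the slurmstepd code, and the function name and the `first_fd` parameter (used here to leave standard streams alone in the demo) are assumptions.

```python
import fcntl
import os


def set_cloexec_on_low_fds(limit=256, first_fd=0):
    """Set close-on-exec on every open descriptor in [first_fd, limit).

    Returns the descriptors whose flag was changed. Descriptors that are not
    open raise OSError on F_GETFD and are skipped.
    """
    changed = []
    for fd in range(first_fd, limit):
        try:
            flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        except OSError:
            continue  # descriptor is not open
        if not flags & fcntl.FD_CLOEXEC:
            fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
            changed.append(fd)
    return changed


# Simulate a library leaving an inheritable descriptor open, then fix it.
read_fd, write_fd = os.pipe()
os.set_inheritable(read_fd, True)
set_cloexec_on_low_fds(first_fd=3)
print(os.get_inheritable(read_fd))
os.close(read_fd)
os.close(write_fd)
```

Setting the flag rather than closing the descriptors keeps them usable in the current process while ensuring they do not survive an exec into the user's application.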
-
Morris Jette authored
This enables the node_feature plugin to add GRES to nodes. Specifically it is intended for the node_feature/knl_cray plugin to build a GRES containing the MCDRAM size currently configured on the node. More work is needed for full functionality.
-
- 02 Mar, 2016 9 commits
-
-
Morris Jette authored
Make sure that capmc_suspend and capmc_resume are properly packaged in an RPM if a non-standard sbin location is configured
-
Danny Auble authored
bsub batch script or not. If it isn't, we will wrap the script to avoid issues where $0 is used inside the script.
-
Morris Jette authored
Conflicts: src/plugins/sched/backfill/backfill.c
-
Gary B Skouson authored
Previous logic tested whatever the job's partition pointer indicated rather than the partition we are trying to run the job in. This bug was introduced in Slurm version 15.08.5, Nov 16, 2015, commit 94f0e948. Bug 2499
-
Danny Auble authored
patch 2d5066e7
-
Morris Jette authored
Add a new function that can read power save configuration information before starting the power save thread. This lets us confirm earlier in the slurmctld startup logic that power save mode is configured to run, and report an error at an earlier point if power save is not configured to run but node_feature/knl_cray (which needs it) is configured.
-
Tim Wickberg authored
-
Thomas Cadeau authored
Introduced in c97e08a0. Change default CgroupMountpoint (in cgroup.conf) from "/cgroup" to "/sys/fs/cgroup" to match current standard. For details, see https://wiki.freedesktop.org/www/Software/systemd/PaxControlGroups/
-
Tim Wickberg authored
-