- 06 May, 2016 1 commit
-
-
Marco Ehlert authored
I would like to mention a problem which seems to be a calculation bug of used_cores in Slurm version 15.08.7. If a node is divided into 2 partitions using MaxCPUsPerNode, as in this slurm.conf configuration:

```
NodeName=n1 CPUs=20
PartitionName=cpu NodeName=n1 MaxCPUsPerNode=16
PartitionName=gpu NodeName=n1 MaxCPUsPerNode=4
```

I run into a strange scheduling situation. The situation occurs after a fresh restart of the slurmctld daemon, when I start jobs one by one.

Case 1:

```
systemctl restart slurmctld.service
sbatch -n 16 -p cpu cpu.sh
sbatch -n 1 -p gpu gpu.sh
sbatch -n 1 -p gpu gpu.sh
sbatch -n 1 -p gpu gpu.sh
sbatch -n 1 -p gpu gpu.sh
```

Problem: the gpu jobs are kept in PENDING state.

This picture changes if I start the jobs this way.

Case 2:

```
systemctl restart slurmctld.service
sbatch -n 1 -p gpu gpu.sh
scancel <gpu job_id>
sbatch -n 16 -p cpu cpu.sh
sbatch -n 1 -p gpu gpu.sh
sbatch -n 1 -p gpu gpu.sh
sbatch -n 1 -p gpu gpu.sh
sbatch -n 1 -p gpu gpu.sh
```

and all jobs run fine.

By looking into the code I figured out a wrong calculation of used_cores in the function _allocate_sc() in plugins/select/cons_res/job_test.c:

```c
for (c = core_begin; c < core_end; c++) {
	i = (uint16_t) (c - core_begin) / cores_per_socket;
	if (bit_test(core_map, c)) {
		free_cores[i]++;
		free_core_count++;
	} else {
		used_cores[i]++;
	}
	if (part_core_map && bit_test(part_core_map, c))
		used_cpu_array[i]++;
}
```

This code only works correctly if a part_core_map already exists for the partition, or on a completely free node. In case 1, however, no part_core_map has been created for the gpu partition yet. When a gpu job starts, core_map contains only the 4 cores left over from the cpu job, so all non-free cores of the cpu partition are counted as used cores in the gpu partition, and the condition in the subsequent code

```c
free_cpu_count + used_cpu_count > job_ptr->part_ptr->max_cpus_per_node
```

incorrectly matches, which is definitely wrong.

As soon as a part_core_map appears, meaning a gpu job was started on a free node (case 2), there is no problem at all. To make case 1 work I changed the above code to the following, and everything works fine:

```c
for (c = core_begin; c < core_end; c++) {
	i = (uint16_t) (c - core_begin) / cores_per_socket;
	if (bit_test(core_map, c)) {
		free_cores[i]++;
		free_core_count++;
	} else if (part_core_map && bit_test(part_core_map, c)) {
		used_cpu_array[i]++;
		used_cores[i]++;
	}
}
```

I am not sure this code change is really good, but it fixes my problem.
-
- 05 May, 2016 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. bug 2698
-
Danny Auble authored
they are in a step.
-
- 04 May, 2016 3 commits
-
-
Tim Wickberg authored
1) step_ptr->step_layout has already been dereferenced plenty of times. 2) Can't possibly have rpc_version >= MIN_PROTOCOL_VERSION and < 8, so this code is dead.
-
Morris Jette authored
Issue the "node_reinit" command on all nodes identified in a single call to capmc. Only if that fails will individual nodes be restarted using multiple pthreads. This improves efficiency while retaining the ability to operate on individual nodes when some failure occurs. bug 2659
-
Danny Auble authored
-
- 03 May, 2016 5 commits
-
-
Danny Auble authored
-
Brian Christiansen authored
E.g. info, debug, etc.
-
Brian Christiansen authored
-
Tim Wickberg authored
-
Eric Martin authored
-
- 29 Apr, 2016 6 commits
-
-
Danny Auble authored
Backport of commit cca1616b from 16.05
-
Tim Wickberg authored
The MCS plugin should not have been retroactively added to the 15.08 RPCs; doing so caused 'scontrol show config' from a 15.08 scontrol to a 16.05 slurmctld to fail.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Brian Christiansen authored
-
- 28 Apr, 2016 4 commits
-
-
Artem Polyakov authored
See bug 2672 for details
-
Tim Wickberg authored
-
Danny Auble authored
of Slurm.
-
Morris Jette authored
Use TaskPluginParam for default task binding if the user specified no CPU binding. The user's --cpu_bind option takes precedence over the default. No longer report an error if the user's --cpu_bind option does not match TaskPluginParam. bug 2655
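As an illustrative sketch of the resulting behavior (the specific values below are assumptions, not taken from this commit):

```
# slurm.conf: default binding granularity used when a job gives no --cpu_bind
TaskPlugin=task/affinity
TaskPluginParam=Cores

# A user-supplied binding overrides the default, and a mismatch with
# TaskPluginParam no longer produces an error:
#   srun --cpu_bind=threads ./a.out
```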
-
- 27 Apr, 2016 4 commits
-
-
Tim Wickberg authored
The compiler errors out, preventing these 13 from running, unless the implied int type for main is fixed.
-
Tim Wickberg authored
Do not use AC_CONFIG_FILES as this may not expand all variables at config time. Loosely based on recommendations from http://www.gnu.org/software/autoconf/manual/autoconf.html#Makefile-Substitutions Run autogen.sh to pick up changes as well. Bug 2247/2298.
-
Morris Jette authored
Prior logic only supported ntasks_per_core. bug 2655
-
Morris Jette authored
Avoid error message of "Requested cpu_bind option requires entire node to be allocated; disabling affinity" being generated in some cases where task/affinity and task/cgroup plugins used together.
-
- 26 Apr, 2016 4 commits
-
-
Danny Auble authored
restart of the slurmctld.
-
Sam Gallop authored
Otherwise the miscalculated limit will lead to job cancellation even when well inside the allocated amount. Bug 2660.
-
Brian Christiansen authored
Bug 2386
-
Brian Christiansen authored
Bug 2386
-
- 25 Apr, 2016 1 commit
-
-
Tim Wickberg authored
Also remove the misleading note "Unless PreemptType=preempt/partition_prio the partition Priority is not critical"; it does still impact scheduling when nodes overlap partitions.
-
- 23 Apr, 2016 1 commit
-
-
Tim Wickberg authored
in the slurmdbd segfaulting. Bug 2656
-
- 21 Apr, 2016 2 commits
-
-
Brian Christiansen authored
-
Morris Jette authored
burst_buffer/cray - Don't call Datawarp "paths" function if script includes only create or destroy of persistent burst buffer. Some versions of Datawarp software return an error for such scripts, causing the job to be held. bug 2624
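For illustration, a job script of the kind described, containing only a persistent-buffer create directive, might look like this (the buffer name and size are placeholders; the "#BB" directive syntax follows the burst_buffer/cray convention):

```
#!/bin/bash
#BB create_persistent name=mybb capacity=100GB access=striped type=scratch
exit 0
```

Such a script stages no data in or out, so there is nothing for the Datawarp "paths" function to report, and skipping the call avoids the spurious error that held the job.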
-
- 20 Apr, 2016 3 commits
-
-
Morris Jette authored
burst_buffer/cray - Don't call Datawarp "paths" function if script includes only create or destroy of persistent burst buffer. Some versions of Datawarp software return an error for such scripts, causing the job to be held. bug 2624
-
Janne Blomqvist authored
I noticed that the CpuFreqDef config option was only partially implemented. The value was parsed but then never used. So I took the liberty of re-purposing it to mean sort of the opposite, namely the frequency governor to use when running a job step in case the job doesn't explicitly provide any --cpu-freq option. I also changed the default of the CpuFreqGovernors option to be "ondemand,performance", since ondemand isn't available with the intel_pstate driver. Otherwise the patch should be relatively straightforward and only changes a few minor things here and there.
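A sketch of the corresponding slurm.conf settings under this change (the governor chosen for CpuFreqDef is illustrative):

```
# Governor applied to a job step when the user passes no --cpu-freq option
CpuFreqDef=Performance
# Allowed governors; ondemand,performance is the new default because the
# intel_pstate driver does not provide ondemand
CpuFreqGovernors=OnDemand,Performance
```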
-
Tim Wickberg authored
-
- 15 Apr, 2016 1 commit
-
-
Morris Jette authored
Add TopologyParam option of "TopoOptional" to optimize network topology only for jobs requesting it. bug 2567
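A sketch of how this might be configured (reading "jobs requesting it" as jobs that make an explicit topology request, e.g. via --switches; that interpretation is an assumption here):

```
# slurm.conf
TopologyPlugin=topology/tree
TopologyParam=TopoOptional
```

With this set, only jobs that ask for topology-aware placement (e.g. sbatch --switches=1 job.sh) pay the cost of the topology optimization pass; other jobs are scheduled without it.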
-
- 14 Apr, 2016 2 commits
-
-
Tim Wickberg authored
Time out stalled transfers and clean up the related data structures. Default to waiting five minutes since the last update. Hook onto the registration/ping message type to trigger cleanup in a minimally invasive manner. While here, restructure certain functions to use list_* functions rather than iterating on the structures.
-
Tim Wickberg authored
Otherwise --mail-type=ALL will send an unexpected stage_out message back to the user. Bug 2541.
-