Commit 33c48ac5 authored by Dorian Krause's avatar Dorian Krause Committed by Morris Jette
Browse files

Fix to GRES NoConsume logic

we came across the following error message in the slurmctld logs when
using non-consumable resources:

error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count
is 0

The error comes from _job_dealloc():

node_gres_data=0x7f8a18000b70, node_offset=0, gres_name=0x1999e00
"potion", job_id=46,
    node_name=0x1987ab0 "node1") at gres.c:3980
(job_gres_list=0x199b7c0, node_gres_list=0x199bc38, node_offset=0,
job_id=46,
    node_name=0x1987ab0 "node1") at gres.c:4190
job_ptr=0x19e9d50, pre_err=0x7f8a31353cb0 "_will_run_test", remove_all=true)
    at select_linear.c:2091
bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, max_share=1, req_nodes=1,
    preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40) at
select_linear.c:3176
bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2,
    preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40,
exc_core_bitmap=0x0) at select_linear.c:3390
bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2,
    preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40,
exc_core_bitmap=0x0) at node_select.c:588
avail_bitmap=0x7f8a2f910d38, min_nodes=1, max_nodes=1, req_nodes=1,
exc_core_bitmap=0x0)
    at backfill.c:367

The cause of this problem is that _node_state_dup() in gres.c does not
duplicate the no_consume flag.
The cr_ptr passed to _rm_job_from_nodes() is created with _dup_cr()
which calls _node_state_dup().

Below is a simple patch to fix the problem. A "future-proof" alternative
might be to memcpy() from gres_ptr to new_gres and
only handle pointers separately.
parent 408c108e
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment