Dorian Krause authored
We came across the following error message in the slurmctld logs when using non-consumable resources:

    error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count is 0

The error comes from _job_dealloc():

    node_gres_data=0x7f8a18000b70, node_offset=0, gres_name=0x1999e00 "potion", job_id=46, node_name=0x1987ab0 "node1") at gres.c:3980
    (job_gres_list=0x199b7c0, node_gres_list=0x199bc38, node_offset=0, job_id=46, node_name=0x1987ab0 "node1") at gres.c:4190
    job_ptr=0x19e9d50, pre_err=0x7f8a31353cb0 "_will_run_test", remove_all=true) at select_linear.c:2091
    bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, max_share=1, req_nodes=1, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40) at select_linear.c:3176
    bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at select_linear.c:3390
    bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at node_select.c:588
    avail_bitmap=0x7f8a2f910d38, min_nodes=1, max_nodes=1, req_nodes=1, exc_core_bitmap=0x0) at backfill.c:367

The cause of this problem is that _node_state_dup() in gres.c does not duplicate the no_consume flag. The cr_ptr passed to _rm_job_from_nodes() is created with _dup_cr(), which calls _node_state_dup(), so the copy treats the non-consumable GRES as consumable and the dealloc finds an allocated count of 0.

Below is a simple patch to fix the problem. A "future-proof" alternative might be to memcpy() from gres_ptr to new_gres and handle only the pointer members separately.
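The difference between a field-by-field duplicate and the memcpy()-based alternative can be sketched in C. This is a minimal illustration under stated assumptions: the struct node_state_t, its members (cnt_alloc, no_consume, topo_cnt, topo_len) and the helpers dup_by_field(), dup_by_memcpy(), dealloc() and free_state() are all hypothetical simplifications, not Slurm's real gres.c types or functions. The dealloc() check is only modelled loosely on the "count is 0" log message.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified node GRES state; NOT Slurm's real struct. */
typedef struct {
    uint64_t cnt_avail;  /* GRES count configured on the node */
    uint64_t cnt_alloc;  /* GRES count currently allocated to jobs */
    bool no_consume;     /* true: allocations never consume the count */
    uint64_t *topo_cnt;  /* heap-owned array that needs a deep copy */
    size_t topo_len;
} node_state_t;

/* Field-by-field duplicate: every scalar must be listed by hand, so a
 * forgotten member (here no_consume) silently stays false via calloc(). */
static node_state_t *dup_by_field(const node_state_t *src) {
    node_state_t *dst = calloc(1, sizeof(*dst));
    if (!dst) return NULL;
    dst->cnt_avail = src->cnt_avail;
    dst->cnt_alloc = src->cnt_alloc;
    /* no_consume not copied: this mirrors the bug; topo_cnt left NULL */
    return dst;
}

/* memcpy()-based duplicate: copy every member at once, then replace the
 * pointer members with deep copies so the two records share no memory. */
static node_state_t *dup_by_memcpy(const node_state_t *src) {
    node_state_t *dst = malloc(sizeof(*dst));
    if (!dst) return NULL;
    memcpy(dst, src, sizeof(*dst));  /* includes no_consume */
    dst->topo_cnt = NULL;
    if (src->topo_cnt && src->topo_len) {
        size_t bytes = src->topo_len * sizeof(*src->topo_cnt);
        dst->topo_cnt = malloc(bytes);
        if (!dst->topo_cnt) { free(dst); return NULL; }
        memcpy(dst->topo_cnt, src->topo_cnt, bytes);
    }
    return dst;
}

/* Hypothetical dealloc check: non-consumable GRES were never counted,
 * so there is nothing to release; otherwise the count must cover cnt.
 * Returns 0 on success, -1 on the "bad count" error path. */
static int dealloc(node_state_t *ns, uint64_t cnt) {
    if (ns->no_consume)
        return 0;
    if (ns->cnt_alloc < cnt) {
        fprintf(stderr, "error: dealloc bad count, count is %llu\n",
                (unsigned long long)ns->cnt_alloc);
        return -1;
    }
    ns->cnt_alloc -= cnt;
    return 0;
}

/* Release a record produced by either duplicate function. */
static void free_state(node_state_t *ns) {
    if (ns) {
        free(ns->topo_cnt);
        free(ns);
    }
}
```

Duplicating a record with no_consume = true and cnt_alloc = 0 shows the failure mode: dealloc() on the dup_by_field() copy takes the error path because the flag was lost, while the dup_by_memcpy() copy keeps the flag and returns 0. The trade-off of the memcpy() approach is that any newly added pointer member still has to be deep-copied by hand, but newly added scalars such as flags are carried over automatically.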
33c48ac5