- 06 Jun, 2018 14 commits
-
Morris Jette authored
Coverity CID 186480
-
Morris Jette authored
Coverity CID 186420
-
Morris Jette authored
Bug was introduced in commit 074ef9a7; it could result in an infinite loop given some (many) job scripts.
-
Morris Jette authored
burst_buffer.conf - Add SetExecHost flag to enable burst buffer access from the login node for interactive jobs.
-
Morris Jette authored
Coverity CID 186419 and 196422
-
Alejandro Sanchez authored
Also remove the initialization before all the calls to the function. The change is non-functional; the motivation is preventive, so that if we ever use slurm_mktime() we know tm_isdst is consistently set to -1. Bug 5230.
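The tm_isdst = -1 convention the commit standardizes on is plain C library behavior, not anything Slurm-specific. A minimal illustration (the helper name is hypothetical, not slurm_mktime() itself):

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Build a time_t from broken-down local time.  Setting tm_isdst to -1
 * tells mktime() to determine daylight-saving status itself, rather
 * than trusting a possibly stale flag left in the struct. */
time_t local_epoch(int year, int mon, int day, int hour, int min, int sec)
{
    struct tm tm;
    memset(&tm, 0, sizeof(tm));
    tm.tm_year  = year - 1900;
    tm.tm_mon   = mon - 1;
    tm.tm_mday  = day;
    tm.tm_hour  = hour;
    tm.tm_min   = min;
    tm.tm_sec   = sec;
    tm.tm_isdst = -1;   /* let the C library decide DST */
    return mktime(&tm);
}
```

With tm_isdst left at 0 or 1, a timestamp that falls on the wrong side of a DST transition can come out an hour off; -1 sidesteps that class of bug consistently.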
-
Dominik Bartkiewicz authored
Bug 4887.
-
Dominik Bartkiewicz authored
Disable setting triggers from non-root/slurm_user by default. Bug 4887.
-
Dominik Bartkiewicz authored
-
Brian Christiansen authored
-
Morris Jette authored
Coverity CID 186420
-
Morris Jette authored
Coverity CID 186423
-
Brian Christiansen authored
which were marked down due to ResumeTimeout. If a cloud node was marked down for not responding by ResumeTimeout, the code inadvertently added the node back to avail_node_bitmap after it had been cleared by set_node_down_ptr(). The scheduler would then attempt to allocate the node again, causing a loop of hitting ResumeTimeout and allocating the downed node again. Bug 5264.
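The loop described above boils down to a cleared availability bit being set again right after it was cleared. A simplified sketch with a plain bitmask (the helper names are hypothetical stand-ins, not the actual slurmctld code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for slurmctld's node availability bookkeeping:
 * one bit per node, set = schedulable. */
static uint64_t avail_mask;

static void mark_node_down(int node)  { avail_mask &= ~(1ULL << node); }
static void mark_node_avail(int node) { avail_mask |=  (1ULL << node); }
static bool node_avail(int node)      { return avail_mask & (1ULL << node); }

/* Buggy resume-timeout path: unconditionally restores availability,
 * clobbering the "down" state just recorded for the unresponsive node,
 * so the scheduler allocates it again and the cycle repeats. */
static void resume_timeout_buggy(int node)
{
    mark_node_down(node);   /* node failed to respond by ResumeTimeout */
    mark_node_avail(node);  /* BUG: node looks allocatable again */
}

/* Fixed path: a node marked down stays out of the available set. */
static void resume_timeout_fixed(int node)
{
    mark_node_down(node);
}
```

The fix is simply not to re-add a node to the available set after the down-marking path has removed it.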
-
Morris Jette authored
Coverity CID 186424
-
- 05 Jun, 2018 5 commits
-
Morris Jette authored
Without this change, slurmctld would abort running test38.1:
slurmctld: error: /home/jette/Desktop/SLURM/slurm.git/src/slurmctld/job_mgr.c:11177: get_next_job_id(): Assertion (verify_lock(FED_LOCK, READ_LOCK)) failed.
Aborted (core dumped)
-
Morris Jette authored
slurmctld would abort if started while slurmdbd was down and slurmdbd was later started:
slurmctld: (node_scheduler.c:3182) job:14683 gres_req:NONE gres_alloc:
slurmctld: (node_scheduler.c:2868) job:14683 gres:NONE gres_alloc:
slurmctld: sched: Allocate JobID=14683 NodeList=nid00001 #CPUs=1 Partition=debug
slurmctld: error: slurmdbd: Sending PersistInit msg: Connection refused
slurmctld: error: slurmdbd: DBD_SEND_MULT_JOB_START failure: Connection refused
slurmctld: error: /home/jette/Desktop/SLURM/slurm.git/src/slurmctld/controller.c:2456: set_cluster_tres(): Assertion (verify_lock(NODE_LOCK, WRITE_LOCK)) failed.
==29635==
==29635== Process terminating with default action of signal 6 (SIGABRT): dumping core
==29635==    at 0x54980BB: raise (raise.c:51)
==29635==    by 0x5499F5C: abort (abort.c:90)
==29635==    by 0x4FCBFC0: __xassert_failed (xassert.c:57)
==29635==    by 0x131FC5: set_cluster_tres (controller.c:2456)
==29635==    by 0x1329B4: _assoc_cache_mgr (controller.c:3230)
==29635==    by 0x52497FB: start_thread (pthread_create.c:465)
==29635==    by 0x5575B5E: clone (clone.S:95)
-
Tim Wickberg authored
Bug 5180.
-
Killian authored
Bug 5206.
-
Tim Wickberg authored
And get it sent all the way into the slurmstepd. Bug 3547.
-
- 04 Jun, 2018 21 commits
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
No change in logic
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
I was seeing rare failures on the test due to timing issues. This increased timeout seems to fix the issue for me.
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
These calls to select_g_select_jobinfo_get() are a no-op on non-select/bluegene systems, so remove them. Since we're no longer modifying the job_desc through that call, remove the surrounding debug3 statements as well. The rest is removal of HAVE_BG blocks. Also collapse the one else-if that was broken across an ifndef block.
-
Tim Wickberg authored
Remove the temporary variables and directly reference the structure values while here. Removes another select_g_select_jobinfo_get() block.
-
Tim Wickberg authored
Continue to remove select_g_select_jobinfo_get() calls.
-
Tim Wickberg authored
Continue removing select_g_alter_node_cnt() calls. The err_cpus return for:
select_g_select_nodeinfo_get(node_ptr->select_nodeinfo, SELECT_NODEDATA_SUBCNT, NODE_STATE_ERROR, &err_cpus);
is always zero for non-select/bluegene plugins, so this whole function can be drastically simplified. Move the node_ptr->name check up front. Invert the remaining logic, which makes it clear that any nodes in DRAIN/FAIL/DOWN are what trigger the clusteracct_storage_g_node_down call.
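The inverted logic described above can be sketched as follows. This is a hypothetical simplification with stand-in names, not the actual slurmctld function: the name check comes first, and only DRAIN/FAIL/DOWN states return true (err_cpus is gone entirely, since it is always zero without select/bluegene):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node states standing in for Slurm's node flags. */
enum node_state { NODE_OK, NODE_DRAIN, NODE_FAIL, NODE_DOWN };

/* Decide whether accounting should record this node as down.
 * The node_ptr->name check is moved up front; with err_cpus known to
 * be zero, the remaining condition is just the state test. */
static bool should_record_node_down(const char *name, enum node_state s)
{
    if (!name)
        return false;
    return s == NODE_DRAIN || s == NODE_FAIL || s == NODE_DOWN;
}
```

Inverting the condition this way turns a nest of negated checks into a single positive statement of which states matter.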
-
Tim Wickberg authored
SELECT_GET_NODE_SCALING is always 1 for non-select/bluegene plugins, so pack it directly. In a future RPC layer this should be removed once the client commands have been updated as well.
-
Tim Wickberg authored
SELECT_APPLY_NODE_MAX_OFFSET is a no-op on non-bluegene plugins, so we can now just pack total_nodes directly. Similarly, update_part does not need to translate a min node count into a midplane count anymore.
-
Tim Wickberg authored
Just strip out all HAVE_BG code here to remove the select_g_alter_node_cnt() calls buried within.
-
Tim Wickberg authored
select_p_alter_node_cnt() is a no-op here for every non-bluegene select plugin.
-
Tim Wickberg authored
This is always zero on non-bluegene select plugins:
select_g_select_nodeinfo_get(node_ptr->select_nodeinfo, SELECT_NODEDATA_SUBCNT, NODE_STATE_ERROR, &err_cpus);
So all of this code can be collapsed down to three lines.
-