- 06 Jun, 2018 3 commits
-
-
Morris Jette authored
Coverity CID 186420
-
Morris Jette authored
Coverity CID 186423
-
Morris Jette authored
Coverity CID 186424
-
- 05 Jun, 2018 3 commits
-
-
Morris Jette authored
Without change, slurmctld would abort running test38.1:
slurmctld: error: /home/jette/Desktop/SLURM/slurm.git/src/slurmctld/job_mgr.c:11177: get_next_job_id(): Assertion (verify_lock(FED_LOCK, READ_LOCK)) failed.
Aborted (core dumped)
-
Morris Jette authored
slurmctld would abort if it was started while slurmdbd was down and slurmdbd was then started later.
slurmctld: (node_scheduler.c:3182) job:14683 gres_req:NONE gres_alloc:
slurmctld: (node_scheduler.c:2868) job:14683 gres:NONE gres_alloc:
slurmctld: sched: Allocate JobID=14683 NodeList=nid00001 #CPUs=1 Partition=debug
slurmctld: error: slurmdbd: Sending PersistInit msg: Connection refused
slurmctld: error: slurmdbd: DBD_SEND_MULT_JOB_START failure: Connection refused
slurmctld: error: /home/jette/Desktop/SLURM/slurm.git/src/slurmctld/controller.c:2456: set_cluster_tres(): Assertion (verify_lock(NODE_LOCK, WRITE_LOCK)) failed.
==29635==
==29635== Process terminating with default action of signal 6 (SIGABRT): dumping core
==29635== at 0x54980BB: raise (raise.c:51)
==29635== by 0x5499F5C: abort (abort.c:90)
==29635== by 0x4FCBFC0: __xassert_failed (xassert.c:57)
==29635== by 0x131FC5: set_cluster_tres (controller.c:2456)
==29635== by 0x1329B4: _assoc_cache_mgr (controller.c:3230)
==29635== by 0x52497FB: start_thread (pthread_create.c:465)
==29635== by 0x5575B5E: clone (clone.S:95)
-
Tim Wickberg authored
And get it sent all the way into the slurmstepd. Bug 3547.
-
- 04 Jun, 2018 22 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
No change in logic
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
I was seeing rare failures on the test due to timing issues. This increased timeout seems to fix the issue for me.
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
These calls to select_g_select_jobinfo_get() are a no-op on non-select/bluegene systems, so remove them. Since we're no longer modifying the job_desc through that call, remove the debug3 statements around this as well. The rest is removal of HAVE_BG blocks. Collapse the one else if that was broken across an ifndef block as well here.
-
Tim Wickberg authored
Remove the temporary variables and directly reference the structure values while here. Removes another select_g_select_jobinfo_get() block.
-
Tim Wickberg authored
Continue to remove select_g_select_jobinfo_get() calls.
-
Tim Wickberg authored
Continue removing select_g_alter_node_cnt() calls. The err_cpus return value for:
select_g_select_nodeinfo_get(node_ptr->select_nodeinfo, SELECT_NODEDATA_SUBCNT, NODE_STATE_ERROR, &err_cpus);
is always zero for non-select/bluegene plugins, so this whole function can be drastically simplified. Move the node_ptr->name check up front. Invert the remaining logic, which makes it clear that any nodes in DRAIN/FAIL/DOWN are what will trigger the clusteracct_storage_g_node_down call.
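The simplified check described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from the commit: the `sketch_node` struct, the `SKETCH_STATE_*` flags, and `sketch_should_report_down()` are hypothetical stand-ins, and the real function works on Slurm's own node records and state macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state flags for illustration; not Slurm's NODE_STATE_* values. */
enum {
    SKETCH_STATE_DOWN  = 1 << 0,
    SKETCH_STATE_DRAIN = 1 << 1,
    SKETCH_STATE_FAIL  = 1 << 2
};

/* Hypothetical, minimal stand-in for a node record. */
struct sketch_node {
    const char *name;   /* NULL or empty for an unused record */
    unsigned state;     /* bitwise OR of SKETCH_STATE_* flags */
};

/*
 * Return true when the node should be reported as down to accounting.
 * With the error-CPU count always zero, only the node state matters.
 */
static bool sketch_should_report_down(const struct sketch_node *node)
{
    assert(node != NULL);

    /* Name check up front: skip unused records immediately. */
    if (node->name == NULL || node->name[0] == '\0')
        return false;

    /* Inverted logic: any of DRAIN/FAIL/DOWN triggers the report. */
    return (node->state & (SKETCH_STATE_DOWN |
                           SKETCH_STATE_DRAIN |
                           SKETCH_STATE_FAIL)) != 0;
}
```

In the real code, a true result would correspond to the point where clusteracct_storage_g_node_down is called.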
-
Tim Wickberg authored
SELECT_GET_NODE_SCALING is always 1 for non-select/bluegene plugins. Pack it directly. On a future RPC layer this should be removed once the client commands have been updated as well.
-
Tim Wickberg authored
SELECT_APPLY_NODE_MAX_OFFSET is a no-op on non-bluegene plugins, so we can now just pack total_nodes directly. Similarly, update_part does not need to translate a min node count into a midplane count anymore.
-
Tim Wickberg authored
Just strip out all HAVE_BG code here to remove the select_g_alter_node_cnt() calls buried within.
-
Tim Wickberg authored
select_p_alter_node_cnt() is a no-op here for every non-bluegene select plugin.
-
Tim Wickberg authored
This is always zero on non-bluegene select plugins: select_g_select_nodeinfo_get(node_ptr->select_nodeinfo, SELECT_NODEDATA_SUBCNT, NODE_STATE_ERROR, &err_cpus); So all of this code can be collapsed down to three lines.
-
Tim Wickberg authored
Only used by select/bluegene, which is being removed. Still need to remove all calling paths into select_g_alter_node_cnt, so leave stubbed out for now.
-
- 02 Jun, 2018 4 commits
-
-
Brian Christiansen authored
srun would not return an exit code if a previous task exited before a later task exited with a signal. If multiple tasks exit with a signal, srun returns the highest signal. Partially reverts commit 04b449e1 -- the setting of local_global_rc to NO_VAL, as srun doesn't need to know whether it's been set or not anymore. srun always sets the signal if a task exited with a signal. Bug 5083.
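The behavior described above, where the result does not depend on the order in which tasks exit, can be sketched as below. This is a hedged illustration, not srun's implementation: the `sketch_task_exit` and `sketch_job_rc` types and `sketch_aggregate()` are hypothetical names, and keeping the highest exit code among normally exiting tasks is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task result; not a Slurm data structure. */
struct sketch_task_exit {
    int exit_code;    /* meaningful only when term_signal == 0 */
    int term_signal;  /* 0 if the task exited normally */
};

/* Hypothetical aggregate result for the whole step. */
struct sketch_job_rc {
    int exit_code;    /* highest exit code seen (assumption) */
    int term_signal;  /* highest terminating signal seen, 0 if none */
};

/*
 * Fold all task results into one value. The signal is always recorded
 * when a task was signaled, regardless of which tasks exited earlier.
 */
static struct sketch_job_rc
sketch_aggregate(const struct sketch_task_exit *tasks, size_t n)
{
    struct sketch_job_rc rc = {0, 0};
    size_t i;

    assert(tasks != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (tasks[i].term_signal > 0) {
            /* Keep the highest signal across all signaled tasks. */
            if (tasks[i].term_signal > rc.term_signal)
                rc.term_signal = tasks[i].term_signal;
        } else if (tasks[i].exit_code > rc.exit_code) {
            rc.exit_code = tasks[i].exit_code;
        }
    }
    return rc;
}
```

A caller would check `term_signal` first and fall back to `exit_code` only when no task was signaled.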
-
Brian Christiansen authored
-
Michael Hinton authored
-
Michael Hinton authored
NONE was not documented. Bug 5161
-
- 01 Jun, 2018 4 commits
-
-
Morris Jette authored
Change some of the "cpu" references to "core" references as appropriate. More places requiring change exist, but this is a start to the needed changes. Remove unused cpu_count argument from gres_plugin_job_test(). Remove unused cpu_cnt parameter from gres_plugin_step_alloc().
-
Morris Jette authored
-
Morris Jette authored
Avoid left-over test input file
-
Morris Jette authored
Change "cons_res" references to "cons_tres"
-
- 31 May, 2018 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Tim Wickberg authored
-
Tim Wickberg authored
src/plugins/ and src/common/assoc_mgr.c are still left to convert.
-