- 07 Jun, 2018 24 commits
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Unused, but missed during the prior cleanup since it is split across files.
-
Tim Wickberg authored
Both are always one (on non-BG).
-
Tim Wickberg authored
-
Tim Wickberg authored
Always one (non-BG), so just change these to the increment operator.
-
Tim Wickberg authored
-
Tim Wickberg authored
SELECT_NODEDATA_SUBGRP_SIZE is always zero on non-BG, so all calling paths into this can be removed.
-
Tim Wickberg authored
-
Tim Wickberg authored
Removing the CLUSTER_FLAG_BG block forces some code up a tab stop, making this look a bit uglier than intended. Also, remove the error_cpus values, as those only applied to BG.
-
Tim Wickberg authored
NODE_STATE_ERROR is only valid on a BlueGene, so make this a no-op for now.
-
Tim Wickberg authored
As done previously, this can be removed as it always returns zero:
select_g_select_nodeinfo_get(node_ptr->select_nodeinfo, SELECT_NODEDATA_SUBCNT, NODE_STATE_ERROR, &err_cpus);
Simplify the resulting code, and remove CLUSTER_FLAG_BG code as well.
-
Tim Wickberg authored
No functional changes, but do clean up some style in these updated blocks while copying.
-
Tim Wickberg authored
No changes yet.
-
Tim Wickberg authored
Helps avoid pointer soup like &((*msg)->node_scaling). No functional change.
-
Tim Wickberg authored
Mention in RELEASE_NOTES in the appropriate sections. Tidy up function declarations and formatting while here.
-
Tim Wickberg authored
This is always 0 on non-select/bluegene. Remove other BlueGene-specific output from this block while here. This is all leading up to removal of the node_scaling concept from Slurm's internals.
-
Tim Wickberg authored
-
Tim Wickberg authored
Make it easier to use with designated initializers in the future.
-
Brian Christiansen authored
If an array task exited with a RequeueExit* code, the array job wasn't being marked as requeued. Bug 5105
-
Isaac Hartung authored
if any task in the array was requeued. This is a hint to use "sacct --duplicates" to see the whole picture of the array job. Bug 5105
-
Michael Hinton authored
Bug 5163
-
Michael Hinton authored
Bug 5163
-
- 06 Jun, 2018 14 commits
-
Morris Jette authored
Coverity CID 186480
-
Morris Jette authored
Coverity CID 186420
-
Morris Jette authored
Bug was introduced in commit 074ef9a7. The bug could result in an infinite loop given some (many) job scripts.
-
Morris Jette authored
burst_buffer.conf - Add SetExecHost flag to enable burst buffer access from the login node for interactive jobs.
-
Morris Jette authored
Coverity CID 186419 and 196422
-
Alejandro Sanchez authored
And remove the initialization before all the calls to the function. This is non-functional; the motivation is preventive, so that if we ever use slurm_mktime() we know tm_isdst is consistently set to -1. Bug 5230.
-
Dominik Bartkiewicz authored
Bug 4887.
-
Dominik Bartkiewicz authored
Disable setting triggers from non-root/slurm_user by default. Bug 4887.
-
Dominik Bartkiewicz authored
-
Brian Christiansen authored
-
Morris Jette authored
Coverity CID 186420
-
Morris Jette authored
Coverity CID 186423
-
Brian Christiansen authored
which were marked down due to ResumeTimeout. If a cloud node was marked down due to not responding by ResumeTimeout, the code inadvertently added the node back to the avail_node_bitmap -- after being cleared by set_node_down_ptr(). The scheduler would then attempt to allocate the node again, which would cause a loop of hitting ResumeTimeout and allocating the downed node again. Bug 5264
-
Morris Jette authored
Coverity CID 186424
-
- 05 Jun, 2018 2 commits
-
Morris Jette authored
Without this change, slurmctld would abort when running test38.1:

slurmctld: error: /home/jette/Desktop/SLURM/slurm.git/src/slurmctld/job_mgr.c:11177: get_next_job_id(): Assertion (verify_lock(FED_LOCK, READ_LOCK)) failed.
Aborted (core dumped)
-
Morris Jette authored
slurmctld would abort if started while slurmdbd was down and slurmdbd was later started:

slurmctld: (node_scheduler.c:3182) job:14683 gres_req:NONE gres_alloc:
slurmctld: (node_scheduler.c:2868) job:14683 gres:NONE gres_alloc:
slurmctld: sched: Allocate JobID=14683 NodeList=nid00001 #CPUs=1 Partition=debug
slurmctld: error: slurmdbd: Sending PersistInit msg: Connection refused
slurmctld: error: slurmdbd: DBD_SEND_MULT_JOB_START failure: Connection refused
slurmctld: error: /home/jette/Desktop/SLURM/slurm.git/src/slurmctld/controller.c:2456: set_cluster_tres(): Assertion (verify_lock(NODE_LOCK, WRITE_LOCK)) failed.
==29635==
==29635== Process terminating with default action of signal 6 (SIGABRT): dumping core
==29635==    at 0x54980BB: raise (raise.c:51)
==29635==    by 0x5499F5C: abort (abort.c:90)
==29635==    by 0x4FCBFC0: __xassert_failed (xassert.c:57)
==29635==    by 0x131FC5: set_cluster_tres (controller.c:2456)
==29635==    by 0x1329B4: _assoc_cache_mgr (controller.c:3230)
==29635==    by 0x52497FB: start_thread (pthread_create.c:465)
==29635==    by 0x5575B5E: clone (clone.S:95)
-