- 01 Aug, 2017 3 commits
-
-
Dominik Bartkiewicz authored
NULL is returned if the token is not found, so testing against '\0' is wrong (although it does work in older compilers). Fixes a new GCC 7.1 warning.
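A minimal sketch of the pattern this message describes. The commit does not name the lookup function involved, so `strstr()` is used here as an assumed stand-in, and `has_token()` is a hypothetical helper rather than anything from the Slurm source.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Slurm code): return 1 if `token` occurs in
 * `opts`, 0 otherwise.
 */
int has_token(const char *opts, const char *token)
{
    const char *hit = strstr(opts, token);

    /*
     * Misleading form: `if (hit != '\0')`.
     * '\0' is a character constant with value 0, so the comparison
     * compiles, but it reads like a test of the first character and
     * GCC 7 warns about it. Compare the pointer against NULL instead.
     */
    return hit != NULL;
}
```

For example, `has_token("a,socket=/tmp/x", "socket=")` finds the token, while `has_token("a,b", "socket=")` does not.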
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
When the node isn't actually rebooted, BootTime isn't updated and Slurm doesn't consider the node returned to service, even if slurmd is responding. Bug 4039
-
- 31 Jul, 2017 1 commit
-
-
Tim Shaw authored
This will be fixed before 17.11, but is being left as-is on 17.02. Bug 3956.
-
- 28 Jul, 2017 6 commits
-
-
Danny Auble authored
to have 'socket=' in AuthInfo to work. This is to make it so people don't have to update their slurmdbd.conf's when upgrading (and to match documentation). Continuation of last commit Bug 4009
-
Danny Auble authored
connection. Bug 4009
-
Morris Jette authored
-
Alejandro Sanchez authored
jobcomp/elasticsearch saves its state to, and loads it from, elasticsearch_state. Since the jobcomp API isn't designed with save/load state operations, the plugin's _save_state() isn't extern and isn't available outside the plugin itself, so it is tightly coupled to the fini() function. This state doesn't follow the same execution path as the rest of Slurm's state files, which are all scheduled independently in save_all_state(). So we save it manually on an RPC of type REQUEST_CONTROL. That way, when the Primary ctld issues a REQUEST_CONTROL to a Backup that is currently in controller mode, the Backup saves the state, and when the Primary assumes control again it can process the saved pending jobs. The opposite direction was already handled: when the Primary is running in controller mode and the Backup issues a REQUEST_CONTROL, the Primary is shut down, and breaking out of the while(1) loop in the ctld main() function already reached a g_slurm_jobcomp_fini() call. Bug 3908
-
Dominik Bartkiewicz authored
-
Dominik Bartkiewicz authored
Bug 3973.
-
- 27 Jul, 2017 4 commits
-
-
Morris Jette authored
Prevent the possibility (never observed) of the ping count going negative
-
Morris Jette authored
-
Alejandro Sanchez authored
When more than one ping cycle is spawned simultaneously (for instance REQUEST_PING + REQUEST_NODE_REGISTRATION_STATUS for the selected nodes), we do not track a separate ping_start time for each cycle. When ping_begin() is called, the information about the previous ping cycle is lost. Then, when ping_end() is called for the first of the two cycles, we set ping_start=0, and that zero is wrongly used to check whether the last cycle ran for more than PING_TIMEOUT seconds (100s), incorrectly triggering: error("Node ping apparently hung, many nodes may be DOWN or configured " "SlurmdTimeout should be increased"); Bug 3914
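One way to avoid this failure mode is sketched below, together with the guard against a negative ping count mentioned in the neighbouring commit. This is an illustration under stated assumptions, not the patch itself: every `sketch_` name is hypothetical, the clock is passed in as a parameter so the logic can be exercised directly, and the locking a multi-threaded daemon would need is left out.

```c
#include <stdbool.h>
#include <time.h>

#define SKETCH_PING_TIMEOUT 100 /* seconds, the value quoted in the message */

static int sketch_ping_count = 0;    /* ping cycles currently in flight */
static time_t sketch_ping_start = 0; /* start of the oldest in-flight cycle */

/* A cycle starts: record the start time only if no cycle is in flight. */
void sketch_ping_begin(time_t now)
{
    if (sketch_ping_count == 0)
        sketch_ping_start = now;
    sketch_ping_count++;
}

/*
 * A cycle ends: never let the count go negative, and clear the start
 * time only once the last outstanding cycle has finished.
 */
void sketch_ping_end(void)
{
    if (sketch_ping_count > 0)
        sketch_ping_count--;
    if (sketch_ping_count == 0)
        sketch_ping_start = 0;
}

/* True only if a cycle is in flight and has exceeded the timeout. */
bool sketch_ping_hung(time_t now)
{
    return sketch_ping_count > 0 && sketch_ping_start != 0 &&
           difftime(now, sketch_ping_start) >= SKETCH_PING_TIMEOUT;
}
```

With two overlapping cycles, ending the first one no longer zeroes the start time, so the second cycle is still measured from a real timestamp and not from 0.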
-
Tim Shaw authored
Bug 3941.
-
- 26 Jul, 2017 14 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
44e2b1a9 and where the functionality was promptly removed a few days later in 51076871. Thanks to Artem Polyakov for pointing out the history :).
-
Isaac Hartung authored
-- Add slurm.conf configuration parameters SlurmctldSyslogDebug and SlurmdSyslogDebug to control which messages from the slurmctld and slurmd daemons get written to syslog. -- Add slurmdbd.conf configuration parameter DebugLevelSyslog to control which messages from the slurmdbd daemon get written to syslog. Bug 3933
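As an illustration of how these parameters might be set, here is a configuration fragment. The level `error` is an assumption chosen for the example; the accepted level names should be checked against the slurm.conf and slurmdbd.conf man pages for the installed version.

```
# slurm.conf (illustrative value)
SlurmctldSyslogDebug=error
SlurmdSyslogDebug=error

# slurmdbd.conf (illustrative value)
DebugLevelSyslog=error
```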
-
Danny Auble authored
ee84f5c1.
-
Danny Auble authored
like slurm_switch_ops_t instead of the new dynamic_plugin_data_t. Bug 4025
-
Morris Jette authored
Reported by Coverity, CID 45140 and 45141
-
Morris Jette authored
-
Morris Jette authored
pack_job_id was being reported as pack_job_offset
-
Tim Wickberg authored
-
Danny Auble authored
-
Danny Auble authored
Get rid of any race conditions and call anything that was in _pre_task_privileged from the parent instead of the child. NOTE: This should be safe, as we don't execute the task until after _exec_wait_child_wait_for_parent is signaled, which happens after all of this is long over.
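The ordering guarantee the NOTE relies on can be sketched with a pipe: the child blocks until the parent has finished its setup and signals it. This is a generic POSIX illustration and an assumption about the mechanism, not the slurmstepd implementation; `sketch_launch_after_parent_setup()` is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sketch: fork a child that waits for the parent's signal
 * before doing any work. Returns the child's exit status (0 on
 * success) or -1 on error.
 */
int sketch_launch_after_parent_setup(void)
{
    int fds[2];
    int status;
    char go = 'x';
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: block until the parent says setup is complete. */
        char c;

        close(fds[1]);
        if (read(fds[0], &c, 1) != 1)
            _exit(1);
        /* A real launcher would exec the task here. */
        _exit(0);
    }

    /* Parent: privileged per-task setup would run here, before the signal. */
    close(fds[0]);
    if (write(fds[1], &go, 1) != 1) {
        close(fds[1]);
        waitpid(pid, &status, 0);
        return -1;
    }
    close(fds[1]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child cannot get past `read()` until the parent writes, anything the parent does before the write is finished before the task would start.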
-
Danny Auble authored
Bug 3865
-
Dominik Bartkiewicz authored
Fix regression in commit e5c05549 that would put the stepd pid into the memory cgroup instead of the task's pid. The code put the result of getpid() into the cgroup. Before e5c05549 this was done in the child of the fork, which would get you the task's pid, but moving it to run in the parent broke this logic. This patch adds pid to the input parameters of task_g_pre_launch_priv so that the correct pid can be used.
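The pid mix-up can be shown with a small sketch. After `fork()`, `getpid()` in the parent still returns the parent's pid, so a hook that runs in the parent has to be given the child's pid explicitly. The function names below are hypothetical stand-ins, not the real task plugin interface.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a privileged pre-launch hook. Returns the
 * pid that would be placed into the task's cgroup.
 */
pid_t sketch_pre_launch_priv(pid_t task_pid)
{
    /* Calling getpid() here would yield the caller's (parent's) pid. */
    return task_pid;
}

/*
 * Returns 1 if the hook chose the child's pid rather than the parent's,
 * 0 otherwise, and -1 if fork() fails.
 */
int sketch_check_pid_choice(void)
{
    pid_t chosen;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        _exit(0); /* the child stands in for the task */

    chosen = sketch_pre_launch_priv(child);
    waitpid(child, NULL, 0);
    return chosen == child && chosen != getpid();
}
```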
-
- 25 Jul, 2017 4 commits
-
-
Tim Wickberg authored
Regression in commit afeca4e2. Bug 4026.
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
- 24 Jul, 2017 6 commits
-
-
Morris Jette authored
-
Tim Shaw authored
-
Danny Auble authored
-
Dominik Bartkiewicz authored
Bug 3953
-
Danny Auble authored
-
Danny Auble authored
Pretty much fix the entire purpose of this max_agent_queue.
-
- 21 Jul, 2017 2 commits
-
-
Danny Auble authored
Bug 3159
-
Tim Shaw authored
Bug 3956
-