Commits · 4490da133b7ab2f07a7d3992594cf1bd3b974c90 · Manuel G. Marciani / ces_slurm_simulator

01 Aug, 2017 3 commits

Fix strchr return value tests. · 4490da13

Dominik Bartkiewicz authored Aug 01, 2017

strchr() returns NULL if the character is not found, so testing against '\0'
is wrong (although it does work okay in older compilers).

Fixes new GCC 7.1 warning.

4490da13
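The difference between the two tests can be sketched in a few lines of C. This is an illustration only, not code from the commit: `value_after_colon` is a hypothetical helper, and the `':'` separator is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the Slurm source): returns the text that
 * follows the first ':' in str, or NULL when str contains no ':'.
 */
static const char *value_after_colon(const char *str)
{
	const char *sep;

	assert(str != NULL);
	sep = strchr(str, ':');

	/*
	 * The faulty form would be:  if (sep == '\0')
	 * In C, '\0' is an integer constant with value 0, so it happens to
	 * act as a null pointer constant and the comparison behaves the same.
	 * It reads like a character test, though, which is presumably why
	 * newer compilers warn. Comparing against NULL states the intent.
	 */
	if (sep == NULL)
		return NULL;

	return sep + 1;
}
```

Calling `value_after_colon("key:value")` yields a pointer to `"value"`, while a string without a colon yields `NULL`. To test the character a pointer refers to, dereference it first (`*sep == '\0'`).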

Merge branch 'slurm-17.02' · 4bf580be
Alejandro Sanchez authored Aug 01, 2017

4bf580be

Docs - mention the relevance of slurmd's "-b" option on ResumeProgram. · d6a57327

Alejandro Sanchez authored Aug 01, 2017

When the node isn't actually rebooted, the BootTime isn't updated and
Slurm doesn't consider that the node is returned to service, even
if slurmd is responding.

Bug 4039

d6a57327

31 Jul, 2017 1 commit
- Document inconsistent behavior of GroupUpdateForce option. · 333932bc
  Tim Shaw authored Jul 31, 2017
```
This will be fixed before 17.11, but is being left as-is on 17.02.

Bug 3956.
```
  333932bc
28 Jul, 2017 6 commits

Partial revert of commit making it possible again to not have · 1f6555c7

Danny Auble authored Jul 28, 2017

'socket=' in AuthInfo for it to work.

This is to make it so people don't have to update their slurmdbd.conf's
when upgrading (and to match documentation).

Continuation of last commit

Bug 4009

1f6555c7

Fix issue when using an alternate munge key when communicating on a persistent · 591dc036
Danny Auble authored Jul 28, 2017
```
connection.

Bug 4009
```
591dc036
Merge branch 'slurm-17.02' · 65c78797
Morris Jette authored Jul 28, 2017

65c78797

jobcomp/elasticsearch - save state on REQUEST_CONTROL. · 8944b77a

Alejandro Sanchez authored Jul 28, 2017

jobcomp/elasticsearch saves/loads its state to/from elasticsearch_state.  Since
the jobcomp API isn't designed with save/load state operations, the plugin's
_save_state() isn't extern and isn't available from outside the plugin itself,
so it is tightly coupled to the fini() function. This state doesn't follow the
same execution path as the rest of the Slurm state, which is all independently
scheduled in save_all_state(). So we save it manually here on an RPC
of type REQUEST_CONTROL.

With this change, when the Primary ctld issues a REQUEST_CONTROL to a Backup
that is currently in controller mode, the Backup saves the state, and when
the Primary assumes control again it can process the saved pending jobs.  The
reverse case was already handled: when the Primary is running in controller
mode and the Backup issues a REQUEST_CONTROL, the Primary is shut down, and
breaking out of the while(1) loop in the ctld main() function already reached
a g_slurm_jobcomp_fini() call.

Bug 3908

8944b77a

Fix for uninitialized federation lock · eb963179
Dominik Bartkiewicz authored Jul 28, 2017

eb963179
Docs - add information about slurmd restart necessity after adding nodes · 71a8c6e7
Dominik Bartkiewicz authored Jul 28, 2017
```
Bug 3973.
```
71a8c6e7

27 Jul, 2017 4 commits

Minor improvement in ping_count logic · 086e1857
Morris Jette authored Jul 27, 2017
```
Prevent the possibility (never observed) of the ping count going
negative
```
086e1857
Merge branch 'slurm-17.02' · bfd8094e
Morris Jette authored Jul 27, 2017

bfd8094e

Fix bug when tracking multiple simultaneous spawned ping cycles · f7463ef5

Alejandro Sanchez authored Jul 27, 2017

When more than 1 ping cycle is spawned simultaneously (for instance
REQUEST_PING + REQUEST_NODE_REGISTRATION_STATUS for the selected nodes),
we do not track a separate ping_start time for each cycle. When ping_begin()
is called, the information about the previous ping cycle is lost. Then when
ping_end() is called for the first of the two cycles, we set ping_start=0,
which is incorrectly used to see if the last cycle ran for more than
PING_TIMEOUT seconds (100s), thus incorrectly triggering the:

error("Node ping apparently hung, many nodes may be DOWN or configured "
"SlurmdTimeout should be increased");

Bug 3914

f7463ef5
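One way to avoid losing the start time when cycles overlap is to count the cycles in flight and clear the timestamp only when the last one ends. The sketch below illustrates that idea under stated assumptions; it is not the code from this commit. Every `demo_` name is hypothetical, and the caller passes in the current time so that the logic is easy to exercise.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define DEMO_PING_TIMEOUT 100	/* seconds, matching the 100s quoted above */

static int    demo_ping_count = 0;	/* ping cycles currently in flight */
static time_t demo_ping_start = 0;	/* start of the oldest live cycle */

/* Record the start of a cycle; only the first live cycle sets the clock. */
static void demo_ping_begin(time_t now)
{
	assert(now > 0);
	if (demo_ping_count == 0)
		demo_ping_start = now;
	demo_ping_count++;
}

/* End one cycle; the counter is clamped so it can never go negative. */
static void demo_ping_end(void)
{
	if (demo_ping_count > 0)
		demo_ping_count--;
	if (demo_ping_count == 0)
		demo_ping_start = 0;
}

/* True only while a cycle is live and has run for the timeout or longer. */
static bool demo_ping_hung(time_t now)
{
	return demo_ping_count > 0 &&
	       difftime(now, demo_ping_start) >= DEMO_PING_TIMEOUT;
}
```

With two overlapping cycles, ending the first leaves `demo_ping_start` untouched, so a zeroed timestamp is never compared against the current time. The clamp in `demo_ping_end()` mirrors the "never go negative" guard described in the ping_count commit above.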

Docs - add note to UnkillableStepTimeout that the node will be drained. · 04b431b4
Tim Shaw authored Jul 27, 2017
```
Bug 3941.
```
04b431b4

26 Jul, 2017 14 commits
- Fix issue where UnkillableStepProgram if step was in an ending state. · 9f48e07c
  Danny Auble authored Jul 26, 2017
  
  9f48e07c
- Fix bluegene to compile after commit ee84f5c1 . · ccf8f198
  Danny Auble authored Jul 26, 2017
  
  ccf8f198
- Remove unneeded switch_job (was set, but never used). Introduced in · 74f2e698
  Danny Auble authored Jul 26, 2017
```
44e2b1a9 and where the functionality was promptly removed a few days later
in 51076871.  Thanks to Artem Polyakov for pointing out the history :).
```
  74f2e698
- Add configuration parameters for daemons to write to syslog · 05ee90f1
  Isaac Hartung authored Jul 26, 2017
```
 -- Add slurm.conf configuration parameters SlurmctldSyslogDebug and
    SlurmdSyslogDebug to control which messages from the slurmctld and
    slurmd daemons get written to syslog.
 -- Add slurmdbd.conf configuration parameter DebugLevelSyslog to
    control which messages from the slurmdbd daemon get written to syslog.
bug 3933
```
  05ee90f1
- Fix nrt plugin to work with the new jobpack code introduced in commit · 2befb143
  Danny Auble authored Jul 26, 2017
```
ee84f5c1.
```
  2befb143
- Fix regression in commit 86ed603c. Code was still treating the variable · 86922baf
  Danny Auble authored Jul 26, 2017
```
like slurm_switch_ops_t instead of the new dynamic_plugin_data_t.

Bug 4025
```
  86922baf
- Fix 2 memory leaks · 8abd161e
  Morris Jette authored Jul 26, 2017
```
Reported by Coverity, CID 45140 and 45141
```
  8abd161e
- Cosmetic changes, no changes to logic · 3b1607e8
  Morris Jette authored Jul 26, 2017
  
  3b1607e8
- Correct sview output · d434b7f0
  Morris Jette authored Jul 26, 2017
```
pack_job_id was being reported as pack_job_offset
```
  d434b7f0
- Merge branch 'slurm-17.02' · e3686d5c
  Tim Wickberg authored Jul 25, 2017
  
  e3686d5c
- Fix minor memory leak if launch fails in the slurmstepd. · 558d7c1a
  Danny Auble authored Jul 24, 2017
  
  558d7c1a
- Continuation of e5c05549. · 7e5d3d15
  Danny Auble authored Jul 24, 2017
```
Get rid of any race conditions and call anything that was in
_pre_task_privileged from the parent instead of the child.

NOTE: This should be safe as we don't execute the task until after
_exec_wait_child_wait_for_parent is signaled which happens after all this is
long over.
```
  7e5d3d15
- If failing after switch_g_job_init happened make sure switch_g_job_fini is called. · 488c7c36
  Danny Auble authored Jul 24, 2017
```
Bug 3865
```
  488c7c36
- Fix regression in commit e5c05549 that would put the stepd pid into the... · f28b1a97
  Dominik Bartkiewicz authored Jul 05, 2017
```
Fix regression in commit e5c05549 that would put the stepd pid into the memory cgroup instead of the task's pid.

Beforehand this would put the result of getpid() into the cgroup.  Before
e5c05549 this was done in the child of the fork which would get you
the task's pid, but moving it to run in the parent broke this logic.

This patch adds pid to the input parameters of
task_g_pre_launch_priv so that the correct pid can be used.
```
  f28b1a97
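The pid mix-up described in f28b1a97 follows from how fork() works: in the parent, getpid() returns the parent's own pid, and the child's pid is only available as fork()'s return value. The sketch below illustrates passing that value explicitly. It is a simplified stand-in, not Slurm code: `demo_pre_launch_priv` and `demo_launch` are hypothetical names, and the child exits immediately instead of executing a task.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The pid the hypothetical hook was asked to act on. */
static pid_t demo_recorded_pid = -1;

/* Hypothetical stand-in for a privileged pre-launch hook. */
static void demo_pre_launch_priv(pid_t task_pid)
{
	assert(task_pid > 0);
	demo_recorded_pid = task_pid;
}

/* Fork a child, run the hook in the parent, return the child's pid or -1. */
static pid_t demo_launch(void)
{
	pid_t child = fork();

	if (child < 0)
		return -1;
	if (child == 0)
		_exit(0);	/* a real task would exec here */

	/*
	 * Parent side: getpid() here is the parent's pid, not the task's,
	 * so the child's pid is passed to the hook as a parameter.
	 */
	demo_pre_launch_priv(child);
	waitpid(child, NULL, 0);
	return child;
}
```

After `demo_launch()` returns, `demo_recorded_pid` holds the child's pid and differs from `getpid()` in the caller, which is the property the extra parameter is meant to preserve.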
25 Jul, 2017 4 commits
- Fix incorrect return code from xdaemon(). · 941cac12
  Tim Wickberg authored Jul 25, 2017
```
Regression in commit afeca4e2. Bug 4026.
```
  941cac12
- Merge branch 'slurm-17.02' · 36594230
  Morris Jette authored Jul 25, 2017
  
  36594230
- Add SLUG17 agenda and hotel information to web site · 54140205
  Morris Jette authored Jul 25, 2017
  
  54140205
- Fix typos on signall[ed|ing] to not have the erroneous double 'l' · 364ca76c
  Danny Auble authored Jul 25, 2017
  
  364ca76c
24 Jul, 2017 6 commits
- CRAY - Throttle step creation if trying to create too many steps at once. · f9f13a86
  Morris Jette authored Jul 24, 2017
  
  f9f13a86
- Add LogTimeFormat to be packed and displayed with scontrol show config · 628c88da
  Tim Shaw authored Jul 24, 2017
  
  628c88da
- Improve debug message to show which node needed no change. · 571b4b88
  Danny Auble authored Jul 24, 2017
  
  571b4b88
- Set Reason=dependency over Reason=JobArrayTaskLimit for pending jobs. · ad0b7c27
  Dominik Bartkiewicz authored Jul 05, 2017
```
Bug 3953
```
  ad0b7c27
- Better document how max_agent_queue works (in the code anyway). · 32056835
  Danny Auble authored Jul 24, 2017
  
  32056835
- Fix memory leak in slurmctld when agent queue to the DBD has filled up. · 6c7b9ba1
  Danny Auble authored Jul 24, 2017
```
Pretty much fix the entire purpose of this max_agent_queue.
```
  6c7b9ba1
21 Jul, 2017 2 commits
- Serialize updates from the dbd to the slurmctld. · 24375cb8
  Danny Auble authored Jul 21, 2017
```
Bug 3159
```
  24375cb8
- Fixed truncation on scontrol show config output. · 5b1983d5
  Tim Shaw authored Jul 21, 2017
```
Bug 3956
```
  5b1983d5