Commits · 605d7e1f3075c52dbbf5703e50d212566e49e7f6 · Manuel G. Marciani / ces_slurm_simulator

11 Aug, 2017 1 commit
- Fix incorrect lock levels when creating or updating a reservation · 605d7e1f
  Dominik Bartkiewicz authored Aug 11, 2017
  
  605d7e1f
10 Aug, 2017 1 commit
- Add fake syscfg file to test output from the Dell flavor. · 356b3272
  Danny Auble authored Aug 10, 2017
  
  356b3272
08 Aug, 2017 1 commit
- Docs - update link to Linux coding style docs. · b5b80b87
  Tim Wickberg authored Aug 08, 2017
  
  b5b80b87
07 Aug, 2017 4 commits
- Docs - fix typo. · 9a940c8d
  Jason Travis authored Aug 07, 2017
```
Bug 4057.
```
  9a940c8d
- Include sysmacros.h when required for major() and minor(). · 35b505cc
  Justin Lecher authored Aug 07, 2017
```
Starting from glibc-2.25 the macros major and minor are only available
from sys/sysmacros.h. This patch uses an autoconf macro to detect the
location and includes the header accordingly.

Bug 3982.
```
  35b505cc
- Make it so the cray/switch plugin grabs new DebugFlags on a reconfigure. · d30f79d1
  Danny Auble authored Aug 07, 2017
  
  d30f79d1
- Close race condition on Slurm structures when setting DebugFlags. · 13b78dd2
  Dominik Bartkiewicz authored Aug 07, 2017
```
Bug 4019
```
  13b78dd2
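
The sysmacros.h change above (35b505cc) can be illustrated with a minimal sketch. It assumes the configure script uses Autoconf's AC_HEADER_MAJOR, which defines MAJOR_IN_MKDEV or MAJOR_IN_SYSMACROS; the commit message does not say which macro was actually used. The `__linux__` fallback and the helper name `device_numbers` are hypothetical, added only so the sketch compiles without a configure step.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Pick the header that provides major() and minor().
 * MAJOR_IN_MKDEV / MAJOR_IN_SYSMACROS are what AC_HEADER_MAJOR defines
 * (assumed here; normally they would come from config.h). */
#if defined(MAJOR_IN_MKDEV)
#  include <sys/mkdev.h>
#elif defined(MAJOR_IN_SYSMACROS) || defined(__linux__)
#  include <sys/sysmacros.h>   /* the only location from glibc 2.25 on */
#endif

/* Hypothetical helper: report the device numbers of the filesystem
 * holding `path`. Returns 0 on success, -1 if stat() fails. */
int device_numbers(const char *path, unsigned int *maj_out,
                   unsigned int *min_out)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *maj_out = major(st.st_dev);
    *min_out = minor(st.st_dev);
    return 0;
}
```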
04 Aug, 2017 6 commits
- Correct buffer size used in determining specialized cores to avoid possible · 75bb7c40
  Morris Jette authored Aug 04, 2017
```
truncation of core specification and not reserving the specified cores.

Fixes Coverity CID 45174 and 45175

Bug 4053
```
  75bb7c40
- NEWS comment for last commit. · bf4ac7ee
  Danny Auble authored Aug 04, 2017
  
  bf4ac7ee
- Fix issue with pmi[2|x] when TreeWidth=1. This will very likely never · b72096ac
  Artem Polyakov authored Aug 04, 2017
```
matter in production, but in testing it can.

Bug 4051
```
  b72096ac
- Sort TRES id's on limits when getting them from the database. · 7e55acf7
  Danny Auble authored Aug 04, 2017
  
  7e55acf7
- Continuation of last commit. · 5c2a74a5
  Marshall Garey authored Aug 04, 2017
```
Fix mysql plugin to correctly return parent limits for all children.

Bug 4050
```
  5c2a74a5
- Fix inherited association 'max' TRES limits combining multiple limits in · ab24f8b4
  Danny Auble authored Aug 04, 2017
```
the tree.

Bug 4050
```
  ab24f8b4
02 Aug, 2017 4 commits

Fix starting ctld w/out existing StateSaveLocation · ec78d45a

Marshall Garey authored Aug 02, 2017

Would fail when trying to create the clustername file because the
StateSaveLocation path didn't exist yet.

Bug 3988

ec78d45a

Update contributors list · f9db5758
Brian Christiansen authored Aug 02, 2017

f9db5758

Fix srun jobs to run in high prio partition · 948de46b

Marshall Garey authored Aug 02, 2017

srun jobs that could start immediately and requested multiple partitions
didn't run in the highest priority partition if the highest priority
partition wasn't listed first.

As a side effect, scontrol show jobs may now show the partition list
in priority order, since the job's partition list gets sorted by
priority.

Bug 4015

948de46b

Fix strchr return value tests. · a5630a9b

Dominik Bartkiewicz authored Aug 01, 2017

NULL is returned if the token is not found, so testing against '\0'
is wrong (although it does work okay in older compilers).

Fixes new GCC 7.1 warning.

a5630a9b
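
The strchr fix above (a5630a9b) concerns a general C pitfall, sketched below. The helper `value_after_equals` is hypothetical and not taken from the Slurm source; it only shows the correct form of the test.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the text after the first '=' in
 * `key_value`, or NULL when there is no '='. */
const char *value_after_equals(const char *key_value)
{
    const char *eq = strchr(key_value, '=');

    /* strchr() returns NULL when the character is absent.
     * Writing `eq == '\0'` compares a pointer with a character
     * constant; it only happens to work because '\0' has the value 0,
     * and newer compilers warn about it. */
    if (eq == NULL)
        return NULL;
    return eq + 1;
}
```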

01 Aug, 2017 4 commits
- Increase buffer to handle long /proc//stat output · 9f3b04c0
  Tim Shaw authored Aug 01, 2017
```
Bug 3999
```
  9f3b04c0
- Fix GRES selection with CPU binding · e94fdf2e
  Dominik Bartkiewicz authored Aug 01, 2017
```
Fix bug in selection of GRES bound to specific CPUs where the GRES count
is 2 or more. Previous logic could allocate CPUs not available to the job.

bug 4029
```
  e94fdf2e
- Minor change to gres.conf man page · 6dea58ce
  Morris Jette authored Aug 01, 2017
```
Highlight the need to use Slurm abstract CPU ID
```
  6dea58ce
- Docs - mention the relevance of slurmd's "-b" option on ResumeProgram. · d6a57327
  Alejandro Sanchez authored Aug 01, 2017
```
When the node isn't actually rebooted, the BootTime isn't updated and
Slurm doesn't consider that the node is returned to service, even
if slurmd is responding.

Bug 4039
```
  d6a57327
31 Jul, 2017 1 commit
- Document inconsistent behavior of GroupUpdateForce option. · 333932bc
  Tim Shaw authored Jul 31, 2017
```
This will be fixed before 17.11, but is being left as-is on 17.02.

Bug 3956.
```
  333932bc
28 Jul, 2017 5 commits

Partial revert of commit making it possible again to not have · 1f6555c7

Danny Auble authored Jul 28, 2017

to have 'socket=' in AuthInfo to work.

This is to make it so people don't have to update their slurmdbd.conf's
when upgrading (and to match documentation).

Continuation of last commit

Bug 4009

1f6555c7

Fix issue when using an alternate munge key when communicating on a persistent · 591dc036
Danny Auble authored Jul 28, 2017
```
connection.

Bug 4009
```
591dc036

jobcomp/elasticsearch - save state on REQUEST_CONTROL. · 8944b77a

Alejandro Sanchez authored Jul 28, 2017

jobcomp/elasticsearch saves/loads its state to/from elasticsearch_state. Since
the jobcomp API isn't designed with save/load state operations, the plugin's
_save_state() isn't extern and isn't available from outside the plugin itself,
so it is tightly coupled to the fini() function. This state doesn't follow the
same execution path as the rest of Slurm states, which are all independently
scheduled in save_all_state(). So we save it manually here on an RPC
of type REQUEST_CONTROL.

This means that when the Primary ctld issues a REQUEST_CONTROL to the Backup,
which is currently in controller mode, the Backup will save the state, and when
the Primary assumes control again it can process the saved pending jobs. The
other direction was already handled: when the Primary is running
in controller mode and the Backup issues a REQUEST_CONTROL, the Primary is
shut down, and on breaking out of the ctld main() function's while(1) loop there
was already a g_slurm_jobcomp_fini() call in place.

Bug 3908

8944b77a

Fix for uninitialized federation lock · eb963179
Dominik Bartkiewicz authored Jul 28, 2017

eb963179
Docs - add information about slurmd restart necessity after adding nodes · 71a8c6e7
Dominik Bartkiewicz authored Jul 28, 2017
```
Bug 3973.
```
71a8c6e7

27 Jul, 2017 2 commits

Fix bug when tracking multiple simultaneous spawned ping cycles · f7463ef5

Alejandro Sanchez authored Jul 27, 2017

When more than 1 ping cycle is spawned simultaneously (for instance
REQUEST_PING + REQUEST_NODE_REGISTRATION_STATUS for the selected nodes),
we do not track a separate ping_start time for each cycle. When ping_begin()
is called, the information about the previous ping cycle is lost. Then when
ping_end() is called for the first of the two cycles, we set ping_start=0,
which is incorrectly used to see if the last cycle ran for more than
PING_TIMEOUT seconds (100s), thus incorrectly triggering the:

error("Node ping apparently hung, many nodes may be DOWN or configured "
"SlurmdTimeout should be increased");

Bug 3914

f7463ef5

Docs - add note to UnkillableStepTimeout that node will be drained. · 04b431b4
Tim Shaw authored Jul 27, 2017
```
Bug 3941.
```
04b431b4
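
The ping-cycle bug described above (f7463ef5) can be sketched as follows. This is one possible way to keep overlapping cycles from clearing the start time early, not necessarily what the commit does. The names `cycle_begin`, `cycle_end` and `cycle_hung` are hypothetical, the current time is passed in to keep the sketch deterministic, and locking is omitted.

```c
#include <stdbool.h>
#include <time.h>

#define PING_TIMEOUT 100   /* seconds, the value quoted in the commit message */

static int    active_cycles = 0;  /* cycles begun but not yet ended */
static time_t last_start    = 0;  /* start of the newest cycle; 0 when idle */

/* A new cycle starts: count it and remember when it began. */
void cycle_begin(time_t now)
{
    active_cycles++;
    last_start = now;
}

/* A cycle ends: clear the start time only once no cycle is left,
 * so one cycle finishing cannot make another look hung. */
void cycle_end(void)
{
    if (active_cycles > 0)
        active_cycles--;
    if (active_cycles == 0)
        last_start = 0;
}

/* True only if a cycle is still outstanding and has exceeded the timeout. */
bool cycle_hung(time_t now)
{
    return active_cycles > 0 && (now - last_start) >= PING_TIMEOUT;
}
```

With a single shared start time and no counter, the first `cycle_end()` would zero the timestamp while the second cycle was still running, and a later `now - 0` comparison would exceed the timeout.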

26 Jul, 2017 5 commits

Fix issue with UnkillableStepProgram if step was in an ending state. · 9f48e07c
Danny Auble authored Jul 26, 2017

9f48e07c
Fix minor memory leak if launch fails in the slurmstepd. · 558d7c1a
Danny Auble authored Jul 24, 2017

558d7c1a

Continuation of . · 7e5d3d15

Danny Auble authored Jul 24, 2017

Get rid of any race conditions and call anything that was in
_pre_task_privileged from the parent instead of the child.

NOTE: This should be safe as we don't execute the task until after
_exec_wait_child_wait_for_parent is signaled which happens after all this is
long over.

7e5d3d15

If failing after switch_g_job_init happened make sure switch_g_job_fini is called. · 488c7c36
Danny Auble authored Jul 24, 2017
```
Bug 3865
```
488c7c36

Fix regression in commit that would put the stepd pid into the... · f28b1a97

Dominik Bartkiewicz authored Jul 05, 2017

Fix regression in commit e5c05549 that would put the stepd pid into the memory cgroup instead of the task's pid.

Beforehand this would put the result of getpid() into the cgroup. Before
e5c05549 this was done in the child of the fork which would get you
the task's pid, but moving it to run in the parent broke this logic.

This patch adds pid to the input parameters of
task_g_pre_launch_priv so that the correct pid can be used.

f28b1a97
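
The regression described above (f28b1a97) comes from calling getpid() on the wrong side of a fork(). Below is a minimal sketch of the general pattern, assuming a POSIX system. `pre_launch_hook`, `launch_task` and `last_recorded_pid` are hypothetical stand-ins and do not reproduce the signature of task_g_pre_launch_priv.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t recorded_pid = 0;

/* Hypothetical hook that must act on the task's pid (for example, to
 * place it in a cgroup). It takes the pid as a parameter and does not
 * call getpid() itself. */
static void pre_launch_hook(pid_t task_pid)
{
    recorded_pid = task_pid;
}

pid_t last_recorded_pid(void)
{
    return recorded_pid;
}

/* Fork a trivial "task" and run the hook from the parent.
 * Returns the child's pid, or -1 if fork() fails. */
pid_t launch_task(void)
{
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        _exit(0);               /* child: stands in for the real task */

    /* Parent: getpid() here would return the parent's own pid, so the
     * child's pid from fork() is passed explicitly. */
    pre_launch_hook(child);
    waitpid(child, NULL, 0);
    return child;
}
```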

25 Jul, 2017 1 commit
- Add SLUG17 agenda and hotel information to web site · 54140205
  Morris Jette authored Jul 25, 2017
  
  54140205
24 Jul, 2017 3 commits
- CRAY - Throttle step creation if trying to create too many steps at once. · f9f13a86
  Morris Jette authored Jul 24, 2017
  
  f9f13a86
- Better document how max_agent_queue works (in the code anyway). · 32056835
  Danny Auble authored Jul 24, 2017
  
  32056835
- Fix memory leak in slurmctld when agent queue to the DBD has filled up. · 6c7b9ba1
  Danny Auble authored Jul 24, 2017
```
Pretty much fix the entire purpose of this max_agent_queue.
```
  6c7b9ba1
21 Jul, 2017 2 commits
- Serialize updates from the dbd to the slurmctld. · 24375cb8
  Danny Auble authored Jul 21, 2017
```
Bug 3159
```
  24375cb8
- Fixed truncation on scontrol show config output. · 5b1983d5
  Tim Shaw authored Jul 21, 2017
```
Bug 3956
```
  5b1983d5