Commits · 0264cb75b09d635a172a5846376572862c4b5102 · Manuel G. Marciani / ces_slurm_simulator

01 Mar, 2016 5 commits

Simplify Makefile.am for doc/ and run autogen.sh · 0264cb75

Tim Wickberg authored Mar 01, 2016

0264cb75

run autogen.sh with automake 1.15 · 48f36224

Tim Wickberg authored Mar 01, 2016

48f36224

Defer suspend until launch completes · d2cd18d1

Morris Jette authored Mar 01, 2016

This fixes a bug introduced in commit 52fe3de1
in the event the fork() call fails in slurmstepd.

d2cd18d1

Defer suspend until launch completes · 52fe3de1

Morris Jette authored Feb 29, 2016

Ensure that a job is completely launched before trying to suspend it.
Previous logic would start suspend logic early in the life of the
slurmstepd process, after its listening socket was open but before
the tasks were launched. This defers the suspend logic until after
all prologs and setup complete and the tasks are launched. This is
important in the case of gang scheduling, in which newly launched
jobs can be immediately suspended.
bug 2494

52fe3de1

Add "JobId=" to some log messages for better clarity · 1a7b4f62
Morris Jette authored Feb 29, 2016

1a7b4f62

29 Feb, 2016 1 commit
- Fix test21.21 to work when AccountingStorageEnforce=safe isn't set. · 9e2e2f15
  Danny Auble authored Feb 29, 2016
```
Bug 1976
```
  9e2e2f15
26 Feb, 2016 5 commits
- Set correct reason when a QOS' MaxTresMins is violated. · 745568f2
  Danny Auble authored Feb 26, 2016
  
  745568f2
- Add note to slurm.conf man page about SallocDefaultCommand and TaskPlugins. · b5b349b0
  Tim Wickberg authored Feb 25, 2016
```
Add note to slurm.conf man page about setting "--cpu_bind=no" as part
of SallocDefaultCommand if a TaskPlugin is in use.
```
  b5b349b0
- Replace goto with break · e990c183
  Maksym Planeta authored Feb 25, 2016
  
  e990c183
- fix limitation in test · ac6b1c34
  Bjørn-Helge Mevik authored Feb 25, 2016
```
Test 14.10 in the test suite (of slurm 15.08.8, at least) uses

  $sinfo -tidle -h -o%n

to find idle nodes.  This only works if NodeHostname == NodeName on the
nodes.  The following should work regardless of this:

  $scontrol show hostnames \$($sinfo -tidle -h -o%N)
```
  ac6b1c34
- Grammatical nit in srun(1). · 3c2676ec
  Tim Wickberg authored Feb 25, 2016
  
  3c2676ec
25 Feb, 2016 3 commits

Add missing definition for val_to_char() · 344c74fc

Tim Wickberg authored Feb 25, 2016

Since the function is inlined the single definition let GCC build everything
properly, but debug builds (which disable inline) resulted in:
slurmstepd: [465.0]: symbol lookup error:
(trimmed path)/task_cgroup.so: undefined symbol: val_to_char
when running srun --cpu_bind=v.

task/affinity had this definition already, task/cgroup didn't.

344c74fc
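
The failure mode described in 344c74fc (a helper that links only when the compiler happens to inline it) can be sketched as follows. This is a minimal illustration, not the code from the commit: the body of `val_to_char()` is an assumed hex-digit conversion, `byte_to_hex()` is a hypothetical caller, and `static inline` is one common way to make a small helper independent of whether inlining is enabled. The commit itself only states that the missing definition was added to task/cgroup.

```c
/*
 * Sketch only: the function body and the byte_to_hex() caller are
 * assumptions for illustration, not taken from the commit.
 */

/*
 * Convert a value in the range 0..15 to a lowercase hex digit.
 * Declared "static inline" so that every translation unit including
 * this definition gets its own copy; the call still resolves when a
 * debug build disables inlining.
 */
static inline char val_to_char(int v)
{
	if (v >= 0 && v < 10)
		return (char)('0' + v);
	if (v >= 10 && v < 16)
		return (char)('a' + (v - 10));
	return (char)-1;	/* out of range */
}

/* Hypothetical caller: render the low 8 bits of mask as two hex digits. */
static void byte_to_hex(unsigned int mask, char out[3])
{
	out[0] = val_to_char((int)((mask >> 4) & 0xf));
	out[1] = val_to_char((int)(mask & 0xf));
	out[2] = '\0';
}
```

Building a file like this both with and without optimization is a quick way to check that a helper does not depend on being inlined.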

Fix for uninitialized memory · c0509864

Morris Jette authored Feb 25, 2016

Reported by valgrind running test7.2, but shouldn't cause any real problem

c0509864

Fix issue where SocketsPerBoard didn't translate to Sockets when CPUS= · fcae2193

Danny Auble authored Feb 24, 2016

was also given.

fcae2193

24 Feb, 2016 9 commits
- Make it so scontrol update part qos= will take away a partition QOS from · 3a7470ae
  Danny Auble authored Feb 24, 2016
```
a partition.
```
  3a7470ae
- Make it possible to change CPUsPerTask with scontrol. · de28c13a
  Danny Auble authored Feb 24, 2016
```
This also reverts most of commit fa331e30 as well as commit bd9fa830
which would try to set the pn_min_cpus every time a job was updated.
If a job didn't request node counts then they were hosed.

This commit takes away the magic which was screwing things up.  Now the
person gets what they asked for without magic changing things.

Bug 2302
Bug 2742
Bug 2478
```
  de28c13a
- Fix issue where when updating a job the pn_min_cpus was updated · bd9fa830
  Danny Auble authored Feb 24, 2016
```
erroneously.
```
  bd9fa830
- Properly handle select_g_select_nodeinfo_get() error · 542ead89
  Morris Jette authored Feb 24, 2016
```
Failure has never been observed, but initialize the used variable
  before calling the function so we don't re-use old data if the
  function returns an error.
```
  542ead89
- Rename a variable, no change in logic · 31d67fb5
  Morris Jette authored Feb 24, 2016
```
Rename an improperly named variable in the logic scontrol uses to
  print node information ("total_used" was really "idle_cpus"), so
  the logic looks the same as that used in sinfo to determine node
  state.
```
  31d67fb5
- Improve some step allocation logs · a0e3e5de
  Morris Jette authored Feb 23, 2016
```
Include warning for Cray simulation as reminder for developers to
change code as needed.
```
  a0e3e5de
- Merge remote-tracking branch 'origin/slurm-14.11' into slurm-15.08 · 18fe4463
  Danny Auble authored Feb 23, 2016
  
  18fe4463
- BGQ - Tighter locks around structures when nodes/cables change state. · c5925f41
  Danny Auble authored Feb 23, 2016
  
  c5925f41
- BGQ - Remove redeclaration of job_read_lock. · fd3dedda
  Danny Auble authored Feb 23, 2016
  
  fd3dedda
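
The defensive pattern described in 542ead89 above (initialize the output variable before each call, so an error return cannot leave old data in it) might look like the sketch below. All names here are hypothetical stand-ins: `sketch_get_alloc_cpus()` plays the role of a lookup such as `select_g_select_nodeinfo_get()`, and the table of CPU counts is invented for the example.

```c
#include <stdint.h>

#define SKETCH_SUCCESS 0
#define SKETCH_ERROR   (-1)

/*
 * Hypothetical lookup: on error it returns without writing *alloc_cpus,
 * which is what makes a stale value in the caller dangerous.
 */
static int sketch_get_alloc_cpus(int node_inx, uint16_t *alloc_cpus)
{
	static const uint16_t table[] = { 4, 0, 16 };	/* invented data */

	if (node_inx < 0 || node_inx >= 3)
		return SKETCH_ERROR;
	*alloc_cpus = table[node_inx];
	return SKETCH_SUCCESS;
}

/* Sum the allocated CPUs over a list of node indexes. */
static uint32_t sketch_total_alloc_cpus(const int *nodes, int cnt)
{
	uint16_t alloc_cpus;
	uint32_t total = 0;

	for (int i = 0; i < cnt; i++) {
		/* Reset first: a failed lookup must not reuse the value
		 * left over from the previous node. */
		alloc_cpus = 0;
		(void) sketch_get_alloc_cpus(nodes[i], &alloc_cpus);
		total += alloc_cpus;
	}
	return total;
}
```

Without the reset inside the loop, a failed lookup for one node would silently count the previous node's value a second time.
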
23 Feb, 2016 2 commits

select/cray: Log NHC run times over 1 minute · eb58137b
Morris Jette authored Feb 23, 2016

eb58137b

Fix issue with resizing jobs and limits not being kept track of correctly. · 92ac0dcd

Danny Auble authored Feb 22, 2016

This whole process could probably be done better by keeping track of
old values and new values and only calling one function instead of a
pre and post function, but that can probably wait for future generations
of the code as it works now and is probably adequate for the time being.

Bug 2352

92ac0dcd

19 Feb, 2016 8 commits
- Replace 'inexistent' with 'non-existent'. No functional change. · f75a90f5
  Tim Wickberg authored Feb 19, 2016
  
  f75a90f5
- Spelling corrections. No functional changes. · c276bf49
  Gennaro Oliva authored Feb 19, 2016
```
Consistently use American English for
existant -> existent
assocation -> association

Correct some typos, and one grammatical mistake.
```
  c276bf49
- BurstBuffer/cray pre-run race condition fix · e8959ae9
  Morris Jette authored Feb 19, 2016
```
BurstBuffer/cray - Defer job cancellation or time limit while "pre-run"
    operation in progress to avoid inconsistent state due to multiple calls
    to job termination functions.
bug 2454
```
  e8959ae9
- Move fclose into conditional. · 691b97b0
  Tim Wickberg authored Feb 18, 2016
```
Otherwise fclose(NULL) would be called iff the ClusterName is not set and
the clustername file does not exist. Should not happen in production.

Coverity #67041.
```
  691b97b0
- backport of commit aa5eb7ef · babfbef9
  Morris Jette authored Feb 18, 2016
  
  babfbef9
- Fix test to check for failure on the array · 8adeecd7
  Danny Auble authored Feb 18, 2016
  
  8adeecd7
- Update NEWS for start of v15.08.9 · 4aa2e3c2
  Morris Jette authored Feb 18, 2016
  
  4aa2e3c2
- Update META for v15.08.8 tag · 10cf27a9
  Morris Jette authored Feb 18, 2016
  
  10cf27a9
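
The change described in 691b97b0 above (calling `fclose()` only when the file was actually opened) follows a common C pattern, since passing a null pointer to `fclose()` is not valid. The sketch below assumes a hypothetical helper, `sketch_read_first_line()`, that reads a one-line file such as a saved cluster name. It is an illustration of the pattern, not the code touched by the commit.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the first line of path into buf (without
 * the trailing newline). Returns 0 on success, -1 if the file cannot
 * be opened or is empty.
 */
static int sketch_read_first_line(const char *path, char *buf, size_t len)
{
	int rc = -1;
	FILE *fp;

	if (len == 0)
		return -1;

	fp = fopen(path, "r");
	if (fp != NULL) {
		if (fgets(buf, (int) len, fp) != NULL) {
			buf[strcspn(buf, "\n")] = '\0';	/* strip newline */
			rc = 0;
		}
		/* fclose() lives inside the conditional, so it is only
		 * reached when fopen() succeeded. */
		fclose(fp);
	}
	return rc;
}
```

Keeping the close next to the successful open also makes the missing-file path trivially safe: nothing was opened, so nothing is closed.
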
18 Feb, 2016 7 commits
- Fix minor regression from commit c6771b1e. · 1a2e4819
  Danny Auble authored Feb 18, 2016
  
  1a2e4819
- MYSQL - Avoid having multiple default accounts when a user is added to · eafc3f71
  Danny Auble authored Feb 18, 2016
```
a new account and making it a default all at once.

Bug 2428
```
  eafc3f71
- Prevent segfault in acct_gather_energy_cray if data requested before set. · c6771b1e
  Alejandro Sanchez authored Feb 18, 2016
```
Match acct_gather_energy/rapl plugin. Bug 2397.
```
  c6771b1e
- Add new assoc_limit_continue option to SchedulerParameters · 0615d809
  Tim Wickberg authored Feb 18, 2016
```
Control whether the scheduler will continue to try to run jobs in
a partition if a higher priority job is stuck due to an association
limit.

Can cause starvation for larger jobs, but will improve throughput and
utilization for systems that have extensively divvyed up their resources
through association/QOS limits.

Bug 2388 and 2452.
```
  0615d809
- Fix inadequate locks when updating a partition's TRES. · b223e52c
  Danny Auble authored Feb 18, 2016
```
Bug 2453
```
  b223e52c
- power/cray: add redundant node state check · 38afd69b
  Morris Jette authored Feb 18, 2016
```
This should have no effect, but is a belt-and-suspenders approach
  to checking node state.
```
  38afd69b
- Changing 'access denied' message of pam_slurm to be more clear · f8c2fcee
  Jeff White authored Feb 18, 2016
  
  f8c2fcee