Commits · dbfbcfe96d094bfda6fcbbf228905bc05367d6b9 · Manuel G. Marciani / ces_slurm_simulator

02 Jun, 2016 2 commits
- Print correct return code on failure to update node features through sview · dbfbcfe9
  Tim Wickberg authored Jun 02, 2016
```
Wrong order of operations results in the return code being 0/1.
```
  dbfbcfe9
- Fix issue where slurmd could core when running the ipmi energy plugin. · 880823f7
  Danny Auble authored Jun 02, 2016
```
If the plugin ever returns an error, the variables weren't initialized, so
when they were freed they could corrupt memory.

Bug 2790
```
  880823f7
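
The message for dbfbcfe9 describes a return code collapsing to 0/1 because of a wrong order of operations. The actual diff is not shown here, so the following is only a minimal C sketch of how this class of bug commonly arises. The helper `update_features()` and its error value are hypothetical. Because `!=` binds more tightly than `=`, the variable receives the result of the comparison rather than the function's return value.

```c
/* Hypothetical stand-in for a call that fails with a specific error code. */
static int update_features(void)
{
	return 2010;	/* arbitrary non-zero error value, for illustration only */
}

/* Buggy form: parsed as rc = (update_features() != 0), so rc is 0 or 1. */
static int report_buggy(void)
{
	int rc;

	if ((rc = update_features() != 0))
		return rc;	/* the original error code is lost */
	return 0;
}

/* Fixed form: assign first, then compare. */
static int report_fixed(void)
{
	int rc;

	if ((rc = update_features()) != 0)
		return rc;	/* the original error code is preserved */
	return 0;
}
```

With these definitions, `report_buggy()` returns 1 while `report_fixed()` returns the original value, 2010.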
31 May, 2016 5 commits
- Start NEWS for v15.08.13 · 29143a27
  Morris Jette authored May 31, 2016
  
  29143a27
- Update META for v15.08.12 tag · 2a212a17
  Morris Jette authored May 31, 2016
  
  2a212a17
- Fix backwards compatibility with sreport going to <= 14.11 coming from · d82263d2
  Danny Auble authored May 31, 2016
```
>= 15.08 for some reports.
```
  d82263d2
- Fix bad order-of-operations in _connect_srun_cr. · c53ce3f6
  Tim Wickberg authored May 31, 2016
```
Prevents correct error handling by rc being 0/1 instead of the original
return code.

Also fix slurm_send_only_controller_msg and slurm_send_only_node_msg
although these only result in bad printed values in the debug message.
```
  c53ce3f6
- Fix Hidden error during _rpc_forward_data call. · ecc2f7a4
  Artem Polyakov authored May 31, 2016
```
Bug 2120
```
  ecc2f7a4
28 May, 2016 2 commits
- Another fix for pbs parsing mail-type also caused by commit 2a817734 · d41215d2
  Danny Auble authored May 27, 2016
  
  d41215d2
- Continuation of commit 2a817734 which broke functionality. · 7d90f534
  Danny Auble authored May 27, 2016
  
  7d90f534
27 May, 2016 4 commits

remove some dead stores · b1d5df62
Morris Jette authored May 27, 2016

b1d5df62

Fix for tracking a node's allocated CPUs with gang scheduling. · 4ce62678

Morris Jette authored May 26, 2016

This bug was introduced by commit 21c52d2f,
which fixed a different problem tracking resources associated with suspended
jobs. There are subtle differences between jobs that are suspended by a
user/administrator and jobs suspended by gang scheduling, which resulted in
undercounting allocated CPUs when a job suspended by gang scheduling
was active at the same time as a slurmctld reconfiguration request.
See bug 2353 (the original bug, related to commit 21c52d2f)
and bug 2765.

4ce62678

If no default account is given for a user when creating (only a list of · 9917c49d

Danny Auble authored May 26, 2016

accounts), no default account is printed; previously NULL was printed.

This change only stops printing it, but the whole function should probably be
revisited, as the rigmarole can probably be avoided: we always know what
the default is going to be if none is specified (the first in the list).

The problem with that, though, is the case where the user has already been
added to a cluster and has a default there, but is then added to a new cluster
where they don't have a default. In this case you want to keep the first
cluster's default, but set the default for the second cluster.

Bug 2725

9917c49d

Make it so --mail-type=NONE doesn't throw an invalid error. · 2a817734
Danny Auble authored May 26, 2016

2a817734

25 May, 2016 2 commits
- Prevent possible deadlock in acct_gather_filesystem/lustre · 8bb7711e
  Tim Wickberg authored May 24, 2016
```
Add missing unlock before return. Coverity 44888.
```
  8bb7711e
- Prevent multiple responses to REQUEST_UPDATE_JOB_STEP from missing break. · 3fa3ad1e
  Tim Wickberg authored May 24, 2016
```
Coverity 44891.
```
  3fa3ad1e
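
The message for 8bb7711e ("Add missing unlock before return") names a common deadlock pattern: an early error return that leaves a mutex held. The sketch below illustrates the pattern only and is not the lustre plugin's code. The names `fs_lock`, `read_stats_buggy()`, `read_stats_fixed()` and `lock_is_held()` are hypothetical.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical lock protecting some shared statistics. */
static pthread_mutex_t fs_lock = PTHREAD_MUTEX_INITIALIZER;

/* Buggy form: the error path returns with fs_lock still held. */
static int read_stats_buggy(int fail)
{
	pthread_mutex_lock(&fs_lock);
	if (fail)
		return -1;	/* missing unlock: the next locker blocks forever */
	pthread_mutex_unlock(&fs_lock);
	return 0;
}

/* Fixed form: every return path releases the lock. */
static int read_stats_fixed(int fail)
{
	pthread_mutex_lock(&fs_lock);
	if (fail) {
		pthread_mutex_unlock(&fs_lock);
		return -1;
	}
	pthread_mutex_unlock(&fs_lock);
	return 0;
}

/* Returns 1 if fs_lock is currently held, 0 otherwise. */
static int lock_is_held(void)
{
	if (pthread_mutex_trylock(&fs_lock) == EBUSY)
		return 1;
	pthread_mutex_unlock(&fs_lock);
	return 0;
}
```

After a failing call to `read_stats_fixed()` the lock is free, whereas after a failing call to `read_stats_buggy()` it remains held, so a later `pthread_mutex_lock()` on it would never return.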
24 May, 2016 6 commits
- Fix assignment instead of comparison to prevent infinite loop on failed execve. · a8601e91
  Tim Wickberg authored May 24, 2016
```
Coverity 44992.
```
  a8601e91
- Prevent possible deadlock in proctrack/lua. · 1d3da383
  Tim Wickberg authored May 24, 2016
```
Needs to unlock here, not re-lock the lock.
```
  1d3da383
- Add missing break to plugstack to prevent wrong verbose message. · 5ac6b927
  Tim Wickberg authored May 24, 2016
  
  5ac6b927
- Add missing break to sbcast option parsing. · 8013d254
  Tim Wickberg authored May 24, 2016
```
Prevent '--preserve' from being inadvertently enabled by '-j'.
```
  8013d254
- Add missing break to prevent sbatch --array from inadvertently enabling debug. · ada6f5c8
  Tim Wickberg authored May 24, 2016
  
  ada6f5c8
- Add missing break statement to fix CFULL_BLOCK distribution type. · 1189c849
  Tim Wickberg authored May 24, 2016
  
  1189c849
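
Several of the commits above add a missing `break`. For example, 8013d254 stops `-j` from also enabling `--preserve`. The sketch below shows how a missing `break` in a C `switch` produces that kind of side effect. The option record and both parser functions are hypothetical and are not the real sbcast parsing code.

```c
#include <stdbool.h>

/* Hypothetical option record; not the real sbcast data structure. */
struct bcast_opts {
	int job_id;
	bool preserve;
};

/* Buggy form: handling 'j' falls through into the 'p' case. */
static void parse_buggy(struct bcast_opts *o, int opt, int value)
{
	switch (opt) {
	case 'j':
		o->job_id = value;
		/* missing break: execution falls through into case 'p' */
	case 'p':
		o->preserve = true;
		break;
	default:
		break;
	}
}

/* Fixed form: each case ends with its own break. */
static void parse_fixed(struct bcast_opts *o, int opt, int value)
{
	switch (opt) {
	case 'j':
		o->job_id = value;
		break;
	case 'p':
		o->preserve = true;
		break;
	default:
		break;
	}
}
```

Passing `'j'` to `parse_buggy()` sets both `job_id` and `preserve`, while `parse_fixed()` sets only `job_id`.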
23 May, 2016 1 commit

Fix scancel(1) uninitialized condition variable · 370e828e

Nicolas Joly authored May 23, 2016

Still testing 16.05 on my NetBSD/amd64 workstation ...
Just encountered a crash with scancel(1).
njoly@lanfeust [~]> sbatch --wrap "sleep 3600"
Submitted batch job 4680
njoly@lanfeust [~]> scancel 4680
scancel: Error detected by libpthread: Invalid condition variable.
Detected by file "/local/src/NetBSD/src/lib/libpthread/pthread_cond.c", line 140, function "pthread_cond_timedwait".
See pthread(3) for information.
zsh: abort (core dumped)  scancel 4680
Checking the code shows that the pthread_cond_wait() call in scancel.c:_signal_job_by_str() indeed uses an uninitialised condition variable, "num_active_threads_cond".
The attached patch, which adds the missing pthread_cond_init(), seems to fix it.
Bug 2753

370e828e
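
The fix described in 370e828e is to call `pthread_cond_init()` before the condition variable is first waited on. The following is a minimal sketch of that pattern, not the scancel source. Only the name `num_active_threads_cond` comes from the commit message. The companion lock, the counter, `worker()` and `run_workers()` are assumptions made for illustration.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

static pthread_mutex_t num_active_threads_lock;	/* assumed companion lock */
static pthread_cond_t num_active_threads_cond;
static int num_active_threads;

/* Each worker announces completion by decrementing the counter. */
static void *worker(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&num_active_threads_lock);
	num_active_threads--;
	pthread_cond_signal(&num_active_threads_cond);
	pthread_mutex_unlock(&num_active_threads_lock);
	return NULL;
}

/* Start n workers and wait for all of them; returns 0 on success. */
static int run_workers(int n)
{
	pthread_t tids[MAX_WORKERS];
	int started = 0;

	if (n < 0 || n > MAX_WORKERS)
		return -1;

	pthread_mutex_init(&num_active_threads_lock, NULL);
	/* The step the commit message says was missing: */
	pthread_cond_init(&num_active_threads_cond, NULL);

	pthread_mutex_lock(&num_active_threads_lock);
	num_active_threads = 0;
	while (started < n &&
	       pthread_create(&tids[started], NULL, worker, NULL) == 0) {
		started++;
		num_active_threads++;
	}
	/* Workers block on the lock until pthread_cond_wait() releases it. */
	while (num_active_threads > 0)
		pthread_cond_wait(&num_active_threads_cond,
				  &num_active_threads_lock);
	pthread_mutex_unlock(&num_active_threads_lock);

	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	pthread_cond_destroy(&num_active_threads_cond);
	pthread_mutex_destroy(&num_active_threads_lock);
	return (started == n) ? 0 : -1;
}
```

Some platforms tolerate a zero-filled static condition variable, which may be why the crash was reported on NetBSD; explicit initialization avoids relying on that behaviour.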

18 May, 2016 2 commits
- MYSQL - Fix order of operations issue where if the database is locked up · d378b297
  Danny Auble authored May 18, 2016
```
and the slurmctld doesn't wait long enough for the response, it would give
up, leaving the connection open and creating a situation where the next message
sent could receive the response to the first one.

Bug 2739
```
  d378b297
- Fix MemSpecLimit to depend on TaskPlugin=task/cgroup and ConstrainRAMSpace. · f4ebc793
  Alejandro Sanchez authored May 18, 2016
```
Bug #2713.
```
  f4ebc793
17 May, 2016 4 commits
- Update man pages for SLURMD_NODENAME env var · d02ca09d
  Morris Jette authored May 17, 2016
```
Correct description of the SLURMD_NODENAME environment variable
in the sbatch and srun man pages.
```
  d02ca09d
- Correction to 370e397f to restore build. · 2d47779b
  Tim Wickberg authored May 17, 2016
  
  2d47779b
- Run autogen.sh. · 02daeacb
  Tim Wickberg authored May 17, 2016
  
  02daeacb
- Fix jobcomp/elasticsearch plugin build when curl is installed in a non standard location. · 370e397f
  Tim Wickberg authored May 17, 2016
  
  370e397f
16 May, 2016 2 commits
- Apply six patches from pkgsrc package to fix NetBSD build. · c8116026
  Jason Bacon authored May 16, 2016
  
  c8116026
- Fix typo in NEWS · c2cd292a
  Morris Jette authored May 16, 2016
  
  c2cd292a
13 May, 2016 2 commits

Performance fix for commit b1fbeb85 · d73d56ec
Danny Auble authored May 12, 2016

d73d56ec

Fix race condition with respect to cleaning up the profiling threads · b1fbeb85

Danny Auble authored May 12, 2016

when in use.

The problem here is that the polling threads in the various acct_gather codes
were detached and could possibly still be polling after the plugin had
been unloaded, causing a segfault with a backtrace like this:

#0  0x00007fe7af008c00 in ?? ()
#1  0x00007fe7b1138479 in __nptl_deallocate_tsd () at pthread_create.c:175
#2  0x00007fe7b11398b0 in __nptl_deallocate_tsd () at pthread_create.c:326
#3  start_thread (arg=0x7fe7b1f12700) at pthread_create.c:346
#4  0x00007fe7b0e6fb5d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

The fix was to make the threads non-detached and join them before calling
dlclose().

b1fbeb85
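
The approach described in b1fbeb85, keeping the polling thread joinable and joining it before the plugin is unloaded, can be sketched as follows. This is an illustration under stated assumptions, not the acct_gather code. `plugin_init()`, `plugin_fini()`, `poll_thread()` and the `poll_*` variables are hypothetical names, and the actual `dlclose()` call is left to the caller.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_t poll_tid;
static pthread_mutex_t poll_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t poll_cond = PTHREAD_COND_INITIALIZER;
static bool poll_shutdown;
static bool poll_started;

/* Polling loop: runs until plugin_fini() requests shutdown. */
static void *poll_thread(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&poll_lock);
	while (!poll_shutdown) {
		/* A real poller would gather a sample here and use
		 * pthread_cond_timedwait() with its sampling interval. */
		pthread_cond_wait(&poll_cond, &poll_lock);
	}
	pthread_mutex_unlock(&poll_lock);
	return NULL;
}

static int plugin_init(void)
{
	poll_shutdown = false;
	/* NULL attributes: the thread is joinable by default, not detached. */
	if (pthread_create(&poll_tid, NULL, poll_thread, NULL) != 0)
		return -1;
	poll_started = true;
	return 0;
}

static int plugin_fini(void)
{
	if (!poll_started)
		return 0;
	pthread_mutex_lock(&poll_lock);
	poll_shutdown = true;
	pthread_cond_signal(&poll_cond);
	pthread_mutex_unlock(&poll_lock);
	/* After the join returns, no plugin code is still executing,
	 * so the caller may safely dlclose() the plugin. */
	if (pthread_join(poll_tid, NULL) != 0)
		return -1;
	poll_started = false;
	return 0;
}
```

With a detached thread there is nothing to join, so the unloading code cannot know when the thread has stopped running code that lives in the plugin's address range.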

12 May, 2016 2 commits

Fix minor formatting · 30d82842
Danny Auble authored May 12, 2016

30d82842

If the cluster name and state are stored on NFS (with root_squash), · e422127c

Danny Auble authored May 12, 2016

trying to verify the cluster name (which may try to /create/ files or
directories) *before* dropping privs results in a fatal error, because
slurmctld tries to create items and those attempts ultimately fail. Moving
this step to after the privs and uid have changed allows
it to succeed.

Reported by Jon Nelson <jdnelson@dyn.com>

Bug 2728

e422127c

11 May, 2016 2 commits
- MySQL - Fix potential memory leak when rolling up data. · 24ff890f
  Danny Auble authored May 10, 2016
  
  24ff890f
- Fix issue when TopologyParam=NoInAddrAny is set the responses wouldn't · 03a9e836
  Danny Auble authored May 10, 2016
```
make it to the slurmctld when using message aggregation.
```
  03a9e836
10 May, 2016 4 commits
- If running cached information and the database loses all TRES information · 72f7c1fd
  Danny Auble authored May 10, 2016
```
make sure we handle it correctly when the database comes back up.
```
  72f7c1fd
- Perlapi - Remove unneeded/undefined mutex. · 7276f432
  Danny Auble authored May 10, 2016
  
  7276f432
- Fix wrong info message when updating a job cpus_per_task · 9f9d36a1
  Alejandro Sanchez authored May 10, 2016
  
  9f9d36a1
- Add array_inx to job_submit.lua plugin for job arrays. · f52a3317
  Tim Wickberg authored May 10, 2016
  
  f52a3317