1. 18 Mar, 2016 1 commit
    • Fix for srun abort on SIGSTOP+SIGCONT · 1ed38f26
      Morris Jette authored
      Avoid possibly aborting srun that gets simultaneous SIGSTOP+SIGCONT while
          creating the job step. The result is that the signal handler gets an
          argument (the signal received) of zero.
      
      Here's a log, window 1:
      $ srun hostname
      srun: Job step creation temporarily disabled, retrying
      srun: I Got signal 18
      srun: I Got signal 18
      srun: I Got signal 18
      srun: I Got signal 18
      srun: I Got signal 18
      srun: I Got signal 18
      srun: I Got signal 18
      srun: I Got signal 18
      srun: I Got signal 18
      srun: I Got signal 18
      srun: I Got signal 18
      srun: I Got signal 18
      srun: I Got signal 0
      srun: Cancelled pending job step
      
      Window 2:
      $  kill -STOP 18696 ; kill -CONT 18696
      $  kill -STOP 18696 ; kill -CONT 18696
      $  kill -STOP 18696 ; kill -CONT 18696
      ....
      
      bug 2494
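
      A minimal sketch of the guard this fix implies, with a hypothetical
      handler name; an illustration, not the actual Slurm patch:

      #include <signal.h>
      #include <unistd.h>

      /* Hypothetical handler: under a simultaneous SIGSTOP+SIGCONT the
       * handler can be entered with a signal number of zero, so treat
       * that as a spurious wakeup instead of cancelling the step. */
      static void _handle_signal(int signo)
      {
              static const char msg[] = "srun: I Got signal\n";

              if (signo == 0)         /* STOP+CONT race, not a real signal */
                      return;
              /* write(2) is async-signal-safe, unlike printf(3) */
              (void) write(STDERR_FILENO, msg, sizeof(msg) - 1);
      }

      int main(void)
      {
              signal(SIGCONT, _handle_signal);
              pause();                /* wait here for a signal to arrive */
              return 0;
      }
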
  2. 17 Mar, 2016 2 commits
    • Change calculation of node's allocated CPUs · ec50cb2f
      Morris Jette authored
      Change how a node's allocated CPU count is calculated to avoid double
          counting CPUs allocated to multiple jobs at the same time.
          Previous logic would sum the maximum number of CPUs allocated by each
          partition for any time slice, which could double count CPUs allocated
          to multiple jobs. New logic ORs bitmap of allocated CPUs for every
          partition and time slice, then counts the total for a given node.
          This avoids double counting CPUs allocated to multiple jobs, but
          does not remove from the count CPUs which have been allocated to
          jobs which might be suspended by the gang scheduler (either for
          time slicing or preemption).
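
      The difference between the two counting schemes can be shown with
      plain bitmasks (hypothetical values standing in for the per-partition
      allocated-CPU bitmaps):

      #include <stdint.h>
      #include <stdio.h>

      int main(void)
      {
              /* Allocated-CPU masks for three partitions on one 8-CPU
               * node; CPUs 0-2 appear in more than one partition. */
              uint64_t part_alloc[] = { 0x03, 0x07, 0x0c };
              uint64_t combined = 0;
              int old_count = 0;

              for (int i = 0; i < 3; i++) {
                      old_count += __builtin_popcountll(part_alloc[i]); /* old: sum */
                      combined |= part_alloc[i];                        /* new: OR  */
              }
              /* the sum reports 7 CPUs, the OR'd bitmap reports 4 */
              printf("summed: %d, OR'd: %d\n",
                     old_count, __builtin_popcountll(combined));
              return 0;
      }
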
    • Prevent uid update from corrupting assoc_hash table. · 60b58b70
      Tim Wickberg authored
      The uid is used as part of the hash function, so the old reference must
      be removed and the hash recalculated if the uid may change; otherwise
      _delete_assoc_hash will not find the entry when the association is
      removed, causing slurmctld to segfault.
      
      Bug 2560.
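
      A sketch of the delete-mutate-reinsert pattern the fix implies;
      _delete_assoc_hash is named above, while the struct and
      _add_assoc_hash are illustrative assumptions:

      #include <sys/types.h>

      typedef struct assoc {
              uid_t uid;              /* part of the hash key */
              /* ... other association fields ... */
      } assoc_t;

      void _delete_assoc_hash(assoc_t *assoc);  /* removes by current uid */
      void _add_assoc_hash(assoc_t *assoc);     /* inserts by current uid */

      /* Since the uid feeds the hash function, the entry must be pulled
       * out of the table before the uid changes and re-inserted after;
       * otherwise a later delete hashes to the wrong bucket. */
      static void _update_assoc_uid(assoc_t *assoc, uid_t new_uid)
      {
              if (assoc->uid == new_uid)
                      return;
              _delete_assoc_hash(assoc);
              assoc->uid = new_uid;
              _add_assoc_hash(assoc);
      }
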
  3. 16 Mar, 2016 8 commits
  4. 15 Mar, 2016 2 commits
  5. 14 Mar, 2016 2 commits
  6. 12 Mar, 2016 1 commit
  7. 11 Mar, 2016 3 commits
  8. 10 Mar, 2016 5 commits
    • cray job requeue bug · 536c8451
      Morris Jette authored
      Fix Cray NHC spawning on job requeue. Previous logic would leave nodes
      allocated to a requeued job as non-usable on job termination.
      
      Specifically, each job has a "cleaning/cleaned" flag. Once a job
      terminates, the cleaning flag is set, then after the job node health
      check completes, the value gets set to cleaned. If the job is requeued,
      on its second (or subsequent) termination, the select/cray plugin
      is called to launch the NHC. The plugin sees the "cleaned" flag
      already set, so it logs:
      error: select_p_job_fini: Cleaned flag already set for job 1283858, this should never happen
      and returns without launching the NHC. Since completion of the job's
      NHC is what triggers the release of job resources (CPUs, memory, and
      GRES), those resources are never released for use by other jobs.
      
      Bug 2384
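
      One plausible shape of the state handling, with hypothetical names;
      the real logic lives in the select/cray plugin:

      /* Hypothetical cleaning/cleaned state for a job record. */
      enum clean_state { CLEAN_NONE, CLEANING, CLEANED };

      struct job_record {
              int job_id;
              enum clean_state clean;
      };

      void _spawn_nhc(struct job_record *job);   /* assumed helper */

      static void _job_fini(struct job_record *job)
      {
              /* A requeued job terminates more than once, so a prior
               * CLEANED state must not block a fresh NHC run. */
              if (job->clean == CLEANED)
                      job->clean = CLEAN_NONE;
              if (job->clean == CLEANING)
                      return;                 /* NHC already in flight */
              job->clean = CLEANING;
              _spawn_nhc(job);                /* sets CLEANED when done */
      }
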
    • Correctly parse nids in slurmconfgen_smw.py · e050806e
      David Gloe authored
      An error in slurmconfgen_smw.py caused it to parse the nic as the nid.
      On some systems those values differ, causing the generated slurm.conf file to
      be incorrect.
      
      Bug 2532.
    • Fix route/topology plugin to prevent segfault in sbcast. · 0dfc924c
      Bill Brophy authored
      route_p_split_hostlist was not thread-safe, and would cause
      one of several segfaults depending on where in the initialization
      code each thread was.
      
      Bug 2495.
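
      A common way to make one-time initialization like this thread-safe
      is pthread_once; a sketch under assumed names:

      #include <pthread.h>

      static pthread_once_t init_once = PTHREAD_ONCE_INIT;

      static void _route_init(void)
      {
              /* build topology tables, etc. (illustrative) */
      }

      /* Every caller funnels through pthread_once, so no thread can
       * observe the plugin's state half-initialized. */
      int route_split_hostlist_safe(void)
      {
              pthread_once(&init_once, _route_init);
              /* ... split the hostlist using the initialized state ... */
              return 0;
      }
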
    • Fix displayed value for RoutePlugin. · db8491f1
      Tim Wickberg authored
      Was incorrectly displaying "(null)" even when loaded successfully.
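
      For context, glibc's printf prints "(null)" when %s is given a NULL
      pointer (strictly undefined behavior), so a never-initialized field
      displays like this:

      #include <stdio.h>

      int main(void)
      {
              const char *route_plugin = NULL;    /* value never filled in */
              /* glibc prints "RoutePlugin = (null)" here */
              printf("RoutePlugin = %s\n", route_plugin);
              return 0;
      }
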
    • Add NEWS for commit 3bb2e602 · a0be0dc5
      Morris Jette authored
  9. 09 Mar, 2016 2 commits
    • cray job requeue bug · fec5e03b
      Morris Jette authored
      Fix Cray NHC spawning on job requeue. Previous logic would leave nodes
      allocated to a requeued job as non-usable on job termination.
      
      Specifically, each job has a "cleaning/cleaned" flag. Once a job
      terminates, the cleaning flag is set, then after the job node health
      check completes, the value gets set to cleaned. If the job is requeued,
      on its second (or subsequent) termination, the select/cray plugin
      is called to launch the NHC. The plugin sees the "cleaned" flag
      already set, so it logs:
      error: select_p_job_fini: Cleaned flag already set for job 1283858, this should never happen
      and returns without launching the NHC. Since completion of the job's
      NHC is what triggers the release of job resources (CPUs, memory, and
      GRES), those resources are never released for use by other jobs.
      
      Bug 2384
    • Correctly parse nids in slurmconfgen_smw.py · 88ccc111
      David Gloe authored
      An error in slurmconfgen_smw.py caused it to parse the nic as the nid.
      On some systems those values differ, causing the generated slurm.conf file to
      be incorrect.
      
      Bug 2532.
  10. 08 Mar, 2016 2 commits
  11. 07 Mar, 2016 1 commit
  12. 05 Mar, 2016 2 commits
  13. 04 Mar, 2016 3 commits
  14. 03 Mar, 2016 5 commits
    • Defer slurmd registration until NodeHealthCheck · 7fb0c981
      Thomas Hamel authored
      We want to introduce a new behavior in the way slurmd uses the
      HealthCheckProgram. The idea is to avoid a race condition between the
      first HealthCheckProgram run and the node accepting jobs. The slurmd
      daemon will initialize and then loop on HealthCheckProgram execution
      before registering with slurmctld. It will stay in this loop until
      the HealthCheckProgram returns successfully (while in the loop, the
      node remains DOWN).
      
      On our clusters we use NHC as the HealthCheckProgram. NHC drains the
      node if the check fails and removes the drain if it succeeds; this
      behavior fits our purpose well. It permits us to start slurmd at boot
      without setting up a complex boot sequence in the init system: slurmd
      simply waits for the node to be ready before registering.
      
      The HealthCheckProgram is not run during slurmd startup if
      HealthCheckInterval is 0.
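
      The described startup sequence reduces to a retry loop; a sketch
      with assumed helper names:

      #include <unistd.h>

      int run_health_check_program(void);     /* 0 on success (assumed) */
      void register_with_slurmctld(void);     /* assumed */

      /* Keep the node out of service (DOWN) until the first successful
       * HealthCheckProgram run; skip the loop entirely if the interval
       * is 0, matching the note above. */
      static void _wait_until_healthy(int health_check_interval)
      {
              if (health_check_interval == 0)
                      return;
              while (run_health_check_program() != 0)
                      sleep(health_check_interval);
      }

      /* startup order: slurmd init, then
       * _wait_until_healthy(interval); register_with_slurmctld(); */
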
    • Danny Auble · 72f13426
    • Brian Christiansen · 5c43d754
    • Increase step GRES variable size · 7f0bdc84
      Morris Jette authored
      Step GRES value changed from type "int" to "int64_t" to support larger
      values. Previous logic could fail for step allocation values over 32
      bits. Other GRES values are already 64-bit.
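
      The overflow the change guards against is easy to demonstrate
      (illustrative values; converting an out-of-range value to int32_t is
      implementation-defined):

      #include <stdint.h>
      #include <stdio.h>

      int main(void)
      {
              /* a byte-counted GRES of 8 GiB is 2^33, which does not
               * fit in 32 bits */
              int64_t requested = 8LL * 1024 * 1024 * 1024;
              int32_t truncated = (int32_t) requested;  /* loses high bits */

              printf("int64_t: %lld  int32_t: %d\n",
                     (long long) requested, truncated);
              return 0;
      }
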
    • Force close on exec on first 256 file descriptors when launching a
      slurmstepd to close potential open ones. · f502f1e5
      Danny Auble authored
      
      It was pointed out that slurmd, when using acct_gather_energy/ipmi,
      links to freeipmi, which could open /dev/ipmi0 as root without the
      close-on-exec flag set while launching a step, leaving it open in the
      user's application.
      
      This sets the flag on the first 256 file descriptors to mitigate the
      concern.
      
      Reported by Maksym Planeta.
      
      Bug 2506
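
      A sketch of the mitigation: flag descriptors close-on-exec before
      the exec. The real code presumably handles the stdio descriptors
      separately; this illustration starts at 3:

      #include <fcntl.h>

      /* Mark fds 3..255 close-on-exec so anything a linked library left
       * open (e.g. /dev/ipmi0) does not leak into the exec'd step. */
      static void _set_cloexec_first_256(void)
      {
              for (int fd = 3; fd < 256; fd++) {
                      int flags = fcntl(fd, F_GETFD);

                      if (flags >= 0)
                              (void) fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
              }
      }
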
  15. 02 Mar, 2016 1 commit