Commits · 58bf4d793efd85f11c9749fe19a7446b45a4fae8 · Manuel G. Marciani / ces_slurm_simulator

21 Apr, 2016 7 commits

Fix tests for some configurations · 58bf4d79

Morris Jette authored Apr 21, 2016

Some portions of tests 21.30 and 21.34 failed with accounting and
priority basic. These changes disable portions of those tests as
needed based upon configuration.

58bf4d79

Disable tests 21.30 and 21.34 when running priority/basic. · 5083c09e
Brian Christiansen authored Apr 21, 2016

5083c09e
Fix priority plugin not removing full run mins. · 8977187e
Brian Christiansen authored Apr 21, 2016
```
The basic plugin doesn't do a decay, so it just needs to remove all of the allocated minutes.
```
8977187e
Fix indentation. · 5c843c11
Brian Christiansen authored Apr 21, 2016

5c843c11

Burst_buffer paths fix · 571915db

Morris Jette authored Apr 20, 2016

This adds some additional logic to the commit made to version 15.08
as needed for operation with version 16.04. Specifically, once a
persistent burst buffer is created in version 16.04, the create
flag is cleared to prevent attempts at duplicate buffer creation.
A new "use" persistent burst buffer is added for our needs (indicating
that a DataWarp "paths" operation is required). The first commit
is 905ac850.

571915db

burst_buffer/cray - fix create/destroy buffer only · 905ac850

Morris Jette authored Apr 20, 2016

burst_buffer/cray - Don't call Datawarp "paths" function if script includes
    only create or destroy of persistent burst buffer. Some versions of Datawarp
    software return an error for such scripts, causing the job to be held.
bug 2624

905ac850

Move some definitions into alphabetic order · 75fbaaca
Morris Jette authored Apr 20, 2016
```
No change in any logic or definitions
```
75fbaaca

20 Apr, 2016 7 commits

Avoid burst buffer plugin fail if no config file · b3459551
Morris Jette authored Apr 20, 2016

b3459551

Add time limits to some tests · 635c88be

Morris Jette authored Apr 19, 2016

Without these time limits and without time limits on the partitions,
the group usage limits become huge values and make validation of some
qos/association limit tests confusing.

635c88be

Fix assertion when using scancel --signal. · 4ff9ac09
Brian Christiansen authored Apr 20, 2016
```
Bug 2601
```
4ff9ac09
Fix race condition that causes segfault. · 2f593e5d
Brian Christiansen authored Apr 20, 2016
```
When using NO_NHC, the step's job ptr would be nulled out before signalling the tasks.
```
2f593e5d

Support the intel_pstate scaling driver · a4f35c45

Janne Blomqvist authored Apr 20, 2016

I noticed that the CpuFreqDef config option was only partially implemented. The value was parsed, but then never used. So I took the liberty of re-purposing it to mean sort of the opposite, namely the frequency governor to use when running a job step in case the job doesn't explicitly provide any --cpu-freq option.

I also changed the default of the CpuFreqGovernors option to be "ondemand,performance", since ondemand isn't available with the intel_pstate driver.

Otherwise the patch should be relatively straightforward and only changes a few minor things here and there.

a4f35c45

Forgot to include NEWS entry in staged changes. · 836decb1
Tim Wickberg authored Apr 19, 2016

836decb1

Fix build on FreeBSD · 2349d93d

Tim Wickberg authored Apr 19, 2016

a) setpgrp() swapped for equivalent setpgid(0, 0)

b) define _GNU_SOURCE to unmask getline function definition in stdlib.h

2349d93d
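
To illustrate item (a) of the FreeBSD build fix above: setpgrp() takes no arguments on System V style systems but two arguments on BSD systems, while setpgid(0, 0) has the same POSIX signature everywhere. The following is a minimal sketch of that call, not code from the commit; the function name child_gets_own_pgrp() is hypothetical. It runs the call in a forked child so that the calling process keeps its own process group.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that moves itself into a new process group with
 * setpgid(0, 0) and reports whether it became the group leader.
 * Returns 1 on success, 0 on failure, -1 if fork() or waitpid() failed.
 * Hypothetical helper for illustration; not taken from Slurm.
 */
int child_gets_own_pgrp(void)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* pid 0 = the calling process, pgid 0 = use the caller's PID. */
		if (setpgid(0, 0) != 0)
			_exit(1);
		_exit(getpgrp() == getpid() ? 0 : 1);
	}

	int status;
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A caller would expect child_gets_own_pgrp() to return 1 on any POSIX system, since a freshly forked child is never a session leader.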

16 Apr, 2016 1 commit

Fix race condition in a test · 0cbeddda

Morris Jette authored Apr 15, 2016

The test was sensitive with respect to a batch step starting before
requeuing the job. The batch step accounting record either appeared
in the accounting records or did not, depending upon timing. A sleep
has been added after the job enters RUNNING state to make sure the
batch step starts and an accounting record is generated for it.

0cbeddda

15 Apr, 2016 21 commits

Improve tests · b3c5cd09

Morris Jette authored Apr 15, 2016

Include test ID in the account name to better identify where
  vestigial accounts come from.

b3c5cd09

Remove umask call. · 726caf19

Brian Christiansen authored Apr 15, 2016

Coverity reported:
    CID 93013:  Error handling issues  (CHECKED_RETURN)
	    "read(int, void *, size_t)" returns the number of bytes read, but it is ignored.

umask() is also not thread-safe.

726caf19

[PATCH] Separate health check from shutdown check · 988edf12

Thomas Hamel authored Apr 15, 2016

While waiting for the HealthCheckProgram to succeed, slurmd can be
stopped. The previous behavior introduced a delay of up to 10 seconds
between the shutdown request and the actual shutdown. This patch
removes this delay.

988edf12

Remove siphash_init() and rely on fixed keying. · ba7a6869

Tim Wickberg authored Apr 15, 2016

Intentionally leave the key value fixed, rather than initialize it
from /dev/urandom as is commonly recommended. Slurm does not rely on
the hash function for any cryptographic functionality, and randomness
would make debugging harder if the hash key changed on each start.

ba7a6869
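
The fixed-key idea in the siphash commit above can be sketched as follows. This is a generic SipHash-2-4 implementation written for illustration, not Slurm's source; the function name siphash24_fixed() and the key constants (the byte sequence 00..0f, chosen here only because it is the published reference key) are assumptions. Because the key never changes, the same input hashes to the same value on every start, which is the debugging property the commit message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

/* Hypothetical fixed key (bytes 00..0f, little-endian halves). */
static const uint64_t FIXED_K0 = 0x0706050403020100ULL;
static const uint64_t FIXED_K1 = 0x0f0e0d0c0b0a0908ULL;

/* One SipRound over the four-word internal state. */
static void sipround(uint64_t v[4])
{
	v[0] += v[1]; v[1] = ROTL64(v[1], 13); v[1] ^= v[0]; v[0] = ROTL64(v[0], 32);
	v[2] += v[3]; v[3] = ROTL64(v[3], 16); v[3] ^= v[2];
	v[0] += v[3]; v[3] = ROTL64(v[3], 21); v[3] ^= v[0];
	v[2] += v[1]; v[1] = ROTL64(v[1], 17); v[1] ^= v[2]; v[2] = ROTL64(v[2], 32);
}

/* Read 8 bytes as a little-endian 64-bit word, independent of host order. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t w = 0;

	for (int i = 7; i >= 0; i--)
		w = (w << 8) | p[i];
	return w;
}

/* SipHash-2-4 of data[0..len) using the fixed key above. */
uint64_t siphash24_fixed(const void *data, size_t len)
{
	const uint8_t *in = data;
	uint64_t v[4] = {
		FIXED_K0 ^ 0x736f6d6570736575ULL,
		FIXED_K1 ^ 0x646f72616e646f6dULL,
		FIXED_K0 ^ 0x6c7967656e657261ULL,
		FIXED_K1 ^ 0x7465646279746573ULL,
	};
	size_t full = len & ~(size_t)7;	/* bytes covered by whole words */

	assert(data != NULL || len == 0);

	/* Compression: two rounds per 8-byte message word. */
	for (size_t i = 0; i < full; i += 8) {
		uint64_t m = load_le64(in + i);
		v[3] ^= m;
		sipround(v);
		sipround(v);
		v[0] ^= m;
	}

	/* Last word: leftover bytes plus the length in the top byte. */
	uint64_t b = (uint64_t)len << 56;
	for (size_t i = full; i < len; i++)
		b |= (uint64_t)in[i] << (8 * (i - full));
	v[3] ^= b;
	sipround(v);
	sipround(v);
	v[0] ^= b;

	/* Finalization: four rounds. */
	v[2] ^= 0xff;
	for (int i = 0; i < 4; i++)
		sipround(v);
	return v[0] ^ v[1] ^ v[2] ^ v[3];
}
```

A randomized key would instead be read from a source such as /dev/urandom at startup, which is the step the commit deliberately skips.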

Remove dead assignment. · 8da4aa58
Brian Christiansen authored Apr 15, 2016
```
Found by clang.
```
8da4aa58
Merge branch 'slurm-15.08' · cdf7a90b
Morris Jette authored Apr 15, 2016

cdf7a90b
Fix for coverity reported bug · 4f0e0236
Morris Jette authored Apr 15, 2016

4f0e0236
Fix for coverity reported error · 0e4dc730
Morris Jette authored Apr 15, 2016

0e4dc730
Harden deadline tests · 0d4442e9
Morris Jette authored Apr 15, 2016

0d4442e9

Don't start deadline job at submission · 16eff879

Morris Jette authored Apr 15, 2016

The job submit logic is not prepared to deal with deadline scheduling.
If a job is submitted with a deadline, defer its scheduling to the
main scheduling loop or backfill scheduler, which has logic to manage
deadlines.

16eff879

Determine if the CCM prologue needs to be rerun during job recovery · 5b660c7a

Marlys Konhke authored Apr 15, 2016

As part of the setup activity prior to invoking the CCM prologue on Cray native
Slurm systems, the job prolog_running value is incremented and the job_state is
OR'd with JOB_CONFIGURING. After the CCM prologue completes, these field
changes are removed. That setup activity allows the CCM prologue to complete
before the job launch continues.

If the slurmctld is shut down or killed while a CCM prologue is executing, those
two job field changes can't be removed since slurmctld is no longer there.
Clearing those field values is now handled during job recovery within the
select/cray plugin select_p_job_init() procedure. If a job being recovered came
from a CCM defined partition and if either of those two field values is still
set as above, then the CCM prologue is run again.

The CCM prologue handles being called more than once. The above field changes
are then removed after this rerun CCM prologue completes. The CCM epilogue is
not affected.

5b660c7a

Replace strcasecmp|str with Slurm variants. · cea685ee
Danny Auble authored Apr 12, 2016

cea685ee
Replace locale var for a #define · 6781c298
Danny Auble authored Apr 12, 2016

6781c298
Removed #ifdefs to compile code no matter what. · 1e3678b4
Danny Auble authored Apr 12, 2016

1e3678b4
Move functions into order given, no real code change. · e21844a9
Marlys Konhke authored Apr 11, 2016

e21844a9
Take leading '_' off extern functions and vice versa for static functions. · 48850f7a
Danny Auble authored Apr 11, 2016

48850f7a
Initial commit for changes needed to make CCM work on a Cray XT. · 3bc9fff4
Marlys Konhke authored Apr 11, 2016

3bc9fff4

Fix for job deadline with QOS MaxWall · 8598bab5

Morris Jette authored Apr 15, 2016

If a job was submitted with a deadline and no time_limit or min_time,
but the system has a QOS MaxWall, the job's time_limit would be set
to the QOS limit. Since there is no min_time specified, the QOS MaxWall
would be treated as a min and max time limit for the job and potentially
make the deadline impossible to satisfy. Now we set the min_time to
1 minute if there is a deadline, but no time_limit or min_time.

8598bab5
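
The rule stated in the QOS MaxWall commit above can be sketched as a small function. The struct, field names, and the "not set" sentinel SKETCH_NO_VAL are assumptions made for this example and are not taken from the Slurm source.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for "value not set"; an assumption for this sketch. */
#define SKETCH_NO_VAL 0xfffffffeu

/* Hypothetical, reduced view of a job submission. Times are in minutes. */
struct sketch_job_desc {
	time_t deadline;	/* 0 means no deadline requested */
	uint32_t time_limit;
	uint32_t min_time;
};

/*
 * If the job has a deadline but neither time_limit nor min_time,
 * set min_time to 1 minute so that a QOS MaxWall applied later is
 * only an upper bound and the scheduler may shrink the job to fit.
 */
void apply_deadline_min_time(struct sketch_job_desc *desc)
{
	assert(desc != NULL);
	if (desc->deadline != 0 &&
	    desc->time_limit == SKETCH_NO_VAL &&
	    desc->min_time == SKETCH_NO_VAL)
		desc->min_time = 1;
}
```

Jobs that specify either limit explicitly, or that have no deadline, pass through this function unchanged.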

Fix bsub test for front-end configuration · 3170aaa2
Morris Jette authored Apr 15, 2016
```
Also make sure the job is cancelled at the end of the test
```
3170aaa2
Improve a regression test · b1ddb40a
Morris Jette authored Apr 15, 2016

b1ddb40a

Network topology option · bd42eaf7

Morris Jette authored Apr 14, 2016

Add TopologyParam option of "TopoOptional" to optimize network topology
    only for jobs requesting it.
bug 2567

bd42eaf7

14 Apr, 2016 4 commits

file_bcast - add read/write locking to file transfer list · 0575fcb4

Tim Wickberg authored Apr 14, 2016

Time out stalled transfers and clean up related data structures. Default
to wait five minutes since last update. Hook onto registration/ping message
type to trigger cleanup in a minimally invasive manner.

While here, restructure certain functions to use list_* functions
rather than iterate on the structures.

0575fcb4
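
The stalled-transfer cleanup described in the file_bcast commit above might look roughly like the sweep below. This is a simplified sketch, not the committed code: it uses a hand-rolled singly linked list instead of Slurm's list_* functions, omits the read/write locking the commit adds, and every name in it is hypothetical. Only the five-minute default comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Five minutes since last update, per the commit message. */
#define SKETCH_STALL_TIMEOUT 300

/* Hypothetical record for one in-progress file transfer. */
struct sketch_transfer {
	struct sketch_transfer *next;
	time_t last_update;	/* time of the most recent block received */
};

/*
 * Unlink and free every transfer idle for longer than the timeout.
 * Returns the number of entries removed. A real implementation would
 * hold a write lock on the list while doing this.
 */
size_t sweep_stalled(struct sketch_transfer **head, time_t now)
{
	size_t removed = 0;

	assert(head != NULL);
	while (*head) {
		struct sketch_transfer *t = *head;

		if (now - t->last_update > SKETCH_STALL_TIMEOUT) {
			*head = t->next;	/* unlink, then release */
			free(t);
			removed++;
		} else {
			head = &t->next;	/* keep, advance */
		}
	}
	return removed;
}
```

In the commit's design the sweep is triggered from the registration/ping message handler, so no dedicated timer thread is needed.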

Don't set stage_out email for a Cray Burst Buffer if not set. · 40f8cca3
Tim Wickberg authored Apr 14, 2016
```
Otherwise --mail-type=ALL will send an unexpected stage_out message back
to the user.

Bug 2541.
```
40f8cca3
Don't set stage_out email for a Cray Burst Buffer if not set. · 523d193e
Tim Wickberg authored Apr 14, 2016
```
Otherwise --mail-type=ALL will send an unexpected stage_out message back
to the user.

Bug 2541.
```
523d193e
Add "--with-cray_dir" build/configure option · 1768a63a
Morris Jette authored Apr 14, 2016

1768a63a