Commits · 2349d93d50f8447dd40563cc76732aaef070fa16 · Manuel G. Marciani / ces_slurm_simulator

20 Apr, 2016 1 commit

Tim Wickberg authored Apr 19, 2016

a) setpgrp() swapped for equivalent setpgid(0, 0)

b) define _GNU_SOURCE to unmask getline function definition in stdlib.h

2349d93d
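Here is a minimal sketch of both changes in 2349d93d, assuming a glibc/Linux build. glibc declares getline() in <stdio.h>, but it hides the declaration unless a suitable feature-test macro such as _GNU_SOURCE is defined before the first #include (for example, under a strict -std=c99 build). setpgid(0, 0) is the POSIX form of the System V setpgrp(void). The historical BSD setpgrp() takes two arguments, which is one reason to prefer the portable spelling. The helper names become_pgrp_leader() and count_lines() are hypothetical.

```c
/* Must precede every #include so that glibc exposes getline(). */
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Make the caller the leader of a new process group whose ID equals its
 * PID; same effect as System V setpgrp(void). Returns 0 on success, or
 * -1 with errno set (EPERM if the caller is already a session leader). */
static int become_pgrp_leader(void)
{
	return setpgid(0, 0);
}

/* Count lines in fp; getline() allocates and grows buf as needed. */
static size_t count_lines(FILE *fp)
{
	char *buf = NULL;
	size_t cap = 0, n = 0;

	assert(fp != NULL);
	while (getline(&buf, &cap, fp) != -1)
		n++;
	free(buf);
	return n;
}
```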

16 Apr, 2016 1 commit

Fix race condition in a test · 0cbeddda

Morris Jette authored Apr 15, 2016

The test was sensitive to whether the batch step started before the job
was requeued. The batch step accounting record either appeared in the
accounting records or did not, depending on timing. A sleep has been
added after the job enters the RUNNING state to make sure the batch
step starts and an accounting record is generated for it.

0cbeddda

15 Apr, 2016 21 commits

Improve tests · b3c5cd09

Morris Jette authored Apr 15, 2016

Include test ID in the account name to better identify where
vestigial accounts come from.

b3c5cd09

Remove umask call. · 726caf19

Brian Christiansen authored Apr 15, 2016

Coverity reported:
    CID 93013:  Error handling issues  (CHECKED_RETURN)
	    "read(int, void *, size_t)" returns the number of bytes read, but it is ignored.

umask() is also not thread-safe.

726caf19
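Coverity's CHECKED_RETURN report is about an ignored read() return value. The commit resolves it by removing the code. For comparison, here is a minimal sketch of what checking the return value properly involves: read() can return fewer bytes than requested, or fail with EINTR. read_full() is a hypothetical helper, not a Slurm function.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes from fd, retrying on short reads and EINTR.
 * Returns the number of bytes read (less than len only at EOF), or -1
 * on error with errno set. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;

	assert(buf != NULL || len == 0);
	while (done < len) {
		ssize_t n = read(fd, p + done, len - done);
		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted by a signal: retry */
			return -1;
		}
		if (n == 0)
			break; /* EOF */
		done += (size_t) n;
	}
	return (ssize_t) done;
}
```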

[PATCH] Separate health check from shutdown check · 988edf12

Thomas Hamel authored Apr 15, 2016

While waiting for the HealthCheckProgram to succeed, slurmd can be
stopped. The previous behavior introduced a delay of up to 10 seconds
between the shutdown request and the actual shutdown. This patch
removes this delay.

988edf12
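The idea in 988edf12 can be sketched as splitting one long sleep into short slices and checking for a shutdown request between them. A stop request is then noticed within one slice rather than after the whole retry interval. This illustration rests on assumptions: shutdown_requested, interruptible_sleep(), wait_for_health() and the 100 ms slice are all hypothetical, and slurmd's actual loop may be structured differently.

```c
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical flag; a SIGTERM handler would set it to request shutdown. */
static volatile sig_atomic_t shutdown_requested = 0;

/* Sleep total_ms in slices of slice_ms, checking the shutdown flag between
 * slices. Returns true if shutdown was requested, false if the full
 * interval elapsed. */
static bool interruptible_sleep(long total_ms, long slice_ms)
{
	assert(total_ms >= 0 && slice_ms > 0);
	for (long waited = 0; waited < total_ms; waited += slice_ms) {
		if (shutdown_requested)
			return true;
		long ms = (total_ms - waited < slice_ms) ? total_ms - waited
							 : slice_ms;
		struct timespec ts = { .tv_sec = ms / 1000,
				       .tv_nsec = (ms % 1000) * 1000000L };
		nanosleep(&ts, NULL);
	}
	return shutdown_requested != 0;
}

/* Retry check() (hypothetical stand-in for running HealthCheckProgram)
 * every retry_ms until it succeeds or shutdown is requested. Returns true
 * only if the health check passed. */
static bool wait_for_health(bool (*check)(void), long retry_ms)
{
	while (!shutdown_requested) {
		if (check())
			return true;
		if (interruptible_sleep(retry_ms, 100))
			break;
	}
	return false;
}
```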

Remove siphash_init() and rely on fixed keying. · ba7a6869

Tim Wickberg authored Apr 15, 2016

Intentionally leave the key value fixed, rather than initializing it
from /dev/urandom as is commonly recommended. Slurm does not rely on
the hash function for any cryptographic functionality, and randomness
would make debugging harder if the hash key changed on each start.

ba7a6869

Remove dead assignment. · 8da4aa58

Brian Christiansen authored Apr 15, 2016

Found by clang.

8da4aa58

Merge branch 'slurm-15.08' · cdf7a90b

Morris Jette authored Apr 15, 2016

cdf7a90b

Fix for Coverity reported bug · 4f0e0236

Morris Jette authored Apr 15, 2016

4f0e0236

Fix for Coverity reported error · 0e4dc730

Morris Jette authored Apr 15, 2016

0e4dc730

Harden deadline tests · 0d4442e9

Morris Jette authored Apr 15, 2016

0d4442e9

Don't start deadline job at submission · 16eff879

Morris Jette authored Apr 15, 2016

The job submit logic is not prepared to deal with deadline scheduling.
If a job is submitted with a deadline, defer its scheduling to the
main scheduling loop or backfill scheduler, which has logic to manage
deadlines.

16eff879
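Here is a minimal sketch of the submit-time check in 16eff879. It assumes a job record that stores 0 when no deadline was requested. struct job_rec and can_start_at_submit() are hypothetical names. The only point is that the submit path declines to start deadline jobs and leaves them to the schedulers that handle deadlines.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, trimmed-down job record. */
struct job_rec {
	time_t deadline; /* 0 means no deadline was requested */
};

/* May the submit path try to start this job immediately? Jobs with a
 * deadline stay pending for the main or backfill scheduler. */
static bool can_start_at_submit(const struct job_rec *job)
{
	assert(job != NULL);
	return job->deadline == 0;
}
```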

Determine if the CCM prologue needs to be rerun during job recovery · 5b660c7a

Marlys Konhke authored Apr 15, 2016

As part of the setup activity prior to invoking the CCM prologue on Cray native
Slurm systems, the job prolog_running value is incremented and the job_state is
OR'd with JOB_CONFIGURING. After the CCM prologue completes, these field
changes are removed. That setup activity allows the CCM prologue to complete
before the job launch continues.

If slurmctld is shut down or killed while a CCM prologue is executing, those
two job field changes can't be removed since slurmctld is no longer running.
Clearing those field values is now handled during job recovery within the
select/cray plugin select_p_job_init() procedure. If a job being recovered came
from a CCM-defined partition and either of those two field values is still
set as above, then the CCM prologue is run again.

The CCM prologue handles being called more than once. The above field changes
are then removed after this rerun CCM prologue completes. The CCM epilogue is
not affected.

5b660c7a

Replace strcasecmp|str with Slurm variants. · cea685ee

Danny Auble authored Apr 12, 2016

cea685ee

Replace locale var with a #define · 6781c298

Danny Auble authored Apr 12, 2016

6781c298

Removed #ifdefs to compile code no matter what. · 1e3678b4

Danny Auble authored Apr 12, 2016

1e3678b4

Move functions into order given, no real code change. · e21844a9

Marlys Konhke authored Apr 11, 2016

e21844a9

Take leading '_' off extern functions and vice versa for static functions. · 48850f7a

Danny Auble authored Apr 11, 2016

48850f7a

Initial commit for changes needed to make CCM work on a Cray XT. · 3bc9fff4

Marlys Konhke authored Apr 11, 2016

3bc9fff4

Fix for job deadline with QOS MaxWall · 8598bab5

Morris Jette authored Apr 15, 2016

If a job was submitted with a deadline and no time_limit or min_time,
but the system has a QOS MaxWall, the job's time_limit would be set
to the QOS limit. Since there is no min_time specified, the QOS MaxWall
would be treated as both the min and max time limit for the job and could
make the deadline impossible to satisfy. Now we set the min_time to
1 minute if there is a deadline but no time_limit or min_time.

8598bab5
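The defaulting rule in 8598bab5 can be sketched as follows. NO_VAL_U32 is a hypothetical stand-in for an "unset" marker (Slurm uses its own NO_VAL constant for this purpose). struct job_limits is a simplified illustration, not the real job descriptor.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NO_VAL_U32 0xfffffffeu /* hypothetical "not specified" marker */

/* Hypothetical subset of a job request; times in minutes. */
struct job_limits {
	time_t   deadline;   /* 0: no deadline */
	uint32_t time_limit; /* NO_VAL_U32 if not given */
	uint32_t min_time;   /* NO_VAL_U32 if not given */
};

/* With a deadline but neither time_limit nor min_time, default min_time
 * to 1 minute so a later QOS MaxWall only caps the run time. */
static void apply_deadline_default(struct job_limits *job)
{
	assert(job != NULL);
	if (job->deadline != 0 && job->time_limit == NO_VAL_U32 &&
	    job->min_time == NO_VAL_U32)
		job->min_time = 1;
}
```

Setting the minimum to 1 minute means a QOS MaxWall applied afterward acts only as an upper bound. The scheduler then still has room to pick a shorter run time that fits before the deadline.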

Fix bsub test for front-end configuration · 3170aaa2

Morris Jette authored Apr 15, 2016

Also make sure the job is cancelled at the end of the test

3170aaa2

Improve a regression test · b1ddb40a

Morris Jette authored Apr 15, 2016

b1ddb40a

Network topology option · bd42eaf7

Morris Jette authored Apr 14, 2016

Add TopologyParam option of "TopoOptional" to optimize network topology
only for jobs requesting it.

Bug 2567

bd42eaf7

14 Apr, 2016 17 commits
- file_bcast - add read/write locking to file transfer list · 0575fcb4
  Tim Wickberg authored Apr 14, 2016
```
Time out stalled transfers and clean up related data structures. Default
to waiting five minutes since the last update. Hook onto the
registration/ping message type to trigger cleanup in a minimally
invasive manner.

While here, restructure certain functions to use list_* functions
rather than iterate on the structures.
```
  0575fcb4
- Don't set stage_out email for a Cray Burst Buffer if not set. · 40f8cca3
  Tim Wickberg authored Apr 14, 2016
```
Otherwise --mail-type=ALL will send an unexpected stage_out message back
to the user.

Bug 2541.
```
  40f8cca3
- Don't set stage_out email for a Cray Burst Buffer if not set. · 523d193e
  Tim Wickberg authored Apr 14, 2016
```
Otherwise --mail-type=ALL will send an unexpected stage_out message back
to the user.

Bug 2541.
```
  523d193e
- Add "--with-cray_dir" build/configure option · 1768a63a
  Morris Jette authored Apr 14, 2016
  
  1768a63a
- Introduce siphash, and use it. · 1a764b45
  Janne Blomqvist authored Apr 14, 2016
```
SipHash is a state-of-the-art keyed hash function whose performance is competitive with the usual non-cryptographic hash functions. It is used as the default hash function backing hash tables in, e.g., Perl, Python, and Rust. Here we initially use it for the gid cache hash table and in the common xhash implementation.
```
  1a764b45
- Add SipHash reference implementation · 3d946a02
  Jean-Philippe Aumasson authored Apr 14, 2016
  
  3d946a02
- Partial revert of commit 2dd920b9 which added duplicate code which broke · 51fa53b1
  Danny Auble authored Apr 14, 2016
```
sacctmgr list events
```
  51fa53b1
- Remove extraneous null check to silence Coverity warning. · db599fa5
  Tim Wickberg authored Apr 14, 2016
```
step_ptr->job_ptr is already dereferenced several times by now, so
the null check is unnecessary here.
```
  db599fa5
- Fix bluegene build. · a25d37c0
  Brian Christiansen authored Apr 14, 2016
  
  a25d37c0
- Merge branch 'slurm-15.08' · 3f6a4720
  Morris Jette authored Apr 14, 2016
```
Conflicts:
	NEWS
	src/plugins/accounting_storage/mysql/as_mysql_resv.c
```
  3f6a4720
- Set burst buffer reason for job · 49d483db
  Morris Jette authored Apr 14, 2016
```
If a job fails stage in, set its reason to BurstBufferOperation
with a string describing what happened. Previously the reason was
set to AdminHeld on stage-in failure.
```
  49d483db
- Update NEWS · dced435d
  Brian Christiansen authored Apr 14, 2016
```
For commits:
f980c588
510abf23
```
  dced435d
- Fix documentation · f1b82903
  Brian Christiansen authored Apr 14, 2016
  
  f1b82903
- Use function to get sacct --units. · 819bd3ff
  Brian Christiansen authored Apr 13, 2016
  
  819bd3ff
- Parse tres values with unit suffixes [KMGTP] · f980c588
  Brian Christiansen authored Apr 13, 2016
  
  f980c588
- Convert tres values to corresponding units. · 510abf23
  Brian Christiansen authored Apr 13, 2016
```
MB for memory and bb (burst buffer).
```
  510abf23
- Add --units=[KMGP] option to sacct to display values in specific unit type. · e937072b
  Brian Christiansen authored Apr 12, 2016
```
Bug 1783
```
  e937072b
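Here is a minimal sketch of the locking and timeout scheme described in 0575fcb4. Lookups take a pthread read/write lock in shared mode, while adding and expiring transfers take it exclusively. A periodic hook (in the commit, the registration/ping message handling) drops transfers that have been idle for more than five minutes. The sketch uses a plain singly linked list to stay self-contained, whereas Slurm has its own list_* API. struct transfer and the function names are hypothetical. On older glibc versions, it must be linked with -pthread.

```c
#define _POSIX_C_SOURCE 200809L /* for pthread_rwlock_* */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

#define BCAST_TIMEOUT_SEC 300 /* five minutes since last update */

/* Hypothetical transfer record; a real one would carry file state. */
struct transfer {
	int              id;
	time_t           last_update;
	struct transfer *next;
};

static struct transfer *transfers = NULL;
static pthread_rwlock_t transfer_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Lookup: shared (read) lock, so concurrent lookups do not block. */
static int transfer_exists(int id)
{
	int found = 0;

	pthread_rwlock_rdlock(&transfer_lock);
	for (struct transfer *t = transfers; t != NULL; t = t->next) {
		if (t->id == id) {
			found = 1;
			break;
		}
	}
	pthread_rwlock_unlock(&transfer_lock);
	return found;
}

/* Insert: exclusive (write) lock while the list is modified. */
static void transfer_add(int id, time_t now)
{
	struct transfer *t = malloc(sizeof(*t));

	assert(t != NULL);
	t->id = id;
	t->last_update = now;
	pthread_rwlock_wrlock(&transfer_lock);
	t->next = transfers;
	transfers = t;
	pthread_rwlock_unlock(&transfer_lock);
}

/* Remove transfers idle longer than BCAST_TIMEOUT_SEC; returns how many
 * were removed. Meant to run from an existing periodic hook. */
static int transfer_cleanup(time_t now)
{
	int removed = 0;

	pthread_rwlock_wrlock(&transfer_lock);
	for (struct transfer **pp = &transfers; *pp != NULL;) {
		struct transfer *t = *pp;
		if (difftime(now, t->last_update) > BCAST_TIMEOUT_SEC) {
			*pp = t->next; /* unlink, then free */
			free(t);
			removed++;
		} else {
			pp = &t->next;
		}
	}
	pthread_rwlock_unlock(&transfer_lock);
	return removed;
}
```

Because cleanup piggybacks on a message that already arrives periodically, no extra timer thread is needed. This matches the commit's goal of a minimally invasive hook.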
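For reference, here is a compact SipHash-2-4 (two compression rounds per 8-byte block, four finalization rounds) written in the spirit of the reference implementation added in 3d946a02. It is a sketch, not Slurm's copy: siphash24(), load_le64() and the SIPROUND/ROTL64 macros are illustrative names. Following ba7a6869, a caller would pass a fixed, compiled-in key rather than one read from /dev/urandom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

/* One SipRound: add-rotate-xor mixing of the four state words. */
#define SIPROUND(v0, v1, v2, v3)                                \
	do {                                                    \
		v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0;        \
		v0 = ROTL64(v0, 32);                            \
		v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;        \
		v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;        \
		v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2;        \
		v2 = ROTL64(v2, 32);                            \
	} while (0)

/* Load 8 bytes as a little-endian word, independent of host byte order. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* SipHash-2-4 of len bytes at data under a 16-byte key. */
static uint64_t siphash24(const uint8_t key[16], const void *data, size_t len)
{
	const uint8_t *in = data;
	uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
	uint64_t v0 = k0 ^ 0x736f6d6570736575ULL;
	uint64_t v1 = k1 ^ 0x646f72616e646f6dULL;
	uint64_t v2 = k0 ^ 0x6c7967656e657261ULL;
	uint64_t v3 = k1 ^ 0x7465646279746573ULL;
	size_t tail = len & 7;
	const uint8_t *end = in + (len - tail);
	uint64_t m, b = (uint64_t) len << 56; /* length mod 256 in top byte */

	for (; in != end; in += 8) { /* compression: 2 rounds per block */
		m = load_le64(in);
		v3 ^= m;
		SIPROUND(v0, v1, v2, v3);
		SIPROUND(v0, v1, v2, v3);
		v0 ^= m;
	}
	for (size_t i = 0; i < tail; i++) /* last 0-7 bytes */
		b |= (uint64_t) in[i] << (8 * i);
	v3 ^= b;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	v0 ^= b;

	v2 ^= 0xff; /* finalization: 4 rounds */
	for (int i = 0; i < 4; i++)
		SIPROUND(v0, v1, v2, v3);
	return v0 ^ v1 ^ v2 ^ v3;
}
```

With key bytes 00 01 ... 0f, the empty message hashes to 0x726fdb47dd0e0e31 and the 15-byte message 00 01 ... 0e hashes to 0xa129ca6149be45e5. Both values match the published SipHash-2-4 test vectors.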
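This sketch shows how to parse TRES values with unit suffixes (f980c588) and normalize them to MB (510abf23). The rules here are assumptions for illustration: a bare number is taken as MB, suffixes are powers of 1024, K rounds up to a whole MB, and overflow is not checked. parse_tres_mb() is a hypothetical helper, not Slurm's parser.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "512", "4G", "2t", ... into MB. Returns 0 on success, -1 on
 * malformed input. */
static int parse_tres_mb(const char *s, uint64_t *mb)
{
	char *end;
	unsigned long long v;
	int c;

	assert(s != NULL && mb != NULL);
	if (!isdigit((unsigned char) s[0]))
		return -1; /* reject "", leading blanks and signs */
	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno == ERANGE)
		return -1;
	c = toupper((unsigned char) *end);
	if (c != '\0' && end[1] != '\0')
		return -1; /* at most one trailing suffix character */
	switch (c) {
	case '\0': /* no suffix: assumed to be MB already */
	case 'M':
		*mb = v;
		break;
	case 'K':
		*mb = (v + 1023) / 1024; /* round up to a whole MB */
		break;
	case 'G':
		*mb = v * 1024;
		break;
	case 'T':
		*mb = v * 1024 * 1024;
		break;
	case 'P':
		*mb = v * 1024 * 1024 * 1024;
		break;
	default:
		return -1;
	}
	return 0;
}
```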
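The reverse direction for sacct --units (e937072b) can be sketched as converting a stored MB value into the requested unit. format_mb_in_unit() and its "%.2f" output format are hypothetical, not sacct's code. The commit title lists [KMGP], but this sketch also accepts T to match the suffix set in f980c588.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write mb (a value stored in MB) to buf in unit K, M, G, T or P, using
 * powers of 1024. Returns 0 on success, -1 for an unknown unit. */
static int format_mb_in_unit(double mb, char unit, char *buf, size_t len)
{
	static const char units[] = "KMGTP";
	const char *pos;
	double v = mb * 1024.0; /* express the value in KB first */

	assert(buf != NULL && len > 0);
	/* strchr() would match the terminator for '\0', so exclude it. */
	pos = (unit != '\0') ? strchr(units, unit) : NULL;
	if (pos == NULL)
		return -1;
	for (const char *p = units; p < pos; p++)
		v /= 1024.0; /* step up one unit: KB -> MB -> GB ... */
	snprintf(buf, len, "%.2f%c", v, unit);
	return 0;
}
```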