Commits · 240514e80d6746768fce3c0adf393693b9e92c60 · Manuel G. Marciani / ces_slurm_simulator

07 Apr, 2016 2 commits
- Fix handling for single-character prognames · 11320ebc
  Sami Ilvonen authored Apr 07, 2016
  
- fix for job "--contiguous" option · 47a07b54
  Morris Jette authored Apr 06, 2016
```
Fix for job "--contiguous" option that could cause job allocation/launch
    failure or slurmctld crash.
bug 2573
```
06 Apr, 2016 7 commits

Start NEWS for v15.08.11 · 3a8ecf32
Morris Jette authored Apr 06, 2016


Fix situation on a heterogeneous memory cluster where the order of · 6150f565

Danny Auble authored Apr 06, 2016

constraints mattered in a job.

    Details include:
    A job doesn't request memory, but the system is running
    with CR_*MEMORY and no default memory limit, and the job requests nodes
    with features of different memory sizes.  Previously the order of constraints
    mattered: the node with less memory had to be requested first
    or the job would fail.

    Bug 2608


Revert "Fix situation on a heterogeneous memory cluster where the order of" · 3ae45a51
Danny Auble authored Apr 06, 2016
```
This reverts commit f559a55c.
```

Fix situation on a heterogeneous memory cluster where the order of · f559a55c

Danny Auble authored Apr 06, 2016

constraints mattered in a job.

Details include:
A job doesn't request memory, but the system is running
with CR_*MEMORY and no default memory limit, and the job requests nodes
with features of different memory sizes.  Previously the order of constraints
mattered: the node with less memory had to be requested first
or the job would fail.

Bug 2608


Don't change job time limit when updating unrelated field in a job · 594c7997

Morris Jette authored Apr 06, 2016

Previous logic would get an account and/or QOS time limit and use
  that value to overwrite the incoming RPC's NO_VAL value, which
  would change a job's time limit when changing an unrelated
  field (e.g. priority, QOS, etc.).
bug 2610


Avoid double calculation on partition QOS if the job is using the same QOS. · e17a7eaf
Danny Auble authored Apr 06, 2016

Add SLURM_UMASK env var to user job · 58dea246
Morris Jette authored Apr 06, 2016
```
bug 2609
```

05 Apr, 2016 1 commit

Fix backfill scheduler race condition · d8b18ff8

Morris Jette authored Apr 05, 2016

Fix backfill scheduler race condition that could cause invalid pointer in
    select/cons_res plugin. Bug introduced in 15.08.9, commit:
    efd9d35e

The scenario is as follows:
1. Backfill scheduler is running, then releases locks
2. Main scheduling loop starts a job "A"
3. Backfill scheduler resumes, finds job "A" in its queue and
   resets its partition pointer.
4. Job "A" completes and tries to remove resource allocation record
   from select/cons_res data structure, but fails to find it because
   it is looking in the table for the wrong partition.
5. Job "A" record gets purged from slurmctld
6. Select/cons_res plugin attempts to operate on resource allocation
   data structure, finds pointer into the now purged data structure
   of job "A" and aborts or gets SEGV
Bug 2603


04 Apr, 2016 2 commits
- Remove duplicates from AccountingStorageTRES · 921c59e4
  Danny Auble authored Apr 04, 2016
  
- If using PrologFlags=contain: Don't launch the extern step if a job is · 91a83e41
  Danny Auble authored Apr 04, 2016
```
canceled while launching.
```
02 Apr, 2016 2 commits
- checkpoint/blcr plugin: Fix memory leak. · 08d520db
  Morris Jette authored Apr 02, 2016
  
- Fix potential divide by zero when tree_width=1 · ef8c5e1b
  Danny Auble authored Apr 01, 2016
  
01 Apr, 2016 1 commit

Rename "Shared" to "OverSubscribe" · 5fe0915e

Morris Jette authored Apr 01, 2016

Rename partition configuration from "Shared" to "OverSubscribe". Rename
salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old
options will continue to function. Output field names also changed in
scontrol, sinfo, squeue, and sview.


31 Mar, 2016 2 commits

power/cray fix for nodes not ready · 5b0800e4

Morris Jette authored Mar 31, 2016

Power/cray: Don't specify NID list to Cray APIs. If any of those nodes are
    not in a ready state, the API returned an error for ALL nodes rather than
    valid data for nodes in ready state.
bug 2332


Make error message in the pmi2 code to debug as the issue can be expected · bcccd20c
Matthieu Hautreux authored Mar 30, 2016
```
and retries are done, making the error message a little misleading.
```

30 Mar, 2016 5 commits

Update node socket/core counts on the fly · 606948a8

Morris Jette authored Mar 30, 2016

Update a node's socket and cores per socket counts as needed after a node
boot to reflect configuration changes which can occur on KNL processors.
Note that the node's total core count must not change, only the distribution
of cores across varying socket counts (KNL NUMA nodes treated as sockets by
Slurm).


Fix issue where if a slurmdbd rollup lasted longer than 1 hour the · 2bec1975
Danny Auble authored Mar 29, 2016
```
rollup would effectively never run again.

bug 2575

and sort of bug 2596
```

Replace SchedulerParameters option assoc_limit_continue · a685e0e9

Morris Jette authored Mar 29, 2016

Remove the SchedulerParameters option of "assoc_limit_continue", making it
the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop"
is set and a job cannot start due to association limits, then do not attempt
to initiate any lower priority jobs in that partition. Setting this can
decrease system throughput and utilization, but may avoid starving larger
jobs, which could otherwise be prevented from launching indefinitely.


Start NEWS for v16.05.0-pre3 · f0f1286e
Morris Jette authored Mar 29, 2016

Update NEWS for start of v15.08.10 · 173eb1e6
Morris Jette authored Mar 29, 2016


29 Mar, 2016 1 commit
- Add SchedulerParameters option no_env_cache; if set, no env cache will be used when · 1f72fbe9
  Danny Auble authored Mar 29, 2016
```
launching a job; instead, the job will fail and drain the node if the env
isn't loaded normally.

bug 2546
```
28 Mar, 2016 4 commits

Add functionality to reset the lft and rgt values of the association table · 8ee976b4
Danny Auble authored Mar 28, 2016
```
with the slurmdbd.
```
When a stepd is about to shut down and send its response to srun · ea470f71
Danny Auble authored Mar 28, 2016
```
make the wait to return data only apply after 500 nodes, and make it
configurable based on the TcpTimeout value.
```

task/cgroup - Fix task binding to CPUs bug · ddf6d9a4

Morris Jette authored Mar 28, 2016

There was a subtle bug in how tasks were bound to CPUs which could result
in an "infinite loop" error. The problem was that various socket/core/thread
calculations were based upon the resources allocated to a step rather than
all resources on the node, so rounding errors could occur. Consider, for
example, a node with 2 sockets, 6 cores per socket and 2 threads per core.
On the idle node, a job requesting 14 CPUs is submitted. That job would
be allocated 4 cores on the first socket and 3 cores on the second socket.
The old logic would get the number of sockets for the job as 2 and the
number of cores as 7, then calculate the number of cores per socket as
7/2 or 3 (rounding down to an integer). The logic laying out tasks
would bind the first 3 cores on each socket to the job then not find any
remaining cores, report the "infinite loop" error to the user, and run
the job without one of the expected cores. The problem gets even worse
when there are some allocated cores on a node. In a more extreme case,
a job might be allocated 6 cores on one socket and 1 core on a second
socket. In that case, 3 of that job's cores would be unused.
bug 2502


Fix for srun signal handling threading problem · c8d36dba

Morris Jette authored Mar 28, 2016

This is a revision to commit 1ed38f26.
The root problem is that a pthread is passed an argument which is
a pointer to a variable on the stack. If that variable is overwritten,
the signal number received will be garbage, and srun may interpret that
bad signal number as a reason to abort the request.


26 Mar, 2016 1 commit

Revert commit · c1dde86c

Morris Jette authored Mar 25, 2016

The previous commit obviously fixed a problem, but introduced a different
set of problems. This will be pursued later, perhaps in version 16.05.


25 Mar, 2016 3 commits

Revert commit · f5920b77

Morris Jette authored Mar 25, 2016

With some configurations and systems, errors of the following sort were
occurring:
task/cgroup: task[1] infinite loop broken while trying to provision compute elements using block
task/cgroup: task[1] unable to set taskset '0x0'


Add "sacctmgr lost jobs" to report orphaned jobs on the cluster. · 2dd920b9
Nathan Yee authored Mar 25, 2016
```
Bug 1706
```

burst_buffer/cray - pre-run fail fix · 5a48207e

Morris Jette authored Mar 25, 2016

burst_buffer/cray - If the pre-run operation fails then don't issue
    duplicate job cancel/requeue unless the job is still in run state. Prevents
    jobs hung in COMPLETING state.
bug 2587


24 Mar, 2016 1 commit
- Remove Rgt from the association output of scontrol show assoc_mgr. Rgt · ddfd2781
  Danny Auble authored Mar 24, 2016
```
isn't kept up to date in the cache.
```
23 Mar, 2016 4 commits

gang scheduling bug fix · 5f1e78f6

Morris Jette authored Mar 23, 2016

Fix gang scheduling resource selection bug which could prevent multiple jobs
    from being allocated the same resources. Bug was introduced in 15.08.6,
    commit 44f491b8


task/cgroup: Fix for task binding anomaly · efa83a02

Morris Jette authored Mar 23, 2016

Here's how to reproduce on smd-server with 2 sockets, 6 cores per
socket and 2 threads per core: just run the following command line
3 times in quick succession (all active at the same time):
srun --cpus-per-task=4 -m block sleep 30
What was happening is that the first job would be allocated cores 0+1,
the second job would be allocated cores 2+3, and
the third job would test use of cores 0-3 then exit because the
job only needs 4 CPUs. The resulting core binding would include
NO CPUs. The new logic tests that the core being considered for
use actually has some resources available to the job before
updating the counter which is being tested against the needed
CPU count.


task/cgroup: Fix for task layout logic when disabled resources. · 6c14b969

Morris Jette authored Mar 23, 2016

Specifically, add the HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM flag when
loading configuration from the HWLOC library. Previous logic in
task/cgroup did not do this, which was different behaviour from
how slurmd gets configuration information. Here's the HWLOC
documentation:
HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM
Detect the whole system, ignore reservations and offline settings.
Gather all resources, even if some were disabled by the administrator.
For instance, ignore Linux Cpusets and gather all processors and memory
nodes, and ignore the fact that some resources may be offline.

Without this flag, I rarely observed a bad core count, which
resulted in the logic laying out tasks incorrectly and generating an error:
task/cgroup: task[0] infinite loop broken while trying to provision compute elements using cyclic

bug 2502


Fix check of per-user qos limits on the initial run by a user. · 2ed5c7fb
Danny Auble authored Mar 23, 2016


21 Mar, 2016 2 commits

Change point where burst buffer env vars are set · 54f314e7

Morris Jette authored Mar 21, 2016

burst_buffer/cray: Set environment variables just before starting job rather
    than at job submission time to reflect persistent buffers created or
    modified while the job is pending.
bug 2545


Fix deadlock issue with burst_buffer/cray when a newly created burst · dcfa6ec0

Danny Auble authored Mar 21, 2016

buffer is found.

Bug 2576

What happened was a function was doing a double read lock, which isn't
awesome to begin with, but not really horrible (if all you are doing is
read locks anyway). The problem was that after the first read lock was
acquired, a different thread requested a write lock, so when the second
read lock request came in, it deadlocked.


18 Mar, 2016 2 commits

Added SchedulerParameters option of "bf_min_prio_reserve" · 45560872
Morris Jette authored Mar 18, 2016
```
Jobs below the specified threshold will not have resources reserved for them.
bug 2565
```

Fix for srun abort on SIGSTOP+SIGCONT · 1ed38f26

Morris Jette authored Mar 18, 2016

Avoid possibly aborting an srun that gets simultaneous SIGSTOP+SIGCONT while
    creating the job step. The problem was that the signal handler got an
    argument (the signal received) of zero.

Here's a log, window 1:
$ srun hostname
srun: Job step creation temporarily disabled, retrying
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 0
srun: Cancelled pending job step

Window 2:
$  kill -STOP 18696 ; kill -CONT 18696
$  kill -STOP 18696 ; kill -CONT 18696
$  kill -STOP 18696 ; kill -CONT 18696
....

bug 2494
