Commits · ac6ef1f985d7978f8507925d7ab5e7a70287a193 · Manuel G. Marciani / ces_slurm_simulator

19 Apr, 2017 1 commit
- Refactor some sreport logic for re-use · ac6ef1f9
  Morris Jette authored Apr 19, 2017
  
  ac6ef1f9
18 Apr, 2017 8 commits
- Properly remove clusters when removed from fed · e3c2892e
  Brian Christiansen authored Apr 18, 2017
```
Since 6eec8022, the cluster's recv connection is destroyed when the
cluster is destroyed. The problem this exposed was that when a remote
cluster is removed from the federation, the controller calls
slurmdb_destroy_federation_rec(), which destroys the clusters in the
list. The persistent recv thread and the cluster's recv both point to
the same connection, so when the controller removed the recv persistent
connection, the recv thread was left pointing to freed memory.
```
  e3c2892e
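The hazard described above is shown below as a minimal C sketch. It uses simplified hypothetical types (`conn_t`, `cluster_t`, `recv_thread_t`), not Slurm's real federation structures. When two places hold the same connection pointer, freeing it through one leaves the other dangling. Having the thread refer to the owner's slot, and clearing that slot on destroy, makes the teardown visible to the thread. On its own, this does not make it safe to free the cluster record itself while the thread is still running.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT Slurm's real definitions. */
typedef struct {
	int fd;
} conn_t;

typedef struct {
	conn_t *recv;           /* the cluster record owns this connection */
} cluster_t;

typedef struct {
	conn_t **conn_slot;     /* the thread refers to the owner's slot,
	                         * not to its own copy of the pointer */
} recv_thread_t;

/* Free the connection through its single owner and clear the slot, so
 * a reader going through conn_slot sees NULL instead of freed memory. */
static void cluster_destroy_recv(cluster_t *c)
{
	if (!c)
		return;
	free(c->recv);
	c->recv = NULL;
}

/* Returns 1 while the thread still has a live connection to service. */
static int recv_thread_has_conn(const recv_thread_t *t)
{
	return t->conn_slot && *t->conn_slot;
}
```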
- sreport cluster UserUtilizationByAccount federation support added · 0f958836
  Morris Jette authored Apr 18, 2017
  
  0f958836
- Merge remote-tracking branch 'origin/slurm-17.02' · 424e031b
  Danny Auble authored Apr 18, 2017
  
  424e031b
- sreport cluster AccountUtilizationByUser federation enabled · 309cd581
  Morris Jette authored Apr 18, 2017
  
  309cd581
- Docs - cannot have both DEPTH_OBLIVIOUS and FAIR_TREE set for PriorityFlags. · 38259742
  Tim Wickberg authored Apr 18, 2017
  
  38259742
- Move some sreport functions into common.c · d4993fb4
  Morris Jette authored Apr 18, 2017
  
  d4993fb4
- Fix issue with cleaning up cpuset and devices cgroups when multiple steps · 24e2cb07
  Danny Auble authored Apr 18, 2017
```
end at the same time.

Bug 3604.
```
  24e2cb07
- Get sreport user top working with federation · b9e5943a
  Morris Jette authored Apr 18, 2017
  
  b9e5943a
17 Apr, 2017 3 commits
- Enhancements to sreport in federation · 1bff0dbc
  Morris Jette authored Apr 17, 2017
  
  1bff0dbc
- sreport cluster utilization working in federation · ed795a60
  Morris Jette authored Apr 17, 2017
  
  ed795a60
- sreport add --local option · 3c7b74a2
  Morris Jette authored Apr 17, 2017
  
  3c7b74a2
15 Apr, 2017 18 commits
- Add missing sreport env var · 01f8f56c
  Morris Jette authored Apr 15, 2017
  
  01f8f56c
- Modify sreport for federated cluster support · 584ba784
  Morris Jette authored Apr 15, 2017
```
Modify sreport to report all jobs in federation by default. Also add --local
option.
```
  584ba784
- Add sacct --cluster=all support · 3c94e4f4
  Morris Jette authored Apr 15, 2017
```
Modify sacct to accept "--cluster all" option (in addition to the old
"--cluster -1", which is still accepted). This makes sacct behave
more like the other commands.
```
  3c94e4f4
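The argument handling this commit describes could be sketched as follows. The helper name `cluster_arg_means_all()` is hypothetical. Exact, case-sensitive matching is an assumption, and the real sacct code may compare differently.

```c
#include <string.h>

/* Hypothetical helper: returns 1 when the --cluster value asks for every
 * cluster. Both the new "all" spelling and the legacy "-1" are accepted. */
static int cluster_arg_means_all(const char *arg)
{
	if (!arg)
		return 0;
	return strcmp(arg, "all") == 0 || strcmp(arg, "-1") == 0;
}
```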
- Merge branch 'slurm-17.02' · 8bd1969c
  Morris Jette authored Apr 15, 2017
  
  8bd1969c
- sbatch man page format fix · 379f9ab4
  Morris Jette authored Apr 15, 2017
  
  379f9ab4
- Add federation support for sacct · 984fdafe
  Morris Jette authored Apr 15, 2017
```
Modify sacct to report all jobs in federation by default. Also add --local
option.
```
  984fdafe
- improve formatting of squeue --help message · f1189fa9
  Morris Jette authored Apr 14, 2017
  
  f1189fa9
- Restore REQUEST_JOB_ALLOCATION_INFO_LITE RPC · 68e48687
  Morris Jette authored Apr 14, 2017
```
This RPC is still needed for version 17.02 commands executed
against version 17.11 slurmctld daemons.
```
  68e48687
- Docs - add initial SLUG17 info to meetings.shtml · 6507225c
  Tim Wickberg authored Apr 13, 2017
  
  6507225c
- Sanity check in perl api to make sure we get a new structure back from the · b2ef7488
  Morris Jette authored Apr 13, 2017
```
perl api.
```
  b2ef7488
- Make it so if xmalloc returns NULL we still get a size in the perl api. · 68e5a96b
  Morris Jette authored Apr 13, 2017
```
This changed in 17.11: if the size was 0, we would return 0, which messes
up the perl api.

Bug 3644
```
  68e5a96b
- When running the "scontrol top" command, make sure that all of the user's · d41f1a89
  Morris Jette authored Apr 13, 2017
```
jobs have a priority that is lower than the selected job's. Previous logic
would permit other jobs with equal priority (though none with higher priority).

Bug 3650
```
  d41f1a89
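The invariant this fix enforces can be sketched with a hypothetical helper over a plain array of one user's job priorities. Every job other than the selected one must have a strictly lower priority. The old behaviour corresponds to testing `>` instead of `>=`, which let equal priorities through. The sketch only checks the result. It does not reproduce how `scontrol top` actually adjusts priorities.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: returns 1 if every job except `selected` has a
 * priority strictly below the selected job's priority. */
static int top_invariant_holds(const uint32_t *prio, size_t n, size_t selected)
{
	for (size_t i = 0; i < n; i++) {
		if (i == selected)
			continue;
		if (prio[i] >= prio[selected])   /* equal is now a violation */
			return 0;
	}
	return 1;
}
```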
- docs - fix spelling of 'display' · 2edc2e7a
  Tim Wickberg authored Apr 13, 2017
  
  2edc2e7a
- Reset backfill timers correctly without skipping over them in certain · 0241c36a
  Dominik Bartkiewicz authored Apr 13, 2017
```
circumstances.
```
  0241c36a
- Add --array-unique to squeue which will display one unique pending job · 3193033c
  Tim Shaw authored Apr 12, 2017
```
array element per line.

Bug 3573
```
  3193033c
- Always free the old dependency string when asked to rebuild. · 96a8867b
  Bill Brophy authored Apr 12, 2017
```
If the depend_list is NULL or has zero elements, the string should
be cleared as well.

Bug 3651.
```
  96a8867b
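The rebuild rule can be shown with a minimal sketch. It assumes a hypothetical `rebuild_depend_string()` that formats a plain array of job IDs as a single `afterok` dependency. The real code walks `depend_list` and handles every dependency type. In the sketch, the old string is freed unconditionally, so a NULL or empty list leaves it cleared rather than stale.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: rebuild "afterok:<id>[:<id>...]" from `ids`. */
static void rebuild_depend_string(char **depend_str, const uint32_t *ids, size_t n)
{
	free(*depend_str);      /* always drop the old value first */
	*depend_str = NULL;

	if (!ids || n == 0)
		return;             /* nothing to describe: leave it cleared */

	/* "afterok" + n * (":" + up to 10 digits of a uint32_t) + NUL */
	size_t cap = 7 + n * 11 + 1;
	char *buf = malloc(cap);
	if (!buf)
		return;

	int len = snprintf(buf, cap, "afterok");
	for (size_t i = 0; i < n; i++)
		len += snprintf(buf + len, cap - (size_t)len, ":%" PRIu32, ids[i]);
	*depend_str = buf;
}
```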
- Fix segfault when using AdminComment field with job arrays. · 96899988
  Thomas Opfer authored Apr 07, 2017
```
The field needs to have its own copy, otherwise the pointer will
become invalid when xfree()'d by a separate array task.

Bug 3665.
```
  96899988
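The ownership rule behind this fix is sketched below with a hypothetical one-field job record. Standard `malloc()`/`free()` stand in for Slurm's `xstrdup()`/`xfree()` wrappers. Each array task gets its own copy of the comment, so freeing one record never invalidates another record's pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical job record; only the field relevant here. */
typedef struct {
	char *admin_comment;
} job_rec_t;

/* NULL-safe string duplicate (strdup() is POSIX, not ISO C). */
static char *dup_str(const char *s)
{
	if (!s)
		return NULL;
	size_t n = strlen(s) + 1;
	char *d = malloc(n);
	if (d)
		memcpy(d, s, n);
	return d;
}

/* Give the new array task its own copy instead of sharing the pointer. */
static void copy_admin_comment(job_rec_t *dst, const job_rec_t *src)
{
	dst->admin_comment = dup_str(src->admin_comment);
}

static void job_rec_free(job_rec_t *j)
{
	free(j->admin_comment);
	j->admin_comment = NULL;
}
```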
- Increase --cpu_bind and --mem_bind length limit to 1024 * 128 bytes. · c8e0d472
  Alejandro Sanchez authored Apr 07, 2017
```
So that it is the same max length as in src/common/env.c.

Used for explicitly laying out tasks on large CPU count nodes (e.g., KNL).

Bug 3675.
```
  c8e0d472
14 Apr, 2017 10 commits
- Display and sort job step's cluster name · 1190117e
  Brian Christiansen authored Apr 14, 2017
```
For use in federated environments.
```
  1190117e
- Extract cluster_in_federation test in common place · 61c0a84e
  Brian Christiansen authored Apr 14, 2017
  
  61c0a84e
- squeue: report fed job steps if in federation · 5b6e11b8
  Brian Christiansen authored Apr 14, 2017
  
  5b6e11b8
- Ensure NULL is returned · 7bc058dc
  Brian Christiansen authored Apr 14, 2017
```
in any case.
```
  7bc058dc
- Fix MPIR_partial_attach_ok issues for parallel debuggers. · 18e3d6fb
  Dong Ahn authored Apr 14, 2017
```
As specified in the MPIR debug interface
(https://www.mpi-forum.org/docs/mpir-specification-10-11-2010.pdf),
the presence of the MPIR_partial_attach_ok symbol
should inform the debugger that the initial startup synchronization
is implemented in such a way that the tool need not attach to
or continue MPI processes that the user is not interested in controlling.

To implement this, SLURM chose to send SIGCONT to those processes that are
not attached by the debugger.

However, the old code does not reliably detect whether
a process is being traced by the debugger, and this
has led to various side effects.

On some systems (e.g., TOSS2), the old code sends SIGCONT to
all of the target processes, including those attached by the debugger.
On newer systems (e.g., TOSS3), it does not send SIGCONT
to the target processes at all.

It seems that one of the reasons for this undefined behavior
is the use of CLONE_PTRACE.
@grondo found no documentation indicating that
CLONE_PTRACE applies to the case where the process is being attached
by a debugger.
More importantly, this code matches clone(2) flags
against the proc(5) process flags, which are not the same thing:
the proc(5) flags field holds task->flags, i.e. the PF_* flags
defined in the kernel source include/linux/sched.h.

This patch fixes these problems by replacing
the old detection logic with logic based on the TracerPid field
in /proc/<pid>/status.

From proc(5), TracerPid: PID of process tracing this process (0 if not
being traced).
```
  18e3d6fb
- Include submit_time when doing the sort for job scheduling. · 030d9d4b
  Thomas Opfer authored Apr 14, 2017
```
Improve the job scheduling sort: after sorting by priority, we now sort by
submit time and then by job id. Previously, submit time was not considered.
This handles the case where the job_ids have rolled over or we are doing
federation scheduling.

Bug 3524
```
  030d9d4b
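The tie-breaking order can be written as a `qsort()` comparator. The trimmed-down `job_key_t` struct is hypothetical. Putting earlier submit times and lower job IDs first is an assumption consistent with first-come, first-served ordering.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical job view holding only the sort keys. */
typedef struct {
	uint32_t priority;
	time_t   submit_time;
	uint32_t job_id;
} job_key_t;

/* Higher priority first; ties broken by earlier submit time, then by
 * lower job id. */
static int job_sort_cmp(const void *a, const void *b)
{
	const job_key_t *x = a, *y = b;

	if (x->priority != y->priority)
		return (x->priority > y->priority) ? -1 : 1;
	if (x->submit_time != y->submit_time)
		return (x->submit_time < y->submit_time) ? -1 : 1;
	if (x->job_id != y->job_id)
		return (x->job_id < y->job_id) ? -1 : 1;
	return 0;
}
```

Suppose job IDs have rolled over. An older job with a large ID then still sorts ahead of a newer job of equal priority whose ID wrapped to a small value.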
- Ran autogen.sh on ubuntu 17.04 · 9fddb466
  Danny Auble authored Apr 14, 2017
  
  9fddb466
- Fix problems reported in latest coverity report · a93b6a07
  Morris Jette authored Apr 14, 2017
```
All problems were introduced in the course of changing the un/pack logic
required for removing the pack jobs logic.
```
  a93b6a07
- Merge branch 'unpack' · 0cba10d4
  Morris Jette authored Apr 14, 2017
  
  0cba10d4
- Revert commit 133a4249 · 1fc38b96
  Morris Jette authored Apr 14, 2017
```
bug 926
```
  1fc38b96