Commits · 4f24ccb5e6f63e91647da194cb823e5f73e152fb · Manuel G. Marciani / ces_slurm_simulator

01 Aug, 2012 2 commits
- FRONTEND - Made error warning more apparent if a frontend node isn't · 4f24ccb5
  Danny Auble authored Aug 01, 2012
```
configured correctly.
```
  4f24ccb5
- Fix for sacct --state=S · 4be73163
  Danny Auble authored Jul 31, 2012
```
Move code to handle multiple clusters.  Previous code could have issues
where job_db_inx space could overlap.
```
  4be73163
31 Jul, 2012 8 commits

Fixed sacct --state=S query to return information about suspended jobs · f8ff6b38
Danny Auble authored Jul 31, 2012
```
current or in the past.
```
f8ff6b38
BLUEGENE - correct start time setup when no jobs are blocking the way · 50f698d9
Mark Nelson authored Jul 31, 2012
```
from Mark Nelson
```
50f698d9

Use mount and umount syscalls when handling cgroup namespaces. · 485c80bc

Janne Blomqvist authored Jul 31, 2012

Using the syscalls directly rather than calling bin/(u)mount via
system() avoids a few fork + exec calls, and provides better error
handling if something goes wrong.

Users of this functionality are also updated to use slurm_strerror in
order to provide a more informative error message.

The mount and umount syscalls are Linux-specific, but so are cgroups
so no portability is lost.

485c80bc
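
A minimal sketch of the approach this commit message describes, assuming a cgroup v1 style mount. The helper names `cgroup_mount_sketch` and `cgroup_umount_sketch` are hypothetical and are not taken from the Slurm source. Standard `strerror` stands in for `slurm_strerror`.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper (not Slurm's code): mount a cgroup hierarchy for one
 * subsystem by calling mount(2) directly instead of system("mount ...").
 * Returns 0 on success, -1 on failure with errno preserved.
 */
static int cgroup_mount_sketch(const char *mnt_point, const char *subsystem)
{
    unsigned long flags = MS_NOSUID | MS_NOEXEC | MS_NODEV;

    /* For the "cgroup" filesystem type, the data argument names the subsystem. */
    if (mount("cgroup", mnt_point, "cgroup", flags, subsystem) != 0) {
        int err = errno;
        /* errno says exactly why the call failed; no child exit code to decode. */
        fprintf(stderr, "mount of %s on %s failed: %s\n",
                subsystem, mnt_point, strerror(err));
        errno = err;
        return -1;
    }
    return 0;
}

/* Hypothetical counterpart that calls umount(2) directly. */
static int cgroup_umount_sketch(const char *mnt_point)
{
    if (umount(mnt_point) != 0) {
        int err = errno;
        fprintf(stderr, "umount of %s failed: %s\n", mnt_point, strerror(err));
        errno = err;
        return -1;
    }
    return 0;
}
```

Because no shell or helper binary is spawned, the caller sees the failure reason in `errno` rather than having to interpret the exit status of a child process.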

remove last patch to give author credit · 557c52d1
Danny Auble authored Jul 31, 2012

557c52d1

Use mount and umount syscalls when handling cgroup namespaces. · c3889ec4

Danny Auble authored Jul 31, 2012

Using the syscalls directly rather than calling bin/(u)mount via
system() avoids a few fork + exec calls, and provides better error
handling if something goes wrong.

Users of this functionality are also updated to use slurm_strerror in
order to provide a more informative error message.

The mount and umount syscalls are Linux-specific, but so are cgroups
so no portability is lost.

c3889ec4

Use mount and umount syscalls when handling cgroup namespaces. · b4c1d3d7

Danny Auble authored Jul 31, 2012

Using the syscalls directly rather than calling bin/(u)mount via
system() avoids a few fork + exec calls, and provides better error
handling if something goes wrong.

Users of this functionality are also updated to use slurm_strerror in
order to provide a more informative error message.

The mount and umount syscalls are Linux-specific, but so are cgroups
so no portability is lost.

b4c1d3d7

BGQ - added version string to the load of the runjob_mux plugin to verify · 610cfe65
Danny Auble authored Jul 31, 2012
```
    the current plugin has been loaded when using runjob_mux_refresh_config
```
610cfe65

Orphan/misleading comment · a0a703c0

Don Lipari authored Jul 31, 2012

These comments were orphaned with this commit:  874f797f
Move the start time calculation of pending jobs into a separate pthread on Nov 3, 2010.

a0a703c0

26 Jul, 2012 1 commit

Correct parsing of srun/sbatch input/output/error file names starting with "none" · 4234e00a

Morris Jette authored Jul 26, 2012

Correct parsing of srun/sbatch input/output/error file names so that only
the name "none" is mapped to /dev/null and not any file name starting
with "none" (e.g. "none.o"). This fixes bug #98.

4234e00a
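
The rule described in this commit can be sketched as an exact string comparison. The helper name `map_io_filename` is hypothetical, and case-sensitive matching is an assumption. A prefix test such as `strncmp(name, "none", 4)` would reproduce the reported behaviour for "none.o"; whether the original code used that form is not stated here.

```c
#include <string.h>

/*
 * Hypothetical helper (not Slurm's actual function): map the special file
 * name "none" to /dev/null and return every other name unchanged.
 */
static const char *map_io_filename(const char *name)
{
    /* strcmp compares the whole string, so "none.o" does not match. */
    if (name != NULL && strcmp(name, "none") == 0)
        return "/dev/null";
    return name;
}
```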

25 Jul, 2012 3 commits
- Fix typo in slurm.conf man page · 8071bbb8
  Morris Jette authored Jul 25, 2012
  
  8071bbb8
- Cgroup release example failed if the lssubsys command is not found. · daaa1465
  Janne Blomqvist authored Jul 25, 2012
  
  daaa1465
- Correction to cgroup.conf example, parameter name was wrong · 1e8d3d45
  Janne Blomqvist authored Jul 25, 2012
  
  1e8d3d45
24 Jul, 2012 3 commits
- Merge branch 'slurm-2.3' into slurm-2.4 · 76b06e81
  Morris Jette authored Jul 24, 2012
  
  76b06e81
- Gres: Fix for tracking allocated resources when one item and associated file · 102258a2
  Morris Jette authored Jul 24, 2012
```
Gres: If a gres has a count of one and an associated file then when doing
a reconfiguration, the node's bitmap was not cleared resulting in an
underflow upon job termination or removal from scheduling matrix by the
backfill scheduler.
```
  102258a2
- BGQ - remove debug · 76963b4b
  Danny Auble authored Jul 24, 2012
  
  76963b4b
23 Jul, 2012 1 commit

Cray and BlueGene: Correct logic for front-end node allocation tracking · ca95f242

Morris Jette authored Jul 23, 2012

Cray and BlueGene - Do not treat lack of usable front-end nodes when
slurmctld daemon starts as a fatal error. Also preserve correct front-end
node for jobs when there is more than one front-end node and the slurmctld
daemon restarts.

ca95f242

19 Jul, 2012 8 commits
- move a verbose message to debug · bba19262
  Danny Auble authored Jul 19, 2012
  
  bba19262
- BLUEGENE - Fix for handling blocks when a larger block will not free and · 1b2b3c85
  Danny Auble authored Jul 19, 2012
```
while it is attempting to free, underlying hardware is marked in error,
making small blocks that overlap with the freeing block.  This only
applies to dynamic layout mode.
```
  1b2b3c85
- Update sacct man page with respect to explaining PrivateData option · 0430ca8b
  Bill Brophy authored Jul 19, 2012
  
  0430ca8b
- Remove spaces from recent patch · 7ddca915
  Morris Jette authored Jul 19, 2012
  
  7ddca915
- Add "define _GNU_SOURCE" to avoid warning about undefined eaccess function · e41807f3
  Morris Jette authored Jul 19, 2012
  
  e41807f3
- Note contributions by Francois Diakhate (CEA) · 9dfa657b
  Morris Jette authored Jul 19, 2012
  
  9dfa657b
- More robust verification of the TMPDIR · 7a320bd5
  Francois Diakhate authored Jul 19, 2012
  
  7a320bd5
- Reset backfilled job counter only when explicitly cleared using scontrol. · b4202119
  Alejandro Lucero Palau authored Jul 19, 2012
  
  b4202119
17 Jul, 2012 3 commits
- In sview, only report count of requested nodes if job is pending. · ab2dea3a
  Morris Jette authored Jul 17, 2012
```
This corresponds to commit dd2dce54
from Mark Grondona's work in squeue, but applied to the sview command.
```
  ab2dea3a
- Merge pull request #19 from grondo/slurm-2.4-minor-fixes · dd2dce54
  Morris Jette authored Jul 17, 2012
```
Slurm 2.4 minor fixes
```
  dd2dce54
- Note how optional arguments to the commands are parsed in man pages · f846d54b
  Morris Jette authored Jul 17, 2012
  
  f846d54b
16 Jul, 2012 1 commit
- Note limited sbatch support for --immediate option · e063642d
  Morris Jette authored Jul 16, 2012
```
This addresses trouble ticket 85
```
  e063642d
13 Jul, 2012 9 commits

BGQ - fix to handle sub block but larger than 1 midplane step in the · 3f38dbd6
Danny Auble authored Jul 13, 2012
```
runjob_mux
```
3f38dbd6
Fix initialization of protocol_version for some messages to make sure it · b34e5c28
Danny Auble authored Jul 13, 2012
```
is always set when sending or receiving a message.
```
b34e5c28
BGL - Fix for syncing users on block from Tim Wickberg · 865bec2a
Tim Wickberg authored Jul 13, 2012

865bec2a

slurmd: set SLURM_CONF in prolog/epilog environment · b2b5b908

Mark A. Grondona authored Jul 11, 2012

Set SLURM_CONF in default prolog/epilog environment instead
of only in spank prolog/epilog environment.

This change fixes a potential hang during spank prolog/epilog
execution due to the possibility of memory allocation after
fork(2) and before exec(2) when invoking slurmstepd spank
prolog|epilog.

This also has the benefit that SLURM commands used in prolog and epilog
scripts will use the correct slurm.conf file.

b2b5b908

slurmstepd: don't call exec if task fails to get notification from parent · 9006dda4

Mark A. Grondona authored May 19, 2012

If exec_wait_child_wait_for_parent() fails for any reason, it is safer
to abort immediately rather than proceed to execute the user's job.

9006dda4

slurmstepd: Kill remaining children if fork fails · 5b8dba9e

Mark A. Grondona authored May 19, 2012

On a failure of fork(2), slurmstepd would print an error and exit,
possibly leaving previously forked children waiting.

Ensure a better cleanup by killing all active children on fork failure
before exiting slurmstepd.

5b8dba9e

slurmstepd: Close childfd of exec_wait_info in parent · eca089e3

Mark A. Grondona authored May 19, 2012

Close the read end of the pipe slurmstepd uses to notify children
it is time to call exec(2) in order to save one file descriptor per
task. (Previously, the read side of the pipe wasn't closed until
exec_wait_info was destroyed.)

eca089e3
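
The pipe handshake behind the slurmstepd commits above can be sketched roughly as follows. This is an illustration under stated assumptions, not slurmstepd's code: `run_handshake` is a hypothetical name, a single child is forked, and `/bin/sh -c "exit 0"` stands in for the user's task.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sketch: fork one child that blocks on a pipe until the parent
 * tells it to call exec(2). If notify is 0 the parent closes the pipe without
 * writing, and the child must abort instead of running the task.
 * Returns the child's exit status, or -1 on error.
 */
static int run_handshake(int notify)
{
    int fds[2];
    int status;
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        char c;
        close(fds[1]);                  /* the child only reads */
        if (read(fds[0], &c, 1) != 1)
            _exit(1);                   /* no notification: abort, never exec */
        close(fds[0]);
        execl("/bin/sh", "sh", "-c", "exit 0", (char *) NULL);
        _exit(127);                     /* reached only if exec itself fails */
    }

    /* The parent never reads, so it can drop the read end right after fork. */
    close(fds[0]);
    if (notify) {
        char c = 0;
        if (write(fds[1], &c, 1) != 1) {
            /* The child will see EOF below and take the abort path. */
        }
    }
    close(fds[1]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_handshake(1)` lets the child proceed to exec, while `run_handshake(0)` exercises the abort path, where the child exits with status 1 without ever calling exec.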

squeue: report number of nodes in completing for completing jobs · 2ddc6e70

Mark A. Grondona authored Jul 11, 2012

For some reason squeue was treating completing jobs the same as
pending jobs, and reported the number of nodes as the maximum of
requested nodelist, requested node count or CPUs (divided into nodes?).

This is in contrast to the squeue manpage which explicitly states
that the number of nodes reported for completing jobs should
be only the nodes that are still allocated to the job.

This patch removes the special handling of completing jobs in
src/squeue/print.c:_get_node_cnt(), so that the squeue output for
completing jobs matches documentation. A comment is also added
so that developers looking at the code understand what is going on.

2ddc6e70
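
A simplified sketch of the behaviour this commit message describes. The struct, field, and function names are hypothetical and do not reproduce `_get_node_cnt()` from `src/squeue/print.c`. It only illustrates that the maximum-of-requests estimate applies to pending jobs, while completing jobs report the nodes still allocated.

```c
#include <stdint.h>

/* Hypothetical, simplified job states and record; names are illustrative only. */
enum sketch_job_state { SKETCH_PENDING, SKETCH_RUNNING, SKETCH_COMPLETING };

struct sketch_job {
    enum sketch_job_state state;
    uint32_t alloc_nodes;       /* nodes currently allocated to the job */
    uint32_t req_node_cnt;      /* requested node count */
    uint32_t req_nodelist_cnt;  /* nodes named in the requested node list */
    uint32_t cpu_based_nodes;   /* node estimate derived from the CPU count */
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
    return a > b ? a : b;
}

/* Only pending jobs use the estimate; every other state reports allocations. */
static uint32_t sketch_node_cnt(const struct sketch_job *job)
{
    if (job->state == SKETCH_PENDING) {
        uint32_t cnt = max_u32(job->req_node_cnt, job->req_nodelist_cnt);
        return max_u32(cnt, job->cpu_based_nodes);
    }
    /* Completing jobs fall through here: nodes still allocated, not requested. */
    return job->alloc_nodes;
}
```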

Update to high throughput computing web page with more option descriptions · 46a3767e
Morris Jette authored Jul 12, 2012

46a3767e

12 Jul, 2012 1 commit
- move an info message to be debug · 2d3c09ae
  Danny Auble authored Jul 12, 2012
  
  2d3c09ae