Commits · 5d0767697ef92b6cd1f5cd102654677589783964 · Manuel G. Marciani / ces_slurm_simulator

28 Feb, 2012 1 commit
- Note recent SLURM changes. · 38619c30
  Morris Jette authored Feb 28, 2012
  
  38619c30
24 Feb, 2012 4 commits
- Add missing read lock to slurmctld/agent.c · 0a06f4e6
  Morris Jette authored Feb 24, 2012
  
  0a06f4e6
- Correct "scontrol show daemons" if multiple ControlMachine hosts configured · 10916457
  Morris Jette authored Feb 24, 2012
  
  10916457
- Fixed extremely hard to reproduce threading issue in assoc_mgr. · b4e5051b
  Danny Auble authored Feb 24, 2012
  
  b4e5051b
- Update NEWS for recent patches · 6da55b36
  Morris Jette authored Feb 23, 2012
  
  6da55b36
23 Feb, 2012 1 commit
- Fix smap regression to display nodes that are drained or down correctly. · 3f467a75
  Danny Auble authored Feb 22, 2012
  
  3f467a75
20 Feb, 2012 1 commit
- Modify linking to include "-ldl" only when needed · d1adfe62
  jette authored Feb 19, 2012
```
Patch from Aleksej Saushev.
```
  d1adfe62
06 Feb, 2012 1 commit

The openpty(3) call used by slurmstepd to allocate a pseudo-terminal · 2a1c08b0

Danny Auble authored Feb 02, 2012

is a convenience function in BSD and glibc that internally calls
the equivalent of

    int masterfd = open("/dev/ptmx", flags);
    grantpt(masterfd);
    unlockpt(masterfd);
    const char *slave = ptsname(masterfd);
    int slavefd = open(slave, O_RDWR | O_NOCTTY);

(in pseudocode)

On Linux, with some combinations of glibc/kernel (in this
case glibc-2.14/Linux-3.1), the equivalent of grantpt(3) was failing
in slurmstepd with EPERM, because the allocated pty was getting
root ownership instead of the user running the slurm job.

From the POSIX description of grantpt:

 "The grantpt() function shall change the mode and ownership of the
  slave pseudo-terminal device... The user ID of the slave shall
  be set to the real UID of the calling process..."

 http://pubs.opengroup.org/onlinepubs/007904875/functions/grantpt.html

This means that for POSIX-compliance, the real user id of slurmstepd
must be the user executing the SLURM job at the time openpty(3) is
called. Unfortunately, the real user id of slurmstepd at this
point is still root, and only the effective uid is set to the user.

This patch is a work-around that uses the (non-portable) setresuid(2)
system call to reset the real and effective uids of the slurmstepd
process to the job user, but keep the saved uid of root. Then after
the openpty(3) call, the previous credentials are reestablished
using the same call.
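A minimal sketch of the work-around described above, assuming Linux with glibc (getresuid(2) and setresuid(2) are non-portable and need _GNU_SOURCE). The helper run_as_real_user and the callback record_real_uid are hypothetical names, not slurmstepd's actual code. In slurmstepd the callback would wrap the openpty(3) call. Passing (uid_t)-1 leaves the saved uid unchanged, so the saved root uid still permits the final restore.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: run fn(arg) with the real AND effective uid set to
 * job_uid while the saved uid keeps its old value (root in slurmstepd),
 * then restore the original real/effective/saved uids.
 * Returns fn's result, or -1 if the credentials could not be switched. */
static int run_as_real_user(uid_t job_uid, int (*fn)(void *), void *arg)
{
    uid_t ruid, euid, suid;

    if (getresuid(&ruid, &euid, &suid) < 0)
        return -1;

    /* (uid_t)-1 means "leave the saved uid unchanged". */
    if (setresuid(job_uid, job_uid, (uid_t)-1) < 0)
        return -1;

    int rc = fn(arg);   /* e.g. the openpty(3) call */

    /* Permitted because the saved uid still holds the original value. */
    if (setresuid(ruid, euid, suid) < 0)
        return -1;
    return rc;
}

/* Example callback: records the real uid seen while the switch is active. */
static int record_real_uid(void *arg)
{
    *(uid_t *)arg = getuid();
    return 0;
}
```

grantpt(3) derives the slave's owner from the real uid, so the callback sees the job user as its real uid. A real caller should treat a failed restore as fatal rather than continue with the wrong credentials.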

2a1c08b0

03 Feb, 2012 1 commit

Fix for srun with --exclude and --nodes · a4551158

Morris Jette authored Feb 03, 2012

Fix for srun running within an existing allocation with the --exclude
option and a --nodes count small enough that additional nodes must also be
removed.

    > salloc -N 8
    salloc: Granted job allocation 1000008
    > srun -N 2 -n 2 --exclude=tux3 hostname
    srun: error: Unable to create job step: Requested node configuration is not available

Patch from Phil Eckert, LLNL.

a4551158

02 Feb, 2012 1 commit

Fix bug in step task distribution · fac3586b

Morris Jette authored Feb 02, 2012

Fix bug in step task distribution when nodes are not configured in numeric
order. Patch from Hongjia Cao, NUDT.

fac3586b

01 Feb, 2012 2 commits

Fix job requeue bug · c0a7a7a4

Morris Jette authored Feb 01, 2012

Fix bug when a requeued batch job is scheduled to run on a different node
zero but attempts to launch on the old node zero, causing the fatal error
"Invalid host_index -1 for job #"
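A hypothetical sketch of the failure mode, assuming the launch host is located by name in the job's current node list: a batch host left over from the previous run is not found in the new allocation, and the lookup plausibly yields the -1 seen in the error. The functions host_index and batch_host are illustrative names, not SLURM's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: position of host in the job's node list, or -1 if absent
 * (the value reported in "Invalid host_index -1"). */
static int host_index(const char *const *nodes, int nnodes, const char *host)
{
    for (int i = 0; i < nnodes; i++)
        if (strcmp(nodes[i], host) == 0)
            return i;
    return -1;
}

/* Hypothetical fix: derive the batch host (node zero) from the current
 * allocation every time the job is scheduled, never from a previous run. */
static const char *batch_host(const char *const *nodes, int nnodes)
{
    return nnodes > 0 ? nodes[0] : NULL;
}
```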

c0a7a7a4

Avoid slurmctld abort due to bad pointer · 43936335

Morris Jette authored Jan 31, 2012

Avoid slurmctld abort due to a bad pointer when setting the MAINT flag on
an advanced reservation that contains no nodes (only licenses).

43936335

31 Jan, 2012 3 commits
- BLUEGENE - fix for not allowing jobs if all midplanes are drained and all · 1e40f647
  Danny Auble authored Jan 31, 2012
```
blocks are in an error state.
```
  1e40f647
- Note nature of latest change · 7189ecaa
  Morris Jette authored Jan 31, 2012
  
  7189ecaa
- Fix to the multifactor priority plugin to calculate effective usage earlier · 7d9e3ed2
  Danny Auble authored Jan 31, 2012
```
to give a correct priority on the first decay cycle after a restart of the
slurmctld. Patch from Martin Perry, Bull.
```
  7d9e3ed2
27 Jan, 2012 2 commits

Fix typo in accounting when using reservations. Patch from Alejandro · 92487dec
Danny Auble authored Jan 27, 2012
```
Lucero Palau.
```
92487dec

Fix slurmd/slurmstepd deadlock condition · 3579aa43

Morris Jette authored Jan 26, 2012

This patch was previously applied to SLURM v2.4 and is being back-ported
due to problems being reported in SLURM v2.3. Original commit is here
https://github.com/SchedMD/slurm/commit/4c0eea7b8c20ccb1cacad51838a1ea8257cc637d

3579aa43

25 Jan, 2012 1 commit

Set DEFAULT flag in partition structure · 9f4ef925

Morris Jette authored Jan 24, 2012

Set DEFAULT flag in partition structure when slurmctld reads the
configuration file. Patch from Rémi Palancher. Note the flag is set
when the information is sent via RPC for sinfo.

9f4ef925

24 Jan, 2012 1 commit
- Start v2.3.4 NEWS · 10fcf40e
  Morris Jette authored Jan 24, 2012
  
  10fcf40e
20 Jan, 2012 1 commit

Fix for segv in slurmctld dependency processing · 49ecf2d0

Morris Jette authored Jan 20, 2012

Fix for possible invalid memory reference in slurmctld in job dependency
logic. Patch from Carles Fenoy (Barcelona Supercomputer Center).

49ecf2d0

19 Jan, 2012 1 commit
- Fix PrivateFlags bug when using Priority Multifactor plugin. If using sprio · 854a2025
  Danny Auble authored Jan 19, 2012
```
all jobs would be returned even if the flag was set.
Patch from Bill Brophy, Bull.
```
  854a2025
18 Jan, 2012 1 commit

Correction to --switch option implementation · 8f1d9b57

Morris Jette authored Jan 18, 2012

Fix bug in --switch option with topology resulting in bad switch count use.
Patch from Alejandro Lucero Palau (Barcelona Supercomputer Center).

8f1d9b57

13 Jan, 2012 3 commits
- Fix for sacct printing CPUTime(RAW) where the value is greater than a 32 bit · adf582b0
  Danny Auble authored Jan 13, 2012
```
number.
```
  adf582b0
- minor updates for latest commit · 08854a56
  Morris Jette authored Jan 13, 2012
  
  08854a56
- Let operators see reservation data even if private · 4c24fd7d
  Morris Jette authored Jan 12, 2012
```
Let operators see reservation data even if "PrivateData=reservations" flag
is set in slurm.conf. Patch from Don Albert, Bull.
```
  4c24fd7d
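The CPUTime(RAW) fix in adf582b0 concerns values past 32 bits. CPUTime is elapsed seconds multiplied by allocated CPUs, so it outgrows 32 bits quickly: 2048 CPUs for 30 days (2,592,000 s) is 5,308,416,000 CPU-seconds, above 2^32 = 4,294,967,296. A minimal sketch of the arithmetic; the function names are illustrative, not sacct's code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* CPUTime in seconds: elapsed time multiplied by allocated CPU count. */
static uint64_t cpu_time_raw(uint32_t elapsed_sec, uint32_t ncpus)
{
    /* Widen one operand first so the multiplication is done in 64 bits. */
    return (uint64_t)elapsed_sec * ncpus;
}

/* The buggy pattern: a 32-bit product silently wraps modulo 2^32. */
static uint32_t cpu_time_raw_32(uint32_t elapsed_sec, uint32_t ncpus)
{
    return elapsed_sec * ncpus;
}

/* Print with the conversion specifier that matches a 64-bit value. */
static void print_cpu_time_raw(FILE *out, uint64_t secs)
{
    fprintf(out, "%" PRIu64 "\n", secs);
}
```

For the example above, the 32-bit version returns 1,013,448,704, understating the value by exactly 2^32.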
09 Jan, 2012 2 commits

Fix bug in srun --multi-prog configuration file · f59f6a27

Morris Jette authored Jan 09, 2012

Fix bug in srun --multi-prog configuration file to avoid printing a duplicate
record error when "*" is used at the end of the file for the task ID. The "*"
means all task IDs not otherwise identified.
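A hypothetical sketch of the intended semantics for a --multi-prog file such as `0 ./master`, `1-3 ./worker`, `* ./helper`: explicit ranks are claimed first, and a trailing "*" only fills ranks nobody claimed, so it should never raise a duplicate error. The struct mp_line and assign_tasks are illustrative, not srun's parser.

```c
#include <stdbool.h>

/* Hypothetical parsed form of one config line: an explicit rank range
 * lo..hi, or the "*" wildcard. */
struct mp_line {
    int lo, hi;
    bool wildcard;
};

/* Map each task rank to the index of the line that runs it.
 * Returns 0 on success, -1 if a rank is listed twice, out of range, or
 * left uncovered. "*" only claims ranks no earlier line claimed, so it is
 * never a duplicate (this assumes "*" is the last line, as in the fix). */
static int assign_tasks(const struct mp_line *lines, int nlines,
                        int ntasks, int *line_of_task)
{
    for (int t = 0; t < ntasks; t++)
        line_of_task[t] = -1;

    for (int i = 0; i < nlines; i++) {
        if (lines[i].wildcard) {
            for (int t = 0; t < ntasks; t++)
                if (line_of_task[t] < 0)
                    line_of_task[t] = i;
            continue;
        }
        for (int t = lines[i].lo; t <= lines[i].hi; t++) {
            if (t < 0 || t >= ntasks || line_of_task[t] >= 0)
                return -1;
            line_of_task[t] = i;
        }
    }
    for (int t = 0; t < ntasks; t++)
        if (line_of_task[t] < 0)
            return -1;
    return 0;
}
```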

f59f6a27

Fix possible slurmd deadlock from sbcast command. · cb3b9fb5

Morris Jette authored Jan 09, 2012

Fix race condition where sbcast command can result in deadlock of slurmd
daemon. Patch by Don Albert, Bull.

cb3b9fb5

28 Dec, 2011 1 commit
- Permit gres count configuration of zero. · 0d779c41
  Morris Jette authored Dec 28, 2011
  
  0d779c41
21 Dec, 2011 1 commit
- Modify PAM module to use same libslurm as built with · d46b33f6
  Morris Jette authored Dec 20, 2011
  
  d46b33f6
19 Dec, 2011 1 commit
- Fix bug in sview layout if node count less than configured grid_x_width. · be1f9868
  Morris Jette authored Dec 19, 2011
  
  be1f9868
17 Dec, 2011 1 commit
- Note recent code changes · f455c48a
  Morris Jette authored Dec 16, 2011
  
  f455c48a
15 Dec, 2011 1 commit

Prevent resetting a held job's priority · fa477448

Morris Jette authored Dec 14, 2011

Prevent resetting a held job's priority when updating other job parameters.
Patch from Alejandro Lucero Palau, BSC.

fa477448

14 Dec, 2011 1 commit
- Handle numeric suffix of "T" for terabyte units · f58a563f
  Morris Jette authored Dec 14, 2011
```
Patch from John Thiltges, University of Nebraska-Lincoln.
```
  f58a563f
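Commit f58a563f adds "T" as a terabyte suffix for numeric values. A hypothetical sketch of suffix parsing with "T" included, assuming values are converted to bytes with binary multiples (1 K = 1024). SLURM's own helper and its base unit may differ; parse_size_bytes is an illustrative name.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: parse "<count>[K|M|G|T]" (case-insensitive) into bytes.
 * Returns -1 on malformed input or overflow. */
static int64_t parse_size_bytes(const char *s)
{
    char *end;
    int shift;

    errno = 0;
    long long v = strtoll(s, &end, 10);
    if (end == s || v < 0 || errno == ERANGE)
        return -1;

    switch (toupper((unsigned char)*end)) {
    case '\0': return v;            /* no suffix: plain count */
    case 'K':  shift = 10; break;
    case 'M':  shift = 20; break;
    case 'G':  shift = 30; break;
    case 'T':  shift = 40; break;   /* the terabyte suffix */
    default:   return -1;
    }
    if (end[1] != '\0')
        return -1;                  /* trailing characters after suffix */
    if (v > (INT64_MAX >> shift))
        return -1;                  /* would overflow int64_t */
    return (int64_t)v << shift;
}
```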
09 Dec, 2011 4 commits
- Add slashes in front of derived exit code when modifying a job. · fca0660c
  Danny Auble authored Dec 09, 2011
  
  fca0660c
- Fixed issue with comment field being used in a job finishing before it · a178318f
  Danny Auble authored Dec 09, 2011
```
starts in accounting.
```
  a178318f
- Fixed issue with QOS preemption when adding new QOS. · 614cd5fb
  Danny Auble authored Dec 09, 2011
  
  614cd5fb
- sacct search for jobs using filtering was ignoring wckey filter. · 66d68934
  Morris Jette authored Dec 09, 2011
  
  66d68934
08 Dec, 2011 1 commit
- BLUEGENE - Fixed preemption issue. · bcc3c6a9
  Danny Auble authored Dec 07, 2011
  
  bcc3c6a9
06 Dec, 2011 1 commit

Permit pending job to exceed partition limit with QOS flag change. · 0e1abeda

Morris Jette authored Dec 06, 2011

One of our testers discovered a regression in version 2.3.1.  If a job is
pending due to PartitionNodeLimit and the limit is relieved with a
'sacctmgr modify qos name=<qos name> set flags=partitionmaxnodes' new jobs
exceeding the partition limit (but not the QOS limit) are allowed to run.
However, the pending job is never allowed to run.  Attached is a patch to
address this problem.  FYI, this problem doesn't exist in version 2.4.
Patch from Bill Brophy, Bull.
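A hypothetical sketch of the limit check involved, assuming the QOS PartitionMaxNodes flag lets a job exceed the partition's MaxNodes while the QOS limit still applies. The report suggests the check must be re-run for pending jobs with the QOS's current flags, not only when new jobs are submitted. node_limit_blocks is an illustrative name, not SLURM's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: does the node count block the job from running?
 * qos_overrides_part models the QOS PartitionMaxNodes flag. */
static bool node_limit_blocks(uint32_t req_nodes, uint32_t part_max_nodes,
                              uint32_t qos_max_nodes, bool qos_overrides_part)
{
    if (req_nodes > qos_max_nodes)
        return true;                /* the QOS limit always applies */
    if (!qos_overrides_part && req_nodes > part_max_nodes)
        return true;                /* partition limit unless overridden */
    return false;
}
```

Evaluating this on every scheduling pass, rather than caching the blocked reason from submission time, would let the pending job start once the flag is set.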

0e1abeda

05 Dec, 2011 1 commit
- Fix task/cgroup plugin error when used with GRES · 6443e89f
  Morris Jette authored Dec 05, 2011
```
Patch by Alexander Bersenev (Institute of Mathematics and Mechanics, Russia).
```
  6443e89f