Commits · 2d392fbccfd8b4a6b9dc8f5331db5aad3092f91d · Manuel G. Marciani / ces_slurm_simulator

06 Jul, 2015 1 commit

Fix check of getgrouplist to check the original size of array instead · 2d392fbc

Jason Coverston authored Jul 06, 2015

of the size returned.

This check is redundant though since getgrouplist will return -1 if it
tries to use more than ngroups_max.  Perhaps we should just take the check
out.

2d392fbc
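
The commit message above relies on the getgrouplist(3) contract: the `ngroups` argument carries the array capacity in and the number of groups found out, and the call returns -1 when the array is too small. The sketch below illustrates that contract by calling libc through Python's `ctypes`. It is not the Slurm code. The helper name `group_list`, its `capacity` parameter, and the assumption of a Linux libc with a 32-bit unsigned `gid_t` are illustrative only.

```python
import ctypes

# Assumption: Linux, where CDLL(None) exposes libc and gid_t is a 32-bit unsigned int.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.getgrouplist.argtypes = [
    ctypes.c_char_p,                 # user name
    ctypes.c_uint,                   # group that is always included in the result
    ctypes.POINTER(ctypes.c_uint),   # caller-supplied array
    ctypes.POINTER(ctypes.c_int),    # in: array capacity, out: number of groups found
]
_libc.getgrouplist.restype = ctypes.c_int


def group_list(user, gid, capacity):
    """Hypothetical helper: return (rc, found, groups) for one getgrouplist call."""
    groups = (ctypes.c_uint * capacity)()
    ngroups = ctypes.c_int(capacity)
    rc = _libc.getgrouplist(user.encode(), gid, groups, ctypes.byref(ngroups))
    if rc == -1:
        # The array was too small; ngroups now holds the size that is needed,
        # so comparing it against the capacity after the call says nothing new.
        return rc, ngroups.value, []
    return rc, ngroups.value, list(groups[:ngroups.value])
```

Calling `group_list("root", 0, 0)` and then retrying with the reported count is one way to size the array. Any size comparison has to use the capacity saved before the call, because `ngroups` is overwritten.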

02 Jul, 2015 18 commits
- mpi/mvapich: Replace spaces with tabs · 7bcc6814
  Morris Jette authored Jul 02, 2015
```
No change in logic
```
  7bcc6814
- MPI/MVAPICH plugin requires Munge for authentication · 90c14bb5
  Morris Jette authored Jul 02, 2015
```
Original patch from LLNL assumed Munge was installed, which would
  result in a build error if the Munge development package was
  not installed
```
  90c14bb5
- remove newlines in mvapich abort message before writing to syslog · 097e6093
  Adam Moody authored Jul 02, 2015
  
  097e6093
- avoid potential buffer overrun in mvapich abort message · 2ce202a9
  Adam Moody authored Jul 02, 2015
  
  2ce202a9
- change some fatals to errors, use xstrdup_printf, add comment in mvapich plugin · eeb408b9
  Adam Moody authored Jul 02, 2015
  
  eeb408b9
- add munge to mvapich plugin · 1d8d9dc8
  Adam Moody authored Jul 02, 2015
  
  1d8d9dc8
- Phil's patch to sreport to correct End time requests in the future · 0ab95050
  Don Lipari authored Jul 02, 2015
```
If a user runs sreport and specifies a time period that stretches into
the future, this patch sets the end of the period to the current time.
```
  0ab95050
- proctrack/cgroup: Move slurmstepd out of freezer cgroup before removal · c2ce30c2
  Mark A. Grondona authored Jul 02, 2015
```
In order to successfully remove the freezer cgroup at the end of
a job step, the slurmstepd process itself must first be moved
outside of the cgroup, or removal will always fail.

This fix moves the slurmstepd back to the root cgroup just
before the rmdir operations are attempted.
```
  c2ce30c2
- Change function name to be more descriptive · 506bad75
  Morris Jette authored Jul 02, 2015
  
  506bad75
- slurmd: return failure to signal job step if prolog is running · 6728119f
  Mark A. Grondona authored Jul 02, 2015
```
If the job prolog is running we can't send a signal to job step
tasks, so return SLURM_FAILURE instead of ESLURM_INVALID_JOB_ID.
This should cause the caller to retry, instead of assuming the
job step is not running on the node.
```
  6728119f
- slurmd: Do not launch job step during job prolog · c2fbf88f
  Mark A. Grondona authored Jul 02, 2015
```
If a job step request comes in while the slurm prolog is running,
slurmd will happily launch the job step. This means that a user
could run code before the prolog is complete, which could cause
strange errors or in some cases a security issue. Instead return
an error (EINPROGRESS for now) and do not allow job steps to run
during prolog.
```
  c2fbf88f
- slurmd: abstract function to search list of running prologs · d9468711
  Mark A. Grondona authored Jul 02, 2015
```
Create a new function, _prolog_is_running(), to determine if a job
prolog is currently running from within slurmd, and use this new
function in place of list_find_first in _wait_for_job_running_prolog.
```
  d9468711
- Do not return error from xcgroup_delete if cgroup already gone · 7e9ff90c
  Mark A. Grondona authored Jul 02, 2015
```
Some cgroup code would retry continuously if somehow a cgroup
didn't exist when xcgroup_delete() was called (See for instance
proctrack/cgroup's slurm_container_plugin_wait()). To fix this
for all callers, return SUCCESS from xcgroup_delete() if rmdir(2)
returns ENOENT, since the cgroup is already deleted.
```
  7e9ff90c
- Add gres configuration test · 24d2308f
  Nathan Yee authored Jul 02, 2015
```
Test of gres.conf configuration options
test 913
```
  24d2308f
- Fix test file clean-up · 3bfe5968
  Morris Jette authored Jul 02, 2015
  
  3bfe5968
- Fix test cleanup if can't fully run · b6776abf
  Morris Jette authored Jul 02, 2015
  
  b6776abf
- Correct some data times for "scontrol show cache" · 96592047
  Morris Jette authored Jul 02, 2015
  
  96592047
- Add assoc usage to cache info dump · 35d2edeb
  Morris Jette authored Jul 01, 2015
```
Add association usage information to "scontrol show cache" command output.
```
  35d2edeb
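
The xcgroup_delete change in 7e9ff90c above treats a cgroup that is already gone as a successful delete. A minimal sketch of that idempotent-delete pattern follows. It uses Python's `os.rmdir` on an ordinary directory, not Slurm's C code, and the function name `delete_dir` is hypothetical.

```python
import os


def delete_dir(path):
    """Hypothetical helper: remove an empty directory, treating ENOENT as success."""
    try:
        os.rmdir(path)
    except FileNotFoundError:
        # ENOENT: the directory is already gone, so there is nothing left to do.
        return True
    except OSError:
        # Any other failure (for example ENOTEMPTY or EBUSY) is still reported.
        return False
    return True
```

With this shape, a caller that retries until the delete succeeds stops as soon as the directory no longer exists, whoever removed it.
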
01 Jul, 2015 8 commits

srun: Enable output processing on stdout in pty mode · 03e156c5

Dorian Krause authored Jul 01, 2015

Dear all,

We noticed that debugX() and error() calls in srun mess up the output if the --pty flag is used. The reason for this behavior is that srun sets the terminal in raw mode and disables output processing. This is fine for the output forwarded from the remote daemon but not for local printf()s by the srun process. This problem can be circumvented by re-enabling OPOST after the cfmakeraw() call in srun(). A patch is pasted at the bottom of the mail. It works nicely on Linux, but I am not 100% sure that it has no undesirable side effects on other platforms.

Best regards,
Dorian

Example with current HEAD:

[snip]
srun: jobid 10: nodes(1):`node1', cpu counts: 1(x1)
srun: launching 10.0 on host node1, 1 tasks: 0
                                              srun: route default plugin loaded
                                                                               srun: Node node1, 1 tasks started
                                                                                                                srun: Received task exit notification for 1 task (status=0x0100).
                                         srun: error: node1: task 0: Exited with exit code 1
                                                                                            #
Example with the patch applied:

[snip]
srun: jobid 11: nodes(1):`node1', cpu counts: 1(x1)
srun: launching 11.0 on host node1, 1 tasks: 0
srun: route default plugin loaded
srun: Node node1, 1 tasks started
srun: Received task exit notification for 1 task (status=0x0100).
srun: error: node1: task 0: Exited with exit code 1

03e156c5
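
The fix described in the message above can be sketched with Python's `termios` module. This is an illustration under stated assumptions, not the srun patch. The function name `make_raw_keep_opost` is hypothetical, and the flag changes are the ones the cfmakeraw(3) manual page documents. The sketch works on the attribute list returned by `termios.tcgetattr()`, so it can be read without a real terminal.

```python
import termios

# Indices into the attribute list returned by termios.tcgetattr().
IFLAG, OFLAG, CFLAG, LFLAG, CC = 0, 1, 2, 3, 6


def make_raw_keep_opost(attrs):
    """Hypothetical helper: raw-mode copy of attrs with output processing left on."""
    new = list(attrs)
    new[CC] = list(attrs[CC])  # do not share the control-character list
    # Flag changes documented for cfmakeraw(3).
    new[IFLAG] &= ~(termios.IGNBRK | termios.BRKINT | termios.PARMRK |
                    termios.ISTRIP | termios.INLCR | termios.IGNCR |
                    termios.ICRNL | termios.IXON)
    new[OFLAG] &= ~termios.OPOST
    new[LFLAG] &= ~(termios.ECHO | termios.ECHONL | termios.ICANON |
                    termios.ISIG | termios.IEXTEN)
    new[CFLAG] &= ~(termios.CSIZE | termios.PARENB)
    new[CFLAG] |= termios.CS8
    # Re-enable output post-processing so that locally printed "\n" is still
    # expanded to "\r\n" (this also needs ONLCR, which raw mode leaves alone).
    new[OFLAG] |= termios.OPOST
    return new
```

On a real terminal the result would be applied with `termios.tcsetattr(fd, termios.TCSANOW, make_raw_keep_opost(termios.tcgetattr(fd)))`. Without the final `OPOST` step, each newline moves the cursor down without returning it to column 0, which produces the staircase output shown in the first example.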

Add TRES support to sview · bb2c84c6
Morris Jette authored Jul 01, 2015
```
Add job, step and reservation TRES information to sview command
```
bb2c84c6
Make CPU frequency test more thorough · a7ad5445
Morris Jette authored Jul 01, 2015

a7ad5445

Refactor CPU frequency test · 179d2502

Morris Jette authored Jul 01, 2015

Prevent test failure if the compute node does not permit user
  control over CPU frequency (no "userspace" governor).

179d2502

Prevent test leaving vestigial file · 7118060a
Morris Jette authored Jul 01, 2015

7118060a

Show job in sacct when step's cpus are different from job allocation. · 0f8e7338

Brian Christiansen authored Jul 01, 2015

When submitting a job with srun -n#, the job may be allocated more than # CPUs
because the job was given the whole core or socket (e.g. CR_CORE, CR_SOCKET).
sacct showed only what the step used and not the allocation. This commit shows
the job and the step if job and step cpus are different.

0f8e7338

Add example of how EnergyIPMIPowerSensors works in the acct_gather.conf · 26806204
Thomas Cadeau authored Jul 01, 2015
```
man page.
```
26806204

Add TRES support to sreport command · b860ed8e

Morris Jette authored Jun 30, 2015

Major re-write of the sreport command to support the --tres job option
and permit users to select specific trackable resources to generate
reports for. For most reports, each TRES is listed on a separate
line of output with its name. The default TRES type is "cpu" to
minimize changes to output.

b860ed8e

30 Jun, 2015 10 commits
- Fix unpack of reservation record. · 13531531
  Danny Auble authored Jun 30, 2015
  
  13531531
- Add other variables to pretend to be power. · 785fe754
  Danny Auble authored Jun 30, 2015
  
  785fe754
- Initialize variables · 9703226a
  Danny Auble authored Jun 30, 2015
  
  9703226a
- Fix calculation regressions for IPMI plugin from commit · 61f6c332
  Danny Auble authored Jun 30, 2015
```
556c4ace
```
  61f6c332
- Get rid of redundant code · 161f6326
  Danny Auble authored Jun 29, 2015
  
  161f6326
- Add Yoann to the Slurm team! · 5f8ea769
  Danny Auble authored Jun 29, 2015
  
  5f8ea769
- Merge remote-tracking branch 'origin/slurm-14.11' · c58c1898
  Brian Christiansen authored Jun 30, 2015
  
  c58c1898
- Display error message when attempting to modify priority of a held job. · d798caa9
  Thomas Cadeau authored Jun 30, 2015
```
Bug 1745
```
  d798caa9
- Revert "Display error message when attempting to modify priority of a held job." · 36cb918c
  Brian Christiansen authored Jun 30, 2015
```
This reverts commit 3f91f4b2.
```
  36cb918c
- updated globals_accounting with add_qos and add_qos redefinitions · 85a29c89
  Danny Auble authored Jun 29, 2015
```
and test21.* updated to use them.
```
  85a29c89
29 Jun, 2015 3 commits
- Display error message when attempting to modify priority of a held job. · 3f91f4b2
  Nathan Yee authored Jun 29, 2015
```
Bug 1745
```
  3f91f4b2
- Add partition information to sshare. · ea95512d
  David Bigagli authored Jun 29, 2015
  
  ea95512d
- Merge pull request #114 from rathamahata/master · ebbaee6d
  David Bigagli authored Jun 29, 2015
```
Add partition information to sshare output
```
  ebbaee6d