- 07 Jul, 2015 16 commits
-
-
Trey Dockendorf authored
-
Trey Dockendorf authored
Add job record qos field and partition record allow_qos field.
-
Trey Dockendorf authored
-
Morris Jette authored
-
Trey Dockendorf authored
This patch moves the QOS update of an existing job to be before the partition update. This ensures the new QOS value is the one used when validating against settings such as a partition's AllowQOS and DenyQOS. Currently, if two partitions have AllowQOS lists that do not share any QOS, the order of updates prevents a job from being moved from one partition to the other using something like the following: scontrol update job=<jobID> partition=<new part> qos=<new qos>
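The ordering fix can be sketched as follows. This is a minimal illustration, not Slurm's actual code: qos_allowed and update_job are hypothetical helpers, and the AllowQOS list is modeled as a plain string array.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: is this QOS in the partition's AllowQOS list? */
bool qos_allowed(const char *qos, const char *const allow[], int n_allow) {
    for (int i = 0; i < n_allow; i++)
        if (strcmp(qos, allow[i]) == 0)
            return true;
    return false;
}

/* Sketch of the ordering fix: apply the QOS update first, then validate
 * the (possibly new) partition against that new QOS value. */
int update_job(const char **job_qos, const char **job_part,
               const char *new_qos, const char *new_part,
               const char *const allow[], int n_allow) {
    if (new_qos)
        *job_qos = new_qos;                    /* QOS update happens first */
    if (new_part) {
        if (!qos_allowed(*job_qos, allow, n_allow))
            return -1;                         /* fails AllowQOS check */
        *job_part = new_part;
    }
    return 0;
}
```

With the old ordering, the AllowQOS check would run against the job's previous QOS and reject the combined partition+QOS update.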
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
David Bigagli authored
-
Danny Auble authored
-
Danny Auble authored
value for booleans.
-
Danny Auble authored
Conflicts: src/common/slurm_step_layout.c
-
Morris Jette authored
Correct task layout with the CR_Pack_Node option and more than one CPU per task. Previous logic would place one task per CPU and launch too few tasks. Bug 1781.
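As a rough sketch of the per-node arithmetic (a simplification, not Slurm's real layout code; tasks_on_node is a hypothetical helper):

```c
/* Hypothetical sketch: with CR_Pack_Node, the number of tasks packed onto
 * a node must account for --cpus-per-task rather than assuming one task
 * per CPU. */
int tasks_on_node(int cpus_on_node, int cpus_per_task) {
    if (cpus_per_task <= 0)
        cpus_per_task = 1;                 /* default: one CPU per task */
    return cpus_on_node / cpus_per_task;   /* not cpus_on_node tasks */
}
```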
-
Danny Auble authored
-
- 06 Jul, 2015 5 commits
-
-
Nathan Yee authored
-
Nathan Yee authored
-
Morris Jette authored
Backfill scheduler now considers the OverTimeLimit and KillWait configuration parameters to estimate when running jobs will exit. Initially the job's end time is estimated based upon its time limit. After the time limit is reached, the end time estimate is based upon the OverTimeLimit and KillWait configuration parameters. Bug 1774.
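A minimal sketch of that estimate; the exact way OverTimeLimit and KillWait combine here is an assumption for illustration, not Slurm's precise formula:

```c
#include <time.h>

/* Hypothetical sketch: estimate when a running job will end.  Before the
 * time limit is reached, use start + limit; afterwards, assume the job
 * survives roughly OverTimeLimit + KillWait past "now". */
time_t estimate_end(time_t now, time_t start, time_t limit,
                    time_t over_time_limit, time_t kill_wait) {
    time_t end = start + limit;
    if (end <= now)                          /* limit already exceeded */
        end = now + over_time_limit + kill_wait;
    return end;
}
```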
-
Morris Jette authored
Backfill scheduler: The configured backfill_interval value (default 30 seconds) is now interpreted as a maximum run time for the backfill scheduler. Once reached, the scheduler will build a new job queue and start over, even if not all jobs have been tested. Bug 1774.
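The time budget can be sketched like this; backfill_scan and the per-job cost parameter are hypothetical, used only to make the cutoff deterministic:

```c
/* Hypothetical sketch: test jobs until the accumulated scan time reaches
 * max_runtime; the real scheduler would then rebuild the job queue and
 * start over on its next pass. */
int backfill_scan(int njobs, long max_runtime, long cost_per_job) {
    long elapsed = 0;
    int tested = 0;
    for (int i = 0; i < njobs; i++) {
        if (elapsed >= max_runtime)
            break;                 /* budget exhausted; stop early */
        tested++;
        elapsed += cost_per_job;   /* stand-in for wall-clock time */
    }
    return tested;
}
```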
-
Jason Coverston authored
of the size returned. This check is redundant, though, since getgrouplist() returns -1 if it tries to use more than ngroups_max groups. Perhaps we should just take the check out.
-
- 03 Jul, 2015 1 commit
-
-
Morris Jette authored
-
- 02 Jul, 2015 18 commits
-
-
Morris Jette authored
No change in logic
-
Morris Jette authored
The original patch from LLNL assumed Munge was installed, which resulted in a build error when the Munge development package was not present.
-
Adam Moody authored
-
Adam Moody authored
-
Adam Moody authored
-
Adam Moody authored
-
Don Lipari authored
If a user runs sreport and specifies a time period that stretches into the future, this patch sets the end of the period to the current time.
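The clamp itself is simple; a sketch (clamp_end is a hypothetical name, not sreport's actual function):

```c
#include <time.h>

/* Hypothetical sketch: never let a reporting period extend past "now". */
time_t clamp_end(time_t period_end, time_t now) {
    return (period_end > now) ? now : period_end;
}
```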
-
Mark A. Grondona authored
In order to successfully remove the freezer cgroup at the end of a job step, the slurmstepd process itself must first be moved outside of the cgroup, or removal will always fail. This fix moves the slurmstepd back to the root cgroup just before the rmdir operations are attempted.
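The constraint is the same one rmdir(2) imposes on any non-empty directory. As an illustrative analogy (not Slurm's cgroup code), a directory with a remaining member must be emptied before removal can succeed:

```c
#include <errno.h>
#include <stdio.h>      /* for the usage example below */
#include <sys/stat.h>
#include <unistd.h>

/* Analogy sketch: like a cgroup that still contains the slurmstepd task,
 * a directory with a member cannot be removed; move the member out
 * (here: unlink it) and the rmdir then succeeds. */
int remove_after_emptying(const char *dir, const char *member) {
    if (rmdir(dir) == 0)
        return 0;                    /* already empty */
    if (errno != ENOTEMPTY && errno != EEXIST)
        return -1;                   /* unexpected failure */
    if (unlink(member) < 0)          /* the "move the task out" step */
        return -1;
    return rmdir(dir);
}
```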
-
Morris Jette authored
-
Mark A. Grondona authored
If the job prolog is running we can't send a signal to job step tasks, so return SLURM_FAILURE instead of ESLURM_INVALID_JOB_ID. This should cause the caller to retry, instead of assuming the job step is not running on the node.
-
Mark A. Grondona authored
If a job step request comes in while the slurm prolog is running, slurmd will happily launch the job step. This means that a user could run code before the prolog is complete, which could cause strange errors or, in some cases, a security issue. Instead, return an error (EINPROGRESS for now) and do not allow job steps to run during the prolog.
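A minimal sketch of the guard (launch_step is hypothetical; the real check lives in slurmd's request handling):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical sketch: refuse to launch a job step while the prolog is
 * still running, so user code cannot start before the prolog completes. */
int launch_step(bool prolog_running) {
    if (prolog_running) {
        errno = EINPROGRESS;   /* caller is expected to retry later */
        return -1;
    }
    return 0;                  /* proceed with the normal step launch */
}
```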
-
Mark A. Grondona authored
Create a new function, _prolog_is_running(), to determine if a job prolog is currently running from within slurmd, and use this new function in place of list_find_first in _wait_for_job_running_prolog.
-
Mark A. Grondona authored
Some cgroup code would retry continuously if a cgroup did not exist when xcgroup_delete() was called (see, for instance, proctrack/cgroup's slurm_container_plugin_wait()). To fix this for all callers, return SUCCESS from xcgroup_delete() if rmdir(2) fails with ENOENT, since the cgroup is already deleted.
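The fix reduces to one errno check; a sketch (xcgroup_delete_sketch is a hypothetical stand-in for the real function):

```c
#include <errno.h>
#include <sys/stat.h>   /* mkdir, used in the usage example below */
#include <unistd.h>

/* Hypothetical sketch: deleting a cgroup directory that is already gone
 * counts as success, not an error, so callers stop retrying on ENOENT. */
int xcgroup_delete_sketch(const char *path) {
    if (rmdir(path) < 0 && errno != ENOENT)
        return -1;   /* real failure, e.g. EBUSY or ENOTEMPTY */
    return 0;        /* removed, or already deleted */
}
```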
-
Nathan Yee authored
Test of gres.conf configuration options (test 913).
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Add association usage information to "scontrol show cache" command output.
-