- 09 Jul, 2015 1 commit
-
-
Morris Jette authored
Changed spaces to tabs at start of lines. Minor changes to some formatting. Added the new files to the RPM (slurm.spec file).
Prevent memory leak of "l_name" variable if which_power_layout() function is called more than once.
Initialize "cpufreq" variable in powercap_get_cpufreq() function.
Array "tmp_max_watts_dvfs" could be NULL and used if "max_watts_dvfs" variable is NULL in powercap_get_node_bitmap_maxwatts_dvfs().
Variable "tmp_pcap_cpu_freq" could be used with an uninitialized value in function _get_req_features().
Variable "tmp_max_watts" could be used with an uninitialized value in function _get_req_features().
Array "tmp_max_watts_dvfs" could be used with an uninitialized value in function _get_req_features().
Array "allowed_freqs" could be NULL and used if "node_record_count" variable is zero in powercap_get_job_nodes_numfreq().
Overwriting a memory buffer header (especially with different data types) is just asking for something bad to happen. This code is from function powercap_get_job_nodes_numfreq():
    allowed_freqs = xmalloc(sizeof(int)*((int)num_freq+2));
    allowed_freqs[-1] = (int) num_freq;
Clean up memory on slurmctld shutdown.
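A minimal standalone C sketch of one safer alternative to the allowed_freqs[-1] pattern quoted above (storing the count in element 0 is an assumption for illustration, not necessarily the actual fix applied in Slurm):

    #include <stdlib.h>

    /* Sketch: keep the frequency count inside the allocated buffer
     * (element 0), with the frequencies at indexes 1..num_freq, instead
     * of writing it at index -1, i.e. before the start of the buffer. */
    static int *build_allowed_freqs(int num_freq)
    {
        int *allowed_freqs = calloc(num_freq + 2, sizeof(int));
        if (!allowed_freqs)
            return NULL;
        allowed_freqs[0] = num_freq;    /* count lives inside the buffer */
        /* allowed_freqs[1..num_freq] would be filled in by the caller */
        return allowed_freqs;
    }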
-
- 08 Jul, 2015 3 commits
-
-
David Bigagli authored
-
Morris Jette authored
-
Morris Jette authored
-
- 07 Jul, 2015 6 commits
-
-
Trey Dockendorf authored
-
Trey Dockendorf authored
Add job record qos field and partition record allow_qos field.
-
Trey Dockendorf authored
-
Trey Dockendorf authored
This patch moves the QOS update of an existing job so that it happens before the partition update. This ensures the new QOS value is the one used when validating against things like a partition's AllowQOS and DenyQOS. Currently, if two partitions have AllowQOS values that do not share any QOS, the order of updates prevents a job from being moved from one partition to another using something like the following:
    scontrol update job=<jobID> partition=<new part> qos=<new qos>
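A rough, self-contained C sketch of the ordering described above (these structures and the qos_allowed() helper are simplified stand-ins, not the real slurmctld code): the QOS field is applied first, so the partition's AllowQOS check sees the requested QOS rather than the old one.

    #include <string.h>

    /* Illustrative stand-ins only; the real slurmctld structures differ. */
    struct partition { const char *name; const char *allow_qos; /* NULL = allow all */ };
    struct job       { const char *qos;  const struct partition *part; };

    static int qos_allowed(const struct partition *p, const char *qos)
    {
        /* naive substring match, good enough for a sketch */
        return (p->allow_qos == NULL) || (strstr(p->allow_qos, qos) != NULL);
    }

    /* Apply the QOS update before the partition update, so the partition's
     * AllowQOS/DenyQOS check sees the new QOS value rather than the old one. */
    static int update_job(struct job *job, const char *new_qos,
                          const struct partition *new_part)
    {
        if (new_qos)
            job->qos = new_qos;          /* step 1: QOS first */
        if (new_part) {
            if (!qos_allowed(new_part, job->qos))
                return -1;               /* AllowQOS rejects the move */
            job->part = new_part;        /* step 2: then the partition */
        }
        return 0;
    }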
-
David Bigagli authored
-
Morris Jette authored
Correct task layout with CR_Pack_Node option and more than 1 CPU per task. Previous logic would place one task per CPU and launch too few tasks. bug 1781
-
- 06 Jul, 2015 2 commits
-
-
Morris Jette authored
Backfill scheduler now considers the OverTimeLimit and KillWait configuration parameters to estimate when running jobs will exit. Initially the job's end time is estimated based upon its time limit. After the time limit is reached, the end time estimate is based upon the OverTimeLimit and KillWait configuration parameters. bug 1774
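A small C sketch of the estimate described above (the helper itself is hypothetical; only the parameter names and units mirror the slurm.conf options): before the limit is reached the job is assumed to end at start_time plus its time limit, and once the limit has passed the estimate allows for OverTimeLimit plus KillWait.

    #include <time.h>

    /* Hypothetical helper, not the actual backfill code.  time_limit and
     * over_time_limit are in minutes, kill_wait in seconds, matching the
     * units of the corresponding slurm.conf parameters. */
    static time_t job_end_estimate(time_t start_time, unsigned time_limit,
                                   unsigned over_time_limit, unsigned kill_wait)
    {
        time_t now = time(NULL);
        time_t limit_end = start_time + (time_t) time_limit * 60;

        if (now < limit_end)
            return limit_end;            /* still within the time limit */

        /* Past the limit: allow for OverTimeLimit plus KillWait before the
         * job is expected to be killed; never return a time in the past. */
        time_t est = limit_end + (time_t) over_time_limit * 60 + kill_wait;
        return (est > now) ? est : now;
    }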
-
Morris Jette authored
Backfill scheduler: The configured backfill_interval value (default 30 seconds) is now interpreted as a maximum run time for the backfill scheduler. Once reached, the scheduler will build a new job queue and start over, even if not all jobs have been tested. bug 1774
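A sketch of the loop behaviour described above (illustrative C only, not the actual backfill plugin code): the scheduler notes its start time and, once backfill_interval seconds have elapsed, stops testing jobs so a fresh queue can be built on the next pass.

    #include <stdbool.h>
    #include <time.h>

    /* Illustrative stand-ins for the real backfill internals. */
    struct job_queue { int count; };

    static bool try_backfill_job(struct job_queue *q, int idx)
    {
        (void) q; (void) idx;
        return true;    /* placeholder: real code would try to start the job */
    }

    /* Spend at most backfill_interval seconds testing jobs; the caller then
     * rebuilds the job queue and starts over on the next scheduling pass. */
    static void backfill_pass(struct job_queue *q, int backfill_interval)
    {
        time_t start = time(NULL);

        for (int i = 0; i < q->count; i++) {
            if (difftime(time(NULL), start) >= backfill_interval)
                break;      /* maximum run time reached */
            (void) try_backfill_job(q, i);
        }
    }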
-
- 02 Jul, 2015 2 commits
-
-
Morris Jette authored
The original patch from LLNL assumed Munge was installed, which would result in a build error if the Munge development package was not installed.
-
Morris Jette authored
Add association usage information to "scontrol show cache" command output.
-
- 01 Jul, 2015 2 commits
-
-
Brian Christiansen authored
When submitting a job with srun -n<#>, the job may be allocated more CPUs than requested because it was given whole cores or sockets (e.g. CR_CORE, CR_SOCKET). sacct showed only what the step used and not the allocation. This commit shows both the job and the step if the job and step CPU counts differ.
-
Morris Jette authored
Major re-write of the sreport command to support the --tres job option and permit users to select specific trackable resources to generate reports for. For most reports, each TRES is listed on a separate line of output with its name. The default TRES type is "cpu" to minimize changes to output.
-
- 30 Jun, 2015 2 commits
-
-
Thomas Cadeau authored
Bug 1745
-
Brian Christiansen authored
This reverts commit 3f91f4b2.
-
- 29 Jun, 2015 2 commits
-
-
Nathan Yee authored
Bug 1745
-
David Bigagli authored
-
- 26 Jun, 2015 2 commits
-
-
Danny Auble authored
-
Brian Christiansen authored
Bug 1746
-
- 25 Jun, 2015 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
- 24 Jun, 2015 2 commits
-
-
David Bigagli authored
-
Morris Jette authored
-
- 23 Jun, 2015 1 commit
-
-
David Bigagli authored
-
- 22 Jun, 2015 3 commits
-
-
Morris Jette authored
Updates of existing bluegene advanced reservations did not work at all. Some multi-core configurations resulted in an abort due to creating core_bitmaps for the reservation that had only one bit per node rather than one bit per core. These bugs were introduced in commit 5f258072.
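A standalone C sketch of the sizing issue described above (the helper and plain-byte bitmap are illustrative assumptions; Slurm's own bitstring API is not shown): the reservation's core bitmap must be sized by the total core count across the selected nodes, not by the node count.

    #include <stdlib.h>

    /* Illustrative only: a reservation core bitmap needs one bit per core on
     * every selected node, not one bit per node as the buggy sizing produced. */
    static unsigned char *alloc_core_bitmap(const int *cores_per_node, int node_cnt)
    {
        int total_cores = 0;
        for (int i = 0; i < node_cnt; i++)
            total_cores += cores_per_node[i];    /* one bit per core */

        return calloc((total_cores + 7) / 8, 1); /* round up to whole bytes */
    }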
-
David Bigagli authored
-
David Bigagli authored
-
- 19 Jun, 2015 1 commit
-
-
David Bigagli authored
-
- 15 Jun, 2015 1 commit
-
-
Morris Jette authored
The logic assumed the reservation had a node bitmap, which was used to check for overlapping jobs. If there is no node bitmap (e.g. a licenses-only reservation), an abort would result.
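A minimal C sketch of the guard implied above (the types and the bitmaps_overlap() helper are simplified stand-ins, not the real slurmctld code): check for a missing node bitmap before using it, instead of dereferencing NULL.

    #include <stdbool.h>
    #include <stddef.h>

    /* Simplified stand-ins; the real slurmctld types and bitmap API differ. */
    typedef struct { void *node_bitmap; /* NULL for a licenses-only reservation */ } resv_t;
    typedef struct { void *node_bitmap; } job_t;

    static bool bitmaps_overlap(void *a, void *b)
    {
        (void) a; (void) b;
        return false;   /* placeholder for a real bitmap intersection test */
    }

    /* Guard against reservations with no node bitmap (e.g. licenses only)
     * before checking for overlapping jobs. */
    static bool resv_overlaps_job(const resv_t *resv, const job_t *job)
    {
        if (!resv->node_bitmap || !job->node_bitmap)
            return false;   /* nothing to compare node-wise */
        return bitmaps_overlap(resv->node_bitmap, job->node_bitmap);
    }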
-
- 12 Jun, 2015 3 commits
-
-
Brian Christiansen authored
Bug 1739
-
Brian Christiansen authored
Bug 1743
-
Brian Christiansen authored
Bug 1743
-
- 11 Jun, 2015 1 commit
-
-
Brian Christiansen authored
Bug 1733
-
- 10 Jun, 2015 1 commit
-
-
Morris Jette authored
-
- 09 Jun, 2015 3 commits
-
-
David Bigagli authored
-
Morris Jette authored
1. I submit a first job that uses 1 GPU:
    $ srun --gres gpu:1 --pty bash
    $ echo $CUDA_VISIBLE_DEVICES
    0
2. While the first one is still running, a 2-GPU job asking for 1 task per node waits (and I don't really understand why):
    $ srun --ntasks-per-node=1 --gres=gpu:2 --pty bash
    srun: job 2390816 queued and waiting for resources
3. Whereas a 2-GPU job requesting 1 core per socket (so just 1 socket) actually gets GPUs allocated from two different sockets!
    $ srun -n 1 --cores-per-socket=1 --gres=gpu:2 -p testk --pty bash
    $ echo $CUDA_VISIBLE_DEVICES
    1,2
With this change, #2 works the same way as #3. bug 1725
-
Brian Christiansen authored
Bug 1572
-