Commits · 5c32e16bce714c50802b7e4e60d7aaf0bc8d0ee3 · Manuel G. Marciani / ces_slurm_simulator

17 May, 2017 2 commits

Add pack_job_id and pack_job_offset to job records · 5c32e16b

Morris Jette authored May 17, 2017

Add pack_job_id and pack_job_offset fields to job record in slurmctld
  plus job_info structure
Add logic to save/restore pack_job_id and pack_job_offset fields on
  slurmctld restart
Add logging of pack_job_id and pack_job_offset fields in slurmctld's
  jobid2fmt() and jobid2str() functions
Add pack_job_id and pack_job_offset fields to "scontrol show job"
  output

5c32e16b

Add allocate_pack error handling · 36bb1ce2

Morris Jette authored May 17, 2017

cancel vestigial pack jobs when an error happens in some portion of
  the job request

36bb1ce2
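
The error handling described in this commit can be pictured as an all-or-nothing loop over the components of a pack job request. The following is a minimal Python sketch of that idea, not the slurmctld C code; `allocate_pack`, `create_job_record`, `cancel_job` and `AllocationError` are hypothetical names introduced only for illustration.

```python
class AllocationError(Exception):
    """Raised when one component of a pack job request cannot be created."""


def allocate_pack(job_requests, create_job_record, cancel_job):
    """Create one job record per request; cancel all of them if any fails.

    create_job_record and cancel_job are hypothetical callbacks standing in
    for whatever the controller does to create and cancel a job.
    """
    created = []
    try:
        for request in job_requests:
            created.append(create_job_record(request))
    except AllocationError:
        # Some portion of the request failed: cancel the components that
        # were already created so no vestigial pack jobs remain.
        for job_id in reversed(created):
            cancel_job(job_id)
        raise
    return created
```

Re-raising after the cleanup keeps the original error visible to the caller while still guaranteeing that no partial pack job is left behind.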

16 May, 2017 3 commits

Fix for potential memory leak · ae2e221b
Morris Jette authored May 16, 2017

ae2e221b

Add REQUEST_JOB_PACK_ALLOCATION RPC · 503f3531

Morris Jette authored May 16, 2017

Add a new RPC, REQUEST_JOB_PACK_ALLOCATION, that will send a List of
job request descriptors in a single message. Each job descriptor
will be logged and a job record created for it.

503f3531

packjob/salloc parse refactoring · 6e6adb10

Morris Jette authored May 15, 2017

Refactor the salloc option parsing logic to build a separate set of
job descriptor data structures for each portion of the heterogeneous
job. Only the last job record is currently submitted to slurmctld.

6e6adb10

15 May, 2017 1 commit
- sbatch cosmetic changes, no changes to logic · 2f313292
  Morris Jette authored May 15, 2017
  
  2f313292
13 May, 2017 3 commits
- Merge branch 'slurm-17.02' · e0ca803c
  Morris Jette authored May 12, 2017
  
  e0ca803c
- Remove log files from test20.12 · 7bb4d9a1
  Isaac Hartung authored May 12, 2017
```
Bug 3695
```
  7bb4d9a1
- knl_cray plugin: Change capmc parsing of mcdram_pct from string to number · 7bd276b1
  Morris Jette authored May 12, 2017
```
bug 3779
```
  7bd276b1
12 May, 2017 4 commits

knl_cray plugin: Log incomplete capmc output for a node · 80b27490

Morris Jette authored May 12, 2017

If capmc reports a node name, but not mcdram_cfg for the node, then
  log the missing data rather than assume the value is zero and
  report a value mismatch with cnselect.

80b27490

Prevent scontrol crash when operating on array and non-array jobs at once. · 006f7eeb

Alejandro Sanchez authored May 12, 2017

When an operation that accepts more than one job in a single request is given
a job array before a regular (non-array) job in the job list, the
job_array_resp_msg_t pointer was not properly NULL'ed and was thus incorrectly
accessed when processing the non-array job. This fix prevents the crash from
happening in the following scontrol operations:

uhold, hold, suspend, requeue, requeuehold, update, release

when the same request has <array_jobid>,<non-array_jobid> in this order in
the job list to process.

Bug 3759

006f7eeb
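
The bug class described in this commit, a response left over from an array job being read while handling the next regular job, can be sketched as follows. This is an illustrative Python analogue under stated assumptions, not the scontrol C code: `process_jobs`, `run_operation` and the `"array_resp"` key are hypothetical, with a reused dictionary standing in for the single job_array_resp_msg_t pointer.

```python
def process_jobs(job_ids, run_operation):
    """Apply an operation to each job ID and collect one result per job.

    run_operation(job_id, reply) is a hypothetical callback. It returns a
    return code and, for job arrays only, stores per-task results in
    reply["array_resp"]. For regular jobs it leaves reply untouched.
    """
    results = {}
    reply = {}  # reused across iterations, like a single response pointer
    for job_id in job_ids:
        # The fix: clear any array response left over from the previous
        # job before handling the next one.
        reply.pop("array_resp", None)
        rc = run_operation(job_id, reply)
        if "array_resp" in reply:
            results[job_id] = reply["array_resp"]  # per-task results
        else:
            results[job_id] = rc
    return results
```

Without the reset at the top of the loop, a regular job that follows a job array would be reported with the array's stale per-task results.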

Enhance job expansion example · 02b790bc

Morris Jette authored May 12, 2017

Job expansion example in FAQ enhanced to demonstrate operation in
    heterogeneous environments.
bug 2979

02b790bc

avoid starting scheduler on busy system after power cap change · e29e8511

Alejandro Sanchez authored May 12, 2017

Do not attempt to schedule jobs after changing the power cap if there are
  already many active threads.

e29e8511

11 May, 2017 2 commits
- Merge remote-tracking branch 'origin/slurm-17.02' · 4fa4f65e
  Danny Auble authored May 10, 2017
```
# Conflicts:
#	META
#	NEWS
```
  4fa4f65e
- Update NEWS for next release. · d65ed698
  Danny Auble authored May 10, 2017
  
  d65ed698
10 May, 2017 3 commits
- Update META for v17.02.3 tag · b6f8ca23
  Danny Auble authored May 10, 2017
  
  b6f8ca23
- Return error when bad separator is given for scontrol update job licenses. · 521a574c
  Dominik Bartkiewicz authored May 10, 2017
```
Bug 3760
```
  521a574c
- Partial revert of commit c6a144c1 which made it so CR_ONE_TASK_PER_CORE · 9556b4ab
  Danny Auble authored May 09, 2017
```
didn't work at all.

Bug 3712.
```
  9556b4ab
09 May, 2017 9 commits
- Revert "Return error when bad separator is given for scontrol update job licenses." · 36718220
  Danny Auble authored May 09, 2017
```
This reverts commit ecfd007f.
```
  36718220
- Return error when bad separator is given for scontrol update job licenses. · ecfd007f
  Dominik Bartkiewicz authored May 09, 2017
  
  ecfd007f
- Fix casting · c865bd0c
  Brian Christiansen authored May 09, 2017
```
Continuation of 9a1370e3
CID 168995
```
  c865bd0c
- Don't remove admin comment when updating a job. · 6cf363c4
  Danny Auble authored May 09, 2017
```
It was noticed that while doing any update to a job the admin comment would
be blown away.  This patch fixes that.
```
  6cf363c4
- Fix updating job priority on multiple partitions to be correct. · bf7e0e7b
  Dominik Bartkiewicz authored May 09, 2017
```
Bug 3789
```
  bf7e0e7b
- It was found on systems running openmpi using multiple-slurmds that we couldn't · 3a4f38d0
  Danny Auble authored May 09, 2017
```
run multiple tasks on multiple nodes.  Changing the max nodes setting from
3 to 6 fixes the issue without apparent compromise to the test.
```
  3a4f38d0
- Message Aggr - Remove race condition on slurmd shutdown with respect to · bc3cdabf
  Danny Auble authored May 09, 2017
```
destroying a mutex.
```
  bc3cdabf
- Filter out duplicate federated jobs · 9a1370e3
  Brian Christiansen authored May 08, 2017
```
When running sacct from a federated client, the db returns jobs for each
cluster with duplicate jobs removed on each cluster. A federated job could have
run on a different cluster before the job IDs rolled over. This patch
filters out the older federated jobs and leaves the newest ones.

Reverted d31965 which was too slow.
```
  9a1370e3
- Revert "Filter out duplicate federated jobs" · 381f9408
  Brian Christiansen authored May 08, 2017
```
This reverts commit d31965f3.
```
  381f9408
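
The duplicate filtering described in commit 9a1370e3 above amounts to keeping one record per job ID, namely the newest. A minimal Python sketch of that idea follows. It is not the sacct implementation: the function name and the `"job_id"`, `"cluster"` and `"submit_time"` keys are hypothetical, and using the submit time to decide which record is newest is an assumption made for this example.

```python
def filter_duplicate_jobs(jobs):
    """Keep only the newest record for each job ID.

    Each job is a dict with hypothetical keys "job_id", "cluster" and
    "submit_time"; real accounting records carry many more fields.
    """
    newest = {}
    for job in jobs:
        current = newest.get(job["job_id"])
        # Assumption: a later submit time marks the newer record.
        if current is None or job["submit_time"] > current["submit_time"]:
            newest[job["job_id"]] = job
    return sorted(newest.values(), key=lambda job: job["submit_time"])
```

A single pass with a dictionary keyed on job ID keeps the work linear in the number of records returned by the database.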
08 May, 2017 5 commits
- Allow reboot program to use arguments. · 64c3fc37
  Tim Shaw authored May 08, 2017
```
Bug 3612

Looks like a regression in commit ad07aebc
```
  64c3fc37
- Sanity check to make sure we have started a job in acct_policy.c before we · f795589d
  Danny Auble authored May 08, 2017
```
clear it as started.
```
  f795589d
- Revert "Clean up the bb allocation that happened when we ran bb_g_job_begin when" · 2b75ee8c
  Danny Auble authored May 08, 2017
```
This reverts commit e87edf8d.

Per Moe's suggestion, we revert this since it most likely isn't totally
correct.
```
  2b75ee8c
- Merge branch 'slurm-17.02' · b82aaffb
  Morris Jette authored May 08, 2017
  
  b82aaffb
- Change topology.conf generation tool to newer program · 39fbbd8a
  Morris Jette authored May 08, 2017
  
  39fbbd8a
06 May, 2017 3 commits
- Merge branch 'slurm-17.02' · b967924a
  Tim Wickberg authored May 05, 2017
  
  b967924a
- Merge branch 'slurm-16.05' into slurm-17.02 · e40dd183
  Tim Wickberg authored May 05, 2017
  
  e40dd183
- Testsuite - add --stop-on-first-fail option to regression.py. · cdcffc23
  Tim Wickberg authored May 05, 2017
  
  cdcffc23
05 May, 2017 5 commits

Merge remote-tracking branch 'origin/slurm-17.02' · 0290b4d5
Danny Auble authored May 05, 2017

0290b4d5

Make sview federation aware · 28e34b39

Morris Jette authored May 05, 2017

Work is incomplete, but for now we get all jobs in federation and highlight
  the local nodes associated with each job.

28e34b39

It turns out the state needs to be running before we call · 17083eb6

Danny Auble authored May 05, 2017

select_g_select_nodeinfo_set().

This is a continuation of commit 80443cc1.

Bug 3690.

Test 1.62 will fail otherwise.

17083eb6

Clean up the bb allocation that happened when we ran bb_g_job_begin when · e87edf8d

Danny Auble authored May 05, 2017

we failed.

Bug 3690 continuation of commit bf0429d1.

e87edf8d

give get node/partition calls federation support · 38932c0e

Morris Jette authored May 05, 2017

Add "cluster_name" field to the node_info_t and partition_info_t data structures.
It is filled in only when the cluster is part of a federation and the
SHOW_GLOBAL flag is used.
Functions slurm_load_node() and slurm_load_partitions() modified to show all
nodes/partitions in a federation when the SHOW_GLOBAL flag is used.

38932c0e