Commits · ebaa43660e9ad829e83774666bd2dfe6cc849121 · Manuel G. Marciani / ces_slurm_simulator

17 Jun, 2014 1 commit

Morris Jette authored Jun 17, 2014

Correct logic to support Power7 processor with 1 or 2 threads per core
(CPU IDs are not consecutive).
bug 891

ebaa4366
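
The pitfall behind this fix can be sketched in C. This is not the Slurm code: both functions are hypothetical, and the sketch assumes for illustration that with 2 threads per core enabled the OS exposes CPU IDs 0, 1, 4, 5, 8, 9, ... (gaps where the disabled hardware threads would be). Deriving the core from `id / threads_per_core` then gives the wrong answer. Deriving it from the CPU's position in the sorted list of online IDs does not.

```c
#include <assert.h>
#include <stddef.h>

/* Naive mapping: only correct when CPU IDs are consecutive. */
static int core_from_id_naive(int cpu_id, int threads_per_core)
{
    assert(threads_per_core > 0);
    return cpu_id / threads_per_core;
}

/* Position-based mapping (hypothetical helper): find the CPU ID in the
 * sorted list of online CPU IDs and use its index, not its value. */
static int core_from_id_indexed(const int *cpu_ids, size_t n,
                                int cpu_id, int threads_per_core)
{
    assert(cpu_ids && threads_per_core > 0);
    for (size_t i = 0; i < n; i++) {
        if (cpu_ids[i] == cpu_id)
            return (int)(i / (size_t)threads_per_core);
    }
    return -1; /* CPU ID is not online */
}
```

With IDs {0, 1, 4, 5, 8, 9} and 2 threads per core, the naive mapping puts CPU 4 on core 2. The indexed mapping puts it on core 1.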

03 Jun, 2014 1 commit

scale job's mem-per-cpu limit · 0fbdb9e2

Morris Jette authored Jun 02, 2014

If a job's --mem-per-cpu limit exceeds the partition or system limit, then
scale the job's memory limit and CPUs per task to satisfy the limit.
bug 848

0fbdb9e2
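
One way to do this scaling is sketched below. The struct and function are hypothetical, and the rounding Slurm actually uses may differ. The sketch multiplies CPUs per task by the smallest integer ratio that brings the per-CPU memory within the limit. This keeps the memory available to each task at least as large as requested.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical request fields; not Slurm's job descriptor. */
struct mem_request {
    uint64_t mem_per_cpu_mb;  /* requested --mem-per-cpu, in MB */
    uint32_t cpus_per_task;   /* requested --cpus-per-task */
};

/* If mem_per_cpu_mb exceeds limit_mb, spread the per-task memory over
 * more CPUs: ratio = ceil(mem / limit) is the smallest integer that works,
 * and ceil(mem / ratio) is then guaranteed to be <= limit_mb. */
static void scale_mem_per_cpu(struct mem_request *req, uint64_t limit_mb)
{
    assert(req && limit_mb > 0);
    if (req->cpus_per_task == 0)
        req->cpus_per_task = 1;
    if (req->mem_per_cpu_mb <= limit_mb)
        return;

    uint64_t ratio = (req->mem_per_cpu_mb + limit_mb - 1) / limit_mb;
    req->mem_per_cpu_mb = (req->mem_per_cpu_mb + ratio - 1) / ratio;
    req->cpus_per_task *= (uint32_t)ratio;
}
```

For example, a request of 6000 MB per CPU with 1 CPU per task under a 2000 MB limit becomes 2000 MB per CPU with 3 CPUs per task.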

19 May, 2014 1 commit

Properly handle job requeue options · 68a4bfd7

Morris Jette authored May 19, 2014

Properly enforce job --requeue and --norequeue options. In three places
the previous logic did not do so: it either ignored the value, ANDed it
with the JobRequeue configuration option, or used the JobRequeue
configuration option by itself.
bug 821

68a4bfd7
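
The intended precedence can be sketched as follows. The enum and function names are hypothetical. The sketch assumes that JobRequeue acts only as the default for jobs that gave neither option. Under that rule, an explicit job option always wins and is never ANDed with the configuration value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state for the job's own option. */
enum job_requeue_opt {
    REQUEUE_UNSET = -1,  /* neither --requeue nor --norequeue given */
    REQUEUE_NO = 0,      /* --norequeue */
    REQUEUE_YES = 1      /* --requeue */
};

/* Explicit job option first; fall back to the JobRequeue setting only
 * when the job did not specify either option. */
static bool job_may_requeue(enum job_requeue_opt opt, bool job_requeue_conf)
{
    assert(opt == REQUEUE_UNSET || opt == REQUEUE_NO || opt == REQUEUE_YES);
    if (opt == REQUEUE_UNSET)
        return job_requeue_conf;
    return opt == REQUEUE_YES;
}
```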

12 May, 2014 2 commits

Fix support for job --profile=none option · 043e1b08

Puenlap Lee authored May 12, 2014

Also correct related documentation.

043e1b08

fix of comp nodes causing backfill to end early · d508ea95

Hongjia Cao authored May 12, 2014

Completing nodes are removed when calling _try_sched() for a job, as is
already the case in select_nodes(). Without this, if _try_sched() thinks
the job can run now but select_nodes() returns ESLURM_NODES_BUSY, the
backfill loop ends early.

d508ea95
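
The disagreement can be illustrated with node sets held as bitmasks. This sketch does not use Slurm's bitstring code: `count_nodes`, `usable_nodes` and this `try_sched` are hypothetical stand-ins. The trial check masks out nodes still completing a job, just as the real allocation does, so the two checks cannot disagree about those nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count set bits portably (one bit per node). */
static int count_nodes(uint64_t mask)
{
    int n = 0;
    while (mask) {
        mask &= mask - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Nodes usable now: idle nodes that are not still completing a job. */
static uint64_t usable_nodes(uint64_t idle, uint64_t completing)
{
    return idle & ~completing;
}

/* Hypothetical trial check: applies the same completing-node filter as
 * the real allocation, so "can run now" implies the allocation succeeds. */
static bool try_sched(uint64_t idle, uint64_t completing, int nodes_needed)
{
    assert(nodes_needed >= 0);
    return count_nodes(usable_nodes(idle, completing)) >= nodes_needed;
}
```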

09 May, 2014 1 commit
- If an invalid assoc_ptr comes in, don't use the id to verify it. · 2261d393
  Danny Auble authored May 09, 2014
  
  2261d393
08 May, 2014 1 commit

Correct sinfo sort fields options · ff518ad1

Morris Jette authored May 07, 2014

Correct sinfo --sort fields to match documentation: E -> Reason,
H -> Reason Time (new), R -> Partition Name, u/U -> Reason user (new)

ff518ad1

06 May, 2014 1 commit
- BGQ - Fix issue with uninitialized variable. · 950a3fd6
  Danny Auble authored May 06, 2014
  
  950a3fd6
05 May, 2014 4 commits
- Fix perlapi to compile correctly with perl 5.18 · 21ebf585
  Danny Auble authored May 05, 2014
  
  21ebf585
- Handle node ranges better when dealing with accounting max node limits. · d849aadb
  Danny Auble authored May 05, 2014
  
  d849aadb
- BGQ - Move code to only start job on a block after limits are checked. · 3a4246cc
  Danny Auble authored May 05, 2014
```
Related to bug 771
```
  3a4246cc
- BGQ - Fix issue where limits were checked on midplane counts instead of · 836b654f
  Danny Auble authored May 05, 2014
```
cnode counts.
```
  836b654f
02 May, 2014 2 commits
- BGQ - Temp fix issue where job could be left on job_list after it finished. · e4f1a099
  Danny Auble authored May 02, 2014
  
  e4f1a099
- Fix issue where user is requesting --acctg-freq=0 and no memory limits. · 17e4e2ac
  Danny Auble authored May 02, 2014
  
  17e4e2ac
30 Apr, 2014 1 commit

switch/nrt - CAU and RDMA tracking correction · 6f66fdef

Morris Jette authored Apr 30, 2014

Switch/nrt - Properly track usage of CAU and RDMA resources with multiple
tasks per compute node. Previous logic would allocate resources once per
task and then deallocate once per node, leaking CAU and RDMA resources
and preventing their use by future jobs.

6f66fdef
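
The bookkeeping invariant behind this fix can be sketched as follows. This is not the switch/nrt code. `struct node_pool`, `struct step_usage` and both functions are hypothetical. Every per-task allocation is recorded, and the per-node release returns exactly the recorded amount, so allocation and release cannot drift apart.

```c
#include <assert.h>

/* Hypothetical per-node pool of a switch resource such as CAU or RDMA. */
struct node_pool {
    int free_units;
};

/* Hypothetical per-step record of what was taken from one node. */
struct step_usage {
    int units_held;
};

/* Allocate for one task and remember the amount taken. */
static int step_alloc_task(struct node_pool *pool, struct step_usage *use,
                           int units)
{
    assert(pool && use && units >= 0);
    if (pool->free_units < units)
        return -1;
    pool->free_units -= units;
    use->units_held += units;
    return 0;
}

/* Release once per node, but release everything recorded for all tasks. */
static void step_free_node(struct node_pool *pool, struct step_usage *use)
{
    assert(pool && use);
    pool->free_units += use->units_held;
    use->units_held = 0;
}
```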

18 Apr, 2014 1 commit

switch/nrt - free partial allocation · a197a1da

Morris Jette authored Apr 18, 2014

On switch resource allocation failure, free partial allocation.
The failure mode was that CAU could be allocated on some nodes but not
on others, and the CAU allocated on nodes and switches up to the failure
point were never released.

a197a1da
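
The fix follows the usual allocate-or-roll-back pattern. Here is a minimal sketch of it. `struct cau_node` and `alloc_cau_all` are hypothetical, and each node is assumed to need one CAU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-node CAU counter. */
struct cau_node {
    int free_cau;
};

/* Take one CAU on each of the n nodes. If any node has none left,
 * undo the allocations already made on nodes [0, i) so nothing leaks. */
static bool alloc_cau_all(struct cau_node *nodes, int n)
{
    assert(nodes && n >= 0);
    int i;
    for (i = 0; i < n; i++) {
        if (nodes[i].free_cau <= 0)
            break;
        nodes[i].free_cau--;
    }
    if (i == n)
        return true;

    /* Failure at node i: release the partial allocation in reverse. */
    while (i-- > 0)
        nodes[i].free_cau++;
    return false;
}
```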

08 Apr, 2014 4 commits
- Start NEWS for v2.6.10 · 3114d035
  Morris Jette authored Apr 08, 2014
  
  3114d035
- Fix logic bugs for max_rpc_cnt SchedulerParameters · 78f9b4cc
  Morris Jette authored Apr 08, 2014
```
Fix logic bugs for SchedulerParameters option of max_rpc_cnt.
Scheduling would be delayed for job arrays and
backfill scheduling would be disabled unless max_rpc_cnt > 0.
```
  78f9b4cc
- Fix sacctmgr update user with no "where" condition. · 7ad6df27
  Danny Auble authored Apr 07, 2014
  
  7ad6df27
- Fix sinfo to work correctly with draining/mixed nodes as well as filtering · 2fb004cf
  Danny Auble authored Apr 07, 2014
```
on Mixed state.
```
  2fb004cf
07 Apr, 2014 3 commits
- Start NEWS for v2.6.9 · a51f6fbf
  Morris Jette authored Apr 07, 2014
  
  a51f6fbf
- BGQ - Fix sub block steps using a block when the block has passthroughs · 91c70cc9
  Danny Auble authored Apr 07, 2014
```
in it.

Signed-off-by: Danny Auble <da@schedmd.com>
```
  91c70cc9
- BGQ - Fix deny_pass to work correctly. · bee1ec08
  Danny Auble authored Apr 04, 2014
  
  bee1ec08
05 Apr, 2014 1 commit
- added SchedulerParameters option of max_rpc_cnt · ab381fd3
  Morris Jette authored Apr 04, 2014
```
Disables job scheduling when there are too many pending RPCs
```
  ab381fd3
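
The deferral test described for max_rpc_cnt, including the later fix that keeps backfill working when the option is unset, can be sketched as below. The function is hypothetical. It assumes that a value of 0 means "no limit" and that scheduling is deferred once the pending-RPC count reaches the configured value. The exact comparison in Slurm may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether to defer a scheduling pass (hypothetical helper).
 * max_rpc_cnt == 0 is treated as "no limit", so neither the main
 * scheduler nor backfill is ever deferred on that basis. */
static bool defer_scheduling(int max_rpc_cnt, int pending_rpcs)
{
    assert(max_rpc_cnt >= 0 && pending_rpcs >= 0);
    if (max_rpc_cnt == 0)
        return false;
    return pending_rpcs >= max_rpc_cnt;
}
```
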
04 Apr, 2014 3 commits
- MySQL - Fix it so a lock isn't held unnecessarily. · 9fe5c605
  Danny Auble authored Apr 04, 2014
  
  9fe5c605
- Fix sinfo to work correctly with draining/mixed nodes. (copied sview code) · 1d3b553c
  Danny Auble authored Apr 04, 2014
```
This also reverts commit 8cff3b08 and
ced2fa3f
```
  1d3b553c
- NEWS for the last 2 commits · ac4b337a
  Danny Auble authored Apr 03, 2014
  
  ac4b337a
03 Apr, 2014 2 commits
- Fix issue where associations weren't correct if backup takes control and · 9368ff2d
  Danny Auble authored Apr 03, 2014
```
new associations were added since it was started.
```
  9368ff2d
- Defer scheduling for many batch jobs · dd4aa1c3
  Morris Jette authored Apr 02, 2014
```
Permit multiple batch job submissions to be made for each run of the
scheduler logic if the job submissions occur at nearly the same time.
bug 616
```
  dd4aa1c3
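
A coalescing policy of this kind can be sketched as follows. The policy and all parameter names are hypothetical and are not taken from the commit. Once batch submissions stop arriving for `quiet_secs`, the scheduler runs. If they keep arriving, a cap of `max_defer_secs` bounds the delay, and all jobs submitted in the meantime are handled by one scheduler pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical policy: run the scheduler once submissions have been quiet
 * for quiet_secs, but never defer more than max_defer_secs after the first
 * submission that has not yet been scheduled. */
static bool run_scheduler_now(time_t now, time_t first_pending_submit,
                              time_t last_submit, int quiet_secs,
                              int max_defer_secs)
{
    assert(quiet_secs >= 0 && max_defer_secs >= 0);
    if (now - last_submit >= quiet_secs)
        return true;  /* the submission burst appears to be over */
    return now - first_pending_submit >= max_defer_secs;  /* cap the delay */
}
```
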
02 Apr, 2014 1 commit

launch/poe - fix network value · ad7100b8

Morris Jette authored Apr 02, 2014

If a job step's network value is set by poe, either by directly
executing poe or srun launching poe, that value was not being
propagated to the job step creation RPC and the network was not
being set up for the proper protocol (e.g. mpi, lapi, pami, etc.).
The previous logic would only work if the srun execute line
explicitly set the protocol using the --network option.

ad7100b8

31 Mar, 2014 1 commit
- preempt/partition_prio fix · a0ba1865
  Marcin Stolarek authored Mar 31, 2014
```
Prevent preemption of jobs in partition where PreemptMode=off
```
  a0ba1865
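
The corrected rule can be sketched as a single predicate. The enum lists only a subset of modes, and it and the struct and function are hypothetical. Under partition-priority preemption, a job may be preempted only by a job in a higher-priority partition. It may never be preempted when its own partition has PreemptMode=OFF.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of partition preemption modes. */
enum preempt_mode {
    PREEMPT_MODE_OFF,
    PREEMPT_MODE_SUSPEND,
    PREEMPT_MODE_REQUEUE,
    PREEMPT_MODE_CANCEL
};

/* Hypothetical partition attributes relevant to preemption. */
struct part_info {
    int priority;
    enum preempt_mode preempt_mode;
};

/* The preemptee's PreemptMode=OFF is checked before priorities. */
static bool can_preempt(const struct part_info *preemptor_part,
                        const struct part_info *preemptee_part)
{
    assert(preemptor_part && preemptee_part);
    if (preemptee_part->preempt_mode == PREEMPT_MODE_OFF)
        return false;
    return preemptor_part->priority > preemptee_part->priority;
}
```
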
26 Mar, 2014 1 commit
- Lock the /cgroup/freezer subsystem when creating files for tracking · bd05aaf2
  David Bigagli authored Mar 26, 2014
```
processes.
```
  bd05aaf2
25 Mar, 2014 1 commit
- mysql - Fix invalid memory reference. · 00cabba3
  Danny Auble authored Mar 25, 2014
  
  00cabba3
24 Mar, 2014 1 commit

job array dependency recovery fix · fca71890

Morris Jette authored Mar 24, 2014

When slurmctld restarted, it would not recover dependencies on
job array elements and would just discard the dependency. This
corrects the parsing problem so the dependency is recovered. The old code
would print a message like this and discard it:
slurmctld: error: Invalid dependencies discarded for job 51: afterany:47_*

fca71890
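
A parser for a single dependency such as `afterany:47_*` might look like the sketch below. The struct and function are hypothetical. The sketch covers one dependency in the forms `type:jobid`, `type:jobid_N` and `type:jobid_*`, and does not handle the full dependency syntax (for example, several comma-separated dependencies). Malformed input is rejected rather than partially accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one dependency such as "afterany:47_*". */
struct dep {
    char type[16];     /* e.g. "afterany" */
    long job_id;       /* e.g. 47 */
    long task_id;      /* array index, or -1 when none is given */
    bool whole_array;  /* true for the "_*" suffix (every element) */
};

static bool parse_dep(const char *s, struct dep *d)
{
    assert(s && d);
    const char *colon = strchr(s, ':');
    if (!colon || colon == s || (size_t)(colon - s) >= sizeof(d->type))
        return false;
    memcpy(d->type, s, (size_t)(colon - s));
    d->type[colon - s] = '\0';

    char *end;
    d->job_id = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || d->job_id <= 0)
        return false;
    d->task_id = -1;
    d->whole_array = false;

    if (*end == '\0')
        return true;                 /* plain job ID */
    if (*end != '_')
        return false;
    if (strcmp(end + 1, "*") == 0) {
        d->whole_array = true;       /* all elements of the array */
        return true;
    }
    char *end2;
    d->task_id = strtol(end + 1, &end2, 10);
    return end2 != end + 1 && *end2 == '\0' && d->task_id >= 0;
}
```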

21 Mar, 2014 1 commit

NRT - Fix issue with 1 node jobs. It turns out the network does need to · 440932df

Danny Auble authored Mar 21, 2014

be set up for 1 node jobs. Here are some of the reasons from IBM:

1. PE expects it.
2. For failover, if there was some challenge or difficulty with the
shared-memory method of data transfer, the protocol stack might
want to go through the adapter instead.
3. For flexibility, the protocol stack might want to be able to transfer
data using some variable combination of shared memory and adapter-based
communication, and
4. Possibly most important, for overall performance, it might be that
bandwidth or efficiency (BW per CPU cycles) might be better using the
adapter resources. (An obvious case is for large messages, it might
require a lot fewer CPU cycles to program the DMA engines on the
adapter to move data between tasks, rather than depend on the CPU
to move the data with loads and stores, or page re-mapping -- and
a DMA engine might actually move the data more quickly, if it's well
integrated with the memory system, as it is in the P775 case.)

440932df

20 Mar, 2014 2 commits
- task/affinity - Protect against zero divide when simulating more hardware · 92b4de3c
  Danny Auble authored Mar 20, 2014
```
than you really have.
```
  92b4de3c
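
The kind of guard this fix adds can be sketched as follows. The helper and its fallback value are hypothetical. When simulated hardware does not match the real topology, a derived core count can be 0, so the divisor is checked before dividing.

```c
#include <assert.h>

/* Hypothetical helper: threads per core from CPU and core counts.
 * Guards the divisor; the fallback of 1 is an assumed choice. */
static int threads_per_core(int cpus, int cores)
{
    assert(cpus >= 0);
    if (cores <= 0)
        return 1;  /* avoid dividing by zero */
    int tpc = cpus / cores;
    return tpc > 0 ? tpc : 1;
}
```
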
- sinfo - Make sure that if the partition name is long and it is the default, the last char · c4bd5ba8
  Danny Auble authored Mar 20, 2014
```
doesn't get chopped off.
```
  c4bd5ba8
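
sinfo marks the default partition with a trailing `*`. The sketch below shows one way to keep that marker when truncating a name to a fixed column width. The function is hypothetical and is not sinfo's formatting code. The caller's buffer is assumed to hold at least `width + 1` bytes. For the default partition, one column is reserved for the `*` and the name itself is shortened instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Write name into buf, truncated to width columns. For the default
 * partition, reserve the last column for '*' so it is never cut off. */
static void format_part_name(char *buf, size_t width, const char *name,
                             bool is_default)
{
    assert(buf && name && width >= 2);
    size_t room = is_default ? width - 1 : width;
    size_t len = strlen(name);
    if (len > room)
        len = room;
    memcpy(buf, name, len);
    if (is_default)
        buf[len++] = '*';
    buf[len] = '\0';
}
```
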
19 Mar, 2014 2 commits
- Move the comment from 2.6.7 to 2.6.8 · 9950679b
  David Bigagli authored Mar 19, 2014
  
  9950679b
- Fixed sacct.1 and srun.1 manual pages which contain a hyphen where · e1c8e670
  Gennaro Oliva authored Mar 19, 2014
```
    a minus sign for options was intended.
```
  e1c8e670
18 Mar, 2014 1 commit
- Free job_ptr->state_desc wherever state_reason is set. · c2ae6cfc
  Danny Auble authored Mar 17, 2014
  
  c2ae6cfc
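
Pairing the two updates in one helper makes the rule hard to forget. The sketch below uses the field names from the commit title inside a hypothetical struct. The helper is hypothetical too, and plain `free()` stands in for Slurm's own allocation wrappers. Changing the reason code discards any description written for the old reason.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical subset of a job record. */
struct job_rec {
    int state_reason;
    char *state_desc;  /* optional free-form text for the current reason */
};

/* Change the reason and drop the stale description in the same place. */
static void set_state_reason(struct job_rec *job, int reason)
{
    assert(job);
    free(job->state_desc);  /* free(NULL) is a no-op */
    job->state_desc = NULL;
    job->state_reason = reason;
}
```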