Commits · 2d2a026bd3102e27b21650d10d460f9b1b5defa6 · Manuel G. Marciani / ces_slurm_simulator

17 Dec, 2012 3 commits
- Modify sview and squeue to ignore pending steps · 2d2a026b
  Morris Jette authored Dec 17, 2012
  
  2d2a026b
- Add "state" field to job step state information · 90e7dab5
  Morris Jette authored Dec 17, 2012
  
  90e7dab5
- Fix spelling of my surname · 3ca1c8c0
  Chris Read authored Dec 17, 2012
  
  3ca1c8c0
14 Dec, 2012 4 commits
- Fix for node being set down due to "unexpected reboot", but really timing issue · 07c428a2
  Morris Jette authored Dec 14, 2012
  
  07c428a2
- CRAY - Fix for setting up the aprun for a large job (+2000 nodes). · 4e40d6d5
  Danny Auble authored Dec 14, 2012
  
  4e40d6d5
- Fix of job priority ordering · 6a2b1e5a
  Chris Reed authored Dec 14, 2012
```
Without this patch, use of sched/builtin would always result in FIFO
scheduling, even if priority/multifactor was configured
```
  6a2b1e5a
- BGQ - Fix to check block for action 'D' if it also has nodes in error. · b9b3d675
  Danny Auble authored Dec 13, 2012
  
  b9b3d675
13 Dec, 2012 3 commits
- Task/affinity fix for Power7 processor with hyper-threading disabled · 79132234
  jette authored Dec 13, 2012
  
  79132234
- BGQ - Only poll on initialized blocks instead of calling getBlocks on · 0a02938f
  Danny Auble authored Dec 12, 2012
```
each block independently.
```
  0a02938f
- BGQ - fix memory leak · ac4e5e61
  Danny Auble authored Dec 12, 2012
  
  ac4e5e61
12 Dec, 2012 1 commit
- Correct WillRun authentication logic when issued for non-job owner · 5cba7d6d
  Morris Jette authored Dec 12, 2012
  
  5cba7d6d
07 Dec, 2012 1 commit
- Correction to hostlist sorting · c8f97453
  Morris Jette authored Dec 07, 2012

  Correction to hostlist sorting for hostnames that contain two numeric
  components and the first numeric component has various sizes (e.g.
  "rack9blade1" should come before "rack10blade1")

  c8f97453
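
The ordering described in the hostlist sorting commit above can be illustrated with a small sketch. This is not Slurm's C implementation; it is a minimal Python example, assuming hostnames are split into alternating text and digit runs and that digit runs compare as integers. The function name `hostname_sort_key` is hypothetical.

```python
import re


def hostname_sort_key(name):
    """Build a sort key in which digit runs compare numerically.

    Hypothetical helper for illustration only; not part of Slurm.
    """
    # re.split with a capturing group returns text and digit runs
    # alternately: even indices are text, odd indices are digits.
    parts = re.split(r"([0-9]+)", name)
    key = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            key.append((1, int(part)))  # numeric component
        elif part:
            key.append((0, part))       # non-empty text component
    return key


hosts = ["rack10blade1", "rack9blade1", "rack9blade10", "rack9blade2"]
ordered = sorted(hosts, key=hostname_sort_key)
```

A plain string sort would place "rack10blade1" before "rack9blade1" because "1" sorts before "9"; comparing the first numeric component as an integer avoids that.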

06 Dec, 2012 1 commit
- Start NEWS for v2.4.6 work · e884d4d8
  Morris Jette authored Dec 06, 2012
  
  e884d4d8
05 Dec, 2012 3 commits
- BGQ - If an allocation exists on a block that has a 'D' action on it fail · 4f3eb00c
  Danny Auble authored Dec 05, 2012
```
job on future step creation attempts.
```
  4f3eb00c
- BGQ - move initial poll to beginning of realtime interaction, which will · 7a5112d5
  Danny Auble authored Dec 05, 2012
```
also cause it to run if the realtime server ever goes away.
```
  7a5112d5
- Permit SlurmUser or operator to change QOS of non-pending jobs · 6074742c
  Morris Jette authored Dec 05, 2012
```
Especially for newly started jobs, the PrologSlurmctld can change
a job's QOS based upon resource allocation.
```
  6074742c
04 Dec, 2012 1 commit
- BGQ - Add mutex around recovery for the Real Time server to avoid hitting · dea23f7e
  Danny Auble authored Dec 04, 2012
```
DB2 so hard.
```
  dea23f7e
30 Nov, 2012 2 commits
- Fix inconsistency for hostlists that have more than 1 range. · bc7b91cd
  Danny Auble authored Nov 30, 2012
  
  bc7b91cd
- BGQ - Handle shared blocks that need to be removed and have jobs running · ef81bf67
  Danny Auble authored Nov 29, 2012
```
on them.  This should only happen in extreme conditions.
```
  ef81bf67
29 Nov, 2012 7 commits
- Accounting - Fix for if asking for users or accounts that were deleted · 34747e50
  Danny Auble authored Nov 29, 2012
```
with associations get the deleted associations as well.
```
  34747e50
- Add DenyOnLimit flag for QOS to deny jobs at submission time if they · 0d583d9c
  Francois Diakhate authored Nov 29, 2012
```
request resources that reach a 'Max' limit.
```
  0d583d9c
- Fix issue in accounting if a user puts a '\' in their job name. · acc8c531
  Danny Auble authored Nov 28, 2012
  
  acc8c531
- If an salloc is waiting for an allocation to happen and is canceled by the · 74f1b3cf
  Danny Auble authored Nov 28, 2012
```
user mark the state canceled instead of completed.
```
  74f1b3cf
- Change ".rc#" to "-rc#" for consistency · 8492bbbf
  Morris Jette authored Nov 28, 2012
  
  8492bbbf
- Accounting - If a job start message fails to the SlurmDBD reset the db_inx · c03c9b46
  Danny Auble authored Nov 28, 2012
```
so it gets sent again.  This isn't a major problem since the start will
happen when the job ends, but this does make things cleaner.
```
  c03c9b46
- Update NEWS file for start of v2.5.0 · 972164a6
  Morris Jette authored Nov 28, 2012
  
  972164a6
28 Nov, 2012 2 commits
- BGP - Fix for HTC mode · 27e7b048
  Danny Auble authored Nov 27, 2012
  
  27e7b048
- Accounting - Fixed issue where if nodenames have changed on a system and · b87be8bb
  Danny Auble authored Nov 27, 2012
```
you query against that with -N and -E you will get all jobs during that
time instead of only the ones running on -N.

Signed-off-by: Danny Auble <da@schedmd.com>
```
  b87be8bb
27 Nov, 2012 5 commits
- BGQ - handle pending actions on a block better when trying to deallocate it. · e4431036
  Danny Auble authored Nov 27, 2012
  
  e4431036
- BLUEGENE - With Dynamic layout mode - Fix issue where if a larger block · 0dad50ff
  Danny Auble authored Nov 27, 2012
```
was already in error and isn't deallocating and underlying hardware goes
bad one could get overlapping blocks in error making the code assert when
a new job request comes in.
```
  0dad50ff
- BGQ - Add 64 tasks per node as a valid option for srun when used with · d3435cfc
  Danny Auble authored Nov 26, 2012
```
overcommit.
```
  d3435cfc
- BGQ - Add 64 tasks per node as a valid option for srun when used with · 4f085be3
  Danny Auble authored Nov 26, 2012
```
overcommit.
```
  4f085be3
- If the PrologSlurmctld fails, then requeue the job indefinitely · ea8818bb
  Morris Jette authored Nov 26, 2012
```
Previously only requeued the job once
```
  ea8818bb
26 Nov, 2012 2 commits
- Energy RAPL - alter code to close open files (and only open them once · dfe86832
  Danny Auble authored Nov 26, 2012
```
where needed)
```
  dfe86832
- Modify srun to abandon I/O 60 seconds after the last task ends · 09d0935f
  jette authored Nov 26, 2012
```
Otherwise an aborted slurmstepd can cause the srun process to hang
indefinitely; a problem reported in trouble ticket 149.
```
  09d0935f
22 Nov, 2012 1 commit
- Cray - Add message thread for handling messages from the slurmctld and · 97e6b2eb
  Danny Auble authored Nov 21, 2012
```
introduce step accounting for a Cray.
```
  97e6b2eb
21 Nov, 2012 3 commits
- Restore support for srun "--mpi=list" option. · af6fe4e9
  Morris Jette authored Nov 21, 2012
  
  af6fe4e9
- Update news for spec file changes on Monday. · 2abe22a0
  Morris Jette authored Nov 21, 2012
  
  2abe22a0
- Add retry logic to munge encode/decode calls · d38330d3
  Morris Jette authored Nov 20, 2012
```
This is needed if the munge daemon is under very heavy load
(e.g. with 1000 slurmd daemons per compute node).
```
  d38330d3
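
The retry idea in the munge commit above can be sketched generically. The example below does not use the MUNGE API and does not reflect the attempt count or delay Slurm actually uses; `call_with_retry`, its parameters, and the choice of `OSError` as the transient failure are all assumptions made for illustration.

```python
import time


def call_with_retry(operation, max_attempts=3, delay_seconds=0.0,
                    retry_on=(OSError,)):
    """Call operation() until it succeeds or max_attempts is reached.

    Hypothetical helper: operation stands in for an encode or decode
    call that can fail transiently while a daemon is overloaded.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on as err:
            last_error = err
            # Pause before the next attempt, but not after the last one.
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: surface the most recent error to the caller.
    raise last_error
```

A caller would wrap the encode or decode step in a function and pass it in, for example `call_with_retry(lambda: encode(payload), max_attempts=5, delay_seconds=1.0)`, where `encode` and `payload` are placeholders.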
20 Nov, 2012 1 commit
- Accounting - Fix issue where QOS usage was being zeroed out on a · 8b0b5ae7
  Danny Auble authored Nov 20, 2012
```
slurmctld restart.
```
  8b0b5ae7