Commits · 6074742caa749313e09ecab4f135a62d26f3f59e · Manuel G. Marciani / ces_slurm_simulator

05 Dec, 2012 1 commit
- Permit SlurmUser or operator to change QOS of non-pending jobs · 6074742c
  Morris Jette authored Dec 05, 2012
```
Especially for newly started jobs, the PrologSlurmctld can change
a job's QOS based upon resource allocation.
```
  6074742c
04 Dec, 2012 1 commit
- BGQ - Add mutex around recovery for the Real Time server to avoid hitting · dea23f7e
  Danny Auble authored Dec 04, 2012
```
DB2 so hard.
```
  dea23f7e
30 Nov, 2012 2 commits
- Fix inconsistency for hostlists that have more than 1 range. · bc7b91cd
  Danny Auble authored Nov 30, 2012
  
  bc7b91cd
- BGQ - Handle shared blocks that need to be removed and have jobs running · ef81bf67
  Danny Auble authored Nov 29, 2012
```
on them.  This should only happen in extreme conditions.
```
  ef81bf67
29 Nov, 2012 7 commits
- Accounting - Fix for if asking for users or accounts that were deleted · 34747e50
  Danny Auble authored Nov 29, 2012
```
with associations get the deleted associations as well.
```
  34747e50
- Add DenyOnLimit flag for QOS to deny jobs at submission time if they · 0d583d9c
  Francois Diakhate authored Nov 29, 2012
```
request resources that reach a 'Max' limit.
```
  0d583d9c
- Fix issue in accounting if a user puts a '\' in their job name. · acc8c531
  Danny Auble authored Nov 28, 2012
  
  acc8c531
- If an salloc is waiting for an allocation to happen and is canceled by the · 74f1b3cf
  Danny Auble authored Nov 28, 2012
```
user mark the state canceled instead of completed.
```
  74f1b3cf
- Change ".rc#" to "-rc#" for consistency · 8492bbbf
  Morris Jette authored Nov 28, 2012
  
  8492bbbf
- Accounting - If a job start message fails to the SlurmDBD reset the db_inx · c03c9b46
  Danny Auble authored Nov 28, 2012
```
so it gets sent again.  This isn't a major problem since the start will
happen when the job ends, but this does make things cleaner.
```
  c03c9b46
- Update NEWS file for start of v2.5.0 · 972164a6
  Morris Jette authored Nov 28, 2012
  
  972164a6
28 Nov, 2012 2 commits
- BGP - Fix for HTC mode · 27e7b048
  Danny Auble authored Nov 27, 2012
  
  27e7b048
- Accounting - Fixed issue where if nodenames have changed on a system and · b87be8bb
  Danny Auble authored Nov 27, 2012
```
you query against that with -N and -E you will get all jobs during that
time instead of only the ones running on -N.

Signed-off-by: Danny Auble <da@schedmd.com>
```
  b87be8bb
27 Nov, 2012 5 commits
- BGQ - handle pending actions on a block better when trying to deallocate it. · e4431036
  Danny Auble authored Nov 27, 2012
  
  e4431036
- BLUEGENE - With Dynamic layout mode - Fix issue where if a larger block · 0dad50ff
  Danny Auble authored Nov 27, 2012
```
was already in error and isn't deallocating and underlying hardware goes
bad one could get overlapping blocks in error making the code assert when
a new job request comes in.
```
  0dad50ff
- BGQ - Add 64 tasks per node as a valid option for srun when used with · d3435cfc
  Danny Auble authored Nov 26, 2012
```
overcommit.
```
  d3435cfc
- BGQ - Add 64 tasks per node as a valid option for srun when used with · 4f085be3
  Danny Auble authored Nov 26, 2012
```
overcommit.
```
  4f085be3
- If the PrologSlurmctld fails, then requeue the job indefinitely · ea8818bb
  Morris Jette authored Nov 26, 2012
```
Previously only requeued the job once
```
  ea8818bb
26 Nov, 2012 2 commits
- Energy RAPL - alter code to close open files (and only open them once · dfe86832
  Danny Auble authored Nov 26, 2012
```
where needed)
```
  dfe86832
- Modify srun to abandon I/O 60 seconds after the last task ends · 09d0935f
  jette authored Nov 26, 2012
```
Otherwise an aborted slurmstepd can cause the srun process to hang
indefinitely; a problem reported in trouble ticket 149.
```
  09d0935f
22 Nov, 2012 1 commit
- Cray - Add message thread for handling messages from the slurmctld and · 97e6b2eb
  Danny Auble authored Nov 21, 2012
```
introduce step accounting for a Cray.
```
  97e6b2eb
21 Nov, 2012 3 commits
- Restore support for srun "--mpi=list" option. · af6fe4e9
  Morris Jette authored Nov 21, 2012
  
  af6fe4e9
- Update news for spec file changes on Monday. · 2abe22a0
  Morris Jette authored Nov 21, 2012
  
  2abe22a0
- Add retry logic to munge encode/decode calls · d38330d3
  Morris Jette authored Nov 20, 2012
```
This is needed if the munge daemon is under very heavy load
(e.g. with 1000 slurmd daemons per compute node).
```
  d38330d3
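The retry idea behind d38330d3 can be pictured with a minimal Python sketch. This is not the Slurm C code: `call_with_retry`, `make_flaky`, the attempt count, and the delay are all hypothetical, and a deliberately flaky stand-in function replaces the real munge encode/decode calls.

```python
import time


def call_with_retry(func, attempts=3, delay=0.01, retry_on=(OSError,)):
    """Call func(); on a retryable error, wait and try again.

    attempts, delay, and retry_on are illustrative defaults, not Slurm's values.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay * attempt)  # back off a little more each time


def make_flaky(failures):
    """Stand-in for an overloaded daemon: fails `failures` times, then succeeds."""
    state = {"left": failures}

    def encode():
        if state["left"] > 0:
            state["left"] -= 1
            raise OSError("daemon busy")
        return "credential"

    return encode
```

With these assumptions, `call_with_retry(make_flaky(2))` succeeds on the third attempt, while a stand-in that fails more often than `attempts` allows re-raises the last error.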
20 Nov, 2012 3 commits
- Accounting - Fix issue where QOS usage was being zeroed out on a · 8b0b5ae7
  Danny Auble authored Nov 20, 2012
```
slurmctld restart.
```
  8b0b5ae7
- Add socket connect retry logic in case slurmd is down · 209e6bc5
  Morris Jette authored Nov 20, 2012
```
Modify sbcast logic to continue when slurmd daemon restarts

Previously a file transmission in progress would be aborted when
any of the slurmd daemons restarted. Now it reconnects, revalidates
the credential, and resumes file transmission.
```
  209e6bc5
- Reset node MAINT state flag when a reservation's nodes or flags change · cc97d84b
  Morris Jette authored Nov 19, 2012
  
  cc97d84b
19 Nov, 2012 3 commits
- BGQ - Fix job step timeout to actually happen when done from within an · 0500e007
  Danny Auble authored Nov 19, 2012
```
allocation.
```
  0500e007
- Modify use of OOM (out of memory protection) for Linux 2.6.36 kernel or later · 8ae5e73e
  Morris Jette authored Nov 19, 2012
```
NOTE: If you were setting the environment variable SLURMSTEPD_OOM_ADJ=-17,
it should be set to -1000 for Linux 2.6.36 kernel or later.
```
  8ae5e73e
- NEWS for e40883f1 · f42117c4
  Danny Auble authored Nov 19, 2012
  
  f42117c4
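The OOM note in 8ae5e73e comes down to a kernel-version check. Linux 2.6.36 introduced `oom_score_adj` (range -1000 to 1000) alongside the older `oom_adj` (range -17 to 15), so the "never kill" value differs between the two. A minimal Python sketch follows, assuming a hypothetical helper `stepd_oom_adj` that is not part of Slurm:

```python
import re


def stepd_oom_adj(kernel_release):
    """Pick the 'never kill' OOM value for a release string such as '3.2.0-4-amd64'.

    Kernels from 2.6.36 on use oom_score_adj (range -1000..1000);
    older kernels use oom_adj (range -17..15).
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", kernel_release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {kernel_release!r}")
    # A missing patch level (e.g. "3.10") is treated as 0.
    version = tuple(int(part or 0) for part in match.groups())
    return -1000 if version >= (2, 6, 36) else -17
```

Passing the result of `platform.release()` to this helper would yield the value to export as SLURMSTEPD_OOM_ADJ on the local machine.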
09 Nov, 2012 2 commits
- Update NEWS to start on v2.5.0 work · 13a60fe7
  Morris Jette authored Nov 09, 2012
  
  13a60fe7
- BGQ - added --verbose=OFF when srun --quiet is used · 61d6d537
  Danny Auble authored Nov 08, 2012
  
  61d6d537
08 Nov, 2012 2 commits
- start deprecation of sacct --dump --fdump · c24b46e7
  Danny Auble authored Nov 08, 2012
  
  c24b46e7
- Added new DebugFlags - Energy for AcctGatherEnergy plugins. · f6cabc1f
  Danny Auble authored Nov 08, 2012
```
Signed-off-by: Danny Auble <da@schedmd.com>
```
  f6cabc1f
07 Nov, 2012 5 commits

- News for srun update · d14a872a
  Danny Auble authored Nov 07, 2012
  
  d14a872a
- CRAY - Replace srun.pl with launch/aprun plugin to use srun to wrap the · 422a411e
  Danny Auble authored Nov 07, 2012
```
aprun process instead of a perl script.
```
  422a411e

- Modify default log timestamp to conform to RFC 5424 format · 4b941731
  Janne Blomqvist authored Nov 07, 2012

The attached patch changes the default timestamp format in logfiles to conform to RFC 5424 (the current version of the syslog RFC). It is identical to the current default "ISO 8601" timestamp used by slurm, with the exception that the timezone offset is appended. This has the following benefits:

1) It's unambiguous.

2) Avoids potential confusion for admins running cluster(s) in different timezones.

3) Might help debug issues related to DST transitions. (More on that later..)

(To be pedantic, an RFC 5424 timestamp is still a valid ISO 8601 timestamp, but the converse is not necessarily true. So there is RFC 3339, which is a "profile" of ISO 8601, that is, a subset recommended for internet protocols. The RFC 5424 timestamp, in turn, is a subset of the RFC 3339 timestamps.)

The previous behavior can be restored by running configure with the --disable-rfc5424time flag.

  4b941731

- BGQ - validate correct ntasks_per_node · 7eb1a451
  Danny Auble authored Nov 06, 2012
  
  7eb1a451
- BGQ - Fix issue when running srun outside of an allocation and only · 9e25da94
  Danny Auble authored Nov 06, 2012
```
specifying the number of tasks and not the number of nodes.
```
  9e25da94
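To illustrate the timestamp change in 4b941731, here is a small Python sketch of the two formats: local wall-clock time without an offset, and the same time with the UTC offset appended. The function names are hypothetical, and the exact field layout Slurm writes to its logs is not shown in the commit message, so treat the output as an approximation of the idea.

```python
from datetime import datetime, timedelta, timezone


def iso8601_local(ts):
    """Old-style stamp: wall-clock time with no timezone offset."""
    return ts.strftime("%Y-%m-%dT%H:%M:%S")


def rfc5424_stamp(ts):
    """Same stamp with the UTC offset appended; ts must be timezone-aware."""
    if ts.tzinfo is None:
        raise ValueError("timezone-aware datetime required")
    return ts.isoformat(timespec="seconds")


# Two different instants that share a wall-clock time around a DST change.
summer = datetime(2012, 10, 28, 2, 30, 0, tzinfo=timezone(timedelta(hours=2)))
winter = datetime(2012, 10, 28, 2, 30, 0, tzinfo=timezone(timedelta(hours=1)))
```

The old format renders `summer` and `winter` identically, while the offset-bearing format keeps them apart, which is the DST-debugging benefit the commit message mentions.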

05 Nov, 2012 1 commit

- Cray - Improve signal handling for spawned tasks on job cancel · a98b849a
  Morris Jette authored Nov 05, 2012

On job kill request, send SIGCONT and SIGTERM, wait KillWait, then send
SIGKILL. Previously only SIGKILL was sent to tasks.

  a98b849a
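The signal sequence described in a98b849a can be sketched in Python for a single child process on a POSIX system. This is an illustration only, not the Cray plugin code: `cancel_task` is a hypothetical name, and `kill_wait` stands in for Slurm's KillWait setting with an illustrative default.

```python
import signal
import subprocess


def cancel_task(proc, kill_wait=30.0):
    """Stop a child: SIGCONT, SIGTERM, wait up to kill_wait seconds, then SIGKILL.

    Returns the child's return code (the negative signal number on POSIX).
    """
    proc.send_signal(signal.SIGCONT)  # wake a stopped task so it can react
    proc.send_signal(signal.SIGTERM)  # ask the task to exit cleanly
    try:
        return proc.wait(timeout=kill_wait)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: the task ignored or outlived SIGTERM
        return proc.wait()
```

Sending SIGTERM first gives a task the chance to run its own cleanup handlers, which sending only SIGKILL never allowed.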