Commits · 98c10b371ec8b644a004e430ac06eb9c171e1f8c · Manuel G. Marciani / ces_slurm_simulator

07 Aug, 2014 7 commits
- Modify AuthInfo conf parameter · 98c10b37
  Morris Jette authored Aug 07, 2014
```
Modify AuthInfo configuration parameter to accept credential lifetime
and socket path options. Previously it accepted a socket path only.
```
  98c10b37
- Update test1.53.prog.c program. · c6dc0d73
  David Bigagli authored Aug 07, 2014
  
  c6dc0d73
- Merge remote-tracking branch 'origin/slurm-14.03' · 69b9c815
  Danny Auble authored Aug 07, 2014
  
  69b9c815
- If step exitcode hasn't been set, display -2 with sacct instead · 5b063617
  Danny Auble authored Aug 07, 2014
```
of acting like it is a signal and exitcode.
```
  5b063617
- As per sbatch and srun documentation when the --signal option is used · 9b926023
  David Bigagli authored Aug 07, 2014
```
signal only the steps unless, in the case of a batch job, B is
specified, in which case signal only the batch script.
```
  9b926023
- If a batch script is requeued and running steps get correct exit code/signal · 0da01963
  Danny Auble authored Aug 07, 2014
```
Previously it was always -2.
```
  0da01963
- enhance node reboot state info · c358dcd4
  Morris Jette authored Aug 07, 2014
```
Add node state string suffix of "$" to identify nodes in maintenance
reservation or scheduled for reboot. This applies to scontrol, sinfo,
and sview commands.
Enable scontrol to clear a node's scheduled reboot by setting its state
to "RESUME".
```
  c358dcd4
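
A minimal sketch of the "$" suffix rule described in c358dcd4, in Python rather than the C used by Slurm itself. The function name `format_node_state` and its boolean argument are hypothetical; only the "$" suffix and its meaning come from the commit message.

```python
def format_node_state(base_state: str, maint_or_reboot: bool) -> str:
    """Return the display string for a node state.

    Hypothetical helper: appends "$" when the node is in a maintenance
    reservation or is scheduled for reboot, as the commit message describes.
    """
    return (base_state + "$") if maint_or_reboot else base_state


print(format_node_state("IDLE", True))
print(format_node_state("IDLE", False))
```
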
06 Aug, 2014 9 commits
- Merge remote-tracking branch 'origin/slurm-14.03' · ced28b7d
  Danny Auble authored Aug 06, 2014
```
Conflicts:
	src/common/slurm_protocol_defs.c
```
  ced28b7d
- Fix various memory leaks in sview · 4182d609
  Danny Auble authored Aug 06, 2014
  
  4182d609
- Fix minor memory leaks when freeing node_info_t structure. · 7f8452af
  Danny Auble authored Aug 06, 2014
  
  7f8452af
- Merge branch 'slurm-14.03' · 43be9223
  Morris Jette authored Aug 06, 2014
  
  43be9223
- Apply BatchStartTimeout config to srun · aedf1e3c
  Morris Jette authored Aug 06, 2014
```
Apply BatchStartTimeout configuration to task launch and avoid aborting
srun commands due to long running Prolog scripts.
bug 978
```
  aedf1e3c
- Improve slurm.conf man page · 55871524
  Morris Jette authored Aug 06, 2014
```
Provide better description of the slurm.conf configuration parameter
BatchStartTimeout.
bug 979
```
  55871524
- Disable a test based upon configuration · cf742af3
  Morris Jette authored Aug 06, 2014
```
Disable a partition test if JobSubmitPlugins=all_partitions
```
  cf742af3
- Merge branch 'slurm-14.03' · 3e2cb7c6
  Morris Jette authored Aug 05, 2014
  
  3e2cb7c6
- Set node state to DOWN on reboot · 612e70cc
  Morris Jette authored Aug 05, 2014
```
When nodes are scheduled for reboot, set state to DOWN rather than FUTURE so
they are still visible to sinfo. State set to IDLE after reboot completes.
bug 1007
```
  612e70cc
05 Aug, 2014 13 commits

srun executable resolved based upon compute node · 485e712b

Morris Jette authored Aug 05, 2014

Srun executable names beginning with "." will be resolved based upon the
working directory and path on the compute node rather than the submit node.

485e712b
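
The rule in 485e712b can be sketched as a simple decision on the executable name. This is an illustration only: `resolution_site` is a hypothetical name, and the assumption that all other names continue to be resolved on the submit node is not stated in the commit message.

```python
def resolution_site(executable: str) -> str:
    """Say where an srun executable name would be resolved.

    Names beginning with "." are left for the compute node, which resolves
    them against its own working directory and path. Treating every other
    name as resolved on the submit node is an assumption of this sketch.
    """
    return "compute_node" if executable.startswith(".") else "submit_node"


for name in ("./a.out", "../bin/tool", "hostname"):
    print(name, "->", resolution_site(name))
```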

Update NEWS file · d65d9172
David Bigagli authored Aug 05, 2014

d65d9172
Update NEWS file and fix compiler warning: · 12d0031f
David Bigagli authored Aug 05, 2014
```
suggest parentheses around assignment used as truth value.
```
12d0031f

Try to load libslurm.so only when necessary. · 40dabac2

Mehdi Dogguy authored Aug 05, 2014

The code tries to load libslurm.so even if preceding dlopen calls
succeeded. The code is structured so that we have to "return;" as
soon as a dlopen succeeds.

40dabac2
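
The "return as soon as a dlopen succeeds" structure from 40dabac2 might look like the following Python sketch, which uses `ctypes.CDLL` as a stand-in for `dlopen`. The function name `load_first_available` and the injectable `loader` parameter are hypothetical, and the candidate list is up to the caller.

```python
import ctypes


def load_first_available(candidates, loader=ctypes.CDLL):
    """Try each library name in order and stop at the first that loads.

    Returns (name, handle) for the first success, or (None, None) if no
    candidate could be loaded. Later candidates are never attempted once
    an earlier one has succeeded.
    """
    for name in candidates:
        try:
            return name, loader(name)  # early return: skip the rest
        except OSError:
            continue  # this candidate is unavailable, try the next one
    return None, None
```

Passing a fallback such as libslurm.so last in `candidates` means it is only attempted when every earlier load has failed.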

Add missing bracket in a test · b4617772
David Gloe authored Aug 05, 2014

b4617772
Merge branch 'slurm-14.03' · b6b1ff5b
Morris Jette authored Aug 05, 2014

b6b1ff5b
Describe who can post to mailing lists · 7353e804
Morris Jette authored Aug 05, 2014

7353e804
Disable a test without slurmdbd · a7e868ed
Morris Jette authored Aug 05, 2014

a7e868ed

step record purge fix · daa1ccf9

Morris Jette authored Aug 05, 2014

This corrects logic introduced yesterday in commit
6f89dc9d which introduced a double
free of step records, at least on job requeue.
bug 1012

daa1ccf9

Added comments · 00d66a2a

Morris Jette authored Aug 05, 2014

Describe restrictions on specific job and step record purging functions
with respect to "cleaning" flag used for Node Health Check on Cray systems.

00d66a2a

call select_g_step_finish() even for finished jobs · 6f89dc9d

Morris Jette authored Aug 04, 2014

Always call select_g_step_finish() when terminating a job step,
even if the job is also being terminated. This is needed for Cray
systems.
bug 1012

6f89dc9d

requeue state mode · d040244d

Morris Jette authored Aug 04, 2014

When a job is requeued, call deallocate_nodes() with a job state
of COMPLETING. Previously it was called with a state of JOB_REQUEUE,
which could be problematic for step complete function calls (which
I am working on fixing now).

d040244d

Refactor step complete logic · 6765d317
Morris Jette authored Aug 04, 2014
```
Remove some duplicate code. No change in functionality.
```
6765d317

04 Aug, 2014 6 commits

Simple purge of step list with job · 6fe300dd

Morris Jette authored Aug 04, 2014

When a job record is purged, simply purge the step list rather than possibly invoking a node health check on Cray systems.

6fe300dd

Add function that purges step list · fc2cc171
Morris Jette authored Aug 04, 2014
```
No checking or other operations are performed on this list, just a purge.
```
fc2cc171

Re-use of active job ID error · 2f399247

Morris Jette authored Aug 04, 2014

If an attempt is made to submit a job explicitly using a job ID that already exists, then do not try to purge and re-use it, but return an error. The slow clean-up of job steps on Cray systems due to node health check makes me wary of preserving the existing code. Returning an error seems a safer option.

2f399247
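
A minimal sketch of the behavior change in 2f399247: reject an explicit job ID that is already in use instead of purging the old record. The names `JobIdInUseError`, `submit_with_job_id` and the dictionary of active jobs are hypothetical stand-ins, not Slurm data structures.

```python
class JobIdInUseError(Exception):
    """Raised when an explicitly requested job ID already exists."""


def submit_with_job_id(active_jobs: dict, job_id: int, spec: str) -> None:
    """Register a job under an explicit ID, refusing to re-use a live ID.

    The existing record is left untouched; no purge is attempted.
    """
    if job_id in active_jobs:
        raise JobIdInUseError(f"job ID {job_id} already exists")
    active_jobs[job_id] = spec
```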

refactor job step delete logic · 4f2b7d3d

Morris Jette authored Aug 04, 2014

Call delete_step_records() before clearing the job's JOB_COMPLETING
state flag. This would make a difference in the case of jobs automatically
requeued based upon their exit code, but probably not in other cases.
Also in the select plugins, check not only for a job state of JOB_COMPLETING,
but also FINISHED states. In either case, we are not in a position to
gracefully clean up the step.

4f2b7d3d

Purely cosmetic mods, comments, etc. · 67fd6876
Morris Jette authored Aug 04, 2014

67fd6876

CPU frequency set race condition fix · 760d94a5

Morris Jette authored Aug 04, 2014

Fix race condition in CPU frequency set with job preemption.
When the preemptor job completed, it would notify the srun, which
would notify the slurmctld, which could resume a preempted job.
That preempted job could reset the CPU frequency before the
preemptor. This change has the slurmstepd resetting a job's
CPU frequency prior to notifying srun of completion, which
eliminates the race condition.
bug 1011

760d94a5

02 Aug, 2014 1 commit
- BGQ runjob env setup fix · a0c6528c
  Morris Jette authored Aug 01, 2014
```
This corrects logic added in commit 738913fa for BGQ systems only.
```
  a0c6528c
01 Aug, 2014 4 commits

Initialize some variables to be safe · 6b6b9a56
Morris Jette authored Aug 01, 2014

6b6b9a56
JobCompType allows "jobcomp/mysql" as valid name but the code used · e175be7e
David Bigagli authored Aug 01, 2014
```
"job_comp/mysql" setting an incorrect default database.
```
e175be7e
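
The mismatch fixed in e175be7e can be illustrated with a small sketch: if the code compares the configured plugin name against a literal that differs from the accepted name, the database default is never selected. The function `default_location` and the placeholder return values are hypothetical; the two plugin-name strings come from the commit message.

```python
VALID_NAME = "jobcomp/mysql"    # name accepted for JobCompType
MISTYPED_NAME = "job_comp/mysql"  # literal the code compared against


def default_location(job_comp_type: str, mysql_name: str = VALID_NAME) -> str:
    """Pick a default location kind based on the plugin name.

    "DB_DEFAULT" and "FILE_DEFAULT" are placeholders for this sketch,
    not the actual Slurm default values.
    """
    return "DB_DEFAULT" if job_comp_type == mysql_name else "FILE_DEFAULT"


# With the mistyped literal, a valid configuration falls through to the
# non-database default; with the corrected literal it does not.
print(default_location("jobcomp/mysql", MISTYPED_NAME))
print(default_location("jobcomp/mysql", VALID_NAME))
```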

Move log message · 4e6c8221

Morris Jette authored Jul 29, 2014

This helps reduce a race condition reported in test1.64. Log termination
message right away rather than trying to terminate the job and then
log the event before the srun program exits.

4e6c8221

Match gres type for step · 34cc7231

Morris Jette authored Aug 01, 2014

Previous logic did not work properly to allocate specific GRES
model types to job steps from the matching job model types.

34cc7231