Commits · 22cfb490cb7064e77fb397f2b12b7d89c5c95b09 · Manuel G. Marciani / ces_slurm_simulator

24 Oct, 2013 9 commits
- Remove unneeded debug · 22cfb490
  Danny Auble authored Oct 24, 2013
- Expand slurm.conf man page · 4c3a44ea
  Morris Jette authored Oct 24, 2013
- Merge branch 'slurm-2.6' · d1c06ba0
  Morris Jette authored Oct 24, 2013
```
Conflicts:
	NEWS
	src/plugins/proctrack/cgroup/proctrack_cgroup.c
```
- Add info about MySQL configuration · 687a05be
  Morris Jette authored Oct 24, 2013
```
Specifically, setting innodb_buffer_pool_size=64
in my.cnf
```
- Improve setting of job wait "Reason" field. · cf7ca59b
  Morris Jette authored Oct 24, 2013
```
Without this change a job with a reason of WAIT_PART_DOWN,
WAIT_PART_INACTIVE, WAIT_PART_NODE_LIMIT, WAIT_PART_TIME_LIMIT, or
WAIT_QOS_THRES would not be cleared when that reason no longer
applied.
```
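The commit does not show the code, but the behaviour it describes can be sketched as follows: on each scheduling pass, a pending job whose reason is one of these five is re-checked, and the reason is reset once the condition no longer holds. The enum is a hypothetical subset (names taken from the commit message, values illustrative), and `update_reason()` with its `still_blocked` argument is an assumption, not Slurm's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of job wait reasons; values are illustrative only. */
enum wait_reason {
	WAIT_NO_REASON = 0,
	WAIT_PRIORITY,
	WAIT_PART_DOWN,
	WAIT_PART_INACTIVE,
	WAIT_PART_NODE_LIMIT,
	WAIT_PART_TIME_LIMIT,
	WAIT_QOS_THRES
};

/* True for reasons tied to a condition that can change while the job is
 * pending, so they must be re-evaluated rather than left in place. */
bool reason_is_transient(enum wait_reason r)
{
	switch (r) {
	case WAIT_PART_DOWN:
	case WAIT_PART_INACTIVE:
	case WAIT_PART_NODE_LIMIT:
	case WAIT_PART_TIME_LIMIT:
	case WAIT_QOS_THRES:
		return true;
	default:
		return false;
	}
}

/* Recompute the reason shown for a pending job. `still_blocked` is the
 * result of re-checking the condition behind the current reason. */
enum wait_reason update_reason(enum wait_reason cur, bool still_blocked)
{
	if (reason_is_transient(cur) && !still_blocked)
		return WAIT_NO_REASON;	/* condition cleared: drop stale reason */
	return cur;
}
```

Reasons outside the transient set (such as `WAIT_PRIORITY` here) are left for other logic to manage.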
- Improve description of a function · 6f2fa050
  Morris Jette authored Oct 24, 2013
- Update the FAQ with the requeue of done jobs and special exit status. · 88dfc2d6
  David Bigagli authored Oct 23, 2013
- proctrack/cgroup - change from retry to locks · 3c24c236
  Morris Jette authored Oct 23, 2013
```
In the event of a race condition on cgroup create/delete calls
in separate job steps, replace retry logic with a lock. This is
an enhancement of the retry logic recently added to version 2.6,
but the more complex logic (here) is only being added to v13.12.
```
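A minimal sketch of serializing cgroup create/delete with a lock, assuming an exclusive flock(2) on a shared lock file. The commit does not name the locking primitive; `cgroup_lock()`, `cgroup_unlock()` and the lock path are hypothetical. Because competing job steps are assumed to run in separate processes, a process-shared file lock is used here rather than a pthread mutex.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Take an exclusive advisory lock on lock_path so that only one step at a
 * time runs its cgroup create/delete sequence. Returns the open descriptor
 * (which holds the lock) or -1 on error. */
int cgroup_lock(const char *lock_path)
{
	int fd = open(lock_path, O_RDONLY | O_CREAT, 0600);
	if (fd < 0)
		return -1;
	/* Block until no other process holds the lock; restart the wait if
	 * a signal interrupts it. */
	while (flock(fd, LOCK_EX) < 0) {
		if (errno != EINTR) {
			close(fd);
			return -1;
		}
	}
	return fd;
}

/* Release the lock and close the descriptor. */
void cgroup_unlock(int fd)
{
	flock(fd, LOCK_UN);
	close(fd);
}
```

A caller would hold the lock across the whole sequence, e.g. `int fd = cgroup_lock(path);`, then mkdir or rmdir the step cgroup, then `cgroup_unlock(fd);`. Unlike a retry loop, this removes the overlap instead of recovering from it.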
- Adds check for NULL path in cgroup module · 58d9cbf8
  Morris Jette authored Oct 23, 2013
```
This hardens the code, although no such problem has been observed.
```
23 Oct, 2013 11 commits
- Add test for OverTimeLimit config option · 0006d9ac
  Nathan Yee authored Oct 23, 2013
- select/cons_res - performance improvements · ff366da5
  Morris Jette authored Oct 22, 2013
  Minor code enhancements to select/cons_res:
  Replace loop and value set with memcpy
  Eliminate redundant zero set of memory being freed
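Both cons_res changes follow a common C pattern, sketched below with a hypothetical `uint16_t` array; the arrays the commit actually touches are not shown. `memcpy()` takes a byte count and requires non-overlapping buffers, and zeroing memory immediately before `free()` is wasted work because nothing may read it afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Before: copy element by element in a loop. */
void copy_loop(uint16_t *dst, const uint16_t *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
}

/* After: a single memcpy. The length is in bytes, so scale by the element
 * size. dst and src must not overlap. */
void copy_memcpy(uint16_t *dst, const uint16_t *src, size_t n)
{
	memcpy(dst, src, n * sizeof(*dst));
}

/* Before: memset(buf, 0, len); free(buf);
 * After:  free(buf);  the memory is released, so zeroing it first is
 * redundant. */
void release(uint16_t *buf)
{
	free(buf);
}
```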

- proctrack/cgroup - Fix for race condition · c36d564b
  Morris Jette authored Oct 22, 2013
  Add cgroup create retry logic in case one step is starting at the
  same time as another step is ending and the logic to create
  and delete cgroups overlaps.
  bug 447
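The retry idea can be sketched as follows, assuming the overlap shows up as mkdir(2) failing with ENOENT because a finishing step removed the shared parent cgroup directory between the two calls. The function name, the two-level layout and the retry count are hypothetical; the commit text does not say which errors are retried.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create `parent`, then `child` (a directory inside it). If a concurrent
 * cleanup removes `parent` between the two mkdir() calls, the child mkdir
 * fails with ENOENT; in that case start over, up to max_tries times.
 * Returns 0 on success, -1 on failure. */
int create_with_retry(const char *parent, const char *child, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		/* An existing parent is fine: another step may have made it. */
		if (mkdir(parent, 0755) < 0 && errno != EEXIST)
			return -1;
		if (mkdir(child, 0755) == 0 || errno == EEXIST)
			return 0;
		if (errno != ENOENT)	/* only a vanished parent is retried */
			return -1;
	}
	return -1;
}
```

A retry loop narrows the race window rather than closing it, which is presumably why commit 3c24c236 above moves to a lock for v13.12.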

- Update TotalView configuration file in FAQ · eb3e8ad8
  Dave Henseler authored Oct 22, 2013
- Correction to previous commit · a654ad01
  Morris Jette authored Oct 22, 2013
```
I did the merge improperly
```
- Problem allocating threads with GPUs · 52c2e27f
  Morris Jette authored Oct 22, 2013
  If a node has GRES and multiple threads per core the select/cons_res
  plugin can get stuck in an infinite loop.
  See bug 475
  Contributed by:
  PREVOST Ludovic
  NEC HPC Europe
- Add contributor to our web page · f0e25e67
  Morris Jette authored Oct 22, 2013
- Document latest patch changes in NEWS · e93a6543
  Morris Jette authored Oct 22, 2013
- acct_gather_energy/ipmi - Add delay before retry on read error. · cd86abea
  Thomas Cadeau authored Oct 22, 2013
  If slurmd fails to get an IPMI value, I propose forcing it to wait 1 second instead of asking the BMC again immediately (part 3/4 of the patch).
  If IPMI init fails when slurmd forces an update of the value, then the value should not be updated (part 4/4 of the patch).
  Parts 1/4 and 2/4 add a safeguard to IPMI init, because the function can be called several times.
  This forces a return of SLURM_FAILURE if the first call failed, since later calls will not do anything.

  bug 469
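The four parts of the patch can be sketched as below. Everything here is hypothetical: `fake_bmc_read()` stands in for the real BMC query, the outcome of the real init is passed in as `first_rc`, and the two read attempts are an assumption. Only the wait before re-asking the BMC (1 second in the commit, the `delay_sec` argument here) and the "first init failure sticks" rule come from the commit message.

```c
#include <assert.h>
#include <unistd.h>

enum init_state { INIT_NONE, INIT_OK, INIT_FAILED };

/* Parts 1-2: init may be called several times. Only the first call does
 * real work (its outcome is first_rc); every later call returns that same
 * outcome instead of silently succeeding. */
int ipmi_init_once(enum init_state *st, int first_rc)
{
	if (*st == INIT_NONE)
		*st = (first_rc == 0) ? INIT_OK : INIT_FAILED;
	return (*st == INIT_OK) ? 0 : -1;
}

/* Hypothetical BMC reader: fails *fails_left times, then reports 100 W. */
int fake_bmc_read(int *fails_left, unsigned *watts)
{
	if (*fails_left > 0) {
		(*fails_left)--;
		return -1;
	}
	*watts = 100;
	return 0;
}

/* Part 3: after a read error, sleep delay_sec before asking again. */
int read_with_delay(int *fails_left, unsigned *watts, int tries,
		    unsigned delay_sec)
{
	for (int i = 0; i < tries; i++) {
		if (fake_bmc_read(fails_left, watts) == 0)
			return 0;
		if (i + 1 < tries)
			sleep(delay_sec);
	}
	return -1;
}

/* Part 4: if init failed, keep the stored value instead of updating it. */
int update_value(enum init_state *st, int first_rc, int *fails_left,
		 unsigned *stored, unsigned delay_sec)
{
	unsigned w;

	if (ipmi_init_once(st, first_rc) != 0)
		return -1;
	if (read_with_delay(fails_left, &w, 2, delay_sec) != 0)
		return -1;
	*stored = w;
	return 0;
}
```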

- Enforce JobRequeue configuration parameter · d7dfa58e
  Morris Jette authored Oct 21, 2013
```
Previously a node failure would always requeue the job
```
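A sketch of the requeue decision on node failure, assuming the documented JobRequeue semantics: the cluster-wide setting supplies the default, and a job's own sbatch `--requeue` or `--no-requeue` choice overrides it. The function name and the per-job enum encoding are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-job requeue request (hypothetical encoding). */
enum job_requeue { REQUEUE_UNSET = -1, REQUEUE_NO = 0, REQUEUE_YES = 1 };

/* Decide whether a batch job is requeued after its node fails. Before the
 * fix the answer was effectively always yes; here the JobRequeue setting
 * (conf_job_requeue) is the default and an explicit per-job choice wins. */
bool requeue_on_node_failure(bool conf_job_requeue, enum job_requeue job)
{
	if (job != REQUEUE_UNSET)
		return job == REQUEUE_YES;
	return conf_job_requeue;
}
```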
- If a job is requeued in SPECIAL_EXIT state allow its dependent · 5c0df112
  David Bigagli authored Oct 22, 2013
```
jobs submitted afternotok to run.
```
22 Oct, 2013 10 commits
- proctrack/cgroup - Fix for race condition · 260c5485
  Morris Jette authored Oct 22, 2013
```
Add cgroup create retry logic in case one step is starting at the
same time as another step is ending and the logic to create
and delete cgroups overlaps.
bug 447
```
- Update TotalView configuration file in FAQ · 336e898e
  Dave Henseler authored Oct 22, 2013
- Correction to previous commit · c74e81a5
  Morris Jette authored Oct 22, 2013
```
I did the merge improperly
```
- Merge branch 'slurm-2.6' of http://github.com/SchedMD/slurm into slurm-2.6 · f0c5c22f
  Morris Jette authored Oct 22, 2013
```
Conflicts:
	NEWS
```
- Problem allocating threads with GPUs · dab7fb02
  Morris Jette authored Oct 22, 2013
```
If a node has GRES and multiple threads per core the select/cons_res
plugin can get stuck in an infinite loop.
See bug 475
Contributed by:
PREVOST Ludovic
NEC HPC Europe
```
- Add contributor to our web page · a6fc2633
  Morris Jette authored Oct 22, 2013
- Document latest patch changes in NEWS · d710fc74
  Morris Jette authored Oct 22, 2013
- acct_gather_energy/ipmi - Add delay before retry on read error. · 802eb9ae
  Thomas Cadeau authored Oct 22, 2013
```
If slurmd fails to get an IPMI value, I propose forcing it to wait 1 second instead of asking the BMC again immediately (part 3/4 of the patch).
If IPMI init fails when slurmd forces an update of the value, then the value should not be updated (part 4/4 of the patch).
Parts 1/4 and 2/4 add a safeguard to IPMI init, because the function can be called several times.
This forces a return of SLURM_FAILURE if the first call failed, since later calls will not do anything.

bug 469
```
- Increase timeout on test with poe · b39be798
  Morris Jette authored Oct 21, 2013
- Enforce JobRequeue configuration parameter · 351b1f50
  Morris Jette authored Oct 21, 2013
```
Previously a node failure would always requeue the job
```
21 Oct, 2013 8 commits
- CRAY - only run select_p_job_init on startup · df9fd41d
  Danny Auble authored Oct 21, 2013
- CRAY - fixes syncing jobs · 3cf85905
  Danny Auble authored Oct 21, 2013
- Describe step memory=0 use in srun man page · 95d73b48
  Morris Jette authored Oct 21, 2013
- Merge branch 'slurm-2.6' · cce0cb34
  Morris Jette authored Oct 21, 2013
- select/cons_res - allocate cores cyclic across sockets · 0cbcba1a
  Morris Jette authored Oct 21, 2013
```
Restore default behavior of allocating cores to jobs on a cyclic basis
across the sockets unless SelectTypeParameters=CR_CORE_DEFAULT_DIST_BLOCK
or user specifies other distribution options.
Reverts commit 7fcdc7e5
bug 466
```
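The difference between the restored cyclic default and the block distribution selected by CR_CORE_DEFAULT_DIST_BLOCK can be sketched for a single node with identical sockets. `pick_cores()` is hypothetical and assumes every core is free; the real plugin also has to account for cores already in use, GRES and user-specified distributions.

```c
#include <assert.h>
#include <stdbool.h>

/* Choose up to `want` cores on a node with `sockets` sockets of
 * `cores_per_socket` cores each, writing global core indices to out[].
 * cyclic: take one core from each socket in turn.
 * block:  fill socket 0 completely before moving to socket 1.
 * Returns the number of cores chosen. */
int pick_cores(int sockets, int cores_per_socket, int want, bool cyclic,
	       int *out)
{
	int n = 0;

	if (cyclic) {
		for (int c = 0; c < cores_per_socket && n < want; c++)
			for (int s = 0; s < sockets && n < want; s++)
				out[n++] = s * cores_per_socket + c;
	} else {
		for (int s = 0; s < sockets && n < want; s++)
			for (int c = 0; c < cores_per_socket && n < want; c++)
				out[n++] = s * cores_per_socket + c;
	}
	return n;
}
```

For a node with 2 sockets of 4 cores, a 4-core request gets cores 0, 4, 1, 5 with the cyclic order and 0, 1, 2, 3 with the block order.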
- Make test more robust · 639eacd7
  Morris Jette authored Oct 21, 2013
```
Expect timing was sometimes causing failures
```
- Increase default file wait time for SMD · 907c2e11
  Morris Jette authored Oct 21, 2013
```
File delays sometimes larger than 60 seconds
```
- Merge branch 'slurm-2.6' · 43d74205
  Morris Jette authored Oct 21, 2013
```
Conflicts:
	doc/man/man5/slurm.conf.5
	testsuite/expect/test1.89
	testsuite/expect/test1.90
```
20 Oct, 2013 2 commits
- Make slurmd -C format match slurm.conf · e1dc6635
  jette authored Oct 19, 2013
```
Change Sockets to SocketsPerBoard and Procs to CPUs
```
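A sketch of printing a node description with the slurm.conf keywords this commit switches to (CPUs instead of Procs, SocketsPerBoard instead of Sockets). The function is hypothetical, and the field set and order are illustrative rather than slurmd's exact output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one node line using slurm.conf keywords. Before the change the
 * same values were labelled Procs= and Sockets=. Returns snprintf's
 * result (the length the full line needs). */
int format_node_line(char *buf, size_t len, const char *name, unsigned cpus,
		     unsigned sockets_per_board, unsigned cores_per_socket,
		     unsigned threads_per_core)
{
	return snprintf(buf, len,
			"NodeName=%s CPUs=%u SocketsPerBoard=%u "
			"CoresPerSocket=%u ThreadsPerCore=%u",
			name, cpus, sockets_per_board, cores_per_socket,
			threads_per_core);
}
```

Emitting the same keywords that slurm.conf accepts lets an administrator paste the line into the configuration file directly.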

- sched/backfill - Prevent invalid memory ref with bf_continue · ea1b316c
  jette authored Oct 19, 2013
  If the backfill scheduler relinquishes locks and the normal job
  scheduler starts a job that the backfill scheduler was actively
  working, the backfill scheduler will try to re-schedule that
  same job, possibly resulting in an invalid memory reference
  or other badness.
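The commit message does not show how the fix works. One common defensive pattern, sketched here as an assumption, is for the backfill scheduler to remember only the job ID before releasing its locks and, after re-acquiring them, look the job up again and skip it unless it is still pending. The job table, `find_job()` and `bf_resume()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum job_state { JOB_PENDING, JOB_RUNNING };

/* Hypothetical minimal job record. */
struct job {
	unsigned job_id;
	enum job_state state;
};

/* Look a job up by ID in a small table (stand-in for the real job list). */
struct job *find_job(struct job *tbl, size_t n, unsigned job_id)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].job_id == job_id)
			return &tbl[i];
	return NULL;
}

/* After re-acquiring locks, re-resolve the job from the ID saved before
 * they were released instead of trusting a saved pointer, and continue only
 * if it is still pending. Returns the job to work on, or NULL to skip it. */
struct job *bf_resume(struct job *tbl, size_t n, unsigned saved_id)
{
	struct job *job = find_job(tbl, n, saved_id);

	if (job == NULL || job->state != JOB_PENDING)
		return NULL;	/* gone, or already started elsewhere */
	return job;
}
```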