Commits · 17449c066af69441b741110ef51fc2f534272871 · Manuel G. Marciani / ces_slurm_simulator

25 Oct, 2013 8 commits
- Multi-thread the sinfo command (one thread per partition) · 17449c06
  Morris Jette authored Oct 25, 2013
```
Effect is minimal without multiple partitions and larger system sizes.
With 40 partitions and about 600 nodes each, time goes from about
13 secs to 4 secs.
```
  17449c06
- Merge branch 'slurm-2.6' · f56b1db5
  Morris Jette authored Oct 25, 2013
  
  f56b1db5
- hostlist performance improvements · 2dfa3ff6
  Morris Jette authored Oct 25, 2013
```
Reorder some logic in the hostlist functions for performance improvement.
Specifically, for "if (A & B) ..." move the fastest tests first (test
A should take less time than test B).
```
  2dfa3ff6
- Sinfo performance improvements · fd3b75a9
  Morris Jette authored Oct 25, 2013
```
This avoids building hostlist information with NodeHostName and
NodeAddr information unless explicitly requested and can improve
performance for the default mode of operation by about 65%.
```
  fd3b75a9
- Merge remote-tracking branch 'origin/slurm-2.6' · cee7c4f9
  Danny Auble authored Oct 24, 2013
  
  cee7c4f9
- run autogen.sh · 23d329b8
  Danny Auble authored Oct 24, 2013
  
  23d329b8
- Transition automake/autoconf files to modern times · 820225fb
  Danny Auble authored Oct 24, 2013
  
  820225fb
- Correct stdout/err name with job id · 2c2fd9f0
  Morris Jette authored Oct 24, 2013
```
Correct sbatch documentation and job_submit/pbs plugin: "%j" is job ID,
not "%J" (which is job_id.step_id).
```
  2c2fd9f0
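
The hostlist commit above (2dfa3ff6) describes moving the fastest test first in a compound condition. The sketch below illustrates that idea only; it is not code from the commit. It assumes the condition is written with C's short-circuiting `&&` (the message writes "A & B" informally), and the names `same_first_char`, `same_name`, `count_matches`, and `slow_calls` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical counter: records how often the slower test actually runs. */
static int slow_calls = 0;

/* Test A (cheap): compare a single character. */
static bool same_first_char(const char *a, const char *b)
{
    return a[0] == b[0];
}

/* Test B (costlier): compare the whole string. */
static bool same_name(const char *a, const char *b)
{
    slow_calls++;
    return strcmp(a, b) == 0;
}

/* Count entries equal to key. Because && short-circuits, test B is
 * skipped whenever the cheaper test A has already failed. */
static int count_matches(const char *names[], int n, const char *key)
{
    int hits = 0;

    assert(names != NULL && key != NULL);
    for (int i = 0; i < n; i++) {
        if (same_first_char(names[i], key) && same_name(names[i], key))
            hits++;
    }
    return hits;
}
```

With the cheap test first, the full comparison runs only for entries that share a first character with the key. Swapping the operands would give the same result but call `same_name` on every entry.
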
24 Oct, 2013 9 commits
- Remove unneeded debug · 22cfb490
  Danny Auble authored Oct 24, 2013
  
  22cfb490
- Expand slurm.conf man page · 4c3a44ea
  Morris Jette authored Oct 24, 2013
  
  4c3a44ea
- Merge branch 'slurm-2.6' · d1c06ba0
  Morris Jette authored Oct 24, 2013
```
Conflicts:
	NEWS
	src/plugins/proctrack/cgroup/proctrack_cgroup.c
```
  d1c06ba0
- Add info about MySQL configuration · 687a05be
  Morris Jette authored Oct 24, 2013
```
Specifically setting innodb_buffer_pool_size=64
in my.cnf
```
  687a05be
- Improve setting of job wait "Reason" field. · cf7ca59b
  Morris Jette authored Oct 24, 2013
```
Without this change a job with a reason of WAIT_PART_DOWN,
WAIT_PART_INACTIVE, WAIT_PART_NODE_LIMIT, WAIT_PART_TIME_LIMIT, or
WAIT_QOS_THRES would not be cleared when that reason no longer
applied.
```
  cf7ca59b
- Improve description of a function · 6f2fa050
  Morris Jette authored Oct 24, 2013
  
  6f2fa050
- Update the FAQ with the requeue of done jobs and special exit status. · 88dfc2d6
  David Bigagli authored Oct 23, 2013
  
  88dfc2d6
- proctrack/cgroup - change from retry to locks · 3c24c236
  Morris Jette authored Oct 23, 2013
```
In the event of a race condition on cgroup create/delete calls
in separate job steps, replace retry logic with a lock. This is
an enhancement of the retry logic recently added to version 2.6,
but the more complex logic (here) is only being added to v13.12.
```
  3c24c236
- Adds check for NULL path in cgroup module · 58d9cbf8
  Morris Jette authored Oct 23, 2013
```
This hardens the code although no such problem has been observed
```
  58d9cbf8
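
The proctrack/cgroup commit above (3c24c236) replaces retry logic with a lock around cgroup create/delete calls. The following is a minimal sketch of that general pattern, not the plugin's actual code. It assumes a system that provides `flock()`, and `with_file_lock`, `bump`, and the lock path passed by the caller are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Run fn(arg) while holding an exclusive advisory lock on lock_path.
 * Returns fn's result, or -1 if the lock file could not be opened or
 * locked. A second caller blocks in flock() until the first unlocks,
 * so no retry loop is needed. */
static int with_file_lock(const char *lock_path, int (*fn)(void *), void *arg)
{
    int fd, rc;

    assert(lock_path != NULL && fn != NULL);
    fd = open(lock_path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) < 0) {
        close(fd);
        return -1;
    }
    rc = fn(arg);           /* critical section: create or delete */
    flock(fd, LOCK_UN);
    close(fd);
    return rc;
}

/* Hypothetical critical section: increments the counter it is given. */
static int bump(void *arg)
{
    ++*(int *)arg;
    return 0;
}
```

A caller would wrap both the create and the delete path in `with_file_lock` with the same lock file, so that the two operations cannot interleave.
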
23 Oct, 2013 11 commits

- Add test for OverTimeLimit config option · 0006d9ac
  Nathan Yee authored Oct 23, 2013
  
  0006d9ac
- select/cons_res - performance improvements · ff366da5
  Morris Jette authored Oct 22, 2013
```
Minor code enhancements to select/cons_res:
Replace loop and value set with memcpy
Eliminate redundant zero set of memory being freed
```
  ff366da5
- proctrack/cgroup - Fix for race condition · c36d564b
  Morris Jette authored Oct 22, 2013
```
Add cgroup create retry logic in case one step is starting at the
same time as another step is ending and the logic to create
and delete cgroups overlaps.
bug 447
```
  c36d564b
- Update TotalView configuration file in FAQ · eb3e8ad8
  Dave Henseler authored Oct 22, 2013
  
  eb3e8ad8
- Correction to previous commit · a654ad01
  Morris Jette authored Oct 22, 2013
```
I did the merge improperly
```
  a654ad01
- Problem allocating threads with GPUs · 52c2e27f
  Morris Jette authored Oct 22, 2013
```
If a node has GRES and multiple threads per core the select/cons_res
plugin can get stuck in an infinite loop.
See bug 475
Contributed by:
PREVOST Ludovic
NEC HPC Europe
```
  52c2e27f
- Add contributor to our web page · f0e25e67
  Morris Jette authored Oct 22, 2013
  
  f0e25e67
- Document latest patch changes in NEWS · e93a6543
  Morris Jette authored Oct 22, 2013
  
  e93a6543
- acct_gather_energy/ipmi - Add delay before retry on read error. · cd86abea
  Thomas Cadeau authored Oct 22, 2013
```
If slurmd fails to get the IPMI value, then I propose forcing a 1 second wait instead of immediately asking the BMC again (part 3/4 of the patch).
If IPMI init fails when slurmd forces an update of the value, then we should not update the value (part 4/4 of the patch).
Parts 1/4 and 2/4 add a safeguard in IPMI init because the function can be called several times.
This forces a return of SLURM_FAILURE if the first call failed, since the other calls will not do anything.

bug 469
```
  cd86abea
- Enforce JobRequeue configuration parameter · d7dfa58e
  Morris Jette authored Oct 21, 2013
```
Previously a node failure would always requeue the job
```
  d7dfa58e
- If a job is requeued in SPECIAL_EXIT state allow its dependent · 5c0df112
  David Bigagli authored Oct 22, 2013
```
jobs submitted afternotok to run.
```
  5c0df112
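
The select/cons_res commit in this group (ff366da5) mentions replacing a loop that sets values with `memcpy`. The sketch below shows what that kind of change can look like, assuming a flat array of `uint16_t` values. The function names and the element type are hypothetical and not taken from the plugin.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Before: copy one element per loop iteration. */
static void copy_loop(uint16_t *dst, const uint16_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* After: one bulk copy. The two regions must not overlap, which is
 * the usual precondition for memcpy(). */
static void copy_bulk(uint16_t *dst, const uint16_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof(*src));
}
```

Both functions leave `dst` with the same contents. The bulk form hands the whole copy to the C library, which is typically optimized for it.
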

22 Oct, 2013 10 commits
- proctrack/cgroup - Fix for race condition · 260c5485
  Morris Jette authored Oct 22, 2013
```
Add cgroup create retry logic in case one step is starting at the
same time as another step is ending and the logic to create
and delete cgroups overlaps.
bug 447
```
  260c5485
- Update TotalView configuration file in FAQ · 336e898e
  Dave Henseler authored Oct 22, 2013
  
  336e898e
- Correction to previous commit · c74e81a5
  Morris Jette authored Oct 22, 2013
```
I did the merge improperly
```
  c74e81a5
- Merge branch 'slurm-2.6' of http://github.com/SchedMD/slurm into slurm-2.6 · f0c5c22f
  Morris Jette authored Oct 22, 2013
```
Conflicts:
	NEWS
```
  f0c5c22f
- Problem allocating threads with GPUs · dab7fb02
  Morris Jette authored Oct 22, 2013
```
If a node has GRES and multiple threads per core the select/cons_res
plugin can get stuck in an infinite loop.
See bug 475
Contributed by:
PREVOST Ludovic
NEC HPC Europe
```
  dab7fb02
- Add contributor to our web page · a6fc2633
  Morris Jette authored Oct 22, 2013
  
  a6fc2633
- Document latest patch changes in NEWS · d710fc74
  Morris Jette authored Oct 22, 2013
  
  d710fc74
- acct_gather_energy/ipmi - Add delay before retry on read error. · 802eb9ae
  Thomas Cadeau authored Oct 22, 2013
```
If slurmd fails to get the IPMI value, then I propose forcing a 1 second wait instead of immediately asking the BMC again (part 3/4 of the patch).
If IPMI init fails when slurmd forces an update of the value, then we should not update the value (part 4/4 of the patch).
Parts 1/4 and 2/4 add a safeguard in IPMI init because the function can be called several times.
This forces a return of SLURM_FAILURE if the first call failed, since the other calls will not do anything.

bug 469
```
  802eb9ae
- Increase timeout on test with poe · b39be798
  Morris Jette authored Oct 21, 2013
  
  b39be798
- Enforce JobRequeue configuration parameter · 351b1f50
  Morris Jette authored Oct 21, 2013
```
Previously a node failure would always requeue the job
```
  351b1f50
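
The acct_gather_energy/ipmi commit in this group (802eb9ae) adds a delay before retrying a failed read and avoids updating the value when the read fails. The following is a rough sketch of that behavior under stated assumptions, not the plugin's code. The `read_fn` signature, `read_with_retry`, `flaky_reader`, and the retry count are all hypothetical; the commit message mentions only the 1 second wait.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical reader: returns 0 and fills *value on success, -1 on error. */
typedef int (*read_fn)(void *ctx, unsigned int *value);

/* Try up to max_tries reads, sleeping delay_secs between failed attempts.
 * On total failure *value is left untouched, so the caller keeps its
 * previous reading. */
static int read_with_retry(read_fn reader, void *ctx, unsigned int *value,
                           int max_tries, unsigned int delay_secs)
{
    assert(reader != NULL && value != NULL);
    for (int attempt = 0; attempt < max_tries; attempt++) {
        if (reader(ctx, value) == 0)
            return 0;
        if (attempt + 1 < max_tries)
            sleep(delay_secs);  /* wait before asking again */
    }
    return -1;
}

/* Example reader that fails a given number of times, then succeeds. */
static int flaky_reader(void *ctx, unsigned int *value)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -1;
    }
    *value = 42;
    return 0;
}
```

Passing `delay_secs = 1` mirrors the wait described in the commit message. Making the delay a parameter is a choice of this sketch so that the retry logic can be exercised without sleeping.
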
21 Oct, 2013 2 commits
- CRAY - only run select_p_job_init on startup · df9fd41d
  Danny Auble authored Oct 21, 2013
  
  df9fd41d
- CRAY - fixes syncing jobs · 3cf85905
  Danny Auble authored Oct 21, 2013
  
  3cf85905