- 25 Mar, 2014 18 commits
-
-
Danny Auble authored
This reverts commit 9a2d863c.
-
Danny Auble authored
-
Don Lipari authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Update the configuration file builder web page for Slurm version 14.03, mostly to support native Cray systems.
-
Hongjia Cao authored
Fix the problem where an allocated but drained node is shown as "mixed" by sinfo.
-
Morris Jette authored
Add test for triple bracketed expression
-
Morris Jette authored
Modify hostlist expressions to accept more than two numeric ranges (e.g. "row[1-3]rack[0-8]slot[0-63]")
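Slurm's hostlist code is written in C; as an illustration only, the expansion that this commit enables (more than two bracketed numeric ranges in one expression) can be sketched in Python. The function name `expand_hostlist` and its structure are hypothetical, not Slurm's actual API:

```python
# Illustrative sketch (not Slurm's implementation) of expanding a
# hostlist expression containing several numeric ranges, e.g.
# "row[1-3]rack[0-8]slot[0-63]".
import itertools
import re

def expand_hostlist(expr):
    """Expand every "[a-b,c]" range in expr and return all host names."""
    # re.split with one capture group alternates literal text (even
    # indices) with bracketed range bodies (odd indices).
    parts = re.split(r"\[([^\]]+)\]", expr)
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            choices.append([part])          # literal text between brackets
        else:
            values = []
            for item in part.split(","):
                if "-" in item:
                    lo, hi = item.split("-")
                    values.extend(str(v) for v in range(int(lo), int(hi) + 1))
                else:
                    values.append(item)
            choices.append(values)
    # The Cartesian product over all groups yields each concrete name.
    return ["".join(combo) for combo in itertools.product(*choices)]

print(expand_hostlist("a[1-2]b[1,2]"))
# → ['a1b1', 'a1b2', 'a2b1', 'a2b2']
```

With three ranges, `expand_hostlist("row[1-3]rack[0-8]slot[0-63]")` yields 3 × 9 × 64 = 1728 host names.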
-
Danny Auble authored
-
jette authored
See bug 662
-
Morris Jette authored
If a hostlist expression contained a separator after two open brackets and one close bracket, this resulted in bad parsing.
Before:
$ scontrol show hostnames a[1-2]b[1,2]
a[1-2]b[1]
2]
After:
$ scontrol show hostnames a[1-2]b[1,2]
a1b1
a1b2
a2b1
a2b2
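The core of a fix like this is that a comma only separates hostlist expressions when it appears at bracket depth zero; a comma inside "[...]" belongs to a range. A minimal Python sketch of that idea (illustrative only, not Slurm's C parser):

```python
# Hypothetical sketch: split a hostlist string at top-level commas only,
# tracking bracket depth so commas inside "[...]" are left alone.
def split_hostlist(expr):
    """Split "a[1-2]b[1,2],c[3,4]" into its top-level expressions."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(expr):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(expr[start:i])     # separator between expressions
            start = i + 1
    parts.append(expr[start:])
    return parts

print(split_hostlist("a[1-2]b[1,2],c[3,4]"))
# → ['a[1-2]b[1,2]', 'c[3,4]']
```

A naive `expr.split(",")` would instead cut inside the second bracket pair, producing the kind of mangled output shown in the "Before" example above.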
-
Morris Jette authored
-
- 24 Mar, 2014 17 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
able to run the exact number of cpu minutes in the limit.
-
Morris Jette authored
When poe is invoked (under srun) as user root, it generates a cryptic error message. I've added a clear error message describing the problem:
error: POE will not run as user root
Rather than just:
ERROR: 0031-620 pm_SSM_write failed in sending the user/environment for taskid 0
-
Morris Jette authored
Previous logic would typically do list search to find job array elements. This commit adds two hash tables for job arrays. The first is based upon the "base" job ID which is common to all tasks. The second hash table is based upon the sum of the "base" job ID plus the task ID in the array. This will substantially improve performance for handling dependencies with job arrays.
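slurmctld is written in C; purely as an illustration of the two-hash-table scheme described above, here is a Python sketch. The class and method names (`JobArrayIndex`, `add_task`, etc.) are hypothetical, not slurmctld's:

```python
# Illustrative sketch of indexing job array tasks two ways: by the
# "base" job ID shared by all tasks, and by base ID + task ID, so both
# whole-array and single-task lookups avoid a list search.
class JobArrayIndex:
    def __init__(self):
        self.by_base_id = {}   # base job ID -> list of all array tasks
        self.by_job_id = {}    # base ID + task ID -> one task record

    def add_task(self, base_id, task_id, record):
        self.by_base_id.setdefault(base_id, []).append(record)
        self.by_job_id[base_id + task_id] = record

    def tasks_of(self, base_id):
        """Every element of one array, e.g. for an afterany:47_* test."""
        return self.by_base_id.get(base_id, [])

    def lookup(self, base_id, task_id):
        """One element, e.g. dependency afterany:47_3, in O(1)."""
        return self.by_job_id.get(base_id + task_id)

idx = JobArrayIndex()
idx.add_task(47, 0, "47_0")
idx.add_task(47, 3, "47_3")
print(idx.lookup(47, 3))   # → 47_3
```

The point of keeping both tables is that dependency checks come in both flavors: "after every task of array 47" hits the first table, "after task 3 of array 47" hits the second.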
-
Morris Jette authored
-
Morris Jette authored
When slurmctld restarted, it would not recover dependencies on job array elements and would simply discard the dependency. This corrects the parsing problem so the dependency is recovered. The old code would print a message like this and discard the dependency:
slurmctld: error: Invalid dependencies discarded for job 51: afterany:47_*
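A dependency spec like "afterany:47_*" names either one task of a job array ("47_3") or every task ("47_*"). As a rough Python illustration of parsing that form (the function and return shape are hypothetical, not slurmctld's internals):

```python
# Illustrative sketch of parsing a job-array dependency string such as
# "afterany:47_*" (whole array) or "afterany:47_3" (one task).
def parse_dependency(spec):
    dep_type, _, job = spec.partition(":")
    if "_" in job:
        base, _, task = job.partition("_")
        # "*" selects every task of the array; represent that as None.
        task = None if task == "*" else int(task)
        return (dep_type, int(base), task)
    return (dep_type, int(job), None)      # plain (non-array) job ID

print(parse_dependency("afterany:47_*"))   # → ('afterany', 47, None)
print(parse_dependency("afterany:47_3"))   # → ('afterany', 47, 3)
```

The bug described above amounts to the restart path rejecting the "47_*" form that the running daemon had accepted, so the recovered state failed this kind of parse.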
-
- 22 Mar, 2014 1 commit
-
-
Morris Jette authored
When adding or removing columns for most data types (jobs, partitions, nodes, etc.), an abort is generated on some system types. This appears to be because, when the displayed columns change, the address of "model" changes on some systems but not on others (like my laptops). This fix explicitly sets last_model to NULL when the columns are changed, rather than relying upon the data structure's address changing.
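The underlying idea is general: a cache keyed on an object's address is only valid if the address is guaranteed to change whenever the object's shape changes, so it is safer to invalidate explicitly. A generic Python sketch of that pattern (not sview's GTK code; all names are illustrative):

```python
# Illustrative sketch: invalidate a cached model reference explicitly on
# column changes instead of relying on the model's address changing.
class View:
    def __init__(self):
        self.last_model = None
        self.columns = []

    def set_columns(self, columns):
        self.columns = columns
        self.last_model = None      # explicit reset; never trust id() reuse

    def render(self, model):
        """Return True when the display must be rebuilt."""
        rebuilt = model is not self.last_model
        self.last_model = model
        return rebuilt
```

With the explicit reset, a column change forces a rebuild even if the allocator hands back a model at the same address, which is the failure mode the commit describes.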
-
- 21 Mar, 2014 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
be set up for 1-node jobs. Here are some of the reasons from IBM:
1. PE expects it.
2. For failover: if there were some challenge or difficulty with the shared-memory method of data transfer, the protocol stack might want to go through the adapter instead.
3. For flexibility: the protocol stack might want to be able to transfer data using some variable combination of shared memory and adapter-based communication.
4. Possibly most important, for overall performance: bandwidth or efficiency (BW per CPU cycle) might be better using the adapter resources. (An obvious case is large messages: it might require far fewer CPU cycles to program the DMA engines on the adapter to move data between tasks than to depend on the CPU to move the data with loads and stores or page re-mapping, and a DMA engine might actually move the data more quickly if it is well integrated with the memory system, as it is in the P775 case.)
-
Danny Auble authored
-
Danny Auble authored
-