Commits · 9d20cf02ee2312e736bee392373e36b9bbb075b9 · Manuel G. Marciani / ces_slurm_simulator

11 Jun, 2015 3 commits

Use correct slurmd spooldir when creating cpu-frequency locks. · 9d20cf02
Brian Christiansen authored Jun 10, 2015
```
Bug 1733
```
9d20cf02

Fix for node reboot/down state · 4e8545b6

Didier GAZEN authored Jun 10, 2015

In your node_mgr fix to keep rebooted nodes down (commit 9cd15dfe), you
forgot to consider the case of nodes that are powered up but are responding after
ResumeTimeout seconds (the maximum time permitted). Such nodes are
marked DOWN (because they didn't respond within ResumeTimeout seconds) than
should become silently available when ReturnToService=1 (as stated in the slurm.conf manual)

With your modification when such nodes are finally responding, they are seen as
rebooted nodes and remain in the DOWN state (with the new reason: Node
unexpectedly rebooted) even when ReturnToService=1 !

Correction of commit 3c2b46af

4e8545b6

Revert commit 3c2b46af · c85f7484
Didier GAZEN authored Jun 10, 2015

c85f7484

10 Jun, 2015 3 commits

Add NEWS for last commit · 30e50e6c
Morris Jette authored Jun 10, 2015

30e50e6c

Fix for node reboot/down state · 3c2b46af

Didier GAZEN authored Jun 10, 2015

My patch to obtain the correct behaviour:

3c2b46af

select/serial gres scheduling fix · f2a08ce7
Morris Jette authored Jun 09, 2015
```
Equivalent fix as e1a00772
for select/serial rather than select/cons_res
```
f2a08ce7

09 Jun, 2015 7 commits
- Search for user in all groups · 93ead71a
  David Bigagli authored Jun 09, 2015
  
  93ead71a
- Fix scheduling inconsistency with GRES · e1a00772
  Morris Jette authored Jun 09, 2015
```
1. I submit a first job that uses 1 GPU:
$ srun --gres gpu:1 --pty bash
$ echo $CUDA_VISIBLE_DEVICES
0

2. while the first one is still running, a 2-GPU job asking for 1 task per node
waits (and I don't really understand why):
$ srun --ntasks-per-node=1 --gres=gpu:2 --pty bash
srun: job 2390816 queued and waiting for resources

3. whereas a 2-GPU job requesting 1 core per socket (so just 1 socket) actually
gets GPUs allocated from two different sockets!
$ srun -n 1  --cores-per-socket=1 --gres=gpu:2 -p testk --pty bash
$ echo $CUDA_VISIBLE_DEVICES
1,2

With this change #2 works the same way as #3.
bug 1725
```
  e1a00772
- Move definitions into alphabetic order · 5f337d38
  Morris Jette authored Jun 09, 2015
  
  5f337d38
- Update broken links in webpages · 4a41e4d7
  Danny Auble authored Jun 09, 2015
  
  4a41e4d7
- Replace /usr/bin with a more managible approach · 321a48b3
  Danny Auble authored Jun 09, 2015
  
  321a48b3
- Corrections to slurm.conf formatting · b373847e
  Morris Jette authored Jun 09, 2015
  
  b373847e
- In test4.3 unset the SINFO_FORMAT since it conflicts with the --long · 3fad9df1
  David Bigagli authored Jun 08, 2015
```
option.
```
  3fad9df1
05 Jun, 2015 8 commits
- Correct eof/wait logic in a test · 752a33db
  Morris Jette authored Jun 05, 2015
  
  752a33db
- More wording changes in addition to commit 3d11b90f · 619c4372
  Danny Auble authored Jun 05, 2015
  
  619c4372
- update unsubscription methods · 3d11b90f
  Danny Auble authored Jun 05, 2015
  
  3d11b90f
- Revert "Fix issue where command line options were parsed twice in sbatch." · b37004e2
  Danny Auble authored Jun 02, 2015
```
Only going to do this in the master as it may affect scripts.

This reverts commit 454f78e6.

Conflicts:
	NEWS
```
  b37004e2
- Update gres.conf description of file regular expressions · 22b7f1ad
  Morris Jette authored Jun 05, 2015
```
bug 1724
```
  22b7f1ad
- Small typo in expect code for test28.[345]. · e053d6d4
  Nicolas Joly authored Jun 05, 2015
  
  e053d6d4
- Spelling fixes in testsuite. · 6f56f61b
  Nicolas Joly authored Jun 05, 2015
  
  6f56f61b
- Update some LLNL links to SchedMD · ed84f96c
  Morris Jette authored Jun 05, 2015
  
  ed84f96c
04 Jun, 2015 8 commits
- Partially modify the commit 971d0021 . · 707268a5
  David Bigagli authored Jun 04, 2015
  
  707268a5
- Remove old code. · 05976915
  David Bigagli authored Jun 04, 2015
  
  05976915
- Move around some code to be cleaner · d0f6c4ac
  Morris Jette authored Jun 04, 2015
  
  d0f6c4ac
- fix test if unsufficient resources · 05eadb57
  Veronique Legrand authored Jun 04, 2015
```
Previously the test would generate an error if the default partition
  contained less than 3 nodes
bug 1720
```
  05eadb57
- Fix parsing for NetBSD sleep error message · ee72ee8c
  Nicolas Joly authored Jun 04, 2015
  
  ee72ee8c
- Cut and Paste error with variables in If statement · 61ad32e8
  Nancy Kritkausky authored Jun 04, 2015
  
  61ad32e8
- Fix broken build on non Cray. · df6fce57
  David Bigagli authored Jun 03, 2015
  
  df6fce57
- Fix sacctmgr archive loading of older versions. · bf07cfcc
  David Bigagli authored Jun 03, 2015
  
  bf07cfcc
03 Jun, 2015 1 commit

switch/cray: Refine PMI_CRAY_NO_SMP_ENV set · ef66b2eb

Morris Jette authored Jun 03, 2015

switch/cray: Refine logic to set PMI_CRAY_NO_SMP_ENV environment variable.
Rather than testing for the task distribution option, test the actual
task IDs to see fi they are monotonically increasing across all nodes.
Based upon idea from Brian Gilmer (Cray).

ef66b2eb

02 Jun, 2015 3 commits
- Fix issue where command line options were parsed twice in sbatch. · 454f78e6
  Danny Auble authored Jun 02, 2015
  
  454f78e6
- Fix issue where sbatch would set ntasks-per-node to 0 making any srun · 9f67ad99
  Danny Auble authored Jun 02, 2015
```
afterward cause a divide by zero error.
```
  9f67ad99
- When deleting a job from the system set the job_id to 0 to avoid memory · 0b007678
  Danny Auble authored Jun 01, 2015
```
corruption if thread uses the pointer basing validity off the id.

Bug 1710
```
  0b007678
01 Jun, 2015 3 commits
- Update NEWS. · c3383298
  David Bigagli authored Jun 01, 2015
  
  c3383298
- Fix squeue -o %X output to correctly handle NO_VAL and suffix. · 1cee0d58
  Nicolas Joly authored Jun 01, 2015
  
  1cee0d58
- Disable test when only one job can run at a time · 1361c7cf
  Morris Jette authored Jun 01, 2015
```
Disable test with select/linear and only one node
```
  1361c7cf
30 May, 2015 1 commit
- CRAY - Remove libpmi from rpm install · 374f2db9
  Danny Auble authored May 29, 2015
  
  374f2db9
29 May, 2015 3 commits
- Fix race condition where last array task might not get updated in the db. · d95f1ed6
  Brian Christiansen authored May 29, 2015
```
Bug 1495
```
  d95f1ed6
- select/linear: Correct CPU count · 58623ec7
  Morris Jette authored May 29, 2015
```
Correct count of CPUs allocated to job on system with hyperthreads.
The bug was introduced in commit a6d3074d
On a system with hyperthreads:
srun -n1 --ntasks-per-core=1 hostname
you would get:
slurmctld: error: job_update_cpu_cnt: cpu_cnt underflow on job_id 67072
```
  58623ec7
- Fix sreport core dump. · 75339f3e
  David Bigagli authored May 29, 2015
  
  75339f3e