02 Mar, 2012 7 commits
    • Mods in priority/multifactor for prio=1 · b223af49
      Morris Jette authored
      In SLURM version 2.4, we now schedule jobs at priority=1 and no longer
      treat it as a special case.
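      A minimal, hypothetical illustration of what dropping the special case
      amounts to (the identifiers and the guard are illustrative, not the
      actual plugin code):

          /* Hypothetical pre-2.4 guard: jobs holding priority 1 were
           * passed over in the scheduling loop. Removing a check of
           * this shape lets priority=1 jobs schedule like any other. */
          if (job_ptr->priority == 1)
              continue;        /* guard removed in SLURM 2.4 */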
    • Cosmetic mods to priority logic · 0810353e
      Morris Jette authored
    • Merge branch 'slurm-2.3' · ec372e00
      Morris Jette authored
    • cray/srun wrapper, don't use aprun -q by default · ea9adc17
      Morris Jette authored
      In the cray/srun wrapper, only include the aprun "-q" option when the
      srun "--quiet" option is used (see the sketch below).
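      A minimal sketch of the option mapping, using hypothetical variable
      names (the real wrapper lives in contribs/cray and may differ):

          /* Translate srun options into the aprun command line.
           * Only forward aprun's -q (quiet) flag when the user
           * explicitly passed --quiet to srun. */
          if (opt_quiet)
              aprun_argv[aprun_argc++] = "-q";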
    • Change a slurmd msg from info() to debug() · 73f915bf
      Morris Jette authored
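      For context, a message logged through SLURM's info() appears at the
      default log level, while debug() output only shows up when slurmd runs
      at a higher verbosity (e.g. a raised SlurmdDebug). The change is of
      this shape (message text hypothetical):

          - info("hypothetical slurmd message");
          + debug("hypothetical slurmd message");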
    • Merge branch 'slurm-2.3' · c06064bc
      Morris Jette authored
    • Fix for possible SEGV · ed56303c
      Morris Jette authored
      Here's what seems to have happened:
      
      - A job was pending, waiting for resources.
      - slurm.conf was changed to remove some nodes, and a scontrol reconfigure was done.
      - As a result of the reconfigure, the pending job became non-runnable, due to "Requested node configuration is not available". The scheduler set the job state to JOB_FAILED and called delete_job_details.
      - scontrol reconfigure was done again.
      - read_slurm_conf called _restore_job_dependencies.
      - _restore_job_dependencies called build_feature_list for each job in the job list.
      - When build_feature_list tried to reference the now-deleted job details for the failed job, it got a segmentation fault.
      
      The problem was reported by a customer on Slurm 2.2.7. I have not been able to reproduce it on 2.4.0-pre3, although the relevant code looks the same; there may be a timing window. The attached patch attempts to fix the problem by adding a check to _restore_job_dependencies: if the job state is JOB_FAILED, the job is skipped (see the sketch below).
      
      Regards,
      Martin
      
      This is an alternative solution to bug316980fix.patch.
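      A sketch of the described check, using SLURM-style identifiers; the
      actual patch may differ in detail:

          /* _restore_job_dependencies(): walk the job list, but skip
           * jobs that already reached JOB_FAILED, since their details
           * were freed by delete_job_details() and build_feature_list()
           * would dereference freed memory. */
          job_iterator = list_iterator_create(job_list);
          while ((job_ptr = (struct job_record *) list_next(job_iterator))) {
              if (IS_JOB_FAILED(job_ptr))
                  continue;               /* details already deleted */
              if (job_ptr->details && job_ptr->details->features)
                  build_feature_list(job_ptr);
          }
          list_iterator_destroy(job_iterator);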