- 25 Jun, 2012 6 commits
-
-
Danny Auble authored
check if a block is still makable if the cable wasn't in error.
-
Danny Auble authored
-
Danny Auble authored
removal of the job on the block failed.
-
Danny Auble authored
-
Danny Auble authored
-
Rod Schultz authored
-
- 22 Jun, 2012 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
same time a block is destroyed and that block just happens to be the smallest overlapping block over the bad hardware.
-
Danny Auble authored
-
- 21 Jun, 2012 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
The underlying problem is in the sched plugin logic in SLURM v2.4
-
- 20 Jun, 2012 4 commits
-
-
Danny Auble authored
but not node count, the node count is correctly figured out.
-
Morris Jette authored
Without this fix, gang scheduling mode could start without creating a list, resulting in an assert when jobs are submitted.
-
Morris Jette authored
This change permits a user to get a zero size allocation by specifying a task count of zero with no node count specification.
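For example, a hypothetical invocation requesting such a zero-size allocation (assuming the standard salloc option syntax; not taken from the commit itself):
salloc --ntasks=0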
-
Morris Jette authored
-
- 18 Jun, 2012 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
packing the step layout structure.
-
Danny Auble authored
we must use a small block instead of a shared midplane block.
-
- 15 Jun, 2012 2 commits
-
-
Danny Auble authored
-
Morris Jette authored
-
- 13 Jun, 2012 4 commits
-
-
Danny Auble authored
still messages we find when we poll but haven't given them back to the real time yet.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 12 Jun, 2012 3 commits
-
-
Danny Auble authored
-
Nathan Yee authored
-
Danny Auble authored
-
- 11 Jun, 2012 2 commits
-
-
Danny Auble authored
-
Martin Perry authored
-
- 07 Jun, 2012 1 commit
-
-
Danny Auble authored
-
- 05 Jun, 2012 4 commits
-
-
Phil Eckert authored
I was doing some checking to find out why the 2.4 branch and master branch of SchedMD's SLURM were not allowing held jobs to be modified. When attempting to do so, scontrol would return:
slurm_update error: Requested partition configuration not available now
Some debugging showed that the cause was code added to the tail end of job_limits_check() in job_mgr.c. It had this addition:
} else if (job_ptr->priority == 0) {
	/* user or administrator hold */
	fail_reason = WAIT_HELD;
}
This causes all modifications done by scontrol on held jobs to fail.
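For illustration, a hypothetical reproduction of the failure described above (the job id and field being updated are made up; the error text is the one quoted in the report):
scontrol hold 1234
scontrol update JobId=1234 TimeLimit=60
slurm_update error: Requested partition configuration not available now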
-
Don Lipari authored
I'd like to propose quieting down the job_mgr a tad. This is a refinement to: https://github.com/SchedMD/slurm/commit/30a986f4c600291876f4ec3e3949934512f2cba5
-
Danny Auble authored
a job kill timeout aren't always reported to the system. This is now handled by the runjob_mux plugin.
-
Danny Auble authored
-
- 04 Jun, 2012 1 commit
-
-
Rod Schultz authored
I'd like to add the following disclaimer to the documentation of the --mem option for the salloc/sbatch/srun commands. There is currently similar wording in the slurm.conf file, but I've received a bug report in which the memory limits were exceeded (until the next accounting poll).
NOTE: Enforcement of memory limits currently requires enabling of accounting, which samples memory use on a periodic basis (data need not be stored, just collected). A task may exceed the memory limit until the next periodic accounting sample.
Rod Schultz, Bull
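For context, a minimal slurm.conf sketch of the accounting settings this note depends on (the parameter values are illustrative assumptions, not part of the commit):
JobAcctGatherType=jobacct_gather/linux    # sample memory use of running tasks
JobAcctGatherFrequency=30                 # poll interval in seconds; limits are only checked at each sample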
-
- 01 Jun, 2012 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
sub-blocks.
-