Commits · c79cd5038d0d88b1faf6431dc338227dbf98dd53 · Manuel G. Marciani / ces_slurm_simulator

22 Jun, 2012 1 commit
- Move logic to always give the first · c79cd503
  Danny Auble authored Jun 22, 2012
  
  c79cd503
21 Jun, 2012 4 commits
- Run backfill scheduler when only one job to set estimated start time · 6ad5a6b3
  Morris Jette authored Jun 21, 2012
  
  6ad5a6b3
- Fix for too small malloc in step layout logic · b3ac9da5
  Morris Jette authored Jun 20, 2012
  
  b3ac9da5
- Add error message on error condition · 70a77c02
  Morris Jette authored Jun 20, 2012
  
  70a77c02
- Revert commit 7fc48554 · 29d79ef8
  Morris Jette authored Jun 20, 2012
```
The underlying problem is in the sched plugin logic in SLURM v2.4
```
  29d79ef8
20 Jun, 2012 4 commits
- BGQ - fix issue where if a user was asking for tasks and ntasks-per-node · 74daee90
  Danny Auble authored Jun 20, 2012
```
but not node count the node count is correctly figured out.
```
  74daee90
- Fix bug in gang scheduling table initialization · 7fc48554
  Morris Jette authored Jun 20, 2012
```
Without this fix, gang scheduling mode could start without creating
a list resulting in an assert when jobs are submitted.
```
  7fc48554
- Mods for zero size allocation · 273dffec
  Morris Jette authored Jun 20, 2012
```
This change permits a user to get a zero size allocation by specifying
a task count of zero with no node count specification.
```
  273dffec
- Cosmetic mods. No logic changes. · a0ad146d
  Morris Jette authored Jun 20, 2012
  
  a0ad146d
18 Jun, 2012 3 commits
- update scontrol update block docs. · 16cf94c8
  Danny Auble authored Jun 18, 2012
  
  16cf94c8
- Fix issues on large jobs (>64k tasks) to have the correct counter type when · cd025504
  Danny Auble authored Jun 18, 2012
```
packing the step layout structure.
```
  cd025504
- BGQ - fix for if a request comes in smaller than the smallest block and · 1b7035dc
  Danny Auble authored Jun 18, 2012
```
we must use a small block instead of a shared midplane block.
```
  1b7035dc
15 Jun, 2012 2 commits
- BLUEGENE - update documentation for block state changing · ee753e09
  Danny Auble authored Jun 15, 2012
  
  ee753e09
- Clarify PrologSlurmctld configuration parameter use · 6d064d1e
  Morris Jette authored Jun 15, 2012
  
  6d064d1e
13 Jun, 2012 4 commits
- BGQ - quieter debug when the real time server comes back but there are · 9c0ca8db
  Danny Auble authored Jun 13, 2012
```
still messages we find when we poll but haven't given it back to the real
time yet.
```
  9c0ca8db
- fix typo. · bb94cf0d
  Danny Auble authored Jun 13, 2012
  
  bb94cf0d
- fix test to work correctly with mpich2 · 1ae40087
  Danny Auble authored Jun 13, 2012
  
  1ae40087
- Improve memory consumption on step layouts with high task count. · f0d470e6
  Danny Auble authored Jun 13, 2012
  
  f0d470e6
12 Jun, 2012 3 commits
- update last modified times · 73689f4f
  Danny Auble authored Jun 12, 2012
  
  73689f4f
- update select plugin to be correct · 70da2d61
  Nathan Yee authored Jun 12, 2012
  
  70da2d61
- BGQ - Added information on how to setup the runjob_mux to run as SlurmUser. · 68e797cd
  Danny Auble authored Jun 12, 2012
  
  68e797cd
11 Jun, 2012 2 commits
- minor changes to the file · ce07c509
  Danny Auble authored Jun 11, 2012
  
  ce07c509
- Initial patch adding cgroup web page from Martin Perry, Bull · b28bb581
  Martin Perry authored Jun 11, 2012
  
  b28bb581
07 Jun, 2012 1 commit
- update docs for sacct --dump · 82bc3b12
  Danny Auble authored Jun 06, 2012
  
  82bc3b12
05 Jun, 2012 4 commits

Permit held job to be modified · 1fee4fe4

Phil Eckert authored Jun 05, 2012

I was doing some checking to find out why the 2.4 branch and
master branch of schedmd were not allowing held jobs to be modified.
When attempting to do so, scontrol would return:

slurm_update error: Requested partition configuration not available now

I did some debugging and found that it was caused by code added to the tail end
of job_limits_check() in job_mgr.c. It had this addition:

        } else if (job_ptr->priority == 0) {   /* user or administrator hold */
                fail_reason = WAIT_HELD;
        }

It causes all modifications done by scontrol on held jobs to fail.
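
The failure described above can be illustrated with a minimal sketch. Everything here is hypothetical: the `toy_` types, the enum values, and the assumption that the update path rejects a change whenever the limits check reports a reason are stand-ins for illustration, not the actual Slurm code or call path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real Slurm types or values. */
enum toy_reason { TOY_NO_REASON = 0, TOY_HELD = 1 };

struct toy_job {
    uint32_t priority;   /* 0 models a user or administrator hold */
    uint32_t time_limit; /* the field the update tries to change */
};

/* Toy limits check. With hold_is_failure set, it mimics the quoted
 * addition: a held job (priority == 0) yields a failure reason. */
static enum toy_reason toy_limits_check(const struct toy_job *job,
                                        bool hold_is_failure)
{
    assert(job != NULL);
    if (hold_is_failure && job->priority == 0)
        return TOY_HELD;
    return TOY_NO_REASON;
}

/* Assumed update path: reject the change whenever the check reports
 * any reason. Returns 0 on success, -1 on rejection. */
static int toy_update_time_limit(struct toy_job *job, uint32_t new_limit,
                                 bool hold_is_failure)
{
    if (toy_limits_check(job, hold_is_failure) != TOY_NO_REASON)
        return -1;
    job->time_limit = new_limit;
    return 0;
}
```

Under these assumptions, a job with `priority == 0` can never be updated while the hold is treated as a limit failure, which matches the symptom reported for scontrol.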

1fee4fe4

Quieting down the job_mgr · 2cde8d3f

Don Lipari authored Jun 05, 2012

I'd like to propose quieting down the job_mgr a tad.  This is a refinement to:
https://github.com/SchedMD/slurm/commit/30a986f4c600291876f4ec3e3949934512f2cba5

2cde8d3f

BGQ - When using an old IBM driver cnodes that go into error because of · 07843fd3

Danny Auble authored Jun 04, 2012
```
a job kill timeout aren't always reported to the system.  This is now
handled by the runjob_mux plugin.
```

07843fd3

BGQ - move variable to avoid warning when not on real BGQ system · 88aa72ad

Danny Auble authored Jun 01, 2012

88aa72ad

04 Jun, 2012 1 commit

Document enforcement of job's --mem option · 54b63642

Rod Schultz authored Jun 04, 2012

I'd like to add the following disclaimer to the documentation of the --mem option to the salloc/sbatch/srun commands. There is currently similar wording in the slurm.conf file, but I've received a bug report in which the memory limits were exceeded (until the next accounting poll).

NOTE: Enforcement of memory limits currently requires enabling of accounting,
which samples memory use on a periodic basis (data need not be stored, just collected).
A task may exceed the memory limit until the next periodic accounting sample.

Rod Schultz, Bull
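
The note above says a task may exceed its limit until the next periodic accounting sample. A minimal sketch of that effect follows, assuming a simplified model in which usage is recorded once per tick and checked only at every `sample_interval`-th tick. The function name, the tick model, and the units are illustrative assumptions, not Slurm's accounting implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of sampled enforcement: usage_mb[t] is the memory in use at
 * tick t, and the limit is only compared at ticks 0, sample_interval,
 * 2 * sample_interval, and so on.
 * Returns the first sampled tick at which usage exceeds limit_mb,
 * or -1 if no sample ever observes a violation. */
static long first_detected_tick(const unsigned long *usage_mb, size_t n_ticks,
                                unsigned long limit_mb, size_t sample_interval)
{
    assert(usage_mb != NULL && sample_interval > 0);
    for (size_t t = 0; t < n_ticks; t += sample_interval) {
        if (usage_mb[t] > limit_mb)
            return (long)t;
    }
    return -1;
}
```

In this model, a task that crosses the limit at tick 2 is only noticed at tick 5 when sampling every 5 ticks, and a spike that falls entirely between two samples is never noticed at all.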

54b63642

01 Jun, 2012 4 commits
- remove debug from last commit · 2bcc3b2e
  Danny Auble authored Jun 01, 2012
  
  2bcc3b2e
- BGQ - better fix for making new blocks when nodeboard goes down and using · c0fb0bbb
  Danny Auble authored Jun 01, 2012
```
sub-blocks.
```
  c0fb0bbb
- BLUEGENE - correct logic to handle splitting a block. · 0a1837a5
  Danny Auble authored Jun 01, 2012
  
  0a1837a5
- BGQ - Fix issue when a nodeboard goes down and you want to combine blocks · 8f429bfb
  Danny Auble authored May 31, 2012
```
to make a larger small block and are running with sub-blocks.
```
  8f429bfb
31 May, 2012 2 commits
- BGQ - Fix checking for IO on a block with new IBM driver V1R1M1 previous · f6bede58
  Danny Auble authored May 31, 2012
```
function didn't always work correctly.
```
  f6bede58
- BGQ - add check for new IBM function to check for IO for a block and · 29de9b72
  Danny Auble authored May 31, 2012
```
rerun autogen.sh
```
  29de9b72
30 May, 2012 3 commits
- BGQ - fix issue where if a step uses the entire allocation and then · d08e2813
  Danny Auble authored May 30, 2012
```
the next step in the allocation only uses part of the allocation it gets
the correct cnodes.
```
  d08e2813
- Fix in scheduling logic that can delay jobs with min/max node counts. · aa7c59d3
  Morris Jette authored May 30, 2012
  
  aa7c59d3
- In etc/init.d/slurm move check for scontrol · 1385a9f0
  Andy Wettstein authored May 30, 2012
```
In etc/init.d/slurm move check for scontrol after sourcing
/etc/sysconfig/slurm. Patch from Andy Wettstein, University of Chicago.
```
  1385a9f0
29 May, 2012 1 commit
- Fix bug that clears job pending reason field · f0324da5
  Don Lipari authored May 29, 2012
  
  f0324da5
25 May, 2012 1 commit

Correct default NodeAddr · fbc0e712

Morris Jette authored May 25, 2012

According to man slurm.conf, the default for NodeAddr is NodeName:

  "By default, the NodeAddr will be identical in value to NodeName."

However, it seems the default is NodeHostname (when that differs from
NodeName): With the following in slurmnodes.conf:

Nodename=c0-0 NodeHostname=compute-0-0 ...

I get

NodeName=c0-0 Arch=x86_64 CoresPerSocket=2
   CPUAlloc=0 CPUErr=0 CPUTot=4 Features=intel,rack0,hugemem
   Gres=(null)
***
   NodeAddr=compute-0-0 NodeHostName=compute-0-0
***
   OS=Linux RealMemory=3949 Sockets=2
   State=IDLE ThreadsPerCore=1 TmpDisk=10000 Weight=1027
   BootTime=2012-05-08T15:07:08 SlurmdStartTime=2012-05-25T10:30:10

(This is with 2.4.0-0.pre4.)

(We are planning to use cx-y instead of compute-x-y (the rocks default)
on our next cluster, to save some typing.)

--
Regards,
Bjørn-Helge Mevik, dr. scient,
Research Computing Services, University of Oslo
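
The discrepancy reported above can be sketched as two fallback rules. This is an illustration only: `toy_node` and both functions are hypothetical, the fallback of NodeHostname to NodeName in the observed rule is an assumption, and the sketch does not show which behavior this commit settled on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node record; a NULL field means "not set in the config". */
struct toy_node {
    const char *name;     /* NodeName, always set */
    const char *hostname; /* NodeHostname, optional */
    const char *addr;     /* NodeAddr, optional */
};

/* Return value if it is set and non-empty, otherwise fallback. */
static const char *or_default(const char *value, const char *fallback)
{
    return (value != NULL && value[0] != '\0') ? value : fallback;
}

/* Rule quoted from man slurm.conf: NodeAddr defaults to NodeName. */
static const char *addr_as_documented(const struct toy_node *n)
{
    assert(n != NULL && n->name != NULL);
    return or_default(n->addr, n->name);
}

/* Rule observed in the report: NodeAddr defaults to NodeHostname.
 * Falling back further to NodeName is an assumption of this sketch. */
static const char *addr_as_observed(const struct toy_node *n)
{
    assert(n != NULL && n->name != NULL);
    return or_default(n->addr, or_default(n->hostname, n->name));
}
```

For the reported configuration (`c0-0` with NodeHostname `compute-0-0` and no NodeAddr), the documented rule gives `c0-0` while the observed rule gives `compute-0-0`. The two rules agree whenever NodeHostname is unset or NodeAddr is given explicitly.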

fbc0e712