  1. 09 Nov, 2012 1 commit
  2. 08 Nov, 2012 2 commits
  3. 07 Nov, 2012 5 commits
  4. 05 Nov, 2012 2 commits
  5. 02 Nov, 2012 3 commits
  6. 31 Oct, 2012 1 commit
  7. 29 Oct, 2012 2 commits
    • Fix bug with topology/tree and job with min-max node count. · e15cab3f
      Morris Jette authored
      Now try to satisfy the job's maximum node count rather than minimizing
      the number of leaf switches used. For example, if each leaf switch has
      8 nodes, a request for -N4-16 would previously allocate 8 nodes (one
      leaf switch) rather than 16 nodes across two leaf switches.
    • Cray - Prevent calling basil_confirm more than once per job using a flag. · faa96d55
      Morris Jette authored
          Anyhow, after applying the patch, I was still running into the same difficulty.  Upon a closer look, I saw that I was still receiving the ALPS backend error in the slurmctld.log file.  When I examined the code pertaining to this and ran some SLURM-independent tests, I found that we were executing the do_basil_confirm function multiple times in the cases where it would fail.  My independent tests show precisely the same behaviour: if you make a reservation request, successfully confirm it, and then attempt to confirm it again, you receive this error message.  However, the "apstat -rvv" command shows that the ALPS reservation is fine, so I concluded that this particular ALPS/BASIL message is informational rather than a show-stopper.  In other words, I can consider the node ready at this point.
          As a simple workaround, I inserted an if-block immediately after the call to "basil_confirm" in function "do_basil_confirm" in ".../src/plugins/select/cray/basil_interface.c".  The if-statement checks for "BE_BACKEND"; if that is the result, it prints an informational message to slurmctld.log and sets rc = 0 so that the node can be considered ready.  This now allows my prolog scripts to run, and I can clearly see the SLURM message that I placed in that if-block.
          However, I am not certain that we should simply let this error code pass through, as it seems to be a fairly generic code and there could be other causes of it that we would not wish to let pass.  I really only want to limit the number of calls to basil_confirm to one.  Perhaps I could add a field to the job_record so that I can mark whether the ALPS reservation has been confirmed or not.
  8. 26 Oct, 2012 2 commits
  9. 25 Oct, 2012 2 commits
  10. 24 Oct, 2012 1 commit
  11. 23 Oct, 2012 1 commit
  12. 22 Oct, 2012 3 commits
  13. 19 Oct, 2012 1 commit
  14. 18 Oct, 2012 6 commits
  15. 17 Oct, 2012 2 commits
  16. 16 Oct, 2012 2 commits
  17. 05 Oct, 2012 2 commits
  18. 04 Oct, 2012 1 commit
  19. 02 Oct, 2012 1 commit
    • Correct --mem-per-cpu logic for multiple threads per core · 6a103f2e
      Morris Jette authored
      See bugzilla bug 132
      
      When using select/cons_res and CR_Core_Memory, hyperthreaded nodes may be
      overcommitted on memory when CPU counts are scaled. I've tested 2.4.2 and HEAD
      (2.5.0-pre3).
      
      Conditions:
      -----------
      * SelectType=select/cons_res
      * SelectTypeParameters=CR_Core_Memory
      * Using threads
        - Ex. "NodeName=linux0 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=400"
      
      Description:
      ------------
      In the cons_res plugin, _verify_node_state() in job_test.c checks if a node has
      sufficient memory for a job. However, the per-CPU memory limits appear to be
      scaled by the number of threads. This new value may exceed the available memory
      on the node. And, once a node is overcommitted on memory, future memory checks
      in _verify_node_state() will always succeed.
      
      Scenario to reproduce:
      ----------------------
      With the example node linux0, we run a single-core job with 250MB/core
          srun --mem-per-cpu=250 sleep 60
      
      cons_res checks that it will fit: ((real - alloc) >= job mem)
          ((400 - 0) >= 250) and the job starts
      
      Then, the memory requirement is doubled:
          "slurmctld: error: cons_res: node linux0 memory is overallocated (500) for job X"
          "slurmd: scaling CPU count by factor of 2"
      
      This job should not have started.
      
      While the first job is still running, we submit a second, identical job
          srun --mem-per-cpu=250 sleep 60
      
      cons_res checks that it will fit:
          ((400 - 500) >= 250), the unsigned int wraps, the test passes, and the job starts
      
      This second job also should not have started.