Commits · f217a8615ccb4d4f36d2aa0db9223874d20cfe5c · Manuel G. Marciani / ces_slurm_simulator

09 Oct, 2015 2 commits
- Add link to ib2slurm in topology web page · f217a861
  Morris Jette authored Oct 09, 2015
  
  f217a861
- Avoid srun segv on job allocate failure · 4b0e3c75
  Morris Jette authored Oct 09, 2015
```
If a job allocation response has invalid contents, the pointer
to the job structure may be NULL. This change preserves the error
message and avoids a segv.
```
  4b0e3c75
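Here is a minimal sketch of the defensive pattern this commit describes: check the allocation response for NULL before touching it, and report the original error instead of crashing. The `alloc_response_t` type and the `report_allocation()` function are hypothetical stand-ins, not srun's actual code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified allocation response; not Slurm's real type. */
typedef struct {
	unsigned int job_id;
	const char *node_list;
} alloc_response_t;

/*
 * Describe an allocation result without dereferencing a NULL response.
 * Returns 0 on success, -1 if the allocation failed or was invalid.
 */
static int report_allocation(const alloc_response_t *resp, int err,
			     char *msg, size_t msg_len)
{
	if (resp == NULL) {
		/* Keep the original error instead of crashing on resp->job_id. */
		snprintf(msg, msg_len, "job allocation failure: error %d", err);
		return -1;
	}
	snprintf(msg, msg_len, "job %u allocated nodes %s", resp->job_id,
		 resp->node_list ? resp->node_list : "(null)");
	return 0;
}
```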
08 Oct, 2015 2 commits

- Fix case where, if the backup slurmdbd has existing connections when it gives up control, it would be killed. · 44bb06bc
  Brian Christiansen authored Oct 07, 2015
```
If the backup had existing connections when giving up control, it would try to
signal the existing threads by using pthread_kill to send SIGKILL to the
threads. The problem is that SIGKILL doesn't go to the thread but acts on the
whole process, so the backup dbd would be killed.
```
  44bb06bc
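SIGKILL cannot be caught or blocked, and its action always terminates the entire process, so `pthread_kill(tid, SIGKILL)` ends every thread. One common alternative, sketched here under the assumption that the threads can poll a shared flag, is cooperative shutdown: set the flag, then join each thread. All names are hypothetical, and this is not necessarily how commit 44bb06bc resolved the problem.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag polled by every connection thread. */
static atomic_bool shutdown_requested = false;

/* Stand-in for a connection handler: works until asked to stop. */
static void *connection_thread(void *arg)
{
	atomic_int *served = arg;

	while (!atomic_load(&shutdown_requested)) {
		atomic_fetch_add(served, 1);	/* pretend to handle a request */
		sched_yield();
	}
	return NULL;
}

/*
 * Ask all threads to exit, then wait for each one. Unlike
 * pthread_kill(tid, SIGKILL), this ends only the threads,
 * not the whole process.
 */
static int stop_connection_threads(const pthread_t *tids, size_t n)
{
	atomic_store(&shutdown_requested, true);
	for (size_t i = 0; i < n; i++) {
		if (pthread_join(tids[i], NULL) != 0)
			return -1;
	}
	return 0;
}
```

Threads that are blocked in a system call will not see the flag until they return, so a real daemon would also need some way to wake them (for example a timeout on the blocking call).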

- Fixed slurmctld not correctly sending cold-start messages to the database when it is cold-started (-c). · 4ed2f8c6
  Danny Auble authored Oct 07, 2015
  
  4ed2f8c6

07 Oct, 2015 14 commits
- Merge remote-tracking branch 'origin/slurm-14.11' into slurm-15.08 · 2dcc2732
  Danny Auble authored Oct 07, 2015
```
Conflicts:
	src/sacct/options.c
```
  2dcc2732
- Fix sacct -j, (nothing but a comma) to not return all jobs. · d5979ef6
  Danny Auble authored Oct 07, 2015
  
  d5979ef6
- sacctmgr - Don't allow default account associations to be removed from a user. · 9f602cba
  Danny Auble authored Oct 07, 2015
```
Removing one would cause the slurmctld to cache the old default, which was
no longer valid, and the user would then always have to request the
association explicitly.
```
  9f602cba
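A sketch of the kind of guard this describes: before removing an association, compare its account with the user's default account and refuse the removal on a match. The `assoc_t` record and `assoc_removal_allowed()` helper are hypothetical, not sacctmgr's real data structures.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified association record. */
typedef struct {
	const char *user;
	const char *account;
} assoc_t;

/*
 * Refuse to remove an association that is the user's default account,
 * so the controller never caches a default that no longer exists.
 */
static bool assoc_removal_allowed(const assoc_t *assoc,
				  const char *user_default_acct)
{
	if (user_default_acct &&
	    strcmp(assoc->account, user_default_acct) == 0)
		return false;
	return true;
}
```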
- Merge remote-tracking branch 'origin/slurm-14.11' into slurm-15.08 · f5d6b175
  Danny Auble authored Oct 07, 2015
```
Conflicts:
	NEWS
	src/plugins/accounting_storage/mysql/as_mysql_job.c
```
  f5d6b175
- Document sbatch cpu/mem binding env vars · 8e949f72
  Morris Jette authored Oct 07, 2015
```
bug 2009
```
  8e949f72
- Correct plane distribution test · a254c6a5
  Morris Jette authored Oct 07, 2015
```
A node could be allocated fewer tasks than the plane size, which
broke the test. The plane size needs to be treated as a maximum
consecutive rank value.
```
  a254c6a5
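To illustrate why a node can hold a shorter run of ranks than the plane size, here is a simplified plane distribution: ranks are dealt to nodes round-robin in blocks of at most `plane` ranks, limited by a hypothetical per-node capacity `cap[]`. This is a teaching sketch, not Slurm's task layout code.

```c
/*
 * Assign ranks 0..ntasks-1 to nodes in blocks of at most `plane`
 * consecutive ranks, visiting nodes round-robin. cap[n] is how many
 * tasks node n may hold; a node with less room than `plane` gets a
 * shorter block. Returns 0 on success, -1 if the tasks do not fit.
 */
static int plane_distribute(int ntasks, int nnodes, int plane,
			    const int *cap, int *node_of)
{
	int used[64] = { 0 };	/* sketch limit: at most 64 nodes */
	int rank = 0;

	if (nnodes > 64 || plane < 1)
		return -1;
	while (rank < ntasks) {
		int placed = 0;

		for (int n = 0; n < nnodes && rank < ntasks; n++) {
			for (int k = 0; k < plane && used[n] < cap[n] &&
			     rank < ntasks; k++) {
				node_of[rank++] = n;
				used[n]++;
				placed++;
			}
		}
		if (placed == 0)
			return -1;	/* every node is full */
	}
	return 0;
}
```

With two nodes of capacity 3 and a plane size of 2, six tasks land on nodes 0,0,1,1,0,1: rank 4 forms a block of one on node 0, which a test expecting exact blocks of 2 would reject.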
- Update documentation for who can set job priority · 0d3ecfe3
  Thomas Cadeau authored Oct 07, 2015
  
  0d3ecfe3
- Allow admin and operator to set job priority at submission · 2dab024d
  Morris Jette authored Oct 07, 2015
  
  2dab024d
- Do not send burst buffer stage out email unless the job uses burst buffers · 3a63b4e0
  Morris Jette authored Oct 07, 2015
```
bug 2013
```
  3a63b4e0
- Update NEWS. · 075668ae
  David Bigagli authored Oct 07, 2015
  
  075668ae
- Fix slurmctld allowing root to see job steps using squeue -s. · 1026d698
  Hongjia Cao authored Oct 07, 2015
  
  1026d698
- Update NEWS · 170d17d7
  David Bigagli authored Oct 07, 2015
  
  170d17d7
- Fix srun core dump. · 30a5d677
  Hongjia Cao authored Oct 07, 2015
  
  30a5d677
- Fix issue with sacct printing 0_0 for arrays that had finished in the database but whose start record hadn't made it there yet. · 75ea13a3
  Danny Auble authored Oct 06, 2015
  
  75ea13a3
06 Oct, 2015 13 commits
- Fix for prolog container cgroup · 80dcbf7e
  Morris Jette authored Oct 06, 2015
```
Create a "task" cgroup at job allocation time via the prolog container.
A dummy "sleep" process will occupy the cgroup as long as the job exists.
bug 1994
```
  80dcbf7e
- Cosmetic changes, no logic changes · 619ec0f1
  Morris Jette authored Oct 06, 2015
  
  619ec0f1
- Cosmetic changes, no logic changes · c4451a1f
  Morris Jette authored Oct 06, 2015
  
  c4451a1f
- Make debug print out correctly · ae323e27
  Danny Auble authored Oct 06, 2015
  
  ae323e27
- MySQL - Improve the code for requesting jobs in a suspended state. · f0f3dfdb
  Danny Auble authored Oct 06, 2015
  
  f0f3dfdb
- Fix spec file to look for mariadb or mysql devel packages for build requirements. · 42e22f03
  Danny Auble authored Oct 06, 2015
  
  42e22f03
- Add acct_gather_energy/ibmaem plugin · 8937f58a
  Axel Auweter authored Oct 06, 2015
```
Add acct_gather_energy/ibmaem plugin for systems with IBM Systems Director
    Active Energy Manager.
```
  8937f58a
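Under the hwmon sysfs convention, energy counters such as `energyN_input` are cumulative and reported in microjoules. Assuming the plugin samples such a counter (this log does not say how it reads the data), consumed energy and average power follow from two readings. Both helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Energy consumed between two readings of a cumulative counter in
 * microjoules. Returns whole joules, or 0 if the counter went backwards
 * (for example after a reset).
 */
static uint64_t energy_delta_joules(uint64_t prev_uj, uint64_t curr_uj)
{
	if (curr_uj < prev_uj)
		return 0;
	return (curr_uj - prev_uj) / 1000000u;
}

/* Average power in watts over an interval of `secs` seconds. */
static double average_watts(uint64_t prev_uj, uint64_t curr_uj, double secs)
{
	if (secs <= 0.0 || curr_uj < prev_uj)
		return 0.0;
	return (double)(curr_uj - prev_uj) / 1e6 / secs;
}
```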
- Merge branch 'slurm-14.11' into slurm-15.08 · dd13c747
  Morris Jette authored Oct 06, 2015
```
Conflicts:
	src/slurmctld/job_mgr.c
```
  dd13c747
- Permit job_submit plugin to set a job's priority · 3b5f13fa
  Thomas Cadeau authored Oct 06, 2015
```
bug 2011
```
  3b5f13fa
- Fix for use of uninitialized variable · da84d1d7
  jette authored Oct 05, 2015
```
It would not cause any problem other than excess memory being
allocated, but it was found by Clang.
```
  da84d1d7
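A hypothetical illustration of this class of bug: a size accumulator that must start at zero before it drives `malloc()`. Without the `= 0`, the computed size is indeterminate, which is the kind of defect Clang's static analyzer reports. `join_names()` is invented for illustration and is not the code changed in da84d1d7.

```c
#include <stdlib.h>
#include <string.h>

/* Join strings with ','. Returns a malloc'd string or NULL on failure. */
static char *join_names(const char *const *names, size_t n)
{
	size_t len = 0;	/* must be initialized before accumulating */

	for (size_t i = 0; i < n; i++)
		len += strlen(names[i]) + 1;	/* name plus ',' or final NUL */

	char *out = malloc(len ? len : 1);
	if (!out)
		return NULL;
	out[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if (i)
			strcat(out, ",");
		strcat(out, names[i]);
	}
	return out;
}
```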
- Fix sacct to not return all jobs if the -j option is given with a trailing ','. · 2646e761
  Danny Auble authored Oct 05, 2015
  
  2646e761
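A sketch of tolerant job-list parsing: empty fields from a lone or trailing comma are skipped, and an empty result comes back as a count of 0 so the caller can reject it instead of falling back to listing all jobs. `parse_job_list()` is hypothetical and does not handle job array or step syntax such as `123_4` or `123.0` (it rejects them as malformed).

```c
#include <stdlib.h>

/*
 * Parse a comma-separated job ID list such as "12,34," into ids[].
 * Returns the number of IDs stored (0 if the list held only commas),
 * or -1 on a malformed entry or too many IDs.
 */
static int parse_job_list(const char *arg, unsigned long *ids, int max_ids)
{
	int count = 0;
	const char *p = arg;

	while (*p) {
		if (*p == ',') {	/* empty field: skip it */
			p++;
			continue;
		}
		char *end;
		unsigned long id = strtoul(p, &end, 10);
		if (end == p || (*end != ',' && *end != '\0'))
			return -1;	/* not a plain number */
		if (count == max_ids)
			return -1;	/* too many IDs */
		ids[count++] = id;
		p = end;
	}
	return count;
}
```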
- Merge branch 'slurm-14.11' into slurm-15.08 · 14ba53b2
  Morris Jette authored Oct 05, 2015
```
Conflicts:
	src/common/proc_args.c
```
  14ba53b2
- Propagate sbatch "--dist=plane=#" option to srun. · 6868906b
  Morris Jette authored Oct 05, 2015
```
bug 1999
```
  6868906b
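Propagating the option means the plane size given to sbatch has to be recoverable when srun runs inside the batch script. A sketch of extracting the size from a `plane=N` spec; `parse_plane_size()` is a hypothetical name, and how the value is actually handed to srun is not shown here.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Extract N from a distribution spec of the form "plane=N".
 * Returns N, or -1 if the spec is not a valid plane distribution.
 */
static long parse_plane_size(const char *dist)
{
	static const char prefix[] = "plane=";
	char *end;

	if (strncmp(dist, prefix, sizeof(prefix) - 1) != 0)
		return -1;
	const char *num = dist + sizeof(prefix) - 1;
	long n = strtol(num, &end, 10);
	if (end == num || *end != '\0' || n <= 0)
		return -1;
	return n;
}
```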
05 Oct, 2015 4 commits
- Test changes for default memory limit of unlimited · f50c96e7
  Morris Jette authored Oct 05, 2015
```
A configuration of "DefMemPerNode=UNLIMITED" prevented more than
one job from running at a time on a given node, which broke some
tests. These changes prevent the tests from breaking.
```
  f50c96e7
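One way to see why: if an unlimited default request is charged as the node's entire memory, the first job consumes everything and no second job fits. This accounting model and the `MEM_UNLIMITED` sentinel are simplifying assumptions for illustration, not Slurm's select plugin logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel for "UNLIMITED" in this sketch. */
#define MEM_UNLIMITED UINT64_MAX

/* Memory (MB) charged to a job: "unlimited" takes the whole node. */
static uint64_t charged_mem(uint64_t req_mb, uint64_t node_mb)
{
	return req_mb == MEM_UNLIMITED ? node_mb : req_mb;
}

/* Can a job requesting req_mb start on a node with used_mb charged? */
static bool job_fits(uint64_t req_mb, uint64_t used_mb, uint64_t node_mb)
{
	return used_mb <= node_mb &&
	       charged_mem(req_mb, node_mb) <= node_mb - used_mb;
}
```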
- Fix typo. · bc127394
  david authored Oct 05, 2015
  
  bc127394
- Merge branch 'slurm-14.11' into slurm-15.08 · 721029bb
  jette authored Oct 04, 2015
  
  721029bb
- Include header for clean BGQ/Cray build · 3d601061
  jette authored Oct 04, 2015
  
  3d601061
03 Oct, 2015 2 commits
- Merge branch 'slurm-14.11' into slurm-15.08 · 68d3ae59
  Morris Jette authored Oct 02, 2015
```
Conflicts:
	NEWS
```
  68d3ae59
- Don't requeue RPCs from slurmctld to DOWN nodes · f4ea9dec
  Morris Jette authored Oct 02, 2015
```
Don't requeue RPCs going out from slurmctld to DOWN nodes (doing so can
    generate repeated communication errors).
bug 2002
```
  f4ea9dec
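A sketch of the filtering step: after a round of RPCs, requeue only those whose target node is not DOWN. `rpc_t`, `node_state_t`, and `requeue_failed()` are hypothetical simplifications, not slurmctld's agent code.

```c
#include <stddef.h>

/* Hypothetical node states for this sketch. */
typedef enum { NODE_UP, NODE_DOWN } node_state_t;

/* Hypothetical queued RPC: target node and retry count. */
typedef struct {
	int node_index;
	int retries;
} rpc_t;

/*
 * Compact `failed` in place, keeping only RPCs whose target node is not
 * DOWN. Returns the number of RPCs kept for retry.
 */
static size_t requeue_failed(rpc_t *failed, size_t n,
			     const node_state_t *state)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (state[failed[i].node_index] == NODE_DOWN)
			continue;	/* would just fail again and log again */
		failed[i].retries++;
		failed[kept++] = failed[i];
	}
	return kept;
}
```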
02 Oct, 2015 3 commits

- Update v15.08.2 NEWS with v14.11.10 work · ff24578a
  Morris Jette authored Oct 01, 2015
  
  ff24578a

- Don't mark powered down node as not responding · c0bb562a
  Morris Jette authored Oct 01, 2015
```
This only happens if a PING RPC for the node is already queued
when the decision is made to power it down and the ping then fails
to get a response (since the node is already down).
bug 1995
```
  c0bb562a
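The check amounts to ignoring a failed PING for a node that power saving has already shut off. The `node_t` fields and `ping_failed()` below are hypothetical simplifications of that idea.

```c
#include <stdbool.h>

/* Hypothetical node flags for this sketch. */
typedef struct {
	bool powered_down;	/* power saving has shut the node off */
	bool not_responding;
} node_t;

/*
 * Handle a PING that got no reply. A node powered down after the PING
 * was queued is expected to be silent, so it is not flagged.
 */
static void ping_failed(node_t *node)
{
	if (node->powered_down)
		return;
	node->not_responding = true;
}
```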

- Reset job CPU count if CPUs/task ratio increased for mem limit · 29fe3eae
  Morris Jette authored Sep 30, 2015
```
If a job's CPUs/task ratio is increased due to configured MaxMemPerCPU,
then increase its allocated CPU count in order to enforce CPU limits.
Previous logic would increase/set the cpus_per_task as needed if a
job's --mem-per-cpu was above the configured MaxMemPerCPU, but NOT
increase the min_cpus or max_cpus variables. This resulted in allocating
the wrong CPU count.
```
  29fe3eae
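A worked sketch of the adjustment, using hypothetical field names: when `mem_per_cpu` exceeds MaxMemPerCPU, raise `cpus_per_task` so each CPU stays under the limit, then raise `min_cpus` to `ntasks * cpus_per_task`, which is the step the earlier logic missed. `max_cpus` is left out for brevity, and the rounding choices are assumptions.

```c
#include <stdint.h>

/* Hypothetical, simplified job request for this sketch. */
typedef struct {
	uint32_t ntasks;
	uint32_t cpus_per_task;
	uint64_t mem_per_cpu;	/* MB */
	uint32_t min_cpus;
} job_req_t;

static void apply_max_mem_per_cpu(job_req_t *job, uint64_t max_mem_per_cpu)
{
	if (max_mem_per_cpu == 0 || job->mem_per_cpu <= max_mem_per_cpu)
		return;

	/* Keep memory per task constant while spreading it over more CPUs. */
	uint64_t mem_per_task = job->mem_per_cpu * job->cpus_per_task;
	uint32_t cpus = (uint32_t)((mem_per_task + max_mem_per_cpu - 1) /
				   max_mem_per_cpu);

	job->cpus_per_task = cpus;
	job->mem_per_cpu = (mem_per_task + cpus - 1) / cpus;
	/* The missing step: the CPU count must grow with cpus_per_task. */
	if (job->min_cpus < job->ntasks * cpus)
		job->min_cpus = job->ntasks * cpus;
}
```

For 4 tasks asking for 6000 MB per CPU with MaxMemPerCPU=2000, this gives 3 CPUs per task at 2000 MB each and a minimum of 12 CPUs.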