20 Feb, 2015 (1 commit)
    • Fix to GRES NoConsume logic · 33c48ac5
      Dorian Krause authored
      We came across the following error message in the slurmctld logs when
      using non-consumable resources:
      
      error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count is 0
      
      The error comes from _job_dealloc(). The backtrace, innermost frame
      first (function names reconstructed from context; arguments lost in
      the log are shown as "..."):

      #0  _job_dealloc (..., node_gres_data=0x7f8a18000b70, node_offset=0,
              gres_name=0x1999e00 "potion", job_id=46,
              node_name=0x1987ab0 "node1") at gres.c:3980
      #1  gres_plugin_job_dealloc (job_gres_list=0x199b7c0,
              node_gres_list=0x199bc38, node_offset=0, job_id=46,
              node_name=0x1987ab0 "node1") at gres.c:4190
      #2  _rm_job_from_nodes (..., job_ptr=0x19e9d50,
              pre_err=0x7f8a31353cb0 "_will_run_test", remove_all=true)
              at select_linear.c:2091
      #3  _will_run_test (..., bitmap=0x7f8a18001ad0, min_nodes=1,
              max_nodes=1, max_share=1, req_nodes=1,
              preemptee_candidates=0x0,
              preemptee_job_list=0x7f8a2f910c40) at select_linear.c:3176
      #4  select_p_job_test (..., bitmap=0x7f8a18001ad0, min_nodes=1,
              max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0,
              preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0)
              at select_linear.c:3390
      #5  select_g_job_test (..., bitmap=0x7f8a18001ad0, min_nodes=1,
              max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0,
              preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0)
              at node_select.c:588
      #6  _try_sched (..., avail_bitmap=0x7f8a2f910d38, min_nodes=1,
              max_nodes=1, req_nodes=1, exc_core_bitmap=0x0)
              at backfill.c:367
      
      The cause of this problem is that _node_state_dup() in gres.c does not
      duplicate the no_consume flag. The cr_ptr passed to
      _rm_job_from_nodes() is created with _dup_cr(), which in turn calls
      _node_state_dup().
      
      Below is a simple patch to fix the problem. A more "future-proof"
      alternative might be to memcpy() the whole structure from gres_ptr to
      new_gres and then handle only the pointer members separately.
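
      A minimal sketch of the fix being described, assuming the
      gres_node_state_t layout used by gres.c in this era (the counter
      fields shown are illustrative; only the no_consume line is the
      substance of the change):

          static void *_node_state_dup(void *gres_data)
          {
              gres_node_state_t *gres_ptr = (gres_node_state_t *) gres_data;
              gres_node_state_t *new_gres;

              if (gres_ptr == NULL)
                  return NULL;

              new_gres = xmalloc(sizeof(gres_node_state_t));
              /* Scalar counters were already being copied ... */
              new_gres->gres_cnt_avail = gres_ptr->gres_cnt_avail;
              new_gres->gres_cnt_alloc = gres_ptr->gres_cnt_alloc;
              /* ... but no_consume was not; the duplicated state then
               * treated the GRES as consumable, and _job_dealloc() failed
               * with the "bad node_offset ... count is 0" error above. */
              new_gres->no_consume = gres_ptr->no_consume;
              /* Pointer members (bitmaps, allocation arrays) still need
               * deep copies and are handled separately, as before. */
              return new_gres;
          }

      The memcpy() alternative would copy every scalar member in one step,
      so a newly added flag could never be forgotten here again.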
04 Feb, 2015 (3 commits)
    • Report correct job "shared" field value · 3de14946
      Morris Jette authored
      Previously it was not possible to distinguish a job that explicitly
      requested exclusive nodes from one that simply inherited the default
      job/partition configuration.
    • job array slurmctld abort fix · 0ff342b5
      Morris Jette authored
      Fix job array logic that can cause slurmctld to abort.
      bug 1426
    • Fix for CUDA v7.0+ · da2fba48
      Morris Jette authored
      Enable CUDA v7.0+ use with a Slurm configuration of
      TaskPlugin=task/cgroup and ConstrainDevices=yes (in cgroup.conf).
      With that configuration, CUDA_VISIBLE_DEVICES will start at 0 rather
      than at the global device number.
      bug 1421
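
      For context, the configuration the message refers to would look
      something like this (the node name and GPU count are placeholders):

          # cgroup.conf
          ConstrainDevices=yes

          # slurm.conf (relevant excerpts)
          TaskPlugin=task/cgroup
          GresTypes=gpu
          NodeName=tux01 Gres=gpu:2

      With devices constrained by the cgroup, a job step sees only its
      allocated GPUs, so CUDA_VISIBLE_DEVICES is set relative to the step
      (starting at 0) rather than to the node's global device numbering.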