Commits · fa6de30d03fa00d957dfc4de489dd60783231906 · Manuel G. Marciani / ces_slurm_simulator

24 Feb, 2015 9 commits

Add missing topology/hypercube plugin for SGI · fa6de30d
Michael A. Raymond authored Feb 24, 2015

Merge branch 'slurm-14.11' · 128148c1
Morris Jette authored Feb 24, 2015

cray/basil, read mysql creds from /root/.my.cnf · 5391b8cc

Nina Suvanphim authored Feb 24, 2015

The /root/.my.cnf file would typically contain the login credentials for
root. If those are needed for Slurm, then it should be checking
that file.

(In reply to Nina Suvanphim from comment #0)
...
> const char *default_conf_paths[] = {
> "/root/.my.cnf", <<<<<<<<<<<<<<<<<------- add this line
> "/etc/my.cnf", "/etc/opt/cray/MySQL/my.cnf",
> "/etc/mysql/my.cnf", NULL };

I'll also note that typically the $HOME/.my.cnf file would be
checked last rather than first.
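The search order discussed in this reply can be sketched as follows. This is a minimal illustration, not the plugin's code: it assumes a hypothetical helper `find_mysql_conf()` that walks the system-wide paths quoted above in order, returns the first readable one, and falls back to `$HOME/.my.cnf` only when none of them is readable, which is one way to read "checked last".

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the first readable MySQL option file into
 * buf. System-wide files are tried first; $HOME/.my.cnf is tried last.
 * Returns buf on success, or NULL if no candidate is readable.
 */
static const char *find_mysql_conf(char *buf, size_t len)
{
	static const char *const system_paths[] = {
		"/etc/my.cnf", "/etc/opt/cray/MySQL/my.cnf",
		"/etc/mysql/my.cnf", NULL };
	const char *home = getenv("HOME");
	int n;

	for (int i = 0; system_paths[i]; i++) {
		if (access(system_paths[i], R_OK) == 0) {
			snprintf(buf, len, "%s", system_paths[i]);
			return buf;
		}
	}

	/* Per-user file last: only used when no system file is readable. */
	if (home && *home) {
		n = snprintf(buf, len, "%s/.my.cnf", home);
		if (n > 0 && (size_t) n < len && access(buf, R_OK) == 0)
			return buf;
	}
	return NULL;
}
```

Putting the per-user file last means a site-wide configuration always wins when present, which matches the reviewer's expectation.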

power/cray development · d32dac43
Morris Jette authored Feb 24, 2015
```
Fix some logic related to power distribution across nodes
```
Merge branch 'slurm-14.11' · 5a133bcb
Morris Jette authored Feb 24, 2015

Add some SUG photo links · 3d67a89a
Morris Jette authored Feb 24, 2015

Fix code for Apple computers, where SOL_TCP is not defined · ac0343be
Danny Auble authored Feb 24, 2015
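The commit diff is not reproduced here. A common portable pattern for this situation, shown as a sketch, is to fall back to the protocol number `IPPROTO_TCP`, which is the portable level argument for TCP options (on Linux, `SOL_TCP` has the same value). `set_tcp_nodelay()` is a hypothetical helper used only for illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* macOS does not define SOL_TCP; IPPROTO_TCP is the portable equivalent. */
#ifndef SOL_TCP
#define SOL_TCP IPPROTO_TCP
#endif

/* Disable Nagle's algorithm on a TCP socket; returns 0, or -1 with errno set. */
static int set_tcp_nodelay(int fd)
{
	int on = 1;

	return setsockopt(fd, SOL_TCP, TCP_NODELAY, &on, sizeof(on));
}
```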

Fix wrong variables used in the wrapper functions needed for systems that don't support strong_alias · 8d0c9901
Danny Auble authored Feb 24, 2015
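Where the toolchain cannot create a strong alias, each exported name needs a small wrapper that forwards to the real function, and that forwarding step is where wrong variables can slip in. A minimal sketch, assuming a GCC-style `alias` attribute and a hypothetical `HAVE_ALIAS_ATTRIBUTE` configure macro; `real_count_range()` and `count_range()` are made-up names, not Slurm functions.

```c
/* The real implementation (hypothetical). */
int real_count_range(int first, int last)
{
	return last - first + 1;
}

#ifdef HAVE_ALIAS_ATTRIBUTE
/* With alias support, count_range is just another name for the same symbol. */
extern __typeof__(real_count_range) count_range
	__attribute__((alias("real_count_range")));
#else
/*
 * Without it, the wrapper must forward its own parameters in the same
 * order; calling real_count_range(last, first) here would compile
 * cleanly but return the wrong answer.
 */
int count_range(int first, int last)
{
	return real_count_range(first, last);
}
#endif
```

An alias cannot get the arguments wrong because no new code is generated, which is why it is preferred where available.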

power/cray development · acdec1f5

Morris Jette authored Feb 23, 2015

Update power management web page: Add notes about powering nodes down/up
Prevent underflow in power distribution logic
Add logic to identify nodes in "ready" state. Only ready nodes can have
  their power caps modified
Don't change power cap if node not in ready state
Various improvements to logging
Refactor code to eliminate duplicate/repeated building of full NID list
Plug some memory leaks
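The "ready" state and underflow items above can be illustrated together. A minimal sketch, assuming a hypothetical `struct node_power` with a minimum cap per node and a simple policy (each ready node gets its minimum plus an even share of what remains); the plugin's real policy and data structures are not shown in the commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node record; not the plugin's actual structure. */
struct node_power {
	bool     ready;		/* only ready nodes may have their cap changed */
	uint32_t min_cap;	/* lowest cap the node accepts (watts) */
	uint32_t cap;		/* current cap, updated for ready nodes (watts) */
};

/*
 * Split `budget` watts across ready nodes: each gets its minimum, and
 * any remainder is shared evenly. Returns the watts left unassigned.
 * If the budget is below the sum of minimums, ready nodes just get
 * their minimum; the guarded subtraction keeps the unsigned remainder
 * from wrapping around to a huge value.
 */
static uint32_t distribute_power(struct node_power *nodes, size_t n,
				 uint32_t budget)
{
	uint32_t ready_cnt = 0, min_total = 0, left, share;

	for (size_t i = 0; i < n; i++) {
		if (!nodes[i].ready)
			continue;	/* leave non-ready nodes untouched */
		ready_cnt++;
		min_total += nodes[i].min_cap;
	}
	if (ready_cnt == 0)
		return budget;

	left = (budget > min_total) ? budget - min_total : 0;
	share = left / ready_cnt;
	for (size_t i = 0; i < n; i++) {
		if (nodes[i].ready)
			nodes[i].cap = nodes[i].min_cap + share;
	}
	return left - share * ready_cnt;
}
```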

23 Feb, 2015 1 commit

Fix test for scontrol change · 9cb22140

Morris Jette authored Feb 23, 2015

Modify test 12.7 so that we specify a reason when setting a node DOWN.
A recent change to the Slurm code now requires a reason.

21 Feb, 2015 1 commit
- power/cray: Read initial caps from capmc · 58da1582
  Morris Jette authored Feb 20, 2015
20 Feb, 2015 5 commits

scontrol: Require Reason when setting node DOWN · e7c61bdd
Morris Jette authored Feb 20, 2015
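On the command line this affects requests such as `scontrol update NodeName=node1 State=DOWN Reason="bad disk"`. The check itself can be pictured as the sketch below; `validate_node_update()` is a hypothetical function, not scontrol's source, and the message text is illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <strings.h>

/*
 * Hypothetical client-side check: a request to set a node DOWN must
 * carry a non-empty reason. Returns 0 if the request may be sent,
 * -1 otherwise.
 */
static int validate_node_update(const char *new_state, const char *reason)
{
	int is_down = new_state && strcasecmp(new_state, "DOWN") == 0;

	if (is_down && (!reason || reason[0] == '\0')) {
		fprintf(stderr, "a Reason is required when setting a node DOWN\n");
		return -1;
	}
	return 0;
}
```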

power/cray work · 82de9635

Morris Jette authored Feb 20, 2015

Correct capmc arguments to set power cap.
Convert "capmc get_node_energy_counter" to use a hostlist expression rather
   than listing every node in a comma-separated list.
Log commands and args run by the plugin via the power_run_script()
   function in src/plugins/power/common/power_common.c.
Use hostlist to build a condensed nid list for power cap set/clear functions.
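The condensed nid list idea can be sketched like this: consecutive node IDs collapse into `first-last` ranges, so a long comma-separated list becomes a short expression. This assumes a sorted array of integer nids and plain `start-end` notation; Slurm's own hostlist code also handles name prefixes, zero padding and bracket syntax (for example `nid[00001-00003]`) and is not reproduced here. `compress_nids()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Write sorted node IDs as a condensed range list, e.g. {1,2,3,7} ->
 * "1-3,7". Like snprintf(), returns the length the full string needs
 * (excluding the terminator), so a result >= len means truncation.
 */
static int compress_nids(const int *nids, size_t n, char *buf, size_t len)
{
	size_t off = 0;

	if (len > 0)
		buf[0] = '\0';
	for (size_t i = 0; i < n; ) {
		size_t j = i;
		char *dst = (off < len) ? buf + off : NULL;
		size_t room = (off < len) ? len - off : 0;
		const char *sep = (i > 0) ? "," : "";
		int w;

		while (j + 1 < n && nids[j + 1] == nids[j] + 1)
			j++;	/* extend the run of consecutive IDs */
		if (j > i)
			w = snprintf(dst, room, "%s%d-%d", sep, nids[i], nids[j]);
		else
			w = snprintf(dst, room, "%s%d", sep, nids[i]);
		if (w < 0)
			return -1;
		off += (size_t) w;
		i = j + 1;
	}
	return (int) off;
}
```

For example, `{4, 9, 10}` becomes `4,9-10`.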

Merge branch 'slurm-14.11' · b8fbbf2b
Morris Jette authored Feb 20, 2015

Fix to GRES NoConsume logic · 33c48ac5

Dorian Krause authored Feb 20, 2015

We came across the following error message in the slurmctld logs when
using non-consumable resources:

error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count is 0

The error comes from _job_dealloc():

    node_gres_data=0x7f8a18000b70, node_offset=0, gres_name=0x1999e00 "potion", job_id=46, node_name=0x1987ab0 "node1") at gres.c:3980
    (job_gres_list=0x199b7c0, node_gres_list=0x199bc38, node_offset=0, job_id=46, node_name=0x1987ab0 "node1") at gres.c:4190
    job_ptr=0x19e9d50, pre_err=0x7f8a31353cb0 "_will_run_test", remove_all=true) at select_linear.c:2091
    bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, max_share=1, req_nodes=1, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40) at select_linear.c:3176
    bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at select_linear.c:3390
    bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at node_select.c:588
    avail_bitmap=0x7f8a2f910d38, min_nodes=1, max_nodes=1, req_nodes=1, exc_core_bitmap=0x0) at backfill.c:367

The cause of this problem is that _node_state_dup() in gres.c does not
duplicate the no_consume flag.
The cr_ptr passed to _rm_job_from_nodes() is created with _dup_cr(),
which calls _node_state_dup().

Below is a simple patch to fix the problem. A "future-proof" alternative
might be to memcpy() from gres_ptr to new_gres and
only handle pointers separately.
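The memcpy() alternative suggested above could look roughly like this. The structure is a hypothetical, heavily simplified stand-in for the GRES node state (the real one has more fields), and `node_state_dup()` is an illustrative name. A whole-struct copy carries `no_consume` and any future scalar fields automatically, while pointer members are re-allocated so the duplicate does not share memory with the original.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-in for the GRES per-node state. */
struct gres_node_state {
	uint64_t cnt_avail;
	uint64_t cnt_alloc;
	bool     no_consume;	/* the flag the original dup forgot */
	uint8_t *bit_alloc;	/* heap-owned allocation bitmap */
	size_t   bit_bytes;
};

/*
 * Copy every field at once with memcpy(), then deep-copy only the
 * pointer member so the two structs never share the bitmap.
 */
static struct gres_node_state *node_state_dup(const struct gres_node_state *src)
{
	struct gres_node_state *dst = malloc(sizeof(*dst));

	if (!dst)
		return NULL;
	memcpy(dst, src, sizeof(*dst));
	dst->bit_alloc = NULL;
	if (src->bit_alloc && src->bit_bytes) {
		dst->bit_alloc = malloc(src->bit_bytes);
		if (!dst->bit_alloc) {
			free(dst);
			return NULL;
		}
		memcpy(dst->bit_alloc, src->bit_alloc, src->bit_bytes);
	}
	return dst;
}
```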

power/cray - compute energy consumption via capmc · d500de54
Morris Jette authored Feb 19, 2015
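Since the plugin reads a cumulative energy counter from capmc (see "capmc get_node_energy_counter" above), consumption over an interval is the difference of two readings. A sketch under that assumption; `energy_used()` is hypothetical, and treating a decreasing counter as a reset is a design choice made here, not something the commit states.

```c
#include <stdint.h>

/*
 * Given two readings of a node's cumulative energy counter (joules)
 * taken dt_sec seconds apart, return the energy used in between and,
 * if avg_watts is non-NULL, the average power in watts. A counter that
 * went backwards (e.g. after a reboot) yields 0 instead of wrapping.
 */
static uint64_t energy_used(uint64_t prev_j, uint64_t curr_j,
			    uint32_t dt_sec, uint32_t *avg_watts)
{
	uint64_t used = (curr_j >= prev_j) ? curr_j - prev_j : 0;

	if (avg_watts)
		*avg_watts = dt_sec ? (uint32_t) (used / dt_sec) : 0;
	return used;
}
```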

19 Feb, 2015 4 commits
- Load lua-5.2 library if using lua5.2 for lua job submit plugin. · 408c108e
  Brian Christiansen authored Feb 19, 2015
```
Bug 1471
```
- Merge branch 'slurm-14.11' · 6b5ae328
  Morris Jette authored Feb 19, 2015
- Remove vestigial/wrong documentation · 2e7bed24
  Morris Jette authored Feb 19, 2015
```
"If  you  specify a maximum node count and the host list contains more
nodes, the extra node names will be silently ignored."
Not so.
```
- MySQL - Fix potential issue when PrivateData=Usage and a normal user runs certain sreport reports. · 9a03f2a5
  Danny Auble authored Feb 18, 2015
18 Feb, 2015 11 commits
- Expand explanation in a comment · d0dbd36d
  Morris Jette authored Feb 18, 2015
- uninitialized job credential fields · 72082f9d
  Morris Jette authored Feb 18, 2015
```
For srun command with the --no-alloc option, the dummy credential
created did not have two new fields (job_constraints and job_gres_list)
initialized, resulting in invalid memory references. Bug introduced
earlier today.
```
- Added "--mail-type=stage_out" option · b3c8ed49
  Morris Jette authored Feb 18, 2015
```
Added "--mail-type=stage_out" option to job submission commands to notify user
when burst buffer stage out is complete.
```
- add new job_submit/lua fields · d365d37a
  Morris Jette authored Feb 18, 2015
```
Add new job_descriptor fields to the job_submit/lua interface:
clusters, power_flags, and sicp_mode
```
- Add SLURM_JOB_CONSTRAINTS to Prolog env vars · 06db2ded
  Morris Jette authored Feb 18, 2015
```
Add SLURM_JOB_CONSTRAINTS to environment variables available to the Prolog.
bug 1458
```
- Add GPU info to prolog run on job allocation · 6966f77e
  Morris Jette authored Feb 18, 2015
```
Add job credential to "Run Prolog" RPC used with a configuration of
PrologFlags=alloc. This allows the Prolog to be passed identification of
GPUs allocated to the job.
```
- Add acct_gather_energy_cray.so file to the RPM · 9d5e89f5
  Morris Jette authored Feb 18, 2015
- Merge branch 'slurm-14.11' · 3a6caf8b
  Morris Jette authored Feb 17, 2015
- Add SLURM_JOB_GPUS to Prolog · 2e95c20b
  Morris Jette authored Feb 17, 2015
```
Add SLURM_JOB_GPUS environment variable to those available in Prolog.
Also add list of environment variables available in the various
prologs and epilogs on the web page.
bug 1458
```
- Merge remote-tracking branch 'origin/slurm-14.11' · 6e64725e
  Danny Auble authored Feb 17, 2015
- Print FAIR_TREE in "scontrol show config" output for PriorityFlags. · 27eef95d
  Brian Christiansen authored Feb 17, 2015
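A Prolog can be any executable, so it may consume the new SLURM_JOB_GPUS variable directly. A minimal sketch in C, assuming `SLURM_JOB_GPUS` holds a comma-separated list of GPU indices such as `0,1` (the commits above do not spell out the format); `count_job_gpus()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/*
 * Count the GPUs listed in SLURM_JOB_GPUS. Returns 0 if the variable
 * is unset or empty, -1 on allocation failure.
 */
static int count_job_gpus(void)
{
	const char *env = getenv("SLURM_JOB_GPUS");
	char *copy, *tok, *save = NULL;
	int count = 0;

	if (!env || !*env)
		return 0;
	copy = strdup(env);	/* strtok_r() modifies its input */
	if (!copy)
		return -1;
	for (tok = strtok_r(copy, ",", &save); tok;
	     tok = strtok_r(NULL, ",", &save))
		count++;
	free(copy);
	return count;
}
```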
17 Feb, 2015 9 commits
- BGQ - Close very small window where a step could have been removed before the runjob happened, and the step was part of an array · c169c935
  Danny Auble authored Feb 17, 2015
```
This is an addition to commit 49e0f5f2.
```
- BGQ - Fix issue with job arrays not being handled correctly in the runjob_mux plugin · 49e0f5f2
  Danny Auble authored Feb 17, 2015
- Update NEWS · 6984348d
  Brian Christiansen authored Feb 17, 2015
```
Bug 1461
Commit: 2e2d924e
```
- power/cray - Add descriptive comment · 793d63f1
  Morris Jette authored Feb 17, 2015
- Clean up possible vestigial test file · 45ab4f72
  Morris Jette authored Feb 17, 2015
- power/cray: Remove debugging/test logic · b22bcdf4
  Morris Jette authored Feb 17, 2015
- Update burst buffer web page · 735a513e
  Morris Jette authored Feb 17, 2015
- power/cray complete JSON parser for get_power_cap_capabilities · 59d470ed
  Morris Jette authored Feb 17, 2015
```
This completes work on the JSON parser for the capmc
get_power_cap_capabilities command, which reports specifics
about each node's power caps and ranges.
```
- Add find_node_record2() function, no error if not found · c5d3153f
  Morris Jette authored Feb 17, 2015
```
Add a find_node_record2() function that differs from
find_node_record() in that it does not print an error if
a node name is not found. This is important for use with
the Cray capmc command, which generates information about
not only compute nodes, but also login and service nodes.
Using this function prevents a bunch of errors from the
power/cray plugin when trying to find these non-compute
nodes in the slurm configuration.
```
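The relationship between find_node_record() and find_node_record2() can be sketched with one shared search routine and two thin wrappers. Everything here is hypothetical: a small static table replaces Slurm's node records, `find_node()` and `find_node_quiet()` stand in for the two real functions, and the error text is illustrative.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node table standing in for Slurm's node records. */
struct node_rec {
	const char *name;
};

static struct node_rec node_table[] = {
	{ "nid00001" }, { "nid00002" }, { "nid00003" },
};
static const size_t node_cnt = sizeof(node_table) / sizeof(node_table[0]);

/* Shared lookup; optionally logs when the name is unknown. */
static struct node_rec *lookup_node(const char *name, int log_missing)
{
	for (size_t i = 0; i < node_cnt; i++) {
		if (strcmp(node_table[i].name, name) == 0)
			return &node_table[i];
	}
	if (log_missing)
		fprintf(stderr, "error: node %s not found\n", name);
	return NULL;
}

/* Like find_node_record(): an unknown name is reported. */
struct node_rec *find_node(const char *name)
{
	return lookup_node(name, 1);
}

/* Like find_node_record2(): silent on a miss, e.g. for login/service nodes. */
struct node_rec *find_node_quiet(const char *name)
{
	return lookup_node(name, 0);
}
```

Keeping a single search routine means the two variants cannot drift apart; only the logging policy differs.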