Commits · a38f9544fc48fa5c36497cd5597ca7a9fe86d346 · Manuel G. Marciani / ces_slurm_simulator

12 Mar, 2015 1 commit

Added LaunchParameters configuration parameter · a38f9544

Morris Jette authored Mar 11, 2015

Added LaunchParameters configuration parameter. Have the srun command check
locally for the executable file if LaunchParameters=test_exec is configured or
the environment variable SLURM_TEST_EXEC is set. Without this check, an invalid
command generates one error message per task launched.

a38f9544
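
A minimal sketch of such a local check, in C. It assumes LaunchParameters is a comma-separated option list and that the command is resolved against PATH the way a shell would. test_exec_requested(), is_exec_file() and find_executable() are hypothetical names, not srun's actual functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical: true if SLURM_TEST_EXEC is set or the comma-separated
 * LaunchParameters value contains the exact token "test_exec". */
static bool test_exec_requested(const char *launch_params)
{
	const char *tok = "test_exec";
	size_t tok_len = strlen(tok);

	if (getenv("SLURM_TEST_EXEC"))
		return true;
	for (const char *p = launch_params; p && *p; ) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);

		if (len == tok_len && strncmp(p, tok, tok_len) == 0)
			return true;
		p = comma ? comma + 1 : NULL;
	}
	return false;
}

/* True if path names a regular file that we are allowed to execute. */
static bool is_exec_file(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && S_ISREG(st.st_mode) &&
	       access(path, X_OK) == 0;
}

/* Hypothetical: resolve cmd as a shell would. Names containing '/' are
 * checked directly; otherwise each PATH entry is tried in order (an empty
 * entry means the current directory). On success, buf holds the path. */
static bool find_executable(const char *cmd, char *buf, size_t buflen)
{
	const char *path = getenv("PATH");

	if (strchr(cmd, '/')) {
		snprintf(buf, buflen, "%s", cmd);
		return is_exec_file(buf);
	}
	for (const char *p = path; p; ) {
		const char *colon = strchr(p, ':');
		int dir_len = colon ? (int)(colon - p) : (int)strlen(p);

		if (dir_len == 0)
			snprintf(buf, buflen, "./%s", cmd);
		else
			snprintf(buf, buflen, "%.*s/%s", dir_len, p, cmd);
		if (is_exec_file(buf))
			return true;
		p = colon ? colon + 1 : NULL;
	}
	return false;
}
```

A caller could run find_executable() once on the command before launching any tasks and report a single error when it returns false, instead of letting every task fail separately.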

11 Mar, 2015 1 commit

Fix job requeue from completing state · 717c9ec5

Morris Jette authored Mar 11, 2015

Partially revert commit 8d91ae22
The bug was introduced in version 14.11.0-pre4.
bug 1504

717c9ec5

10 Mar, 2015 2 commits
- BGQ - Sanity check given for translating small blocks into slurm bg_records · 04c94415
  Danny Auble authored Mar 10, 2015
```
This is for bug 1514
```
  04c94415
- Fix reports not using the month usage table. · e6b9d2b3
  Brian Christiansen authored Mar 10, 2015
  
  e6b9d2b3
09 Mar, 2015 3 commits
- Make taskplugin=cgroup work for core spec. Needed to have task/cgroup before. · ab244dcc
  Danny Auble authored Mar 06, 2015
  
  ab244dcc
- Update srun man page. · 7cd67bca
  David Bigagli authored Mar 09, 2015
  
  7cd67bca
- Changed the implementation of xcpuinfo_abs_to_mac(). · 3bd4b02c
  David Bigagli authored Mar 09, 2015
  
  3bd4b02c
06 Mar, 2015 1 commit
- Fix squeue -L <licenses> not filtering out jobs with licenses. · 02ecc040
  Brian Christiansen authored Mar 06, 2015
```
Bug 1507
```
  02ecc040
05 Mar, 2015 2 commits
- Make it so sched_params isn't read over and over when an epilog complete message comes in · d73bb06a
  Danny Auble authored Mar 05, 2015
  
  d73bb06a
- Introduce nohold_on_prolog_fail. · 2c83bf4e
  David Bigagli authored Mar 05, 2015
  
  2c83bf4e
04 Mar, 2015 2 commits
- Add sockets and cores to TaskPluginParams' autobind option. · 955ce447
  Brian Christiansen authored Mar 04, 2015
```
Bug 1501
```
  955ce447
- Add TaskPluginParam=autobind=threads option. · ea51f870
  Brian Christiansen authored Mar 04, 2015
```
Bug 1501
```
  ea51f870
03 Mar, 2015 5 commits
- MySQL - When requesting cluster resources, only return resources for the cluster(s) requested. · 004746d0
  Danny Auble authored Mar 03, 2015
  
  004746d0
- Set the value of total_cpus not to be zero. · 988889d4
  David Bigagli authored Mar 03, 2015
  
  988889d4
- Fix associations not getting default qos set until after a restart. · 06ea19c4
  Brian Christiansen authored Mar 02, 2015
```
Bug 1492
```
  06ea19c4
- Abort I/O for debugged app launch fail · 49770e20
  Morris Jette authored Mar 02, 2015
```
For a job running under a debugger, if the exec of the task fails, then
cancel its I/O and abort immediately rather than waiting 60 seconds for
the I/O timeout.
```
  49770e20
- Remove srun --max-launch-time option · b375924e
  Morris Jette authored Mar 02, 2015
```
The option has not been functional or documented since Slurm version 2.0.
```
  b375924e
02 Mar, 2015 2 commits
- Change the level of debug messages. · 971d0021
  David Bigagli authored Mar 02, 2015
  
  971d0021
- Correct the initialization of QOS MinCPUs per job limit. · 862cc80b
  David Bigagli authored Mar 02, 2015
  
  862cc80b
27 Feb, 2015 5 commits

Change default job cred lifetime from 20 to 2 min · 68598c64

Morris Jette authored Feb 27, 2015

This controls how long a requeued job must wait before it can
restart, and 20 minutes is too long in most cases. Administrators
can alter this configuration parameter if needed, for example when
the Prolog is slow.

68598c64

Add AuthInfo option of "cred_expire=#" · 98d6a589
Morris Jette authored Feb 27, 2015
```
Use this to specify the lifetime of a job step credential.
```
98d6a589
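
One way to read the "cred_expire=#" option, sketched in C under stated assumptions: AuthInfo is treated as a comma-separated list of key=value options, and a missing or invalid value falls back to 120 seconds, the 2-minute default from commit 68598c64. auth_info_cred_expire() is a hypothetical name, not Slurm's parser.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* 2 minutes, the default lifetime introduced by commit 68598c64. */
#define DEFAULT_CRED_EXPIRE 120

/* Hypothetical: return the "cred_expire=#" value (in seconds) from a
 * comma-separated AuthInfo string, or DEFAULT_CRED_EXPIRE if the option is
 * absent or is not a positive integer. */
static int auth_info_cred_expire(const char *auth_info)
{
	const char *key = "cred_expire=";
	size_t key_len = strlen(key);

	for (const char *p = auth_info; p && *p; ) {
		if (strncmp(p, key, key_len) == 0) {
			const char *num = p + key_len;
			char *end;
			long val = strtol(num, &end, 10);

			if (end != num && (*end == ',' || *end == '\0') &&
			    val > 0 && val <= INT_MAX)
				return (int)val;
			return DEFAULT_CRED_EXPIRE;
		}
		p = strchr(p, ',');	/* advance to the next option */
		if (p)
			p++;
	}
	return DEFAULT_CRED_EXPIRE;
}
```
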
Fix job getting EligibleTime set before meeting dependency requirements. · ab773f65
Brian Christiansen authored Feb 27, 2015
```
Bug 1476
```
ab773f65

Ensure prolog runs on job requeue · 42dc54ea

Morris Jette authored Feb 27, 2015

Set the delay time for job requeue to the job credential lifetime (1200
seconds by default). This ensures that the prolog runs on every node when a
job is requeued. (This change will slow down the launch of requeued jobs.)
Without this change, if a job is restarted within 1200 seconds, the nodes
previously used would not run the prolog again, since the job ID is
still seen as active (from the previous execution). It is also advisable
to set the value of DEFAULT_EXPIRATION_WINDOW in src/common/slurm_cred.c
to the lowest reasonable value. We need to add a new configuration parameter
so this is easily changed in the future.

42dc54ea
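
The reasoning above can be modelled in a few lines of C. This is a hypothetical model rather than Slurm's credential code: each node keeps a record of job IDs it considers active, and each record is assumed to expire cred_lifetime seconds after the job's previous run ended. Delaying the requeued job by the same lifetime guarantees that the record has expired when the job starts again, so the Prolog runs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical record a node keeps for a job credential it has seen. */
struct cred_record {
	uint32_t job_id;
	time_t expire;	/* assumed: end of previous run + cred lifetime */
};

/* The node runs the Prolog only if it does not still consider the job ID
 * active from a previous execution. */
static bool prolog_needed(const struct cred_record *recs, size_t n,
			  uint32_t job_id, time_t now)
{
	for (size_t i = 0; i < n; i++) {
		if (recs[i].job_id == job_id && now < recs[i].expire)
			return false;
	}
	return true;
}

/* Earliest start for a job requeued at requeue_time (no earlier than the
 * end of its previous run): by then every old record has expired. */
static time_t requeue_begin_time(time_t requeue_time, int cred_lifetime)
{
	return requeue_time + (time_t)cred_lifetime;
}
```
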

Display job's estimated NodeCount based on partition's configured resources rather than the whole system's · ce32018a
Brian Christiansen authored Feb 27, 2015
```
Bug 1478
```
ce32018a

26 Feb, 2015 2 commits
- Account all CPUs to the batch steps. · cc8c2e3e
  David Bigagli authored Feb 26, 2015
  
  cc8c2e3e
- task/affinity - fix memory binding for cpusets · 701e5b33
  Morris Jette authored Feb 25, 2015
```
Previously, tasks were not bound to the appropriate NUMA node.
Based upon work by Josko Plazonic <plazonic@princeton.edu>.
```
  701e5b33
25 Feb, 2015 1 commit

Apply email notifications to entire job arrays · 9fa4909d

Morris Jette authored Feb 25, 2015

Mail notifications on job BEGIN, END and FAIL now apply to a job array as a
whole rather than generating individual email messages for each task in the
job array.

9fa4909d
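
A hypothetical sketch of array-level notification in C. The struct and function names are illustrative, and the rule that the final message is FAIL if any task failed (otherwise END) is an assumption, not something this commit states.

```c
#include <stdbool.h>

enum mail_event { MAIL_NONE, MAIL_BEGIN, MAIL_END, MAIL_FAIL };

/* Hypothetical per-array bookkeeping. */
struct array_mail_state {
	int total_tasks;
	int started;
	int finished;
	bool any_failed;
};

/* A task started: only the first start in the array triggers BEGIN mail. */
static enum mail_event array_task_started(struct array_mail_state *a)
{
	return (a->started++ == 0) ? MAIL_BEGIN : MAIL_NONE;
}

/* A task ended: mail is sent only when the last task of the array ends,
 * as FAIL if any task failed (assumption) and as END otherwise. */
static enum mail_event array_task_finished(struct array_mail_state *a,
					   bool failed)
{
	if (failed)
		a->any_failed = true;
	if (++a->finished < a->total_tasks)
		return MAIL_NONE;
	return a->any_failed ? MAIL_FAIL : MAIL_END;
}
```

A three-task array therefore produces one BEGIN message and one final message instead of six.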

24 Feb, 2015 4 commits

Fix sprio showing wrong priority for job arrays until priority is recalculated. · 423029d8
Brian Christiansen authored Feb 24, 2015
```
Bug 1469
```
423029d8

cray/basil, read mysql creds from /root/.my.cnf · 5391b8cc

Nina Suvanphim authored Feb 24, 2015

The /root/.my.cnf file would typically contain the login credentials for
root. If those are needed for Slurm, then it should be checking
that file.

(In reply to Nina Suvanphim from comment #0)
...
> const char *default_conf_paths[] = {
> "/root/.my.cnf",	/* <-- add this line */
> "/etc/my.cnf", "/etc/opt/cray/MySQL/my.cnf",
> "/etc/mysql/my.cnf", NULL };

I'll also note that typically the $HOME/.my.cnf file would be
checked last rather than first.

5391b8cc
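
A minimal sketch of the lookup the proposed array implies, assuming the first existing regular file in the list wins; find_conf_file() is a hypothetical helper, not Slurm's code. Under that first-match rule, moving a $HOME/.my.cnf entry to the end of the list would give it the lowest priority.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Search order from the proposed patch; the array is NULL-terminated. */
static const char *const default_conf_paths[] = {
	"/root/.my.cnf",
	"/etc/my.cnf",
	"/etc/opt/cray/MySQL/my.cnf",
	"/etc/mysql/my.cnf",
	NULL
};

/* Hypothetical: return the first path in the NULL-terminated list that
 * names an existing regular file, or NULL if none does. */
static const char *find_conf_file(const char *const *paths)
{
	struct stat st;

	for (size_t i = 0; paths[i]; i++) {
		if (stat(paths[i], &st) == 0 && S_ISREG(st.st_mode))
			return paths[i];
	}
	return NULL;
}
```
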

Fix code for Apple computers, where SOL_TCP is not defined · ac0343be
Danny Auble authored Feb 24, 2015

ac0343be
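
The commit's diff is not shown here. A common portable idiom for this problem, sketched in C, is to fall back to IPPROTO_TCP, which POSIX specifies as the option level for TCP socket options. set_tcp_nodelay() is a hypothetical example caller.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Linux defines SOL_TCP; macOS does not. Both equal the TCP protocol
 * number, which IPPROTO_TCP provides on every POSIX system. */
#ifndef SOL_TCP
#define SOL_TCP IPPROTO_TCP
#endif

/* Hypothetical: disable Nagle's algorithm on a TCP socket.
 * Returns 0 on success, -1 (with errno set) on failure. */
static int set_tcp_nodelay(int fd)
{
	int on = 1;

	return setsockopt(fd, SOL_TCP, TCP_NODELAY, &on, sizeof(on));
}
```
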
Fix wrong variables used in the wrapper functions needed for systems that don't support strong_alias · 8d0c9901
Danny Auble authored Feb 24, 2015
8d0c9901

20 Feb, 2015 2 commits

scontrol: Require Reason when setting node DOWN · e7c61bdd
Morris Jette authored Feb 20, 2015

e7c61bdd

Fix to GRES NoConsume logic · 33c48ac5

Dorian Krause authored Feb 20, 2015

We came across the following error message in the slurmctld logs when
using non-consumable resources:

error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count is 0

The error comes from _job_dealloc():

node_gres_data=0x7f8a18000b70, node_offset=0, gres_name=0x1999e00 "potion", job_id=46, node_name=0x1987ab0 "node1") at gres.c:3980
(job_gres_list=0x199b7c0, node_gres_list=0x199bc38, node_offset=0, job_id=46, node_name=0x1987ab0 "node1") at gres.c:4190
job_ptr=0x19e9d50, pre_err=0x7f8a31353cb0 "_will_run_test", remove_all=true) at select_linear.c:2091
bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, max_share=1, req_nodes=1, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40) at select_linear.c:3176
bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at select_linear.c:3390
bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at node_select.c:588
avail_bitmap=0x7f8a2f910d38, min_nodes=1, max_nodes=1, req_nodes=1, exc_core_bitmap=0x0) at backfill.c:367

The cause of this problem is that _node_state_dup() in gres.c does not
duplicate the no_consume flag. The cr_ptr passed to _rm_job_from_nodes()
is created with _dup_cr(), which calls _node_state_dup().

This commit applies a simple patch to fix the problem. A "future-proof"
alternative might be to memcpy() from gres_ptr to new_gres and handle
only the pointers separately.

33c48ac5
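
The "future-proof" alternative described above can be sketched as follows. The struct is a simplified, hypothetical stand-in, not Slurm's actual GRES node state type; the point is that a single memcpy() carries over every scalar field, including a flag such as no_consume, and only pointer members need their own copies.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-in for a node's GRES state. */
struct gres_node_state {
	uint64_t gres_cnt_avail;
	uint64_t gres_cnt_alloc;
	bool no_consume;
	char *topo_name;	/* example of a pointer member */
};

/* Copy all scalar fields at once, then deep-copy each pointer member so
 * the duplicate owns its own heap data. Returns NULL if allocation fails. */
static struct gres_node_state *node_state_dup(const struct gres_node_state *src)
{
	struct gres_node_state *dst = malloc(sizeof(*dst));

	if (!dst)
		return NULL;
	memcpy(dst, src, sizeof(*dst));	/* no_consume is copied here */

	dst->topo_name = NULL;
	if (src->topo_name) {
		size_t len = strlen(src->topo_name) + 1;

		dst->topo_name = malloc(len);
		if (!dst->topo_name) {
			free(dst);
			return NULL;
		}
		memcpy(dst->topo_name, src->topo_name, len);
	}
	return dst;
}
```
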

19 Feb, 2015 2 commits
- Load lua-5.2 library if using lua5.2 for lua job submit plugin. · 408c108e
  Brian Christiansen authored Feb 19, 2015
```
Bug 1471
```
  408c108e
- MySQL - Fix potential issue when PrivateData=Usage and a normal user runs certain sreport reports. · 9a03f2a5
  Danny Auble authored Feb 18, 2015
  
  9a03f2a5
18 Feb, 2015 5 commits

Added "--mail-type=stage_out" option · b3c8ed49

Morris Jette authored Feb 18, 2015

Added "--mail-type=stage_out" option to job submission commands to notify the
user when burst buffer stage-out is complete.

b3c8ed49

Add SLURM_JOB_CONSTRAINTS to Prolog env vars · 06db2ded
Morris Jette authored Feb 18, 2015
```
Add SLURM_JOB_CONSTRAINTS to environment variables available to the Prolog.
bug 1458
```
06db2ded

Add GPU info to prolog run on job allocation · 6966f77e

Morris Jette authored Feb 18, 2015

Add job credential to "Run Prolog" RPC used with a configuration of
PrologFlags=alloc. This allows the Prolog to be passed identification of
GPUs allocated to the job.

6966f77e

Add SLURM_JOB_GPUS to Prolog · 2e95c20b

Morris Jette authored Feb 17, 2015

Add SLURM_JOB_GPUS environment variable to those available in Prolog.
Also add a list of the environment variables available in the various
prologs and epilogs to the web page.
bug 1458

2e95c20b
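
A Prolog written in C could read the new variable as sketched below. The sketch assumes SLURM_JOB_GPUS holds a comma-separated list of GPU indices such as "0,1"; parse_gpu_list() is a hypothetical helper.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical: parse a comma-separated list of GPU indices (the assumed
 * SLURM_JOB_GPUS format) into ids[]. Returns the number of indices parsed,
 * 0 for a NULL or empty string, or -1 on malformed input or overflow. */
static int parse_gpu_list(const char *s, int *ids, int max_ids)
{
	int n = 0;

	if (!s || !*s)
		return 0;
	for (;;) {
		char *end;
		long v;

		errno = 0;
		v = strtol(s, &end, 10);
		if (end == s || errno || v < 0 || v > INT_MAX || n >= max_ids)
			return -1;
		ids[n++] = (int)v;
		if (*end == '\0')
			return n;
		if (*end != ',')
			return -1;
		s = end + 1;
	}
}
```

Typical use would be `int ids[16]; int n = parse_gpu_list(getenv("SLURM_JOB_GPUS"), ids, 16);`, after which the Prolog can act on each allocated GPU index.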

Print FAIR_TREE in "scontrol show config" output for PriorityFlags. · 27eef95d
Brian Christiansen authored Feb 17, 2015

27eef95d