- 12 Mar, 2015 1 commit
-
-
Morris Jette authored
Added LaunchParameters configuration parameter. Have the srun command test locally for the executable file if LaunchParameters=test_exec or the environment variable SLURM_TEST_EXEC is set. Without this, an invalid command will generate one error message per task launched.
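A minimal sketch of the kind of local pre-launch check this commit describes, not srun's actual code. The function names `want_exec_test` and `test_exec_locally` are hypothetical, the substring match on the LaunchParameters value is a simplification, and PATH lookup is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: true if the local test is requested, either because
 * the LaunchParameters value contains "test_exec" or because the
 * SLURM_TEST_EXEC environment variable is set (to any value). */
bool want_exec_test(const char *launch_params)
{
    if (getenv("SLURM_TEST_EXEC") != NULL)
        return true;
    return launch_params != NULL && strstr(launch_params, "test_exec") != NULL;
}

/* Hypothetical helper: returns 0 if path names a regular file that the
 * caller may execute, otherwise -1. One failure here replaces the
 * per-task errors that a remote launch of a bad command would produce. */
int test_exec_locally(const char *path)
{
    struct stat st;

    if (path == NULL || stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    return access(path, X_OK) == 0 ? 0 : -1;
}
```

A caller would run `test_exec_locally()` only when `want_exec_test()` returns true, and report a single error before any task is launched.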
-
- 11 Mar, 2015 1 commit
-
-
Morris Jette authored
Partially revert commit 8d91ae22. The bug was introduced in version 14.11.0-pre4. Bug 1504
-
- 10 Mar, 2015 2 commits
-
-
Danny Auble authored
This is for bug 1514
-
Brian Christiansen authored
-
- 09 Mar, 2015 3 commits
-
-
Danny Auble authored
before.
-
David Bigagli authored
-
David Bigagli authored
-
- 06 Mar, 2015 1 commit
-
-
Brian Christiansen authored
Bug 1507
-
- 05 Mar, 2015 2 commits
-
-
Danny Auble authored
message comes in
-
David Bigagli authored
-
- 04 Mar, 2015 2 commits
-
-
Brian Christiansen authored
Bug 1501
-
Brian Christiansen authored
Bug 1501
-
- 03 Mar, 2015 5 commits
-
-
Danny Auble authored
cluster(s) requested.
-
David Bigagli authored
-
Brian Christiansen authored
Bug 1492
-
Morris Jette authored
For a job running under a debugger, if the exec of the task fails, then cancel its I/O and abort immediately rather than waiting 60 seconds for the I/O timeout.
-
Morris Jette authored
The option has not been functional or documented since Slurm version 2.0.
-
- 02 Mar, 2015 2 commits
-
-
David Bigagli authored
-
David Bigagli authored
-
- 27 Feb, 2015 5 commits
-
-
Morris Jette authored
This controls how long a requeued job must wait before it can restart, and 20 minutes is too long in most cases. Administrators can alter this configuration parameter if needed in case of slow Prolog or the like.
-
Morris Jette authored
Use this to specify the lifetime of a job step credential.
-
Brian Christiansen authored
Bug 1476
-
Morris Jette authored
Set the delay time for job requeue to the job credential lifetime (1200 seconds by default). This ensures that the prolog runs on every node when a job is requeued. (This change will slow down the launch of requeued jobs.) Without this change, if a job is restarted within 1200 seconds, the nodes previously used would not run the prolog again, since the job ID is still seen as active (from the previous execution). It is also advisable to set the value of DEFAULT_EXPIRATION_WINDOW in src/common/slurm_cred.c to the lowest value reasonable. We need to add a new configuration parameter so this is easily changed in the future.
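The timing rule in this commit message can be sketched as follows. This is an illustration only: the function names and the `CRED_LIFETIME_SECS` macro are hypothetical, and only the 1200-second default comes from the message itself.

```c
#include <stdbool.h>
#include <time.h>

/* Default credential lifetime in seconds, as stated in the commit message. */
#define CRED_LIFETIME_SECS 1200

/* Hypothetical helper: earliest time a requeued job may start again,
 * so that the previous execution's credential has expired on every node. */
time_t requeue_begin_time(time_t requeue_time, time_t cred_lifetime)
{
    return requeue_time + cred_lifetime;
}

/* Hypothetical helper: true once the requeue delay has fully elapsed. */
bool requeued_job_may_start(time_t now, time_t requeue_time,
                            time_t cred_lifetime)
{
    return now >= requeue_begin_time(requeue_time, cred_lifetime);
}
```

Making the lifetime a parameter rather than a compile-time constant mirrors the message's closing remark that the value should become configurable.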
-
Brian Christiansen authored
Display job's estimated NodeCount based on the partition's configured resources rather than the whole system's. Bug 1478
-
- 26 Feb, 2015 2 commits
-
-
David Bigagli authored
-
Morris Jette authored
Previously, there was no binding of tasks to the appropriate NUMA. Based upon work by Josko Plazonic <plazonic@princeton.edu>.
-
- 25 Feb, 2015 1 commit
-
-
Morris Jette authored
Mail notifications on job BEGIN, END and FAIL now apply to a job array as a whole rather than generating individual email messages for each task in the job array.
-
- 24 Feb, 2015 4 commits
-
-
Brian Christiansen authored
Bug 1469
-
Nina Suvanphim authored
The /root/.my.cnf would typically contain the login credentials for root. If those are needed for Slurm, then it should be checking that directory.

(In reply to Nina Suvanphim from comment #0)
...
> const char *default_conf_paths[] = {
>     "/root/.my.cnf", <<<<<<<<<<<<<<<<<------- add this line
>     "/etc/my.cnf", "/etc/opt/cray/MySQL/my.cnf",
>     "/etc/mysql/my.cnf", NULL };

I'll also note that typically the $HOME/.my.cnf file would be checked last rather than first.
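How the order of such a NULL-terminated path list matters can be sketched as below. The helper name `first_readable` is hypothetical and this is not the Slurm lookup code. It only shows that the first readable entry wins, so putting the per-user file first or last changes which configuration is used.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: walk a NULL-terminated list of candidate paths in
 * order and return the first one the caller can read, or NULL if none. */
const char *first_readable(const char *paths[])
{
    for (size_t i = 0; paths[i] != NULL; i++) {
        if (access(paths[i], R_OK) == 0)
            return paths[i];
    }
    return NULL;
}
```

With the array quoted above, `first_readable(default_conf_paths)` would return "/root/.my.cnf" whenever that file is readable, regardless of the system-wide files listed after it.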
-
Danny Auble authored
-
Danny Auble authored
don't support strong_alias
-
- 20 Feb, 2015 2 commits
-
-
Morris Jette authored
-
Dorian Krause authored
We came across the following error message in the slurmctld logs when using non-consumable resources:

error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count is 0

The error comes from _job_dealloc():

node_gres_data=0x7f8a18000b70, node_offset=0, gres_name=0x1999e00 "potion", job_id=46, node_name=0x1987ab0 "node1") at gres.c:3980
(job_gres_list=0x199b7c0, node_gres_list=0x199bc38, node_offset=0, job_id=46, node_name=0x1987ab0 "node1") at gres.c:4190
job_ptr=0x19e9d50, pre_err=0x7f8a31353cb0 "_will_run_test", remove_all=true) at select_linear.c:2091
bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, max_share=1, req_nodes=1, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40) at select_linear.c:3176
bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at select_linear.c:3390
bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at node_select.c:588
avail_bitmap=0x7f8a2f910d38, min_nodes=1, max_nodes=1, req_nodes=1, exc_core_bitmap=0x0) at backfill.c:367

The cause of this problem is that _node_state_dup() in gres.c does not duplicate the no_consume flag. The cr_ptr passed to _rm_job_from_nodes() is created with _dup_cr() which calls _node_state_dup().

Below is a simple patch to fix the problem. A "future-proof" alternative might be to memcpy() from gres_ptr to new_gres and only handle pointers separately.
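The following is not the author's patch. It is a small illustration of the "memcpy() and handle pointers separately" alternative mentioned in the message, using a hypothetical, simplified struct (`node_state_t`) in place of the real gres node state. Every name other than `no_consume` is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the per-node gres state. */
typedef struct {
    bool no_consume;          /* flag that the original dup forgot to copy */
    uint32_t gres_cnt_avail;
    uint32_t gres_cnt_alloc;
    char *name;               /* owned pointer: needs a deep copy */
} node_state_t;

/* Duplicate src. memcpy() carries over every scalar field, including any
 * added later, and only the pointer members are then fixed up by hand. */
node_state_t *node_state_dup(const node_state_t *src)
{
    node_state_t *dst;

    if (src == NULL)
        return NULL;
    dst = malloc(sizeof(*dst));
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, sizeof(*dst));

    dst->name = NULL;         /* do not share the source's buffer */
    if (src->name != NULL) {
        size_t len = strlen(src->name) + 1;
        dst->name = malloc(len);
        if (dst->name == NULL) {
            free(dst);
            return NULL;
        }
        memcpy(dst->name, src->name, len);
    }
    return dst;
}

/* Release a state returned by node_state_dup(). */
void node_state_free(node_state_t *state)
{
    if (state != NULL) {
        free(state->name);
        free(state);
    }
}
```

The trade-off is that a field-by-field copy silently drops new scalar members, while the memcpy() form silently shares new pointer members. Either way the dup routine has to be revisited when the struct changes.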
-
- 19 Feb, 2015 2 commits
-
-
Brian Christiansen authored
Bug 1471
-
Danny Auble authored
runs certain sreport reports.
-
- 18 Feb, 2015 5 commits
-
-
Morris Jette authored
Added "--mail=stage_out" option to job submission commands to notify user when burst buffer stage out is complete.
-
Morris Jette authored
Add SLURM_JOB_CONSTRAINTS to environment variables available to the Prolog. Bug 1458
-
Morris Jette authored
Add job credential to "Run Prolog" RPC used with a configuration of PrologFlags=alloc. This allows the Prolog to be passed identification of GPUs allocated to the job.
-
Morris Jette authored
Add SLURM_JOB_GPUS environment variable to those available in Prolog. Also add a list of the environment variables available in the various prologs and epilogs to the web page. Bug 1458
-
Brian Christiansen authored
-