- 06 Mar, 2015 1 commit

Brian Christiansen authored
Bug 1507

- 05 Mar, 2015 2 commits

Danny Auble authored
message comes in

David Bigagli authored

- 04 Mar, 2015 1 commit

Brian Christiansen authored
Bug 1501

- 03 Mar, 2015 4 commits

Danny Auble authored
cluster(s) requested.

David Bigagli authored

Brian Christiansen authored
Bug 1492

Morris Jette authored
For a job running under a debugger, if the exec of the task fails, cancel its I/O and abort immediately rather than waiting 60 seconds for the I/O timeout.

- 02 Mar, 2015 2 commits

David Bigagli authored

David Bigagli authored

- 27 Feb, 2015 1 commit

Brian Christiansen authored
Bug 1476

- 26 Feb, 2015 1 commit

David Bigagli authored

- 24 Feb, 2015 4 commits

Brian Christiansen authored
Bug 1469

Nina Suvanphim authored
The /root/.my.cnf file would typically contain the login credentials for root. If those are needed for Slurm, then it should be checking that directory. (In reply to Nina Suvanphim from comment #0) ...
> const char *default_conf_paths[] = {
>     "/root/.my.cnf",    <------- add this line
>     "/etc/my.cnf", "/etc/opt/cray/MySQL/my.cnf",
>     "/etc/mysql/my.cnf", NULL };
I'll also note that typically the $HOME/.my.cnf file would be checked last rather than first.

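For reference, the quoted change amounts to the following candidate list (a sketch only: the array name and entries are taken verbatim from the quote above, and the surrounding lookup code in Slurm's MySQL plugin is not shown):

    /* Candidate my.cnf locations, per the quoted report; the reporter
     * notes that a home-directory .my.cnf would more typically be
     * checked last rather than first. */
    const char *default_conf_paths[] = {
        "/root/.my.cnf",                /* proposed addition */
        "/etc/my.cnf",
        "/etc/opt/cray/MySQL/my.cnf",
        "/etc/mysql/my.cnf",
        NULL
    };
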
Danny Auble authored

Danny Auble authored
don't support strong_alias

- 20 Feb, 2015 1 commit

Dorian Krause authored
We came across the following error message in the slurmctld logs when using non-consumable resources:

    error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count is 0

The error comes from _job_dealloc(), reached via the following (truncated) backtrace:

    (node_gres_data=0x7f8a18000b70, node_offset=0, gres_name=0x1999e00 "potion",
     job_id=46, node_name=0x1987ab0 "node1") at gres.c:3980
    (job_gres_list=0x199b7c0, node_gres_list=0x199bc38, node_offset=0,
     job_id=46, node_name=0x1987ab0 "node1") at gres.c:4190
    (job_ptr=0x19e9d50, pre_err=0x7f8a31353cb0 "_will_run_test",
     remove_all=true) at select_linear.c:2091
    (bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, max_share=1, req_nodes=1,
     preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40) at select_linear.c:3176
    (bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2,
     preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40,
     exc_core_bitmap=0x0) at select_linear.c:3390
    (bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2,
     preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40,
     exc_core_bitmap=0x0) at node_select.c:588
    (avail_bitmap=0x7f8a2f910d38, min_nodes=1, max_nodes=1, req_nodes=1,
     exc_core_bitmap=0x0) at backfill.c:367

The cause of this problem is that _node_state_dup() in gres.c does not duplicate the no_consume flag. The cr_ptr passed to _rm_job_from_nodes() is created with _dup_cr(), which calls _node_state_dup(). Below is a simple patch to fix the problem. A "future-proof" alternative might be to memcpy() from gres_ptr to new_gres and only handle pointers separately.
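The patch itself is not reproduced above. As a rough, self-contained sketch of the fix being described (the structure and field names here are stand-ins based on the message, not the actual definitions in gres.c):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Minimal stand-in for the per-node GRES state kept in gres.c; the
     * real structure has more fields, and these names are assumptions
     * taken from the report rather than from the actual source. */
    typedef struct {
        uint64_t gres_cnt_avail;
        bool     no_consume;
        /* ... remaining counters, bitmaps and pointers ... */
    } gres_node_state_t;

    /* Sketch of the described fix: when duplicating per-node GRES state
     * (as _dup_cr() does on the path used by _rm_job_from_nodes()), also
     * copy the no_consume flag, which the report says was being lost. */
    static gres_node_state_t *_node_state_dup(const gres_node_state_t *old)
    {
        gres_node_state_t *new_gres;

        if (old == NULL)
            return NULL;

        new_gres = calloc(1, sizeof(*new_gres));
        if (new_gres == NULL)
            return NULL;

        new_gres->gres_cnt_avail = old->gres_cnt_avail;
        new_gres->no_consume = old->no_consume;  /* previously not copied */
        return new_gres;
    }

The "future-proof" alternative mentioned in the message would instead memcpy() the whole structure and then re-duplicate only the pointer fields.
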
- 19 Feb, 2015 2 commits

Brian Christiansen authored
Bug 1471

Danny Auble authored
runs certain sreport reports.

- 18 Feb, 2015 2 commits

Morris Jette authored
Add the SLURM_JOB_GPUS environment variable to those available in the Prolog. Also add to the web page a list of the environment variables available in the various prologs and epilogs. Bug 1458

Brian Christiansen authored

- 17 Feb, 2015 2 commits

Danny Auble authored
in the runjob_mux plugin.

Brian Christiansen authored
Bug 1461 Commit: 2e2d924e

- 13 Feb, 2015 1 commit

David Bigagli authored

- 12 Feb, 2015 3 commits

Morris Jette authored

Brian Christiansen authored

Brian Christiansen authored
Bug 1446

- 11 Feb, 2015 1 commit

Danny Auble authored
don't insert a new row in the event table.

- 10 Feb, 2015 2 commits

Brian Christiansen authored
uid's are 0 when associations are loaded.

Brian Christiansen authored
Fix a segfault in the controller when deleting a user association for a user that had previously been removed from the system. Bug 1238

- 09 Feb, 2015 3 commits

Morris Jette authored
Fix a slurmctld initialization problem that could cause a requeue of the last task in a job array to fail if it was executed before slurmctld had loaded the maximum job array size into a variable in the job_mgr.c module.

Morris Jette authored
Fix slurmctld job recovery logic that could cause the last task in a job array to be lost on restart.

Nicolas Joly authored

- 05 Feb, 2015 1 commit

David Bigagli authored
event REQUEUED to slurmdbd.

- 04 Feb, 2015 3 commits

Morris Jette authored
Previously it was not possible to distinguish between a job needing exclusive nodes and the default job/partition configuration.

Morris Jette authored
Fix job array logic that could cause slurmctld to abort. Bug 1426

Morris Jette authored
Enable use of CUDA v7.0+ with a Slurm configuration of TaskPlugin=task/cgroup and ConstrainDevices=yes (in cgroup.conf). With that configuration, CUDA_VISIBLE_DEVICES will start at 0 rather than at the device number. Bug 1421

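A minimal sketch of the configuration named above (only the directives mentioned in the message are shown; GPU definitions in gres.conf and the rest of slurm.conf are assumed to already be in place):

    # slurm.conf
    TaskPlugin=task/cgroup

    # cgroup.conf
    ConstrainDevices=yes

With devices constrained this way, a job only sees the GPUs allocated to it, which is why, as the message notes, CUDA_VISIBLE_DEVICES numbering starts at 0 inside the job rather than at the system-wide device number.
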
- 03 Feb, 2015 3 commits

David Bigagli authored
debug2 instead of info.

David Bigagli authored
SLURM_JOB_PARTITION to be the one in which the job started.

Morris Jette authored
If using proctrack/cgroup and gres/gpu, always start CUDA_VISIBLE_DEVICES environment variable numbering at 0. Bug 1421