- 03 Mar, 2015 1 commit
Morris Jette authored
For a job running under a debugger, if the exec of the task fails, cancel its I/O and abort immediately rather than waiting 60 seconds for the I/O timeout.
- 02 Mar, 2015 4 commits
David Bigagli authored
David Bigagli authored
Danny Auble authored
Danny Auble authored
- 27 Feb, 2015 5 commits
Nicolas Joly authored
Add missing arguments to slurm_sched_p_newalloc/slurm_sched_p_freealloc documentation.
Nicolas Joly authored
Nicolas Joly authored
Morris Jette authored
Brian Christiansen authored
Bug 1476
- 26 Feb, 2015 2 commits
David Bigagli authored
Morris Jette authored
Improved logging and some code restructuring. No change in logic.
- 25 Feb, 2015 4 commits
David Bigagli authored
This reverts commit e24a418b.
David Bigagli authored
Morris Jette authored
Morris Jette authored
This is a variation on commit 5391b8cc: check $HOME/.my.cnf last rather than first, to follow the more standard search order.
- 24 Feb, 2015 5 commits
Brian Christiansen authored
Bug 1469
Nina Suvanphim authored
The /root/.my.cnf file would typically contain the login credentials for root. If those are needed for Slurm, then it should be checking that directory.
(In reply to Nina Suvanphim from comment #0)
...
> const char *default_conf_paths[] = {
>	"/root/.my.cnf",	<<<<<<<<<<<<<<<<<------- add this line
>	"/etc/my.cnf", "/etc/opt/cray/MySQL/my.cnf",
>	"/etc/mysql/my.cnf", NULL };
I'll also note that typically the $HOME/.my.cnf file would be checked last rather than first.
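As a rough illustration of that search order, here is a minimal, self-contained C sketch; the mysql_db_get_config() helper and its exact behavior are assumptions for illustration, not Slurm's actual accounting-storage code. It walks the fixed paths quoted above first and falls back to $HOME/.my.cnf last:

    /* Sketch only: search fixed config paths first, then $HOME/.my.cnf
     * last. The helper name and path handling are illustrative. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static const char *default_conf_paths[] = {
        "/root/.my.cnf",
        "/etc/my.cnf", "/etc/opt/cray/MySQL/my.cnf",
        "/etc/mysql/my.cnf", NULL };

    /* Return the first readable config file, checking $HOME/.my.cnf last. */
    static char *mysql_db_get_config(void)
    {
        char home_cnf[4096];
        const char *home;
        int i;

        for (i = 0; default_conf_paths[i]; i++) {
            if (access(default_conf_paths[i], R_OK) == 0)
                return strdup(default_conf_paths[i]);
        }
        if ((home = getenv("HOME"))) {
            snprintf(home_cnf, sizeof(home_cnf), "%s/.my.cnf", home);
            if (access(home_cnf, R_OK) == 0)
                return strdup(home_cnf);
        }
        return NULL;
    }

    int main(void)
    {
        char *path = mysql_db_get_config();
        printf("my.cnf: %s\n", path ? path : "(none found)");
        free(path);
        return 0;
    }

Checking the fixed system paths before the per-user file mirrors the "more standard search order" referenced in the 25 Feb commit above.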
Morris Jette authored
Danny Auble authored
Danny Auble authored
don't support strong_alias
- 20 Feb, 2015 1 commit
Dorian Krause authored
We came across the following error message in the slurmctld logs when using non-consumable resources:

error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count is 0

The error comes from _job_dealloc(). A partial backtrace:

(node_gres_data=0x7f8a18000b70, node_offset=0, gres_name=0x1999e00 "potion", job_id=46, node_name=0x1987ab0 "node1") at gres.c:3980
(job_gres_list=0x199b7c0, node_gres_list=0x199bc38, node_offset=0, job_id=46, node_name=0x1987ab0 "node1") at gres.c:4190
(job_ptr=0x19e9d50, pre_err=0x7f8a31353cb0 "_will_run_test", remove_all=true) at select_linear.c:2091
(bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, max_share=1, req_nodes=1, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40) at select_linear.c:3176
(bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at select_linear.c:3390
(bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at node_select.c:588
(avail_bitmap=0x7f8a2f910d38, min_nodes=1, max_nodes=1, req_nodes=1, exc_core_bitmap=0x0) at backfill.c:367

The cause of this problem is that _node_state_dup() in gres.c does not duplicate the no_consume flag. The cr_ptr passed to _rm_job_from_nodes() is created with _dup_cr(), which calls _node_state_dup(). Below is a simple patch to fix the problem. A "future-proof" alternative might be to memcpy() from gres_ptr to new_gres and only handle pointers separately.
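A minimal sketch of the fix described above, assuming invented type and field names (the real definitions are in gres.c and differ); the point is simply that the duplication routine must copy the no_consume flag along with the counters:

    /* Illustrative types only; Slurm's real gres_node_state_t differs. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct gres_node_state {
        uint64_t gres_cnt_avail;
        uint64_t gres_cnt_alloc;
        bool     no_consume;     /* non-consumable resource flag */
    } gres_node_state_t;

    static gres_node_state_t *_node_state_dup(gres_node_state_t *gres_ptr)
    {
        gres_node_state_t *new_gres;

        if (!gres_ptr)
            return NULL;
        new_gres = calloc(1, sizeof(*new_gres));
        if (!new_gres)
            return NULL;
        new_gres->gres_cnt_avail = gres_ptr->gres_cnt_avail;
        new_gres->gres_cnt_alloc = gres_ptr->gres_cnt_alloc;
        new_gres->no_consume = gres_ptr->no_consume;  /* the missing copy */
        return new_gres;
    }

    int main(void)
    {
        gres_node_state_t orig = { 4, 0, true };
        gres_node_state_t *dup = _node_state_dup(&orig);

        printf("no_consume preserved: %d\n", dup ? dup->no_consume : -1);
        free(dup);
        return 0;
    }

The memcpy() alternative mentioned in the message would copy the whole struct in one call and then re-duplicate any pointer members, so newly added flags could not be forgotten.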
- 19 Feb, 2015 3 commits
Brian Christiansen authored
Bug 1471
Morris Jette authored
"If you specify a maximum node count and the host list contains more nodes, the extra node names will be silently ignored." Not so.
Danny Auble authored
runs certain sreport reports.
- 18 Feb, 2015 2 commits
Morris Jette authored
Add SLURM_JOB_GPUS environment variable to those available in the Prolog. Also add a list of the environment variables available in the various prologs and epilogs to the web page. Bug 1458.
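For illustration, a tiny stand-alone C helper of the kind a Prolog script might invoke; only the SLURM_JOB_GPUS variable name comes from the commit, the program itself is hypothetical:

    /* Hypothetical helper: print the GPU ids the Prolog sees for this job. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const char *gpus = getenv("SLURM_JOB_GPUS");

        if (gpus)
            printf("job allocated GPU ids: %s\n", gpus);
        else
            printf("SLURM_JOB_GPUS not set\n");
        return 0;
    }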
Brian Christiansen authored
- 17 Feb, 2015 4 commits
Danny Auble authored
runjob happened, and the step was part of an array. This is an addition to commit 49e0f5f2.
Danny Auble authored
in the runjob_mux plugin.
Brian Christiansen authored
Bug 1461. Commit: 2e2d924e
Morris Jette authored
See bug 1461
- 13 Feb, 2015 2 commits
David Bigagli authored
Morris Jette authored
If a call was made to change a node's state to the same state it was already in, and to set its reason to the same value it already had, an accounting record was still generated. If a script (say NodeHealthCheck) repeatedly sets a node state (say DRAIN), it could generate a huge number of redundant accounting records. This change eliminates those redundant records. Related to bug 1437.
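A minimal sketch of that guard, with invented names (not Slurm's actual node or accounting structures): a record is written only when the requested state or reason differs from the node's current values.

    /* Illustrative guard: suppress redundant node-state accounting records. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    enum node_state { STATE_IDLE, STATE_DRAIN };  /* values illustrative */

    struct node_info {
        enum node_state state;
        const char *reason;
    };

    static bool _str_eq(const char *a, const char *b)
    {
        if (!a || !b)
            return a == b;
        return strcmp(a, b) == 0;
    }

    /* True only if the update actually changes something worth recording. */
    static bool _record_needed(const struct node_info *node,
                               enum node_state new_state,
                               const char *new_reason)
    {
        return (node->state != new_state) ||
               !_str_eq(node->reason, new_reason);
    }

    int main(void)
    {
        struct node_info n = { STATE_DRAIN, "NHC: disk errors" };

        /* Repeated identical request: prints 0, no redundant record. */
        printf("%d\n", _record_needed(&n, STATE_DRAIN, "NHC: disk errors"));
        /* Changed reason: prints 1, record generated. */
        printf("%d\n", _record_needed(&n, STATE_DRAIN, "NHC: ok"));
        return 0;
    }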
- 12 Feb, 2015 4 commits
Morris Jette authored
Morris Jette authored
Brian Christiansen authored
Brian Christiansen authored
Bug 1446
- 11 Feb, 2015 1 commit
Danny Auble authored
don't insert a new row in the event table.
- 10 Feb, 2015 2 commits
Brian Christiansen authored
UIDs are 0 when associations are loaded.
Morris Jette authored
The backfill scheduler builds a queue of eligible job/partition information and then proceeds to determine when and where those jobs will start. The backfill scheduler can be configured to periodically release locks in order to let other operations take place. If the partition(s) associated with one of those jobs change during one of those periods, the job will still be considered for scheduling in the old partition until the backfill scheduler starts over with a new job/partition list. This change makes the backfill scheduler validate each job's partition from the list against current information (considering any partition changes). See bug 1436.
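A minimal sketch of that validation, with invented structures standing in for Slurm's job, partition, and backfill queue records: before scheduling a cached (job, partition) pair, confirm the partition is still among the job's current partitions and skip the stale entry otherwise.

    /* Illustrative types; Slurm's real job/partition records differ. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    struct part_record { const char *name; };

    struct job_record {
        unsigned int job_id;
        struct part_record **part_ptrs;  /* NULL-terminated current list */
    };

    struct job_queue_rec {               /* cached when the queue was built */
        struct job_record  *job_ptr;
        struct part_record *part_ptr;
    };

    /* True if the cached pairing still matches the job's current partitions. */
    static bool _job_part_valid(const struct job_queue_rec *rec)
    {
        struct part_record **p = rec->job_ptr->part_ptrs;

        for (; p && *p; p++) {
            if (*p == rec->part_ptr)
                return true;
        }
        return false;  /* partition changed since the queue was built */
    }

    int main(void)
    {
        struct part_record debug = { "debug" }, batch = { "batch" };
        struct part_record *parts[] = { &batch, NULL };  /* job moved off debug */
        struct job_record job = { 39, parts };
        struct job_queue_rec stale = { &job, &debug };

        if (!_job_part_valid(&stale))
            printf("job %u: skipping stale partition %s\n",
                   job.job_id, stale.part_ptr->name);
        return 0;
    }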