- 03 Mar, 2015 1 commit
-
-
Morris Jette authored
The option has not been functional or documented since Slurm version 2.0.
-
- 27 Feb, 2015 5 commits
-
-
Morris Jette authored
This controls how long a requeued job must wait before it can restart, and 20 minutes is too long in most cases. Administrators can alter this configuration parameter if needed in case of slow Prolog or the like.
-
Morris Jette authored
Use this to specify the lifetime of a job step credential.
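If this is the credential-lifetime setting exposed through slurm.conf's AuthInfo option (an assumption; the commit message does not name the parameter), usage would look roughly like:

```
# slurm.conf fragment (assumed parameter name)
AuthInfo=cred_expire=1200   # job step credential lifetime, in seconds
```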
-
Brian Christiansen authored
Bug 1476
-
Morris Jette authored
Set the delay time for job requeue to the job credential lifetime (1200 seconds by default). This ensures that the Prolog runs on every node when a job is requeued, at the cost of slower launch of requeued jobs. Without this change, if a job is restarted within 1200 seconds, the nodes previously used would not run the Prolog again, since the job ID is still seen as active from the previous execution. It is also advisable to set DEFAULT_EXPIRATION_WINDOW in src/common/slurm_cred.c to the lowest reasonable value. We need to add a new configuration parameter so this is easily changed in the future.
-
Brian Christiansen authored
Display the job's estimated NodeCount based on the partition's configured resources rather than the whole system's. Bug 1478
-
- 26 Feb, 2015 2 commits
-
-
David Bigagli authored
-
Morris Jette authored
Previously, tasks were not bound to the appropriate NUMA node. Based upon work by Josko Plazonic <plazonic@princeton.edu>.
-
- 25 Feb, 2015 1 commit
-
-
Morris Jette authored
Mail notifications on job BEGIN, END and FAIL now apply to a job array as a whole rather than generating individual email messages for each task in the job array.
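For example, with this change a job-array submission such as the following (script name and address are placeholders) generates one notification per event for the whole array rather than ten:

```shell
sbatch --array=0-9 \
       --mail-type=BEGIN,END,FAIL \
       --mail-user=user@example.com \
       job.sh
```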
-
- 24 Feb, 2015 4 commits
-
-
Brian Christiansen authored
Bug 1469
-
Nina Suvanphim authored
The /root/.my.cnf file would typically contain the login credentials for root. If those are needed for Slurm, then that file should be checked. (In reply to Nina Suvanphim from comment #0:)

    const char *default_conf_paths[] = {
        "/root/.my.cnf",                          /* <-- add this line */
        "/etc/my.cnf", "/etc/opt/cray/MySQL/my.cnf",
        "/etc/mysql/my.cnf", NULL };

I'll also note that typically the $HOME/.my.cnf file would be checked last rather than first.
-
Danny Auble authored
-
Danny Auble authored
don't support strong_alias
-
- 20 Feb, 2015 2 commits
-
-
Morris Jette authored
-
Dorian Krause authored
We came across the following error message in the slurmctld logs when using non-consumable resources:

    error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count is 0

The error comes from _job_dealloc(); the (truncated) backtrace is:

    (node_gres_data=0x7f8a18000b70, node_offset=0, gres_name=0x1999e00 "potion", job_id=46, node_name=0x1987ab0 "node1") at gres.c:3980
    (job_gres_list=0x199b7c0, node_gres_list=0x199bc38, node_offset=0, job_id=46, node_name=0x1987ab0 "node1") at gres.c:4190
    (job_ptr=0x19e9d50, pre_err=0x7f8a31353cb0 "_will_run_test", remove_all=true) at select_linear.c:2091
    (bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, max_share=1, req_nodes=1, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40) at select_linear.c:3176
    (bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at select_linear.c:3390
    (bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at node_select.c:588
    (avail_bitmap=0x7f8a2f910d38, min_nodes=1, max_nodes=1, req_nodes=1, exc_core_bitmap=0x0) at backfill.c:367

The cause of this problem is that _node_state_dup() in gres.c does not duplicate the no_consume flag. The cr_ptr passed to _rm_job_from_nodes() is created with _dup_cr(), which calls _node_state_dup(). Below is a simple patch to fix the problem. A "future-proof" alternative might be to memcpy() from gres_ptr to new_gres and handle only pointers separately.
-
- 19 Feb, 2015 2 commits
-
-
Brian Christiansen authored
Bug 1471
-
Danny Auble authored
runs certain sreport reports.
-
- 18 Feb, 2015 5 commits
-
-
Morris Jette authored
Added "--mail=stage_out" option to job submission commands to notify the user when burst buffer stage-out is complete.
-
Morris Jette authored
Add SLURM_JOB_CONSTRAINTS to the environment variables available to the Prolog. bug 1458
-
Morris Jette authored
Add job credential to "Run Prolog" RPC used with a configuration of PrologFlags=alloc. This allows the Prolog to be passed identification of GPUs allocated to the job.
-
Morris Jette authored
Add SLURM_JOB_GPUS environment variable to those available in the Prolog. Also add a list of the environment variables available in the various prologs and epilogs to the web page. bug 1458
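A minimal Prolog-style sketch using the variables described above; the default values assigned here are stand-ins for what slurmd would export for a real job:

```shell
# Use the slurmd-provided values when present; fall back to sample
# values so the sketch runs outside a Slurm node.
SLURM_JOB_ID=${SLURM_JOB_ID:-42}
SLURM_JOB_GPUS=${SLURM_JOB_GPUS:-0,1}
SLURM_JOB_CONSTRAINTS=${SLURM_JOB_CONSTRAINTS:-haswell}
echo "job=$SLURM_JOB_ID gpus=$SLURM_JOB_GPUS constraints=$SLURM_JOB_CONSTRAINTS"
```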
-
Brian Christiansen authored
-
- 17 Feb, 2015 2 commits
-
-
Danny Auble authored
in the runjob_mux plugin.
-
Brian Christiansen authored
Bug 1461 Commit: 2e2d924e
-
- 14 Feb, 2015 1 commit
-
-
Danny Auble authored
-
- 13 Feb, 2015 1 commit
-
-
David Bigagli authored
-
- 12 Feb, 2015 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Brian Christiansen authored
-
Brian Christiansen authored
Bug 1446
-
- 11 Feb, 2015 2 commits
-
-
Danny Auble authored
don't insert a new row in the event table.
-
Nathan Yee authored
Bug 1321
-
- 10 Feb, 2015 2 commits
-
-
Brian Christiansen authored
UIDs are 0 when associations are loaded.
-
Brian Christiansen authored
Fix segfault in controller when deleting a user association of a user which had been previously removed from the system. Bug 1238
-
- 09 Feb, 2015 4 commits
-
-
Morris Jette authored
Fix a slurmctld initialization problem which could cause requeue of the last task in a job array to fail if executed before slurmctld had loaded the maximum job array size into a variable in the job_mgr.c module.
-
Morris Jette authored
Fix slurmctld job recovery logic which could cause the last task in a job array to be lost on restart.
-
Nicolas Joly authored
-
Morris Jette authored
In order to support inter-cluster job dependencies, the MaxJobID configuration parameter default value has been reduced from 4,294,901,760 to 2,147,418,112 and its maximum value is now 2,147,463,647. ANY JOBS WITH A JOB ID ABOVE 2,147,463,647 WILL BE PURGED WHEN SLURM IS UPGRADED FROM AN OLDER VERSION!
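Using the values above, a slurm.conf entry keeping job IDs within the new range might look like the following (the parameter is spelled MaxJobId in slurm.conf):

```
# slurm.conf fragment
MaxJobId=2147418112    # new default; maximum allowed is 2147463647
```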
-
- 06 Feb, 2015 2 commits
-
-
Morris Jette authored
-- Add job submission command options --sicp (mark a job as available for inter-cluster dependencies) and --power (specify power management options) to the salloc, sbatch, and srun commands. -- Add DebugFlags value "SICP" (inter-cluster option logging). -- Add job_descriptor field "cluster".
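Hypothetical invocations of the new options (the script name is a placeholder, and the exact --power arguments are not specified in the text):

```shell
sbatch --sicp job.sh         # make the job visible for inter-cluster dependencies
sbatch --power=... job.sh    # pass power management options through
```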
-
David Bigagli authored
-