- 24 Feb, 2015 9 commits
-
-
Michael A. Raymond authored
-
Morris Jette authored
-
Nina Suvanphim authored
The /root/.my.cnf would typically contain the login credentials for root. If those are needed for Slurm, then it should be checking that directory.

(In reply to Nina Suvanphim from comment #0)
...
> const char *default_conf_paths[] = {
>     "/root/.my.cnf", <<<<<<<<<<<<<<<<<------- add this line
>     "/etc/my.cnf", "/etc/opt/cray/MySQL/my.cnf",
>     "/etc/mysql/my.cnf", NULL };

I'll also note that typically the $HOME/.my.cnf file would be checked last rather than first.
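The ordering remark above can be sketched as follows. This is a minimal illustration, not the code from the Slurm tree: `build_conf_search_order()`, `system_conf_paths`, and the two size macros are hypothetical names introduced here. The sketch lists the system-wide files first and appends the per-user `.my.cnf` last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_CONF_PATHS 8    /* hypothetical limit for this sketch */
#define CONF_PATH_LEN  256  /* hypothetical limit for this sketch */

/* System-wide locations quoted in the comment above. */
static const char *system_conf_paths[] = {
	"/etc/my.cnf", "/etc/opt/cray/MySQL/my.cnf",
	"/etc/mysql/my.cnf", NULL
};

/*
 * Hypothetical helper (not a Slurm function): fill "out" with candidate
 * paths in search order and return how many were written. System-wide
 * files come first; <home>/.my.cnf is appended last when home is given.
 */
size_t build_conf_search_order(const char *home,
			       char out[][CONF_PATH_LEN], size_t max)
{
	size_t n = 0;

	assert(out);
	for (size_t i = 0; system_conf_paths[i] && n < max; i++)
		snprintf(out[n++], CONF_PATH_LEN, "%s", system_conf_paths[i]);

	if (home && n < max) {
		int len = snprintf(out[n], CONF_PATH_LEN, "%s/.my.cnf", home);
		/* Keep the entry only if it was not truncated. */
		if (len > 0 && (size_t)len < CONF_PATH_LEN)
			n++;
	}
	return n;
}
```

A caller would then try each returned path in turn and use the first one that can be opened.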
-
Morris Jette authored
Fix some logic related to power distribution across nodes
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
don't support strong_alias
-
Morris Jette authored
Update power management web page: Add notes about powering nodes down/up
Prevent underflow in power distribution logic
Add logic to identify nodes in "ready" state. Only ready nodes can have their power caps modified
Don't change power cap if node not in ready state
Various improvements to logging
Refactor code to eliminate duplicate/repeated building of full NID list
Plug some memory leaks
-
- 23 Feb, 2015 1 commit
-
-
Morris Jette authored
Modify test 12.7 so that we specify a reason when setting a node DOWN. A recent change to the Slurm code now requires a reason.
-
- 21 Feb, 2015 1 commit
-
-
Morris Jette authored
-
- 20 Feb, 2015 5 commits
-
-
Morris Jette authored
-
Morris Jette authored
Correct capmc arguments to set power cap. Convert "capmc get_node_energy_counter" to use a hostlist expression rather than listing every node in a comma-separated list. Log commands and args run by the plugin via the power_run_script() function in src/plugins/power/common/power_common.c. Use hostlist to build a condensed nid list for power cap set/clear functions.
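To illustrate what a condensed NID list looks like compared with a comma-separated one, here is a small sketch. It does not use Slurm's hostlist code; `nid_ranges_to_string()` is a hypothetical helper written for this example, and it assumes the input NIDs are already sorted in ascending order with no duplicates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not Slurm's hostlist API): collapse sorted,
 * duplicate-free NIDs into a range expression such as "1-3,5,7-8".
 * Returns the number of characters written (excluding the NUL), or -1
 * if buf is too small.
 */
int nid_ranges_to_string(const int *nids, size_t count,
			 char *buf, size_t buf_len)
{
	size_t used = 0;
	size_t i = 0;

	assert(buf);
	if (buf_len == 0)
		return -1;
	buf[0] = '\0';

	while (i < count) {
		size_t j = i;
		int n;

		/* Extend the run while the next NID is consecutive. */
		while (j + 1 < count && nids[j + 1] == nids[j] + 1)
			j++;

		if (j == i)
			n = snprintf(buf + used, buf_len - used, "%s%d",
				     used ? "," : "", nids[i]);
		else
			n = snprintf(buf + used, buf_len - used, "%s%d-%d",
				     used ? "," : "", nids[i], nids[j]);

		/* snprintf reports the length it needed; detect truncation. */
		if (n < 0 || (size_t)n >= buf_len - used)
			return -1;
		used += (size_t)n;
		i = j + 1;
	}
	return (int)used;
}
```

For six NIDs 1, 2, 3, 5, 7, 8 this produces `1-3,5,7-8`, which keeps command lines short on large systems.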
-
Morris Jette authored
-
Dorian Krause authored
We came across the following error message in the slurmctld logs when using non-consumable resources:

  error: gres/potion: job 39 dealloc of node node1 bad node_offset 0 count is 0

The error comes from _job_dealloc():

  node_gres_data=0x7f8a18000b70, node_offset=0, gres_name=0x1999e00 "potion", job_id=46, node_name=0x1987ab0 "node1") at gres.c:3980
  (job_gres_list=0x199b7c0, node_gres_list=0x199bc38, node_offset=0, job_id=46, node_name=0x1987ab0 "node1") at gres.c:4190
  job_ptr=0x19e9d50, pre_err=0x7f8a31353cb0 "_will_run_test", remove_all=true) at select_linear.c:2091
  bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, max_share=1, req_nodes=1, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40) at select_linear.c:3176
  bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at select_linear.c:3390
  bitmap=0x7f8a18001ad0, min_nodes=1, max_nodes=1, req_nodes=1, mode=2, preemptee_candidates=0x0, preemptee_job_list=0x7f8a2f910c40, exc_core_bitmap=0x0) at node_select.c:588
  avail_bitmap=0x7f8a2f910d38, min_nodes=1, max_nodes=1, req_nodes=1, exc_core_bitmap=0x0) at backfill.c:367

The cause of this problem is that _node_state_dup() in gres.c does not duplicate the no_consume flag. The cr_ptr passed to _rm_job_from_nodes() is created with _dup_cr() which calls _node_state_dup(). Below is a simple patch to fix the problem. A "future-proof" alternative might be to memcpy() from gres_ptr to new_gres and only handle pointers separately.
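The "future-proof" alternative mentioned above might look roughly like the sketch below. The struct is a simplified stand-in, not Slurm's real GRES node state: `demo_node_state_t`, its field set, and both functions are hypothetical. The point is that a `memcpy()` carries over every scalar field (including a flag like `no_consume`) automatically, so only owned pointers need explicit handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-in for a per-node GRES state record. */
typedef struct {
	uint32_t gres_cnt_avail;
	uint32_t gres_cnt_alloc;
	bool     no_consume;            /* the flag the original dup missed */
	uint32_t topo_cnt;
	uint32_t *topo_gres_cnt_alloc;  /* owned array of topo_cnt entries */
} demo_node_state_t;

/* Duplicate src: copy all scalars at once, then deep-copy the pointers. */
demo_node_state_t *demo_node_state_dup(const demo_node_state_t *src)
{
	demo_node_state_t *dst;

	assert(src);
	dst = malloc(sizeof(*dst));
	if (!dst)
		return NULL;

	/* Every scalar field, present or future, is copied here. */
	memcpy(dst, src, sizeof(*dst));

	/* Pointers must not be shared with src, so rebuild them. */
	dst->topo_gres_cnt_alloc = NULL;
	if (src->topo_cnt && src->topo_gres_cnt_alloc) {
		size_t bytes = src->topo_cnt * sizeof(uint32_t);

		dst->topo_gres_cnt_alloc = malloc(bytes);
		if (!dst->topo_gres_cnt_alloc) {
			free(dst);
			return NULL;
		}
		memcpy(dst->topo_gres_cnt_alloc,
		       src->topo_gres_cnt_alloc, bytes);
	}
	return dst;
}

/* Release a record created by demo_node_state_dup(). */
void demo_node_state_free(demo_node_state_t *state)
{
	if (state) {
		free(state->topo_gres_cnt_alloc);
		free(state);
	}
}
```

The trade-off is that any pointer field added later must still be handled by hand, otherwise the copy and the original would share (and double-free) the same memory.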
-
Morris Jette authored
-
- 19 Feb, 2015 4 commits
-
-
Brian Christiansen authored
Bug 1471
-
Morris Jette authored
-
Morris Jette authored
"If you specify a maximum node count and the host list contains more nodes, the extra node names will be silently ignored." Not so.
-
Danny Auble authored
runs certain sreport reports.
-
- 18 Feb, 2015 11 commits
-
-
Morris Jette authored
-
Morris Jette authored
For srun command with the --no-alloc option, the dummy credential created did not have two new fields (job_constraints and job_gres_list) initialized, resulting in invalid memory references. Bug introduced earlier today.
-
Morris Jette authored
Added "--mail=stage_out" option to job submission commands to notify user when burst buffer stage out is complete.
-
Morris Jette authored
Add new job_descriptor fields to the job_submit/lua interface: clusters, power_flags, and sicp_mode
-
Morris Jette authored
Add SLURM_JOB_CONSTRAINTS to environment variables available to the Prolog. bug 1458
-
Morris Jette authored
Add job credential to "Run Prolog" RPC used with a configuration of PrologFlags=alloc. This allows the Prolog to be passed identification of GPUs allocated to the job.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Add SLURM_JOB_GPUS environment variable to those available in Prolog. Also add list of environment variables available in the various prologs and epilogs on the web page. bug 1458
-
Danny Auble authored
-
Brian Christiansen authored
-
- 17 Feb, 2015 9 commits
-
-
Danny Auble authored
runjob happened, and the step was part of an array. This is an addition to commit 49e0f5f2
-
Danny Auble authored
in the runjob_mux plugin.
-
Brian Christiansen authored
Bug 1461 Commit: 2e2d924e
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
This completes work on the JSON parser for the capmc get_power_cap_capabilities command, which reports specifics about each node's power caps and ranges.
-
Morris Jette authored
Add a find_node_record2() function that differs from find_node_record() in that it does not print an error if a node name is not found. This is important for use with the Cray capmc command, which generates information about not only compute nodes, but also login and service nodes. Using this function prevents a bunch of errors from the power/cray plugin when trying to find these non-compute nodes in the slurm configuration.
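The idea of a quiet lookup variant can be sketched as below. This is not the Slurm implementation: the node table, `demo_find_node()`, `demo_find_node2()`, and `demo_error_cnt` are all hypothetical names used to show two entry points sharing one search routine, where only one of them reports a missing name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, minimal node record and table for this sketch. */
typedef struct {
	const char *name;
} demo_node_t;

static demo_node_t demo_nodes[] = { { "nid00001" }, { "nid00002" } };
static const size_t demo_node_cnt =
	sizeof(demo_nodes) / sizeof(demo_nodes[0]);

/* Counts reported lookup failures so the behavior is observable. */
int demo_error_cnt = 0;

/* Shared search; log_missing selects the noisy or the quiet behavior. */
static demo_node_t *demo_find_impl(const char *name, bool log_missing)
{
	assert(name);
	for (size_t i = 0; i < demo_node_cnt; i++) {
		if (strcmp(demo_nodes[i].name, name) == 0)
			return &demo_nodes[i];
	}
	if (log_missing) {
		demo_error_cnt++;
		fprintf(stderr, "error: lookup failure for node %s\n", name);
	}
	return NULL;
}

/* Reports an error when the name is unknown. */
demo_node_t *demo_find_node(const char *name)
{
	return demo_find_impl(name, true);
}

/* Same lookup, but an unknown name is silently returned as NULL. */
demo_node_t *demo_find_node2(const char *name)
{
	return demo_find_impl(name, false);
}
```

A caller iterating over tool output that includes login and service nodes would use the quiet variant and simply skip names that return NULL.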
-