- 31 May, 2018 1 commit
-
Morris Jette authored
Also eliminate some possible fatal errors on reconfig or internal error
-
- 30 May, 2018 4 commits
-
Morris Jette authored
-
Morris Jette authored
If two arrays differ in size, make them the same size to perform bit_or/and/and_not operations on the full array
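A minimal sketch of the idea, using a hypothetical word-array bitmap rather than Slurm's own bitstring module: grow the smaller map (zero-filled) to the larger size before combining, so no tail bits are silently ignored.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical bitmap: 'nwords' 64-bit words; not Slurm's bitstr_t. */
    typedef struct {
        size_t    nwords;
        uint64_t *words;
    } bitmap_t;

    /* Grow 'b' to at least 'nwords' words, zero-filling the new tail.
     * (realloc error handling elided for brevity.) */
    static void bitmap_grow(bitmap_t *b, size_t nwords)
    {
        if (b->nwords >= nwords)
            return;
        b->words = realloc(b->words, nwords * sizeof(uint64_t));
        memset(b->words + b->nwords, 0,
               (nwords - b->nwords) * sizeof(uint64_t));
        b->nwords = nwords;
    }

    /* OR 'src' into 'dst' over the full extent of both maps. */
    static void bitmap_or(bitmap_t *dst, bitmap_t *src)
    {
        size_t i, n = (dst->nwords > src->nwords) ? dst->nwords
                                                  : src->nwords;

        bitmap_grow(dst, n);    /* make both maps the same size first */
        bitmap_grow(src, n);
        for (i = 0; i < n; i++)
            dst->words[i] |= src->words[i];
    }

The same resize-first step applies to the AND and AND-NOT variants.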
-
Morris Jette authored
-
Morris Jette authored
-
- 29 May, 2018 3 commits
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 28 May, 2018 1 commit
-
Morris Jette authored
-
- 18 May, 2018 2 commits
-
Morris Jette authored
Create src/common/tres_frequency.[ch] module based upon cpu_frequency.[ch].
Modify launch RPCs to pass the value from slurmctld to slurmstepd.
Validate the --gpu-freq value from salloc, sbatch, and srun.
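A hedged sketch of the client-side validation step. The keywords (low, medium, high, highm1, verbose) and the optional memory= sub-option follow the --gpu-freq documentation; the function itself and its name are illustrative, not Slurm's actual code.

    #include <ctype.h>
    #include <stdbool.h>
    #include <string.h>
    #include <strings.h>

    /* Illustrative check of one --gpu-freq token, e.g. "high", "450",
     * or "memory=medium". Returns true if the token looks valid. */
    static bool gpu_freq_token_ok(const char *tok)
    {
        static const char *keywords[] =
            { "low", "medium", "high", "highm1", "verbose", NULL };
        const char *val = strchr(tok, '=');
        const char *start;
        int i;

        val = val ? val + 1 : tok;     /* skip optional "memory=" prefix */
        for (i = 0; keywords[i]; i++)
            if (!strcasecmp(val, keywords[i]))
                return true;
        start = val;                   /* otherwise require a plain number */
        while (isdigit((unsigned char) *val))
            val++;
        return (val != start) && (*val == '\0');
    }

With a check like this, salloc, sbatch, and srun can reject a bad specification before the request ever reaches slurmctld.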
-
Morris Jette authored
Add v18.08 versions of un/pack functions for REQUEST_LAUNCH_TASKS and REQUEST_BATCH_JOB_LAUNCH RPCs
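For context, Slurm packs each RPC according to the negotiated protocol version so that daemons from adjacent releases interoperate, and a new release adds its own pack/unpack branch. The sketch below shows the general pattern with stand-in buffer helpers and hypothetical version codes, not Slurm's actual pack API:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Minimal stand-in wire buffer and pack helpers; illustrative only
     * (bounds checks elided). */
    typedef struct { unsigned char data[256]; size_t off; } buf_t;

    static void pack32(uint32_t val, buf_t *b)
    {
        memcpy(b->data + b->off, &val, sizeof(val));
        b->off += sizeof(val);
    }

    static void packstr(const char *s, buf_t *b)
    {
        uint32_t len = s ? (uint32_t) strlen(s) + 1 : 0;
        pack32(len, b);
        if (len) {
            memcpy(b->data + b->off, s, len);
            b->off += len;
        }
    }

    enum { PROTO_17_11 = 0x2000, PROTO_18_08 = 0x3000 }; /* hypothetical */

    struct launch_req {
        uint32_t job_id;
        char    *tres_per_node;   /* assume a field added for 18.08 */
    };

    /* Older peers get the legacy layout; 18.08 peers get the new field. */
    static void pack_launch_req(const struct launch_req *req,
                                uint16_t version, buf_t *buffer)
    {
        pack32(req->job_id, buffer);
        if (version >= PROTO_18_08)
            packstr(req->tres_per_node, buffer);
    }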
-
- 17 May, 2018 3 commits
-
Morris Jette authored
-
Morris Jette authored
Use tres_per_job/node/socket/task instead
-
Morris Jette authored
Completely remove "gres" field from step record in slurmctld and step info message. Use "tres_per_node", "tres_per_socket", etc.
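For illustration, the replacement fields carry the request as per-dimension TRES strings, in the name:count form that scontrol displays (e.g. TresPerNode=gpu:1); the struct and example values below are assumptions, not the actual step-record layout:

    /* Illustrative step-record fields replacing the old "gres" string. */
    struct step_tres_example {
        char *tres_per_node;     /* e.g. "gpu:2" */
        char *tres_per_socket;   /* e.g. "gpu:1" */
        char *tres_per_task;     /* e.g. "gpu:1" */
    };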
-
- 16 May, 2018 9 commits
-
Morris Jette authored
This bug was introduced in commit 4a9bffe1. Test12.7 was resulting in the logging of error messages of this sort:

    sacct: error: slurmdb_ave_tres_usage: couldn't make tres_list from ''

This was due to the tres_usage_in_ave and tres_usage_out_ave fields being empty ('\0') when the job's cpu count is zero, which makes calculation of averages impossible. Bug 2782.
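A hedged sketch of the producer/consumer mismatch and the guards that avoid it; the function names and the TRES id are illustrative:

    #include <stdio.h>
    #include <string.h>

    /* Producer: only build the "ave" TRES string when a non-zero cpu
     * count exists to divide by; otherwise leave the field empty. */
    static void set_tres_usage_ave(char *out, size_t outlen,
                                   unsigned long long usage,
                                   unsigned cpu_cnt)
    {
        if (cpu_cnt == 0) {
            out[0] = '\0';    /* no average can be computed */
            return;
        }
        snprintf(out, outlen, "1=%llu", usage / cpu_cnt);
    }

    /* Consumer (e.g. sacct): skip empty strings instead of trying to
     * build a tres_list from '' and logging an error. */
    static int tres_str_usable(const char *tres_str)
    {
        return tres_str && tres_str[0] != '\0';
    }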
-
Alejandro Sanchez authored
Bug 5174.
-
Dan Barke authored
Remove 'nocreate', since having it would override the following option: 'create 640 slurm root'. Bug 5174.
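An illustrative logrotate stanza showing the interaction (the path and the other directives are assumptions): 'nocreate' disables recreation of the log file after rotation, so it silently cancels a later 'create' line.

    /var/log/slurm/slurmctld.log {
        missingok
        compress
        # 'nocreate' removed: it would override the directive below,
        # leaving rotated logs without these permissions/ownership.
        create 640 slurm root
    }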
-
Morris Jette authored
Add node_features plugin function "node_features_p_reboot_weight()" to return the node weight to be used for a compute node that requires reboot for use (e.g. to change the NUMA mode of a KNL node). Add NodeRebootWeight parameter to knl.conf configuration file.
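An illustrative knl.conf fragment (the value is an assumption; a large weight makes the scheduler prefer nodes that are usable without a reboot, since lower-weight nodes are selected first):

    # knl.conf
    NodeRebootWeight=65533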
-
Morris Jette authored
If ReturnToService=2 is configured, the test could generate an error changing node state to resume after setting it to down. The reason is that if the node communicates with slurmctld, its state will automatically be changed from down to idle, and resuming an idle node triggers an error.
-
Alejandro Sanchez authored
Bug 5168.
-
Alejandro Sanchez authored
Previously the default paths continued to be tested even when new ones were requested. As a consequence, if any of the new paths was the same as one of the default ones (e.g. /usr or /usr/local), the configure script incorrectly errored out, reporting that a version of PMIx had already been found in a previous path. Bug 5168.
-
Morris Jette authored
Variable initialization plus cosmetic work
-
Morris Jette authored
Rename gres_per_job for steps to gres_per_step.
Remove the job gres gres_name_type_id field.
Build the step gres data structure.
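A rough sketch of what the per-step GRES state could hold after this change; the fields beyond gres_per_step are guesses for illustration, mirroring the per-node/socket/task dimensions used elsewhere in this series:

    #include <stdint.h>

    /* Illustrative step GRES record; not the actual slurmctld layout. */
    typedef struct step_gres_state {
        uint32_t plugin_id;       /* which gres plugin, e.g. gpu */
        uint64_t gres_per_step;   /* renamed from gres_per_job   */
        uint64_t gres_per_node;
        uint64_t gres_per_socket;
        uint64_t gres_per_task;
    } step_gres_state_t;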
-
- 14 May, 2018 2 commits
-
Morris Jette authored
Prevent runaway jobs
-
Morris Jette authored
-
- 11 May, 2018 7 commits
-
Morris Jette authored
-
Morris Jette authored
This is not currently supported and no date for support has been set.
-
Morris Jette authored
If burst_buffer.conf has GetSysState configured to a non-standard location but GetSysStatus is not configured, that is likely indicative of a bad configuration rather than a Slurm failure.
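A hedged sketch of that sanity check; the default path is an assumption and the function is illustrative, not the actual burst_buffer plugin code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define DEFAULT_GET_SYS_STATE "/opt/cray/dw_wlm/default/bin/dw_wlm_cli"

    /* Treat "GetSysState customized but GetSysStatus missing" as a
     * configuration error rather than a Slurm failure. */
    static void check_bb_config(const char *get_sys_state,
                                const char *get_sys_status)
    {
        if (get_sys_state && !get_sys_status &&
            strcmp(get_sys_state, DEFAULT_GET_SYS_STATE)) {
            fprintf(stderr, "fatal: GetSysState is non-standard but "
                    "GetSysStatus is unset; check burst_buffer.conf\n");
            exit(1);
        }
    }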
-
Morris Jette authored
Gracefully fail if salloc does not get a job allocation
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
Introduced in bf4cb0b1.
-
Danny Auble authored
-
- 10 May, 2018 8 commits
-
Tim Wickberg authored
Support for AIX was removed before 17.02.
-
Morris Jette authored
-
Morris Jette authored
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
First issue was identified on multi partition requests. job_limits_check() was overriding the original memory requests, so the next partition Slurm validating limits against was not using the original values. The solution consists in adding three members to job_details struct to preserve the original requests. This issue is reported in bug 4895. Second issue was memory enforcement behavior being different depending on job the request issued against a reservation or not. Third issue had to do with the automatic adjustments Slurm did underneath when the memory request exceeded the limit. These adjustments included increasing pn_min_cpus (even incorrectly beyond the number of cpus available on the nodes) or different tricks increasing cpus_per_task and decreasing mem_per_cpu. Fourth issue was identified when requesting the special case of 0 memory, which was handled inside the select plugin after the partition validations and thus that could be used to incorrectly bypass the limits. Issues 2-4 were identified in bug 4976. Patch also includes an entire refactor on how and when job memory is is both set to default values (if not requested initially) and how and when limits are validated. Co-authored-by: Dominik Bartkiewicz <bart@schedmd.com>
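A minimal sketch of the first fix's idea, with invented names: record the user's original request once, and restore it before each partition's limit check so one partition's rewrites cannot taint the next.

    #include <stdbool.h>
    #include <stdint.h>

    struct part_record;   /* opaque partition record */

    /* Illustrative subset of job_details; the orig_* members stand in
     * for the three preserved-request fields the patch adds. */
    struct job_details {
        uint64_t pn_min_memory;        /* working value, may be adjusted */
        uint16_t pn_min_cpus;
        uint16_t cpus_per_task;
        uint64_t orig_pn_min_memory;   /* set once at submit time */
        uint16_t orig_pn_min_cpus;
        uint16_t orig_cpus_per_task;
    };

    static bool part_limits_ok(struct job_details *d, struct part_record *p)
    {
        (void) d; (void) p;
        return true;                   /* stub for the real limit check */
    }

    /* Validate every requested partition against the original values. */
    static bool check_all_parts(struct job_details *d,
                                struct part_record **parts, int nparts)
    {
        for (int i = 0; i < nparts; i++) {
            d->pn_min_memory = d->orig_pn_min_memory;
            d->pn_min_cpus   = d->orig_pn_min_cpus;
            d->cpus_per_task = d->orig_cpus_per_task;
            if (part_limits_ok(d, parts[i]))
                return true;           /* some partition accepts the job */
        }
        return false;
    }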
-
Danny Auble authored
The slurmctld doesn't need to send the fini message, and in fact if it does, things get messed up, as the slurmdbd will close the database connection prematurely. Until now we would print an error on the slurmctld saying we couldn't send the FINI.
-
Danny Auble authored
If a partition is removed and then the slurmdbd comes up, we go refresh the TRES pointers and try to dereference the stale part_ptr. Related to commit de7eac9a. Bug 5136.
-
Danny Auble authored
Move the agent into the accounting_storage/slurmdbd plugin. This should be cleaner going forward and will be easier to maintain.
-