- 31 Mar, 2011 7 commits
-
-
Moe Jette authored
-
Moe Jette authored
be required.
-
Moe Jette authored
-
Moe Jette authored
This fixes a bug in adding the configuration file parser. The cray_conf structure must always be created, since we are also using the plugin in stepdmgr context. The observed symptoms were core dumps and the inability to run batch jobs (since trying to confirm the ALPS reservation with a NULL cray_conf->apbasil resulted in segfaults).
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 30 Mar, 2011 19 commits
-
-
Danny Auble authored
-
Danny Auble authored
BLUEGENE - fixed some issues where a block could mistakenly be freed in memory when it shouldn't have been.
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
longer time than necessary after restart.
-
Danny Auble authored
-
-
Danny Auble authored
BLUEGENE - Added back a lock when creating dynamic blocks to be more thread safe on larger systems with heavy load.
-
Moe Jette authored
-
Moe Jette authored
fuzzy_equal(qos_ptr->usage_thres, X). This deals with imprecision in storage for a double, especially with respect to un/pack across machine architectures.
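The tolerance-based comparison mentioned above can be sketched as follows. This is a minimal illustration, not the macro from the SLURM source: the name `example_fuzzy_equal` and the tolerance `EXAMPLE_EPSILON` are assumptions chosen for the example.

```c
#include <stdbool.h>

/* Hypothetical tolerance, chosen for illustration only. */
#define EXAMPLE_EPSILON 1e-5

/*
 * Hypothetical helper: treat two doubles as equal when they differ by
 * less than EXAMPLE_EPSILON. A value that went through pack/unpack on
 * another architecture may differ in its last bits, so an exact ==
 * comparison could fail where this one succeeds.
 */
static bool example_fuzzy_equal(double a, double b)
{
    double diff = a - b;

    return diff > -EXAMPLE_EPSILON && diff < EXAMPLE_EPSILON;
}
```

With this helper, `example_fuzzy_equal(0.1 + 0.2, 0.3)` is true even though the two sides are not bit-identical doubles.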
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
Fix associations/qos so that when a previously deleted object is added back, it is cleared of all old limits.
-
Danny Auble authored
-
Moe Jette authored
#include <slurm/slurm.h> to #include "slurm/slurm.h" so that the original source gets searched first
-
https://eris.llnl.gov/svn/slurm/branches/cgroups_Matthieu/
Don Lipari authored
-- Added proctrack/cgroup and task/cgroup plugins written by Matthieu Hautreux, CEA.
-
Moe Jette authored
-
- 29 Mar, 2011 14 commits
-
-
-
Moe Jette authored
The man page for slurm.conf, select/cons_res parameter SelectTypeParameters, values CR_Socket and CR_Socket_Memory states the following:

"Note that jobs requesting one CPU will only be given access to that one CPU"

I think this statement is incorrect, or at least very misleading to users. A job requesting one CPU will only be allocated one CPU, but unless task/affinity is enabled or some other CPU binding mechanism is used, the job can access all of the CPUs on the node. That is, a task that is distributed to the node can run on any of the CPUs on the node, not just on the one CPU that was allocated to its job. I propose the following patch to replace "given access to" with "allocated".

Regards,
Martin Perry
-
Moe Jette authored
printed if memory is not allocated.
-
Danny Auble authored
-
Moe Jette authored
by behaving like slurmctld (truncation of double value) and rounding double-valued components otherwise. I have tested this and observed that it improves the accuracy.

priority/multifactor: minimize rounding errors

This fixes a rounding problem introduced in an earlier patch, 26_PRIO_print-negative-sprio.diff "sprio: print overall priority value even if it is less than 0", and minimizes other sources of rounding errors in the computation of floating-point sprio factors.

Summary of issues fixed by this patch:
--------------------------------------
* when assembling the job_ptr->priority (the squeue -o %Q output), truncation happens when converting from double to uint32_t (fractions are discarded);
* the priority components are all double-valued, hence it would minimize accumulation of rounding errors to display rounded values (using %.0f);
* these values are displayed using _print_int(), for all integral values passed to this function, there is no change in the output.

Example showing the minimization of rounding errors:
----------------------------------------------------
-> The difference is visible when comparing the `priority' value with the sum (age + jobsize + partition - nice), rounding the factors ('after' result) improves the accuracy.

Before:
palu ~> sprio
JOBID  USER      PRIORITY  AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS  NICE
11698  vondele   14526     113  0          4289     10000      0    -123
11711  vondele   14495     81   0          4289     10000      0    -123
11712  sukysj    11248     80   0          236      10000      0    -931
11728  piccinal  20065     7    0          56       10000      0    -10000
11740  piccinal  20122     7    0          113      10000      0    -10000
11742  piccinal  20349     7    0          340      10000      0    -10000

After:
palu build> ./sprio -l
JOBID  USER      PRIORITY  AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS  NICE
11698  vondele   14526     113  0          4290     10000      0    -123
11711  vondele   14495     82   0          4290     10000      0    -123
11712  sukysj    11248     80   0          237      10000      0    -931
11728  piccinal  20065     8    0          57       10000      0    -10000
11740  piccinal  20122     8    0          114      10000      0    -10000
11742  piccinal  20349     8    0          341      10000      0    -10000

Other changes:
--------------
* declared _print_{int,norm} static, since only referenced in print.c.
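The difference between truncating a double-valued factor and displaying it rounded with %.0f can be sketched as below. This is only an illustration of the two conversions named in the message above; the helper names and the sample value 4289.6 are assumptions, not code or data from the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: conversion to uint32_t discards the fraction. */
static uint32_t example_truncate(double factor)
{
    return (uint32_t) factor;
}

/*
 * Hypothetical helper: "%.0f" prints the value rounded to the nearest
 * integer instead of cutting off the fraction. Returns what snprintf()
 * returns, i.e. the number of characters needed.
 */
static int example_format_rounded(char *buf, size_t len, double factor)
{
    return snprintf(buf, len, "%.0f", factor);
}
```

For an assumed factor of 4289.6, `example_truncate()` yields 4289 while `example_format_rounded()` writes "4290", which is the kind of one-off difference visible in the JOBSIZE column above.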
-
Moe Jette authored
-
Moe Jette authored
We have installed the same file this morning on all our systems (including a non-Cray cluster which also is SuSe based). I have verified that the limits get picked up by looking at /proc/$(pidof slurmd)/limits.

select/cray: override ulimits on SuSe based system

This provides a sample /etc/sysconfig/slurm file to override ulimits on Suse systems such as Cray. Since slurm respects limits configured by the system administrator, and since Cray/SuSe systems (in contrast to Debian-based systems) do not automatically exempt processes owned by the super-user from pam_limits configured in /etc/security/limits.conf, it can (and did) happen on Cray systems that such limits cause premature and counter-intuitive interaction with slurmd frontend nodes.

The provided file overrides limits, using sensible defaults which have been inspired by the defaults set for processes owned by user root
-
Moe Jette authored
did this in preparation for the migration from PBS which will start next week.

sbatch: support mpp.* PBS variants

This adds support for Cray-specific PBS directives:
* mppwidth: Task width (corresponds to --ntasks). This is not directly mapped; it depends on the other parameters.
* mppmem: Memory in units of k/m/g. The default unit is Mbyte; kbyte values are rounded up to the next Mbyte. The actual amount depends on mppnppn.
* mppdepth: Task depth, maps into --cpus-per-task.
* mppnppn: Processing elements per node, maps into --ntasks-per-node.
* mppnodes: Nodelist. In contrast to PBS, requires the nid%05u prefix, i.e. the comma-separated list contains single entries nid%05u and/or ranges nid%05u-nid%05u.
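The mppmem unit rule described above (default unit Mbyte, kbyte values rounded up to the next Mbyte) could look roughly like this. It is a hypothetical sketch, not the parser added to sbatch: the function name, the error convention, and the factors of 1024 between units are assumptions.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: convert an mppmem-style string such as "512",
 * "1500k", "2m" or "4g" into Mbytes. Returns -1 for malformed input.
 * Assumes 1 Mbyte = 1024 kbyte and 1 Gbyte = 1024 Mbyte; overflow of
 * very large values is not checked in this sketch.
 */
static long example_mppmem_to_mb(const char *str)
{
    char *end;
    unsigned long value;

    if (str == NULL || str[0] < '0' || str[0] > '9')
        return -1;

    value = strtoul(str, &end, 10);

    switch (*end) {
    case '\0':              /* no suffix: the default unit is Mbyte */
        return (long) value;
    case 'm': case 'M':
        break;
    case 'k': case 'K':     /* round up to the next full Mbyte */
        value = (value + 1023) / 1024;
        break;
    case 'g': case 'G':
        value *= 1024;
        break;
    default:
        return -1;
    }

    /* Reject trailing characters after the unit suffix. */
    if (end[1] != '\0')
        return -1;

    return (long) value;
}
```

Under these assumptions "1500k" becomes 2 Mbyte and "4g" becomes 4096 Mbyte, while a bare "512" is taken as 512 Mbyte.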
-
Moe Jette authored
value (due to uint32_t conversion) in sprio. Helpful when fine-tuning weight parameters.

sprio: print overall priority value even if it is less than 0

With some combinations of component values and low weight factors, it can happen that the priority computed by the priority/multifactor plugin lies below 0 (and would be rounded up to 1). When this condition happens, the negative values are difficult to interpret and can give the wrong impression that the resulting priority is very large (due to the conversion into a large unsigned number).

In our tests we found it more helpful to display the negative priority value: a user can know that SLURM does not use negative values, and having the absolute value gives a better indication of how much weight to add to the other factors so that the overall priority centers around 0.

Before:
JOBID  USER    PRIORITY    AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS  NICE
9968   sukysj  8955        218  0          236      0          0    -8500
10065  amsmax  4294957826  9    0          340      0          0    9821
10066  amsmax  4294957826  9    0          340      0          0    9821
10067  amsmax  4294957826  9    0          340      0          0    9821
10068  amsmax  4294957826  9    0          340      0          0    9821
10069  amsmax  4294957826  9    0          340      0          0    9821
10070  amsmax  4294957826  9    0          340      0          0    9821

After:
JOBID  USER    PRIORITY    AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS  NICE
9968   sukysj  8955        218  0          236      0          0    -8500
10065  amsmax  -9470       9    0          340      0          0    9821
10066  amsmax  -9470       9    0          340      0          0    9821
10067  amsmax  -9470       9    0          340      0          0    9821
10068  amsmax  -9470       9    0          340      0          0    9821
10069  amsmax  -9470       9    0          340      0          0    9821
10070  amsmax  -9470       9    0          340      0          0    9821
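The relation between the two PRIORITY values shown above (4294957826 before, -9470 after) is plain modulo-2^32 arithmetic. The sketch below illustrates only that arithmetic; the helper names are hypothetical and this is not the code sprio uses.

```c
#include <stdint.h>

/*
 * Hypothetical helper: converting a negative signed value to uint32_t
 * wraps modulo 2^32, which is how a small negative priority ends up
 * looking like a very large unsigned number.
 */
static uint32_t example_wrap_to_u32(int32_t priority)
{
    return (uint32_t) priority;
}

/*
 * Hypothetical helper: recover the signed value for display. Anything
 * above INT32_MAX is interpreted as a wrapped negative number.
 */
static int64_t example_unwrap_from_u32(uint32_t stored)
{
    if (stored > (uint32_t) INT32_MAX)
        return (int64_t) stored - ((int64_t) UINT32_MAX + 1);
    return (int64_t) stored;
}
```

Here `example_wrap_to_u32(-9470)` gives 4294957826, and `example_unwrap_from_u32(4294957826u)` gives back -9470, matching the before and after columns.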
-
Moe Jette authored
set directly (since the priority factor fields are 0).

priority/multifactor: skip jobs whose priority has been set directly

This avoids displaying "house numbers" in sprio if the priority has been set directly, as in the following example for aghasemi (whose group is a "bottom-feeder" with a fixed priority of 10):

palu> squeue
JOBID  USER      ACCOUNT  NAME       PARTITION  ST  REASON     START_TIME        TIME  TIME_LEFT  NODES  PRIORITY
6971   robinson  g13      cp2k       day        PD  Resources  2011-03-16T13:09  0:00  40:00      35     10327
6983   rpopescu  s190     bash       day        PD  Resources  N/A               0:00  1:00:00    1      8254
6958   aghasemi  s142     poslow007  day        PD  Priority   2011-03-16T15:28  0:00  1:00:00    108    10

palu> sprio
JOBID  USER      PRIORITY  AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS  NICE
6958   aghasemi  10000     0    0          0        0          0    -10000
6964   rpopescu  8353      71   0          56       0          0    -8225
6971   robinson  10327     63   0          1988     0          0    -8276
...
-
Moe Jette authored
this is ongoing, whenever I see something, I add it to such a patch.
-
Moe Jette authored
-
Danny Auble authored
-
Danny Auble authored
-