- 03 Apr, 2011 3 commits
-
-
Moe Jette authored
This audits the select/cray code so that it does not accidentally dereference a NULL job_ptr. This instance happens once, upon restart of slurmctld (detailed description below). Similar checks are already in place in other select plugins, and in any case it is better to check this. Almost all cases use xassert(); the only exception is p_job_fini(), which assumes NULL means there is nothing to be finalized.
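The two guard styles described above can be sketched as follows. This is an illustrative sketch, not the actual plugin code: the struct and function names are hypothetical stand-ins, and plain assert() is used where SLURM uses its xassert() macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the real SLURM job record; illustration only. */
struct job_record { int job_id; };

/* Most entry points treat a NULL job_ptr as a caller bug and assert,
 * mirroring the xassert() guards described in the commit message. */
static int job_begin(struct job_record *job_ptr)
{
	assert(job_ptr != NULL);	/* SLURM uses xassert(); plain assert here */
	return job_ptr->job_id;
}

/* The finalizer is the one exception: NULL means "nothing to finalize". */
static int job_fini(struct job_record *job_ptr)
{
	if (job_ptr == NULL)
		return 0;		/* nothing to do */
	printf("finalizing job %d\n", job_ptr->job_id);
	return 1;
}
```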
-
Moe Jette authored
When running in multiple-slurmd mode, the actual hardware configuration reported by the slurmd is ignored, and internal entries (via register_front_ends()) just use 1 as a dummy value for CPUs, sockets, cores, and threads. On a dual-core service node this led to continual warning messages like

[2011-04-01T10:06:40] Node configuration differs from hardware Procs=1:2(hw) Sockets=1:1(hw) CoresPerSocket=1:2(hw) ThreadsPerCore=1:1(hw)
[2011-04-01T10:07:24] Node configuration differs from hardware Procs=1:2(hw) Sockets=1:1(hw) CoresPerSocket=1:2(hw) ThreadsPerCore=1:1(hw)

Since validate_nodes_via_front_end() ignores the reported values, it is safe to use the actual hardware configuration here, which also helps with taking stock of the current cluster configuration (e.g. via scontrol show slurmd). After applying this patch, the slurmds report without warnings as

[2011-04-01T12:03:38] slurmd version 2.3.0-pre4 started
[2011-04-01T12:03:38] slurmd started on Fri 01 Apr 2011 12:03:38 +0200
[2011-04-01T12:03:38] Procs=2 Sockets=1 Cores=2 Threads=1 Memory=3886 TmpDisk=1943 Uptime=14355
-
Moe Jette authored
This caused segfaults/core dumps when the slurmd/slurmctld unloaded the select/cray plugin.
-
- 02 Apr, 2011 1 commit
-
-
Don Lipari authored
-
- 01 Apr, 2011 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 31 Mar, 2011 10 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Moe Jette authored
Also add the function to sview. log_init() is now one of the first statements for all commands and daemons.
-
Moe Jette authored
-
Moe Jette authored
be required.
-
Moe Jette authored
-
Moe Jette authored
This fixes a bug introduced when adding the configuration file parser. The cray_conf structure must always be created, since we are also using the plugin in the stepdmgr context. The observed symptoms were core dumps and the inability to run batch jobs (since trying to confirm the ALPS reservation with a NULL cray_conf->apbasil resulted in segfaults).
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 30 Mar, 2011 19 commits
-
-
Danny Auble authored
-
Danny Auble authored
BLUEGENE - fixed some issues where a block could mistakenly be freed in memory when it shouldn't have been.
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
longer time than necessary after restart.
-
Danny Auble authored
-
-
Danny Auble authored
BLUEGENE - Added back a lock when creating dynamic blocks to be more thread safe on larger systems with heavy load.
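The locking pattern this commit restores can be sketched as below. This is a minimal sketch, assuming a single mutex serializing dynamic block creation; the names (block_state_mutex, create_dynamic_block) are illustrative, not the actual SLURM symbols.

```c
#include <pthread.h>

/* Serialize dynamic block creation so that concurrent allocators on a
 * heavily loaded system cannot race while examining and modifying the
 * shared block state. Hypothetical names; illustration only. */
static pthread_mutex_t block_state_mutex = PTHREAD_MUTEX_INITIALIZER;
static int blocks_created = 0;

static int create_dynamic_block(void)
{
	pthread_mutex_lock(&block_state_mutex);
	/* ... examine and modify shared block lists here ... */
	int id = ++blocks_created;
	pthread_mutex_unlock(&block_state_mutex);
	return id;
}
```

Holding one coarse lock across the whole create path trades some parallelism for correctness, which matches the commit's stated goal of being "more thread safe on larger systems with heavy load".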
-
Moe Jette authored
-
Moe Jette authored
fuzzy_equal(qos_ptr->usage_thres, X). This deals with imprecision in the storage of a double, especially with respect to un/packing across machine architectures.
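A fuzzy double comparison of this kind can be sketched as below. The tolerance value is an assumption for illustration; SLURM's actual fuzzy_equal macro may use a different epsilon or form.

```c
#include <math.h>

/* Treat two doubles as equal if they differ by less than a small
 * epsilon, absorbing rounding drift introduced when values are
 * packed/unpacked across machine architectures. Epsilon is illustrative. */
#define FUZZY_EPSILON 0.00001

static int fuzzy_equal(double a, double b)
{
	return fabs(a - b) < FUZZY_EPSILON;
}
```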
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
Fix associations/QOS so that when adding back a previously deleted object, the object is cleared of all old limits.
-
Danny Auble authored
-
Moe Jette authored
Change #include <slurm/slurm.h> to #include "slurm/slurm.h" so that the original source tree gets searched first.
-
Don Lipari authored
https://eris.llnl.gov/svn/slurm/branches/cgroups_Matthieu/
-- Added proctrack/cgroup and task/cgroup plugins written by Matthieu Hautreux, CEA.
-
Moe Jette authored
-
- 29 Mar, 2011 3 commits
-
-
Moe Jette authored
The man page for slurm.conf, select/cons_res parameter SelectTypeParameters, values CR_Socket and CR_Socket_Memory states the following: "Note that jobs requesting one CPU will only be given access to that one CPU" I think this statement is incorrect, or at least very misleading to users. A job requesting one CPU will only be allocated one CPU, but unless task/affinity is enabled or some other CPU binding mechanism is used, the job can access all of the CPUs on the node. That is, a task that is distributed to the node can run on any of the CPUs on the node, not just on the one CPU that was allocated to its job. I propose the following patch to replace "given access to" with "allocated". Regards, Martin Perry
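The allocated-vs-accessible distinction drawn above can be observed directly from a task. The sketch below is Linux-specific and illustrative: without task/affinity or another binding mechanism, a process's CPU affinity mask typically covers every CPU on the node, even when its job was only allocated one CPU.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Count the CPUs this process is currently permitted to run on.
 * With no CPU binding in effect, this is all CPUs on the node,
 * regardless of how many CPUs the job was allocated. */
static int accessible_cpus(void)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
		return -1;	/* affinity query failed */
	return CPU_COUNT(&mask);
}
```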
-
Moe Jette authored
printed if memory is not allocated.
-