- 04 Apr, 2011 6 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
- 03 Apr, 2011 8 commits
-
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
options and not really tested, but this is a good start
-
Moe Jette authored
This avoids a warning message that is repeated on each reconfiguration of slurm and is caused by a dangling group configuration in LDAP entries. The error occurs when traversing the secondary group members of a given group name in order to add them to a configured group. If these secondary group members have no valid login (e.g. disabled via LDAP configuration), the error is repeated on each reconfigure of slurm. The error is harmless: since the users have no valid login, they cannot log into the system anyway. I have raised the issue described below with our LDAP admin, but there was no reply (likely because it is not important enough). Since slurm is not a tool to debug the work of system administrators, and since the secondary group members cannot log in anyway, this patch replaces the error message with a comment; it leaves untouched the positive case in which secondary group members with a valid passwd/LDAP login entry are successfully added to a configured group.
Here is the case which gets repeated on our system, showing that each error message corresponds to a 'no such user' error when trying to look up the user id:
-----------------------------------------------------------------------------------------------
[2011-03-29T08:19:35] error: Could not find user baradmin in configured group csstaff
[2011-03-29T08:19:35] error: Could not find user mvalle in configured group csstaff
[2011-03-29T08:19:35] error: Could not find user puradm in configured group csstaff
[2011-03-29T08:19:35] error: Could not find user ggobbi in configured group csappli
[2011-03-29T08:19:35] error: Could not find user mvalle in configured group csappli
-----------------------------------------------------------------------------------------------
palu2:0 ~>getent group csstaff
csstaff:*:1000:baradmin,biddisco,jfavre,mvalle,puradm
palu2:0 ~>id baradmin
id: baradmin: No such user
palu2:1 ~>id mvalle
id: mvalle: No such user
palu2:1 ~>id puradm
id: puradm: No such user
==> The secondary group members 'biddisco' and 'jfavre' are ok, no warnings.
-----------------------------------------------------------------------------------------------
palu2:1 ~>getent group csappli
csappli:*:1010:ajocksch,alam,amangili,annaloro,biddisco,cordery,cponti,fgilles,ggobbi,grenker,jfavre,mgg,mvalle,nstring,piccinal,robinson,soumagne,tack,tadrian,uvaretto,wsawyer
palu2:0 ~>id ggobbi
id: ggobbi: No such user
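The behaviour described in this commit message (skip group members that have no valid login instead of reporting an error for each one) can be sketched roughly as follows. This is an illustrative sketch only, not the actual slurm code: `collect_member_uids`, `uid_lookup_fn` and `demo_lookup` are hypothetical names, the UIDs are invented, and the demo lookup merely stands in for a real passwd/LDAP query.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup callback: returns 0 and sets *uid if the user exists,
 * non-zero otherwise (the "No such user" case). */
typedef int (*uid_lookup_fn)(const char *name, unsigned int *uid);

/* Collect the UIDs of all group members that resolve to a valid login.
 * Members that do not resolve are skipped without raising an error. */
size_t collect_member_uids(const char *const *members, size_t n_members,
                           uid_lookup_fn lookup,
                           unsigned int *uids, size_t max_uids)
{
    size_t found = 0;

    assert(members != NULL && lookup != NULL && uids != NULL);
    for (size_t i = 0; i < n_members && found < max_uids; i++) {
        unsigned int uid;

        if (lookup(members[i], &uid) != 0)
            continue;   /* dangling entry: no valid login, not an error */
        uids[found++] = uid;
    }
    return found;
}

/* Assumption for illustration: only these two names have a login.
 * The UID values are made up. */
int demo_lookup(const char *name, unsigned int *uid)
{
    if (strcmp(name, "biddisco") == 0) { *uid = 1001; return 0; }
    if (strcmp(name, "jfavre") == 0)   { *uid = 1002; return 0; }
    return -1;
}
```

Passing the lookup as a callback keeps the sketch independent of the system's user database; a real implementation would query passwd/LDAP at that point.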
-
Moe Jette authored
When running in multiple-slurmd mode, the actual hardware configuration reported by the slurmd is ignored, and internal entries (via register_front_ends()) just use 1 as a dummy value for CPUs, sockets, cores, and threads. On a dual-core service node this led to continual warning messages like
[2011-04-01T10:06:40] Node configuration differs from hardware Procs=1:2(hw) Sockets=1:1(hw) CoresPerSocket=1:2(hw) ThreadsPerCore=1:1(hw)
[2011-04-01T10:07:24] Node configuration differs from hardware Procs=1:2(hw) Sockets=1:1(hw) CoresPerSocket=1:2(hw) ThreadsPerCore=1:1(hw)
-
Moe Jette authored
This audits the select/cray code so that it does not accidentally dereference a NULL job_ptr. This instance happens once, upon restart of slurmctld (detailed description below). Similar checks are also in place in other select plugins; in any case it is better to check this. Almost all cases use xassert(); the only exception is p_job_fini(), which assumes NULL means there is nothing to be finalized.
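The two guarding styles mentioned above can be illustrated with a minimal sketch. It is not the select/cray code: `struct job`, `job_begin` and `job_fini` are hypothetical stand-ins, and the standard `assert()` plays the role that xassert() has in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a job record; not slurm's actual structure. */
struct job {
    int id;
    int finalized;
};

/* Most entry points treat a NULL job pointer as a programming error
 * and assert on it before dereferencing. */
int job_begin(struct job *job_ptr)
{
    assert(job_ptr != NULL);
    job_ptr->finalized = 0;
    return 0;
}

/* The finalizer is the exception: NULL means there is nothing to
 * finalize, so it returns success without touching the pointer. */
int job_fini(struct job *job_ptr)
{
    if (job_ptr == NULL)
        return 0;
    job_ptr->finalized = 1;
    return 0;
}
```

With this shape, a call such as `job_fini(NULL)` is a harmless no-op, whereas `job_begin(NULL)` aborts in a debug build rather than dereferencing the pointer.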
-
Moe Jette authored
When running in multiple-slurmd mode, the actual hardware configuration reported by the slurmd is ignored, and internal entries (via register_front_ends()) just use 1 as a dummy value for CPUs, sockets, cores, and threads. On a dual-core service node this led to continual warning messages like
[2011-04-01T10:06:40] Node configuration differs from hardware Procs=1:2(hw) Sockets=1:1(hw) CoresPerSocket=1:2(hw) ThreadsPerCore=1:1(hw)
[2011-04-01T10:07:24] Node configuration differs from hardware Procs=1:2(hw) Sockets=1:1(hw) CoresPerSocket=1:2(hw) ThreadsPerCore=1:1(hw)
Since validate_nodes_via_front_end() ignores the reported values, it is safe to use the actual hardware configuration here, which also helps with taking stock of the current cluster configuration (e.g. via scontrol show slurmd). After applying this patch, the slurmds report without warnings as
[2011-04-01T12:03:38] slurmd version 2.3.0-pre4 started
[2011-04-01T12:03:38] slurmd started on Fri 01 Apr 2011 12:03:38 +0200
[2011-04-01T12:03:38] Procs=2 Sockets=1 Cores=2 Threads=1 Memory=3886 TmpDisk=1943 Uptime=14355
-
Moe Jette authored
This caused segfaults/core dumps when the slurmd/slurmctld unloaded the select/cray plugin.
-
- 02 Apr, 2011 1 commit
-
-
Don Lipari authored
-
- 01 Apr, 2011 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 31 Mar, 2011 10 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Moe Jette authored
Also add the function to sview. log_init() is now one of the first statements for all commands and daemons.
-
Moe Jette authored
-
Moe Jette authored
be required.
-
Moe Jette authored
-
Moe Jette authored
This fixes a bug introduced when adding the configuration file parser. The cray_conf structure must always be created, since we are also using the plugin in stepdmgr context. The observed symptoms were core dumps and the inability to run batch jobs (since trying to confirm the ALPS reservation with a NULL cray_conf->apbasil resulted in segfaults).
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 30 Mar, 2011 11 commits
-
-
Danny Auble authored
-
Danny Auble authored
BLUEGENE - fixed some issues where a block could mistakenly be freed in memory when it shouldn't have been.
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
longer time than necessary after restart.
-
Danny Auble authored
-