- 15 Jan, 2016 1 commit
-
-
Morris Jette authored
Fix for configuration of "AuthType=munge" and "AuthInfo=socket=..." with alternate munge socket path. bug 2348
-
- 14 Jan, 2016 2 commits
-
-
Janne Blomqvist authored
The initgroups()/getgrouplist() caching in slurmd is changed to not require enumeration; instead, individual entries are cached when first needed. This cache is always enabled, thus the CacheGroups configuration setting has been removed. The time that each cache entry is considered valid is determined by the GroupUpdateTime configuration parameter. scontrol reconfig will purge the cache. The default value for the GroupUpdateForce configuration parameter has changed, as systems where /etc/group contains all the groups, instead of some external system like NIS or LDAP, are nowadays probably the exception rather than the rule. For slurmctld, the group cache still uses enumeration, but this is needed only to take care of special situations like multiple groups with the same GID. With enumeration disabled, group caching still works otherwise. validate_groups() does a little more optional work in order to handle the case where the user p...
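The lazy, per-entry caching described above can be illustrated with a minimal Python sketch. It is not slurmd's implementation: the class name `LazyGroupCache`, the `lookup` callback (a stand-in for getgrouplist()), and the `ttl_seconds` argument (playing the role of GroupUpdateTime) are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple


class LazyGroupCache:
    """Hypothetical per-user group cache; not slurmd's actual data structure."""

    def __init__(self, lookup: Callable[[str], List[int]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup      # stand-in for getgrouplist()
        self._ttl = ttl_seconds    # plays the role of GroupUpdateTime
        self._clock = clock        # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, List[int]]] = {}

    def get(self, user: str) -> List[int]:
        """Return the user's GIDs, looking them up only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(user)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        gids = self._lookup(user)
        self._entries[user] = (now, gids)
        return gids

    def purge(self) -> None:
        """Drop every entry, analogous to what "scontrol reconfig" triggers."""
        self._entries.clear()
```

Because entries are filled only on first use, no enumeration of the whole group database is needed; an expired or purged entry simply costs one fresh lookup.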
-
Morris Jette authored
If a node is out of memory, then the malloc performed periodically by slurmstepd may fail, killing the slurmstepd and orphaning its processes. bug 2341
-
- 13 Jan, 2016 2 commits
-
-
Morris Jette authored
Backfill scheduling fix: If a job can't be started due to a "group" resource limit, rather than reserve resources for it when the next job ends, don't reserve any resources for it. The problem with the original logic is that if a lot of resources are reserved for such pending jobs, then jobs further down the queue may be deferred when they really can and should be started. An ideal solution would track all of the TRES resources through time as jobs start and end, but we don't have that logic in the backfill scheduler and don't want that extra overhead in the backfill scheduler. bugs 2326 and 2282
-
Alejandro Sanchez authored
bug 2303
-
- 12 Jan, 2016 5 commits
-
-
Tim Wickberg authored
Handle unexpectedly large lines for hostlists. (Bug 2333.) While here rework to avoid extraneous xstrcat calls by using xstrfmtcat instead of snprintf + xstrcat. Collapse line end into own string for readability. No performance or functional change, aside from removing possible line truncation (which will silence additional Coverity warnings). Removes a double xfree() in slurm_sprint_reservation_info().
-
Morris Jette authored
When a reservation is created or updated, compress user provided node names using hostlist functions (e.g. translate user input of "Nodes=tux1,tux2" into "Nodes=tux[1-2]"). bug 2333
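The kind of compression described above can be sketched in Python as follows. This is an illustration only, not Slurm's hostlist code: `compress_hostlist` is a hypothetical helper, and it assumes each name is a prefix followed by a single trailing number. Unlike the real hostlist functions, it does not merge names whose numeric suffixes differ in width (such as tux9 and tux10).

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Split a name into a non-numeric prefix and a trailing run of digits.
_NAME_RE = re.compile(r"^(.*?)(\d+)$")


def compress_hostlist(names: str) -> str:
    """Collapse a comma-separated node list into bracketed ranges (sketch)."""
    groups: Dict[Tuple[str, int], List[int]] = defaultdict(list)
    plain: List[str] = []
    for name in filter(None, (n.strip() for n in names.split(","))):
        m = _NAME_RE.match(name)
        if m:
            prefix, digits = m.groups()
            # Key on the digit width so zero padding is preserved.
            groups[(prefix, len(digits))].append(int(digits))
        else:
            plain.append(name)  # no numeric suffix: keep as-is

    parts = list(plain)
    for (prefix, width), nums in groups.items():
        nums = sorted(set(nums))
        ranges = []
        start = prev = nums[0]
        for n in nums[1:]:
            if n == prev + 1:       # extend the current consecutive run
                prev = n
                continue
            ranges.append((start, prev))
            start = prev = n
        ranges.append((start, prev))
        if len(ranges) == 1 and ranges[0][0] == ranges[0][1]:
            parts.append(f"{prefix}{ranges[0][0]:0{width}d}")
        else:
            body = ",".join(
                f"{a:0{width}d}" if a == b else f"{a:0{width}d}-{b:0{width}d}"
                for a, b in ranges)
            parts.append(f"{prefix}[{body}]")
    return ",".join(parts)
```

With this sketch, `compress_hostlist("tux1,tux2")` yields `tux[1-2]`, matching the example in the commit message.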
-
Tim Wickberg authored
Match behavior of other PBS-like resource managers. Bug 2330.
-
Alejandro Sanchez authored
-
Dorian Krause authored
Don't allow user-specified reservation names to disrupt the normal reservation sequence numbering scheme. bug 2318
-
- 11 Jan, 2016 6 commits
-
-
Danny Auble authored
-
Danny Auble authored
anything. The slurmd will process things correctly after the fact.
-
Morris Jette authored
The restriction from Cray has been lifted. bug 2317
-
Tim Wickberg authored
Otherwise upgrading slurm on a compute node while tasks are running will cause plugin mismatch, as slurmstepd would not load the library until task completion before. Bug 2319.
-
Nathan Yee authored
Bug 2228
-
Morris Jette authored
The restriction from Cray has been lifted. bug 2317
-
- 08 Jan, 2016 2 commits
-
-
Tim Wickberg authored
Update NEWS file for final removal of Sun Constellation, Elan, and IBM Federation (switch/nrt plugin replaces). Clean up documentation and few outstanding ifdef blocks. Unless you were defining HAVE_SUN_CONST there are no functional changes.
-
Tim Wickberg authored
Otherwise upgrading slurm on a compute node while tasks are running will cause plugin mismatch, as slurmstepd would not load the library until task completion before. Bug 2319.
-
- 07 Jan, 2016 7 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Tim Wickberg authored
Bug 2314.
-
Danny Auble authored
this happens anywhere in the code, but just in case it ever does, let's fix it.
-
Morris Jette authored
This can be caused by a core reservation on nodes which get taken out of the system or fail. bug 2296
-
Danny Auble authored
-
Morris Jette authored
Add "features_act" field (currently active features) to the node information. Output of scontrol, sinfo, and sview changed accordingly. The field previously displayed as "Features" is now "AvailableFeatures" while the new field is displayed as "ActiveFeatures".
-
- 06 Jan, 2016 8 commits
-
-
Danny Auble authored
-
Tim Wickberg authored
cnodes can be reserved directly since 14.11. The plugin itself printed warnings that it would be removed circa 15.08, following through before 16.05.
-
Brian Gilmer authored
Cray: Not running the Node Health Check after every job and step is now the default. Configure SelectTypeParameters with the NHC and/or NHC_STEP options to run them.
-
Tim Wickberg authored
salloc/sbatch/srun did not mention this. Also reference OverTimeLimit as another option affecting the final run time. Bug 2309.
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
the job starts update the cpus_per_task appropriately. This also moves update num_tasks to after the setting of node counts on an update. It didn't appear to matter, but the cpus_per_task and pn_min_cpus had to be figured out after the cpus and nodes were set but before tasks. Bug 2302
-
Morris Jette authored
Add an "scontrol top <jobid>" command to re-order the priorities of a user's pending jobs. May be disabled with the "disable_user_top" option in the SchedulerParameters configuration parameter. bug 1133
-
- 05 Jan, 2016 3 commits
-
-
Morris Jette authored
burst_buffer/cray - Improve tracking of allocated resources to handle race condition when reading state while buffer allocation is in progress. Also initialize a mutex.
-
Danny Auble authored
DBD for the first time. The corruption is only noticed at shutdown. Bug 2293
-
Morris Jette authored
-
- 04 Jan, 2016 4 commits
-
-
Morris Jette authored
Set job's reason to "Priority" when a higher priority job in that partition (or reservation) cannot start, rather than leaving the reason set to "Resources". bug 2285
-
Morris Jette authored
The partition-specific SelectTypeParameters parameter can now be used to change the memory allocation tracking specification in the global SelectTypeParameters configuration parameter. Supported partition-specific values are CR_Core, CR_Core_Memory, CR_Socket and CR_Socket_Memory. If the global SelectTypeParameters value includes memory allocation management and the partition-specific value does not, then memory allocation management for that partition will NOT be supported (i.e. memory can be over-allocated). Likewise the global SelectTypeParameters might not include memory management while the partition-specific value does. bug 2239
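The override rule described above can be restated as a small Python sketch. It is not Slurm code: `tracks_memory` and `_has_memory` are hypothetical helpers, and the sketch assumes that memory tracking is indicated by a value whose name ends in "Memory" and that a partition without its own value inherits the global setting.

```python
from typing import Optional

# Partition-specific values listed in the commit message.
PARTITION_VALUES = {"CR_Core", "CR_Core_Memory", "CR_Socket", "CR_Socket_Memory"}


def _has_memory(value: str) -> bool:
    """True if any comma-separated option names memory as a tracked resource."""
    return any(tok.strip().endswith("Memory") for tok in value.split(","))


def tracks_memory(global_value: str, partition_value: Optional[str] = None) -> bool:
    """Decide whether memory allocation is managed for a partition (sketch).

    Assumption: a partition with no value of its own inherits the global one;
    otherwise the partition-specific value alone decides.
    """
    if partition_value is None:
        return _has_memory(global_value)
    if partition_value not in PARTITION_VALUES:
        raise ValueError(f"unsupported partition value: {partition_value}")
    return _has_memory(partition_value)
```

For example, a global value of CR_Core_Memory combined with a partition value of CR_Core gives `False`, which corresponds to the case where memory can be over-allocated in that partition.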
-
Danny Auble authored
error message.
-
Danny Auble authored
-