1. 15 Jan, 2016 1 commit
  2. 14 Jan, 2016 5 commits
    • Fix for Partition access control · fc1a68af
      Morris Jette authored
      Previously, if partition limits enforcement was not configured,
        then a job submitted to a partition it could not access (say
        due to AllowGroups, AllowUsers, etc.) would not be rejected,
        but would be allocated resources and run. This bug was
        introduced in commit edf3880c.
    • Fix for leak in gid cache logic · 254fa751
      Morris Jette authored
    • Cosmetic changes to gid cache work · a6daf947
      Morris Jette authored
    • Rework group caching to work better in environments with enumeration disabled. · 48a4cdf8
      Janne Blomqvist authored
      The initgroups()/getgrouplist() caching in slurmd no longer requires enumeration; instead, individual entries are cached when first needed. This cache is always enabled, so the CacheGroups configuration setting has been removed. The GroupUpdateTime configuration parameter determines how long each cache entry is considered valid. scontrol reconfig will purge the cache. The default value of the GroupUpdateForce configuration parameter has changed, because systems where /etc/group contains all the groups, rather than an external system such as NIS or LDAP, are nowadays probably the exception rather than the rule.
      
      For slurmctld, the group cache still uses enumeration, but this is needed only to handle special situations such as multiple groups with the same GID. With enumeration disabled, group caching otherwise still works. validate_groups() does a little more optional work to handle the case where the user's primary group is in the AllowGroups list, but getgrnam_r() does not return that user as a group member.
      
      bug 1629
    • Avoid slurmstepd abort if malloc fails for accounting · 360fb080
      Morris Jette authored
      If a node is out of memory, then the malloc performed periodically by
        slurmstepd may fail, killing the slurmstepd and orphaning its
        processes.
      bug 2341
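
The per-entry group caching described in commit 48a4cdf8 can be illustrated with a small sketch. This is not Slurm's C implementation; it is a Python model of the idea only, and every name in it (`GroupCache`, `lookup`, `ttl_seconds`, `purge`) is hypothetical. It assumes that entries are fetched on first use instead of by enumerating all groups, that each entry expires after a fixed lifetime (standing in for GroupUpdateTime), and that a reconfigure drops everything.

```python
import time


class GroupCache:
    """Illustrative lazy cache: one entry per user, filled on first use."""

    def __init__(self, lookup, ttl_seconds, clock=time.monotonic):
        self._lookup = lookup      # hypothetical callable: user -> iterable of groups
        self._ttl = ttl_seconds    # stand-in for the GroupUpdateTime setting
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # user -> (expiry_time, groups)

    def get(self, user):
        now = self._clock()
        entry = self._entries.get(user)
        if entry is not None and entry[0] > now:
            return entry[1]        # still valid: no lookup needed
        # Missing or expired: resolve this one user only, no enumeration.
        groups = tuple(self._lookup(user))
        self._entries[user] = (now + self._ttl, groups)
        return groups

    def purge(self):
        """Drop all entries, as a reconfigure is described as doing."""
        self._entries.clear()
```

In this sketch the `lookup` callable is where a real group-list query would go; on a Unix system it could, for example, wrap Python's `os.getgrouplist`. Passing the clock in as a parameter is only a convenience for exercising the expiry path without waiting.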
  3. 13 Jan, 2016 4 commits
    • Merge branch 'slurm-15.08' · 3ad0a2d6
      Morris Jette authored
    • backfill scheduling with group limits fix · 3ee1632f
      Morris Jette authored
      Backfill scheduling fix: If a job can't be started due to a "group" resource
          limit, rather than reserve resources for it when the next job ends, don't
          reserve any resources for it. The problem with the original logic is that
          if a lot of resources are reserved for such pending jobs, then jobs further
          down the queue may be deferred when they really can and should be started. An
          ideal solution would track all of the TRES resources through time as jobs
          start and end, but we don't have that logic in the backfill scheduler and
          don't want that extra overhead in the backfill scheduler.
      bugs 2326 and 2282
    • Add more partition info to "scontrol write config" · f428705b
      Alejandro Sanchez authored
      bug 2303
    • Improve slurmstepd logging · f774442e
      Morris Jette authored
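
The effect of the backfill change in commit 3ee1632f can be shown with a toy model. This is a deliberately simplified Python sketch, not the backfill scheduler's actual logic: it collapses the time dimension into a single counter of reserved nodes, and the function name, the job dictionary keys, and the `reserve_for_group_limited` flag are all hypothetical. The flag exists only to compare the old behaviour (reserve for group-limited jobs) with the new one (skip them).

```python
def backfill(queue, free_nodes, reserve_for_group_limited=False):
    """Return names of jobs that start now, walking the queue in priority order.

    Each job is a dict with hypothetical keys:
      "name"          - identifier
      "nodes"         - nodes requested
      "group_limited" - True if a group resource limit currently blocks it
    """
    started = []
    reserved = 0  # nodes held back for higher-priority pending jobs
    for job in queue:
        if job["group_limited"]:
            # New behaviour: the job cannot start, and nothing is reserved.
            # Old behaviour (flag set): resources were still reserved for it.
            if reserve_for_group_limited:
                reserved += job["nodes"]
            continue
        if job["nodes"] <= free_nodes - reserved:
            started.append(job["name"])
            free_nodes -= job["nodes"]
        else:
            # An ordinary pending job keeps its reservation.
            reserved += job["nodes"]
    return started
```

With a group-limited 8-node job ahead of a runnable 4-node job and 4 free nodes, the old behaviour leaves the second job pending because the reservation consumes the free nodes, while the new behaviour lets it start.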
  4. 12 Jan, 2016 14 commits
  5. 11 Jan, 2016 16 commits