1. 20 Jan, 2016 4 commits
    • Merge branch 'slurm-15.08' · dee2d99d
      Morris Jette authored
    • Prevent job_cnt_run · f76586bf
      Morris Jette authored
      It was previously triggered by executing "scontrol reconfig" on a
        front-end system while there was a job in completing state.
    • Properly track resources for suspended jobs on reconfig · 21c52d2f
      Morris Jette authored
      Properly account for memory, CPUs and GRES when slurmctld is reconfigured
          while there is a suspended job. Previous logic would add the CPUs, but not
          memory or GPUs. This would result in underflow/overflow errors in the
          select/cons_res plugin.
      bug 2353
    • Correct handling of front-end running job count · e5a61746
      Morris Jette authored
      The counter is really intended to reflect the count of running or
        suspended jobs rather than running jobs alone. Previous logic
        would report an underflow for the "job_cnt_run" variable if
        1. job submitted
        2. job suspended
        3. scontrol reconfig
        4. job cancelled
  2. 19 Jan, 2016 3 commits
    • Improve select/cons_res logging · 82f61b0d
      Morris Jette authored
      Log the length of bitmaps in addition to the bits set.
      Also increase the string length used for logging.
    • Fix for socket allocations and specialized cores · a260397a
      Morris Jette authored
      Previous logic would prevent allocation of sockets to a job unless the
      entire socket was available. If there were any specialized cores, the
      socket was treated as unavailable and unusable. For example,
      if a node had 2 sockets, then a job requesting 2 specialized cores
      would reserve one core on each of the two sockets and render the job
      unable to run.
    • Remove redundant sinfo logic · 5e08b4d1
      Morris Jette authored
      There was logic in sinfo's print state function that determined
      if the state was MIXED. This logic duplicated logic in the
      _query_server() function in sinfo.c and has been removed. Also
      note that the logic was already gone from the "short state" print
      function (I noticed the discrepancy between the print functions,
      but discovered they both printed the correct state information).
  3. 18 Jan, 2016 2 commits
  4. 17 Jan, 2016 1 commit
  5. 16 Jan, 2016 2 commits
  6. 15 Jan, 2016 13 commits
  7. 14 Jan, 2016 8 commits
    • Merge branch 'slurm-14.11' into slurm-15.08 · c17396d7
      Morris Jette authored
    • fix AuthInfo with alternate munge socket location · f3d54f99
      Morris Jette authored
      Fix for configuration of "AuthType=munge" and "AuthInfo=socket=..." with
          alternate munge socket path.
      bug 2348
    • Fix for Partition access control · fc1a68af
      Morris Jette authored
      Previously, if partition limits enforcement was not configured,
        then a job submitted to a partition it could not access (for
        example, due to AllowGroups, AllowUsers, etc.) would not be
        rejected, but would be allocated resources and run. This bug was
        introduced in commit edf3880c.
    • Fix for leak in gid cache logic · 254fa751
      Morris Jette authored
    • Cosmetic changes to gid cache work · a6daf947
      Morris Jette authored
    • Rework group caching to work better in environments with enumeration disabled. · 48a4cdf8
      Janne Blomqvist authored
      The initgroups()/getgrouplist() caching in slurmd is changed to not require enumeration; instead, individual entries are cached when first needed. This cache is always enabled, so the CacheGroups configuration setting has been removed. The time that each cache entry is considered valid is determined by the GroupUpdateTime configuration parameter, and "scontrol reconfig" purges the cache. The default value of the GroupUpdateForce configuration parameter has changed, since systems where /etc/group contains all the groups, rather than an external system such as NIS or LDAP, are nowadays probably the exception rather than the rule.
      
      For slurmctld, the group cache still uses enumeration, but this is needed only to handle special situations such as multiple groups with the same GID. With enumeration disabled, group caching otherwise still works. validate_groups() does a little more optional work in order to handle the case where the user's primary group is in the AllowGroups list but getgrnam_r() does not return that user as a group member.
      
      bug 1629
    • Avoid slurmstepd abort if malloc fails for accounting · 360fb080
      Morris Jette authored
      If a node is out of memory, then the malloc performed by slurmstepd
        periodically may fail, killing the slurmstepd and orphaning its
        processes.
      bug 2341
    • Avoid slurmstepd abort if malloc fails for accounting · d5400aa5
      Morris Jette authored
      If a node is out of memory, then the malloc performed by slurmstepd
        periodically may fail, killing the slurmstepd and orphaning its
        processes.
      bug 2341
  8. 13 Jan, 2016 4 commits
    • Merge branch 'slurm-15.08' · 3ad0a2d6
      Morris Jette authored
    • backfill scheduling with group limits fix · 3ee1632f
      Morris Jette authored
      Backfill scheduling fix: If a job can't be started due to a "group" resource
          limit, then rather than reserving resources for it when the next job ends,
          don't reserve any resources for it. The problem with the original logic is
          that if a lot of resources are reserved for such pending jobs, then jobs
          further down the queue may be deferred when they really can and should be
          started. An ideal solution would track all of the TRES resources through
          time as jobs start and end, but we don't have that logic in the backfill
          scheduler and don't want that extra overhead there.
      bugs 2326 and 2282
    • Add more partition info to "scontrol write config" · f428705b
      Alejandro Sanchez authored
      bug 2303
    • Improve slurmstepd logging · f774442e
      Morris Jette authored
  9. 12 Jan, 2016 3 commits
    • Merge branch 'slurm-15.08' · 3a575902
      Tim Wickberg authored
      Conflicts:
      	src/api/partition_info.c
    • Cleanup slurm_sprint_partition_info / slurm_sprint_reservation_info · 581e811c
      Tim Wickberg authored
      Handle unexpectedly large lines for hostlists. (Bug 2333.)
      
      While here, rework to avoid extraneous xstrcat calls by using
      xstrfmtcat instead of snprintf + xstrcat. Collapse the line ending into
      its own string for readability.
      
      No performance or functional change, aside from removing possible
      line truncation (which will silence additional Coverity warnings).
      
      Also removes a double xfree() in slurm_sprint_reservation_info().
    • Compress reservation host names · 30a8150c
      Morris Jette authored
      When a reservation is created or updated, compress user provided node names
          using hostlist functions (e.g. translate user input of "Nodes=tux1,tux2"
          into "Nodes=tux[1-2]").
      bug 2333