  1. 16 May, 2018 6 commits
    • Add reboot node weight parameter · 5ca16284
      Morris Jette authored
      Add node_features plugin function "node_features_p_reboot_weight()" to
         return the node weight to be used for a compute node that requires a reboot
         before use (e.g. to change the NUMA mode of a KNL node).
      Add NodeRebootWeight parameter to knl.conf configuration file.
    • Make a test more robust · b3c73b97
      Morris Jette authored
      If ReturnToService=2 is configured, the test could generate an error
      when changing the node state to resume after setting it to down. The
      reason is that if the node communicates with slurmctld, its state will
      automatically be changed from down to idle, and resuming an idle
      node triggers an error.
    • Run autogen.sh after previous commit. · 525125b1
      Alejandro Sanchez authored
      Bug 5168.
    • PMIx - override default paths at configure time if --with-pmix is used. · 0b7bbc73
      Alejandro Sanchez authored
      Previously the default paths continued to be tested even when new ones
      were requested. As a consequence, if any of the new paths was the same
      as one of the default ones (i.e. /usr or /usr/local), the configure
      script incorrectly errored out, reporting that a version of PMIx had
      already been found in a previous path.
      
      Bug 5168.
    • Fixes for test7.17 · 59be3d83
      Morris Jette authored
      Variable initialization plus cosmetic work
    • Add some tres-step related logic · 22bf408d
      Morris Jette authored
      Rename gres_per_job for step to gres_per_step
      Remove job gres gres_name_type_id field
      Build step gres data structure
  2. 14 May, 2018 2 commits
  3. 11 May, 2018 7 commits
  4. 10 May, 2018 8 commits
    • Remove AIX pieces from testsuite. · 5cfbd15d
      Tim Wickberg authored
      Support for AIX was removed before 17.02.
    • Merge branch 'slurm-17.11' · fa40dbd6
      Morris Jette authored
    • dc7ca7be
      Morris Jette authored
    • Merge branch 'slurm-17.11' · 1ab63842
      Alejandro Sanchez authored
    • Fix different issues when requesting memory per cpu/node. · bf4cb0b1
      Alejandro Sanchez authored
      The first issue was identified on multi-partition requests. job_limits_check()
      was overriding the original memory requests, so the next partition
      Slurm validated limits against was not using the original values. The
      solution consists of adding three members to the job_details struct to
      preserve the original requests. This issue is reported in bug 4895.
      
      The second issue was that memory enforcement behaved differently
      depending on whether or not the job request was issued against a reservation.
      
      The third issue had to do with the automatic adjustments Slurm made underneath
      when the memory request exceeded the limit. These adjustments included
      increasing pn_min_cpus (sometimes incorrectly beyond the number of CPUs
      available on the nodes) and various tricks that increased cpus_per_task
      and decreased mem_per_cpu.
      
      The fourth issue was identified when requesting the special case of 0 memory,
      which was handled inside the select plugin after the partition validations
      and thus could be used to incorrectly bypass the limits.
      
      Issues 2-4 were identified in bug 4976.
      
      The patch also includes a complete refactor of how and when job memory
      is set to default values (if not requested initially) and how and
      when limits are validated.
      
      Co-authored-by: Dominik Bartkiewicz <bart@schedmd.com>
    • Fix erroneous error on slurmctld shutdown. · 924fa548
      Danny Auble authored
      The slurmctld doesn't need to send the FINI message; in fact, if it
      does, things get messed up because the slurmdbd closes the database
      connection prematurely. Until now, the slurmctld would print an error
      saying it couldn't send the FINI.
    • Fix segfault if running assoc mgr cache and a job finishes and the · 10169937
      Danny Auble authored
      partition is removed, then the slurmdbd comes up and we go to refresh
      the TRES pointers and try to dereference the part_ptr.
      
      Related to commit de7eac9a.
      
      Bug 5136
    • Split up src/common/slurmdbd_defs into slurmdbd_[defs|pack] · b1ff4342
      Danny Auble authored
      and move the agent into the accounting_storage/slurmdbd plugin.
      
      This should be cleaner going forward and will be easier to maintain.
  5. 09 May, 2018 17 commits