  1. 10 Mar, 2016 11 commits
    • 52f7256d
    • Merge branch 'slurm-15.08' · 87df5a43
      Morris Jette authored
      Conflicts:
      	NEWS
    • cray job requeue bug · 536c8451
      Morris Jette authored
      Fix Cray NHC spawning on job requeue. Previous logic would leave nodes
      allocated to a requeued job as non-usable on job termination.
      
      Specifically, each job has a "cleaning/cleaned" flag. Once a job
      terminates, the cleaning flag is set, then after the job node health
      check completes, the value gets set to cleaned. If the job is requeued,
      on its second (or subsequent) termination, the select/cray plugin
      is called to launch the NHC. The plugin sees that the "cleaned" flag
      is already set, logs:
      error: select_p_job_fini: Cleaned flag already set for job 1283858, this should never happen
      and returns, never launching the NHC. Since the termination of the
      job NHC triggers releasing job resources (CPUs, memory, and GRES),
      those resources are never released for use by other jobs.
      
      Bug 2384
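The flag lifecycle described in this commit message can be sketched as a small state machine. This is a minimal illustration, not the select/cray plugin code: `Job`, `job_fini`, `nhc_complete`, and `run_twice` are hypothetical names, and clearing the flag on requeue is an assumed remedy, since the listing does not show how the commit actually fixes the problem.

```python
from enum import Enum


class CleanState(Enum):
    NONE = "none"
    CLEANING = "cleaning"
    CLEANED = "cleaned"


class Job:
    """Hypothetical job record holding only the state needed for the sketch."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.clean_state = CleanState.NONE
        self.resources_held = False

    def start(self):
        self.resources_held = True

    def requeue(self, reset_flag):
        # Assumed remedy: clear the stale flag before the job runs again.
        if reset_flag:
            self.clean_state = CleanState.NONE


def job_fini(job):
    """Stand-in for the termination hook. Returns True if the NHC is launched."""
    if job.clean_state is CleanState.CLEANED:
        # Path described in the commit message: the flag is already set,
        # so the hook returns without launching the NHC.
        return False
    job.clean_state = CleanState.CLEANING
    return True


def nhc_complete(job):
    """NHC completion marks the job cleaned and releases its resources."""
    job.clean_state = CleanState.CLEANED
    job.resources_held = False


def run_twice(reset_flag):
    """Run, terminate, and requeue a job twice; report whether resources leak."""
    job = Job(1283858)
    for _ in range(2):
        job.start()
        if job_fini(job):
            nhc_complete(job)
        job.requeue(reset_flag)
    return job.resources_held
```

With `reset_flag=False` the second termination skips the NHC and the resources stay held, which mirrors the symptom in the commit message. With `reset_flag=True` they are released.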
    • Correctly parse nids in slurmconfgen_smw.py · e050806e
      David Gloe authored
      An error in slurmconfgen_smw.py caused it to parse the nic as the nid.
      On some systems those values differ, causing the generated slurm.conf file to
      be incorrect.
      
      Bug 2532.
    • Remove unneeded check introduced in 897c4b27 · 8072b2cb
      Tim Wickberg authored
      _set_collectors() already has a run_in_daemon("slurmd") that
      precludes this from being an issue.
    • Fix route/topology plugin to prevent segfault in sbcast. · 0dfc924c
      Bill Brophy authored
      route_p_split_hostlist was not thread-safe, and would cause
      one of several segfaults depending on where in the initialization
      code each thread was.
      
      Bug 2495.
    • Fix displayed value for RoutePlugin. · db8491f1
      Tim Wickberg authored
      Was incorrectly displaying "(null)" even when loaded successfully.
    • Add NEWS for commit 3bb2e602 · a0be0dc5
      Morris Jette authored
    • Cray Datawarp job requeue bug fix · 3bb2e602
      Morris Jette authored
      burst_buffer/cray plugin: Prevent a requeued job from being restarted while
      file stage-out is still in progress. Previous logic could restart the job
      without performing a new stage-in.
      Bug 2584, comment #45
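The rule stated in this commit message, do not restart a requeued job while its stage-out is still running, can be sketched as a simple gate in the start path. This is an illustration under assumptions: `BurstBufferJob`, `try_start`, `requeue`, and `stage_out_done` are hypothetical names and not part of the burst_buffer/cray plugin.

```python
from dataclasses import dataclass


@dataclass
class BurstBufferJob:
    """Hypothetical job record tracking only burst buffer staging state."""
    job_id: int
    stage_out_in_progress: bool = False
    staged_in: bool = False


def try_start(job):
    """Return True if the job may start now, staging in first if needed."""
    if job.stage_out_in_progress:
        # Gate: a requeued job waits until the previous stage-out finishes.
        return False
    if not job.staged_in:
        job.staged_in = True  # stand-in for performing a fresh stage-in
    return True


def requeue(job):
    """Requeue starts a stage-out and invalidates the earlier stage-in."""
    job.stage_out_in_progress = True
    job.staged_in = False


def stage_out_done(job):
    job.stage_out_in_progress = False
```

A job that is requeued stays blocked in `try_start` until `stage_out_done` is called, and its next start always goes through a new stage-in.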
    • Merge pull request #149 from supermanue/patch-1 · c54cffe5
      Morris Jette authored
      possible bug in smap Makefile
    • possible bug in smap Makefile · ddeddbfb
      Manuel Rodríguez-Pascual authored
      LIBS may already have a value, as described in ./configure --help:
      
      "Some influential environment variables:
      (...)
       LIBS        libraries to pass to the linker, e.g. -l<library>
      "
      The original assignment to LIBS overwrote this value. With this change, the linker receives both the user-defined flags and the NCURSES ones.
  2. 09 Mar, 2016 3 commits
    • cray job requeue bug · fec5e03b
      Morris Jette authored
      Fix Cray NHC spawning on job requeue. Previous logic would leave nodes
      allocated to a requeued job as non-usable on job termination.
      
      Specifically, each job has a "cleaning/cleaned" flag. Once a job
      terminates, the cleaning flag is set, then after the job node health
      check completes, the value gets set to cleaned. If the job is requeued,
      on its second (or subsequent) termination, the select/cray plugin
      is called to launch the NHC. The plugin sees that the "cleaned" flag
      is already set, logs:
      error: select_p_job_fini: Cleaned flag already set for job 1283858, this should never happen
      and returns, never launching the NHC. Since the termination of the
      job NHC triggers releasing job resources (CPUs, memory, and GRES),
      those resources are never released for use by other jobs.
      
      Bug 2384
    • Correctly parse nids in slurmconfgen_smw.py · 88ccc111
      David Gloe authored
      An error in slurmconfgen_smw.py caused it to parse the nic as the nid.
      On some systems those values differ, causing the generated slurm.conf file to
      be incorrect.
      
      Bug 2532.
    • sbcast default buffer size set to 8MB · a06452f2
      Morris Jette authored
      This matches the documentation
  3. 08 Mar, 2016 8 commits
  4. 07 Mar, 2016 4 commits
  5. 05 Mar, 2016 7 commits
  6. 04 Mar, 2016 7 commits