1. 09 Apr, 2014 2 commits
    • defer job scheduling in some cases · 2413c339
      Morris Jette authored
      Rather than immediately running the scheduling logic on every event
      that could enable a new job to start, queue its execution. This permits
      faster handling of some operations, such as modifying large numbers of
      jobs, by running the scheduling logic less frequently, but still in a
      timely fashion.
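      A minimal sketch of the deferral pattern described above (hypothetical
      code, not the actual slurmctld implementation): event handlers only set
      a flag, and a background thread runs the expensive scheduling pass at
      most once per interval, no matter how many triggering events arrived.

          #include <pthread.h>
          #include <stdbool.h>
          #include <unistd.h>

          static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER;
          static bool sched_pending = false;

          static void run_scheduler(void)
          {
                  /* stand-in for the real scheduling pass */
          }

          /* Called from every event that could start a new job (submission,
           * job completion, node coming back up, ...). Cheap: set a flag. */
          static void queue_schedule(void)
          {
                  pthread_mutex_lock(&sched_lock);
                  sched_pending = true;
                  pthread_mutex_unlock(&sched_lock);
          }

          /* Background thread: run the scheduler at most once per second. */
          static void *sched_agent(void *arg)
          {
                  (void) arg;
                  while (true) {
                          sleep(1);
                          pthread_mutex_lock(&sched_lock);
                          bool run_now = sched_pending;
                          sched_pending = false;
                          pthread_mutex_unlock(&sched_lock);
                          if (run_now)
                                  run_scheduler();
                  }
                  return NULL;
          }

          int main(void)
          {
                  pthread_t tid;
                  pthread_create(&tid, NULL, sched_agent, NULL);
                  queue_schedule();       /* simulate a job-submission event */
                  sleep(2);               /* give the agent a chance to run  */
                  return 0;
          }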
    • Fix issue with sinfo when -o is used without the %P option. · 39d94ae9
      Danny Auble authored
      With multiple partitions, the output of

      sinfo -o "%D %F"

      would produce unexpected, and rarely correct, results.
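      Note: %D prints the node count and %F the allocated/idle/other/total
      node-state breakdown, so without %P (the partition name) in the format
      there is no way to tell which partition each line of output refers to.
      An example format that keeps the lines distinguishable:

      sinfo -o "%P %D %F"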
  2. 08 Apr, 2014 5 commits
  3. 07 Apr, 2014 4 commits
  4. 05 Apr, 2014 1 commit
  5. 04 Apr, 2014 3 commits
  6. 03 Apr, 2014 5 commits
    • Fix issue where associations weren't correct if the backup takes control
      and new associations were added since it was started. · 9368ff2d
      Danny Auble authored
    • Permit root to raise hard limits · 9293cbf2
      Morris Jette authored
      Permit user root to propagate resource limits higher than the hard limit
      that slurmd has on that compute node (i.e. raise both the current and
      maximum limits).
      bug 674
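      An illustrative sketch of the underlying mechanism (not slurmd code):
      setrlimit() may raise a hard limit (rlim_max) only for a privileged
      process (root / CAP_SYS_RESOURCE); unprivileged processes can only
      lower it or move the soft limit up to the existing hard limit.

          #include <stdio.h>
          #include <sys/resource.h>

          /* Raise both the soft and hard open-file limits. Succeeds in
           * raising rlim_max above its current value only when privileged. */
          static int raise_nofile_limit(rlim_t new_limit)
          {
                  struct rlimit rl = {
                          .rlim_cur = new_limit,
                          .rlim_max = new_limit,
                  };
                  if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
                          perror("setrlimit");
                          return -1;
                  }
                  return 0;
          }

          int main(void)
          {
                  return raise_nofile_limit(8192) ? 1 : 0;
          }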
    • Defer scheduling for many batch jobs · 57fa06bb
      Morris Jette authored
      Permit multiple batch job submissions to be handled by a single run of
      the scheduler logic when the submissions occur at nearly the same time.
      bug 616
    • launch/poe - fix network value · 01fecf4d
      Morris Jette authored
      If a job step's network value is set by poe, either by directly
      executing poe or by srun launching poe, that value was not being
      propagated to the job step creation RPC and the network was not
      being set up for the proper protocol (e.g. MPI, LAPI, PAMI, etc.).
      The previous logic only worked if the srun command line explicitly
      set the protocol using the --network option.
    • Defer scheduling for many batch jobs · dd4aa1c3
      Morris Jette authored
      Permit multiple batch job submissions to be handled by a single run of
      the scheduler logic when the submissions occur at nearly the same time.
      bug 616
  7. 02 Apr, 2014 2 commits
    • Update NEWS and squeue man page. · 247c3ce0
      David Bigagli authored
    • launch/poe - fix network value · ad7100b8
      Morris Jette authored
      If a job step's network value is set by poe, either by directly
      executing poe or by srun launching poe, that value was not being
      propagated to the job step creation RPC and the network was not
      being set up for the proper protocol (e.g. MPI, LAPI, PAMI, etc.).
      The previous logic only worked if the srun command line explicitly
      set the protocol using the --network option.
  8. 31 Mar, 2014 2 commits
  9. 28 Mar, 2014 3 commits
  10. 27 Mar, 2014 3 commits
  11. 26 Mar, 2014 2 commits
  12. 25 Mar, 2014 2 commits
  13. 24 Mar, 2014 4 commits
    • Added sacctmgr mod qos set RawUsage=0 · f7fb80ec
      Danny Auble authored
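      A hedged usage example (the QOS name "normal" is hypothetical): the new
      option resets the accumulated usage recorded for a QOS, e.g.

      sacctmgr modify qos where name=normal set RawUsage=0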
    • Add job array hash table · ac7fabc6
      Morris Jette authored
      The previous logic typically performed a linear list search to find job
      array elements.
      This commit adds two hash tables for job arrays. The first is based upon
      the "base" job ID which is common to all tasks. The second hash table
      is based upon the sum of the "base" job ID plus the task ID in the array.
      This will substantially improve performance for handling dependencies
      with job arrays.
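      A minimal sketch of the two-table idea (illustrative only; the names and
      layout are not necessarily those of the slurmctld job record): one table
      is keyed on the base job ID shared by all array tasks, the other on the
      base job ID plus the task ID, so both "1234_*" and "1234_7" style
      lookups avoid walking the whole job list.

          #include <stdio.h>
          #include <stdint.h>

          #define HASH_TABLE_SIZE 1000

          struct job_record {
                  uint32_t array_job_id;          /* base ID, shared by tasks */
                  uint32_t array_task_id;         /* task index in the array  */
                  struct job_record *next_by_base;    /* chain in first table  */
                  struct job_record *next_by_task;    /* chain in second table */
          };

          /* Table 1: all records sharing a base job ID hash to one bucket. */
          static struct job_record *hash_by_base[HASH_TABLE_SIZE];
          /* Table 2: a specific array element hashes on base ID + task ID. */
          static struct job_record *hash_by_task[HASH_TABLE_SIZE];

          static inline int bucket_by_base(uint32_t base_id)
          {
                  return base_id % HASH_TABLE_SIZE;
          }

          static inline int bucket_by_task(uint32_t base_id, uint32_t task_id)
          {
                  return (base_id + task_id) % HASH_TABLE_SIZE;
          }

          int main(void)
          {
                  (void) hash_by_base;
                  (void) hash_by_task;
                  printf("1234_* bucket: %d\n", bucket_by_base(1234));
                  printf("1234_7 bucket: %d\n", bucket_by_task(1234, 7));
                  return 0;
          }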
    • job array dependency recovery fix · fca71890
      Morris Jette authored
      When slurmctld restarted, it would not recover dependencies on
      job array elements and would simply discard the dependency. This
      corrects the parsing problem so the dependency is recovered. The old
      code would print a message like this and discard it:
      slurmctld: error: Invalid dependencies discarded for job 51: afterany:47_*
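      An illustrative parser for the dependency target format involved
      ("47", "47_3", or "47_*"); the function name and interface are
      hypothetical, not the slurmctld routine:

          #include <stdio.h>
          #include <stdlib.h>
          #include <stdint.h>
          #include <stdbool.h>

          /* Parse "47", "47_3" or "47_*" into a base job ID, a task ID and
           * a whole-array flag. Returns true on success. */
          static bool parse_dep_target(const char *s, uint32_t *job_id,
                                       uint32_t *task_id, bool *whole_array)
          {
                  char *end = NULL;
                  long id = strtol(s, &end, 10);

                  if (end == s || id <= 0)
                          return false;           /* no leading job ID */
                  *job_id = (uint32_t) id;
                  *task_id = 0;
                  *whole_array = false;
                  if (*end == '\0')
                          return true;            /* plain job ID */
                  if (*end != '_')
                          return false;
                  end++;
                  if (end[0] == '*' && end[1] == '\0') {
                          *whole_array = true;    /* e.g. "47_*" */
                          return true;
                  }
                  char *end2 = NULL;
                  long task = strtol(end, &end2, 10);
                  if (end2 == end || *end2 != '\0' || task < 0)
                          return false;
                  *task_id = (uint32_t) task;
                  return true;
          }

          int main(void)
          {
                  uint32_t jid, tid;
                  bool all;

                  if (parse_dep_target("47_*", &jid, &tid, &all))
                          printf("job %u, whole_array=%d\n", jid, (int) all);
                  return 0;
          }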
  14. 22 Mar, 2014 1 commit
    • Fix sview abort when adding/removing columns · fbfd0e4d
      Morris Jette authored
      When adding or removing columns for most data types (jobs, partitions,
      nodes, etc.), an abort was generated on some system types. This appears
      to be because, when the displayed columns change, the address of "model"
      changes on some systems but not on others (like my laptops). This fix
      explicitly sets last_model to NULL when the columns are changed, rather
      than relying upon the data structure's address changing.
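      The general pitfall, in an illustrative sketch (not the sview source):
      caching a pointer and comparing addresses to detect "the object was
      replaced" is unreliable, because a replacement may be allocated at the
      old address; invalidating the cache explicitly is robust.

          #include <stdio.h>

          static void *last_model = NULL;

          /* Call this whenever the displayed columns change. */
          static void columns_changed(void)
          {
                  last_model = NULL;      /* force a rebuild on next update */
          }

          static void update_view(void *model)
          {
                  if (model != last_model) {
                          printf("rebuilding columns for model %p\n", model);
                          last_model = model;
                  }
                  /* ... refresh rows ... */
          }

          int main(void)
          {
                  int model_a, model_b;           /* stand-ins for two models */

                  update_view(&model_a);
                  columns_changed();              /* user changed the columns */
                  update_view(&model_b);
                  return 0;
          }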
  15. 21 Mar, 2014 1 commit
    • NRT - Fix issue with 1 node jobs. It turns out the network does need to
      be set up for 1 node jobs. · 440932df
      Danny Auble authored
      Here are some of the reasons from IBM:
      
      1. PE expects it.
      2. For failover, if there was some challenge or difficulty with the
         shared-memory method of data transfer, the protocol stack might
         want to go through the adapter instead.
      3. For flexibility, the protocol stack might want to be able to transfer
         data using some variable combination of shared memory and adapter-based
         communication, and
      4. Possibly most important, for overall performance, bandwidth or
         efficiency (BW per CPU cycle) might be better using the adapter
         resources.  (An obvious case is large messages: it might
         require a lot fewer CPU cycles to program the DMA engines on the
         adapter to move data between tasks, rather than depend on the CPU
         to move the data with loads and stores, or page re-mapping -- and
         a DMA engine might actually move the data more quickly, if it's well
         integrated with the memory system, as it is in the P775 case.)