1. 17 Jun, 2014 1 commit
    • Correct CPU ID on Power7 · ebaa4366
      Morris Jette authored
      Correct logic to support Power7 processor with 1 or 2 threads per core
      (CPU IDs are not consecutive).
      bug 891
      ebaa4366
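The non-consecutive numbering mentioned above can be illustrated with a minimal sketch. It assumes (this is not stated in the commit) that each core reserves a fixed block of four hardware thread IDs and that only the first `threads_per_core` IDs of each block are in use. The function name and the default of four are hypothetical and are not taken from the Slurm source.

```python
def online_cpu_ids(cores, threads_per_core, hw_threads_per_core=4):
    """List CPU IDs in use when each core reserves a fixed block of IDs.

    Assumption (hypothetical): a core owns IDs
    [core * hw_threads_per_core, (core + 1) * hw_threads_per_core)
    and only the first `threads_per_core` of them are active.
    """
    if not 1 <= threads_per_core <= hw_threads_per_core:
        raise ValueError("threads_per_core out of range")
    return [
        core * hw_threads_per_core + thread
        for core in range(cores)
        for thread in range(threads_per_core)
    ]
```

Under that assumption, two cores with two threads each yield IDs with a gap between cores, which is why logic that expects consecutive IDs would miscount.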
  2. 03 Jun, 2014 1 commit
    • scale jobs mem-per-cpu limit · 0fbdb9e2
      Morris Jette authored
      If a job --mem-per-cpu limit exceeds the partition or system limit, then
      scale the job's memory limit and CPUs per task to satisfy the limit.
      bug 848
      0fbdb9e2
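One way to picture the scaling described above is the sketch below. It is not the Slurm implementation. It assumes the goal is to keep the total memory per task unchanged while raising the CPU count until the per-CPU value fits under the limit. All names are hypothetical and values are in megabytes.

```python
def scale_mem_per_cpu(mem_per_cpu, cpus_per_task, max_mem_per_cpu):
    """Return (mem_per_cpu, cpus_per_task) adjusted to respect the limit.

    Assumption: memory per task (mem_per_cpu * cpus_per_task) is preserved
    or rounded up slightly, never reduced.
    """
    if mem_per_cpu <= max_mem_per_cpu:
        return mem_per_cpu, cpus_per_task  # already within the limit

    mem_per_task = mem_per_cpu * cpus_per_task
    # Ceiling division: smallest CPU count that keeps each CPU under the limit.
    new_cpus = -(-mem_per_task // max_mem_per_cpu)
    # Spread the task's memory over the new CPU count, rounding up.
    new_mem = -(-mem_per_task // new_cpus)
    return new_mem, new_cpus
```

For example, a request of 4096 MB per CPU with one CPU per task and a 2048 MB limit becomes 2048 MB per CPU with two CPUs per task.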
  3. 19 May, 2014 1 commit
    • Properly handle job requeue options · 68a4bfd7
      Morris Jette authored
      Properly enforce job --requeue and --norequeue options. Previous
      logic failed to do so in three places (either ignoring the value,
      ANDing it with the JobRequeue configuration option, or using the
      JobRequeue configuration option by itself).
      bug 821
      68a4bfd7
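The intended precedence can be sketched as follows. This is an illustration, not the Slurm code. It assumes the job's explicit option always wins and that the JobRequeue configuration value only supplies the default when the job specifies neither option. The function and parameter names are hypothetical.

```python
def job_can_requeue(job_option, job_requeue_config):
    """Decide whether a job may be requeued.

    job_option: True for --requeue, False for --norequeue,
                None when the job specified neither (assumption).
    job_requeue_config: the cluster-wide JobRequeue setting.
    """
    if job_option is None:
        # No explicit request: fall back to the configured default.
        return bool(job_requeue_config)
    # An explicit request is honoured as given, not ANDed with the config.
    return bool(job_option)
```

The three faulty behaviours listed in the commit message correspond to returning the configuration value alone, returning `job_option and job_requeue_config`, or never consulting `job_option` at all.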
  4. 12 May, 2014 2 commits
  5. 09 May, 2014 1 commit
  6. 08 May, 2014 1 commit
    • Correct sinfo sort fields options · ff518ad1
      Morris Jette authored
      Correct sinfo --sort fields to match documentation: E -> Reason,
      H -> Reason Time (new), R -> Partition Name, u/U -> Reason user (new)
      ff518ad1
  7. 06 May, 2014 1 commit
  8. 05 May, 2014 4 commits
  9. 02 May, 2014 2 commits
  10. 30 Apr, 2014 1 commit
    • switch/nrt - CAU and RDMA tracking correction · 6f66fdef
      Morris Jette authored
      Switch/nrt - Properly track usage of CAU and RDMA resources with multiple
      tasks per compute node. Previous logic would allocate resources once per
      task and then deallocate once per node, leaking CAU and RDMA resources
      and preventing their use by future jobs.
      6f66fdef
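The leak described above comes from mismatched allocate and release counts. The sketch below reproduces the imbalance on a single node with a simple counter. The class and function names are hypothetical and the model is far simpler than the switch/nrt plugin.

```python
class ResourcePool:
    """Hypothetical counter of switch resources on one node."""

    def __init__(self, total):
        self.total = total
        self.in_use = 0

    def allocate(self, count=1):
        if self.in_use + count > self.total:
            raise RuntimeError("no free resources")
        self.in_use += count

    def release(self, count=1):
        self.in_use = max(0, self.in_use - count)


def run_job_unbalanced(pool, tasks_on_node):
    """Allocate once per task but release once per node (the leak)."""
    for _ in range(tasks_on_node):
        pool.allocate()
    pool.release()


def run_job_balanced(pool, tasks_on_node):
    """Allocate and release the same number of times."""
    for _ in range(tasks_on_node):
        pool.allocate()
    for _ in range(tasks_on_node):
        pool.release()
```

With four tasks on a node, the unbalanced version leaves three units marked in use after the job ends, so repeated jobs eventually exhaust the pool.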
  11. 18 Apr, 2014 1 commit
    • switch/nrt - free partial allocation · a197a1da
      Morris Jette authored
      On switch resource allocation failure, free partial allocation.
      The failure mode was that CAU could be allocated on some nodes but
      not others. The CAU allocated on nodes and switches up to the
      failure point was never released.
      a197a1da
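Freeing a partial allocation on failure is a rollback pattern, which might look like the sketch below. It is a simplified illustration under assumed data structures (a dictionary of free units per node), not the plugin's actual logic. All names are hypothetical.

```python
def allocate_on_all_nodes(free_units, nodes, units_needed):
    """Allocate `units_needed` on every node, or nothing at all.

    free_units: dict mapping node name to free unit count (mutated in place).
    Returns True on success, False if any node lacks capacity.
    """
    granted = []
    for node in nodes:
        if free_units[node] < units_needed:
            # Roll back everything granted before the failure point.
            for done in granted:
                free_units[done] += units_needed
            return False
        free_units[node] -= units_needed
        granted.append(node)
    return True
```

Without the rollback loop, the units taken from earlier nodes would stay allocated after the failure, which is the behaviour the commit corrects.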
  12. 08 Apr, 2014 4 commits
  13. 07 Apr, 2014 3 commits
  14. 05 Apr, 2014 1 commit
  15. 04 Apr, 2014 3 commits
  16. 03 Apr, 2014 2 commits
  17. 02 Apr, 2014 1 commit
    • launch/poe - fix network value · ad7100b8
      Morris Jette authored
      If a job step's network value is set by poe, either by directly
      executing poe or srun launching poe, that value was not being
      propagated to the job step creation RPC and the network was not
      being set up for the proper protocol (e.g. mpi, lapi, pami, etc.).
      The previous logic would only work if the srun execute line
      explicitly set the protocol using the --network option.
      ad7100b8
  18. 31 Mar, 2014 1 commit
  19. 26 Mar, 2014 1 commit
  20. 25 Mar, 2014 1 commit
  21. 24 Mar, 2014 1 commit
    • job array dependency recovery fix · fca71890
      Morris Jette authored
      When slurmctld restarted, it would not recover dependencies on
      job array elements and would just discard the dependency. This
      corrects the parsing problem to recover the dependency. The old code
      would print a message like this and discard it:
      slurmctld: error: Invalid dependencies discarded for job 51: afterany:47_*
      fca71890
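To show what a parser has to accept for a string such as `afterany:47_*`, here is a minimal sketch. It handles only a single `type:jobid` entry with an optional `_task` or `_*` suffix, which is an assumption made for brevity. It is not the slurmctld parser, and the function name is hypothetical.

```python
import re

# One dependency of the form "<type>:<job_id>" with an optional
# "_<task_id>" or "_*" suffix naming job array elements.
_DEP_RE = re.compile(r"^(?P<type>[a-z]+):(?P<job_id>\d+)(?:_(?P<task>\*|\d+))?$")


def parse_dependency(spec):
    """Return (type, job_id, task) where task is None, "*", or an int."""
    match = _DEP_RE.match(spec)
    if match is None:
        raise ValueError(f"invalid dependency: {spec!r}")
    task = match.group("task")
    if task not in (None, "*"):
        task = int(task)
    return match.group("type"), int(match.group("job_id")), task
```

A parser that stops at the job ID and treats the `_*` suffix as trailing garbage would reject the string, which matches the error message quoted in the commit.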
  22. 21 Mar, 2014 1 commit
    • NRT - Fix issue with 1 node jobs · 440932df
      Danny Auble authored
      It turns out the network does need to be set up for 1 node jobs.
      Here are some of the reasons from IBM...
      
      1. PE expects it.
      2. For failover, if there was some challenge or difficulty with the
         shared-memory method of data transfer, the protocol stack might
         want to go through the adapter instead.
      3. For flexibility, the protocol stack might want to be able to transfer
         data using some variable combination of shared memory and adapter-based
         communication, and
      4. Possibly most important, for overall performance, it might be that
         bandwidth or efficiency (BW per CPU cycles) might be better using the
         adapter resources.  (An obvious case is for large messages, it might
         require a lot fewer CPU cycles to program the DMA engines on the
         adapter to move data between tasks, rather than depend on the CPU
         to move the data with loads and stores, or page re-mapping -- and
         a DMA engine might actually move the data more quickly, if it's well
         integrated with the memory system, as it is in the P775 case.)
      440932df
  23. 20 Mar, 2014 2 commits
  24. 19 Mar, 2014 2 commits
  25. 18 Mar, 2014 1 commit