1. 16 Aug, 2013 (2 commits)
2. 15 Aug, 2013 (11 commits)
3. 14 Aug, 2013 (16 commits)
4. 13 Aug, 2013 (11 commits)
    • Morris Jette · 3ae38af8
    • Update reservation web page for new v2.6 features · e6e03b45
      Morris Jette authored
      core reservations and reservation prolog/epilog
    • John Thiltges · 8ab9d720
    • Morris Jette · 1c9c3be2
    • Morris Jette · a13deb8a
    • Add contributor to team web page · 56be32a3
      Morris Jette authored
    • sched/wiki2 - Correct CPU load reported to Moab · 18ac1981
      Michael Gutteridge authored
      I'm running Slurm 2.6.0 and MWM 7.2.4 in our test cluster at the moment. I happened to notice that node load reporting wasn't consistent: periodically you'd see a "sane" load reported in Moab, but most of the time the reported load was zero, despite an accurate CPULoad value reported by "scontrol show node".
      
      Finally got to digging into this.  It appears that the only time load was being reported properly was in the Moab scheduling cycle directly after slurmctld did a node ping.  In subsequent scheduling cycles the load (again, as reported by Moab) was back to zero.
      
      The node ping is significant because that is the only time the node record is updated. Since the wiki2 interface only reports records that have changed, and the load record hasn't changed, the load isn't reported in the queries after the node ping.
      
      Judging from this behavior, I'm guessing that Moab does not store the load value: every time it queries resources in Slurm, it sets the node's load back to zero.
      
      I've altered src/plugins/sched/wiki2/get_nodes.c slightly: basically, I moved the section that reports CPULOAD above the check for updated info (update_time > last_node_update).
      
      So I don't know if this is the appropriate way to fix it.  The wiki specification that Adaptive has published doesn't seem to indicate how this should function.  Either MWM should assume the last value reported is still accurate or Slurm needs to report it for every wiki GETNODES command.
      
      Anyway, the patch is attached, it seems to be working for me, and I've rolled it into our Debian build directory. YMMV.
      
      Michael
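      The reordering described above, shown as a minimal, self-contained sketch. The struct and function names (fake_node, dump_node) and the field layout are stand-ins rather than the actual slurmctld types used in get_nodes.c; only update_time and the CPULOAD keyword come from the commit message. The point is purely the order of operations: the load is written before the "has this record changed?" early return, so it appears in every GETNODES reply.

      #include <stdio.h>
      #include <string.h>
      #include <time.h>

      /* Stand-in for the slurmctld node record; fields are illustrative. */
      struct fake_node {
          const char *name;
          double      cpu_load;     /* load average reported by slurmd */
          time_t      last_update;  /* when this record last changed */
      };

      /* Append wiki2-style attributes for one node to a reply buffer. */
      static void dump_node(const struct fake_node *node, time_t update_time,
                            char *buf, size_t buf_size)
      {
          size_t off = strlen(buf);

          /* Report the load unconditionally, even for unchanged nodes,
           * since Moab does not retain the last value it was given. */
          off += snprintf(buf + off, buf_size - off, "%s:CPULOAD=%.2f;",
                          node->name, node->cpu_load);

          if (update_time > node->last_update)
              return;   /* record unchanged since last query: skip the rest */

          /* ...remaining, changed-only attributes would be appended here... */
          snprintf(buf + off, buf_size - off, "STATE=IDLE;");
      }

      int main(void)
      {
          char buf[256] = "";
          struct fake_node n = { "tux001", 0.75, 1000 };

          dump_node(&n, 2000, buf, sizeof(buf));   /* node unchanged since time 1000 */
          printf("%s\n", buf);                     /* reply still contains CPULOAD */
          return 0;
      }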
    • select/cons_res - Add test for zero node allocation · e180d341
      jette authored
      I don't see how this could happen, but it might explain something
      reported by Harvard University. In any case, this could prevent
      an infinite loop if the task distribution function is passed a
      job allocation with zero nodes.
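      A minimal sketch of the kind of guard this commit describes. The names below (job_alloc, dist_tasks, the SLURM_SUCCESS/SLURM_ERROR defines) are illustrative stand-ins rather than the actual cons_res code; the point is that without the early check, a zero-node allocation makes the distribution loop spin forever because no pass over the nodes ever places a task.

      #include <stdio.h>

      #define SLURM_SUCCESS 0
      #define SLURM_ERROR  (-1)

      /* Stand-in for a job allocation handed to task distribution. */
      struct job_alloc {
          unsigned node_cnt;  /* nodes in the allocation */
          unsigned task_cnt;  /* tasks to distribute across them */
      };

      static int dist_tasks(const struct job_alloc *job)
      {
          /* Guard against a zero-node allocation: without this check the
           * while loop below never terminates, since each pass over the
           * nodes would place no tasks at all. */
          if (job->node_cnt == 0) {
              fprintf(stderr, "dist_tasks: job allocation has zero nodes\n");
              return SLURM_ERROR;
          }

          unsigned placed = 0;
          while (placed < job->task_cnt) {
              /* one round-robin pass, placing one task per allocated node */
              for (unsigned n = 0; n < job->node_cnt && placed < job->task_cnt; n++)
                  placed++;
          }
          return SLURM_SUCCESS;
      }

      int main(void)
      {
          struct job_alloc bad = { 0, 4 };   /* zero nodes, four tasks */
          return (dist_tasks(&bad) == SLURM_ERROR) ? 0 : 1;
      }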
    • Merge branch 'slurm-2.5' into slurm-2.6 · 5f3d85ce
      Morris Jette authored
    • select/cons_res - Avoid extraneous "oversubscribe" error messages · 302d8b3f
      jette authored
      This problem was reported by Harvard University and could be
      reproduced with a command line of "srun -N1 --tasks-per-node=2 -O id".
      With other job types, the error message could be logged many times
      for each job. This change logs the error once per job and only if
      the job request does not include the -O/--overcommit option.
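      A minimal sketch of the once-per-job logging pattern this commit describes; the struct fields and function name are illustrative stand-ins, not the actual cons_res data structures. The idea is simply to remember, per job, that the warning has already been issued, and to suppress it entirely when the job was submitted with -O/--overcommit.

      #include <stdbool.h>
      #include <stdio.h>

      /* Stand-in for the slurmctld job record; fields are illustrative. */
      struct job_record {
          unsigned job_id;
          bool     overcommit;            /* job used srun -O/--overcommit */
          bool     oversubscribe_warned;  /* set after the first warning */
      };

      static void warn_oversubscribe(struct job_record *job)
      {
          if (job->overcommit || job->oversubscribe_warned)
              return;   /* user asked for overcommit, or already logged */
          job->oversubscribe_warned = true;
          fprintf(stderr,
                  "cons_res: oversubscribing resources for job %u\n",
                  job->job_id);
      }

      int main(void)
      {
          struct job_record job = { 1234, false, false };
          warn_oversubscribe(&job);   /* logged once */
          warn_oversubscribe(&job);   /* suppressed on repeat calls */
          return 0;
      }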
    • Take older conversion out of the mysql code · 3503950f
      Danny Auble authored
      This was needed when we went from the old single-table-per-enterprise style to separate tables per cluster (2.0 -> 2.*). Anyone still running a version older than 2.2 really needs to upgrade (and should have long before this); they can still get the conversions by upgrading to >=2.1.0 before installing this version.