1. 10 Aug, 2015 3 commits
    • Morris Jette's avatar
      slurm.spec fix for layouts/power plugin · 4f859a38
      Morris Jette authored
      Fix slurm.spec file for recent layouts change
      Refactor bb_pack_bufs() and bb_alloc_job_rec() functions
      4f859a38
    • Morris Jette's avatar
      burst_buffer/cray work · a782d5ce
      Morris Jette authored
      Add partition information to buffer record for limit enforcement
      a782d5ce
    • jette's avatar
      burst_buffer/cray work · 36fe8992
      jette authored
      Add account and QOS to burst buffer data structures for TRES limit work
      sview: Correct size units
      sview: Fix free of uninitialized iterator logic
      burst_buffer.conf: Fix parsing of DefaultPool parameter
      36fe8992
  2. 09 Aug, 2015 2 commits
  3. 08 Aug, 2015 2 commits
    • jette's avatar
      Add partition name to job startup log · 2dee3bb7
      jette authored
      2dee3bb7
    • Morris Jette's avatar
      burst_buffer/cray work · 23ea85a7
      Morris Jette authored
      Move limits checks to burst_buffer/common
      Disable persistent burst buffer with numeric value
      Refactor bb_p_job_get_est_start() logic
      Split out persistent buffer create/delete into a pthread
      Fix job buffer size parsing logic
      Get through stage-in and support for non-Cray systems (e.g. VM)
      23ea85a7
  4. 07 Aug, 2015 1 commit
    • Morris Jette's avatar
      burst_buffer/cray work · c20fa424
      Morris Jette authored
      Add new buffer state of "BB_STATE_PENDING"
      Refactor job buffer state structure
      Refactor job buffer creation environment
      Rename "bb_hash" to "bb_ahash"
      Set up job state hash table
      Add magic cookies to data structures
      Rename bb_find_job_rec() to  bb_find_alloc_rec()
      Rename bb_free_rec() to bb_free_alloc_buf()
      c20fa424
  5. 06 Aug, 2015 4 commits
  6. 05 Aug, 2015 2 commits
  7. 04 Aug, 2015 4 commits
  8. 03 Aug, 2015 5 commits
  9. 31 Jul, 2015 5 commits
  10. 30 Jul, 2015 4 commits
  11. 29 Jul, 2015 1 commit
  12. 28 Jul, 2015 3 commits
  13. 27 Jul, 2015 4 commits
    • Morris Jette's avatar
      Merge branch 'slurm-14.11' · 4fcdc65c
      Morris Jette authored
      4fcdc65c
    • Brian Christiansen's avatar
      Optmized queries with additional db indexes. · f1ed6616
      Brian Christiansen authored
      Bug 1819
      
      Composite indexes search left to right. E.g. an index of (inx1, inx2, inx3) will search from left to right. inx2 can't be used in a where statement by itself, it requires inx1 to be present (inx3 can be optional). For the rollup index having time_end first speeds up the below query. The actual rollup queries still benefit from the original rollup index.
      
      sacct -S 07/22-09:41:36 -E 07/22-09:42:37 -i 1-4 -ojobid,start,end,nnodes,nodelist -n -a:
      mysql> explain select t1.account, t1.array_max_tasks, t1.array_task_str, t1.cpus_alloc, t1.cpus_req, t1.derived_ec, t1.derived_es, t1.exit_code, t1.id_array_job, t1.id_array_task, t1.id_assoc, t1.id_block, t1.id_group, t1.id_job, t1.id_qos, t1.id_resv, t3.resv_name, t1.id_user, t1.id_wckey, t1.job_db_inx, t1.job_name, t1.kill_requid, t1.mem_req, t1.node_inx, t1.nodelist, t1.nodes_alloc, t1.partition, t1.priority, t1.state, t1.time_eligible, t1.time_end, t1.time_start, t1.time_submit, t1.time_suspended, t1.timelimit, t1.track_steps, t1.wckey, t1.gres_alloc, t1.gres_req, t1.gres_used, t2.acct, t2.lft, t2.user from compy_job_table as t1 left join compy_assoc_table as t2 on t1.id_assoc=t2.id_assoc left join compy_resv_table as t3  on t1.id_resv=t3.id_resv  where ((t1.nodes_alloc between 1 and 4)) && ((t1.time_eligible < 1437550957 && (t1.time_end >= 1437550896 || t1.time_end = 0))) group by id_job, time_submit desc;
      +----+-------------+-------+--------+---------------+---------+---------+------------------------+--------+----------------------------------------------+
      | id | select_type | table | type   | possible_keys | key     | key_len | ref                    | rows   | Extra                                        |
      +----+-------------+-------+--------+---------------+---------+---------+------------------------+--------+----------------------------------------------+
      |  1 | SIMPLE      | t1    | ALL    | id_job,rollup | NULL    | NULL    | NULL                   | 120953 | Using where; Using temporary; Using filesort |
      |  1 | SIMPLE      | t2    | eq_ref | PRIMARY       | PRIMARY | 4       | slurm_1412.t1.id_assoc |      1 | Using where                                  |
      |  1 | SIMPLE      | t3    | ref    | PRIMARY       | PRIMARY | 4       | slurm_1412.t1.id_resv  |      1 | NULL                                         |
      +----+-------------+-------+--------+---------------+---------+---------+------------------------+--------+----------------------------------------------+
      3 rows in set (0.00 sec)
      
      mysql> explain select t1.account, t1.array_max_tasks, t1.array_task_str, t1.cpus_alloc, t1.cpus_req, t1.derived_ec, t1.derived_es, t1.exit_code, t1.id_array_job, t1.id_array_task, t1.id_assoc, t1.id_block, t1.id_group, t1.id_job, t1.id_qos, t1.id_resv, t3.resv_name, t1.id_user, t1.id_wckey, t1.job_db_inx, t1.job_name, t1.kill_requid, t1.mem_req, t1.node_inx, t1.nodelist, t1.nodes_alloc, t1.partition, t1.priority, t1.state, t1.time_eligible, t1.time_end, t1.time_start, t1.time_submit, t1.time_suspended, t1.timelimit, t1.track_steps, t1.wckey, t1.gres_alloc, t1.gres_req, t1.gres_used, t2.acct, t2.lft, t2.user from compy_job_table as t1 left join compy_assoc_table as t2 on t1.id_assoc=t2.id_assoc left join compy_resv_table as t3  on t1.id_resv=t3.id_resv  where ((t1.nodes_alloc between 1 and 4)) && ((t1.time_eligible < 1437550957 && (t1.time_end >= 1437550896 || t1.time_end = 0))) group by id_job, time_submit desc;
      +----+-------------+-------+--------+-----------------------+---------+---------+------------------------+------+---------------------------------------------------------------------+
      | id | select_type | table | type   | possible_keys         | key     | key_len | ref                    | rows | Extra                                                               |
      +----+-------------+-------+--------+-----------------------+---------+---------+------------------------+------+---------------------------------------------------------------------+
      |  1 | SIMPLE      | t1    | range  | id_job,rollup,rollup2 | rollup2 | 8       | NULL                   |    6 | Using index condition; Using where; Using temporary; Using filesort |
      |  1 | SIMPLE      | t2    | eq_ref | PRIMARY               | PRIMARY | 4       | slurm_1412.t1.id_assoc |    1 | Using where                                                         |
      |  1 | SIMPLE      | t3    | ref    | PRIMARY               | PRIMARY | 4       | slurm_1412.t1.id_resv  |    1 | NULL                                                                |
      +----+-------------+-------+--------+-----------------------+---------+---------+------------------------+------+---------------------------------------------------------------------+
      3 rows in set (0.00 sec)
      
      rollup:
      mysql> explain select job.job_db_inx, job.id_job, job.id_assoc, job.id_wckey, job.array_task_pending, job.time_eligible, job.time_start, job.time_end, job.time_suspended, job.cpus_alloc, job.cpus_req, job.id_resv, SUM(step.consumed_energy) from compy_job_table as job left outer join compy_step_table as step on job.job_db_inx=step.job_db_inx and (step.id_step>=0) where (job.time_eligible < 1420102800 && (job.time_end >= 1420099200 || job.time_end = 0)) group by job.job_db_inx order by job.id_assoc, job.time_eligible;
      +----+-------------+-------+-------+--------------------------------------------------------------------------------+---------+---------+---------------------------+------+--------------------------------------------------------+
      | id | select_type | table | type  | possible_keys                                                                  | key     | key_len | ref                       | rows | Extra                                                  |
      +----+-------------+-------+-------+--------------------------------------------------------------------------------+---------+---------+---------------------------+------+--------------------------------------------------------+
      |  1 | SIMPLE      | job   | range | PRIMARY,id_job,rollup,rollup2,wckey,qos,association,array_job,reserv,sacct_def | rollup  | 4       | NULL                      |    1 | Using index condition; Using temporary; Using filesort |
      |  1 | SIMPLE      | step  | ref   | PRIMARY                                                                        | PRIMARY | 4       | slurm_1412.job.job_db_inx |    1 | Using where                                            |
      +----+-------------+-------+-------+--------------------------------------------------------------------------------+---------+---------+---------------------------+------+--------------------------------------------------------+
      2 rows in set (0.01 sec)
      
      A plain sacct is sped up by moving time_end into the middle of the index (ex. id_user, time_end, time_eligible). sacct_def is for sacct calls with a state specified, sacct_def2 is for a plain sacct call.
      
      plain sacct:
      mysql> explain select t1.account, t1.array_max_tasks, t1.array_task_str, t1.cpus_alloc, t1.cpus_req, t1.derived_ec, t1.derived_es, t1.exit_code, t1.id_array_job, t1.id_array_task, t1.id_assoc, t1.id_block, t1.id_group, t1.id_job, t1.id_qos, t1.id_resv, t3.resv_name, t1.id_user, t1.id_wckey, t1.job_db_inx, t1.job_name, t1.kill_requid, t1.mem_req, t1.node_inx, t1.nodelist, t1.nodes_alloc, t1.partition, t1.priority, t1.state, t1.time_eligible, t1.time_end, t1.time_start, t1.time_submit, t1.time_suspended, t1.timelimit, t1.track_steps, t1.wckey, t1.gres_alloc, t1.gres_req, t1.gres_used, t2.acct, t2.lft, t2.user from compy_job_table as t1 left join compy_assoc_table as t2 on t1.id_assoc=t2.id_assoc left join compy_resv_table as t3  on t1.id_resv=t3.id_resv  where (t1.id_user='1003') && ((t1.time_end >= 1437548400 || t1.time_end = 0)) group by id_job, time_submit desc;
      +----+-------------+-------+--------+------------------+-----------+---------+------------------------+-------+---------------------------------------------------------------------+
      | id | select_type | table | type   | possible_keys    | key       | key_len | ref                    | rows  | Extra                                                               |
      +----+-------------+-------+--------+------------------+-----------+---------+------------------------+-------+---------------------------------------------------------------------+
      |  1 | SIMPLE      | t1    | ref    | id_job,sacct_def | sacct_def | 4       | const                  | 60476 | Using index condition; Using where; Using temporary; Using filesort |
      |  1 | SIMPLE      | t2    | eq_ref | PRIMARY          | PRIMARY   | 4       | slurm_1412.t1.id_assoc |     1 | Using where                                                         |
      |  1 | SIMPLE      | t3    | ref    | PRIMARY          | PRIMARY   | 4       | slurm_1412.t1.id_resv  |     1 | NULL                                                                |
      +----+-------------+-------+--------+------------------+-----------+---------+------------------------+-------+---------------------------------------------------------------------+
      3 rows in set (0.00 sec)o
      
      mysql> explain select t1.account, t1.array_max_tasks, t1.array_task_str, t1.cpus_alloc, t1.cpus_req, t1.derived_ec, t1.derived_es, t1.exit_code, t1.id_array_job, t1.id_array_task, t1.id_assoc, t1.id_block, t1.id_group, t1.id_job, t1.id_qos, t1.id_resv, t3.resv_name, t1.id_user, t1.id_wckey, t1.job_db_inx, t1.job_name, t1.kill_requid, t1.mem_req, t1.node_inx, t1.nodelist, t1.nodes_alloc, t1.partition, t1.priority, t1.state, t1.time_eligible, t1.time_end, t1.time_start, t1.time_submit, t1.time_suspended, t1.timelimit, t1.track_steps, t1.wckey, t1.gres_alloc, t1.gres_req, t1.gres_used, t2.acct, t2.lft, t2.user from compy_job_table as t1 left join compy_assoc_table as t2 on t1.id_assoc=t2.id_assoc left join compy_resv_table as t3  on t1.id_resv=t3.id_resv  where (t1.id_user='1003') && ((t1.time_end >= 1437548400 || t1.time_end = 0)) group by id_job, time_submit desc;
      +----+-------------+-------+--------+-------------------------------------+------------+---------+------------------------+------+--------------------------------------------------------+
      | id | select_type | table | type   | possible_keys                       | key        | key_len | ref                    | rows | Extra                                                  |
      +----+-------------+-------+--------+-------------------------------------+------------+---------+------------------------+------+--------------------------------------------------------+
      |  1 | SIMPLE      | t1    | range  | id_job,rollup2,sacct_def,sacct_def2 | sacct_def2 | 8       | NULL                   |   68 | Using index condition; Using temporary; Using filesort |
      |  1 | SIMPLE      | t2    | eq_ref | PRIMARY                             | PRIMARY    | 4       | slurm_1412.t1.id_assoc |    1 | Using where                                            |
      |  1 | SIMPLE      | t3    | ref    | PRIMARY                             | PRIMARY    | 4       | slurm_1412.t1.id_resv  |    1 | NULL                                                   |
      +----+-------------+-------+--------+-------------------------------------+------------+---------+------------------------+------+--------------------------------------------------------+
      3 rows in set (0.00 sec)
      
      Adding the sacct_def2 index order didn't affect other queries:
      
      sacct -s CA,CD,F,R:
      mysql> explain select t1.account, t1.array_max_tasks, t1.array_task_str, t1.cpus_alloc, t1.cpus_req, t1.derived_ec, t1.derived_es, t1.exit_code, t1.id_array_job, t1.id_array_task, t1.id_assoc, t1.id_block, t1.id_group, t1.id_job, t1.id_qos, t1.id_resv, t3.resv_name, t1.id_user, t1.id_wckey, t1.job_db_inx, t1.job_name, t1.kill_requid, t1.mem_req, t1.node_inx, t1.nodelist, t1.nodes_alloc, t1.partition, t1.priority, t1.state, t1.time_eligible, t1.time_end, t1.time_start, t1.time_submit, t1.time_suspended, t1.timelimit, t1.track_steps, t1.wckey, t1.gres_alloc, t1.gres_req, t1.gres_used, t2.acct, t2.lft, t2.user from compy_job_table as t1 left join compy_assoc_table as t2 on t1.id_assoc=t2.id_assoc left join compy_resv_table as t3  on t1.id_resv=t3.id_resv  where (t1.id_user='1003') && ((t1.state='4' && (t1.time_end && (t1.time_end >= 1438028802))) || (t1.state='3' && (t1.time_end && (t1.time_end >= 1438028802))) || (t1.state='5' && (t1.time_end && (t1.time_end >= 1438028802))) || (t1.time_start && ((!t1.time_end && t1.state=1) || (1438028802 between t1.time_start and t1.time_end)))) group by id_job, time_submit desc;
      +----+-------------+-------+--------+-------------------------------------+-----------+---------+------------------------+-------+---------------------------------------------------------------------+
      | id | select_type | table | type   | possible_keys                       | key       | key_len | ref                    | rows  | Extra                                                               |
      +----+-------------+-------+--------+-------------------------------------+-----------+---------+------------------------+-------+---------------------------------------------------------------------+
      |  1 | SIMPLE      | t1    | ref    | id_job,rollup2,sacct_def,sacct_def2 | sacct_def | 4       | const                  | 60513 | Using index condition; Using where; Using temporary; Using filesort |
      |  1 | SIMPLE      | t2    | eq_ref | PRIMARY                             | PRIMARY   | 4       | slurm_1412.t1.id_assoc |     1 | Using where                                                         |
      |  1 | SIMPLE      | t3    | ref    | PRIMARY                             | PRIMARY   | 4       | slurm_1412.t1.id_resv  |     1 | NULL                                                                |
      +----+-------------+-------+--------+-------------------------------------+-----------+---------+------------------------+-------+---------------------------------------------------------------------+
      3 rows in set (0.00 sec)
      
      Adding nodes_alloc index speeds up quries with queries like: sacct -i 2-10000
      
      mysql> EXPLAIN SELECT t1.account, t1.array_max_tasks, t1.array_task_str, t1.cpus_req, t1.derived_ec, t1.derived_es, t1.exit_code, t1.id_array_job, t1.id_array_task, t1.id_assoc, t1.id_block, t1.id_group, t1.id_job, t1.id_qos, t1.id_resv, t3.resv_name, t1.id_user, t1.id_wckey, t1.job_db_inx, t1.job_name, t1.kill_requid, t1.mem_req, t1.node_inx, t1.nodelist, t1.nodes_alloc, t1.partition, t1.priority, t1.state, t1.time_eligible, t1.time_end, t1.time_start, t1.time_submit, t1.time_suspended, t1.timelimit, t1.track_steps, t1.wckey, t1.gres_alloc, t1.gres_req, t1.gres_used, t2.acct, t2.lft, t2.user FROM compy_job_table AS t1 LEFT JOIN compy_assoc_table AS t2 ON t1.id_assoc = t2.id_assoc LEFT JOIN compy_resv_table AS t3 ON t1.id_resv = t3.id_resv WHERE ((t1.nodes_alloc between 2 and 10000)) && ((t1.time_start && ((1434384740 BETWEEN t1.time_start AND t1.time_end) || (t1.time_start BETWEEN 1434384740 AND 1434384741)))) GROUP BY id_job , time_submit DESC;
      +----+-------------+-------+--------+----------------+---------+---------+-------------------------+--------+----------------------------------------------+
      | id | select_type | table | type   | possible_keys  | key     | key_len | ref                     | rows   | Extra                                        |
      +----+-------------+-------+--------+----------------+---------+---------+-------------------------+--------+----------------------------------------------+
      |  1 | SIMPLE      | t1    | ALL    | id_job,rollup2 | NULL    | NULL    | NULL                    | 117549 | Using where; Using temporary; Using filesort |
      |  1 | SIMPLE      | t2    | eq_ref | PRIMARY        | PRIMARY | 4       | 1411_master.t1.id_assoc |      1 | NULL                                         |
      |  1 | SIMPLE      | t3    | ref    | PRIMARY        | PRIMARY | 4       | 1411_master.t1.id_resv  |      1 | NULL                                         |
      +----+-------------+-------+--------+----------------+---------+---------+-------------------------+--------+----------------------------------------------+
      3 rows in set (0.00 sec)
      
      mysql> EXPLAIN SELECT t1.account, t1.array_max_tasks, t1.array_task_str, t1.cpus_req, t1.derived_ec, t1.derived_es, t1.exit_code, t1.id_array_job, t1.id_array_task, t1.id_assoc, t1.id_block, t1.id_group, t1.id_job, t1.id_qos, t1.id_resv, t3.resv_name, t1.id_user, t1.id_wckey, t1.job_db_inx, t1.job_name, t1.kill_requid, t1.mem_req, t1.node_inx, t1.nodelist, t1.nodes_alloc, t1.partition, t1.priority, t1.state, t1.time_eligible, t1.time_end, t1.time_start, t1.time_submit, t1.time_suspended, t1.timelimit, t1.track_steps, t1.wckey, t1.gres_alloc, t1.gres_req, t1.gres_used, t2.acct, t2.lft, t2.user FROM compy_job_table AS t1 LEFT JOIN compy_assoc_table AS t2 ON t1.id_assoc = t2.id_assoc LEFT JOIN compy_resv_table AS t3 ON t1.id_resv = t3.id_resv WHERE ((t1.nodes_alloc between 2 and 10000)) && ((t1.time_start && ((1434384740 BETWEEN t1.time_start AND t1.time_end) || (t1.time_start BETWEEN 1434384740 AND 1434384741)))) GROUP BY id_job , time_submit DESC;
      +----+-------------+-------+--------+----------------------------+-------------+---------+-------------------------+------+---------------------------------------------------------------------+
      | id | select_type | table | type   | possible_keys              | key         | key_len | ref                     | rows | Extra                                                               |
      +----+-------------+-------+--------+----------------------------+-------------+---------+-------------------------+------+---------------------------------------------------------------------+
      |  1 | SIMPLE      | t1    | range  | id_job,rollup2,nodes_alloc | nodes_alloc | 4       | NULL                    |  720 | Using index condition; Using where; Using temporary; Using filesort |
      |  1 | SIMPLE      | t2    | eq_ref | PRIMARY                    | PRIMARY     | 4       | 1411_master.t1.id_assoc |    1 | NULL                                                                |
      |  1 | SIMPLE      | t3    | ref    | PRIMARY                    | PRIMARY     | 4       | 1411_master.t1.id_resv  |    1 | NULL                                                                |
      +----+-------------+-------+--------+----------------------------+-------------+---------+-------------------------+------+---------------------------------------------------------------------+
      3 rows in set (0.00 sec)
      f1ed6616
    • Morris Jette's avatar
      Move logic to optimize performance · 23f6ec89
      Morris Jette authored
      No change in functionality
      23f6ec89
    • Morris Jette's avatar
      Fix bug in node selection with topology optimization · 9dad2ff7
      Morris Jette authored
      If node definitions in slurm.conf are spread across multiple lines
      and topology/tree is configured, then sub-optimal node selection can
      occur.
      bug 1645
      9dad2ff7