Commits · 1d2b34a064b202542accfd11fd79469ac640773a · Manuel G. Marciani / ces_slurm_simulator

28 Jul, 2015 3 commits
- burst_buffer/cray work · 1d2b34a0
  Morris Jette authored Jul 28, 2015
```
Add logic to purge persistent burst buffers.
Create dummy job script as needed for buffer teardown
Multiple persistent buffer create/destroy calls in single job working
```
  1d2b34a0
- Add SLURM_TOPO_LEN env variable for scontrol show topology · cd12b7f5
  Thomas Cadeau authored Jul 28, 2015
  
  cd12b7f5
- When printing topology info make the length configurable. · 0c2fe5ef
  Thomas Cadeau authored Jul 28, 2015
  
  0c2fe5ef
27 Jul, 2015 11 commits

Merge branch 'slurm-14.11' · 4fcdc65c
Morris Jette authored Jul 27, 2015

4fcdc65c

Optimized queries with additional db indexes. · f1ed6616

Brian Christiansen authored Jul 27, 2015

Bug 1819

Composite indexes are searched left to right. E.g. with an index of (inx1, inx2, inx3), inx2 can't be used in a where statement by itself; it requires inx1 to be present (inx3 is optional). For the rollup index, having time_end first speeds up the query below. The actual rollup queries still benefit from the original rollup index.

sacct -S 07/22-09:41:36 -E 07/22-09:42:37 -i 1-4 -ojobid,start,end,nnodes,nodelist -n -a:
mysql> explain select t1.account, t1.array_max_tasks, t1.array_task_str, t1.cpus_alloc, t1.cpus_req, t1.derived_ec, t1.derived_es, t1.exit_code, t1.id_array_job, t1.id_array_task, t1.id_assoc, t1.id_block, t1.id_group, t1.id_job, t1.id_qos, t1.id_resv, t3.resv_name, t1.id_user, t1.id_wckey, t1.job_db_inx, t1.job_name, t1.kill_requid, t1.mem_req, t1.node_inx, t1.nodelist, t1.nodes_alloc, t1.partition, t1.priority, t1.state, t1.time_eligible, t1.time_end, t1.time_start, t1.time_submit, t1.time_suspended, t1.timelimit, t1.track_steps, t1.wckey, t1.gres_alloc, t1.gres_req, t1.gres_used, t2.acct, t2.lft, t2.user from compy_job_table as t1 left join compy_assoc_table as t2 on t1.id_assoc=t2.id_assoc left join compy_resv_table as t3  on t1.id_resv=t3.id_resv  where ((t1.nodes_alloc between 1 and 4)) && ((t1.time_eligible < 1437550957 && (t1.time_end >= 1437550896 || t1.time_end = 0))) group by id_job, time_submit desc;
+----+-------------+-------+--------+---------------+---------+---------+------------------------+--------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                    | rows   | Extra                                        |
+----+-------------+-------+--------+---------------+---------+---------+------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | t1    | ALL    | id_job,rollup | NULL    | NULL    | NULL                   | 120953 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY       | PRIMARY | 4       | slurm_1412.t1.id_assoc |      1 | Using where                                  |
|  1 | SIMPLE      | t3    | ref    | PRIMARY       | PRIMARY | 4       | slurm_1412.t1.id_resv  |      1 | NULL                                         |
+----+-------------+-------+--------+---------------+---------+---------+------------------------+--------+----------------------------------------------+
3 rows in set (0.00 sec)

mysql> explain select t1.account, t1.array_max_tasks, t1.array_task_str, t1.cpus_alloc, t1.cpus_req, t1.derived_ec, t1.derived_es, t1.exit_code, t1.id_array_job, t1.id_array_task, t1.id_assoc, t1.id_block, t1.id_group, t1.id_job, t1.id_qos, t1.id_resv, t3.resv_name, t1.id_user, t1.id_wckey, t1.job_db_inx, t1.job_name, t1.kill_requid, t1.mem_req, t1.node_inx, t1.nodelist, t1.nodes_alloc, t1.partition, t1.priority, t1.state, t1.time_eligible, t1.time_end, t1.time_start, t1.time_submit, t1.time_suspended, t1.timelimit, t1.track_steps, t1.wckey, t1.gres_alloc, t1.gres_req, t1.gres_used, t2.acct, t2.lft, t2.user from compy_job_table as t1 left join compy_assoc_table as t2 on t1.id_assoc=t2.id_assoc left join compy_resv_table as t3  on t1.id_resv=t3.id_resv  where ((t1.nodes_alloc between 1 and 4)) && ((t1.time_eligible < 1437550957 && (t1.time_end >= 1437550896 || t1.time_end = 0))) group by id_job, time_submit desc;
+----+-------------+-------+--------+-----------------------+---------+---------+------------------------+------+---------------------------------------------------------------------+
| id | select_type | table | type   | possible_keys         | key     | key_len | ref                    | rows | Extra                                                               |
+----+-------------+-------+--------+-----------------------+---------+---------+------------------------+------+---------------------------------------------------------------------+
|  1 | SIMPLE      | t1    | range  | id_job,rollup,rollup2 | rollup2 | 8       | NULL                   |    6 | Using index condition; Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY               | PRIMARY | 4       | slurm_1412.t1.id_assoc |    1 | Using where                                                         |
|  1 | SIMPLE      | t3    | ref    | PRIMARY               | PRIMARY | 4       | slurm_1412.t1.id_resv  |    1 | NULL                                                                |
+----+-------------+-------+--------+-----------------------+---------+---------+------------------------+------+---------------------------------------------------------------------+
3 rows in set (0.00 sec)

rollup:
mysql> explain select job.job_db_inx, job.id_job, job.id_assoc, job.id_wckey, job.array_task_pending, job.time_eligible, job.time_start, job.time_end, job.time_suspended, job.cpus_alloc, job.cpus_req, job.id_resv, SUM(step.consumed_energy) from compy_job_table as job left outer join compy_step_table as step on job.job_db_inx=step.job_db_inx and (step.id_step>=0) where (job.time_eligible < 1420102800 && (job.time_end >= 1420099200 || job.time_end = 0)) group by job.job_db_inx order by job.id_assoc, job.time_eligible;
+----+-------------+-------+-------+--------------------------------------------------------------------------------+---------+---------+---------------------------+------+--------------------------------------------------------+
| id | select_type | table | type  | possible_keys                                                                  | key     | key_len | ref                       | rows | Extra                                                  |
+----+-------------+-------+-------+--------------------------------------------------------------------------------+---------+---------+---------------------------+------+--------------------------------------------------------+
|  1 | SIMPLE      | job   | range | PRIMARY,id_job,rollup,rollup2,wckey,qos,association,array_job,reserv,sacct_def | rollup  | 4       | NULL                      |    1 | Using index condition; Using temporary; Using filesort |
|  1 | SIMPLE      | step  | ref   | PRIMARY                                                                        | PRIMARY | 4       | slurm_1412.job.job_db_inx |    1 | Using where                                            |
+----+-------------+-------+-------+--------------------------------------------------------------------------------+---------+---------+---------------------------+------+--------------------------------------------------------+
2 rows in set (0.01 sec)
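
The left-to-right rule for composite indexes described above can be reproduced on a toy table. The following is a minimal sketch, not the Slurm schema: it uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table `job`, the index `demo_idx`, and the `payload` column are hypothetical names chosen for illustration. Only the `(inx1, inx2, inx3)` shape comes from the commit message, and MySQL's planner may differ in detail.

```python
import sqlite3

# Hypothetical table and index; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job (inx1 INTEGER, inx2 INTEGER, inx3 INTEGER, payload TEXT)"
)
conn.execute("CREATE INDEX demo_idx ON job (inx1, inx2, inx3)")
conn.executemany(
    "INSERT INTO job VALUES (?, ?, ?, ?)",
    [(i % 10, i % 7, i, "x") for i in range(1000)],
)


def plan(where_clause):
    """Return the query-plan text for a SELECT with the given WHERE clause."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM job WHERE " + where_clause
    ).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


leading = plan("inx1 = 3")                   # leftmost column present
leading_two = plan("inx1 = 3 AND inx2 = 2")  # leftmost prefix of two columns
middle_only = plan("inx2 = 2")               # leftmost column missing

print(leading)
print(leading_two)
print(middle_only)
```

In this sketch the first two plans mention `demo_idx`, while the query that filters on `inx2` alone falls back to a full table scan, which mirrors the `type: ALL` rows in the EXPLAIN output above.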

A plain sacct is sped up by moving time_end into the middle of the index (e.g. id_user, time_end, time_eligible). sacct_def is for sacct calls with a state specified; sacct_def2 is for a plain sacct call.

plain sacct:
mysql> explain select t1.account, t1.array_max_tasks, t1.array_task_str, t1.cpus_alloc, t1.cpus_req, t1.derived_ec, t1.derived_es, t1.exit_code, t1.id_array_job, t1.id_array_task, t1.id_assoc, t1.id_block, t1.id_group, t1.id_job, t1.id_qos, t1.id_resv, t3.resv_name, t1.id_user, t1.id_wckey, t1.job_db_inx, t1.job_name, t1.kill_requid, t1.mem_req, t1.node_inx, t1.nodelist, t1.nodes_alloc, t1.partition, t1.priority, t1.state, t1.time_eligible, t1.time_end, t1.time_start, t1.time_submit, t1.time_suspended, t1.timelimit, t1.track_steps, t1.wckey, t1.gres_alloc, t1.gres_req, t1.gres_used, t2.acct, t2.lft, t2.user from compy_job_table as t1 left join compy_assoc_table as t2 on t1.id_assoc=t2.id_assoc left join compy_resv_table as t3  on t1.id_resv=t3.id_resv  where (t1.id_user='1003') && ((t1.time_end >= 1437548400 || t1.time_end = 0)) group by id_job, time_submit desc;
+----+-------------+-------+--------+------------------+-----------+---------+------------------------+-------+---------------------------------------------------------------------+
| id | select_type | table | type   | possible_keys    | key       | key_len | ref                    | rows  | Extra                                                               |
+----+-------------+-------+--------+------------------+-----------+---------+------------------------+-------+---------------------------------------------------------------------+
|  1 | SIMPLE      | t1    | ref    | id_job,sacct_def | sacct_def | 4       | const                  | 60476 | Using index condition; Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY          | PRIMARY   | 4       | slurm_1412.t1.id_assoc |     1 | Using where                                                         |
|  1 | SIMPLE      | t3    | ref    | PRIMARY          | PRIMARY   | 4       | slurm_1412.t1.id_resv  |     1 | NULL                                                                |
+----+-------------+-------+--------+------------------+-----------+---------+------------------------+-------+---------------------------------------------------------------------+
3 rows in set (0.00 sec)

mysql> explain select t1.account, t1.array_max_tasks, t1.array_task_str, t1.cpus_alloc, t1.cpus_req, t1.derived_ec, t1.derived_es, t1.exit_code, t1.id_array_job, t1.id_array_task, t1.id_assoc, t1.id_block, t1.id_group, t1.id_job, t1.id_qos, t1.id_resv, t3.resv_name, t1.id_user, t1.id_wckey, t1.job_db_inx, t1.job_name, t1.kill_requid, t1.mem_req, t1.node_inx, t1.nodelist, t1.nodes_alloc, t1.partition, t1.priority, t1.state, t1.time_eligible, t1.time_end, t1.time_start, t1.time_submit, t1.time_suspended, t1.timelimit, t1.track_steps, t1.wckey, t1.gres_alloc, t1.gres_req, t1.gres_used, t2.acct, t2.lft, t2.user from compy_job_table as t1 left join compy_assoc_table as t2 on t1.id_assoc=t2.id_assoc left join compy_resv_table as t3  on t1.id_resv=t3.id_resv  where (t1.id_user='1003') && ((t1.time_end >= 1437548400 || t1.time_end = 0)) group by id_job, time_submit desc;
+----+-------------+-------+--------+-------------------------------------+------------+---------+------------------------+------+--------------------------------------------------------+
| id | select_type | table | type   | possible_keys                       | key        | key_len | ref                    | rows | Extra                                                  |
+----+-------------+-------+--------+-------------------------------------+------------+---------+------------------------+------+--------------------------------------------------------+
|  1 | SIMPLE      | t1    | range  | id_job,rollup2,sacct_def,sacct_def2 | sacct_def2 | 8       | NULL                   |   68 | Using index condition; Using temporary; Using filesort |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY                             | PRIMARY    | 4       | slurm_1412.t1.id_assoc |    1 | Using where                                            |
|  1 | SIMPLE      | t3    | ref    | PRIMARY                             | PRIMARY    | 4       | slurm_1412.t1.id_resv  |    1 | NULL                                                   |
+----+-------------+-------+--------+-------------------------------------+------------+---------+------------------------+------+--------------------------------------------------------+
3 rows in set (0.00 sec)

Adding the sacct_def2 index order didn't affect other queries:

sacct -s CA,CD,F,R:
mysql> explain select t1.account, t1.array_max_tasks, t1.array_task_str, t1.cpus_alloc, t1.cpus_req, t1.derived_ec, t1.derived_es, t1.exit_code, t1.id_array_job, t1.id_array_task, t1.id_assoc, t1.id_block, t1.id_group, t1.id_job, t1.id_qos, t1.id_resv, t3.resv_name, t1.id_user, t1.id_wckey, t1.job_db_inx, t1.job_name, t1.kill_requid, t1.mem_req, t1.node_inx, t1.nodelist, t1.nodes_alloc, t1.partition, t1.priority, t1.state, t1.time_eligible, t1.time_end, t1.time_start, t1.time_submit, t1.time_suspended, t1.timelimit, t1.track_steps, t1.wckey, t1.gres_alloc, t1.gres_req, t1.gres_used, t2.acct, t2.lft, t2.user from compy_job_table as t1 left join compy_assoc_table as t2 on t1.id_assoc=t2.id_assoc left join compy_resv_table as t3  on t1.id_resv=t3.id_resv  where (t1.id_user='1003') && ((t1.state='4' && (t1.time_end && (t1.time_end >= 1438028802))) || (t1.state='3' && (t1.time_end && (t1.time_end >= 1438028802))) || (t1.state='5' && (t1.time_end && (t1.time_end >= 1438028802))) || (t1.time_start && ((!t1.time_end && t1.state=1) || (1438028802 between t1.time_start and t1.time_end)))) group by id_job, time_submit desc;
+----+-------------+-------+--------+-------------------------------------+-----------+---------+------------------------+-------+---------------------------------------------------------------------+
| id | select_type | table | type   | possible_keys                       | key       | key_len | ref                    | rows  | Extra                                                               |
+----+-------------+-------+--------+-------------------------------------+-----------+---------+------------------------+-------+---------------------------------------------------------------------+
|  1 | SIMPLE      | t1    | ref    | id_job,rollup2,sacct_def,sacct_def2 | sacct_def | 4       | const                  | 60513 | Using index condition; Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY                             | PRIMARY   | 4       | slurm_1412.t1.id_assoc |     1 | Using where                                                         |
|  1 | SIMPLE      | t3    | ref    | PRIMARY                             | PRIMARY   | 4       | slurm_1412.t1.id_resv  |     1 | NULL                                                                |
+----+-------------+-------+--------+-------------------------------------+-----------+---------+------------------------+-------+---------------------------------------------------------------------+
3 rows in set (0.00 sec)

Adding a nodes_alloc index speeds up queries like: sacct -i 2-10000

mysql> EXPLAIN SELECT t1.account, t1.array_max_tasks, t1.array_task_str, t1.cpus_req, t1.derived_ec, t1.derived_es, t1.exit_code, t1.id_array_job, t1.id_array_task, t1.id_assoc, t1.id_block, t1.id_group, t1.id_job, t1.id_qos, t1.id_resv, t3.resv_name, t1.id_user, t1.id_wckey, t1.job_db_inx, t1.job_name, t1.kill_requid, t1.mem_req, t1.node_inx, t1.nodelist, t1.nodes_alloc, t1.partition, t1.priority, t1.state, t1.time_eligible, t1.time_end, t1.time_start, t1.time_submit, t1.time_suspended, t1.timelimit, t1.track_steps, t1.wckey, t1.gres_alloc, t1.gres_req, t1.gres_used, t2.acct, t2.lft, t2.user FROM compy_job_table AS t1 LEFT JOIN compy_assoc_table AS t2 ON t1.id_assoc = t2.id_assoc LEFT JOIN compy_resv_table AS t3 ON t1.id_resv = t3.id_resv WHERE ((t1.nodes_alloc between 2 and 10000)) && ((t1.time_start && ((1434384740 BETWEEN t1.time_start AND t1.time_end) || (t1.time_start BETWEEN 1434384740 AND 1434384741)))) GROUP BY id_job , time_submit DESC;
+----+-------------+-------+--------+----------------+---------+---------+-------------------------+--------+----------------------------------------------+
| id | select_type | table | type   | possible_keys  | key     | key_len | ref                     | rows   | Extra                                        |
+----+-------------+-------+--------+----------------+---------+---------+-------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | t1    | ALL    | id_job,rollup2 | NULL    | NULL    | NULL                    | 117549 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY        | PRIMARY | 4       | 1411_master.t1.id_assoc |      1 | NULL                                         |
|  1 | SIMPLE      | t3    | ref    | PRIMARY        | PRIMARY | 4       | 1411_master.t1.id_resv  |      1 | NULL                                         |
+----+-------------+-------+--------+----------------+---------+---------+-------------------------+--------+----------------------------------------------+
3 rows in set (0.00 sec)

mysql> EXPLAIN SELECT t1.account, t1.array_max_tasks, t1.array_task_str, t1.cpus_req, t1.derived_ec, t1.derived_es, t1.exit_code, t1.id_array_job, t1.id_array_task, t1.id_assoc, t1.id_block, t1.id_group, t1.id_job, t1.id_qos, t1.id_resv, t3.resv_name, t1.id_user, t1.id_wckey, t1.job_db_inx, t1.job_name, t1.kill_requid, t1.mem_req, t1.node_inx, t1.nodelist, t1.nodes_alloc, t1.partition, t1.priority, t1.state, t1.time_eligible, t1.time_end, t1.time_start, t1.time_submit, t1.time_suspended, t1.timelimit, t1.track_steps, t1.wckey, t1.gres_alloc, t1.gres_req, t1.gres_used, t2.acct, t2.lft, t2.user FROM compy_job_table AS t1 LEFT JOIN compy_assoc_table AS t2 ON t1.id_assoc = t2.id_assoc LEFT JOIN compy_resv_table AS t3 ON t1.id_resv = t3.id_resv WHERE ((t1.nodes_alloc between 2 and 10000)) && ((t1.time_start && ((1434384740 BETWEEN t1.time_start AND t1.time_end) || (t1.time_start BETWEEN 1434384740 AND 1434384741)))) GROUP BY id_job , time_submit DESC;
+----+-------------+-------+--------+----------------------------+-------------+---------+-------------------------+------+---------------------------------------------------------------------+
| id | select_type | table | type   | possible_keys              | key         | key_len | ref                     | rows | Extra                                                               |
+----+-------------+-------+--------+----------------------------+-------------+---------+-------------------------+------+---------------------------------------------------------------------+
|  1 | SIMPLE      | t1    | range  | id_job,rollup2,nodes_alloc | nodes_alloc | 4       | NULL                    |  720 | Using index condition; Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY                    | PRIMARY     | 4       | 1411_master.t1.id_assoc |    1 | NULL                                                                |
|  1 | SIMPLE      | t3    | ref    | PRIMARY                    | PRIMARY     | 4       | 1411_master.t1.id_resv  |    1 | NULL                                                                |
+----+-------------+-------+--------+----------------------------+-------------+---------+-------------------------+------+---------------------------------------------------------------------+
3 rows in set (0.00 sec)

f1ed6616

Move logic to optimize performance · 23f6ec89
Morris Jette authored Jul 27, 2015
```
No change in functionality
```
23f6ec89

Fix bug in node selection with topology optimization · 9dad2ff7

Morris Jette authored Jul 27, 2015

If node definitions in slurm.conf are spread across multiple lines
and topology/tree is configured, then sub-optimal node selection can
occur.
bug 1645

9dad2ff7

Add sinfo long format option · bafcb17f
Nathan Yee authored Jul 27, 2015
```
This also adds the ability to display allocated memory on a node
bug 1804
```
bafcb17f
Minor format change for download web page · b1ec4667
Morris Jette authored Jul 27, 2015

b1ec4667

Merge branch 'slurm-14.11' · 821c8bb8

Morris Jette authored Jul 27, 2015

Conflicts:
	slurm/slurm.h.in
	src/slurmctld/job_scheduler.c
	src/slurmd/slurmstepd/task.c

821c8bb8

Prevent slurmctld segv when delete reservation name is NULL · 18bbf378
Dorian Krause authored Jul 27, 2015

18bbf378
More accurate log message · 34b6f814
Dominik Bartkiewicz authored Jul 27, 2015

34b6f814

Log build job queue timeout less often · aee694eb

Morris Jette authored Jul 27, 2015

Rather than generating loads of log messages about too much time
being used to build the job queue every few seconds, log it only
every 10 minutes.
bug 1827
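
The throttling described in this message can be sketched as a small rate limiter. This is an illustrative Python sketch only; the actual change is in the slurmctld C code, and the class name `ThrottledLog` and its parameters are hypothetical. The 600-second default corresponds to the 10-minute interval mentioned above.

```python
import time


class ThrottledLog:
    """Emit a message at most once per `interval` seconds."""

    def __init__(self, interval=600.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last = None  # time of the last emitted message, if any

    def log(self, message, emit=print):
        """Emit `message` unless one was emitted less than `interval` ago."""
        now = self.clock()
        if self._last is not None and now - self._last < self.interval:
            return False  # suppressed: too soon after the previous message
        self._last = now
        emit(message)
        return True


queue_log = ThrottledLog()
queue_log.log("build job queue is taking too long")  # emitted
queue_log.log("build job queue is taking too long")  # suppressed
```

Passing the clock in as a parameter keeps the sketch easy to exercise without waiting for real time to elapse.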

aee694eb

Log job-partition pair count in job queue · 4f40d604
Morris Jette authored Jul 27, 2015
```
Log the number of job-partition pairs added to the queue for job
scheduling.
bug 1827
```
4f40d604

24 Jul, 2015 4 commits
- burst_buffer/cray work · 6c425ca9
  Morris Jette authored Jul 24, 2015
```
Minor updates to web page document
Minor improvements to error message logic
```
  6c425ca9
- burst_buffer/cray work · bf2a079f
  Morris Jette authored Jul 24, 2015
```
Improve logging
Disable stage-in/out if only persistent buffers used
End-to-end with persistent buffer works
Improve user error information
```
  bf2a079f
- fix long line · d8775792
  Danny Auble authored Jul 24, 2015
  
  d8775792
- burst_buffer/cray: create persistent buffers · 92c36caa
  Morris Jette authored Jul 23, 2015
```
Also, use first pool as default if none configured
Several changes for updated API calls
```
  92c36caa
23 Jul, 2015 3 commits

BB/Cray: Add "show_sessions" parsing · faf7cbb4
Morris Jette authored Jul 23, 2015

faf7cbb4
burst_buffer/cray: Add show_sessions logic · 57954d06
Morris Jette authored Jul 22, 2015

57954d06

Address some Cray scalability issues · 92049932

Morris Jette authored Jul 22, 2015

On Cray we were seeing an srun error reading slurmstepd message header.
  This was due to a shutdown race condition and the error message has
  been removed.
Cray: Disable LDAP references from slurmstepd on job launch for
  improved scalability.
Document EioTimeout configuration parameter for large systems
bug 1786

92049932

22 Jul, 2015 10 commits
- improve error message · e0844384
  Morris Jette authored Jul 22, 2015
```
If an RPC is being sent to a node and the destination job no longer
exists, log the message "job not running" rather than looking up
the error and reporting "invalid job id".
```
  e0844384
- burst_buffer/cray: Add logic to read BB instances · 35de1deb
  Morris Jette authored Jul 22, 2015
  
  35de1deb
- burst_buffer/cray: Change some symbol names · 210de8ab
  Morris Jette authored Jul 22, 2015
```
No change in logic, just renamed some variable and function names
  for greater clarity
```
  210de8ab
- Merge branch 'slurm-14.11' · e7d5c493
  Morris Jette authored Jul 22, 2015
  
  e7d5c493
- Capture salloc/srun information in sdiag statistics · dc079ea8
  Nicolas Joly authored Jul 22, 2015
```
Previously only batch job completions were being captured.
bug 1820
```
  dc079ea8
- Add functions to un/pack long double · 58e63757
  Morris Jette authored Jul 22, 2015
  
  58e63757
- Merge branch 'slurm-14.11' · 90af8654
  Morris Jette authored Jul 22, 2015
  
  90af8654
- Correct the sacct.a man page. · 8f1c1a80
  David Bigagli authored Jul 22, 2015
  
  8f1c1a80
- Fix node state race condition, UNKNOWN->IDLE · 00e8bb2b
  Morris Jette authored Jul 21, 2015
```
If a job was running on the node when slurmctld restarted, the slurmd
would notify slurmctld when the job ended and slurmctld would change
its state from UNKNOWN to IDLE, at least if the job termination happened
prior to the slurmd being asked for configuration information. The
configuration information might then not be collected for some time.
I've modified the code to address this problem and try to collect
configuration information from every node after slurmctld startup,
eliminating this race condition.
bug 1805
```
  00e8bb2b
- Fix spelling of node_rescrs to node_resrcs in Perl API. · dad397f9
  Brian Christiansen authored Jul 21, 2015
```
Bug 1208
```
  dad397f9
21 Jul, 2015 8 commits

Merge branch 'slurm-14.11' · 11c5684f
Morris Jette authored Jul 21, 2015

11c5684f
Fix typo on web page · 49560d80
Morris Jette authored Jul 21, 2015

49560d80

fix incorrect reading of cpuinfo on POWER systems · 962dea86

Chandler Wilkerson authored Jul 21, 2015

This patch provides a rewrite of how /proc/cpuinfo is parsed in common_jag.c, as the original code made the incorrect assumption that cpuinfo follows a sane format across architectures ;-)

The motivation for this patch is that the original code was producing stack smashing on a POWER7 running RHEL6.4. Red Hat adds -fstack-protector along with a lot of other protective CFLAGS when building RPMs. The code ran okay with -fno-stack-protector, but that is not the best work-around.

So, the relevant /proc/cpuinfo line on an Intel (Xeon X5675) system looks like:

cpu MHz                : 3066.915

Whereas the relevant line in a POWER7 system is

clock                : 3550.000000MHz

My patch replaces the assumption that the relevant number starts 11 characters into the string with another assumption: that the relevant number starts two characters after a colon in a string that matches (M|G)Hz.

All in all, the function has a few more calls, which may be a performance issue if it has to be called multiple times, but since the section I edited only gets evaluated if we don't know the cpu frequency, getting it right will actually result in fewer string operations and unnecessary opens of /proc/cpuinfo for systems likewise affected.

Finally, I also read the actual value into a double and multiply it up to the size indicated by the suffix, so we end up with KHz? It was unclear what the original code intended, since it matched both MHz and GHz, replaced the decimal point with a zero, and read the result as an int.
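
The parsing rule described above can be sketched as follows. This is a Python illustration rather than the C code in common_jag.c; the helper name `cpu_freq_khz` and the lookup table are hypothetical, and the sketch assumes kHz is the intended result unit, as the message tentatively suggests.

```python
import re

# Hypothetical helper; the real patch is C code in common_jag.c.
# Multipliers assume the result should be expressed in kHz.
_UNIT_TO_KHZ = {"MHz": 1_000, "GHz": 1_000_000}


def cpu_freq_khz(line):
    """Return the frequency in kHz parsed from one /proc/cpuinfo line, or None."""
    unit = re.search(r"[MG]Hz", line)
    colon = line.find(":")
    if unit is None or colon < 0:
        return None
    # The number is assumed to start two characters after the colon.
    match = re.match(r"\d+(?:\.\d+)?", line[colon + 2:])
    if match is None:
        return None
    # Read the value as a float and scale it by the unit found on the line.
    return int(round(float(match.group()) * _UNIT_TO_KHZ[unit.group()]))


print(cpu_freq_khz("cpu MHz                : 3066.915"))       # 3066915
print(cpu_freq_khz("clock                : 3550.000000MHz"))  # 3550000
```

Both sample lines from the message parse under the same rule, whether the unit appears in the field name (Intel) or as a suffix on the value (POWER7).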

--
Chandler Wilkerson
Center for Research Computing
Rice University

962dea86

Revert "ALPS - Remove sanity code to work like it did in 2.5. This is an addition" · 6ab26aa6

Danny Auble authored Oct 16, 2014

This reverts commit 2c95e2d2.

Conflicts:
	src/plugins/select/alps/basil_interface.c

This is related to bug 1822.  It isn't clear why the code was taken out in
this commit in the first place and based off of commit 2e2de6a4 (which is
the reason for the conflict) we tried unsuccessfully to put it back.

It appears the only difference here is that mppnppn is now always
set to 1, instead of always to job_ptr->details->ntasks_per_node,
when no ntasks is set.

This appears to only be an issue with salloc or sbatch as ntasks
is always set for srun.

6ab26aa6

Patch that adds new script and templates · 1a0ec9a3

David Gloe authored Jul 21, 2015

I've attached a patch to add a new script and templates that create slurm.conf and gres.conf on a Cray SMW.

The script parses the output of xthwinv, combines nodes with the same hardware, and writes the output to slurm.conf. It also uses NodeName in gres.conf to allow having the same gres.conf file on all nodes.

I removed the settings in slurm.conf that we had commented out or set to the default.

1a0ec9a3

Merge branch 'slurm-14.11' · 594e043d
Morris Jette authored Jul 21, 2015

594e043d
Clarify GraceTime configuration · afae90b1
Morris Jette authored Jul 20, 2015

afae90b1
Enhance the error message. · 2c937879
david authored Jul 21, 2015

2c937879

20 Jul, 2015 1 commit
- Fix some uninitialized variables reported by CLANG · 12b7926a
  Morris Jette authored Jul 20, 2015
  
  12b7926a