@@ -133,7 +133,8 @@ Low CPI near theoretical limit (if instruction throughput is the problem).

Static code analysis predicting high pressure on a single execution port.

High CPI due to bad pipelining.

(FLOPS_DP, FLOPS_SP, DATA).

**Fix:** ?

---

@@ -151,6 +152,7 @@ Low CPI.

(FLOPS_DP, FLOPS_SP).

**Fix:**

Remove unnecessary synchronization (especially implicit barriers, such as the one at the end of an OpenMP worksharing loop!)
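
As a minimal sketch, assuming C with OpenMP (the function `scale_two` and its parameters are hypothetical, not taken from any particular code base): when two worksharing loops in one parallel region touch independent data, the implicit barrier after each loop can be dropped with the `nowait` clause, while the barrier that closes the parallel region still synchronizes all threads before the function returns.

```c
#include <stddef.h>

/* Hypothetical example: scale two independent arrays by the same factor.
 * Compile with OpenMP enabled (e.g. -fopenmp); without it the pragmas are
 * ignored and the function simply runs serially with the same result. */
void scale_two(double *a, double *b, size_t n, double s)
{
    #pragma omp parallel
    {
        /* The second loop never reads a[], so threads need not wait for
         * each other here: nowait removes the implicit barrier. */
        #pragma omp for nowait
        for (size_t i = 0; i < n; ++i)
            a[i] *= s;

        /* The closing barrier of the parallel region already synchronizes
         * all threads, so the barrier after this loop is redundant too. */
        #pragma omp for nowait
        for (size_t i = 0; i < n; ++i)
            b[i] *= s;
    } /* implicit barrier at the end of the parallel region (cannot be removed) */
}
```

Dropping a barrier is only safe when no thread reads data that another thread writes in the preceding loop; if the second loop consumed `a[]`, the barrier after the first loop would have to stay.
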

---

#### False cache line sharing

@@ -163,6 +165,7 @@ Frequent (remote) evicts (CACHE).

**Fix:**

Revisit the working set per thread.

Data replication.
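
A minimal sketch of data replication in C with OpenMP, assuming a simple counting kernel (the function `count_above` is hypothetical): instead of letting each thread increment its own slot of a shared array, where neighbouring slots sit on the same cache line, the `reduction` clause gives every thread a private copy of the counter and combines the copies only once, after the loop.

```c
#include <stddef.h>

/* Hypothetical kernel: count the elements of x[] that exceed `threshold`.
 *
 * A false-sharing-prone variant would have every thread increment
 * partial[thread_id] in a shared array, so adjacent slots written by
 * different threads land on the same cache line. Here each thread works on
 * a private copy of `count` (data replication via the reduction clause),
 * and OpenMP sums the private copies when the loop ends. */
long count_above(const double *x, size_t n, double threshold)
{
    long count = 0;

    #pragma omp parallel for reduction(+ : count)
    for (size_t i = 0; i < n; ++i)
        if (x[i] > threshold)
            count++; /* updates the thread-private copy only */

    return count;
}
```

Without OpenMP the pragma is ignored and the loop runs serially, producing the same count.
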

---

#### Bad page placement on ccNUMA