@@ -61,5 +61,63 @@ This workflow can be adapted to analyze our projects via:
2. If the code is compute-bound, check: Load Imbalance, Branch Rate & Misprediction, Performance Options (intrinsics, vectorization, pipelining, superscalar execution, out-of-order execution, branch prediction, speculative execution), or GPU porting
4. Analyze the changes with a new roofline model and, if needed, repeat.
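
To make the compute-bound versus memory-bound distinction in the steps above concrete, here is a minimal sketch of the standard roofline bound: attainable performance is the smaller of peak compute and arithmetic intensity times memory bandwidth. The function name and the machine numbers (1000 GFLOP/s peak, 100 GB/s bandwidth) are hypothetical placeholders, not measurements from our projects.

```python
def roofline_bound(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Return (attainable GFLOP/s, limiting resource) for one kernel.

    arithmetic_intensity: FLOP per byte moved to/from memory
    peak_gflops:          peak compute rate of the machine (GFLOP/s)
    bandwidth_gbs:        sustained memory bandwidth (GB/s)
    """
    memory_roof = arithmetic_intensity * bandwidth_gbs
    if memory_roof < peak_gflops:
        return memory_roof, "memory"
    return peak_gflops, "compute"


# Hypothetical machine parameters, chosen only for illustration.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

# Two hypothetical kernels: one with low and one with high arithmetic intensity.
for name, ai in [("streaming kernel", 0.125), ("blocked matrix kernel", 32.0)]:
    perf, limit = roofline_bound(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
    print(f"{name}: AI={ai} FLOP/byte -> {perf:.1f} GFLOP/s ({limit}-bound)")
```

Re-running this kind of calculation with the measured arithmetic intensity after each optimization shows whether the kernel has moved closer to a roof or has changed which roof limits it.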

## Performance Patterns & Signatures

Performance patterns (or anti-patterns) are specific behaviors or problems that create performance bottlenecks. The subsections below describe the patterns to look out for. CPU and GPU patterns are discussed separately.

### CPU performance patterns

This section covers all performance patterns for the CPU side of applications.

#### Load Imbalance

**Issue:** The workload is not distributed evenly, so several units stall while waiting for one unit to finish.

**Performance Behavior:** Speed-up saturates sooner than expected.

**Performance counters:** The count of instructions retired or floating-point operations (FLOPS_DP, FLOPS_SP) differs among cores.

**Fix:** Reorganize the work to improve load balancing.
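
As a rough illustration of how the per-core counter signature can be turned into a number, the sketch below computes an imbalance factor (busiest core divided by the mean) from per-core counts. The function name and the sample counts are hypothetical. It also assumes that all cores retire instructions at a similar rate, so that counts are a fair proxy for time spent working.

```python
def load_imbalance(per_core_counts):
    """Return (imbalance_factor, efficiency_bound) from per-core event counts.

    imbalance_factor: busiest core / mean over cores (1.0 means perfectly balanced)
    efficiency_bound: mean / busiest core, an upper bound on parallel efficiency
                      when every core has to wait for the busiest one
    """
    if not per_core_counts:
        raise ValueError("need counts for at least one core")
    busiest = max(per_core_counts)
    mean = sum(per_core_counts) / len(per_core_counts)
    if busiest == 0:
        return 1.0, 1.0  # no work recorded at all, treat as balanced
    return busiest / mean, mean / busiest


# Hypothetical instructions-retired counts for four cores;
# the last core received twice as much work as the others.
counts = [4.0e9, 4.0e9, 4.0e9, 8.0e9]
factor, efficiency = load_imbalance(counts)
print(f"imbalance factor: {factor:.2f}, efficiency bound: {efficiency:.1%}")
```

A factor well above 1.0 suggests that redistributing work, for example with smaller chunks or dynamic scheduling, is worth trying before other optimizations.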

#### Bandwidth saturation

**Issue:**

**Performance Behavior:**

**Performance counters:**

**Fix:**

#### Strided or erratic data access

**Issue:**

**Performance Behavior:**

**Performance counters:**

**Fix:**

#### Bad Instruction Mix

**Issue:**

**Performance Behavior:**

**Performance counters:**

**Fix:**

#### Limited instruction throughput

**Issue:**

**Performance Behavior:**

**Performance counters:**

**Fix:**

#### Synchronization overhead

**Issue:**

**Performance Behavior:**

**Performance counters:**

**Fix:**

#### False cache line sharing

**Issue:**

**Performance Behavior:**

**Performance counters:**

**Fix:**

#### Bad page placement on ccNUMA

**Issue:**

**Performance Behavior:**

**Performance counters:**

**Fix:**

### GPU performance patterns

This section covers all performance patterns for the GPU side of applications.

# Subpages

- [Intel Advisor & Intel VTune](3.a.-Intel-Offload-Advisor-&-Intel-VTune)

\ No newline at end of file