... | @@ -108,10 +108,10 @@ Kernel granularity refers to performance degradation through the API overhead cr |
|
Excessive branching reduces the arithmetic throughput and thus lowers the value of the `VALUUtilization` metric. Reasonable numbers are 95% and up. The ALU stall time is the complement of this percentage, that is, 100% minus the utilization.
|
|
Excessive branching reduces the arithmetic throughput and thus lowers the value of the `VALUUtilization` metric. Reasonable numbers are 95% and up. The ALU stall time is the complement of this percentage, that is, 100% minus the utilization.
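To make the relation between branching and utilization concrete, here is a minimal sketch. It assumes a wavefront of 64 lanes and models utilization as the share of lanes that are active per vector instruction. The function name `valu_utilization` and the trace are hypothetical and only illustrate the idea; the profiler's exact definition of `VALUUtilization` may differ.

```python
WAVEFRONT_SIZE = 64  # assumption: 64 lanes per wavefront


def valu_utilization(active_lanes_per_instruction, wavefront_size=WAVEFRONT_SIZE):
    """Return the percentage of lanes that were active, averaged over all
    vector instructions in the (hypothetical) trace."""
    if not active_lanes_per_instruction:
        raise ValueError("need at least one instruction")
    total_slots = wavefront_size * len(active_lanes_per_instruction)
    return 100.0 * sum(active_lanes_per_instruction) / total_slots


# Hypothetical trace: two instructions run on all 64 lanes, two sit inside
# a branch that only half of the lanes take.
trace = [64, 64, 32, 32]
utilization = valu_utilization(trace)
idle_share = 100.0 - utilization  # lanes that did no useful work

print(f"utilization: {utilization:.1f}%  idle: {idle_share:.1f}%")
```

In this toy trace, half of the instructions run with half of the lanes masked off, which pulls the utilization far below the 95% mark mentioned above.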
|
|
|
|
|
|
#### 6. Shared memory bank conflicts
|
|
#### 6. Shared memory bank conflicts
|
|
Shared memory bank conflicts are tracked through the `LDSBankConflict` hardware counter. Ideally, this value is 0. You can check the runtime debug information to verify whether the application actually uses shared memory.
|
|
Shared memory bank conflicts are tracked through the `LDSBankConflict` hardware counter. Ideally, this value is 0. If it is not, the `ALUStalledByLDS` and `LDSInsts` counters give a more detailed picture. You can also check the runtime debug information via `CRAY_ACC_DEBUG` to verify whether the application actually uses shared memory.
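As an illustration of why some access patterns cause bank conflicts, the sketch below models shared memory as 32 banks of 4-byte words. Both values are assumptions for illustration, and real hardware may differ in bank count and in how it groups lanes. The helper names `conflict_degree` and `strided_addresses` are hypothetical.

```python
from collections import Counter

NUM_BANKS = 32   # assumption: number of shared memory banks
BANK_WIDTH = 4   # assumption: bank width in bytes


def conflict_degree(byte_addresses, num_banks=NUM_BANKS, bank_width=BANK_WIDTH):
    """Return the largest number of distinct words that map to one bank.

    1 means conflict-free; k means the access is serialized k ways.
    Lanes reading the same word are counted once (treated as a broadcast).
    """
    words = {addr // bank_width for addr in byte_addresses}
    if not words:
        return 0
    per_bank = Counter(word % num_banks for word in words)
    return max(per_bank.values())


def strided_addresses(stride_words, lanes=32, bank_width=BANK_WIDTH):
    """Byte addresses touched when lane i accesses word i * stride_words."""
    return [lane * stride_words * bank_width for lane in range(lanes)]


for stride in (1, 2, 32, 33):
    print(stride, conflict_degree(strided_addresses(stride)))
```

Under this simplified model, a stride of 32 words sends every lane to the same bank, while padding the stride to 33 words spreads the lanes over all banks again.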
|
|
|
|
|
|
#### 7. Impact of atomic operations
|
|
#### 7. Impact of atomic operations
|
|
Atomic instructions are difficult to diagnose. You can scan the source code for these instructions, but there is no known automated way to diagnose them.
|
|
Atomic instructions are difficult to diagnose. You can scan the source code for these instructions, but there is no known automated way to diagnose them. Metrics on memory transactions, cache utilization, and potential contention can help, provided nothing else explains a worsening of their values. Candidates are `MemUnitBusy`, `MemUnitStalled`, `L2CacheHit`, and `VALUBusy`.
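Scanning the source for atomics can be partly scripted. The sketch below is one possible approach, not an established tool: the pattern list (OpenACC and OpenMP `atomic` directives plus a few CUDA/HIP-style `atomic*` calls) is an assumption and should be adapted to the programming model in use. The names `ATOMIC_PATTERN` and `find_atomics` are hypothetical.

```python
import re

# Assumed patterns: directive-based atomics and common atomic intrinsics.
ATOMIC_PATTERN = re.compile(
    r"!\$(?:acc|omp)\s+atomic"                 # Fortran directives
    r"|#pragma\s+(?:acc|omp)\s+atomic"         # C/C++ directives
    r"|\batomic(?:Add|Sub|Exch|Min|Max|CAS|And|Or|Xor)\b",  # intrinsics
    re.IGNORECASE,
)


def find_atomics(source_text):
    """Return (line number, stripped line) for every line that matches."""
    hits = []
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        if ATOMIC_PATTERN.search(line):
            hits.append((lineno, line.strip()))
    return hits


# Hypothetical Fortran snippet used only to demonstrate the scan.
sample = """\
!$acc parallel loop
do i = 1, n
   !$acc atomic update
   hist(idx(i)) = hist(idx(i)) + 1
end do
"""

for lineno, text in find_atomics(sample):
    print(f"line {lineno}: {text}")
```

Such a scan only locates candidate atomics. Whether they actually hurt performance still has to be judged from the metrics listed above.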
|
|
|
|
|
|
|
|
|
|
# Subpages
|
|
# Subpages
|
... | | ... | |