Loops
attention_v2.cpp: 30 - 469.26 % (coverage summed over the six runs)
Column groups, left to right (six columns per run): Run gcc-256, Run gcc-512, Run clang-256, Run clang-512, Run aocc-256, Run aocc-512

| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.41 | 0.42 | 6.23 | 0 | 8.75 | 2 | 0.34 | 0.34 | 5.01 | 0 | 8.75 | 53 | 3.03 | 3.03 | 34.43 | 0 | 7.5 | 44 | 0.50 | 0.50 | 5.67 | 0 | 7.5 | 30 | 0.52 | 0.52 | 5.27 | 0 | 8.33 | 32 | 0.54 | 0.53 | 5.42 | 0 | 8.33 |
| 5 | 0.44 | 0.43 | 6.53 | 0 | 8.75 | 5 | 0.37 | 0.37 | 5.46 | 0 | 8.75 | 64 | 1.86 | 1.86 | 21.08 | 0 | 7.5 | 43 | 0.49 | 0.49 | 5.61 | 0 | 7.5 | 29 | 0.65 | 0.65 | 6.53 | 0 | 8.33 | 31 | 0.65 | 0.65 | 6.58 | 0 | 8.33 |
| 8 | 0.48 | 0.48 | 7.28 | 0 | 8.75 | 8 | 0.41 | 0.42 | 6.20 | 0 | 8.75 | 41 | 0.57 | 0.58 | 6.53 | 0 | 7.5 | 54 | 3.19 | 3.19 | 36.54 | 0 | 7.5 | 28 | 0.62 | 0.62 | 6.28 | 0 | 8.33 | 41 | 3.69 | 3.69 | 37.37 | 0 | 8.33 |
| 13 | 1.68 | 1.68 | 25.23 | 0 | 8.75 | 13 | 1.79 | 1.79 | 26.76 | 0 | 8.75 | 43 | 0.49 | 0.49 | 5.57 | 0 | 7.5 | 66 | 1.91 | 1.91 | 21.88 | 0 | 7.5 | 33 | 2.37 | 2.37 | 23.96 | 0 | 8.33 | 33 | 0.66 | 0.66 | 6.68 | 0 | 8.33 |
| 16 | 2.24 | 2.24 | 33.71 | 0 | 8.75 | 16 | 2.35 | 2.35 | 35.20 | 0 | 8.75 | 42 | 0.63 | 0.62 | 7.10 | 0 | 7.5 | 42 | 0.50 | 0.50 | 5.73 | 0 | 7.5 | 38 | 3.80 | 3.79 | 38.45 | 0 | 8.33 | 36 | 2.46 | 2.46 | 24.96 | 0 | 8.33 |

Sum on 5 analyzed binary loops per run:

| Run | Binary | ASM Loop IDs |
|---|---|---|
| gcc-256 | attention-gcc-znver5-256 | 2, 5, 8, 13, 16 |
| gcc-512 | attention-gcc-znver5-512 | 2, 5, 8, 13, 16 |
| clang-256 | attention-clang-znver5-256 | 53, 64, 41, 43, 42 |
| clang-512 | attention-clang-znver5-512 | 44, 43, 54, 66, 42 |
| aocc-256 | attention-aocc-znver5-256 | 30, 29, 28, 33, 38 |
| aocc-512 | attention-aocc-znver5-512 | 32, 31, 41, 33, 36 |

| Analysis (count per run) | gcc-256 | gcc-512 | clang-256 | clang-512 | aocc-256 | aocc-512 |
|---|---|---|---|---|---|---|
| Loop Computation Issues | | | | | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 | | | 0 | 0 |
| Low iteration count | 0 | 0 | | | 1 | 1 |
| Control Flow Issues | | | | | | |
| Low iteration count | | | | | 1 | 1 |
| Data Access Issues | | | | | | |
| Presence of constant non-unit stride data access | 1 | 1 | 1 | 1 | 1 | 1 |
| Vectorization Roadblocks | | | | | | |
| Presence of constant non-unit stride data access | 1 | 1 | 1 | 1 | 1 | 1 |

attention_v2.cpp: 55 - 12.08 % (coverage summed over the six runs)
Column groups, left to right (six columns per run): Run gcc-256, Run gcc-512, Run clang-256, Run clang-512, Run aocc-256, Run aocc-512

| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28 | 0.22 | 0.22 | 3.23 | 0 | 6.25 | 28 | 0.25 | 0.25 | 3.74 | 0 | 6.25 | 49 | 0.10 | 0.10 | 1.14 | 83.33 | 25.35 | 60 | 0.02 | 0.02 | 0.23 | 83.33 | 21.18 | 37 | 0.15 | 0.15 | 1.47 | 81.08 | 25 | 49 | 0.00 | 0.00 | 0.05 | 42.86 | 14.29 |
| 58 | 0.02 | 0.02 | 0.23 | 42.86 | 14.29 | 59 | 0.03 | 0.03 | 0.29 | 37.5 | 14.06 | 44 | 0.01 | 0.01 | 0.10 | 78.95 | 20.72 | 48 | 0.01 | 0.01 | 0.10 | 81.08 | 25 | ||||||||||||
| 50 | 0.08 | 0.08 | 0.86 | 83.33 | 30.21 | 45 | 0.01 | 0.01 | 0.10 | 42.86 | 14.29 | 40 | 0.05 | 0.05 | 0.56 | 82.19 | 29.97 | ||||||||||||||||||

Sum on 1 analyzed binary loop per run:

| Run | Binary | ASM Loop ID |
|---|---|---|
| gcc-256 | attention-gcc-znver5-256 | 28 |
| gcc-512 | attention-gcc-znver5-512 | 28 |
| clang-256 | attention-clang-znver5-256 | 49 |
| clang-512 | attention-clang-znver5-512 | 50 |
| aocc-256 | attention-aocc-znver5-256 | 37 |
| aocc-512 | attention-aocc-znver5-512 | 40 |

| Analysis (count per run) | gcc-256 | gcc-512 | clang-256 | clang-512 | aocc-256 | aocc-512 |
|---|---|---|---|---|---|---|
| Loop Computation Issues | | | | | | |
| Presence of expensive FP instructions | 1 | 1 | 1 | 1 | 1 | 1 |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 | 1 | 1 | 1 | 1 |
| Presence of a large number of scalar integer instructions | 0 | 0 | 0 | 0 | 1 | 0 |
| Control Flow Issues | | | | | | |
| Presence of calls | 1 | 1 | 1 | 1 | 1 | 1 |
| Data Access Issues | | | | | | |
| Presence of indirect access | 0 | 0 | 0 | 0 | 1 | 1 |
| More than 10% of the vector loads instructions are unaligned | 0 | 0 | 1 | 1 | 1 | 1 |
| Presence of special instructions executing on a single port | 0 | 0 | 1 | 1 | 1 | 1 |
| More than 20% of the loads are accessing the stack | 1 | 1 | 1 | 1 | 1 | 1 |
| Vectorization Roadblocks | | | | | | |
| Presence of calls | 1 | 1 | 1 | 1 | 1 | 1 |
| Presence of indirect access | 0 | 0 | 0 | 0 | 1 | 1 |
| Inefficient Vectorization | | | | | | |
| Presence of special instructions executing on a single port | | | 1 | 1 | 1 | 1 |

attention_v2.cpp: 52 - 5.53 % (coverage summed over the six runs)
Column groups, left to right (six columns per run): Run gcc-256, Run gcc-512, Run clang-256, Run clang-512, Run aocc-256, Run aocc-512

| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27 | 0.13 | 0.13 | 2.03 | 0 | 6.25 | 27 | 0.07 | 0.07 | 1.05 | 0 | 6.25 | 47 | 0.02 | 0.03 | 0.28 | 84.21 | 26.64 | 48 | 0.04 | 0.04 | 0.40 | 83.78 | 32.09 | 36 | 0.01 | 0.01 | 0.10 | 50 | 16.41 | 39 | 0.01 | 0.01 | 0.10 | 57.14 | 16.96 |
| 59 | 0.02 | 0.02 | 0.23 | 80 | 20.94 | 61 | 0.02 | 0.02 | 0.23 | 81.58 | 25.66 | 46 | 0.01 | 0.01 | 0.10 | 84.21 | 21.38 | 50 | 0.02 | 0.02 | 0.25 | 83.78 | 26.01 | ||||||||||||
| 35 | 0.05 | 0.05 | 0.51 | 84.21 | 26.64 | 38 | 0.02 | 0.02 | 0.25 | 83.5 | 32.03 | ||||||||||||||||||||||||

Analyzed binary loops per run:

| Run | Loops Overview analysis |
|---|---|
| gcc-256 | Sum on 1 analyzed binary loop (attention-gcc-znver5-256 - 27) |
| gcc-512 | Sum on 1 analyzed binary loop (attention-gcc-znver5-512 - 27) |
| clang-256 | No Loops Overview analysis found for any assembly loop. |
| clang-512 | No Loops Overview analysis found for any assembly loop. |
| aocc-256 | Sum on 1 analyzed binary loop (attention-aocc-znver5-256 - 35) |
| aocc-512 | No Loops Overview analysis found for any assembly loop. |

More loops can be analyzed using option --summary-loop-count.

| Analysis (count per run) | gcc-256 | gcc-512 | aocc-256 |
|---|---|---|---|
| Loop Computation Issues | | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 | 1 |
| Control Flow Issues | | | |
| Presence of calls | 1 | 1 | 1 |
| Data Access Issues | | | |
| More than 10% of the vector loads instructions are unaligned | 0 | 0 | 1 |
| Presence of special instructions executing on a single port | 0 | 0 | 1 |
| More than 20% of the loads are accessing the stack | 1 | 1 | 1 |
| Vectorization Roadblocks | | | |
| Presence of calls | 1 | 1 | 1 |
| Inefficient Vectorization | | | |
| Presence of special instructions executing on a single port | | | 1 |

