Loops
attention.cpp: 30 - 260.14 % (appears to be the Cov (%) values below summed over the three runs)

Runs compared, left to right (seven columns per run): Run march=znver5 mprefer-vector-width=256; Run march=znver5 [mprefer-vector-width=512]; Run march=znver5 mprefer-vector-width=512 ffast-math.

| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27 | 3.85 | 3.85 | 40.96 | 0 | 8.33 | 2.7 | 30 | 3.70 | 3.69 | 38.73 | 0 | 8.33 | 2.81 | 30 | 3.82 | 3.82 | 40.69 | 0 | 8.33 | 2.72 |
| 62 | 0.69 | 0.68 | 7.29 | 0 | 8.33 | 3.77 | 47 | 2.44 | 2.44 | 25.58 | 0 | 8.33 | 3.33 | 47 | 2.82 | 2.82 | 30.00 | 0 | 8.33 | 3.38 |
| 52 | 0.70 | 0.70 | 7.50 | 0 | 8.33 | 3.69 | 69 | 0.56 | 0.56 | 5.87 | 0 | 8.33 | 4.16 | 69 | 0.70 | 0.70 | 7.45 | 0 | 8.33 | 3.45 |
| 43 | 1.95 | 1.95 | 20.74 | 0 | 8.33 | 5.33 | 63 | 0.57 | 0.57 | 5.97 | 0 | 8.33 | 4.09 | 63 | 0.71 | 0.71 | 7.55 | 0 | 8.33 | 3.49 |
| 57 | 0.68 | 0.68 | 7.23 | 0 | 8.33 | 3.84 | 57 | 0.70 | 0.70 | 7.34 | 0 | 8.33 | 3.46 | 57 | 0.68 | 0.68 | 7.23 | 0 | 8.33 | 3.64 |

Sum on 5 analyzed binary loops per run: attention-avx512 - 27, 62, 52, 43, 57 (mprefer-vector-width=256); attention-avx512 - 30, 47, 69, 63, 57 ([mprefer-vector-width=512]); attention-avx512 - 30, 47, 69, 63, 57 (mprefer-vector-width=512 ffast-math).

| Category | Analysis | Count (mprefer-vector-width=256) | Count ([mprefer-vector-width=512]) | Count (mprefer-vector-width=512 ffast-math) |
|---|---|---|---|---|
| Loop Computation Issues | Low iteration count | 1 | 1 | 1 |
| Control Flow Issues | Low iteration count | 1 | 1 | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | 1 | 1 | 1 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 | 1 | 1 |

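
The "constant non-unit stride data access" roadblock reported above can be illustrated with a small sketch. The source of attention.cpp is not shown in this report, so the kernel below is purely hypothetical: it assumes a score computed as a dot product of a query vector with one column of a row-major `d x n` matrix `K`. The names `score_strided`, `transpose`, and `score_unit` are illustrative assumptions, not functions from the profiled code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: dot(q, column j of K), with K stored row-major as d x n.
// Consecutive reads of K are n floats apart: a constant non-unit stride.
float score_strided(const std::vector<float>& q, const std::vector<float>& K,
                    std::size_t d, std::size_t n, std::size_t j) {
    assert(q.size() == d && K.size() == d * n && j < n);
    float acc = 0.0f;
    for (std::size_t k = 0; k < d; ++k)
        acc += q[k] * K[k * n + j];  // stride of n elements between iterations
    return acc;
}

// Transpose K once (d x n -> n x d) so that each former column is contiguous.
std::vector<float> transpose(const std::vector<float>& K,
                             std::size_t d, std::size_t n) {
    assert(K.size() == d * n);
    std::vector<float> Kt(n * d);
    for (std::size_t k = 0; k < d; ++k)
        for (std::size_t j = 0; j < n; ++j)
            Kt[j * d + k] = K[k * n + j];
    return Kt;
}

// Same dot product on the transposed layout: unit-stride reads of Kt.
float score_unit(const std::vector<float>& q, const std::vector<float>& Kt,
                 std::size_t d, std::size_t n, std::size_t j) {
    assert(q.size() == d && Kt.size() == n * d && j < n);
    float acc = 0.0f;
    for (std::size_t k = 0; k < d; ++k)
        acc += q[k] * Kt[j * d + k];  // contiguous access
    return acc;
}
```

Both functions accumulate in the same order, so they return identical results. Whether a one-time transpose pays off in the real code depends on how often each column is reused, which this report does not show.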
attention.cpp: 55 - 2.28 %

Runs compared, left to right (seven columns per run): Run march=znver5 mprefer-vector-width=256; Run march=znver5 [mprefer-vector-width=512]; Run march=znver5 mprefer-vector-width=512 ffast-math.

| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 38 | 0.00 | 0.00 | 0.05 | 78.95 | 20.72 | 2.5 | 43 | 0.01 | 0.01 | 0.10 | 37.5 | 14.06 | 2 | 43 | 0.01 | 0.01 | 0.11 | 28.57 | 12.5 | 2.75 |
| 39 | 0.01 | 0.01 | 0.11 | 42.86 | 14.29 | 1 | 39 | 0.04 | 0.04 | 0.42 | 83.33 | 30.21 | 1.62 | 39 | 0.01 | 0.01 | 0.16 | 100 | 100 | 13.66 |
| 35 | 0.11 | 0.12 | 1.22 | 81.08 | 25 | 2.59 | 42 | 0.01 | 0.01 | 0.10 | 83.33 | 25.35 | 1.5 | |||||||

Sum on 1 analyzed binary loop per run: attention-avx512 - 35 (mprefer-vector-width=256); attention-avx512 - 39 ([mprefer-vector-width=512]). For mprefer-vector-width=512 ffast-math: No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count.

| Category | Analysis | Count (mprefer-vector-width=256) | Count ([mprefer-vector-width=512]) |
|---|---|---|---|
| Loop Computation Issues | Presence of expensive FP instructions | 1 | 1 |
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 |
| Loop Computation Issues | Presence of a large number of scalar integer instructions | 1 | 1 |
| Control Flow Issues | Presence of calls | 1 | 1 |
| Data Access Issues | Presence of indirect access | 1 | 0 |
| Data Access Issues | More than 10% of the vector loads instructions are unaligned | 1 | 1 |
| Data Access Issues | Presence of special instructions executing on a single port | 1 | 1 |
| Data Access Issues | More than 20% of the loads are accessing the stack | 1 | 1 |
| Vectorization Roadblocks | Presence of calls | 1 | 1 |
| Vectorization Roadblocks | Presence of indirect access | 1 | 0 |
| Inefficient Vectorization | Presence of special instructions executing on a single port | 1 | 1 |

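
The "less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA" item above refers to multiply and add being issued as separate instructions instead of one fused multiply-add. The sketch below is a generic illustration, not code from attention.cpp: `dot_mul_add` and `dot_fma` are hypothetical names, and the loop body is an assumed accumulate pattern. `std::fma` from `<cmath>` requests the fused operation explicitly. Whether a compiler fuses the plain `a * b + acc` form on its own depends on the target and on options such as `-ffp-contract`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Separate multiply and add: the product is rounded before the addition,
// unless the compiler is allowed to contract the expression into an FMA.
float dot_mul_add(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc = acc + a[i] * b[i];
    return acc;
}

// Explicit fused multiply-add: a[i] * b[i] + acc with a single rounding.
float dot_fma(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc = std::fma(a[i], b[i], acc);
    return acc;
}
```

For inputs whose products and sums are exactly representable, both variants return the same value. In general the fused form can differ in the last bits because it rounds once per step instead of twice.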
attention.cpp: 52 - 1.32 %

Runs compared, left to right (seven columns per run): Run march=znver5 mprefer-vector-width=256; Run march=znver5 [mprefer-vector-width=512]; Run march=znver5 mprefer-vector-width=512 ffast-math.

| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 34 | 0.00 | 0.00 | 0.05 | 57.14 | 16.96 | 2 | 38 | 0.01 | 0.01 | 0.05 | 57.14 | 16.96 | 2.99 | 44 | 0.01 | 0.01 | 0.11 | 100 | 50 | 5 |
| 33 | 0.05 | 0.04 | 0.48 | 83.66 | 26.55 | 2.78 | 37 | 0.01 | 0.01 | 0.10 | 83.78 | 32.09 | 1.5 | 38 | 0.01 | 0.01 | 0.11 | 57.14 | 16.96 | 5.25 |
| 40 | 0.01 | 0.01 | 0.11 | 84.21 | 21.38 | 5 | 44 | 0.03 | 0.03 | 0.31 | 83.78 | 26.01 | 1.25 | |||||||

Sum on 1 analyzed binary loop per run: attention-avx512 - 33 (mprefer-vector-width=256); attention-avx512 - 44 ([mprefer-vector-width=512]). For mprefer-vector-width=512 ffast-math: No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count.

| Category | Analysis | Count (mprefer-vector-width=256) | Count ([mprefer-vector-width=512]) |
|---|---|---|---|
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 |
| Loop Computation Issues | Presence of a large number of scalar integer instructions | 1 | 1 |
| Control Flow Issues | Presence of calls | 1 | 1 |
| Data Access Issues | Presence of indirect access | 1 | 0 |
| Data Access Issues | More than 10% of the vector loads instructions are unaligned | 1 | 1 |
| Data Access Issues | Presence of special instructions executing on a single port | 1 | 1 |
| Data Access Issues | More than 20% of the loads are accessing the stack | 1 | 1 |
| Vectorization Roadblocks | Presence of calls | 1 | 1 |
| Vectorization Roadblocks | Presence of indirect access | 1 | 0 |
| Inefficient Vectorization | Presence of special instructions executing on a single port | 1 | 1 |

attention.cpp: 240 - 0.32 %

Runs compared, left to right (seven columns per run): Run march=znver5 mprefer-vector-width=256; Run march=znver5 [mprefer-vector-width=512]; Run march=znver5 mprefer-vector-width=512 ffast-math.

| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 48 | 0.01 | 0.01 | 0.11 | 38.46 | 21.15 | 0 | 53 | 0.01 | 0.01 | 0.10 | 100 | 80 | 0 | 53 | 0.01 | 0.01 | 0.11 | 100 | 80 | 0 |

No Loops Overview analysis found for any assembly loop in any of the three runs. More loops can be analyzed using option --summary-loop-count.

attention.cpp: 47 - 0.21 %

Runs compared, left to right (seven columns per run): Run march=znver5 mprefer-vector-width=256; Run march=znver5 [mprefer-vector-width=512]; Run march=znver5 mprefer-vector-width=512 ffast-math.

| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 | 0.00 | 0.00 | 0.05 | 0 | 6.25 | 0 | 46 | 0.01 | 0.01 | 0.05 | 100 | 50 | 1 | 46 | 0.00 | 0.00 | 0.05 | 100 | 50 | 3.5 |
| 42 | 0.00 | 0.00 | 0.05 | 100 | 25 | 0 | ||||||||||||||

No Loops Overview analysis found for any assembly loop in any of the three runs. More loops can be analyzed using option --summary-loop-count.

