Loops
main.cpp: 61 - 150.56 %
Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
---|---|---|---|---|---|---|---|
CASCADE LAKE (ICPX O3 + More Aggressive Flags) | | | | | | | |
SKYLAKE (ICPX O3 + More Aggressive Flags) | 18 | 16.97 | 21.10 | 74.57 | 57.89 | 18.86 | 29.38 |
NEOVERSE V1 (ACFL O3 + Funroll + Ffastmath) | 8 | 10.67 | 13.82 | 69.86 | 41.67 | 37.5 | 0.29 |
NEOVERSE V2 (G++ O3 + Funroll) | 5 | 0.74 | 1.09 | 6.13 | 11.76 | 47.79 | 41.16 |
CASCADE LAKE: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
SKYLAKE: Sum on 1 analyzed binary loop (kmeans-icpx-O3-aggressive - 18)
NEOVERSE V1: Sum on 1 analyzed binary loop (kmeans-acfl-O3-all - 8)
NEOVERSE V2: Sum on 1 analyzed binary loop (kmeans-gcc-O3-funroll - 5)
Category | Analysis | Count (SKYLAKE) | Count (NEOVERSE V1) | Count (NEOVERSE V2) |
---|---|---|---|---|
Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 | 0 |
Loop Computation Issues | Presence of a large number of scalar integer instructions | 0 | 0 | 1 |
Control Flow Issues | Presence of 2 to 4 paths | 0 | 1 | 0 |
Control Flow Issues | Presence of more than 4 paths | 1 | 0 | 0 |
Data Access Issues | Presence of special instructions executing on a single port | 1 | | |
Vectorization Roadblocks | Presence of 2 to 4 paths | 0 | 1 | 0 |
Vectorization Roadblocks | Presence of more than 4 paths | 1 | 0 | 1 |
Inefficient Vectorization | Presence of special instructions executing on a single port | 1 | | |
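To make the "low FMA usage" and "multiple paths" findings for this loop more concrete, here is a minimal C++ sketch. The report does not show main.cpp; the binary names only suggest a k-means kernel, so the loop shape, the row-major data layout, and the names `squared_distance` and `nearest_centroid` are all assumptions made for illustration. The sketch accumulates a squared distance with `std::fma` and picks the nearest centroid with conditional expressions, which compilers can often (but not always) lower to select instructions instead of branches.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not taken from main.cpp): squared Euclidean distance
// between two rows of length dim.
// std::fma(diff, diff, acc) computes diff * diff + acc with a single rounding;
// whether it becomes a hardware FMA instruction depends on target and flags.
inline double squared_distance(const double* a, const double* b, std::size_t dim) {
    double acc = 0.0;
    for (std::size_t d = 0; d < dim; ++d) {
        const double diff = a[d] - b[d];
        acc = std::fma(diff, diff, acc);
    }
    return acc;
}

// Hypothetical helper: index of the closest of k centroids for one point.
// centroids is assumed to be stored row-major as k rows of dim values.
inline std::size_t nearest_centroid(const double* point,
                                    const std::vector<double>& centroids,
                                    std::size_t k, std::size_t dim) {
    double best = std::numeric_limits<double>::max();
    std::size_t best_idx = 0;
    for (std::size_t c = 0; c < k; ++c) {
        const double dist = squared_distance(point, centroids.data() + c * dim, dim);
        // Conditional expressions instead of an if/else body keep the loop
        // to a single source-level path.
        const bool closer = dist < best;
        best_idx = closer ? c : best_idx;
        best = closer ? dist : best;
    }
    return best_idx;
}
```

Whether such a rewrite changes the counts reported above would have to be checked by re-running the analysis on the rebuilt binary.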
main.cpp: 73 - 69.65 %
Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
---|---|---|---|---|---|---|---|
CASCADE LAKE (ICPX O3 + More Aggressive Flags) | 18 | 12.42 | 14.28 | 69.65 | 57.89 | 18.86 | 43.41 |
SKYLAKE (ICPX O3 + More Aggressive Flags) | | | | | | | |
NEOVERSE V1 (ACFL O3 + Funroll + Ffastmath) | | | | | | | |
NEOVERSE V2 (G++ O3 + Funroll) | | | | | | | |
CASCADE LAKE: Sum on 1 analyzed binary loop (kmeans-icpx-O3-aggressive - 18)
SKYLAKE: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
NEOVERSE V1: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
NEOVERSE V2: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Category | Analysis | Count (CASCADE LAKE) |
---|---|---|
Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
Control Flow Issues | Presence of more than 4 paths | 1 |
Data Access Issues | Presence of special instructions executing on a single port | 1 |
Vectorization Roadblocks | Presence of more than 4 paths | 1 |
Inefficient Vectorization | Presence of special instructions executing on a single port | 1 |
main.cpp: 81 - 21.71 %
Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
---|---|---|---|---|---|---|---|
CASCADE LAKE (ICPX O3 + More Aggressive Flags) | | | | | | | |
SKYLAKE (ICPX O3 + More Aggressive Flags) | 9 | 11.17 | 1.74 | 6.15 | 0 | 11.61 | 2.87 |
NEOVERSE V1 (ACFL O3 + Funroll + Ffastmath) | 6 | 8.94 | 1.45 | 7.33 | 11.11 | 26.39 | 1.3 |
NEOVERSE V2 (G++ O3 + Funroll) | 13 | 7.39 | 1.46 | 8.23 | 7.89 | 48.03 | 3.42 |
CASCADE LAKE: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
SKYLAKE: Sum on 1 analyzed binary loop (kmeans-icpx-O3-aggressive - 9)
NEOVERSE V1: Sum on 1 analyzed binary loop (kmeans-acfl-O3-all - 6)
NEOVERSE V2: Sum on 1 analyzed binary loop (kmeans-gcc-O3-funroll - 13)
Category | Analysis | Count (SKYLAKE) | Count (NEOVERSE V1) | Count (NEOVERSE V2) |
---|---|---|---|---|
Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 | 1 |
Loop Computation Issues | Presence of a large number of scalar integer instructions | 1 | 1 | 1 |
Data Access Issues | Presence of constant non-unit stride data access | 0 | 0 | 1 |
Data Access Issues | Presence of indirect access | 1 | 1 | 1 |
Vectorization Roadblocks | Presence of constant non-unit stride data access | 0 | 0 | 1 |
Vectorization Roadblocks | Presence of indirect access | 1 | 1 | 1 |
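The "indirect access" finding usually means that an index loaded from memory is then used to address another array. As an illustration only, the sketch below assumes this loop resembles a k-means update step that adds each point to the running sum of its assigned cluster. The report does not confirm this, and the function name `accumulate_clusters`, its parameters, and the row-major layout are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical update step (not taken from main.cpp): add every point to the
// running sum of its assigned cluster and count the members of each cluster.
// points holds n rows of dim values, sums holds k rows of dim values.
inline void accumulate_clusters(const std::vector<double>& points,
                                const std::vector<std::size_t>& assign,
                                std::size_t dim,
                                std::vector<double>& sums,
                                std::vector<std::size_t>& counts) {
    const std::size_t n = assign.size();
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t c = assign[i];        // index loaded from memory
        double* row = sums.data() + c * dim;    // indirect: address depends on c
        const double* p = points.data() + i * dim;
        for (std::size_t d = 0; d < dim; ++d) {
            row[d] += p[d];                     // unit stride within one row
        }
        ++counts[c];                            // indirect update as well
    }
}
```

Because two different points may map to the same cluster row, the outer loop cannot be vectorized across points without extra care, which is consistent with the finding also being listed under vectorization roadblocks.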
main.cpp: 93 - 5.52 %
Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
---|---|---|---|---|---|---|---|
CASCADE LAKE (ICPX O3 + More Aggressive Flags) | 9 | 7.81 | 1.13 | 5.52 | 0 | 11.61 | 4.41 |
SKYLAKE (ICPX O3 + More Aggressive Flags) | | | | | | | |
NEOVERSE V1 (ACFL O3 + Funroll + Ffastmath) | | | | | | | |
NEOVERSE V2 (G++ O3 + Funroll) | | | | | | | |
CASCADE LAKE: Sum on 1 analyzed binary loop (kmeans-icpx-O3-aggressive - 9)
SKYLAKE: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
NEOVERSE V1: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
NEOVERSE V2: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Category | Analysis | Count (CASCADE LAKE) |
---|---|---|
Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
Loop Computation Issues | Presence of a large number of scalar integer instructions | 1 |
Data Access Issues | Presence of indirect access | 1 |
Vectorization Roadblocks | Presence of indirect access | 1 |