| Run Skylake ICPX Ofast Manual Unroll ONLY (no Hoisting) | Run Skylake ICPX Ofast Hoisting ONLY (no Manual Unroll) |
| Loop Source Regions | - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 72-86 | Loop Source Regions | - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 72-78 |
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 27 | 102.63 | 88.38 | 83.94 | 13.79 | 13.36 | 73.81 | 23 | 0.01 | 0.00 | 0.00 | 37.5 | 16.41 | 0 |
| | | | | | | | 25 | 21.43 | 21.43 | 77.09 | 77.78 | 41.32 | 404.91 |
| |
| Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 27) | Sum on 2 analyzed binary loops (kmeans-icpx-Ofast - 23, kmeans-icpx-Ofast - 25) |
| Analysis | Count | Analysis | Count |
| Control Flow Issues | | Control Flow Issues | |
| Presence of more than 4 paths | 1 | Presence of more than 4 paths | |
| Data Access Issues | | Data Access Issues | |
| Presence of special instructions executing on a single port | | Presence of special instructions executing on a single port | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of more than 4 paths | 1 | Presence of more than 4 paths | |
| Inefficient Vectorization | | Inefficient Vectorization | |
| Presence of special instructions executing on a single port | | Presence of special instructions executing on a single port | 1 |
| Use of masked instructions | | Use of masked instructions | 1 |
| Run Skylake ICPX Ofast Manual Unroll ONLY (no Hoisting) | Run Skylake ICPX Ofast Hoisting ONLY (no Manual Unroll) |
| Loop Source Regions | | Loop Source Regions | - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 96-101 |
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| | | | | | | | 21 | 5.02 | 3.31 | 11.92 | 0 | 11.61 | 10.02 |
| |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 21) |
| Analysis | Count | Analysis | Count |
| | | Loop Computation Issues | |
| | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | Presence of a large number of scalar integer instructions | 1 |
| | | Data Access Issues | |
| | | Presence of indirect access | 1 |
| | | Vectorization Roadblocks | |
| | | Presence of indirect access | 1 |
| Run Skylake ICPX Ofast Manual Unroll ONLY (no Hoisting) | Run Skylake ICPX Ofast Hoisting ONLY (no Manual Unroll) |
| Loop Source Regions | - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 115-120 | Loop Source Regions | |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 21 | 3.91 | 1.87 | 1.78 | 0 | 11.61 | 10.12 | | | | | | | |
| |
| Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 21) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count |
| Loop Computation Issues | | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | |
| Presence of a large number of scalar integer instructions | 1 | | |
| Data Access Issues | | | |
| Presence of indirect access | 1 | | |
| Vectorization Roadblocks | | | |
| Presence of indirect access | 1 | | |