| Run G++ O3 + Vectorize + Funroll + Ffastmath | Run ACFL O3 + Vectorize + Funroll + Ffastmath |
| Loop Source Regions: /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 61-67 | Loop Source Regions: /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 61-67 |
| |
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| G++ | 4 | 1.37 | 8.91 | 83.63 | 8.33 | 24.65 | 0.23 |
| ACFL | 8 | 1.48 | 2.93 | 27.46 | 41.67 | 37.5 | 0.74 |
| |
| Sum on 1 analyzed binary loop (kmeans-gcc-O3-all - 4) | Sum on 1 analyzed binary loop (kmeans-acfl-O3-all - 8) |
| Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 0 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 0 |
| Control Flow Issues | | Control Flow Issues | |
| Presence of 2 to 4 paths | 0 | Presence of 2 to 4 paths | 1 |
| Presence of more than 4 paths | 1 | Presence of more than 4 paths | 0 |
| Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of 2 to 4 paths | 0 | Presence of 2 to 4 paths | 1 |
| Presence of more than 4 paths | 1 | Presence of more than 4 paths | 0 |
| |
| Run G++ O3 + Vectorize + Funroll + Ffastmath | Run ACFL O3 + Vectorize + Funroll + Ffastmath |
| Loop Source Regions: /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 81-84 | Loop Source Regions: /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 81-84 |
| |
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| G++ | 14 | 9.06 | 0.93 | 8.75 | 7.69 | 23.72 | 3.99 |
| ACFL | 6 | 9.16 | 0.28 | 2.67 | 11.11 | 26.39 | 7.24 |
| |
| Sum on 1 analyzed binary loop (kmeans-gcc-O3-all - 14) | Sum on 1 analyzed binary loop (kmeans-acfl-O3-all - 6) |
| Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 1 |
| Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 0 |
| Presence of indirect access | 1 | Presence of indirect access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 0 |
| Presence of indirect access | 1 | Presence of indirect access | 1 |