| Function: k_means(int, point_t*, point_t*, int*, point_t*, int, int) [clone ._omp_fn.0] | Module: kmeans-gcc-O3-all | Source: main.cpp:58-67 | Coverage (incl. loops): 92.22% | (excl. loops): 0.00% |
|---|
/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 58 - 67 |
-------------------------------------------------------------------------------- |
58: #pragma omp parallel for |
59: for (int i = 0; i < n; ++i) { |
60: double optimal_dist = DBL_MAX; |
61: for (int j = 0; j < k; ++j) { |
62: double dist = |
63: (points[i].x - centroids[j].x) * (points[i].x - centroids[j].x) + |
64: (points[i].y - centroids[j].y) * (points[i].y - centroids[j].y); |
65: if (dist < optimal_dist) { |
66: optimal_dist = dist; |
67: assignment[i] = j; |
0x401d20 STP X29, X30, [SP, #976]! |
0x401d24 ADD X29, SP, #0 |
0x401d28 STP X19, X20, [SP, #16] |
0x401d2c ORR X20, XZR, X0 |
0x401d30 STR X21, [SP, #32] |
0x401d34 LDR W21, [X0, #24] |
0x401d38 BL 401780 |
0x401d3c ORR W19, WZR, W0 |
0x401d40 BL 4016e0 |
0x401d44 ORR W5, WZR, W0 |
0x401d48 SDIV W9, W21, W19 |
0x401d4c MSUB W1, W9, W19, W21 |
0x401d50 CMP W0, W1 |
0x401d54 B.LT 401f14 |
(3) 0x401d58 MADD W2, W9, W5, W1 |
(3) 0x401d5c ADD W17, W9, W2 |
(3) 0x401d60 CMP W2, W17 |
(3) 0x401d64 B.GE 401f04 |
(3) 0x401d68 LDR W6, [X20, #28] |
(3) 0x401d6c LDP X0, X10, [X20] |
(3) 0x401d70 LDR X7, [X20, #16] |
(3) 0x401d74 CMP W6, #0 |
(3) 0x401d78 B.LE 401f04 |
(3) 0x401d7c MOVN X3, #32784 |
(3) 0x401d80 SBFM X4, X2, #0, #31 |
(3) 0x401d84 ADD X18, X0, W2,SXTW #4 |
(3) 0x401d88 FMOV D26, X3 |
(5) 0x401d8c ANDS W8, W6, #64 |
(5) 0x401d90 LDP D27, D28, [X18] |
(5) 0x401d94 FMOV D30, D26 |
(5) 0x401d98 ORR X16, XZR, X10 |
(5) 0x401d9c MOVZ W13, #0 |
(5) 0x401da0 B.EQ 401e40 |
(5) 0x401da4 CMP W8, #1 |
(5) 0x401da8 B.EQ 401e0c |
(5) 0x401dac CMP W8, #2 |
(5) 0x401db0 B.EQ 401de0 |
(5) 0x401db4 LDP D31, D29, [X10] |
(5) 0x401db8 FSUB D0, D28, S29 |
(5) 0x401dbc FSUB D1, D27, S31 |
(5) 0x401dc0 FMUL D2, D0, D0 |
(5) 0x401dc4 FMADD D3, D1, D1, D2 |
(5) 0x401dc8 FCMPE D26, D3 |
(5) 0x401dcc B.LS 401dd8 |
(5) 0x401dd0 FMOV D30, D3 |
(5) 0x401dd4 STR WZR, [X7, X4,LSL #2] |
(5) 0x401dd8 MOVZ W13, #1 |
(5) 0x401ddc ADD X16, X10, #16 |
(5) 0x401de0 LDP D4, D5, [X16] |
(5) 0x401de4 FSUB D6, D28, S5 |
(5) 0x401de8 FSUB D7, D27, S4 |
(5) 0x401dec FMUL D16, D6, D6 |
(5) 0x401df0 FMADD D17, D7, D7, D16 |
(5) 0x401df4 FCMPE D30, D17 |
(5) 0x401df8 B.LS 401e04 |
(5) 0x401dfc FMOV D30, D17 |
(5) 0x401e00 STR W13, [X7, X4,LSL #2] |
(5) 0x401e04 ADD W13, W13, #1 |
(5) 0x401e08 ADD X16, X16, #16 |
(5) 0x401e0c LDP D18, D19, [X16] |
(5) 0x401e10 FSUB D20, D28, S19 |
(5) 0x401e14 FSUB D21, D27, S18 |
(5) 0x401e18 FMUL D22, D20, D20 |
(5) 0x401e1c FMADD D23, D21, D21, D22 |
(5) 0x401e20 FCMPE D30, D23 |
(5) 0x401e24 B.LS 401e30 |
(5) 0x401e28 FMOV D30, D23 |
(5) 0x401e2c STR W13, [X7, X4,LSL #2] |
(5) 0x401e30 ADD W13, W13, #1 |
(5) 0x401e34 ADD X16, X16, #16 |
(5) 0x401e38 CMP W6, W13 |
(5) 0x401e3c B.EQ 401ef4 |
(4) 0x401e40 LDP D24, D25, [X16] |
(4) 0x401e44 ADD X11, X16, #16 |
(4) 0x401e48 ADD W12, W13, #1 |
(4) 0x401e4c FSUB D29, D28, S25 |
(4) 0x401e50 FSUB D31, D27, S24 |
(4) 0x401e54 FMUL D0, D29, D29 |
(4) 0x401e58 FMADD D1, D31, D31, D0 |
(4) 0x401e5c FCMPE D30, D1 |
(4) 0x401e60 B.LS 401e6c |
(4) 0x401e64 FMOV D30, D1 |
(4) 0x401e68 STR W13, [X7, X4,LSL #2] |
(4) 0x401e6c LDR D4, [X11, #8] |
(4) 0x401e70 LDR D2, [X16, #16] |
(4) 0x401e74 FSUB D5, D28, S4 |
(4) 0x401e78 FSUB D3, D27, S2 |
(4) 0x401e7c FMUL D6, D5, D5 |
(4) 0x401e80 FMADD D7, D3, D3, D6 |
(4) 0x401e84 FCMPE D30, D7 |
(4) 0x401e88 B.LS 401e94 |
(4) 0x401e8c FMOV D30, D7 |
(4) 0x401e90 STR W12, [X7, X4,LSL #2] |
(4) 0x401e94 LDP D16, D17, [X11, #16] |
(4) 0x401e98 ADD W13, W12, #3 |
(4) 0x401e9c ADD W14, W12, #1 |
(4) 0x401ea0 ADD W15, W12, #2 |
(4) 0x401ea4 ADD X16, X11, #48 |
(4) 0x401ea8 FSUB D18, D28, S17 |
(4) 0x401eac FSUB D19, D27, S16 |
(4) 0x401eb0 FMUL D20, D18, D18 |
(4) 0x401eb4 FMADD D21, D19, D19, D20 |
(4) 0x401eb8 FCMPE D30, D21 |
(4) 0x401ebc B.LS 401ec8 |
(4) 0x401ec0 FMOV D30, D21 |
(4) 0x401ec4 STR W14, [X7, X4,LSL #2] |
(4) 0x401ec8 LDP D22, D23, [X11, #32] |
(4) 0x401ecc FSUB D24, D28, S23 |
(4) 0x401ed0 FSUB D25, D27, S22 |
(4) 0x401ed4 FMUL D29, D24, D24 |
(4) 0x401ed8 FMADD D31, D25, D25, D29 |
(4) 0x401edc FCMPE D30, D31 |
(4) 0x401ee0 B.LS 401eec |
(4) 0x401ee4 FMOV D30, D31 |
(4) 0x401ee8 STR W15, [X7, X4,LSL #2] |
(4) 0x401eec CMP W6, W13 |
(4) 0x401ef0 B.NE 401e40 |
(5) 0x401ef4 ADD X4, X4, #1 |
(5) 0x401ef8 ADD X18, X18, #16 |
(5) 0x401efc CMP W17, W4 |
(5) 0x401f00 B.GT 401d8c |
(3) 0x401f04 LDR X21, [SP, #32] |
(3) 0x401f08 LDP X19, X20, [SP, #16] |
(3) 0x401f0c LDP X29, X30, [SP], #48 |
(3) 0x401f10 RET |
(3) 0x401f14 ADD W9, W9, #1 |
(3) 0x401f18 MOVZ W1, #0 |
(3) 0x401f1c B 401d58 |
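The source excerpt above stops at line 67, before the closing braces. A self-contained sketch of the same assignment step follows. It assumes `point_t` holds two doubles `x` and `y`, which is consistent with the 16-byte stride and paired double loads in the listing but is not shown in the report. The wrapper name `assign_points` is hypothetical; in the profiled binary this loop lives inside `k_means`.

```cpp
#include <cassert>
#include <cfloat>
#include <vector>

// Assumed layout: two doubles per point (not shown in the report).
struct point_t {
    double x;
    double y;
};

// Hypothetical wrapper around the loop nest at main.cpp:58-67.
// For every point, store the index of the closest centroid
// (squared Euclidean distance) in assignment[i].
void assign_points(const std::vector<point_t>& points,
                   const std::vector<point_t>& centroids,
                   std::vector<int>& assignment) {
    assert(assignment.size() == points.size());
    const int n = static_cast<int>(points.size());
    const int k = static_cast<int>(centroids.size());

    // The pragma is ignored when the file is built without OpenMP support.
#pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        double optimal_dist = DBL_MAX;
        for (int j = 0; j < k; ++j) {
            const double dx = points[i].x - centroids[j].x;
            const double dy = points[i].y - centroids[j].y;
            const double dist = dx * dx + dy * dy;
            if (dist < optimal_dist) {
                optimal_dist = dist;
                assignment[i] = j;  // store inside the inner loop, as in the source
            }
        }
    }
}
```

Calling `assign_points(points, centroids, assignment)` with `assignment` pre-sized to `points.size()` fills it with centroid indices. Each iteration of the outer loop writes only `assignment[i]`, so the iterations are independent.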
| Coverage (%) | Name | Source Location | Module |
|---|---|---|---|
| ►100.00+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
| ○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-all |
| ○ | main | main.cpp:20 | kmeans-gcc-O3-all |
| ○ | __libc_start_call_main | libc.so.6 | |
| ○ | __libc_start_main | libc.so.6 | |
| ○ | _start | new_allocator.h:104 | kmeans-gcc-O3-all |
| Coverage (%) | Name | Source Location | Module |
|---|---|---|---|
| ►50.01+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
| ○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-all |
| ○ | main | main.cpp:20 | kmeans-gcc-O3-all |
| ○ | __libc_start_call_main | libc.so.6 | |
| ○ | __libc_start_main | libc.so.6 | |
| ○ | _start | new_allocator.h:104 | kmeans-gcc-O3-all |
| ►49.99+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
| ○ | start_thread | libc.so.6 | |
| ○ | thread_start | libc.so.6 |
| Coverage (%) | Name | Source Location | Module |
|---|---|---|---|
| ►75.00+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
| ○ | start_thread | libc.so.6 | |
| ○ | thread_start | libc.so.6 | |
| ►25.00+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
| ○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-all |
| ○ | main | main.cpp:20 | kmeans-gcc-O3-all |
| ○ | __libc_start_call_main | libc.so.6 | |
| ○ | __libc_start_main | libc.so.6 | |
| ○ | _start | new_allocator.h:104 | kmeans-gcc-O3-all |
| Coverage (%) | Name | Source Location | Module |
|---|---|---|---|
| ►87.46+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
| ○ | start_thread | libc.so.6 | |
| ○ | thread_start | libc.so.6 | |
| ►12.54+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
| ○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-all |
| ○ | main | main.cpp:20 | kmeans-gcc-O3-all |
| ○ | __libc_start_call_main | libc.so.6 | |
| ○ | __libc_start_main | libc.so.6 | |
| ○ | _start | new_allocator.h:104 | kmeans-gcc-O3-all |
| Coverage (%) | Name | Source Location | Module |
|---|---|---|---|
| ►93.72+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
| ○ | start_thread | libc.so.6 | |
| ○ | thread_start | libc.so.6 | |
| ►6.28+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
| ○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-all |
| ○ | main | main.cpp:20 | kmeans-gcc-O3-all |
| ○ | __libc_start_call_main | libc.so.6 | |
| ○ | __libc_start_main | libc.so.6 | |
| ○ | _start | new_allocator.h:104 | kmeans-gcc-O3-all |
| Coverage (%) | Name | Source Location | Module |
|---|---|---|---|
| ►96.87+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
| ○ | start_thread | libc.so.6 | |
| ○ | thread_start | libc.so.6 | |
| ►3.13+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
| ○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-all |
| ○ | main | main.cpp:20 | kmeans-gcc-O3-all |
| ○ | __libc_start_call_main | libc.so.6 | |
| ○ | __libc_start_main | libc.so.6 | |
| ○ | _start | new_allocator.h:104 | kmeans-gcc-O3-all |
| Coverage (%) | Name | Source Location | Module |
|---|---|---|---|
| ►97.91+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
| ○ | start_thread | libc.so.6 | |
| ○ | thread_start | libc.so.6 | |
| ►2.09+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
| ○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-all |
| ○ | main | main.cpp:20 | kmeans-gcc-O3-all |
| ○ | __libc_start_call_main | libc.so.6 | |
| ○ | __libc_start_main | libc.so.6 | |
| ○ | _start | new_allocator.h:104 | kmeans-gcc-O3-all |
| Coverage (%) | Name | Source Location | Module |
|---|---|---|---|
| ►98.44+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
| ○ | start_thread | libc.so.6 | |
| ○ | thread_start | libc.so.6 | |
| ►1.56+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
| ○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-all |
| ○ | main | main.cpp:20 | kmeans-gcc-O3-all |
| ○ | __libc_start_call_main | libc.so.6 | |
| ○ | __libc_start_main | libc.so.6 | |
| ○ | _start | new_allocator.h:104 | kmeans-gcc-O3-all |
| Coverage (%) | Name | Source Location | Module |
|---|---|---|---|
| ►98.75+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
| ○ | start_thread | libc.so.6 | |
| ○ | thread_start | libc.so.6 | |
| ►1.25+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
| ○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-all |
| ○ | main | main.cpp:20 | kmeans-gcc-O3-all |
| ○ | __libc_start_call_main | libc.so.6 | |
| ○ | __libc_start_main | libc.so.6 | |
| ○ | _start | new_allocator.h:104 | kmeans-gcc-O3-all |
| Coverage (%) | Name | Source Location | Module |
|---|---|---|---|
| ►98.95+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
| ○ | start_thread | libc.so.6 | |
| ○ | thread_start | libc.so.6 | |
| ►1.05+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
| ○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-all |
| ○ | main | main.cpp:20 | kmeans-gcc-O3-all |
| ○ | __libc_start_call_main | libc.so.6 | |
| ○ | __libc_start_main | libc.so.6 | |
| ○ | _start | new_allocator.h:104 | kmeans-gcc-O3-all |
The code analyzed by CQA in that panel excludes loops and represents 0.00% of application time for run run_1_thread
| Source file and lines | main.cpp:58-67 |
| Module | kmeans-gcc-O3-all |
| nb instructions | 14 |
| nb uops | 14 |
| loop length | 56 |
| used w registers | 7 |
| used x registers | 7 |
| used b registers | 0 |
| used h registers | 0 |
| used s registers | 1 |
| used d registers | 0 |
| used q registers | 0 |
| used v registers | 0 |
| used z registers | 0 |
| nb stack references | 3 |
| micro-operation queue | 1.75 cycles |
| front end | 1.75 cycles |
| | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | P15 | P16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uops | 1.50 | 1.50 | 1.67 | 1.50 | 1.67 | 1.67 | 2.00 | 1.50 | 0.00 | 0.00 | 0.00 | 0.00 | 1.50 | 1.50 | 1.00 | 1.50 | 1.50 |
| cycles | 1.50 | 1.50 | 1.67 | 1.50 | 1.67 | 1.67 | 2.00 | 1.50 | 0.00 | 0.00 | 0.00 | 0.00 | 1.50 | 1.50 | 1.00 | 1.50 | 1.50 |
| Cycles executing div or sqrt instructions | 5.00-12.50 |
| Front-end | 1.75 |
| Dispatch | 2.00 |
| DIV/SQRT | 5.00-12.50 |
| Overall L1 | 5.00-12.50 |
| Vectorization ratio | Value |
|---|---|
| all | 0% |
| load | 0% |
| store | 0% |
| mul | NA (no mul vectorizable/vectorized instructions) |
| add-sub | 0% |
| fma | NA (no fma vectorizable/vectorized instructions) |
| div/sqrt | 0% |
| other | 0% |
| Vector efficiency ratio | Value |
|---|---|
| all | 50% |
| load | 25% |
| store | 83% |
| mul | NA (no mul vectorizable/vectorized instructions) |
| add-sub | 50% |
| fma | NA (no fma vectorizable/vectorized instructions) |
| div/sqrt | 25% |
| other | 33% |
| Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | P15 | P16 | Latency | Recip. throughput | Vectorization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| STP X29, X30, [SP, #976]! | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 | scal (100.0%) |
| ADD X29, SP, #0 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (50.0%) |
| STP X19, X20, [SP, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 | scal (100.0%) |
| ORR X20, XZR, X0 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (50.0%) |
| STR X21, [SP, #32] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 | scal (50.0%) |
| LDR W21, [X0, #24] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 | scal (25.0%) |
| BL 401780 <@plt_start@+0x270> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 | N/A |
| ORR W19, WZR, W0 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (25.0%) |
| BL 4016e0 <@plt_start@+0x1d0> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 | N/A |
| ORR W5, WZR, W0 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (25.0%) |
| SDIV W9, W21, W19 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5-12 | 5-12.50 | scal (25.0%) |
| MSUB W1, W9, W19, W21 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | N/A |
| CMP W0, W1 | 1 | 0 | 0 | 0.25 | 0.25 | 0 | 0 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.33 | N/A |
| B.LT 401f14 <_Z7k_meansiP7point_tS0_PiS0_ii._omp_fn.0+0x1f4> | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 | N/A |
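The 14 instructions in this panel form the prologue of the outlined OpenMP function, which explains the DIV/SQRT bound: the `SDIV`/`MSUB` pair splits the iteration count across the team. The two `BL` targets are unresolved PLT entries; they are presumably the runtime queries for team size and thread id, which is an assumption. Under that reading, the arithmetic corresponds to the static partition sketched below. `static_chunk` and `Range` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Half-open iteration range [begin, end) owned by one thread.
struct Range {
    int begin;
    int end;
};

// Sketch of the partition computed by the prologue, assuming the two
// calls return the team size (num_threads) and the thread id (tid).
Range static_chunk(int n, int num_threads, int tid) {
    assert(num_threads > 0 && tid >= 0 && tid < num_threads);
    int chunk = n / num_threads;  // SDIV W9, W21, W19
    int rem = n % num_threads;    // MSUB W1, W9, W19, W21
    if (tid < rem) {              // CMP W0, W1 ; B.LT 401f14
        ++chunk;                  // ADD W9, W9, #1
        rem = 0;                  // MOVZ W1, #0
    }
    const int begin = chunk * tid + rem;  // MADD W2, W9, W5, W1
    return Range{begin, begin + chunk};   // ADD W17, W9, W2
}
```

With `n = 10` and four threads, the first two threads receive three iterations each and the last two receive two each, so the ranges tile `[0, 10)` without gaps or overlap.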
| Run | Number processes | Number nodes | Run Command | MPI Command | Dataset | Run Directory | OMP_PROC_BIND | OMP_NUM_THREADS |
|---|---|---|---|---|---|---|---|---|
| run_1_thread | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 1 |
| run_2_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 2 |
| run_4_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 4 |
| run_8_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 8 |
| run_16_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 16 |
| run_32_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 32 |
| run_48_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 48 |
| run_64_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 64 |
| run_80_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 80 |
| run_96_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 96 |
| Run | Efficiency | Potential Speed-Up (%) |
|---|---|---|
| run_1_thread | 1 | 0 |
| run_2_threads | 0.92 | 6.92 |
| run_4_threads | 0.81 | 17.86 |
| run_8_threads | 0.64 | 33.03 |
| run_16_threads | 0.46 | 50.02 |
| run_32_threads | 0.29 | 65.39 |
| run_48_threads | 0.21 | 72.47 |
| run_64_threads | 0.17 | 76.86 |
| run_80_threads | 0.14 | 79.67 |
| run_96_threads | 0.12 | 81.7 |
| Run | Number of threads | Efficiency (ideal is 1) | Speedup | Ideal Speedup | Time (s) | Coverage (%) |
|---|---|---|---|---|---|---|
| run_1_thread | 1 | 1 | 1 | 1 | 87.550 | 92.22 |
| run_2_threads | 2 | 0.92 | 1.85 | 2 | 43.915 | 92.23 |
| run_4_threads | 4 | 0.81 | 3.23 | 4 | 22.055 | 92.23 |
| run_8_threads | 8 | 0.64 | 5.14 | 8 | 11.090 | 92.25 |
| run_16_threads | 16 | 0.46 | 7.32 | 16 | 5.570 | 92.21 |
| run_32_threads | 32 | 0.29 | 9.3 | 32 | 2.795 | 92.19 |
| run_48_threads | 48 | 0.21 | 10.21 | 48 | 1.875 | 92.06 |
| run_64_threads | 64 | 0.17 | 10.65 | 64 | 1.455 | 92.20 |
| run_80_threads | 80 | 0.14 | 10.97 | 80 | 1.220 | 92.33 |
| run_96_threads | 96 | 0.12 | 11.18 | 96 | 1.055 | 92.47 |
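The Efficiency column in the table above agrees, to rounding, with the reported Speedup divided by the Ideal Speedup (the thread count). A minimal sketch of that relation follows. `parallel_efficiency` is a hypothetical helper, not part of the report tooling, and it takes the Speedup column as given rather than deriving it from the Time (s) column.

```cpp
#include <cassert>

// Parallel efficiency: measured speedup over ideal speedup,
// where the ideal speedup equals the number of threads.
double parallel_efficiency(double speedup, int threads) {
    assert(threads > 0);
    return speedup / static_cast<double>(threads);
}
```

For run_96_threads, `parallel_efficiency(11.18, 96)` is roughly 0.116, which the table reports as 0.12.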
| Name | Coverage (%) | Time (s) |
|---|---|---|
| ▼k_means(int, point_t*, point_t*, int*, point_t*, int, int) [clone ._omp_fn.0] | 92.22 | 87.55 |
| ▼Loop 3 - main.cpp:58-67 - kmeans-gcc-O3-all | 0.00 | 0.00 |
| ▼Loop 5 - main.cpp:60-67 - kmeans-gcc-O3-all | 6.78 | 6.43 |
| ○Loop 4 - main.cpp:61-67 - kmeans-gcc-O3-all | 85.44 | 81.11 |
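Loop 4 carries most of the time (85.44%). In the disassembly its body evaluates four centroids per trip, while the Loop 5 instructions ahead of it first handle up to three leftover centroids. The sketch below mirrors that shape in source form. It is a reading of the listing rather than compiler output; `nearest_unrolled` and `step` are hypothetical names, and `point_t` is assumed to hold two doubles.

```cpp
#include <cassert>
#include <cfloat>
#include <vector>

// Assumed layout: two doubles per point (not shown in the report).
struct point_t {
    double x;
    double y;
};

// Index of the centroid closest to p, with the centroid loop written
// as a remainder prologue (k % 4 iterations) plus a 4-way unrolled body.
int nearest_unrolled(const point_t& p, const std::vector<point_t>& centroids) {
    const int k = static_cast<int>(centroids.size());
    assert(k > 0);
    double best = DBL_MAX;
    int best_j = 0;  // kept in a local here; the original stores to assignment[i]

    // One distance evaluation and compare, as repeated in the listing.
    auto step = [&](int j) {
        const double dx = p.x - centroids[j].x;
        const double dy = p.y - centroids[j].y;
        const double dist = dx * dx + dy * dy;
        if (dist < best) {
            best = dist;
            best_j = j;
        }
    };

    int j = 0;
    // Remainder prologue: up to three iterations.
    for (int r = k % 4; r > 0; --r) {
        step(j++);
    }
    // Main body: four centroids per trip.
    for (; j < k; j += 4) {
        step(j);
        step(j + 1);
        step(j + 2);
        step(j + 3);
    }
    return best_j;
}
```

The result matches a plain `for (j = 0; j < k; ++j)` scan because the centroids are still visited in increasing index order; only the loop control is amortized over four evaluations.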
