Function: k_means(int, point_t*, point_t*, int*, point_t*, int, int) [clone ._omp_fn.0] | Module: kmeans-gcc-O3-funroll | Source: main.cpp:58-67 | Coverage (incl. loops): 91.69% | (excl. loops): 0.00% |
---|
/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 58 - 67 |
-------------------------------------------------------------------------------- |
58: #pragma omp parallel for |
59: for (int i = 0; i < n; ++i) { |
60: double optimal_dist = DBL_MAX; |
61: for (int j = 0; j < k; ++j) { |
62: double dist = |
63: (points[i].x - centroids[j].x) * (points[i].x - centroids[j].x) + |
64: (points[i].y - centroids[j].y) * (points[i].y - centroids[j].y); |
65: if (dist < optimal_dist) { |
66: optimal_dist = dist; |
67: assignment[i] = j; |
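For reference, the assignment step above can be written as a self-contained function. This is a minimal sketch, not the benchmark's actual code. The `point_t` layout (two doubles) is assumed from the 16-byte stride and paired `LDP D, D` loads in the listing. The name `assign_nearest` and the `std::vector` parameters are hypothetical. The sketch also keeps the best index in a local variable and stores it once per point, whereas the listing issues an `STR` on every improvement. That change is one possible restructuring and was not measured in these runs.

```cpp
#include <cfloat>
#include <vector>

// Assumed layout: two doubles (16 bytes), inferred from the listing.
struct point_t {
    double x;
    double y;
};

// Hypothetical helper: assign each point to its nearest centroid using the
// squared Euclidean distance, as in main.cpp:58-67.
void assign_nearest(const std::vector<point_t>& points,
                    const std::vector<point_t>& centroids,
                    std::vector<int>& assignment) {
    const int n = static_cast<int>(points.size());
    const int k = static_cast<int>(centroids.size());
#pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        double optimal_dist = DBL_MAX;
        // Start from the current value so the result is unchanged when no
        // centroid improves on DBL_MAX, matching the original loop.
        int best = assignment[i];
        for (int j = 0; j < k; ++j) {
            const double dx = points[i].x - centroids[j].x;
            const double dy = points[i].y - centroids[j].y;
            const double dist = dx * dx + dy * dy;
            if (dist < optimal_dist) {
                optimal_dist = dist;
                best = j;
            }
        }
        assignment[i] = best;  // single store per point
    }
}
```

Built without OpenMP support, the pragma is ignored and the function runs serially with the same result.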
0x401d20 STP X29, X30, [SP, #976]! |
0x401d24 ADD X29, SP, #0 |
0x401d28 STP X19, X20, [SP, #16] |
0x401d2c ORR X20, XZR, X0 |
0x401d30 STR X21, [SP, #32] |
0x401d34 LDR W21, [X0, #24] |
0x401d38 BL 401780 |
0x401d3c ORR W19, WZR, W0 |
0x401d40 BL 4016e0 |
0x401d44 ORR W4, WZR, W0 |
0x401d48 SDIV W7, W21, W19 |
0x401d4c MSUB W1, W7, W19, W21 |
0x401d50 CMP W0, W1 |
0x401d54 B.LT 401f30 |
0x401d58 MADD W2, W7, W4, W1 |
0x401d5c ADD W17, W7, W2 |
0x401d60 CMP W2, W17 |
0x401d64 B.GE 401ecc |
0x401d68 LDR W5, [X20, #28] |
0x401d6c LDP X0, X8, [X20] |
0x401d70 LDR X6, [X20, #16] |
0x401d74 CMP W5, #0 |
0x401d78 B.LE 401ecc |
0x401d7c MOVN X9, #32784 |
0x401d80 SBFM X3, X2, #0, #31 |
0x401d84 ADD X18, X0, W2,SXTW #4 |
0x401d88 FMOV D26, X9 |
(5) 0x401d8c ANDS W10, W5, #64 |
(5) 0x401d90 LDP D27, D28, [X18] |
(5) 0x401d94 FMOV D30, D26 |
(5) 0x401d98 ORR X16, XZR, X8 |
(5) 0x401d9c MOVZ W15, #0 |
(5) 0x401da0 B.EQ 401e28 |
(5) 0x401da4 CMP W10, #1 |
(5) 0x401da8 B.EQ 401dfc |
(5) 0x401dac CMP W10, #2 |
(5) 0x401db0 B.EQ 401dd8 |
(5) 0x401db4 LDP D31, D29, [X8] |
(5) 0x401db8 FSUB D0, D28, S29 |
(5) 0x401dbc FSUB D1, D27, S31 |
(5) 0x401dc0 FMUL D2, D0, D0 |
(5) 0x401dc4 FMADD D3, D1, D1, D2 |
(5) 0x401dc8 FCMPE D26, D3 |
(5) 0x401dcc B.GT 401f24 |
(5) 0x401dd0 MOVZ W15, #1 |
(5) 0x401dd4 ADD X16, X8, #16 |
(5) 0x401dd8 LDP D4, D5, [X16] |
(5) 0x401ddc FSUB D6, D28, S5 |
(5) 0x401de0 FSUB D7, D27, S4 |
(5) 0x401de4 FMUL D16, D6, D6 |
(5) 0x401de8 FMADD D17, D7, D7, D16 |
(5) 0x401dec FCMPE D30, D17 |
(5) 0x401df0 B.GT 401f18 |
(5) 0x401df4 ADD W15, W15, #1 |
(5) 0x401df8 ADD X16, X16, #16 |
(5) 0x401dfc LDP D18, D19, [X16] |
(5) 0x401e00 FSUB D20, D28, S19 |
(5) 0x401e04 FSUB D21, D27, S18 |
(5) 0x401e08 FMUL D22, D20, D20 |
(5) 0x401e0c FMADD D23, D21, D21, D22 |
(5) 0x401e10 FCMPE D30, D23 |
(5) 0x401e14 B.GT 401f0c |
(5) 0x401e18 ADD W15, W15, #1 |
(5) 0x401e1c ADD X16, X16, #16 |
(5) 0x401e20 CMP W5, W15 |
(5) 0x401e24 B.EQ 401ebc |
(4) 0x401e28 LDP D24, D25, [X16] |
(4) 0x401e2c FSUB D29, D28, S25 |
(4) 0x401e30 FSUB D31, D27, S24 |
(4) 0x401e34 FMUL D0, D29, D29 |
(4) 0x401e38 FMADD D1, D31, D31, D0 |
(4) 0x401e3c FCMPE D30, D1 |
(4) 0x401e40 B.GT 401f00 |
(4) 0x401e44 LDR D4, [X16, #24] |
(4) 0x401e48 ADD X11, X16, #16 |
(4) 0x401e4c ADD W12, W15, #1 |
(4) 0x401e50 LDR D2, [X16, #16] |
(4) 0x401e54 FSUB D5, D28, S4 |
(4) 0x401e58 FSUB D3, D27, S2 |
(4) 0x401e5c FMUL D6, D5, D5 |
(4) 0x401e60 FMADD D7, D3, D3, D6 |
(4) 0x401e64 FCMPE D30, D7 |
(4) 0x401e68 B.GT 401ef4 |
(4) 0x401e6c LDP D16, D17, [X11, #16] |
(4) 0x401e70 ADD W13, W12, #1 |
(4) 0x401e74 FSUB D18, D28, S17 |
(4) 0x401e78 FSUB D19, D27, S16 |
(4) 0x401e7c FMUL D20, D18, D18 |
(4) 0x401e80 FMADD D21, D19, D19, D20 |
(4) 0x401e84 FCMPE D30, D21 |
(4) 0x401e88 B.GT 401ee8 |
(4) 0x401e8c LDP D22, D23, [X11, #32] |
(4) 0x401e90 ADD W14, W12, #2 |
(4) 0x401e94 FSUB D24, D28, S23 |
(4) 0x401e98 FSUB D25, D27, S22 |
(4) 0x401e9c FMUL D29, D24, D24 |
(4) 0x401ea0 FMADD D31, D25, D25, D29 |
(4) 0x401ea4 FCMPE D30, D31 |
(4) 0x401ea8 B.GT 401edc |
(4) 0x401eac ADD W15, W12, #3 |
(4) 0x401eb0 ADD X16, X11, #48 |
(4) 0x401eb4 CMP W5, W15 |
(4) 0x401eb8 B.NE 401e28 |
(5) 0x401ebc ADD X3, X3, #1 |
(5) 0x401ec0 ADD X18, X18, #16 |
(5) 0x401ec4 CMP W17, W3 |
(5) 0x401ec8 B.GT 401d8c |
(6) 0x401ecc LDR X21, [SP, #32] |
(6) 0x401ed0 LDP X19, X20, [SP, #16] |
(6) 0x401ed4 LDP X29, X30, [SP], #48 |
(6) 0x401ed8 RET |
(3) 0x401edc FMOV D30, D31 |
(3) 0x401ee0 STR W14, [X6, X3,LSL #2] |
(3) 0x401ee4 B 401eac |
(4) 0x401ee8 FMOV D30, D21 |
(4) 0x401eec STR W13, [X6, X3,LSL #2] |
(4) 0x401ef0 B 401e8c |
(4) 0x401ef4 FMOV D30, D7 |
(4) 0x401ef8 STR W12, [X6, X3,LSL #2] |
(4) 0x401efc B 401e6c |
(4) 0x401f00 FMOV D30, D1 |
(4) 0x401f04 STR W15, [X6, X3,LSL #2] |
(4) 0x401f08 B 401e44 |
(5) 0x401f0c FMOV D30, D23 |
(5) 0x401f10 STR W15, [X6, X3,LSL #2] |
(5) 0x401f14 B 401e18 |
(5) 0x401f18 FMOV D30, D17 |
(5) 0x401f1c STR W15, [X6, X3,LSL #2] |
(5) 0x401f20 B 401df4 |
(5) 0x401f24 FMOV D30, D3 |
(5) 0x401f28 STR WZR, [X6, X3,LSL #2] |
(5) 0x401f2c B 401dd0 |
0x401f30 ADD W7, W7, #1 |
0x401f34 MOVZ W1, #0 |
0x401f38 B 401d58 |
0x401f3c HINT #0 |
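The integer arithmetic at the top of the listing (`SDIV`, `MSUB`, the `B.LT` path through 0x401f30, then `MADD` and `ADD`) reads like the usual static-schedule split of the iteration space across threads. The sketch below mirrors that arithmetic under stated assumptions: the two `BL` targets are taken to return the team size and the thread id (the listing gives only PLT addresses), `W21` is taken to hold `n`, and the name `static_chunk` is hypothetical.

```cpp
#include <utility>

// Half-open iteration range [first, last) handled by one thread.
// Assumption: mirrors the prologue arithmetic of the outlined function;
// nthreads and tid stand for the results of the two BL calls.
std::pair<int, int> static_chunk(int n, int nthreads, int tid) {
    int q = n / nthreads;       // SDIV W7, W21, W19: base chunk size
    int r = n - q * nthreads;   // MSUB W1, W7, W19, W21: leftover iterations
    if (tid < r) {              // B.LT 401f30: first r threads take one extra
        q += 1;                 // ADD W7, W7, #1
        r = 0;                  // MOVZ W1, #0
    }
    const int first = q * tid + r;  // MADD W2, W7, W4, W1
    const int last = first + q;     // ADD W17, W7, W2
    return {first, last};
}
```

For example, with `n = 10` and three threads the ranges would be [0, 4), [4, 7) and [7, 10), so no thread gets more than one extra iteration.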
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►100.00+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-funroll |
○ | main | main.cpp:20 | kmeans-gcc-O3-funroll |
○ | __libc_start_call_main | libc.so.6 | |
○ | __libc_start_main | libc.so.6 | |
○ | _start | new_allocator.h:104 | kmeans-gcc-O3-funroll |
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►50.01+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-funroll |
○ | main | main.cpp:20 | kmeans-gcc-O3-funroll |
○ | __libc_start_call_main | libc.so.6 | |
○ | __libc_start_main | libc.so.6 | |
○ | _start | new_allocator.h:104 | kmeans-gcc-O3-funroll |
►49.99+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
○ | start_thread | libc.so.6 | |
○ | thread_start | libc.so.6 |
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►74.96+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
○ | start_thread | libc.so.6 | |
○ | thread_start | libc.so.6 | |
►25.04+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-funroll |
○ | main | main.cpp:20 | kmeans-gcc-O3-funroll |
○ | __libc_start_call_main | libc.so.6 | |
○ | __libc_start_main | libc.so.6 | |
○ | _start | new_allocator.h:104 | kmeans-gcc-O3-funroll |
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►87.49+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
○ | start_thread | libc.so.6 | |
○ | thread_start | libc.so.6 | |
►12.51+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-funroll |
○ | main | main.cpp:20 | kmeans-gcc-O3-funroll |
○ | __libc_start_call_main | libc.so.6 | |
○ | __libc_start_main | libc.so.6 | |
○ | _start | new_allocator.h:104 | kmeans-gcc-O3-funroll |
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►93.79+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
○ | start_thread | libc.so.6 | |
○ | thread_start | libc.so.6 | |
►6.21+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-funroll |
○ | main | main.cpp:20 | kmeans-gcc-O3-funroll |
○ | __libc_start_call_main | libc.so.6 | |
○ | __libc_start_main | libc.so.6 | |
○ | _start | new_allocator.h:104 | kmeans-gcc-O3-funroll |
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►96.88+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
○ | start_thread | libc.so.6 | |
○ | thread_start | libc.so.6 | |
►3.12+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-funroll |
○ | main | main.cpp:20 | kmeans-gcc-O3-funroll |
○ | __libc_start_call_main | libc.so.6 | |
○ | __libc_start_main | libc.so.6 | |
○ | _start | new_allocator.h:104 | kmeans-gcc-O3-funroll |
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►97.91+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
○ | start_thread | libc.so.6 | |
○ | thread_start | libc.so.6 | |
►2.09+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-funroll |
○ | main | main.cpp:20 | kmeans-gcc-O3-funroll |
○ | __libc_start_call_main | libc.so.6 | |
○ | __libc_start_main | libc.so.6 | |
○ | _start | new_allocator.h:104 | kmeans-gcc-O3-funroll |
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►98.44+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
○ | start_thread | libc.so.6 | |
○ | thread_start | libc.so.6 | |
►1.56+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-funroll |
○ | main | main.cpp:20 | kmeans-gcc-O3-funroll |
○ | __libc_start_call_main | libc.so.6 | |
○ | __libc_start_main | libc.so.6 | |
○ | _start | new_allocator.h:104 | kmeans-gcc-O3-funroll |
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►98.75+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
○ | start_thread | libc.so.6 | |
○ | thread_start | libc.so.6 | |
►1.25+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-funroll |
○ | main | main.cpp:20 | kmeans-gcc-O3-funroll |
○ | __libc_start_call_main | libc.so.6 | |
○ | __libc_start_main | libc.so.6 | |
○ | _start | new_allocator.h:104 | kmeans-gcc-O3-funroll |
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►98.95+ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
○ | start_thread | libc.so.6 | |
○ | thread_start | libc.so.6 | |
►1.05+ | GOMP_parallel | libgomp.h:980 | libgomp.so.1.0.0 |
○ | k_means(int, point_t*, point_t[...] | main.cpp:73 | kmeans-gcc-O3-funroll |
○ | main | main.cpp:20 | kmeans-gcc-O3-funroll |
○ | __libc_start_call_main | libc.so.6 | |
○ | __libc_start_main | libc.so.6 | |
○ | _start | new_allocator.h:104 | kmeans-gcc-O3-funroll |
The code analyzed by CQA in that panel excludes loops and represents 0.00% of application time for run run_1_thread
Source file and lines | main.cpp:58-67 |
Module | kmeans-gcc-O3-funroll |
nb instructions | 31 |
nb uops | 30 |
loop length | 124 |
used w registers | 10 |
used x registers | 13 |
used b registers | 0 |
used h registers | 0 |
used s registers | 1 |
used d registers | 1 |
used q registers | 0 |
used v registers | 0 |
used z registers | 0 |
nb stack references | 3 |
micro-operation queue | 3.75 cycles |
front end | 3.75 cycles |
Metric | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | P15 | P16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 3.00 | 3.00 | 3.17 | 3.17 | 3.17 | 3.17 | 3.17 | 3.17 | 0.25 | 0.25 | 0.25 | 0.25 | 2.50 | 2.17 | 2.33 | 1.50 | 1.50 |
cycles | 3.00 | 3.00 | 3.17 | 3.17 | 3.17 | 3.17 | 3.17 | 3.17 | 0.25 | 0.25 | 0.25 | 0.25 | 2.50 | 2.17 | 2.33 | 1.50 | 1.50 |
Cycles executing div or sqrt instructions | 5.00-12.50 |
Front-end | 3.75 |
Dispatch | 3.17 |
DIV/SQRT | 5.00-12.50 |
Overall L1 | 5.00-12.50 |
all | 0% |
load | 0% |
store | 0% |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 0% |
fma | 0% |
other | 0% |
all | 0% |
load | NA (no load vectorizable/vectorized instructions) |
store | NA (no store vectorizable/vectorized instructions) |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | NA (no add-sub vectorizable/vectorized instructions) |
fma | NA (no fma vectorizable/vectorized instructions) |
div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
other | 0% |
all | 0% |
load | 0% |
store | 0% |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 0% |
fma | 0% |
div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
other | 0% |
all | 43% |
load | 33% |
store | 83% |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 41% |
fma | 25% |
other | 37% |
all | 50% |
load | NA (no load vectorizable/vectorized instructions) |
store | NA (no store vectorizable/vectorized instructions) |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | NA (no add-sub vectorizable/vectorized instructions) |
fma | NA (no fma vectorizable/vectorized instructions) |
div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
other | 50% |
all | 43% |
load | 33% |
store | 83% |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 41% |
fma | 25% |
div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
other | 38% |
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | P15 | P16 | Latency | Recip. throughput | Vectorization |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
STP X29, X30, [SP, #976]! | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 | scal (100.0%) |
ADD X29, SP, #0 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (50.0%) |
STP X19, X20, [SP, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 | scal (100.0%) |
ORR X20, XZR, X0 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | N/A |
STR X21, [SP, #32] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 | scal (50.0%) |
LDR W21, [X0, #24] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 | scal (25.0%) |
BL 401780 <@plt_start@+0x270> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 | N/A |
ORR W19, WZR, W0 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (25.0%) |
BL 4016e0 <@plt_start@+0x1d0> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 | N/A |
ORR W4, WZR, W0 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (25.0%) |
SDIV W7, W21, W19 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5-12 | 5-12.50 | N/A |
MSUB W1, W7, W19, W21 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | scal (25.0%) |
CMP W0, W1 | 1 | 0 | 0 | 0.25 | 0.25 | 0 | 0 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.33 | scal (25.0%) |
B.LT 401f30 <_Z7k_meansiP7point_tS0_PiS0_ii._omp_fn.0+0x210> | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 | N/A |
MADD W2, W7, W4, W1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | scal (25.0%) |
ADD W17, W7, W2 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (25.0%) |
CMP W2, W17 | 1 | 0 | 0 | 0.25 | 0.25 | 0 | 0 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.33 | scal (25.0%) |
B.GE 401ecc <_Z7k_meansiP7point_tS0_PiS0_ii._omp_fn.0+0x1ac> | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 | N/A |
LDR W5, [X20, #28] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 | scal (25.0%) |
LDP X0, X8, [X20] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.50 | N/A |
LDR X6, [X20, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 | scal (50.0%) |
CMP W5, #0 | 1 | 0 | 0 | 0.25 | 0.25 | 0 | 0 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.33 | scal (25.0%) |
B.LE 401ecc <_Z7k_meansiP7point_tS0_PiS0_ii._omp_fn.0+0x1ac> | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 | N/A |
MOVN X9, #32784 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (50.0%) |
SBFM X3, X2, #0, #31 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (100.0%) |
ADD X18, X0, W2,SXTW #4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 | scal (50.0%) |
FMOV D26, X9 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 2 | 0.25 | scal (50.0%) |
ADD W7, W7, #1 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | N/A |
MOVZ W1, #0 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (25.0%) |
B 401d58 <_Z7k_meansiP7point_tS0_PiS0_ii._omp_fn.0+0x38> | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 | N/A |
HINT #0 | N/A |
Run | Number processes | Number nodes | Run Command | MPI Command | Dataset | Run Directory | OMP_PROC_BIND | OMP_NUM_THREADS |
---|---|---|---|---|---|---|---|---|
run_1_thread | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 1 |
run_2_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 2 |
run_4_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 4 |
run_8_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 8 |
run_16_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 16 |
run_32_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 32 |
run_48_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 48 |
run_64_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 64 |
run_80_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 80 |
run_96_threads | 1 | 1 | <executable> input/100000000.in 1000 100000000 50 25 | | | /home/fmusial/KMEANS_Benchmarks | true | 96 |
(run_1_thread) Efficiency | (run_1_thread) Potential Speed-Up (%) | (run_2_threads) Efficiency | (run_2_threads) Potential Speed-Up (%) | (run_4_threads) Efficiency | (run_4_threads) Potential Speed-Up (%) | (run_8_threads) Efficiency | (run_8_threads) Potential Speed-Up (%) | (run_16_threads) Efficiency | (run_16_threads) Potential Speed-Up (%) | (run_32_threads) Efficiency | (run_32_threads) Potential Speed-Up (%) | (run_48_threads) Efficiency | (run_48_threads) Potential Speed-Up (%) | (run_64_threads) Efficiency | (run_64_threads) Potential Speed-Up (%) | (run_80_threads) Efficiency | (run_80_threads) Potential Speed-Up (%) | (run_96_threads) Efficiency | (run_96_threads) Potential Speed-Up (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 0.92 | 7.32 | 0.8 | 18.7 | 0.63 | 34.16 | 0.44 | 51.17 | 0.28 | 66.47 | 0.2 | 73.24 | 0.16 | 77.51 | 0.13 | 80.6 | 0.11 | 82.45 |
Run | Number of threads | Efficiency (ideal is 1) | Speedup | Ideal Speedup | Time (s) | Coverage (%) |
---|---|---|---|---|---|---|
run_1_thread | 1 | 1 | 1 | 1 | 81.53498840332 | 91.689605712891 |
run_2_threads | 2 | 0.92 | 1.84 | 2 | 40.925006866455 | 91.718040466309 |
run_4_threads | 4 | 0.8 | 3.18 | 4 | 20.529998779297 | 91.707214355469 |
run_8_threads | 8 | 0.63 | 5.02 | 8 | 10.320000648499 | 91.720458984375 |
run_16_threads | 16 | 0.44 | 7.07 | 16 | 5.2049994468689 | 91.709983825684 |
run_32_threads | 32 | 0.28 | 8.83 | 32 | 2.6850006580353 | 91.792808532715 |
run_48_threads | 48 | 0.2 | 9.66 | 48 | 1.7949998378754 | 91.69905090332 |
run_64_threads | 64 | 0.16 | 10.06 | 64 | 1.4199998378754 | 91.970985412598 |
run_80_threads | 80 | 0.13 | 10.23 | 80 | 1.2349998950958 | 92.423080444336 |
run_96_threads | 96 | 0.11 | 10.41 | 96 | 1.0649999380112 | 92.474327087402 |
Name | Coverage (%) | Time (s) |
---|---|---|
▼k_means(int, point_t*, point_t*, int*, point_t*, int, int) [clone ._omp_fn.0]– | 91.69 | 81.53 |
▼Loop 6 - main.cpp:58-67 - kmeans-gcc-O3-funroll– | 0.00 | 0.00 |
▼Loop 3 - main.cpp:60-67 - kmeans-gcc-O3-funroll– | 0.00 | 0.00 |
▼Loop 4 - main.cpp:60-67 - kmeans-gcc-O3-funroll– | 85.37 | 75.92 |
○Loop 5 - main.cpp:60-67 - kmeans-gcc-O3-funroll | 6.32 | 5.62 |