Function: hypre_SeqVectorInnerProd._omp_fn.0 | Module: exec | Source: vector.c:483-486 | Coverage: 0.71% |
---|
/home/hbollore/qaas/qaas-runs/169-817-3176/intel/AMG/build/AMG/AMG/seq_mv/vector.c: 483 - 486 |
-------------------------------------------------------------------------------- |
483: #pragma omp parallel for private(i) reduction(+:result) HYPRE_SMP_SCHEDULE |
484: #endif |
485: for (i = 0; i < size; i++) |
486: result += hypre_conj(y_data[i]) * x_data[i]; |
0x4fcf20 STP X29, X30, [SP, #976]! |
0x4fcf24 ADD X29, SP, #0 |
0x4fcf28 STP X21, X22, [SP, #32] |
0x4fcf2c LDR X22, [X0, #16] |
0x4fcf30 STP X19, X20, [SP, #16] |
0x4fcf34 ORR X19, XZR, X0 |
0x4fcf38 BL 40f400 |
0x4fcf3c SBFM X20, X0, #0, #31 |
0x4fcf40 BL 40f150 |
0x4fcf44 SDIV X3, X22, X20 |
0x4fcf48 SBFM X2, X0, #0, #31 |
0x4fcf4c LDP X21, X4, [X19] |
0x4fcf50 MSUB X0, X3, X20, X22 |
0x4fcf54 CMP X2, X0 |
0x4fcf58 B.LT 4fcfe0 |
(3844) 0x4fcf5c MADD X1, X3, X2, X0 |
(3844) 0x4fcf60 MOVI D5, #0 |
(3844) 0x4fcf64 ADD X5, X3, X1 |
(3844) 0x4fcf68 CMP X1, X5 |
(3844) 0x4fcf6c B.GE 4fcfac |
(3844) 0x4fcf70 UBFM X6, X1, #61, #60 |
(3844) 0x4fcf74 MOVZ X10, #0 |
(3844) 0x4fcf78 WHILELO P0.D, XZR, X3 |
(3844) 0x4fcf7c ADD X7, X4, X6 |
(3844) 0x4fcf80 ADD X8, X21, X6 |
(3844) 0x4fcf84 DUP Z0.D, #0 |
(3844) 0x4fcf88 CNTD X9, ALL |
(3845) 0x4fcf8c LD1D {Z2.D}, P0/Z, [X7, X10,LSL #3] |
(3845) 0x4fcf90 LD1D {Z1.D}, P0/Z, [X8, X10,LSL #3] |
(3845) 0x4fcf94 ADD X10, X10, X9 |
(3845) 0x4fcf98 FMLA Z0.D, P0/M, Z1.D, Z2.D |
(3845) 0x4fcf9c WHILELO P0.D, X10, X3 |
(3845) 0x4fcfa0 B.NE 4fcf8c |
(3844) 0x4fcfa4 PTRUE P1.B, ALL |
(3844) 0x4fcfa8 FADDV D5, P1, Z0.D |
(3844) 0x4fcfac ADD X13, X19, #24 |
(3844) 0x4fcfb0 LDR X1, [X13] |
(3843) 0x4fcfb4 FMOV D3, X1 |
(3843) 0x4fcfb8 ORR X12, XZR, X1 |
(3843) 0x4fcfbc FADD D4, D5, D3 |
(3843) 0x4fcfc0 FMOV X11, D4 |
(3843) 0x4fcfc4 CASAL X12, X11, [X13] |
(3843) 0x4fcfc8 CMP X1, X12 |
(3843) 0x4fcfcc B.NE 4fcfec |
(3844) 0x4fcfd0 LDP X19, X20, [SP, #16] |
(3844) 0x4fcfd4 LDP X21, X22, [SP, #32] |
(3844) 0x4fcfd8 LDP X29, X30, [SP], #48 |
(3844) 0x4fcfdc RET |
(3844) 0x4fcfe0 ADD X3, X3, #1 |
(3844) 0x4fcfe4 MOVZ X0, #0 |
(3844) 0x4fcfe8 B 4fcf5c |
(3843) 0x4fcfec ORR X1, XZR, X12 |
(3843) 0x4fcff0 B 4fcfb4 |
0x4fcff4 HINT #0 |
0x4fcff8 HINT #0 |
0x4fcffc HINT #0 |
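The prologue of the listing above (SDIV, MSUB, CMP/B.LT, MADD) appears to compute a static per-thread iteration range before the SVE loop runs. The following is a minimal C sketch of that partitioning, not the libgomp or hypre implementation. The names `chunk_t`, `static_chunk`, `partial_dot` and `dot_by_chunks` are hypothetical, and `hypre_conj` is assumed to be the identity, which matches the real-valued FMLA in loop 3845. The threads are emulated one after another here; in the profiled binary each thread instead adds its partial sum to the shared result through the CASAL retry loop (loop 3843).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: half-open iteration range [start, end) for one thread. */
typedef struct { long start; long end; } chunk_t;

/* Static split that mirrors the prologue instructions (register names are
   taken from the listing; the mapping is an interpretation). */
static chunk_t static_chunk(long size, long nthreads, long tid)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    long q = size / nthreads;       /* SDIV X3, X22, X20      */
    long r = size - q * nthreads;   /* MSUB X0, X3, X20, X22  */
    if (tid < r) {                  /* CMP X2, X0 ; B.LT      */
        q += 1;                     /* ADD X3, X3, #1         */
        r = 0;                      /* MOVZ X0, #0            */
    }
    chunk_t c;
    c.start = q * tid + r;          /* MADD X1, X3, X2, X0    */
    c.end = c.start + q;            /* ADD X5, X3, X1         */
    return c;
}

/* Scalar stand-in for the predicated SVE loop (LD1D, FMLA, FADDV). */
static double partial_dot(const double *x, const double *y, chunk_t c)
{
    double sum = 0.0;
    for (long i = c.start; i < c.end; i++)
        sum += y[i] * x[i];
    return sum;
}

/* Sequential emulation of the reduction: one partial sum per thread id,
   added into the result in turn (the binary does this step atomically). */
static double dot_by_chunks(const double *x, const double *y,
                            long size, long nthreads)
{
    double result = 0.0;
    for (long tid = 0; tid < nthreads; tid++)
        result += partial_dot(x, y, static_chunk(size, nthreads, tid));
    return result;
}
```

With `size = 10` and three threads, this split gives the ranges [0, 4), [4, 7) and [7, 10): the first `size % nthreads` threads each take one extra iteration.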
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►50.00+ | GOMP_parallel | libomp.so | |
○ | hypre_SeqVectorInnerProd | vector.c:483 | exec |
○ | hypre_ParVectorInnerProd | par_vector.c:432 | exec |
○ | hypre_PCGSolve | pcg.c:548 | exec |
○ | main | amg.c:419 | exec |
○ | __libc_start_main | libc-2.31.so | |
○ | _start | amg.c:599 | exec |
►27.27+ | GOMP_parallel | libomp.so | |
○ | hypre_SeqVectorInnerProd | vector.c:483 | exec |
○ | hypre_ParVectorInnerProd | par_vector.c:432 | exec |
○ | hypre_PCGSolve | pcg.c:497 | exec |
○ | main | amg.c:419 | exec |
○ | __libc_start_main | libc-2.31.so | |
○ | _start | amg.c:599 | exec |
►18.18+ | GOMP_parallel | libomp.so | |
○ | hypre_SeqVectorInnerProd | vector.c:483 | exec |
○ | hypre_ParVectorInnerProd | par_vector.c:432 | exec |
○ | hypre_PCGSolve | pcg.c:564 | exec |
○ | main | amg.c:419 | exec |
○ | __libc_start_main | libc-2.31.so | |
○ | _start | amg.c:599 | exec |
►4.55+ | GOMP_parallel | libomp.so | |
○ | hypre_SeqVectorInnerProd | vector.c:483 | exec |
○ | hypre_ParVectorInnerProd | par_vector.c:432 | exec |
○ | hypre_PCGSolve | pcg.c:344 | exec |
○ | main | amg.c:419 | exec |
○ | __libc_start_main | libc-2.31.so | |
○ | _start | amg.c:599 | exec |
Path / |
Source file and lines | vector.c:483-486 |
Module | exec |
nb instructions | 18 |
loop length | 72 |
nb stack references | 0 |
front end | 1.88 cycles |
Metric | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 1.50 | 1.50 | 2.50 | 2.50 | 2.50 | 2.50 | 0.00 | 0.00 | 0.00 | 0.00 | 1.83 | 1.50 | 1.67 | 1.50 | 1.50 |
cycles | 1.50 | 1.50 | 2.50 | 2.50 | 2.50 | 2.50 | 0.00 | 0.00 | 0.00 | 0.00 | 1.83 | 1.50 | 1.67 | 1.50 | 1.50 |
Cycles executing div or sqrt instructions | 1.00-0.50 |
Front-end | 1.88 |
Overall L1 | 2.50 |
all | 0% |
load | NA (no load vectorizable/vectorized instructions) |
store | NA (no store vectorizable/vectorized instructions) |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 0% |
fma | NA (no fma vectorizable/vectorized instructions) |
div/sqrt | 0% |
other | NA (no other vectorizable/vectorized instructions) |
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
STP X29, X30, [SP, #976]! | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
ADD X29, SP, #0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
STP X21, X22, [SP, #32] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
LDR X22, [X0, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 |
STP X19, X20, [SP, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
ORR X19, XZR, X0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
BL 40f400 <@plt_start@+0x400> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
SBFM X20, X0, #0, #31 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
BL 40f150 <@plt_start@+0x150> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
SDIV X3, X22, X20 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5-20 | 1-0.50 |
SBFM X2, X0, #0, #31 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
LDP X21, X4, [X19] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 1 |
MSUB X0, X3, X20, X22 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 |
CMP X2, X0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.33 |
B.LT 4fcfe0 <hypre_SeqVectorInnerProd._omp_fn.0+0xc0> | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
HINT #0 | ||||||||||||||||||
HINT #0 | ||||||||||||||||||
HINT #0 |
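The "Overall L1" figure of 2.50 cycles above equals the busiest execution port (P2 to P5 at 2.50 cycles) and exceeds the front-end figure of 1.88 cycles. The report does not state the formula, so the sketch below only assumes that the bound is the larger of the front-end cycles and the maximum per-port cycles. The function names `max_of` and `l1_bound` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Largest entry in an array of per-port cycle counts. */
static double max_of(const double *v, size_t n)
{
    assert(n > 0);
    double m = v[0];
    for (size_t i = 1; i < n; i++)
        if (v[i] > m)
            m = v[i];
    return m;
}

/* Assumed bound: the slower of the front end and the busiest port. */
static double l1_bound(const double *port_cycles, size_t nports,
                       double front_end)
{
    double back_end = max_of(port_cycles, nports);
    return back_end > front_end ? back_end : front_end;
}
```

Passing the fifteen values of the "cycles" row together with the front-end value 1.88 returns 2.50 under this assumption, which matches the reported figure.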
Name | Coverage (%) | Time (s) |
---|---|---|
▼ hypre_SeqVectorInnerProd._omp_fn.0 | 0.71 | 0.11 |
▼ Loop 3844 - vector.c:483-486 - exec | 0 | 0 |
○ Loop 3845 - vector.c:486-486 - exec | 0.71 | 0.11 |
○ Loop 3843 - vector.c:483-483 - exec | 0 | 0 |