Function: hypre_BoomerAMGBuildMultipass._omp_fn.6 | Module: exec | Source: par_multi_interp.c:1167-1173 | Coverage: 0.03% |
---|
/home/hbollore/qaas/qaas-runs/169-817-3176/intel/AMG/build/AMG/AMG/parcsr_ls/par_multi_interp.c: 1167 - 1173 |
-------------------------------------------------------------------------------- |
1167: #pragma omp parallel for private(i,i1) HYPRE_SMP_SCHEDULE |
1168: #endif |
1169: for (i=0; i < n_coarse; i++) |
1170: { |
1171: i1 = C_array[i]; |
1172: P_diag_j[P_diag_i[i1]] = fine_to_coarse[i1]; |
1173: P_diag_data[P_diag_i[i1]] = 1.0; |
0x448cd0 STP X29, X30, [SP, #976]! |
0x448cd4 ADD X29, SP, #0 |
0x448cd8 STR X21, [SP, #32] |
0x448cdc LDR X21, [X0, #32] |
0x448ce0 STP X19, X20, [SP, #16] |
0x448ce4 ORR X20, XZR, X0 |
0x448ce8 BL 40f400 |
0x448cec SBFM X19, X0, #0, #31 |
0x448cf0 BL 40f150 |
0x448cf4 SDIV X3, X21, X19 |
0x448cf8 SBFM X1, X0, #0, #31 |
0x448cfc MSUB X2, X3, X19, X21 |
0x448d00 CMP X1, X2 |
0x448d04 B.LT 448e20 |
(766) 0x448d08 MADD X7, X3, X1, X2 |
(766) 0x448d0c ADD X8, X3, X7 |
(766) 0x448d10 CMP X7, X8 |
(766) 0x448d14 B.GE 448e10 |
(766) 0x448d18 LDP X5, X0, [X20, #16] |
(766) 0x448d1c FMOV D0, #1.0000000 |
(766) 0x448d20 LDP X4, X11, [X20] |
(766) 0x448d24 ADD X1, X0, X7,LSL #3 |
(766) 0x448d28 ADD X12, X0, X8,LSL #3 |
(766) 0x448d2c LDR X6, [X20, #40] |
(766) 0x448d30 SUB X9, X12, X1 |
(766) 0x448d34 SUB X10, X9, #8 |
(766) 0x448d38 UBFM X13, X10, #3, #63 |
(766) 0x448d3c ADD X14, X13, #1 |
(766) 0x448d40 ANDS X15, X14, #4160 |
(766) 0x448d44 B.EQ 448da0 |
(766) 0x448d48 CMP X15, #1 |
(766) 0x448d4c B.EQ 448d78 |
(766) 0x448d50 CMP X15, #2 |
(766) 0x448d54 B.NE 448e2c |
(766) 0x448d58 LDR X19, [X1], #8 |
(766) 0x448d5c UBFM X3, X19, #61, #60 |
(766) 0x448d60 ADD X2, X11, X3 |
(766) 0x448d64 LDR X7, [X6, X3] |
(766) 0x448d68 LDR X8, [X2] |
(766) 0x448d6c STR X7, [X5, X8,LSL #3] |
(766) 0x448d70 LDR X0, [X2] |
(766) 0x448d74 STR D0, [X4, X0,LSL #3] |
(766) 0x448d78 LDR X9, [X1], #8 |
(766) 0x448d7c UBFM X10, X9, #61, #60 |
(766) 0x448d80 ADD X13, X11, X10 |
(766) 0x448d84 LDR X14, [X6, X10] |
(766) 0x448d88 LDR X15, [X13] |
(766) 0x448d8c STR X14, [X5, X15,LSL #3] |
(766) 0x448d90 LDR X16, [X13] |
(766) 0x448d94 STR D0, [X4, X16,LSL #3] |
(766) 0x448d98 CMP X1, X12 |
(766) 0x448d9c B.EQ 448e10 |
(767) 0x448da0 ORR X17, XZR, X1 |
(767) 0x448da4 ADD X1, X1, #32 |
(767) 0x448da8 LDR X18, [X17], #8 |
(767) 0x448dac LDR X30, [X6, X18,LSL #3] |
(767) 0x448db0 LDR X20, [X11, X18,LSL #3] |
(767) 0x448db4 STR X30, [X5, X20,LSL #3] |
(767) 0x448db8 LDUR X19, [X1, #488] |
(767) 0x448dbc LDR X21, [X11, X18,LSL #3] |
(767) 0x448dc0 LDR X3, [X6, X19,LSL #3] |
(767) 0x448dc4 LDR X2, [X11, X19,LSL #3] |
(767) 0x448dc8 STR D0, [X4, X21,LSL #3] |
(767) 0x448dcc STR X3, [X5, X2,LSL #3] |
(767) 0x448dd0 LDR X7, [X17, #8] |
(767) 0x448dd4 LDR X9, [X11, X19,LSL #3] |
(767) 0x448dd8 LDR X8, [X6, X7,LSL #3] |
(767) 0x448ddc LDR X0, [X11, X7,LSL #3] |
(767) 0x448de0 STR D0, [X4, X9,LSL #3] |
(767) 0x448de4 STR X8, [X5, X0,LSL #3] |
(767) 0x448de8 LDUR X13, [X1, #504] |
(767) 0x448dec LDR X10, [X11, X7,LSL #3] |
(767) 0x448df0 LDR X14, [X6, X13,LSL #3] |
(767) 0x448df4 LDR X15, [X11, X13,LSL #3] |
(767) 0x448df8 STR D0, [X4, X10,LSL #3] |
(767) 0x448dfc STR X14, [X5, X15,LSL #3] |
(767) 0x448e00 LDR X16, [X11, X13,LSL #3] |
(767) 0x448e04 STR D0, [X4, X16,LSL #3] |
(767) 0x448e08 CMP X1, X12 |
(767) 0x448e0c B.NE 448da0 |
(766) 0x448e10 LDP X19, X20, [SP, #16] |
(766) 0x448e14 LDR X21, [SP, #32] |
(766) 0x448e18 LDP X29, X30, [SP], #48 |
(766) 0x448e1c RET |
(766) 0x448e20 ADD X3, X3, #1 |
(766) 0x448e24 MOVZ X2, #0 |
(766) 0x448e28 B 448d08 |
(766) 0x448e2c LDR X16, [X1], #8 |
(766) 0x448e30 UBFM X17, X16, #61, #60 |
(766) 0x448e34 ADD X18, X11, X17 |
(766) 0x448e38 LDR X30, [X6, X17] |
(766) 0x448e3c LDR X20, [X18] |
(766) 0x448e40 STR X30, [X5, X20,LSL #3] |
(766) 0x448e44 LDR X21, [X18] |
(766) 0x448e48 STR D0, [X4, X21,LSL #3] |
(766) 0x448e4c B 448d58 |
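
In the Loop 767 listing above, each unrolled iteration appears to load `P_diag_i[i1]` twice, once before the store to `P_diag_j` and again before the store to `P_diag_data`. This is consistent with the compiler being unable to prove that the `P_diag_j` store leaves `P_diag_i` unchanged. A minimal sketch of the same scatter, keeping the row start in a local so that it is read once per iteration, could look as follows. The type `idx_t` and the function `set_coarse_identity` are hypothetical names used only for illustration, and the sketch assumes that `P_diag_i` and `P_diag_j` do not overlap.

```c
/* Hypothetical stand-in for HYPRE_Int; the 8-byte integer loads and
   stores in the listing suggest a 64-bit index type in this build. */
typedef long long idx_t;

/* Same scatter as source lines 1169-1173: for every coarse point i1,
   write one entry (column fine_to_coarse[i1], value 1.0) at the start
   of row i1. Assumption: P_diag_i and P_diag_j do not overlap, so the
   row start can be read once and reused for both stores. */
static void set_coarse_identity(idx_t n_coarse,
                                const idx_t *C_array,
                                const idx_t *P_diag_i,
                                const idx_t *fine_to_coarse,
                                idx_t *P_diag_j,
                                double *P_diag_data)
{
    idx_t i;
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < n_coarse; i++) {
        const idx_t i1 = C_array[i];   /* index of the coarse point */
        const idx_t k = P_diag_i[i1];  /* row start, read once */
        P_diag_j[k] = fine_to_coarse[i1];
        P_diag_data[k] = 1.0;
    }
}
```

Whether this changes the generated code depends on the compiler and flags. With only 0.03% coverage reported for this function, any gain would be negligible for this run.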
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
100.00 | GOMP_parallel | | libomp.so |
○ | hypre_BoomerAMGBuildMultipass | par_multi_interp.c:1177 | exec |
○ | hypre_BoomerAMGSetup | par_amg_setup.c:737 | exec |
○ | hypre_PCGSetup | pcg.c:234 | exec |
○ | main | amg.c:398 | exec |
○ | __libc_start_main | | libc-2.31.so |
○ | _start | amg.c:599 | exec |
Path / |
Source file and lines | par_multi_interp.c:1167-1173 |
Module | exec |
nb instructions | 14 |
loop length | 56 |
nb stack references | 0 |
front end | 1.75 cycles |
Metric | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 1.50 | 1.50 | 2.50 | 2.50 | 2.50 | 2.50 | 0.00 | 0.00 | 0.00 | 0.00 | 1.50 | 1.50 | 1.00 | 1.50 | 1.50 |
cycles | 1.50 | 1.50 | 2.50 | 2.50 | 2.50 | 2.50 | 0.00 | 0.00 | 0.00 | 0.00 | 1.50 | 1.50 | 1.00 | 1.50 | 1.50 |
Cycles executing div or sqrt instructions | 1.00-0.50 |
Front-end | 1.75 |
Overall L1 | 2.50 |
all | 0% |
load | NA (no load vectorizable/vectorized instructions) |
store | NA (no store vectorizable/vectorized instructions) |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 0% |
fma | NA (no fma vectorizable/vectorized instructions) |
div/sqrt | 0% |
other | 0% |
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
STP X29, X30, [SP, #976]! | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
ADD X29, SP, #0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
STR X21, [SP, #32] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
LDR X21, [X0, #32] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 |
STP X19, X20, [SP, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
ORR X20, XZR, X0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
BL 40f400 <@plt_start@+0x400> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
SBFM X19, X0, #0, #31 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
BL 40f150 <@plt_start@+0x150> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
SDIV X3, X21, X19 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5-20 | 1-0.50 |
SBFM X1, X0, #0, #31 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
MSUB X2, X3, X19, X21 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 |
CMP X1, X2 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.33 |
B.LT 448e20 <hypre_BoomerAMGBuildMultipass._omp_fn.6+0x150> | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
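
The instructions in the table above are the prologue of the outlined OpenMP function, not the loop body. The `SDIV`, `MSUB`, `CMP` and `B.LT` sequence, together with the `MADD` and `ADD` that follow it in the assembly listing, matches the usual way a statically scheduled `parallel for` splits the iteration space across threads. The sketch below reproduces that arithmetic in C. It assumes that the two `BL` targets are `omp_get_num_threads` and `omp_get_thread_num` (the report only shows PLT addresses) and that `HYPRE_SMP_SCHEDULE` expands to a static schedule. The helper name `static_chunk` is hypothetical.

```c
/* Hypothetical helper: compute the half-open iteration range
   [*start, *end) that thread `tid` out of `nthreads` would execute when
   n iterations are split as evenly as possible. The first n % nthreads
   threads receive one extra iteration. */
static void static_chunk(long n, long nthreads, long tid,
                         long *start, long *end)
{
    long chunk = n / nthreads;         /* SDIV X3, X21, X19      */
    long rem = n - chunk * nthreads;   /* MSUB X2, X3, X19, X21  */
    if (tid < rem) {                   /* CMP X1, X2 ; B.LT      */
        chunk += 1;                    /* ADD X3, X3, #1         */
        rem = 0;                       /* MOVZ X2, #0            */
    }
    *start = chunk * tid + rem;        /* MADD X7, X3, X1, X2    */
    *end = *start + chunk;             /* ADD X8, X3, X7         */
}
```

For example, with `n = 10` and `nthreads = 4` the ranges are `[0, 3)`, `[3, 6)`, `[6, 8)` and `[8, 10)`. The single `SDIV` per thread is what the report counts under "Cycles executing div or sqrt instructions"; it runs once per parallel region, not once per iteration.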
Name | Coverage (%) | Time (s) |
---|---|---|
hypre_BoomerAMGBuildMultipass._omp_fn.6 | 0.03 | 0 |
Loop 766 - par_multi_interp.c:1167-1173 - exec | 0 | 0 |
Loop 767 - par_multi_interp.c:1172-1173 - exec | 0.03 | 0 |