Function: hypre_BoomerAMGCoarsenPMIS._omp_fn.6 | Module: exec | Source: par_coarsen.c:2354-2381 | Coverage: 4.22% |
---|
/home/hbollore/qaas/qaas-runs/169-817-3176/intel/AMG/build/AMG/AMG/parcsr_ls/par_coarsen.c: 2354 - 2381 |
-------------------------------------------------------------------------------- |
2354: #pragma omp parallel for private(ig, i, jS, j, jj) HYPRE_SMP_SCHEDULE |
2355: #endif |
2356: for (ig = 0; ig < graph_size; ig++) |
2357: { |
2358: i = graph_array[ig]; |
2359: if (measure_array[i] > 1) |
2360: { |
2361: for (jS = S_diag_i[i]; jS < S_diag_i[i+1]; jS++) |
2362: { |
2363: j = S_diag_j[jS]; |
2364: if (measure_array[j] > 1) |
2365: { |
2366: if (measure_array[i] > measure_array[j]) |
2367: CF_marker[j] = 0; |
2368: else if (measure_array[j] > measure_array[i]) |
2369: CF_marker[i] = 0; |
2370: } |
2371: } /* for each local neighbor j of i */ |
2372: for (jS = S_offd_i[i]; jS < S_offd_i[i+1]; jS++) |
2373: { |
2374: jj = S_offd_j[jS]; |
2375: j = num_variables+jj; |
2376: if (measure_array[j] > 1) |
2377: { |
2378: if (measure_array[i] > measure_array[j]) |
2379: CF_marker_offd[jj] = 0; |
2380: else if (measure_array[j] > measure_array[i]) |
2381: CF_marker[i] = 0; |
0x420fc0 STP X29, X30, [SP, #976]! |
0x420fc4 ADD X29, SP, #0 |
0x420fc8 STP X19, X20, [SP, #16] |
0x420fcc ORR X19, XZR, X0 |
0x420fd0 LDR X0, [X0, #72] |
0x420fd4 STR X21, [SP, #32] |
0x420fd8 LDR X21, [X0] |
0x420fdc BL 40f400 |
0x420fe0 SBFM X20, X0, #0, #31 |
0x420fe4 BL 40f150 |
0x420fe8 SBFM X3, X0, #0, #31 |
0x420fec SDIV X9, X21, X20 |
0x420ff0 MSUB X1, X9, X20, X21 |
0x420ff4 CMP X3, X1 |
0x420ff8 B.LT 421124 |
(144) 0x420ffc MADD X2, X9, X3, X1 |
(144) 0x421000 ADD X4, X9, X2 |
(144) 0x421004 CMP X2, X4 |
(144) 0x421008 B.GE 421114 |
(144) 0x42100c LDR X5, [X19, #64] |
(144) 0x421010 FMOV D1, #1.0000000 |
(144) 0x421014 LDP X15, X8, [X19] |
(144) 0x421018 LDP X14, X7, [X19, #16] |
(144) 0x42101c ADD X3, X5, X2,LSL #3 |
(144) 0x421020 ADD X18, X5, X4,LSL #3 |
(144) 0x421024 LDP X6, X11, [X19, #32] |
(144) 0x421028 LDP X16, X17, [X19, #48] |
(144) 0x42102c B 42103c |
(145) 0x421030 ADD X3, X3, #8 |
(145) 0x421034 CMP X18, X3 |
(145) 0x421038 B.EQ 421114 |
(145) 0x42103c LDR X30, [X3] |
(145) 0x421040 LDR D2, [X17, X30,LSL #3] |
(145) 0x421044 UBFM X10, X30, #61, #60 |
(145) 0x421048 FCMPE D2, D1 |
(145) 0x42104c B.LS 421030 |
(145) 0x421050 ADD X12, X10, #8 |
(145) 0x421054 LDR X0, [X15, X30,LSL #3] |
(145) 0x421058 ADD X13, X15, X12 |
(145) 0x42105c LDR X21, [X15, X12] |
(145) 0x421060 CMP X0, X21 |
(145) 0x421064 B.LT 421084 |
(145) 0x421068 B 4210b0 |
(147) 0x42106c B.GE 421078 |
(147) 0x421070 STR XZR, [X11, X10] |
(147) 0x421074 LDR X21, [X13] |
(147) 0x421078 ADD X0, X0, #1 |
(147) 0x42107c CMP X0, X21 |
(147) 0x421080 B.GE 4210b0 |
(147) 0x421084 LDR X19, [X8, X0,LSL #3] |
(147) 0x421088 LDR D0, [X17, X19,LSL #3] |
(147) 0x42108c FCMPE D0, D1 |
(147) 0x421090 B.LS 421078 |
(147) 0x421094 FCMPE D2, D0 |
(147) 0x421098 B.LS 42106c |
(147) 0x42109c STR XZR, [X11, X19,LSL #3] |
(147) 0x4210a0 ADD X0, X0, #1 |
(147) 0x4210a4 LDR X21, [X13] |
(147) 0x4210a8 CMP X0, X21 |
(147) 0x4210ac B.LT 421084 |
(145) 0x4210b0 LDR X4, [X14, X30,LSL #3] |
(145) 0x4210b4 ADD X20, X14, X12 |
(145) 0x4210b8 LDR X2, [X14, X12] |
(145) 0x4210bc CMP X4, X2 |
(145) 0x4210c0 B.LT 4210ec |
(145) 0x4210c4 B 421030 |
(146) 0x4210c8 B.GE 4210e0 |
(146) 0x4210cc STR XZR, [X11, X10] |
(146) 0x4210d0 LDR X2, [X20] |
(146) 0x4210d4 HINT #0 |
(146) 0x4210d8 HINT #0 |
(146) 0x4210dc HINT #0 |
(146) 0x4210e0 ADD X4, X4, #1 |
(146) 0x4210e4 CMP X4, X2 |
(146) 0x4210e8 B.GE 421030 |
(146) 0x4210ec LDR X9, [X7, X4,LSL #3] |
(146) 0x4210f0 ADD X1, X6, X9 |
(146) 0x4210f4 LDR D3, [X17, X1,LSL #3] |
(146) 0x4210f8 FCMPE D3, D1 |
(146) 0x4210fc B.LS 4210e0 |
(146) 0x421100 FCMPE D2, D3 |
(146) 0x421104 B.LS 4210c8 |
(146) 0x421108 STR XZR, [X16, X9,LSL #3] |
(146) 0x42110c LDR X2, [X20] |
(146) 0x421110 B 4210e0 |
(144) 0x421114 LDP X19, X20, [SP, #16] |
(144) 0x421118 LDR X21, [SP, #32] |
(144) 0x42111c LDP X29, X30, [SP], #48 |
(144) 0x421120 RET |
(144) 0x421124 ADD X9, X9, #1 |
(144) 0x421128 MOVZ X1, #0 |
(144) 0x42112c B 420ffc |
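The prologue (0x420fc0 to 0x420ff8), the fix-up at 0x421124 to 0x42112c, and the first two instructions of loop 144 look like the usual static partitioning of `graph_size` iterations across threads. The sketch below restates that arithmetic in C. It is a reading of the disassembly, not hypre or runtime source. The two `BL` targets are unresolved PLT stubs in this report, so treating their results as the thread count (X20) and thread id (X3) is an assumption based on how the values are used. The names `chunk_range` and `static_chunk` are hypothetical.

```c
#include <assert.h>

/* Half-open iteration range [start, end) handled by one thread. */
typedef struct { long start; long end; } chunk_range;

/* Assumed meaning of the registers:
 *   n        = X21 (loaded via [X0, #72], presumably graph_size)
 *   nthreads = X20 (result of the first BL, assumed thread count)
 *   tid      = X3  (result of the second BL, assumed thread id)   */
static chunk_range static_chunk(long n, long nthreads, long tid)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);

    long q = n / nthreads;       /* SDIV X9, X21, X20      */
    long r = n - q * nthreads;   /* MSUB X1, X9, X20, X21  */

    if (tid < r) {               /* CMP X3, X1 ; B.LT 421124 */
        q += 1;                  /* ADD  X9, X9, #1          */
        r = 0;                   /* MOVZ X1, #0              */
    }

    chunk_range c;
    c.start = q * tid + r;       /* MADD X2, X9, X3, X1 */
    c.end   = c.start + q;       /* ADD  X4, X9, X2     */
    return c;                    /* empty when start >= end (B.GE 421114) */
}
```

With `n = 10` and `nthreads = 4`, the four threads get `[0,3)`, `[3,6)`, `[6,8)` and `[8,10)`: the first `n % nthreads` threads take one extra iteration. The single `SDIV` here runs once per thread, outside the loops, which is consistent with loop 144 showing no measurable coverage.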
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►95.38 | GOMP_parallel | | libomp.so |
○ | hypre_BoomerAMGCoarsenPMIS | par_coarsen.c:2393 | exec |
○ | hypre_BoomerAMGSetup | par_amg_setup.c:612 | exec |
○ | hypre_PCGSetup | pcg.c:234 | exec |
○ | main | amg.c:398 | exec |
○ | __libc_start_main | | libc-2.31.so |
○ | _start | amg.c:599 | exec |
►4.62 | GOMP_parallel | | libomp.so |
○ | hypre_BoomerAMGCoarsenPMIS | par_coarsen.c:2393 | exec |
○ | hypre_BoomerAMGSetup | par_amg_setup.c:623 | exec |
○ | hypre_PCGSetup | pcg.c:234 | exec |
○ | main | amg.c:398 | exec |
○ | __libc_start_main | | libc-2.31.so |
○ | _start | amg.c:599 | exec |
Path / |
Source file and lines | par_coarsen.c:2354-2381 |
Module | exec |
nb instructions | 15 |
loop length | 60 |
nb stack references | 0 |
front end | 1.88 cycles |
Metric | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 1.50 | 1.50 | 2.50 | 2.50 | 2.50 | 2.50 | 0.00 | 0.00 | 0.00 | 0.00 | 1.83 | 1.50 | 1.67 | 1.50 | 1.50 |
cycles | 1.50 | 1.50 | 2.50 | 2.50 | 2.50 | 2.50 | 0.00 | 0.00 | 0.00 | 0.00 | 1.83 | 1.50 | 1.67 | 1.50 | 1.50 |
Cycles executing div or sqrt instructions | 1.00-0.50 |
Front-end | 1.88 |
Overall L1 | 2.50 |
all | 0% |
load | NA (no load vectorizable/vectorized instructions) |
store | NA (no store vectorizable/vectorized instructions) |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 0% |
fma | NA (no fma vectorizable/vectorized instructions) |
div/sqrt | 0% |
other | 0% |
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
STP X29, X30, [SP, #976]! | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
ADD X29, SP, #0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
STP X19, X20, [SP, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
ORR X19, XZR, X0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
LDR X0, [X0, #72] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 |
STR X21, [SP, #32] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
LDR X21, [X0] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 |
BL 40f400 <@plt_start@+0x400> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
SBFM X20, X0, #0, #31 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
BL 40f150 <@plt_start@+0x150> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
SBFM X3, X0, #0, #31 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
SDIV X9, X21, X20 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5-20 | 1-0.50 |
MSUB X1, X9, X20, X21 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 |
CMP X3, X1 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.33 |
B.LT 421124 <hypre_BoomerAMGCoarsenPMIS._omp_fn.6+0x164> | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
Name | Coverage (%) | Time (s) |
---|---|---|
▼hypre_BoomerAMGCoarsenPMIS._omp_fn.6 | 4.22 | 0.65 |
▼Loop 144 - par_coarsen.c:2354-2381 - exec | 0 | 0 |
▼Loop 145 - par_coarsen.c:2354-2381 - exec | 0.19 | 0.03 |
○Loop 147 - par_coarsen.c:2361-2369 - exec | 4.02 | 0.62 |
○Loop 146 - par_coarsen.c:2372-2381 - exec | 0 | 0 |
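Loop 147, the local-neighbor loop at lines 2361 to 2369, accounts for 4.02 of the function's 4.22 coverage points (0.62 s of 0.65 s). For experimenting with that loop outside the application, the following is a minimal serial sketch of it together with its enclosing `ig` loop, transcribed from the source listing in this report. It is not the hypre routine: the function name `pmis_local_pass` is hypothetical, `long` stands in for the 64-bit integer type implied by the 8-byte loads and stores in the disassembly, and the OpenMP pragma and the off-diagonal loop (loop 146) are omitted.

```c
#include <assert.h>

/* Serial sketch of par_coarsen.c:2356-2371 (outer loop + local neighbors).
 * S_diag_i / S_diag_j form a CSR row-pointer / column-index pair;
 * measure_array holds one double per point; CF_marker is updated in place. */
static void pmis_local_pass(long graph_size, const long *graph_array,
                            const long *S_diag_i, const long *S_diag_j,
                            const double *measure_array, long *CF_marker)
{
    for (long ig = 0; ig < graph_size; ig++) {
        long i = graph_array[ig];
        if (measure_array[i] > 1) {
            /* CSR sanity check added for this sketch only. */
            assert(S_diag_i[i] <= S_diag_i[i + 1]);
            for (long jS = S_diag_i[i]; jS < S_diag_i[i + 1]; jS++) {
                long j = S_diag_j[jS];
                if (measure_array[j] > 1) {
                    /* Of two candidate neighbors, the one with the
                     * smaller measure has its marker reset to 0. */
                    if (measure_array[i] > measure_array[j])
                        CF_marker[j] = 0;
                    else if (measure_array[j] > measure_array[i])
                        CF_marker[i] = 0;
                }
            }
        }
    }
}
```

On a three-point path graph 0-1-2 with measures `{3.5, 2.5, 1.0}` and all markers initially 1, only point 1 is reset: point 0 has the larger measure, and point 2 is skipped because its measure is not greater than 1. Every access to `measure_array[j]` and `CF_marker[j]` is indirect through `S_diag_j`, which matches the report's 0% vectorization ratio for this code.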