Function: hypre_BoomerAMGCoarsenPMIS._omp_fn.2 | Module: exec | Source: par_coarsen.c:2132-2136 | Coverage: 2.89% |
---|
/home/hbollore/qaas/qaas-runs/169-817-3176/intel/AMG/build/AMG/AMG/parcsr_ls/par_coarsen.c: 2132 - 2136 |
-------------------------------------------------------------------------------- |
2132: #pragma omp parallel for private(i) HYPRE_SMP_SCHEDULE |
2133: for (i=0; i < S_diag_i[num_variables]; i++) |
2134: { |
2135: #pragma omp atomic |
2136: measure_array_temp[S_diag_j[i]]++; |
0x4224a8 STP X29, X30, [SP, #-48]! |
0x4224ac ADD X29, SP, #0 |
0x4224b0 STP X19, X20, [SP, #16] |
0x4224b4 ORR X20, XZR, X0 |
0x4224b8 LDR X0, [X0] |
0x4224bc LDR X1, [X20, #16] |
0x4224c0 STR X21, [SP, #32] |
0x4224c4 LDR X21, [X0, X1,LSL #3] |
0x4224c8 BL 40f400 |
0x4224cc SBFM X19, X0, #0, #31 |
0x4224d0 BL 40f150 |
0x4224d4 SBFM X4, X0, #0, #31 |
0x4224d8 SDIV X3, X21, X19 |
0x4224dc MSUB X2, X3, X19, X21 |
0x4224e0 CMP X4, X2 |
0x4224e4 B.LT 422630 |
(168) 0x4224e8 MADD X6, X3, X4, X2 |
(168) 0x4224ec ADD X7, X3, X6 |
(168) 0x4224f0 CMP X6, X7 |
(168) 0x4224f4 B.GE 422620 |
(168) 0x4224f8 LDR X5, [X20, #8] |
(168) 0x4224fc MOVZ X8, #1 |
(168) 0x422500 LDR X9, [X20, #24] |
(168) 0x422504 ADD X0, X5, X6,LSL #3 |
(168) 0x422508 ADD X10, X5, X7,LSL #3 |
(168) 0x42250c SUB X11, X10, X0 |
(168) 0x422510 SUB X12, X11, #8 |
(168) 0x422514 UBFM X13, X12, #3, #63 |
(168) 0x422518 ADD X14, X13, X8 |
(168) 0x42251c ANDS X15, X14, #7 |
(168) 0x422520 B.EQ 4225b0 |
(168) 0x422524 CMP X15, X8 |
(168) 0x422528 B.EQ 42259c |
(168) 0x42252c CMP X15, #2 |
(168) 0x422530 B.EQ 422590 |
(168) 0x422534 CMP X15, #3 |
(168) 0x422538 B.EQ 422584 |
(168) 0x42253c CMP X15, #4 |
(168) 0x422540 B.EQ 422578 |
(168) 0x422544 CMP X15, #5 |
(168) 0x422548 B.EQ 42256c |
(168) 0x42254c CMP X15, #6 |
(168) 0x422550 B.EQ 422560 |
(168) 0x422554 LDR X16, [X0], #8 |
(168) 0x422558 ADD X17, X9, X16,LSL #3 |
(168) 0x42255c LDADD X8, X1, [X17] |
(168) 0x422560 LDR X18, [X0], #8 |
(168) 0x422564 ADD X30, X9, X18,LSL #3 |
(168) 0x422568 LDADD X8, X1, [X30] |
(168) 0x42256c LDR X20, [X0], #8 |
(168) 0x422570 ADD X1, X9, X20,LSL #3 |
(168) 0x422574 LDADD X8, X1, [X1] |
(168) 0x422578 LDR X21, [X0], #8 |
(168) 0x42257c ADD X19, X9, X21,LSL #3 |
(168) 0x422580 LDADD X8, X1, [X19] |
(168) 0x422584 LDR X4, [X0], #8 |
(168) 0x422588 ADD X3, X9, X4,LSL #3 |
(168) 0x42258c LDADD X8, X1, [X3] |
(168) 0x422590 LDR X2, [X0], #8 |
(168) 0x422594 ADD X6, X9, X2,LSL #3 |
(168) 0x422598 LDADD X8, X1, [X6] |
(168) 0x42259c LDR X7, [X0], #8 |
(168) 0x4225a0 ADD X5, X9, X7,LSL #3 |
(168) 0x4225a4 LDADD X8, X1, [X5] |
(168) 0x4225a8 CMP X10, X0 |
(168) 0x4225ac B.EQ 422620 |
(169) 0x4225b0 ORR X11, XZR, X0 |
(169) 0x4225b4 LDR X12, [X11], #8 |
(169) 0x4225b8 ADD X13, X9, X12,LSL #3 |
(169) 0x4225bc LDADD X8, X1, [X13] |
(169) 0x4225c0 LDR X14, [X0, #8] |
(169) 0x4225c4 ADD X15, X9, X14,LSL #3 |
(169) 0x4225c8 LDADD X8, X1, [X15] |
(169) 0x4225cc LDR X16, [X11, #8] |
(169) 0x4225d0 ADD X17, X9, X16,LSL #3 |
(169) 0x4225d4 LDADD X8, X1, [X17] |
(169) 0x4225d8 LDR X18, [X0, #24] |
(169) 0x4225dc ADD X30, X9, X18,LSL #3 |
(169) 0x4225e0 LDADD X8, X1, [X30] |
(169) 0x4225e4 LDR X20, [X0, #32] |
(169) 0x4225e8 ADD X1, X9, X20,LSL #3 |
(169) 0x4225ec LDADD X8, X1, [X1] |
(169) 0x4225f0 LDR X21, [X0, #40] |
(169) 0x4225f4 ADD X19, X9, X21,LSL #3 |
(169) 0x4225f8 LDADD X8, X1, [X19] |
(169) 0x4225fc LDR X4, [X0, #48] |
(169) 0x422600 ADD X3, X9, X4,LSL #3 |
(169) 0x422604 LDADD X8, X1, [X3] |
(169) 0x422608 LDR X2, [X0, #56] |
(169) 0x42260c ADD X6, X9, X2,LSL #3 |
(169) 0x422610 LDADD X8, X1, [X6] |
(169) 0x422614 ADD X0, X0, #64 |
(169) 0x422618 CMP X10, X0 |
(169) 0x42261c B.NE 4225b0 |
(168) 0x422620 LDP X19, X20, [SP, #16] |
(168) 0x422624 LDR X21, [SP, #32] |
(168) 0x422628 LDP X29, X30, [SP], #48 |
(168) 0x42262c RET |
(168) 0x422630 ADD X3, X3, #1 |
(168) 0x422634 MOVZ X2, #0 |
(168) 0x422638 B 4224e8 |
0x42263c HINT #0 |
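The listing can be read back as C. The `._omp_fn.N` suffix and the `GOMP_parallel` caller point to GCC-style outlining; under that assumption the two PLT calls at 0x4224c8 and 0x4224d0 most likely return the team size (X19) and the thread number (X4), but the listing does not name them. The sketch below is a hypothetical reconstruction, not hypre or compiler source: `idx_t` assumes a 64-bit `HYPRE_Int` (matching the 8-byte loads and 64-bit `LDADD`), and `static_chunk`, `thread_body`, `BUMP` and `run_team_serially` are illustrative names. `SDIV`/`MSUB` split the nonzeros into equal chunks, and the first `n % nthreads` threads take one extra element. `LDADD` without the A/L suffixes has no acquire/release ordering, so each increment is written as a relaxed GCC/Clang `__atomic_fetch_add`. The `ANDS` at 0x42251c masks the element count with 7 (the disassembler prints the raw encoding, #4224), and the `CMP`/`B.EQ` chain jumps into the straight-line run so that `n % 8` increments are peeled before the eight-per-iteration loop at 0x4225b0; a Duff's-device switch expresses the same control flow.

```c
#include <stdint.h>

/* Assumption: 64-bit HYPRE_Int in this build (8-byte loads, 64-bit LDADD). */
typedef int64_t idx_t;

/* Hypothetical helper: static partition of [0, n) for thread `tid`. */
static void static_chunk(idx_t n, idx_t nthreads, idx_t tid,
                         idx_t *start, idx_t *end)
{
    idx_t q = n / nthreads;        /* SDIV X3, X21, X19     */
    idx_t r = n - q * nthreads;    /* MSUB X2, X3, X19, X21 */
    if (tid < r) {                 /* CMP X4, X2 ; B.LT     */
        q += 1;                    /* ADD X3, X3, #1        */
        r = 0;                     /* MOVZ X2, #0           */
    }
    *start = q * tid + r;          /* MADD X6, X3, X4, X2   */
    *end = *start + q;             /* ADD X7, X3, X6        */
}

/* One relaxed atomic increment of counts[*p], then advance p
   (LDR ... [X0], #8 ; ADD ... LSL #3 ; LDADD X8, X1, [...] with X8 = 1). */
#define BUMP(p) __atomic_fetch_add(&counts[*(p)++], 1, __ATOMIC_RELAXED)

/* Hypothetical per-thread body: n % 8 peeled increments, then groups of 8. */
static void thread_body(const idx_t *col_idx, idx_t *counts,
                        idx_t start, idx_t end)
{
    if (start >= end)              /* CMP X6, X7 ; B.GE -> epilogue */
        return;
    const idx_t *p = col_idx + start;
    idx_t n = end - start;
    idx_t groups = (n + 7) / 8;
    switch (n & 7) {               /* ANDS + CMP/B.EQ dispatch */
    case 0: do { BUMP(p);
    case 7:      BUMP(p);
    case 6:      BUMP(p);
    case 5:      BUMP(p);
    case 4:      BUMP(p);
    case 3:      BUMP(p);
    case 2:      BUMP(p);
    case 1:      BUMP(p);
            } while (--groups > 0); /* unrolled loop at 0x4225b0 */
    }
}

/* Serial emulation of the team: each thread's chunk, one after another. */
static void run_team_serially(const idx_t *col_idx, idx_t nnz,
                              idx_t *counts, idx_t nthreads)
{
    for (idx_t tid = 0; tid < nthreads; tid++) {
        idx_t s, e;
        static_chunk(nnz, nthreads, tid, &s, &e);
        thread_body(col_idx, counts, s, e);
    }
}
```

Calling `run_team_serially` with different team sizes should give identical counts, which is a quick check that the partition covers every nonzero exactly once.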
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
96.63 | GOMP_parallel | libomp.so | |
caller | hypre_BoomerAMGCoarsenPMIS | par_coarsen.c:2132 | exec |
caller | hypre_BoomerAMGSetup | par_amg_setup.c:612 | exec |
caller | hypre_PCGSetup | pcg.c:234 | exec |
caller | main | amg.c:398 | exec |
caller | __libc_start_main | libc-2.31.so | |
caller | _start | amg.c:599 | exec |
3.37 | GOMP_parallel | libomp.so | |
caller | hypre_BoomerAMGCoarsenPMIS | par_coarsen.c:2132 | exec |
caller | hypre_BoomerAMGSetup | par_amg_setup.c:623 | exec |
caller | hypre_PCGSetup | pcg.c:234 | exec |
caller | main | amg.c:398 | exec |
caller | __libc_start_main | libc-2.31.so | |
caller | _start | amg.c:599 | exec |
Source file and lines | par_coarsen.c:2132-2136 |
Module | exec |
nb instructions | 17 |
loop length | 68 |
nb stack references | 0 |
front end | 2.00 cycles |
Metric | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 1.50 | 1.50 | 2.50 | 2.50 | 2.50 | 2.50 | 0.00 | 0.00 | 0.00 | 0.00 | 2.17 | 1.83 | 2.00 | 1.50 | 1.50 |
cycles | 1.50 | 1.50 | 2.50 | 2.50 | 2.50 | 2.50 | 0.00 | 0.00 | 0.00 | 0.00 | 2.17 | 1.83 | 2.00 | 1.50 | 1.50 |
Cycles executing div or sqrt instructions | 1.00-0.50 |
Front-end | 2.00 |
Overall L1 | 2.50 |
Vectorization ratio | Value |
---|---|
all | 0% |
load | NA (no load vectorizable/vectorized instructions) |
store | NA (no store vectorizable/vectorized instructions) |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 0% |
fma | NA (no fma vectorizable/vectorized instructions) |
div/sqrt | 0% |
other | NA (no other vectorizable/vectorized instructions) |
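For the 17-instruction path in this analysis, the cycle figures are consistent with a simple maximum rule: the front end needs 2.00 cycles, the busiest ports (P2 to P5) need 2.50, and "Overall L1" reports 2.50, so the path is bounded by port pressure rather than by the front end. The sketch below encodes that reading. Treating the overall figure as a plain maximum is an assumption, not MAQAO's documented formula, and `overall_l1_bound` is a hypothetical name.

```c
#include <stddef.h>

/* Assumption: the L1-resident bound is the larger of the front-end estimate
   and the busiest execution port. A simplified reading of the report, not
   MAQAO's implementation. */
static double overall_l1_bound(double front_end_cycles,
                               const double *port_cycles, size_t nports)
{
    double bound = front_end_cycles;
    for (size_t i = 0; i < nports; i++)
        if (port_cycles[i] > bound)
            bound = port_cycles[i];   /* busiest port seen so far */
    return bound;
}
```

Fed the "cycles" row of the port table and the 2.00-cycle front-end figure, this returns 2.50.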
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
STP X29, X30, [SP, #-48]! | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
ADD X29, SP, #0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
STP X19, X20, [SP, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
ORR X20, XZR, X0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
LDR X0, [X0] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 |
LDR X1, [X20, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 |
STR X21, [SP, #32] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
LDR X21, [X0, X1,LSL #3] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 |
BL 40f400 <@plt_start@+0x400> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
SBFM X19, X0, #0, #31 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
BL 40f150 <@plt_start@+0x150> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
SBFM X4, X0, #0, #31 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
SDIV X3, X21, X19 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5-20 | 1-0.50 |
MSUB X2, X3, X19, X21 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 |
CMP X4, X2 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.33 |
B.LT 422630 <hypre_BoomerAMGCoarsenPMIS._omp_fn.2+0x188> | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
HINT #0 |
Name | Coverage (%) | Time (s) |
---|---|---|
hypre_BoomerAMGCoarsenPMIS._omp_fn.2 | 2.89 | 0.44 |
Loop 168 - par_coarsen.c:2132-2136 - exec | 0 | 0 |
Loop 169 (nested in Loop 168) - par_coarsen.c:2135-2136 - exec | 2.89 | 0.44 |
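All 0.44 s is attributed to Loop 169, the eight-way unrolled run of `LDADD` atomics behind `#pragma omp atomic`. One commonly used alternative for a histogram like `measure_array_temp` is an OpenMP 4.5 array-section reduction, sketched below. This is not hypre's code: `count_column_entries` and `idx_t` are hypothetical names, and `idx_t` assumes the 64-bit integers seen in the listing. Whether it is faster here is untested; it depends on `num_variables` and the thread count, because every thread allocates and later merges a private copy of the whole array.

```c
#include <stdint.h>

/* Assumption: 64-bit indices, as suggested by the 8-byte loads in the listing. */
typedef int64_t idx_t;

/* Hypothetical helper. For each column j, counts the entries of the CSR
   pattern (row_ptr, col_idx) that reference j, which is what the atomic loop
   at par_coarsen.c:2132-2136 accumulates into measure_array_temp. */
static void count_column_entries(idx_t nrows, const idx_t *row_ptr,
                                 const idx_t *col_idx, idx_t ncols,
                                 idx_t *counts)
{
    const idx_t nnz = row_ptr[nrows];

    for (idx_t j = 0; j < ncols; j++)
        counts[j] = 0;

    /* Each thread increments its own zero-initialized copy of counts[0:ncols];
       the copies are summed into counts when the loop ends, so no atomic is
       needed per entry. Compiled without OpenMP, the pragma is ignored and
       the loop runs serially with the same result. */
#pragma omp parallel for reduction(+ : counts[:ncols])
    for (idx_t i = 0; i < nnz; i++)
        counts[col_idx[i]]++;
}
```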