Function: hypre_SeqVectorScale._omp_fn.0 | Module: exec | Source: vector.c:413-416 | Coverage: 0.03% |
---|
/home/hbollore/qaas/qaas-runs/169-817-3176/intel/AMG/build/AMG/AMG/seq_mv/vector.c: 413 - 416 |
-------------------------------------------------------------------------------- |
413: #pragma omp parallel for private(i) HYPRE_SMP_SCHEDULE |
414: #endif |
415: for (i = 0; i < size; i++) |
416: y_data[i] *= alpha; |
0x4fcb20 STP X29, X30, [SP, #-48]!
0x4fcb24 ADD X29, SP, #0 |
0x4fcb28 STR X21, [SP, #32] |
0x4fcb2c LDR X21, [X0, #16] |
0x4fcb30 STP X19, X20, [SP, #16] |
0x4fcb34 ORR X20, XZR, X0 |
0x4fcb38 BL 40f400 |
0x4fcb3c SBFM X19, X0, #0, #31 |
0x4fcb40 BL 40f150 |
0x4fcb44 SDIV X3, X21, X19 |
0x4fcb48 SBFM X1, X0, #0, #31 |
0x4fcb4c MSUB X2, X3, X19, X21 |
0x4fcb50 CMP X1, X2 |
0x4fcb54 B.LT 4fccb8 |
(3838) 0x4fcb58 MADD X15, X3, X1, X2 |
(3838) 0x4fcb5c ADD X0, X3, X15 |
(3838) 0x4fcb60 CMP X15, X0 |
(3838) 0x4fcb64 B.GE 4fcca8 |
(3838) 0x4fcb68 LDR D1, [X20] |
(3838) 0x4fcb6c LDR X6, [X20, #8] |
(3838) 0x4fcb70 CMP X3, #1 |
(3838) 0x4fcb74 B.EQ 4fcc9c |
(3838) 0x4fcb78 UBFM X5, X3, #1, #63 |
(3838) 0x4fcb7c ADD X13, X6, X15,LSL #3 |
(3838) 0x4fcb80 DUP V0.2D, V1.D[0] |
(3838) 0x4fcb84 UBFM X4, X5, #60, #59 |
(3838) 0x4fcb88 SUB X7, X4, #16 |
(3838) 0x4fcb8c ADD X8, X13, X5,LSL #4 |
(3838) 0x4fcb90 UBFM X9, X7, #4, #63 |
(3838) 0x4fcb94 ADD X10, X9, #1 |
(3838) 0x4fcb98 ANDS X11, X10, #7
(3838) 0x4fcb9c B.EQ 4fcc2c |
(3838) 0x4fcba0 CMP X11, #1 |
(3838) 0x4fcba4 B.EQ 4fcc18 |
(3838) 0x4fcba8 CMP X11, #2 |
(3838) 0x4fcbac B.EQ 4fcc0c |
(3838) 0x4fcbb0 CMP X11, #3 |
(3838) 0x4fcbb4 B.EQ 4fcc00 |
(3838) 0x4fcbb8 CMP X11, #4 |
(3838) 0x4fcbbc B.EQ 4fcbf4 |
(3838) 0x4fcbc0 CMP X11, #5 |
(3838) 0x4fcbc4 B.EQ 4fcbe8 |
(3838) 0x4fcbc8 CMP X11, #6 |
(3838) 0x4fcbcc B.EQ 4fcbdc |
(3838) 0x4fcbd0 LDR Q2, [X13] |
(3838) 0x4fcbd4 FMUL V3.2D, V2.2D, V0.2D |
(3838) 0x4fcbd8 STR Q3, [X13], #16 |
(3838) 0x4fcbdc LDR Q4, [X13] |
(3838) 0x4fcbe0 FMUL V5.2D, V4.2D, V0.2D |
(3838) 0x4fcbe4 STR Q5, [X13], #16 |
(3838) 0x4fcbe8 LDR Q6, [X13] |
(3838) 0x4fcbec FMUL V7.2D, V6.2D, V0.2D |
(3838) 0x4fcbf0 STR Q7, [X13], #16 |
(3838) 0x4fcbf4 LDR Q16, [X13] |
(3838) 0x4fcbf8 FMUL V17.2D, V16.2D, V0.2D |
(3838) 0x4fcbfc STR Q17, [X13], #16 |
(3838) 0x4fcc00 LDR Q18, [X13] |
(3838) 0x4fcc04 FMUL V19.2D, V18.2D, V0.2D |
(3838) 0x4fcc08 STR Q19, [X13], #16 |
(3838) 0x4fcc0c LDR Q20, [X13] |
(3838) 0x4fcc10 FMUL V21.2D, V20.2D, V0.2D |
(3838) 0x4fcc14 STR Q21, [X13], #16 |
(3838) 0x4fcc18 LDR Q22, [X13] |
(3838) 0x4fcc1c FMUL V23.2D, V22.2D, V0.2D |
(3838) 0x4fcc20 STR Q23, [X13], #16 |
(3838) 0x4fcc24 CMP X13, X8 |
(3838) 0x4fcc28 B.EQ 4fcc8c |
(3839) 0x4fcc2c LDR Q24, [X13] |
(3839) 0x4fcc30 ORR X12, XZR, X13 |
(3839) 0x4fcc34 FMUL V25.2D, V24.2D, V0.2D |
(3839) 0x4fcc38 STR Q25, [X12], #16 |
(3839) 0x4fcc3c LDR Q26, [X13, #16] |
(3839) 0x4fcc40 FMUL V27.2D, V26.2D, V0.2D |
(3839) 0x4fcc44 STR Q27, [X13, #16] |
(3839) 0x4fcc48 LDR Q28, [X12, #16] |
(3839) 0x4fcc4c FMUL V29.2D, V28.2D, V0.2D |
(3839) 0x4fcc50 STR Q29, [X12, #16] |
(3839) 0x4fcc54 LDP Q30, Q31, [X13, #48] |
(3839) 0x4fcc58 LDP Q4, Q3, [X13, #80] |
(3839) 0x4fcc5c LDR Q2, [X13, #112] |
(3839) 0x4fcc60 FMUL V6.2D, V30.2D, V0.2D |
(3839) 0x4fcc64 FMUL V5.2D, V31.2D, V0.2D |
(3839) 0x4fcc68 FMUL V7.2D, V4.2D, V0.2D |
(3839) 0x4fcc6c FMUL V16.2D, V3.2D, V0.2D |
(3839) 0x4fcc70 FMUL V17.2D, V2.2D, V0.2D |
(3839) 0x4fcc74 STP Q6, Q5, [X13, #48] |
(3839) 0x4fcc78 STP Q7, Q16, [X13, #80] |
(3839) 0x4fcc7c ADD X13, X13, #128 |
(3839) 0x4fcc80 STUR Q17, [X13, #-16]
(3839) 0x4fcc84 CMP X13, X8 |
(3839) 0x4fcc88 B.NE 4fcc2c |
(3838) 0x4fcc8c AND X14, X3, #0xfffffffffffffffe
(3838) 0x4fcc90 ADD X15, X15, X14 |
(3838) 0x4fcc94 CMP X3, X14 |
(3838) 0x4fcc98 B.EQ 4fcca8 |
(3838) 0x4fcc9c LDR D0, [X6, X15,LSL #3] |
(3838) 0x4fcca0 FMUL D1, D0, D1 |
(3838) 0x4fcca4 STR D1, [X6, X15,LSL #3] |
(3838) 0x4fcca8 LDP X19, X20, [SP, #16] |
(3838) 0x4fccac LDR X21, [SP, #32] |
(3838) 0x4fccb0 LDP X29, X30, [SP], #48 |
(3838) 0x4fccb4 RET |
(3838) 0x4fccb8 ADD X3, X3, #1 |
(3838) 0x4fccbc MOVZ X2, #0 |
(3838) 0x4fccc0 B 4fcb58 |
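The prologue of the listing above (0x4fcb2c through 0x4fcb64, plus the fix-up block at 0x4fccb8) appears to compute the iteration range each thread handles under a static schedule. The sketch below restates that arithmetic in C. It is a reading of the disassembly, not hypre code: `static_chunk` and its parameter names are hypothetical, and it assumes the two unnamed `BL` targets are `omp_get_num_threads` and `omp_get_thread_num`, which the listing itself does not confirm.

```c
#include <assert.h>

/* Hypothetical helper (not part of hypre): computes the half-open range
 * [*start, *end) that thread `tid` out of `nthreads` would process for a
 * loop of `n` iterations. Comments name the instruction each line mirrors. */
void static_chunk(long n, long nthreads, long tid, long *start, long *end)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);

    long chunk = n / nthreads;          /* SDIV X3, X21, X19      */
    long rem   = n - chunk * nthreads;  /* MSUB X2, X3, X19, X21  */

    if (tid < rem) {                    /* CMP X1, X2 ; B.LT      */
        chunk += 1;                     /* ADD X3, X3, #1         */
        rem = 0;                        /* MOVZ X2, #0            */
    }

    *start = chunk * tid + rem;         /* MADD X15, X3, X1, X2   */
    *end   = *start + chunk;            /* ADD X0, X3, X15        */
}
```

With `n = 10` and `nthreads = 4`, the first two threads receive three iterations each and the last two receive two each, so the ranges tile `[0, 10)` without gaps or overlap.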
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
100.00 | GOMP_parallel | | libomp.so |
 | hypre_SeqVectorScale | vector.c:413 | exec |
 | hypre_PCGSolve | pcg.c:709 | exec |
 | main | amg.c:419 | exec |
 | __libc_start_main | | libc-2.31.so |
 | _start | amg.c:599 | exec |
Path / |
Source file and lines | vector.c:413-416 |
Module | exec |
nb instructions | 14 |
loop length | 56 bytes |
nb stack references | 0 |
front end | 1.75 cycles |
Port | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 1.50 | 1.50 | 2.50 | 2.50 | 2.50 | 2.50 | 0.00 | 0.00 | 0.00 | 0.00 | 1.50 | 1.50 | 1.00 | 1.50 | 1.50 |
cycles | 1.50 | 1.50 | 2.50 | 2.50 | 2.50 | 2.50 | 0.00 | 0.00 | 0.00 | 0.00 | 1.50 | 1.50 | 1.00 | 1.50 | 1.50 |
Cycles executing div or sqrt instructions | 1.00-0.50 |
Front-end | 1.75 |
Overall L1 | 2.50 |
all | 0% |
load | NA (no load vectorizable/vectorized instructions) |
store | NA (no store vectorizable/vectorized instructions) |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 0% |
fma | NA (no fma vectorizable/vectorized instructions) |
div/sqrt | 0% |
other | 0% |
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
STP X29, X30, [SP, #-48]! | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
ADD X29, SP, #0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
STR X21, [SP, #32] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
LDR X21, [X0, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 |
STP X19, X20, [SP, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
ORR X20, XZR, X0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
BL 40f400 <@plt_start@+0x400> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
SBFM X19, X0, #0, #31 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
BL 40f150 <@plt_start@+0x150> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
SDIV X3, X21, X19 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5-20 | 1-0.50 |
SBFM X1, X0, #0, #31 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
MSUB X2, X3, X19, X21 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 |
CMP X1, X2 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.33 |
B.LT 4fccb8 <hypre_SeqVectorScale._omp_fn.0+0x198> | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
Name | Coverage (%) | Time (s) |
---|---|---|
hypre_SeqVectorScale._omp_fn.0 | 0.03 | 0 |
Loop 3838 - vector.c:413-416 - exec | 0 | 0 |
Loop 3839 - vector.c:416-416 - exec | 0.03 | 0 |
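Read against the disassembly, Loop 3839 looks like the main vectorized body (eight 128-bit `FMUL V.2D` operations per iteration, that is, 16 doubles), while Loop 3838 covers the surrounding code: a peel of up to seven vectors before the main body and a one-element scalar tail after it. The C sketch below mirrors that structure with scalar code. It is an interpretation under those assumptions, and `scale_chunk` and its parameters are hypothetical names, not hypre symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical restatement of the compiled loop shape: scales
 * y[start .. start+count) by alpha in three phases. */
void scale_chunk(double *y, size_t start, size_t count, double alpha)
{
    assert(y != NULL || count == 0);

    double *p   = y + start;
    size_t nvec = count / 2;   /* number of 2-double (128-bit) vectors */
    size_t peel = nvec % 8;    /* vectors handled before the unrolled body */
    size_t v;

    /* Phase 1 (Loop 3838, peel): one vector per step, 0 to 7 steps. */
    for (v = 0; v < peel; v++, p += 2) {
        p[0] *= alpha;
        p[1] *= alpha;
    }

    /* Phase 2 (Loop 3839): eight vectors, i.e. 16 doubles, per iteration. */
    for (; v < nvec; v += 8, p += 16) {
        for (int k = 0; k < 16; k++)
            p[k] *= alpha;
    }

    /* Phase 3 (Loop 3838, tail): a single leftover element when count is odd. */
    if (count & 1)
        *p *= alpha;
}
```

Calling `scale_chunk(y, start, end - start, alpha)` with a thread's own range gives the same result as the source loop `y_data[i] *= alpha` over that range; the phases only change the order in which groups of elements are visited.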