Function: main._omp_fn.1 | Module: exec | Source: main.c:139-146 | Coverage: 0.04% |
---|
/home/hbollore/qaas-runs/170-256-3563/intel/HACCmk/build/HACCmk/src/main.c: 139 - 146 |
-------------------------------------------------------------------------------- |
139: #pragma omp parallel for private( dx1, dy1, dz1 ) |
140: for ( i = 0; i < count; ++i) |
141: { |
142: Step10_orig( n, xx[i], yy[i], zz[i], fsrrmax2, mp_rsm2, xx, yy, zz, mass, &dx1, &dy1, &dz1 ); |
143: |
144: vx1[i] = vx1[i] + dx1 * fcoeff; |
145: vy1[i] = vy1[i] + dy1 * fcoeff; |
146: vz1[i] = vz1[i] + dz1 * fcoeff; |
0x400fa0 ADRP X1, |
0x400fa4 STP X29, X30, [SP, #-176]!
0x400fa8 ADD X29, SP, #0 |
0x400fac LDR X2, [X1, #4064] |
0x400fb0 STP X23, X24, [SP, #48] |
0x400fb4 LDR W23, [X0, #16] |
0x400fb8 STP X19, X20, [SP, #16] |
0x400fbc STP X21, X22, [SP, #32] |
0x400fc0 ORR X22, XZR, X0 |
0x400fc4 LDR X0, [X2] |
0x400fc8 STR X0, [SP, #168] |
0x400fcc MOVZ X0, #0 |
0x400fd0 BL 400a90 |
0x400fd4 ORR W21, WZR, W0 |
0x400fd8 BL 400a40 |
0x400fdc SDIV W20, W23, W21 |
0x400fe0 ORR W19, WZR, W0 |
0x400fe4 MSUB W3, W20, W21, W23 |
0x400fe8 CMP W0, W3 |
0x400fec B.LT 401124 |
(2) 0x400ff0 MADD W4, W20, W19, W3 |
(2) 0x400ff4 ADD W0, W20, W4 |
(2) 0x400ff8 STR W0, [SP, #140] |
(2) 0x400ffc CMP W4, W0 |
(2) 0x401000 B.GE 4010f4 |
(2) 0x401004 ADRP X5, |
(2) 0x401008 ADRP X6, |
(2) 0x40100c STP X27, X28, [SP, #80] |
(2) 0x401010 ADRP X7, |
(2) 0x401014 ADRP X8, |
(2) 0x401018 ADRP X28, |
(2) 0x40101c SBFM X19, X4, #0, #31 |
(2) 0x401020 ADD X21, X5, #4080 |
(2) 0x401024 ADD X3, X6, #2672 |
(2) 0x401028 STP X25, X26, [SP, #64] |
(2) 0x40102c ADD X20, X7, #1264 |
(2) 0x401030 ADD X23, X8, #3952 |
(2) 0x401034 ADD X28, X28, #1392 |
(2) 0x401038 ADD X27, SP, #164 |
(2) 0x40103c LDR W26, [X22, #12] |
(2) 0x401040 ADD X25, SP, #160 |
(2) 0x401044 ADD X24, SP, #156 |
(2) 0x401048 STP D8, D9, [SP, #96] |
(2) 0x40104c LDP S9, S8, [X22, #4] |
(2) 0x401050 STR D10, [SP, #112] |
(2) 0x401054 LDR S10, [X22] |
(2) 0x401058 ADRP X22, |
(2) 0x40105c HINT #0 |
(3) 0x401060 FMOV S4, S9 |
(3) 0x401064 FMOV S3, S10 |
(3) 0x401068 ORR X7, XZR, X27 |
(3) 0x40106c ORR X6, XZR, X25 |
(3) 0x401070 ORR X5, XZR, X24 |
(3) 0x401074 LDR S2, [X3, X19,LSL #2] |
(3) 0x401078 ORR X4, XZR, X21 |
(3) 0x40107c ORR X2, XZR, X20 |
(3) 0x401080 STR X3, [SP, #128] |
(3) 0x401084 ORR X1, XZR, X23 |
(3) 0x401088 ORR W0, WZR, W26 |
(3) 0x40108c LDR S1, [X20, X19,LSL #2] |
(3) 0x401090 LDR S0, [X23, X19,LSL #2] |
(3) 0x401094 BL 401280 |
(3) 0x401098 ADRP X9, |
(3) 0x40109c ADD X11, X22, #112 |
(3) 0x4010a0 LDR S0, [X28, X19,LSL #2] |
(3) 0x4010a4 ADD X10, X9, #2800 |
(3) 0x4010a8 LDR S1, [X11, X19,LSL #2] |
(3) 0x4010ac LDR S2, [X10, X19,LSL #2] |
(3) 0x4010b0 LDP S3, S5, [SP, #156] |
(3) 0x4010b4 LDR S4, [SP, #164] |
(3) 0x4010b8 LDR W12, [SP, #140] |
(3) 0x4010bc FMADD S6, S8, S5, S2 |
(3) 0x4010c0 FMADD S16, S8, S3, S0 |
(3) 0x4010c4 LDR X3, [SP, #128] |
(3) 0x4010c8 FMADD S7, S8, S4, S1 |
(3) 0x4010cc STR S6, [X10, X19,LSL #2] |
(3) 0x4010d0 STR S16, [X28, X19,LSL #2] |
(3) 0x4010d4 STR S7, [X11, X19,LSL #2] |
(3) 0x4010d8 ADD X19, X19, #1 |
(3) 0x4010dc CMP W12, W19 |
(3) 0x4010e0 B.GT 401060 |
(2) 0x4010e4 LDP D8, D9, [SP, #96] |
(2) 0x4010e8 LDP X25, X26, [SP, #64] |
(2) 0x4010ec LDP X27, X28, [SP, #80] |
(2) 0x4010f0 LDR D10, [SP, #112] |
(2) 0x4010f4 ADRP X13, |
(2) 0x4010f8 LDR X14, [X13, #4064] |
(2) 0x4010fc LDR X2, [SP, #168] |
(2) 0x401100 LDR X1, [X14] |
(2) 0x401104 SUBS X2, X2, X1 |
(2) 0x401108 MOVZ X1, #0 |
(2) 0x40110c B.NE 401130 |
(2) 0x401110 LDP X19, X20, [SP, #16] |
(2) 0x401114 LDP X21, X22, [SP, #32] |
(2) 0x401118 LDP X23, X24, [SP, #48] |
(2) 0x40111c LDP X29, X30, [SP], #176 |
(2) 0x401120 RET |
(2) 0x401124 ADD W20, W20, #1 |
(2) 0x401128 MOVZ W3, #0 |
(2) 0x40112c B 400ff0 |
0x401130 STP X25, X26, [SP, #64] |
0x401134 STP X27, X28, [SP, #80] |
0x401138 STP D8, D9, [SP, #96] |
0x40113c STR D10, [SP, #112] |
0x401140 BL 400a80 |
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►97.99 | __kmp_GOMP_microtask_wrapper(i[...] | | libomp.so |
○ | __kmp_invoke_microtask | | libomp.so |
►2.01 | GOMP_parallel | | libomp.so |
○ | main | main.c:152 | exec |
○ | __libc_start_main | | libc-2.31.so |
○ | _start | main.c:192 | exec |
Path / |
Source file and lines | main.c:139-146 |
Module | exec |
nb instructions | 25 |
loop length | 100 |
nb stack references | 0 |
front end | 3.13 cycles |
Metric | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 2.00 | 2.00 | 3.25 | 3.25 | 3.25 | 3.25 | 1.00 | 1.00 | 0.00 | 0.00 | 4.50 | 4.50 | 3.00 | 3.50 | 3.50 |
cycles | 2.00 | 2.00 | 3.25 | 3.25 | 3.25 | 3.25 | 1.00 | 1.00 | 0.00 | 0.00 | 4.50 | 4.50 | 3.00 | 3.50 | 3.50 |
Cycles executing div or sqrt instructions | 1.00-0.50 |
Front-end | 3.13 |
Overall L1 | 4.50 |
all | 14% |
load | NA (no load vectorizable/vectorized instructions) |
store | 50% |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 0% |
fma | NA (no fma vectorizable/vectorized instructions) |
div/sqrt | 0% |
other | 0% |
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ADRP X1, <411fa0> | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
STP X29, X30, [SP, #-176]! | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
ADD X29, SP, #0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
LDR X2, [X1, #4064] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 |
STP X23, X24, [SP, #48] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
LDR W23, [X0, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 |
STP X19, X20, [SP, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
STP X21, X22, [SP, #32] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
ORR X22, XZR, X0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
LDR X0, [X2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 |
STR X0, [SP, #168] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
MOVZ X0, #0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
BL 400a90 <@plt_start@+0xa0> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
ORR W21, WZR, W0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
BL 400a40 <@plt_start@+0x50> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
SDIV W20, W23, W21 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5-12 | 1-0.50 |
ORR W19, WZR, W0 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
MSUB W3, W20, W21, W23 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 |
CMP W0, W3 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.33 |
B.LT 401124 <main._omp_fn.1+0x184> | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
STP X25, X26, [SP, #64] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
STP X27, X28, [SP, #80] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 |
STP D8, D9, [SP, #96] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
STR D10, [SP, #112] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
BL 400a80 <@plt_start@+0x90> | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
Name | Coverage (%) | Time (s) |
---|---|---|
▼main._omp_fn.1 | 0.04 | 0.01 |
▼Loop 2 - main.c:139-146 - exec | 0 | 0 |
○Loop 3 - main.c:142-146 - exec | 0.04 | 0.01 |