Loop Id: 5 | Module: exec | Source: main.c:113-116 | Coverage: 0.02% |
---|
Loop Id: 5 | Module: exec | Source: main.c:113-116 | Coverage: 0.02% |
---|
0x400fac UBFM X14, X11, #62, #61 |
0x400fb0 FADD Z27.S, Z24.S, Z19.S |
0x400fb4 FADD Z24.S, Z24.S, Z18.S |
0x400fb8 ADD X15, X28, X14 |
0x400fbc FADD Z25.S, Z21.S, Z7.S |
0x400fc0 FADD Z26.S, Z23.S, Z17.S |
0x400fc4 FADD Z23.S, Z23.S, Z16.S |
0x400fc8 FADD Z21.S, Z21.S, Z6.S |
0x400fcc ADD Z29.S, Z22.S, Z31.S |
0x400fd0 FADD Z28.S, Z27.S, Z18.S |
0x400fd4 FADD Z30.S, Z26.S, Z16.S |
0x400fd8 ST1W {Z24.S}, P0, [X15, X23,LSL #2] [3] |
0x400fdc ADD X15, X27, X14 |
0x400fe0 ST1W {Z28.S}, P0, [X26, X11,LSL #2] [5] |
0x400fe4 ST1W {Z23.S}, P0, [X15, X23,LSL #2] [1] |
0x400fe8 ADD X15, X24, X14 |
0x400fec ADD X14, X19, X14 |
0x400ff0 FADD Z23.S, Z25.S, Z6.S |
0x400ff4 ST1W {Z30.S}, P0, [X20, X11,LSL #2] [7] |
0x400ff8 ST1W {Z21.S}, P0, [X15, X23,LSL #2] [6] |
0x400ffc ST1W {Z23.S}, P0, [X21, X11,LSL #2] [4] |
0x401000 MOVPRFX Z21, Z22 |
0x401004 SCVTF Z21.S, P0/M, Z22.S |
0x401008 MOVPRFX Z23, Z29 |
0x40100c SCVTF Z23.S, P0/M, Z29.S |
0x401010 FMAD Z21.S, P0/M, Z8.S, Z24.S |
0x401014 FMAD Z23.S, P0/M, Z8.S, Z28.S |
0x401018 FADD Z24.S, Z27.S, Z19.S |
0x40101c ADD Z22.S, Z20.S, Z22.S |
0x401020 ST1W {Z21.S}, P0, [X14, X23,LSL #2] [8] |
0x401024 ST1W {Z23.S}, P0, [X22, X11,LSL #2] [2] |
0x401028 ADD X11, X11, X16 |
0x40102c FADD Z21.S, Z25.S, Z7.S |
0x401030 FADD Z23.S, Z26.S, Z17.S |
0x401034 CMP X13, X11 |
0x401038 B.NE 400fac |
/home/hbollore/qaas-runs/170-256-3563/intel/HACCmk/build/HACCmk/src/main.c: 113 - 116 |
-------------------------------------------------------------------------------- |
113: xx[i] = xx[i-1] + dx1; |
114: yy[i] = yy[i-1] + dy1; |
115: zz[i] = zz[i-1] + dz1; |
116: mass[i] = (float)i * 0.01f + xx[i]; |
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►100.00+ | __libc_start_main | libc-2.31.so | |
○ | _start | exec |
Path / |
Metric | Value |
---|---|
CQA speedup if no scalar integer | 1.00 |
CQA speedup if FP arith vectorized | 1.00 |
CQA speedup if fully vectorized | 1.00 |
CQA speedup if no inter-iteration dependency | NA |
CQA speedup if next bottleneck killed | 1.43 |
Bottlenecks | P6, P7, |
Function | main |
Source | main.c:113-116 |
Source loop unroll info | unrolled by 16 |
Source loop unroll confidence level | high |
Unroll/vectorization loop type | NA |
Unroll factor | 16 |
CQA cycles | 10.00 |
CQA cycles if no scalar integer | 10.00 |
CQA cycles if FP arith vectorized | 10.00 |
CQA cycles if fully vectorized | 10.00 |
Front-end cycles | 4.50 |
DIV/SQRT cycles | 0.50 |
P0 cycles | 0.50 |
P1 cycles | 1.75 |
P2 cycles | 1.75 |
P3 cycles | 1.75 |
P4 cycles | 1.75 |
P5 cycles | 10.00 |
P6 cycles | 10.00 |
P7 cycles | 7.00 |
P8 cycles | 7.00 |
P9 cycles | 4.00 |
P10 cycles | 4.00 |
P11 cycles | 0.00 |
P12 cycles | 0.00 |
P13 cycles | 0.00 |
P14 cycles | 0.00 |
Inter-iter dependencies cycles | 4 |
FE+BE cycles (UFS) | NA |
Stall cycles (UFS) | NA |
Nb insns | 36.00 |
Nb uops | 36.00 |
Nb loads | NA |
Nb stores | 8.00 |
Nb stack references | 0.00 |
FLOP/cycle | 12.80 |
Nb FLOP add-sub | 96.00 |
Nb FLOP mul | 0.00 |
Nb FLOP fma | 16.00 |
Nb FLOP div | 0.00 |
Nb FLOP rcp | 0.00 |
Nb FLOP sqrt | 0.00 |
Nb FLOP rsqrt | 0.00 |
Bytes/cycle | 25.60 |
Bytes prefetched | 0.00 |
Bytes loaded | 0.00 |
Bytes stored | 256.00 |
Stride 0 | NA |
Stride 1 | NA |
Stride n | NA |
Stride unknown | NA |
Stride indirect | NA |
Vectorization ratio all | 92.86 |
Vectorization ratio load | NA |
Vectorization ratio store | 100.00 |
Vectorization ratio mul | NA |
Vectorization ratio add_sub | 100.00 |
Vectorization ratio fma | 100.00 |
Vectorization ratio div_sqrt | NA |
Vectorization ratio other | 50.00 |
Vector-efficiency ratio all | 100.00 |
Vector-efficiency ratio load | NA |
Vector-efficiency ratio store | 100.00 |
Vector-efficiency ratio mul | NA |
Vector-efficiency ratio add_sub | 100.00 |
Vector-efficiency ratio fma | 100.00 |
Vector-efficiency ratio div_sqrt | NA |
Vector-efficiency ratio other | 100.00 |
Metric | Value |
---|---|
CQA speedup if no scalar integer | 1.00 |
CQA speedup if FP arith vectorized | 1.00 |
CQA speedup if fully vectorized | 1.00 |
CQA speedup if no inter-iteration dependency | NA |
CQA speedup if next bottleneck killed | 1.43 |
Bottlenecks | P6, P7, |
Function | main |
Source | main.c:113-116 |
Source loop unroll info | unrolled by 16 |
Source loop unroll confidence level | high |
Unroll/vectorization loop type | NA |
Unroll factor | 16 |
CQA cycles | 10.00 |
CQA cycles if no scalar integer | 10.00 |
CQA cycles if FP arith vectorized | 10.00 |
CQA cycles if fully vectorized | 10.00 |
Front-end cycles | 4.50 |
DIV/SQRT cycles | 0.50 |
P0 cycles | 0.50 |
P1 cycles | 1.75 |
P2 cycles | 1.75 |
P3 cycles | 1.75 |
P4 cycles | 1.75 |
P5 cycles | 10.00 |
P6 cycles | 10.00 |
P7 cycles | 7.00 |
P8 cycles | 7.00 |
P9 cycles | 4.00 |
P10 cycles | 4.00 |
P11 cycles | 0.00 |
P12 cycles | 0.00 |
P13 cycles | 0.00 |
P14 cycles | 0.00 |
Inter-iter dependencies cycles | 4 |
FE+BE cycles (UFS) | NA |
Stall cycles (UFS) | NA |
Nb insns | 36.00 |
Nb uops | 36.00 |
Nb loads | NA |
Nb stores | 8.00 |
Nb stack references | 0.00 |
FLOP/cycle | 12.80 |
Nb FLOP add-sub | 96.00 |
Nb FLOP mul | 0.00 |
Nb FLOP fma | 16.00 |
Nb FLOP div | 0.00 |
Nb FLOP rcp | 0.00 |
Nb FLOP sqrt | 0.00 |
Nb FLOP rsqrt | 0.00 |
Bytes/cycle | 25.60 |
Bytes prefetched | 0.00 |
Bytes loaded | 0.00 |
Bytes stored | 256.00 |
Stride 0 | NA |
Stride 1 | NA |
Stride n | NA |
Stride unknown | NA |
Stride indirect | NA |
Vectorization ratio all | 92.86 |
Vectorization ratio load | NA |
Vectorization ratio store | 100.00 |
Vectorization ratio mul | NA |
Vectorization ratio add_sub | 100.00 |
Vectorization ratio fma | 100.00 |
Vectorization ratio div_sqrt | NA |
Vectorization ratio other | 50.00 |
Vector-efficiency ratio all | 100.00 |
Vector-efficiency ratio load | NA |
Vector-efficiency ratio store | 100.00 |
Vector-efficiency ratio mul | NA |
Vector-efficiency ratio add_sub | 100.00 |
Vector-efficiency ratio fma | 100.00 |
Vector-efficiency ratio div_sqrt | NA |
Vector-efficiency ratio other | 100.00 |
Path / |
Function | main |
Source file and lines | main.c:113-116 |
Module | exec |
nb instructions | 36 |
loop length | 144 |
nb stack references | 0 |
front end | 4.50 cycles |
P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 0.50 | 0.50 | 1.75 | 1.75 | 1.75 | 1.75 | 10.00 | 10.00 | 4.00 | 4.00 | 4.00 | 4.00 | 0.00 | 0.00 | 0.00 |
cycles | 0.50 | 0.50 | 1.75 | 1.75 | 1.75 | 1.75 | 10.00 | 10.00 | 7.00 | 7.00 | 4.00 | 4.00 | 0.00 | 0.00 | 0.00 |
Cycles executing div or sqrt instructions | NA |
Longest recurrence chain latency (RecMII) | 4.00 |
Front-end | 4.50 |
Data deps. | 4.00 |
Overall L1 | 10.00 |
all | 85% |
load | NA (no load vectorizable/vectorized instructions) |
store | 100% |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 100% |
fma | NA (no fma vectorizable/vectorized instructions) |
other | 50% |
all | 100% |
load | NA (no load vectorizable/vectorized instructions) |
store | NA (no store vectorizable/vectorized instructions) |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 100% |
fma | 100% |
div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
other | NA (no other vectorizable/vectorized instructions) |
all | 92% |
load | NA (no load vectorizable/vectorized instructions) |
store | 100% |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 100% |
fma | 100% |
div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
other | 50% |
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UBFM X14, X11, #62, #61 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
FADD Z27.S, Z24.S, Z19.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
FADD Z24.S, Z24.S, Z18.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
ADD X15, X28, X14 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
FADD Z25.S, Z21.S, Z7.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
FADD Z26.S, Z23.S, Z17.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
FADD Z23.S, Z23.S, Z16.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
FADD Z21.S, Z21.S, Z6.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
ADD Z29.S, Z22.S, Z31.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
FADD Z28.S, Z27.S, Z18.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
FADD Z30.S, Z26.S, Z16.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
ST1W {Z24.S}, P0, [X15, X23,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
ADD X15, X27, X14 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
ST1W {Z28.S}, P0, [X26, X11,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
ST1W {Z23.S}, P0, [X15, X23,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
ADD X15, X24, X14 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
ADD X14, X19, X14 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
FADD Z23.S, Z25.S, Z6.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
ST1W {Z30.S}, P0, [X20, X11,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
ST1W {Z21.S}, P0, [X15, X23,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
ST1W {Z23.S}, P0, [X21, X11,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
MOVPRFX Z21, Z22 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
SCVTF Z21.S, P0/M, Z22.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 2 |
MOVPRFX Z23, Z29 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
SCVTF Z23.S, P0/M, Z29.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 2 |
FMAD Z21.S, P0/M, Z8.S, Z24.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
FMAD Z23.S, P0/M, Z8.S, Z28.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
FADD Z24.S, Z27.S, Z19.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
ADD Z22.S, Z20.S, Z22.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
ST1W {Z21.S}, P0, [X14, X23,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
ST1W {Z23.S}, P0, [X22, X11,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
ADD X11, X11, X16 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
FADD Z21.S, Z25.S, Z7.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
FADD Z23.S, Z26.S, Z17.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
CMP X13, X11 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.33 |
B.NE 400fac <main+0x2ec> | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
Function | main |
Source file and lines | main.c:113-116 |
Module | exec |
nb instructions | 36 |
loop length | 144 |
nb stack references | 0 |
front end | 4.50 cycles |
P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 0.50 | 0.50 | 1.75 | 1.75 | 1.75 | 1.75 | 10.00 | 10.00 | 4.00 | 4.00 | 4.00 | 4.00 | 0.00 | 0.00 | 0.00 |
cycles | 0.50 | 0.50 | 1.75 | 1.75 | 1.75 | 1.75 | 10.00 | 10.00 | 7.00 | 7.00 | 4.00 | 4.00 | 0.00 | 0.00 | 0.00 |
Cycles executing div or sqrt instructions | NA |
Longest recurrence chain latency (RecMII) | 4.00 |
Front-end | 4.50 |
Data deps. | 4.00 |
Overall L1 | 10.00 |
all | 85% |
load | NA (no load vectorizable/vectorized instructions) |
store | 100% |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 100% |
fma | NA (no fma vectorizable/vectorized instructions) |
other | 50% |
all | 100% |
load | NA (no load vectorizable/vectorized instructions) |
store | NA (no store vectorizable/vectorized instructions) |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 100% |
fma | 100% |
div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
other | NA (no other vectorizable/vectorized instructions) |
all | 92% |
load | NA (no load vectorizable/vectorized instructions) |
store | 100% |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 100% |
fma | 100% |
div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
other | 50% |
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UBFM X14, X11, #62, #61 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
FADD Z27.S, Z24.S, Z19.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
FADD Z24.S, Z24.S, Z18.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
ADD X15, X28, X14 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
FADD Z25.S, Z21.S, Z7.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
FADD Z26.S, Z23.S, Z17.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
FADD Z23.S, Z23.S, Z16.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
FADD Z21.S, Z21.S, Z6.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
ADD Z29.S, Z22.S, Z31.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
FADD Z28.S, Z27.S, Z18.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
FADD Z30.S, Z26.S, Z16.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
ST1W {Z24.S}, P0, [X15, X23,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
ADD X15, X27, X14 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
ST1W {Z28.S}, P0, [X26, X11,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
ST1W {Z23.S}, P0, [X15, X23,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
ADD X15, X24, X14 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
ADD X14, X19, X14 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
FADD Z23.S, Z25.S, Z6.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
ST1W {Z30.S}, P0, [X20, X11,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
ST1W {Z21.S}, P0, [X15, X23,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
ST1W {Z23.S}, P0, [X21, X11,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
MOVPRFX Z21, Z22 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
SCVTF Z21.S, P0/M, Z22.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 2 |
MOVPRFX Z23, Z29 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
SCVTF Z23.S, P0/M, Z29.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 2 |
FMAD Z21.S, P0/M, Z8.S, Z24.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
FMAD Z23.S, P0/M, Z8.S, Z28.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
FADD Z24.S, Z27.S, Z19.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
ADD Z22.S, Z20.S, Z22.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
ST1W {Z21.S}, P0, [X14, X23,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
ST1W {Z23.S}, P0, [X22, X11,LSL #2] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
ADD X11, X11, X16 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
FADD Z21.S, Z25.S, Z7.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
FADD Z23.S, Z26.S, Z17.S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.50 |
CMP X13, X11 | 1 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.33 |
B.NE 400fac <main+0x2ec> | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |