Loop Id: 5 | Module: exec | Source: Step10_orig.c:19-35 | Coverage: 99.8% |
---|
0x401c20 VMOVUPS (%RSI,%RBX,4),%YMM0 [2] |
0x401c25 VSUBPS 0x40(%RSP),%YMM0,%YMM0 [5] |
0x401c2b VMOVUPS (%RDX,%RBX,4),%YMM11 [6] |
0x401c30 VSUBPS 0x20(%RSP),%YMM11,%YMM15 [5] |
0x401c36 VMOVUPS (%RCX,%RBX,4),%YMM14 [1] |
0x401c3b VMULPS %YMM0,%YMM0,%YMM11 |
0x401c3f VFMADD231PS %YMM15,%YMM15,%YMM11 |
0x401c44 VSUBPS %YMM8,%YMM14,%YMM14 |
0x401c49 VFMADD231PS %YMM14,%YMM14,%YMM11 |
0x401c4e VADDPS %YMM10,%YMM11,%YMM3 |
0x401c53 VCVTPS2PD %XMM3,%YMM2 |
0x401c57 VSQRTPD %YMM2,%YMM1 |
0x401c5b VBROADCASTSS 0x75e4(%RIP),%YMM5 [4] |
0x401c64 VBROADCASTSS 0x75df(%RIP),%YMM7 [4] |
0x401c6d VFMADD213PS %YMM7,%YMM11,%YMM5 |
0x401c72 VBROADCASTSS 0x75d5(%RIP),%YMM7 [4] |
0x401c7b VFMADD213PS %YMM7,%YMM11,%YMM5 |
0x401c80 VMULPD %YMM2,%YMM2,%YMM2 |
0x401c84 VBROADCASTSS 0x75c7(%RIP),%YMM7 [4] |
0x401c8d VFMADD213PS %YMM7,%YMM11,%YMM5 |
0x401c92 VDIVPD %YMM2,%YMM13,%YMM2 |
0x401c96 VBROADCASTSS 0x75b9(%RIP),%YMM7 [4] |
0x401c9f VFMADD213PS %YMM7,%YMM11,%YMM5 |
0x401ca4 VBROADCASTSS 0x75af(%RIP),%YMM7 [4] |
0x401cad VFMADD213PS %YMM7,%YMM11,%YMM5 |
0x401cb2 VCVTPS2PD %XMM5,%YMM7 |
0x401cb6 VFMADD231PD %YMM2,%YMM1,%YMM7 |
0x401cbb VEXTRACTF128 $0x1,%YMM3,%XMM1 |
0x401cc1 VCVTPS2PD %XMM1,%YMM1 |
0x401cc5 VMULPD %YMM1,%YMM1,%YMM2 |
0x401cc9 VSQRTPD %YMM1,%YMM1 |
0x401ccd VDIVPD %YMM2,%YMM13,%YMM2 |
0x401cd1 VEXTRACTF128 $0x1,%YMM5,%XMM3 |
0x401cd7 VCVTPS2PD %XMM3,%YMM3 |
0x401cdb VCVTPD2PS %YMM7,%XMM5 |
0x401cdf VFMADD231PD %YMM2,%YMM1,%YMM3 |
0x401ce4 VCVTPD2PS %YMM3,%XMM1 |
0x401ce8 VINSERTF128 $0x1,%XMM1,%YMM5,%YMM1 |
0x401cee VCMPPS $0x1,%YMM9,%YMM11,%YMM2 |
0x401cf4 VMASKMOVPS (%R8,%RBX,4),%YMM2,%YMM2 [3] |
0x401cfa VMULPS %YMM1,%YMM2,%YMM1 |
0x401cfe VXORPS %XMM2,%XMM2,%XMM2 |
0x401d02 VCMPPS $0x1,%YMM11,%YMM2,%YMM2 |
0x401d08 VANDPS %YMM1,%YMM2,%YMM1 |
0x401d0c VFMADD231PS %YMM0,%YMM1,%YMM4 |
0x401d11 VFMADD231PS %YMM15,%YMM1,%YMM12 |
0x401d16 VFMADD231PS %YMM14,%YMM1,%YMM6 |
0x401d1b ADD $0x8,%RBX |
0x401d1f CMP %RDI,%RBX |
0x401d22 JB 401c20 |
/home/kcamus/qaas_runs/169-401-3406/intel/HACCmk/build/HACCmk/src/Step10_orig.c: 19 - 35 |
-------------------------------------------------------------------------------- |
19: for ( j = 0; j < count1; j++ ) |
20: { |
21: dxc = xx1[j] - xxi; |
22: dyc = yy1[j] - yyi; |
23: dzc = zz1[j] - zzi; |
24: |
25: r2 = dxc * dxc + dyc * dyc + dzc * dzc; |
26: |
27: m = ( r2 < fsrrmax2 ) ? mass1[j] : 0.0f; |
28: |
29: f = pow( r2 + mp_rsm2, -1.5 ) - ( ma0 + r2*(ma1 + r2*(ma2 + r2*(ma3 + r2*(ma4 + r2*ma5))))); |
30: |
31: f = ( r2 > 0.0f ) ? m * f : 0.0f; |
32: |
33: xi = xi + f * dxc; |
34: yi = yi + f * dyc; |
35: zi = zi + f * dzc; |
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►100.00+ | main.extracted.8 | main.c:142 | exec |
○ | __kmp_invoke_microtask | | libiomp5.so |
○ | __kmp_fork_call | | libiomp5.so |
○ | __kmpc_fork_call | | libiomp5.so |
○ | main | main.c:139 | exec |
○ | __libc_init_first | | libc.so.6 |
Metric | Value |
---|---|
CQA speedup if no scalar integer | 1.00 |
CQA speedup if FP arith vectorized | 1.00 |
CQA speedup if fully vectorized | 1.00 |
CQA speedup if no inter-iteration dependency | NA |
CQA speedup if next bottleneck killed | 2.13 - 2.50 |
Bottlenecks | P0, |
Function | Step10_orig |
Source | Step10_orig.c:19-35 |
Source loop unroll info | not unrolled or unrolled with no peel/tail loop |
Source loop unroll confidence level | max |
Unroll/vectorization loop type | NA |
Unroll factor | NA |
CQA cycles | 34.00 - 40.00 |
CQA cycles if no scalar integer | 34.00 - 40.00 |
CQA cycles if FP arith vectorized | 34.00 - 40.00 |
CQA cycles if fully vectorized | 34.00 - 40.00 |
Front-end cycles | 14.00 |
DIV/SQRT cycles | 34.00 - 40.00 |
P0 cycles | 16.00 |
P1 cycles | 16.00 |
P2 cycles | 6.00 |
P3 cycles | 6.00 |
P4 cycles | 0.00 |
P5 cycles | 11.00 |
P6 cycles | 2.00 |
P7 cycles | 0.00 |
Inter-iter dependencies cycles | 4 |
FE+BE cycles (UFS) | 34.83 - 42.29 |
Stall cycles (UFS) | 20.38 - 27.84 |
Nb insns | 50.00 |
Nb uops | 56.00 |
Nb loads | 12.00 |
Nb stores | 0.00 |
Nb stack references | 2.00 |
FLOP/cycle | 7.29 - 6.20 |
Nb FLOP add-sub | 32.00 |
Nb FLOP mul | 24.00 |
Nb FLOP fma | 88.00 |
Nb FLOP div | 8.00 |
Nb FLOP rcp | 0.00 |
Nb FLOP sqrt | 8.00 |
Nb FLOP rsqrt | 0.00 |
Bytes/cycle | 5.40 - 6.35 |
Bytes prefetched | 0.00 |
Bytes loaded | 216.00 |
Bytes stored | 0.00 |
Stride 0 | 2.00 |
Stride 1 | 4.00 |
Stride n | 0.00 |
Stride unknown | 0.00 |
Stride indirect | 0.00 |
Vectorization ratio all | 87.23 |
Vectorization ratio load | 50.00 |
Vectorization ratio store | NA |
Vectorization ratio mul | 100.00 |
Vectorization ratio add_sub | 100.00 |
Vectorization ratio fma | 100.00 |
Vectorization ratio div_sqrt | 100.00 |
Vectorization ratio other | 68.42 |
Vector-efficiency ratio all | 40.16 |
Vector-efficiency ratio load | 28.13 |
Vector-efficiency ratio store | NA |
Vector-efficiency ratio mul | 50.00 |
Vector-efficiency ratio add_sub | 50.00 |
Vector-efficiency ratio fma | 50.00 |
Vector-efficiency ratio div_sqrt | 50.00 |
Vector-efficiency ratio other | 25.66 |
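The FLOP/cycle and Bytes/cycle ranges in the table above can be reproduced from the other rows. The sketch below assumes that each FMA element operation counts as two FLOP; this is an inference that matches the reported figures, not something the report states. The function names are illustrative only.

```c
/* Total FLOP per loop iteration from the "Nb FLOP" rows.
 * Assumption: one FMA element operation counts as 2 FLOP. */
double cqa_total_flop(void)
{
    const double add_sub = 32.0, mul = 24.0, fma = 88.0;
    const double div = 8.0, sqrt_ops = 8.0;
    return add_sub + mul + 2.0 * fma + div + sqrt_ops;
}

/* Generic "amount per cycle" ratio used for FLOP/cycle and Bytes/cycle. */
double per_cycle(double amount, double cycles)
{
    return amount / cycles;
}
```

With the CQA cycle bounds of 34 and 40, `per_cycle(cqa_total_flop(), 34.0)` and `per_cycle(cqa_total_flop(), 40.0)` round to the reported 7.29 and 6.20, and `per_cycle(216.0, 40.0)` and `per_cycle(216.0, 34.0)` round to the reported 5.40 and 6.35 bytes per cycle.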
Function | Step10_orig |
Source file and lines | Step10_orig.c:19-35 |
Module | exec |
nb instructions | 50 |
nb uops | 56 |
loop length | 264 |
used x86 registers | 7 |
used mmx registers | 0 |
used xmm registers | 4 |
used ymm registers | 16 |
used zmm registers | 0 |
nb stack references | 2 |
ADD-SUB / MUL ratio | 1.00 |
micro-operation queue | 14.00 cycles |
front end | 14.00 cycles |
Port | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 |
---|---|---|---|---|---|---|---|---|
uops | 16.00 | 16.00 | 6.00 | 6.00 | 0.00 | 11.00 | 2.00 | 0.00 |
cycles | 16.00 | 16.00 | 6.00 | 6.00 | 0.00 | 11.00 | 2.00 | 0.00 |
Cycles executing div or sqrt instructions | 34.00-40.00 |
Longest recurrence chain latency (RecMII) | 4.00 |
FE+BE cycles | 34.83-42.29 |
Stall cycles | 20.38-27.84 |
RS full (events) | 0.11 |
PRF_FLOAT full (events) | 23.36-31.80 |
Front-end | 14.00 |
Dispatch | 16.00 |
DIV/SQRT | 34.00-40.00 |
Data deps. | 4.00 |
Overall L1 | 34.00-40.00 |
Vectorization ratio | Value |
---|---|
all | 87% |
load | 50% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | 100% |
div/sqrt | 100% |
other | 68% |
Vector-efficiency ratio | Value |
---|---|
all | 40% |
load | 28% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 50% |
add-sub | 50% |
fma | 50% |
div/sqrt | 50% |
other | 25% |
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|
VMOVUPS (%RSI,%RBX,4),%YMM0 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5-6 | 0.50 |
VSUBPS 0x40(%RSP),%YMM0,%YMM0 | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMOVUPS (%RDX,%RBX,4),%YMM11 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5-6 | 0.50 |
VSUBPS 0x20(%RSP),%YMM11,%YMM15 | 1 | 0.50 | 0.50 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMOVUPS (%RCX,%RBX,4),%YMM14 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5-6 | 0.50 |
VMULPS %YMM0,%YMM0,%YMM11 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231PS %YMM15,%YMM15,%YMM11 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VSUBPS %YMM8,%YMM14,%YMM14 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231PS %YMM14,%YMM14,%YMM11 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VADDPS %YMM10,%YMM11,%YMM3 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTPS2PD %XMM3,%YMM2 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 7 | 1 |
VSQRTPD %YMM2,%YMM1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13-19 | 9-12 |
VBROADCASTSS 0x75e4(%RIP),%YMM5 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5 | 0.50 |
VBROADCASTSS 0x75df(%RIP),%YMM7 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5 | 0.50 |
VFMADD213PS %YMM7,%YMM11,%YMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VBROADCASTSS 0x75d5(%RIP),%YMM7 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5 | 0.50 |
VFMADD213PS %YMM7,%YMM11,%YMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMULPD %YMM2,%YMM2,%YMM2 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VBROADCASTSS 0x75c7(%RIP),%YMM7 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5 | 0.50 |
VFMADD213PS %YMM7,%YMM11,%YMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VDIVPD %YMM2,%YMM13,%YMM2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13-14 | 8 |
VBROADCASTSS 0x75b9(%RIP),%YMM7 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5 | 0.50 |
VFMADD213PS %YMM7,%YMM11,%YMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VBROADCASTSS 0x75af(%RIP),%YMM7 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5 | 0.50 |
VFMADD213PS %YMM7,%YMM11,%YMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTPS2PD %XMM5,%YMM7 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 7 | 1 |
VFMADD231PD %YMM2,%YMM1,%YMM7 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VEXTRACTF128 $0x1,%YMM3,%XMM1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 1 |
VCVTPS2PD %XMM1,%YMM1 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 7 | 1 |
VMULPD %YMM1,%YMM1,%YMM2 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VSQRTPD %YMM1,%YMM1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13-19 | 9-12 |
VDIVPD %YMM2,%YMM13,%YMM2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13-14 | 8 |
VEXTRACTF128 $0x1,%YMM5,%XMM3 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 1 |
VCVTPS2PD %XMM3,%YMM3 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 7 | 1 |
VCVTPD2PS %YMM7,%XMM5 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 7 | 1 |
VFMADD231PD %YMM2,%YMM1,%YMM3 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTPD2PS %YMM3,%XMM1 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 7 | 1 |
VINSERTF128 $0x1,%XMM1,%YMM5,%YMM1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 1 |
VCMPPS $0x1,%YMM9,%YMM11,%YMM2 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMASKMOVPS (%R8,%RBX,4),%YMM2,%YMM2 | 2 | 0.33 | 0.33 | 0.50 | 0.50 | 0 | 0.33 | 0 | 0 | 3 | 0.50 |
VMULPS %YMM1,%YMM2,%YMM1 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VXORPS %XMM2,%XMM2,%XMM2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VCMPPS $0x1,%YMM11,%YMM2,%YMM2 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VANDPS %YMM1,%YMM2,%YMM1 | 1 | 0.33 | 0.33 | 0 | 0 | 0 | 0.33 | 0 | 0 | 1 | 0.33 |
VFMADD231PS %YMM0,%YMM1,%YMM4 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231PS %YMM15,%YMM1,%YMM12 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231PS %YMM14,%YMM1,%YMM6 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
ADD $0x8,%RBX | 1 | 0.25 | 0.25 | 0 | 0 | 0 | 0.25 | 0.25 | 0 | 1 | 0.25 |
CMP %RDI,%RBX | 1 | 0.25 | 0.25 | 0 | 0 | 0 | 0.25 | 0.25 | 0 | 1 | 0.25 |
JB 401c20 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0.50-1 |
Metric | run_0 |
---|---|
Coverage (% app. time) | 99.8 |
Time (s) | 40.36 |
Instance Count | 2190000 |
Iteration Count - min | 50 |
Iteration Count - avg | 961 |
Iteration Count - max | 1872 |
Cycles per Iteration - min | 39.67 |
Cycles per Iteration - avg | 40.65 |
Cycles per Iteration - max | 4264.08 |
Metric | Value |
---|---|
Bucket Coverage (% loop time) | 99.93 |
Instance Count | 2190000 |
ORIG CPI:min | 56.12 |
ORIG CPI:med | 59.16 |
ORIG CPI:max | 61.28 |
DL1 CPI:min | 57.68 |
DL1 CPI:med | 61.44 |
DL1 CPI:max | 70.12 |
ORIG (min) / DL1 (min) | 0.97 |
ORIG (med) / DL1 (med) | 0.96 |
ORIG (max) / DL1 (max) | 0.87 |
Nb Iteration:min | 50 |
Nb Iteration:med | 50.00 |
Nb Iteration:max | 50 |
ORIG: min (cycles) | 2806 |
ORIG: med (cycles) | 2958.00 |
ORIG: max (cycles) | 3064 |
DL1:min (cycles) | 2884 |
DL1:med (cycles) | 3072.00 |
DL1:max (cycles) | 3506 |
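The CPI and ratio rows above follow directly from the cycle counts and the iteration count of 50. A minimal sketch of that arithmetic, with hypothetical helper names:

```c
/* Cycles per iteration for one measured instance. */
double cpi(double cycles, double iterations)
{
    return cycles / iterations;
}

/* ORIG / DL1 ratio as reported in the table. A value below 1 means
 * the DL1 variant took more cycles per iteration than ORIG. */
double orig_over_dl1(double orig_cpi, double dl1_cpi)
{
    return orig_cpi / dl1_cpi;
}
```

For example, `cpi(2806.0, 50.0)` gives the ORIG minimum of 56.12 and `cpi(2884.0, 50.0)` gives the DL1 minimum of 57.68; their ratio rounds to the reported 0.97. Ratios this close to 1 suggest that memory access is not what limits the loop, which agrees with the DIV/SQRT bottleneck reported by CQA.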
Metric (average per iteration except for Time and Iteration Count) | ORIG Min (Thread) | ORIG Med (Thread) | ORIG Avg (Thread) | ORIG Max (Thread) | ORIG Min (Instances) | ORIG Med (Instances) | ORIG Max (Instances) | DL1 Min (Thread) | DL1 Med (Thread) | DL1 Avg (Thread) | DL1 Max (Thread) | DL1 Min (Instances) | DL1 Med (Instances) | DL1 Max (Instances) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Time | 2958.00 | 2958.00 | 2958.00 | 2958.00 | 2806.00 | 2958.00 | 3064.00 | 3072.00 | 3072.00 | 3072.00 | 3072.00 | 2884.00 | 3072.00 | 3506.00 |
CPI MED | 59.16 | 59.16 | 59.16 | 59.16 | 56.12 | 59.16 | 61.28 | 61.44 | 61.44 | 61.44 | 61.44 | 57.68 | 61.44 | 70.12 |
Iteration Count | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 |
Metric | ORIG | DL1 |
---|---|---|
CPI MIN | 56.12 | 57.68 |
CPI AVG | 59.17 | 61.31 |
CPI MAX | 61.28 | 70.12 |
ORIG | DL1 | Original Code |
---|---|---|
0x6bdf2e ADDQ $0x1,-0x21f6(%RIP) 0x6bdf36 VMOVUPS (%RSI,%RBX,4),%YMM0 | 0x6be37d VMOVUPS -0x2d85(%RIP),%YMM0 | 0x401c20 VMOVUPS (%RSI,%RBX,4),%YMM0 |
0x6bdf3b VSUBPS 0x40(%RSP),%YMM0,%YMM0 | 0x6be385 VSUBPS -0x318d(%RIP),%YMM0,%YMM0 | 0x401c25 VSUBPS 0x40(%RSP),%YMM0,%YMM0 |
0x6bdf41 VMOVUPS (%RDX,%RBX,4),%YMM11 | 0x6be38d VMOVUPS -0x2d95(%RIP),%YMM11 | 0x401c2b VMOVUPS (%RDX,%RBX,4),%YMM11 |
0x6bdf46 VSUBPS 0x20(%RSP),%YMM11,%YMM15 | 0x6be395 VSUBPS -0x315d(%RIP),%YMM11,%YMM15 | 0x401c30 VSUBPS 0x20(%RSP),%YMM11,%YMM15 |
0x6bdf4c VMOVUPS (%RCX,%RBX,4),%YMM14 | 0x6be39d VMOVUPS -0x2da5(%RIP),%YMM14 | 0x401c36 VMOVUPS (%RCX,%RBX,4),%YMM14 |
0x6bdf51 VMULPS %YMM0,%YMM0,%YMM11 | 0x6be3a5 VMULPS %YMM0,%YMM0,%YMM11 | 0x401c3b VMULPS %YMM0,%YMM0,%YMM11 |
0x6bdf55 VFMADD231PS %YMM15,%YMM15,%YMM11 | 0x6be3a9 VFMADD231PS %YMM15,%YMM15,%YMM11 | 0x401c3f VFMADD231PS %YMM15,%YMM15,%YMM11 |
0x6bdf5a VSUBPS %YMM8,%YMM14,%YMM14 | 0x6be3ae VSUBPS %YMM8,%YMM14,%YMM14 | 0x401c44 VSUBPS %YMM8,%YMM14,%YMM14 |
0x6bdf5f VFMADD231PS %YMM14,%YMM14,%YMM11 | 0x6be3b3 VFMADD231PS %YMM14,%YMM14,%YMM11 | 0x401c49 VFMADD231PS %YMM14,%YMM14,%YMM11 |
0x6bdf64 VADDPS %YMM10,%YMM11,%YMM3 | 0x6be3b8 VADDPS %YMM10,%YMM11,%YMM3 | 0x401c4e VADDPS %YMM10,%YMM11,%YMM3 |
0x6bdf69 VCVTPS2PD %XMM3,%YMM2 | 0x6be3bd VCVTPS2PD %XMM3,%YMM2 | 0x401c53 VCVTPS2PD %XMM3,%YMM2 |
0x6bdf6d VSQRTPD %YMM2,%YMM1 | 0x6be3c1 VSQRTPD -0x2f89(%RIP),%YMM1 | 0x401c57 VSQRTPD %YMM2,%YMM1 |
0x6bdf71 VBROADCASTSS -0x2b4d32(%RIP),%YMM5 | 0x6be3c9 VBROADCASTSS -0x2b518a(%RIP),%YMM5 | 0x401c5b VBROADCASTSS 0x75e4(%RIP),%YMM5 |
0x6bdf7a VBROADCASTSS -0x2b4d37(%RIP),%YMM7 | 0x6be3d2 VBROADCASTSS -0x2b518f(%RIP),%YMM7 | 0x401c64 VBROADCASTSS 0x75df(%RIP),%YMM7 |
0x6bdf83 VFMADD213PS %YMM7,%YMM11,%YMM5 | 0x6be3db VFMADD213PS %YMM7,%YMM11,%YMM5 | 0x401c6d VFMADD213PS %YMM7,%YMM11,%YMM5 |
0x6bdf88 VBROADCASTSS -0x2b4d41(%RIP),%YMM7 | 0x6be3e0 VBROADCASTSS -0x2b5199(%RIP),%YMM7 | 0x401c72 VBROADCASTSS 0x75d5(%RIP),%YMM7 |
0x6bdf91 VFMADD213PS %YMM7,%YMM11,%YMM5 | 0x6be3e9 VFMADD213PS %YMM7,%YMM11,%YMM5 | 0x401c7b VFMADD213PS %YMM7,%YMM11,%YMM5 |
0x6bdf96 VMULPD %YMM2,%YMM2,%YMM2 | 0x6be3ee VMULPD %YMM2,%YMM2,%YMM2 | 0x401c80 VMULPD %YMM2,%YMM2,%YMM2 |
0x6bdf9a VBROADCASTSS -0x2b4d4f(%RIP),%YMM7 | 0x6be3f2 VBROADCASTSS -0x2b51a7(%RIP),%YMM7 | 0x401c84 VBROADCASTSS 0x75c7(%RIP),%YMM7 |
0x6bdfa3 VFMADD213PS %YMM7,%YMM11,%YMM5 | 0x6be3fb VFMADD213PS %YMM7,%YMM11,%YMM5 | 0x401c8d VFMADD213PS %YMM7,%YMM11,%YMM5 |
0x6bdfa8 VDIVPD %YMM2,%YMM13,%YMM2 | 0x6be400 VMOVUPD -0x2f48(%RIP),%YMM13 0x6be408 VDIVPD -0x2f90(%RIP),%YMM13,%YMM2 | 0x401c92 VDIVPD %YMM2,%YMM13,%YMM2 |
0x6bdfac VBROADCASTSS -0x2b4d5d(%RIP),%YMM7 | 0x6be410 VBROADCASTSS -0x2b51c1(%RIP),%YMM7 | 0x401c96 VBROADCASTSS 0x75b9(%RIP),%YMM7 |
0x6bdfb5 VFMADD213PS %YMM7,%YMM11,%YMM5 | 0x6be419 VFMADD213PS %YMM7,%YMM11,%YMM5 | 0x401c9f VFMADD213PS %YMM7,%YMM11,%YMM5 |
0x6bdfba VBROADCASTSS -0x2b4d67(%RIP),%YMM7 | 0x6be41e VBROADCASTSS -0x2b51cb(%RIP),%YMM7 | 0x401ca4 VBROADCASTSS 0x75af(%RIP),%YMM7 |
0x6bdfc3 VFMADD213PS %YMM7,%YMM11,%YMM5 | 0x6be427 VFMADD213PS %YMM7,%YMM11,%YMM5 | 0x401cad VFMADD213PS %YMM7,%YMM11,%YMM5 |
0x6bdfc8 VCVTPS2PD %XMM5,%YMM7 | 0x6be42c VCVTPS2PD %XMM5,%YMM7 | 0x401cb2 VCVTPS2PD %XMM5,%YMM7 |
0x6bdfcc VFMADD231PD %YMM2,%YMM1,%YMM7 | 0x6be430 VFMADD231PD %YMM2,%YMM1,%YMM7 | 0x401cb6 VFMADD231PD %YMM2,%YMM1,%YMM7 |
0x6bdfd1 VEXTRACTF128 $0x1,%YMM3,%XMM1 | 0x6be435 VEXTRACTF128 $0x1,%YMM3,%XMM1 | 0x401cbb VEXTRACTF128 $0x1,%YMM3,%XMM1 |
0x6bdfd7 VCVTPS2PD %XMM1,%YMM1 | 0x6be43b VCVTPS2PD %XMM1,%YMM1 | 0x401cc1 VCVTPS2PD %XMM1,%YMM1 |
0x6bdfdb VMULPD %YMM1,%YMM1,%YMM2 | 0x6be43f VMULPD %YMM1,%YMM1,%YMM2 | 0x401cc5 VMULPD %YMM1,%YMM1,%YMM2 |
0x6bdfdf VSQRTPD %YMM1,%YMM1 | 0x6be443 VSQRTPD -0x2f0b(%RIP),%YMM1 | 0x401cc9 VSQRTPD %YMM1,%YMM1 |
0x6bdfe3 VDIVPD %YMM2,%YMM13,%YMM2 | 0x6be44b VMOVUPD -0x2e93(%RIP),%YMM13 0x6be453 VDIVPD -0x2edb(%RIP),%YMM13,%YMM2 | 0x401ccd VDIVPD %YMM2,%YMM13,%YMM2 |
0x6bdfe7 VEXTRACTF128 $0x1,%YMM5,%XMM3 | 0x6be45b VEXTRACTF128 $0x1,%YMM5,%XMM3 | 0x401cd1 VEXTRACTF128 $0x1,%YMM5,%XMM3 |
0x6bdfed VCVTPS2PD %XMM3,%YMM3 | 0x6be461 VCVTPS2PD %XMM3,%YMM3 | 0x401cd7 VCVTPS2PD %XMM3,%YMM3 |
0x6bdff1 VCVTPD2PS %YMM7,%XMM5 | 0x6be465 VCVTPD2PS %YMM7,%XMM5 | 0x401cdb VCVTPD2PS %YMM7,%XMM5 |
0x6bdff5 VFMADD231PD %YMM2,%YMM1,%YMM3 | 0x6be469 VFMADD231PD %YMM2,%YMM1,%YMM3 | 0x401cdf VFMADD231PD %YMM2,%YMM1,%YMM3 |
0x6bdffa VCVTPD2PS %YMM3,%XMM1 | 0x6be46e VCVTPD2PS %YMM3,%XMM1 | 0x401ce4 VCVTPD2PS %YMM3,%XMM1 |
0x6bdffe VINSERTF128 $0x1,%XMM1,%YMM5,%YMM1 | 0x6be472 VINSERTF128 $0x1,%XMM1,%YMM5,%YMM1 | 0x401ce8 VINSERTF128 $0x1,%XMM1,%YMM5,%YMM1 |
0x6be004 VCMPPS $0x1,%YMM9,%YMM11,%YMM2 | 0x6be478 VCMPPS $0x1,%YMM9,%YMM11,%YMM2 | 0x401cee VCMPPS $0x1,%YMM9,%YMM11,%YMM2 |
0x6be00a VMASKMOVPS (%R8,%RBX,4),%YMM2,%YMM2 | 0x6be47e VMASKMOVPS -0x2e87(%RIP),%YMM2,%YMM2 | 0x401cf4 VMASKMOVPS (%R8,%RBX,4),%YMM2,%YMM2 |
0x6be010 VMULPS %YMM1,%YMM2,%YMM1 | 0x6be487 VMULPS %YMM1,%YMM2,%YMM1 | 0x401cfa VMULPS %YMM1,%YMM2,%YMM1 |
0x6be014 VXORPS %XMM2,%XMM2,%XMM2 | 0x6be48b VXORPS %XMM2,%XMM2,%XMM2 | 0x401cfe VXORPS %XMM2,%XMM2,%XMM2 |
0x6be018 VCMPPS $0x1,%YMM11,%YMM2,%YMM2 | 0x6be48f VCMPPS $0x1,%YMM11,%YMM2,%YMM2 | 0x401d02 VCMPPS $0x1,%YMM11,%YMM2,%YMM2 |
0x6be01e VANDPS %YMM1,%YMM2,%YMM1 | 0x6be495 VANDPS %YMM1,%YMM2,%YMM1 | 0x401d08 VANDPS %YMM1,%YMM2,%YMM1 |
0x6be022 VFMADD231PS %YMM0,%YMM1,%YMM4 | 0x6be499 VFMADD231PS %YMM0,%YMM1,%YMM4 | 0x401d0c VFMADD231PS %YMM0,%YMM1,%YMM4 |
0x6be027 VFMADD231PS %YMM15,%YMM1,%YMM12 | 0x6be49e VFMADD231PS %YMM15,%YMM1,%YMM12 | 0x401d11 VFMADD231PS %YMM15,%YMM1,%YMM12 |
0x6be02c VFMADD231PS %YMM14,%YMM1,%YMM6 | 0x6be4a3 VFMADD231PS %YMM14,%YMM1,%YMM6 | 0x401d16 VFMADD231PS %YMM14,%YMM1,%YMM6 |
0x6be031 ADD $0x8,%RBX | 0x6be4a8 ADD $0x8,%RBX | 0x401d1b ADD $0x8,%RBX |
0x6be035 CMP %RDI,%RBX | 0x6be4ac CMP %RDI,%RBX | 0x401d1f CMP %RDI,%RBX |
0x6be038 JB 6bdf2e | 0x6be4af JB 6be37d | 0x401d22 JB 401c20 |
Metric | ORIG | DL1 | Original |
---|---|---|---|
FP operations per cycle L1 | 6.20 - 7.29 | 6.20 - 7.29 | 6.20 - 7.29 |
cycles L1 CQA | 40.00 | 40.00 | 40.00 |
cycles UFS | 42.54 | 41.79 | 42.29 |
bytes loaded | 224.00 | 408.00 | 216.00 |
bytes stored | 8.00 | 0.00 | 0.00 |
nb loads | 13.00 | 18.00 | 12.00 |
nb stores | 1.00 | 0.00 | 0.00 |
cycles dispatch | 16.00 | 16.00 | 16.00 |
cycles front end | 14.50 | 14.50 | 14.00 |
cycles P0 | 16.00 | 16.00 | 16.00 |
cycles P1 | 16.00 | 16.00 | 16.00 |
cycles P2 | 6.50 | 9.00 | 6.00 |
cycles P3 | 6.50 | 9.00 | 6.00 |
cycles P4 | 1.00 | 0.00 | 0.00 |
cycles P5 | 11.00 | 11.00 | 11.00 |
cycles P6 | 3.00 | 2.00 | 2.00 |
cycles P7 | 1.00 | 0.00 | 0.00 |
stall cycles | 27.57 | 26.84 | 27.84 |
LB full | 0.00 | 0.00 | 0.00 |
LM full | 0.00 | 0.00 | 0.00 |
PRF full | 0.00 | 0.00 | 0.00 |
PRF_FLOAT full | 31.78 | 30.81 | 31.80 |
PRF_INT full | 0.00 | 0.00 | 0.00 |
ROB full | 0.00 | 0.00 | 0.00 |
RS full | 0.14 | 0.07 | 0.11 |
SB full | 0.00 | 0.00 | 0.00 |
nb uops | 58.00 | 58.00 | 56.00 |
uops P0 | 16.00 | 16.00 | 16.00 |
uops P1 | 16.00 | 16.00 | 16.00 |
uops P2 | 6.50 | 9.00 | 6.00 |
uops P3 | 6.50 | 9.00 | 6.00 |
uops P4 | 1.00 | 0.00 | 0.00 |
uops P5 | 11.00 | 11.00 | 11.00 |
uops P6 | 3.00 | 2.00 | 2.00 |
uops P7 | 1.00 | 0.00 | 0.00 |
ID | 11 | 13 | 5 |