Loop Id: 3 | Module: exec | Source: Step10_orig.c:19-35 | Coverage: 0.11% |
---|
0x401f20 VMOVSS (%RDX,%RDI,4),%XMM31 [2] |
0x401f27 VMOVSS (%RSI,%RDI,4),%XMM18 [6] |
0x401f2e VXORPS %XMM24,%XMM24,%XMM24 |
0x401f34 VMOVSS (%RCX,%RDI,4),%XMM26 [4] |
0x401f3b VSUBSS %XMM1,%XMM31,%XMM28 |
0x401f41 VSUBSS %XMM7,%XMM18,%XMM29 |
0x401f47 VSUBSS %XMM2,%XMM26,%XMM25 |
0x401f4d VMULSS %XMM28,%XMM28,%XMM0 |
0x401f53 VFMADD231SS %XMM29,%XMM29,%XMM0 |
0x401f59 VFMADD231SS %XMM25,%XMM25,%XMM0 |
0x401f5f VCOMISS %XMM0,%XMM3 |
0x401f63 JBE 401f6c |
0x401f65 VMOVSS (%R8,%RDI,4),%XMM24 [8] |
0x401f6c VCOMISS %XMM6,%XMM0 |
0x401f70 JBE 401fd2 |
0x401f72 VMOVAPS %XMM0,%XMM23 |
0x401f78 VADDSS %XMM0,%XMM4,%XMM5 |
0x401f7c VFMADD132SS %XMM8,%XMM9,%XMM23 |
0x401f82 VCVTSS2SD %XMM5,%XMM5,%XMM5 |
0x401f86 VFMADD132SS %XMM0,%XMM10,%XMM23 |
0x401f8c VFMADD132SS %XMM0,%XMM11,%XMM23 |
0x401f92 VFMADD132SS %XMM0,%XMM12,%XMM23 |
0x401f98 VFMADD132SS %XMM0,%XMM13,%XMM23 |
0x401f9e VSQRTSD %XMM5,%XMM5,%XMM0 |
0x401fa2 VMULSD %XMM0,%XMM5,%XMM5 |
0x401fa6 VCVTSS2SD %XMM23,%XMM23,%XMM20 |
0x401fac VDIVSD %XMM5,%XMM14,%XMM0 |
0x401fb0 VADDSD %XMM20,%XMM0,%XMM5 |
0x401fb6 VCVTSD2SS %XMM5,%XMM5,%XMM0 |
0x401fba VMULSS %XMM24,%XMM0,%XMM5 |
0x401fc0 VFMADD231SS %XMM5,%XMM29,%XMM15 |
0x401fc6 VFMADD231SS %XMM5,%XMM28,%XMM16 |
0x401fcc VFMADD231SS %XMM5,%XMM25,%XMM17 |
0x401fd2 INC %RDI |
0x401fd5 VMOVSS (%RDX,%RDI,4),%XMM23 [1] |
0x401fdc VMOVSS (%RSI,%RDI,4),%XMM25 [5] |
0x401fe3 VXORPS %XMM19,%XMM19,%XMM19 |
0x401fe9 VMOVSS (%RCX,%RDI,4),%XMM22 [3] |
0x401ff0 VSUBSS %XMM1,%XMM23,%XMM20 |
0x401ff6 VSUBSS %XMM7,%XMM25,%XMM24 |
0x401ffc VSUBSS %XMM2,%XMM22,%XMM30 |
0x402002 VMULSS %XMM20,%XMM20,%XMM0 |
0x402008 VFMADD231SS %XMM24,%XMM24,%XMM0 |
0x40200e VFMADD231SS %XMM30,%XMM30,%XMM0 |
0x402014 VCOMISS %XMM0,%XMM3 |
0x402018 JBE 402021 |
0x40201a VMOVSS (%R8,%RDI,4),%XMM19 [7] |
0x402021 VCOMISS %XMM6,%XMM0 |
0x402025 JBE 402087 |
0x402027 VADDSS %XMM0,%XMM4,%XMM5 |
0x40202b VMOVAPS %XMM0,%XMM21 |
0x402031 VFMADD132SS %XMM8,%XMM9,%XMM21 |
0x402037 VCVTSS2SD %XMM5,%XMM5,%XMM5 |
0x40203b VSQRTSD %XMM5,%XMM5,%XMM27 |
0x402041 VMULSD %XMM27,%XMM5,%XMM5 |
0x402047 VFMADD132SS %XMM0,%XMM10,%XMM21 |
0x40204d VDIVSD %XMM5,%XMM14,%XMM5 |
0x402051 VFMADD132SS %XMM0,%XMM11,%XMM21 |
0x402057 VFMADD132SS %XMM0,%XMM12,%XMM21 |
0x40205d VFMADD132SS %XMM21,%XMM13,%XMM0 |
0x402063 VCVTSS2SD %XMM0,%XMM0,%XMM0 |
0x402067 VADDSD %XMM0,%XMM5,%XMM5 |
0x40206b VCVTSD2SS %XMM5,%XMM5,%XMM0 |
0x40206f VMULSS %XMM19,%XMM0,%XMM5 |
0x402075 VFMADD231SS %XMM5,%XMM24,%XMM15 |
0x40207b VFMADD231SS %XMM5,%XMM20,%XMM16 |
0x402081 VFMADD231SS %XMM5,%XMM30,%XMM17 |
0x402087 INC %RDI |
0x40208a CMP %EDI,%EAX |
0x40208c JG 401f20 |
/home/kcamus/qaas_runs/169-401-3406/intel/HACCmk/build/HACCmk/src/Step10_orig.c: 19 - 35 |
-------------------------------------------------------------------------------- |
19: for ( j = 0; j < count1; j++ ) |
20: { |
21: dxc = xx1[j] - xxi; |
22: dyc = yy1[j] - yyi; |
23: dzc = zz1[j] - zzi; |
24: |
25: r2 = dxc * dxc + dyc * dyc + dzc * dzc; |
26: |
27: m = ( r2 < fsrrmax2 ) ? mass1[j] : 0.0f; |
28: |
29: f = pow( r2 + mp_rsm2, -1.5 ) - ( ma0 + r2*(ma1 + r2*(ma2 + r2*(ma3 + r2*(ma4 + r2*ma5))))); |
30: |
31: f = ( r2 > 0.0f ) ? m * f : 0.0f; |
32: |
33: xi = xi + f * dxc; |
34: yi = yi + f * dyc; |
35: zi = zi + f * dzc; |
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
88.89 | main._omp_fn.1 | main.c:144 | exec |
 | GOMP_parallel | libgomp.h:985 | libgomp.so.1.0.0 |
11.11 | main._omp_fn.1 | main.c:144 | exec |
 | GOMP_parallel | libgomp.h:985 | libgomp.so.1.0.0 |
Path / |
Metric | Value |
---|---|
CQA speedup if no scalar integer | 1.00 |
CQA speedup if FP arith vectorized | 2.50 |
CQA speedup if fully vectorized | 2.94 - 2.50 |
CQA speedup if no inter-iteration dependency | NA |
CQA speedup if next bottleneck killed | NA |
Bottlenecks | P0, P1 |
Function | Step10_orig |
Source | Step10_orig.c:19-35 |
Source loop unroll info | not unrolled or unrolled with no peel/tail loop |
Source loop unroll confidence level | max |
Unroll/vectorization loop type | NA |
Unroll factor | NA |
CQA cycles | 25.00 |
CQA cycles if no scalar integer | 25.00 |
CQA cycles if FP arith vectorized | 8.50 - 10.00 |
CQA cycles if fully vectorized | 8.50 - 10.00 |
Front-end cycles | 18.75 |
P0 cycles | 25.00 |
P1 cycles | 25.00 |
P2 cycles | 4.00 |
P3 cycles | 4.00 |
P4 cycles | 0.00 |
P5 cycles | 6.50 |
P6 cycles | 6.50 |
P7 cycles | 0.00 |
DIV/SQRT cycles | 17.00 - 20.00 |
Inter-iter dependencies cycles | NA |
FE+BE cycles (UFS) | 32.13 - 33.75 |
Stall cycles (UFS) | 12.86 - 14.47 |
Nb insns | 70.00 |
Nb uops | 75.00 |
Nb loads | 8.00 |
Nb stores | 0.00 |
Nb stack references | 0.00 |
FLOP/cycle | 2.40 |
Nb FLOP add-sub | 10.00 |
Nb FLOP mul | 6.00 |
Nb FLOP fma | 20.00 |
Nb FLOP div | 2.00 |
Nb FLOP rcp | 0.00 |
Nb FLOP sqrt | 2.00 |
Nb FLOP rsqrt | 0.00 |
Bytes/cycle | 1.28 |
Bytes prefetched | 0.00 |
Bytes loaded | 32.00 |
Bytes stored | 0.00 |
Stride 0 | NA |
Stride 1 | NA |
Stride n | NA |
Stride unknown | NA |
Stride indirect | NA |
Vectorization ratio all | 6.45 |
Vectorization ratio load | 0.00 |
Vectorization ratio store | NA |
Vectorization ratio mul | 0.00 |
Vectorization ratio add_sub | 0.00 |
Vectorization ratio fma | 0.00 |
Vectorization ratio div_sqrt | 0.00 |
Vectorization ratio other | 28.57 |
Vector-efficiency ratio all | 8.47 |
Vector-efficiency ratio load | 6.25 |
Vector-efficiency ratio store | NA |
Vector-efficiency ratio mul | 8.33 |
Vector-efficiency ratio add_sub | 7.50 |
Vector-efficiency ratio fma | 6.25 |
Vector-efficiency ratio div_sqrt | 12.50 |
Vector-efficiency ratio other | 12.50 |
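Several derived rows in the table above appear to follow from simple ratios of the other rows. The sketch below reproduces them under two assumptions that are a reading of the numbers, not a documented MAQAO formula: each FMA counts as two floating-point operations, and speedups are the ratio of current to projected cycles. The function names are hypothetical.

```c
/* Speedup as current cycles over projected cycles (assumed definition). */
double cqa_speedup(double cycles, double projected_cycles)
{
    return cycles / projected_cycles;
}

/* FLOP per cycle, counting each FMA as two operations (assumption). */
double flop_per_cycle(double n_add_sub, double n_mul, double n_fma,
                      double n_div, double n_sqrt, double cycles)
{
    return (n_add_sub + n_mul + 2.0 * n_fma + n_div + n_sqrt) / cycles;
}

/* Bytes moved per cycle from loaded and stored byte counts. */
double bytes_per_cycle(double loaded, double stored, double cycles)
{
    return (loaded + stored) / cycles;
}
```

With the table's values, `flop_per_cycle(10, 6, 20, 2, 2, 25)` gives 60 / 25 = 2.40, `bytes_per_cycle(32, 0, 25)` gives 1.28, and `cqa_speedup(25, 8.5)` and `cqa_speedup(25, 10)` give roughly 2.94 and 2.50, matching the reported range.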
Path / |
Function | Step10_orig |
Source file and lines | Step10_orig.c:19-35 |
Module | exec |
nb instructions | 70 |
nb uops | 75 |
loop length | 370 |
used x86 registers | 6 |
used mmx registers | 0 |
used xmm registers | 32 |
used ymm registers | 0 |
used zmm registers | 0 |
nb stack references | 0 |
ADD-SUB / MUL ratio | 1.67 |
micro-operation queue | 18.75 cycles |
front end | 18.75 cycles |
Port | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 |
---|---|---|---|---|---|---|---|---|
uops | 25.00 | 25.00 | 4.00 | 4.00 | 0.00 | 6.50 | 6.50 | 0.00 |
cycles | 25.00 | 25.00 | 4.00 | 4.00 | 0.00 | 6.50 | 6.50 | 0.00 |
Cycles executing div or sqrt instructions | 17.00-20.00 |
FE+BE cycles | 32.13-33.75 |
Stall cycles | 12.86-14.47 |
RS full (events) | 0.57-0.31 |
PRF_FLOAT full (events) | 17.82-20.08 |
Front-end | 18.75 |
Dispatch | 25.00 |
DIV/SQRT | 17.00-20.00 |
Overall L1 | 25.00 |
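Vectorization ratio all | 6% |
Vectorization ratio load | 0% |
Vectorization ratio store | NA (no store vectorizable/vectorized instructions) |
Vectorization ratio mul | 0% |
Vectorization ratio add-sub | 0% |
Vectorization ratio fma | 0% |
Vectorization ratio div/sqrt | 0% |
Vectorization ratio other | 28% |
Vector-efficiency ratio all | 8% |
Vector-efficiency ratio load | 6% |
Vector-efficiency ratio store | NA (no store vectorizable/vectorized instructions) |
Vector-efficiency ratio mul | 8% |
Vector-efficiency ratio add-sub | 7% |
Vector-efficiency ratio fma | 6% |
Vector-efficiency ratio div/sqrt | 12% |
Vector-efficiency ratio other | 12% |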
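The vector-efficiency percentages are consistent with the fraction of a 64-byte (512-bit) register that each scalar operation fills: 4 bytes for a float (6.25%) and 8 bytes for a double (12.5%), averaged over the instructions in each class. This is an interpretation of the reported numbers, not a documented definition, and the function name below is hypothetical.

```c
/* Average register fill, in percent, for a mix of scalar float and
   scalar double instructions. Assumes a 64-byte vector register.
   The caller must pass at least one instruction in total. */
double vector_efficiency_pct(int n_float_ops, int n_double_ops)
{
    const double reg_bytes = 64.0;
    double used_bytes = 4.0 * n_float_ops + 8.0 * n_double_ops;
    return 100.0 * used_bytes / (reg_bytes * (n_float_ops + n_double_ops));
}
```

Counting instructions in the disassembly, the multiplies are 4 `VMULSS` and 2 `VMULSD`, which gives 8.33%. The add-sub class is 8 float and 2 double instructions, which gives 7.50%. The 8 `VMOVSS` loads give 6.25%, and the 4 double-precision `VSQRTSD`/`VDIVSD` instructions give 12.50%.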
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|
VMOVSS (%RDX,%RDI,4),%XMM31 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4-5 | 0.50 |
VMOVSS (%RSI,%RDI,4),%XMM18 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4-5 | 0.50 |
VXORPS %XMM24,%XMM24,%XMM24 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VMOVSS (%RCX,%RDI,4),%XMM26 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4-5 | 0.50 |
VSUBSS %XMM1,%XMM31,%XMM28 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VSUBSS %XMM7,%XMM18,%XMM29 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VSUBSS %XMM2,%XMM26,%XMM25 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMULSS %XMM28,%XMM28,%XMM0 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM29,%XMM29,%XMM0 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM25,%XMM25,%XMM0 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCOMISS %XMM0,%XMM3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 |
JBE 401f6c | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0.50-1 |
VMOVSS (%R8,%RDI,4),%XMM24 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4-5 | 0.50 |
VCOMISS %XMM6,%XMM0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 |
JBE 401fd2 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0.50-1 |
VMOVAPS %XMM0,%XMM23 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VADDSS %XMM0,%XMM4,%XMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132SS %XMM8,%XMM9,%XMM23 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTSS2SD %XMM5,%XMM5,%XMM5 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 5 | 1 |
VFMADD132SS %XMM0,%XMM10,%XMM23 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132SS %XMM0,%XMM11,%XMM23 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132SS %XMM0,%XMM12,%XMM23 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132SS %XMM0,%XMM13,%XMM23 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VSQRTSD %XMM5,%XMM5,%XMM0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13-19 | 4.50-6 |
VMULSD %XMM0,%XMM5,%XMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTSS2SD %XMM23,%XMM23,%XMM20 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 5 | 1 |
VDIVSD %XMM5,%XMM14,%XMM0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13-14 | 4 |
VADDSD %XMM20,%XMM0,%XMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTSD2SS %XMM5,%XMM5,%XMM0 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 5 | 1 |
VMULSS %XMM24,%XMM0,%XMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM5,%XMM29,%XMM15 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM5,%XMM28,%XMM16 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM5,%XMM25,%XMM17 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
INC %RDI | 1 | 0.25 | 0.25 | 0 | 0 | 0 | 0.25 | 0.25 | 0 | 1 | 0.25 |
VMOVSS (%RDX,%RDI,4),%XMM23 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4-5 | 0.50 |
VMOVSS (%RSI,%RDI,4),%XMM25 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4-5 | 0.50 |
VXORPS %XMM19,%XMM19,%XMM19 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VMOVSS (%RCX,%RDI,4),%XMM22 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4-5 | 0.50 |
VSUBSS %XMM1,%XMM23,%XMM20 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VSUBSS %XMM7,%XMM25,%XMM24 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VSUBSS %XMM2,%XMM22,%XMM30 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMULSS %XMM20,%XMM20,%XMM0 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM24,%XMM24,%XMM0 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM30,%XMM30,%XMM0 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCOMISS %XMM0,%XMM3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 |
JBE 402021 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0.50-1 |
VMOVSS (%R8,%RDI,4),%XMM19 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4-5 | 0.50 |
VCOMISS %XMM6,%XMM0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 |
JBE 402087 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0.50-1 |
VADDSS %XMM0,%XMM4,%XMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMOVAPS %XMM0,%XMM21 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VFMADD132SS %XMM8,%XMM9,%XMM21 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTSS2SD %XMM5,%XMM5,%XMM5 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 5 | 1 |
VSQRTSD %XMM5,%XMM5,%XMM27 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13-19 | 4.50-6 |
VMULSD %XMM27,%XMM5,%XMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132SS %XMM0,%XMM10,%XMM21 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VDIVSD %XMM5,%XMM14,%XMM5 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13-14 | 4 |
VFMADD132SS %XMM0,%XMM11,%XMM21 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132SS %XMM0,%XMM12,%XMM21 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132SS %XMM21,%XMM13,%XMM0 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTSS2SD %XMM0,%XMM0,%XMM0 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 5 | 1 |
VADDSD %XMM0,%XMM5,%XMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTSD2SS %XMM5,%XMM5,%XMM0 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 5 | 1 |
VMULSS %XMM19,%XMM0,%XMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM5,%XMM24,%XMM15 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM5,%XMM20,%XMM16 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM5,%XMM30,%XMM17 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
INC %RDI | 1 | 0.25 | 0.25 | 0 | 0 | 0 | 0.25 | 0.25 | 0 | 1 | 0.25 |
CMP %EDI,%EAX | 1 | 0.25 | 0.25 | 0 | 0 | 0 | 0.25 | 0.25 | 0 | 1 | 0.25 |
JG 401f20 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0.50-1 |
Metric | run_0 |
---|---|
Coverage (% app. time) | 0.11 |
Time (s) | 0.04 |
Instance Count | 1095000 |
Iteration Count - min | 2 |
Iteration Count - avg | 2 |
Iteration Count - max | 2 |
Cycles per Iteration - min | 83 |
Cycles per Iteration - avg | 102.16 |
Cycles per Iteration - max | 8499 |