Function: apply#0x7f1060 | Module: bench | Source: dftw-direct.c:47-56 | Coverage (incl. loops): 0.89% | (excl. loops): 0.08% |
---|
/home/fmusial/FFTW_Benchmarks/fftw-3.3.10-gcc-G3-sve/dft/dftw-direct.c: 47 - 56 |
-------------------------------------------------------------------------------- |
47: { |
48: const P *ego = (const P *) ego_; |
49: INT i; |
50: ASSERT_ALIGNED_DOUBLE; |
51: for (i = 0; i < ego->v; ++i, rio += ego->vs, iio += ego->vs) { |
52: INT mb = ego->mb, ms = ego->ms; |
53: ego->k(rio + mb*ms, iio + mb*ms, ego->td->W, |
54: ego->rs, mb, ego->me, ms); |
55: } |
56: } |
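The loop at source lines 51-55 can be exercised outside FFTW with a reduced stand-in for the plan structure. The following is a minimal sketch, not FFTW's actual `P` type: the names `sketch_plan`, `sketch_kernel` and `sketch_apply` are hypothetical, `ego->td->W` is flattened into a single `W` field, and `INT` is assumed to be a signed pointer-sized integer. It only illustrates the call pattern: one indirect kernel call per vector element, with both pointers advanced by `vs` between calls.

```c
#include <assert.h>
#include <stddef.h>

typedef ptrdiff_t INT;  /* assumption: signed, pointer-sized */
typedef double R;       /* assumption: double-precision build */

/* Hypothetical kernel signature mirroring the seven arguments passed at lines 53-54. */
typedef void (*kernel_fn)(R *rio, R *iio, const R *W,
                          INT rs, INT mb, INT me, INT ms);

/* Hypothetical, reduced plan: only the fields the loop reads. */
typedef struct {
    kernel_fn k;
    const R *W;              /* stands in for ego->td->W */
    INT rs, ms, v, vs, mb, me;
} sketch_plan;

static int sketch_calls = 0;

/* Toy kernel: marks the first element it is handed and counts invocations. */
static void sketch_kernel(R *rio, R *iio, const R *W,
                          INT rs, INT mb, INT me, INT ms)
{
    (void)W; (void)rs; (void)mb; (void)me; (void)ms;
    rio[0] += 1.0;
    iio[0] -= 1.0;
    ++sketch_calls;
}

/* Same loop shape as dftw-direct.c:51-55. */
static void sketch_apply(const sketch_plan *ego, R *rio, R *iio)
{
    INT i;
    assert(ego != NULL && ego->k != NULL);
    for (i = 0; i < ego->v; ++i, rio += ego->vs, iio += ego->vs) {
        INT mb = ego->mb, ms = ego->ms;
        ego->k(rio + mb * ms, iio + mb * ms, ego->W,
               ego->rs, mb, ego->me, ms);
    }
}
```

Because the kernel is reached through a function pointer, the compiler cannot inline it here, which is consistent with the `BLR X7` inside loop 510 in the disassembly.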
0x7f1060 STP X29, X30, [SP, #-48]!
0x7f1064 ADD X29, SP, #0 |
0x7f1068 STP X19, X20, [SP, #16] |
0x7f106c ORR X19, XZR, X0 |
0x7f1070 LDR X0, [X0, #104] |
0x7f1074 CMP X0, #0 |
0x7f1078 B.LE 7f10d0 |
0x7f107c ORR X20, XZR, X1 |
0x7f1080 STP X21, X22, [SP, #32] |
0x7f1084 ORR X21, XZR, X2 |
0x7f1088 MOVZ X22, #0 |
(510) 0x7f108c ADD X22, X22, #1 |
(510) 0x7f1090 LDR X6, [X19, #96] |
(510) 0x7f1094 LDP X4, X5, [X19, #120] |
(510) 0x7f1098 LDR X1, [X19, #152] |
(510) 0x7f109c MADD X0, X4, X6, XZR |
(510) 0x7f10a0 LDR X7, [X19, #64] |
(510) 0x7f10a4 LDR X3, [X19, #80] |
(510) 0x7f10a8 LDR X2, [X1] |
(510) 0x7f10ac ADD X1, X21, X0,LSL #3 |
(510) 0x7f10b0 ADD X0, X20, X0,LSL #3 |
(510) 0x7f10b4 BLR X7 |
(510) 0x7f10b8 LDP X1, X0, [X19, #104] |
(510) 0x7f10bc ADD X20, X20, X0,LSL #3 |
(510) 0x7f10c0 ADD X21, X21, X0,LSL #3 |
(510) 0x7f10c4 CMP X1, X22 |
(510) 0x7f10c8 B.GT 7f108c |
0x7f10cc LDP X21, X22, [SP, #32] |
0x7f10d0 LDP X19, X20, [SP, #16] |
0x7f10d4 LDP X29, X30, [SP], #48 |
0x7f10d8 RET |
0x7f10dc HINT #0 |
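The prologue's pre-indexed `STP` and the epilogue's post-indexed `LDP X29, X30, [SP], #48` must adjust `SP` by the same 48 bytes in opposite directions. In the A64 encoding, the offset of a 64-bit `LDP`/`STP` is a signed 7-bit immediate scaled by 8, so the prologue offset is -48; a decoder that skips the sign extension would print the same field as 976. The sketch below illustrates that arithmetic; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the 7-bit immediate of a 64-bit LDP/STP: sign-extend, then scale by 8. */
static int64_t ldp_stp_offset(uint32_t imm7)
{
    assert(imm7 < 128u);
    int64_t value = (int64_t)imm7;
    if (value >= 64)
        value -= 128;        /* sign-extend from 7 bits */
    return value * 8;        /* scale by the 8-byte register size */
}

/* The same field read without sign extension (the incorrect interpretation). */
static int64_t ldp_stp_offset_unsigned(uint32_t imm7)
{
    assert(imm7 < 128u);
    return (int64_t)imm7 * 8;
}
```

For the raw field value 122, the signed decoding gives -48 and the unsigned reading gives 976.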
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►77.80+ | apply#0x411b60 | vrank-geq1.c:61 | bench |
○ | apply_dit#0x7ec720 | ct.c:43 | bench |
○ | apply_dit#0x7ec720 | ct.c:43 | bench |
○ | doit | fftw-bench.c:274 | bench |
○ | speed | speed.c:123 | bench |
○ | bench_main | bench-main.c:87 | bench |
○ | __libc_start_call_main | libc.so.6 | |
○ | __libc_start_main | libc.so.6 | |
○ | _start | bench | |
►13.60+ | apply_dit#0x7ec720 | ct.c:43 | bench |
○ | doit | fftw-bench.c:274 | bench |
○ | speed | speed.c:123 | bench |
○ | bench_main | bench-main.c:87 | bench |
○ | __libc_start_call_main | libc.so.6 | |
○ | __libc_start_main | libc.so.6 | |
○ | _start | bench | |
►6.68+ | doit | fftw-bench.c:274 | bench |
○ | speed | speed.c:123 | bench |
○ | bench_main | bench-main.c:87 | bench |
○ | __libc_start_call_main | libc.so.6 | |
○ | __libc_start_main | libc.so.6 | |
○ | _start | bench | |
►1.91+ | apply_dit#0x7ec720 | ct.c:43 | bench |
○ | apply_dit#0x7ec720 | ct.c:43 | bench |
○ | doit | fftw-bench.c:274 | bench |
○ | speed | speed.c:123 | bench |
○ | bench_main | bench-main.c:87 | bench |
○ | __libc_start_call_main | libc.so.6 | |
○ | __libc_start_main | libc.so.6 | |
○ | _start | bench | |
The code analyzed by CQA in that panel excludes loops and represents 0.08% of application time for run run_0
Source file and lines | dftw-direct.c:47-56 |
Module | bench |
nb instructions | 16 |
loop length | 64 |
nb stack references | 0 |
front end | 1.88 cycles |
Metric | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | P15 | P16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 1.00 | 1.00 | 1.25 | 1.25 | 1.17 | 1.17 | 1.08 | 1.08 | 0.00 | 0.00 | 0.00 | 0.00 | 2.50 | 2.17 | 2.33 | 1.50 | 1.50 |
cycles | 1.00 | 1.00 | 1.25 | 1.25 | 1.17 | 1.17 | 1.08 | 1.08 | 0.00 | 0.00 | 0.00 | 0.00 | 2.50 | 2.17 | 2.33 | 1.50 | 1.50 |
Cycles executing div or sqrt instructions | NA |
Front-end | 1.88 |
Overall L1 | 2.50 |
all | 0% |
load | 0% |
store | 0% |
mul | NA (no mul vectorizable/vectorized instructions) |
add-sub | 0% |
fma | NA (no fma vectorizable/vectorized instructions) |
div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
other | 0% |
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | P15 | P16 | Latency | Recip. throughput | Vectorization |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
STP X29, X30, [SP, #-48]! | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 | scal (100.0%) |
ADD X29, SP, #0 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (50.0%) |
STP X19, X20, [SP, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 | scal (100.0%) |
ORR X19, XZR, X0 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (50.0%) |
LDR X0, [X0, #104] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.33 | N/A |
CMP X0, #0 | 1 | 0 | 0 | 0.25 | 0.25 | 0 | 0 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.33 | N/A |
B.LE 7f10d0 <apply+0x70> | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 | N/A |
ORR X20, XZR, X1 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (50.0%) |
STP X21, X22, [SP, #32] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0.50 | 1 | 0.50 | scal (100.0%) |
ORR X21, XZR, X2 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (50.0%) |
MOVZ X22, #0 | 1 | 0 | 0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 | scal (50.0%) |
LDP X21, X22, [SP, #32] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.50 | scal (100.0%) |
LDP X19, X20, [SP, #16] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.50 | scal (100.0%) |
LDP X29, X30, [SP], #48 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 4 | 0.50 | scal (100.0%) |
RET | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 | N/A |
HINT #0 | N/A |
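In this panel the "Overall L1" figure (2.50 cycles) equals the largest per-port cycle count (P12) and exceeds the front-end estimate (1.88 cycles). The sketch below assumes the estimate is simply the maximum of the front-end bound and the per-port bounds; that is an assumption that happens to match these numbers, not a statement of how CQA computes it internally. The function name `max_cycles` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return the largest of the front-end bound and the per-port cycle counts. */
static double max_cycles(const double *port_cycles, size_t n, double front_end)
{
    assert(port_cycles != NULL || n == 0);
    double worst = front_end;
    for (size_t i = 0; i < n; ++i) {
        if (port_cycles[i] > worst)
            worst = port_cycles[i];
    }
    return worst;
}
```

Fed the 17 values from the cycles row (P0 to P16) and the 1.88-cycle front-end figure, this returns 2.50, meaning that under this model the load/store ports rather than instruction fetch bound the non-loop code.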
Name | Coverage (%) | Time (s) |
---|---|---|
▼apply#0x7f1060– | 0.89 | 2.09 |
○Loop 510 - dftw-direct.c:51-53 - bench | 0.81 | 1.90 |
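The figures in this table are mutually consistent: the function's inclusive coverage minus the coverage of loop 510 leaves the 0.08% reported for code outside loops, and dividing a time by its coverage gives the implied total run time. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and the results are only as precise as the two-decimal values in the report.

```c
#include <assert.h>

/* Coverage left for code outside loops, in percent. */
static double exclusive_coverage(double inclusive_pct, double loops_pct)
{
    assert(inclusive_pct >= loops_pct);
    return inclusive_pct - loops_pct;
}

/* Total run time implied by one (time, coverage) pair, in seconds. */
static double implied_total_seconds(double time_s, double coverage_pct)
{
    assert(coverage_pct > 0.0);
    return time_s / (coverage_pct / 100.0);
}

/* Tolerance comparison, written out to avoid depending on libm. */
static int close_to(double a, double b, double tol)
{
    double d = a - b;
    if (d < 0.0)
        d = -d;
    return d <= tol;
}
```

With the values above, `exclusive_coverage(0.89, 0.81)` gives 0.08, and both `implied_total_seconds(2.09, 0.89)` and `implied_total_seconds(1.90, 0.81)` land near 235 seconds, so loop 510 accounts for roughly nine tenths of the time attributed to `apply`.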