OV - QMCKL bench_pop using GNU Compilers on Skylake

Total Time (s)		250.28
Max (Thread Active Time) (s)		250.08
Average Active Time (s)		11.79
Activity Ratio (%)		4.71
Average number of active threads		2.450
Affinity Stability (%)		95.7
Time in analyzed loops (%)		40.5
Time in analyzed innermost loops (%)		39.5
Time in user code (%)		40.5
Compilation Options Score (%)		25.0
Array Access Efficiency (%)		16.5
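
The activity figures above appear to be mutually consistent. The sketch below recomputes two of them from the others. It assumes that the activity ratio is the average per-thread active time divided by the total time, and that the average number of active threads is that ratio multiplied by the 52 observed threads. Both definitions are assumptions inferred from the numbers, not taken from the MAQAO documentation.

```python
# Values copied from the metrics listed above.
total_time_s = 250.28        # Total Time (s)
avg_active_time_s = 11.79    # Average Active Time (s)
threads_observed = 52        # Number of threads observed

# Assumption: activity ratio = average per-thread active time / total time.
activity_ratio = avg_active_time_s / total_time_s

# Assumption: average active threads = activity ratio * observed threads.
avg_active_threads = activity_ratio * threads_observed

print(f"Activity ratio: {100 * activity_ratio:.2f} %")
print(f"Average number of active threads: {avg_active_threads:.2f}")
```

Under these assumptions the results agree, to rounding, with the reported 4.71 % and 2.450. On average fewer than three of the 52 threads are doing work at any moment.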

Potential Speedups
Perfect Flow Complexity		1.02
Perfect OpenMP/MPI/Pthread/TBB		1.00
Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution		51.9
No Scalar Integer	Potential Speedup	1.43
No Scalar Integer	Nb Loops to get 80%	2
FP Vectorised	Potential Speedup	1.33
FP Vectorised	Nb Loops to get 80%	1
Fully Vectorised	Potential Speedup	1.59
Fully Vectorised	Nb Loops to get 80%	3
FP Arithmetic Only	Potential Speedup	1.51
FP Arithmetic Only	Nb Loops to get 80%	2
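
As a sanity check on the loop-level figures, the sketch below compares them with an Amdahl-style upper bound. It assumes that each potential speedup is a whole-run speedup obtained by transforming only the analyzed loops, which account for 40.5 % of the time. The report itself does not state how the speedups are computed, and `amdahl_speedup` is a hypothetical helper introduced for illustration.

```python
def amdahl_speedup(fraction, local_speedup):
    """Whole-run speedup when `fraction` of the time runs `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

loop_fraction = 0.405  # Time in analyzed loops (%) = 40.5

# Limit of amdahl_speedup as local_speedup grows without bound.
upper_bound = 1.0 / (1.0 - loop_fraction)

reported = {
    "No Scalar Integer": 1.43,
    "FP Vectorised": 1.33,
    "Fully Vectorised": 1.59,
    "FP Arithmetic Only": 1.51,
}

print(f"Upper bound from loop time alone: {upper_bound:.2f}")
for name, speedup in reported.items():
    status = "within bound" if speedup <= upper_bound else "exceeds bound"
    print(f"{name}: {speedup:.2f} ({status})")
```

Under that assumption, every loop-level speedup in the table stays below the bound of roughly 1.68.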

Source Object	Issue
libqmckl.so.0.0.0
	qmckl_blas.c
		○	-O3 or -Ofast is missing.
		○	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
		○	-funroll-loops is missing.
	qmckl_jastrow_champ_f.F90
		○	-O3 or -Ofast is missing.
		○	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
		○	-funroll-loops is missing.
	qmckl_jastrow_champ_single_f.F90
		○	-O3 or -Ofast is missing.
		○	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
		○	-funroll-loops is missing.
	qmckl_distance_f.F90
		○	-O3 or -Ofast is missing.
		○	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
		○	-funroll-loops is missing.
	qmckl_jastrow_champ.c
		○	-O3 or -Ofast is missing.
		○	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
		○	-funroll-loops is missing.
	qmckl_jastrow_champ_single.c
		○	-O3 or -Ofast is missing.
		○	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
		○	-funroll-loops is missing.
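
The three issues reported for each source file can be reproduced with a simple check on the compiler command line. The sketch below is an illustration only, not MAQAO's implementation: `check_gnu_flags` is a hypothetical helper, and the option string is a shortened copy of the compilation options recorded in this report.

```python
def check_gnu_flags(options: str) -> list[str]:
    """Return the issues listed in this report that apply to a GNU compiler option string."""
    flags = options.split()
    issues = []

    # Issue 1: neither -O3 nor -Ofast is present.
    if not any(flag in ("-O3", "-Ofast") for flag in flags):
        issues.append("-O3 or -Ofast is missing.")

    # Issue 2: the target architecture is the generic x86-64 baseline.
    march_values = [flag.split("=", 1)[1] for flag in flags if flag.startswith("-march=")]
    if march_values and march_values[-1] == "x86-64":
        issues.append("-march=x86-64 should be replaced by a more specific option or -march=native.")

    # Issue 3: loop unrolling is not requested.
    if "-funroll-loops" not in flags:
        issues.append("-funroll-loops is missing.")

    return issues


# Shortened copy of the options recorded for libqmckl.so.0.0.0.
recorded = "-mtune=generic -march=x86-64 -g -fno-omit-frame-pointer -fopenmp -fPIC"
for issue in check_gnu_flags(recorded):
    print(issue)

# An option set that would clear all three issues on the build machine.
suggested = "-O3 -march=native -funroll-loops -g -fno-omit-frame-pointer -fopenmp -fPIC"
print(check_gnu_flags(suggested))
```

Note that `-march=native` targets the machine that performs the build, so a binary built this way may not run on older processors.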

Experiment Name	QMCKL bench_pop using GNU Compilers on Skylake
Application	bench_pop
Timestamp	2025-10-17 16:42:49	Universal Timestamp	1760712169
Number of processes observed	1	Number of threads observed	52
Experiment Type	OpenMP
Machine	skylake
Model Name	Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz
Architecture	x86_64	Micro Architecture	SKYLAKE
Cache Size	36608 KB	Number of Cores	26
OS Version	Linux 6.17.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 12 Oct 2025 12:45:18 +0000
Architecture used during static analysis	x86_64	Micro Architecture used during static analysis	SKYLAKE
Frequency Driver	intel_cpufreq	Frequency Governor	schedutil
Huge Pages	always	Hyperthreading	off
Number of sockets	2	Number of cores per socket	26
Compilation Options	libqmckl.so.0.0.0: GNU Fortran2008 15.2.1 20250813 -mtune=generic -march=x86-64 -g -fno-omit-frame-pointer -fopenmp -fPIC -fintrinsic-modules-path /usr/lib/gcc/x86_64-pc-linux-gnu/15.2.1/finclude -fpre-include=/usr/include/finclude/math-vector-fortran.h
Comments	ITERMAX = 50, -O1

Dataset
Run Command	<executable> /home/fmusial/qmckl_bench/data/Alz_large.h5
Number Processes	1
Number Nodes	1
Filter	Not Used
Profile Start	Not Used
Profile Stop	Not Used