OV - - Global

378 threads, each covering less than 1% of profiled time (= Max (Thread Active Time)), were discarded; together they account for 40.64 seconds of CPU time. To change the threshold below which a thread is discarded, use the thread-filter-threshold option.
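The filtering rule above can be checked with a little arithmetic. The sketch below uses only figures from this report. The variable names are illustrative, and it assumes the 1% threshold applies to each thread individually, against Max (Thread Active Time).

```python
# Figures copied from this report; variable names are illustrative.
max_thread_active_time_s = 56.41   # Max (Thread Active Time)
discarded_threads = 378
discarded_cpu_time_s = 40.64
threshold_fraction = 0.01          # "less than 1% of profiled time"

# Assumption: a thread is discarded when its own active time is below this cutoff.
cutoff_s = threshold_fraction * max_thread_active_time_s

# Mean CPU time per discarded thread, for comparison with the cutoff.
mean_discarded_s = discarded_cpu_time_s / discarded_threads

print(f"cutoff per thread: {cutoff_s:.3f} s")
print(f"mean CPU time per discarded thread: {mean_discarded_s:.3f} s")
```

The mean per discarded thread falls well below the cutoff, which is consistent with every discarded thread being under the 1% limit.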

Total Time (s)		64.20
Max (Thread Active Time) (s)		56.41
Average Active Time (s)		47.25
Activity Ratio (%)		75.8
Average number of active threads		141.319
Affinity Stability (%)		95.4
GFLOPS		9.62 E3
Time in analyzed loops (%)		97.3
Time in analyzed innermost loops (%)		96.0
Time in user code (%)		0.03
Compilation Options Score (%)		0
Array Access Efficiency (%)		50.4
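As a rough consistency check on the figures above, the reported average number of active threads is close to the number of observed threads (192, from the experiment summary) scaled by Average Active Time over Total Time. This is an observation about these numbers, not MAQAO's documented definition of the metric; the sketch below assumes that relationship only for illustration.

```python
# Figures copied from this report; the formula is an assumed reading, not MAQAO's definition.
total_time_s = 64.20
average_active_time_s = 47.25
threads_observed = 192             # "Number of threads observed"
reported_avg_active_threads = 141.319

# Assumed relationship: total thread-active time spread over the wall-clock time.
estimated_avg_active_threads = threads_observed * average_active_time_s / total_time_s

difference = abs(estimated_avg_active_threads - reported_avg_active_threads)
print(f"estimated: {estimated_avg_active_threads:.3f}, reported: {reported_avg_active_threads}")
print(f"difference: {difference:.3f}")
```

The small remaining difference is within what rounding of the displayed times would explain.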

Potential Speedups
Perfect Flow Complexity		1.00
Perfect OpenMP/MPI/Pthread/TBB		1.16
Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution		1.22
No Scalar Integer	Potential Speedup	1.01
No Scalar Integer	Nb Loops to get 80%	4
FP Vectorised	Potential Speedup	1.00
FP Vectorised	Nb Loops to get 80%	4
Fully Vectorised	Potential Speedup	1.01
Fully Vectorised	Nb Loops to get 80%	7
FP Arithmetic Only	Potential Speedup	1.02
FP Arithmetic Only	Nb Loops to get 80%	6
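To make the speedup factors more concrete, the sketch below converts a few of them into projected run times. It assumes each factor applies to the whole Total Time (64.20 s) and that the scenarios are evaluated independently; both are simplifying assumptions, and the dictionary keys simply repeat the labels from the table.

```python
# Figures copied from this report; applying each factor to Total Time is an assumption.
total_time_s = 64.20

potential_speedups = {
    "Perfect Flow Complexity": 1.00,
    "Perfect OpenMP/MPI/Pthread/TBB": 1.16,
    "Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution": 1.22,
    "No Scalar Integer": 1.01,
    "FP Vectorised": 1.00,
    "Fully Vectorised": 1.01,
    "FP Arithmetic Only": 1.02,
}

# Projected time = current time / speedup, one scenario at a time (not combined).
projected_time_s = {name: total_time_s / s for name, s in potential_speedups.items()}

for name, t in projected_time_s.items():
    saved = total_time_s - t
    print(f"{name}: {t:.2f} s (saves {saved:.2f} s)")
```

Under these assumptions, the parallel-runtime scenarios are the only ones that would save more than about a second, in line with the code-level speedups all being close to 1.00.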

Source Object	Issue
xhpl_intel64_dynamic	-g is missing for some functions (possibly ones added by the compiler); it is needed for more accurate reports. Other recommended flags are: -O2/-O3, -march=(target)
xhpl_intel64_dynamic	-O2, -O3 or -Ofast is missing.
xhpl_intel64_dynamic	-march=(target) is missing.

Experiment Name
Application	./xhpl_intel64_dynamic
Timestamp	2025-07-07 17:02:59	Universal Timestamp	1751900579
Number of processes observed	6	Number of threads observed	192
Experiment Type	MPI; OpenMP;
Machine	isix06.benchmarkcenter.megware.com
Model Name	Intel(R) Xeon(R) 6972P
Architecture	x86_64	Micro Architecture	GRANITE_RAPIDS
Cache Size	491520 KB	Number of Cores	96
OS Version	Linux 5.14.0-503.19.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jan 7 17:08:27 EST 2025
Architecture used during static analysis	x86_64	Micro Architecture used during static analysis	GRANITE_RAPIDS
Frequency Driver	intel_pstate	Frequency Governor	powersave
Huge Pages	always	Hyperthreading	on
Number of sockets	2	Number of cores per socket	96
Compilation Options	xhpl_intel64_dynamic: N/A
Comments	HPL benchmark compiled with Intel oneAPI 2025.0, using Intel MPI and MKL. Matrix order: 30K, 5 reruns, block size 384. Run on Intel GNR with 6 NUMA nodes and 32 cores per NUMA node.
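The topology figures in this summary can be cross-checked against each other. The sketch below uses only numbers from the report. The per-process thread count is derived rather than reported, and the reading that each MPI process maps to one NUMA node is an inference from the counts, not something the report states.

```python
# Figures copied from this report; variable names are illustrative.
sockets = 2
cores_per_socket = 96
numa_nodes = 6
cores_per_numa_node = 32
processes_observed = 6
threads_observed = 192

# Physical cores counted two ways: by socket and by NUMA node.
cores_by_socket = sockets * cores_per_socket
cores_by_numa = numa_nodes * cores_per_numa_node

# Derived, not reported: threads per MPI process if threads are spread evenly.
threads_per_process = threads_observed // processes_observed

print(f"cores by socket: {cores_by_socket}, cores by NUMA node: {cores_by_numa}")
print(f"threads per process (derived): {threads_per_process}")
```

Both core counts agree with the number of observed threads, and the derived 32 threads per process matches the 32 cores per NUMA node, which suggests (but does not prove) a layout of one process per NUMA node with one thread per physical core.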