Filter Information
378 threads were discarded because each covered less than 1% of the profiled time (defined as Max (Thread Active Time)). Together they account for 40.64 s of CPU time. You can adjust the threshold below which a thread is discarded with the thread-filter-threshold option.
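The filtering rule above can be sketched as follows. The sketch assumes that a thread is dropped when its active time falls below threshold × Max (Thread Active Time), as the report text suggests. The function name `filter_threads` and the thread names and times in the example are hypothetical. With the default 1% threshold and the reported 56.41 s maximum, the cutoff comes to about 0.56 s.

```python
# Sketch of a relative thread filter. Assumption: a thread is discarded when
# its active time is below threshold * (maximum per-thread active time).
# filter_threads and the sample data are hypothetical, not MAQAO internals.

def filter_threads(active_times, threshold=0.01):
    """Split {thread: active_seconds} into (kept, discarded, cutoff)."""
    max_active = max(active_times.values())
    cutoff = threshold * max_active
    kept = {t: s for t, s in active_times.items() if s >= cutoff}
    discarded = {t: s for t, s in active_times.items() if s < cutoff}
    return kept, discarded, cutoff

if __name__ == "__main__":
    # Hypothetical per-thread active times in seconds.
    times = {"rank0-t0": 56.41, "rank0-t1": 48.0, "helper-a": 0.20, "helper-b": 0.05}
    kept, discarded, cutoff = filter_threads(times)
    print(f"cutoff = {cutoff:.4f} s")  # 1% of 56.41 s
    # Cumulated time of the discarded threads, analogous to the 40.64 s above.
    print("discarded:", sorted(discarded), round(sum(discarded.values()), 2), "s")
```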
Global Metrics
Total Time (s): 64.20
Max (Thread Active Time) (s): 56.41
Average Active Time (s): 47.25
Activity Ratio (%): 75.8
Average number of active threads: 141.319
Affinity Stability (%): 95.4
GFLOPS: 9.62E3
Time in analyzed loops (%): 97.3
Time in analyzed innermost loops (%): 96.0
Time in user code (%): 0.03
Compilation Options Score (%): 0
Array Access Efficiency (%): 50.4
Potential Speedups
Perfect Flow Complexity: 1.00
Perfect OpenMP/MPI/Pthread/TBB: 1.16
Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution: 1.22
No Scalar Integer: Potential Speedup 1.01, Nb Loops to get 80%: 4
FP Vectorised: Potential Speedup 1.00, Nb Loops to get 80%: 4
Fully Vectorised: Potential Speedup 1.01, Nb Loops to get 80%: 7
FP Arithmetic Only: Potential Speedup 1.02, Nb Loops to get 80%: 6
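To put these ratios in seconds, the sketch below converts a speedup S into the time it would save. It assumes S applies to the whole Total Time T, which gives a saving of T(1 - 1/S). It also illustrates one plausible reading of "Nb Loops to get 80%": the smallest number of loops, ranked by decreasing time saved, whose cumulated saving reaches 80% of the saving from all loops. Both readings are assumptions, and MAQAO's exact definitions may differ. The function names and the per-loop savings in the example are hypothetical.

```python
# Sketch only. Assumptions:
#  - a whole-application speedup S applies to total time T, so saved = T * (1 - 1/S);
#  - "Nb Loops to get 80%" = smallest k such that the k largest per-loop savings
#    reach 80% of the sum of all savings.
# time_saved, loops_to_reach and the sample savings are hypothetical.

def time_saved(total_time, speedup):
    """Seconds saved if total_time shrinks by a factor of speedup."""
    return total_time * (1.0 - 1.0 / speedup)

def loops_to_reach(savings, fraction=0.80):
    """Smallest number of top loops whose savings reach fraction of the total."""
    ordered = sorted(savings, reverse=True)
    target = fraction * sum(ordered)
    running = 0.0
    for k, s in enumerate(ordered, start=1):
        running += s
        if running >= target:
            return k
    return len(ordered)

if __name__ == "__main__":
    total = 64.20  # Total Time (s) from Global Metrics
    for label, s in [("Perfect OpenMP/MPI/Pthread/TBB", 1.16),
                     ("+ Perfect Load Distribution", 1.22)]:
        print(f"{label}: about {time_saved(total, s):.1f} s saved")
    # Hypothetical per-loop savings in seconds, not taken from the report.
    print(loops_to_reach([3.0, 2.0, 1.0, 0.5, 0.3, 0.2]))
```

Under the whole-run assumption, the 1.16 and 1.22 ratios correspond to roughly 8.9 s and 11.6 s out of 64.20 s.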
[Charts not reproduced: CQA Potential Speedups Summary, Average Active Threads Count, Loop Based Profile, Innermost Loop Based Profile, Application Categorization]
Compilation Options
Source object: xhpl_intel64_dynamic
Issues:
○ -g is missing for some functions (possibly ones added by the compiler); it is needed for more accurate reports. Other recommended flags are -O2/-O3 and -march=(target).
○ -O2, -O3 or -Ofast is missing.
○ -march=(target) is missing.
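The three issues above amount to simple checks on a compiler command line. The sketch applies them to a list of flags. It does not reproduce how MAQAO actually obtains the compilation options. The function name `missing_flags` and the example flag lists are hypothetical, and the sketch assumes only `-march=` (not vendor-specific options such as Intel's `-x<code>`) counts as a target flag.

```python
# Sketch of the three compilation-option checks reported above, applied to a
# hypothetical list of compiler flags. Assumption: only -march=... counts as
# a target flag; vendor-specific alternatives are not recognized here.

DEBUG_FLAGS = ("-g", "-g1", "-g2", "-g3")
OPT_FLAGS = ("-O2", "-O3", "-Ofast")

def missing_flags(flags):
    """Return the recommended flags absent from a compiler command line."""
    issues = []
    if not any(f in DEBUG_FLAGS for f in flags):
        issues.append("-g")
    if not any(f in OPT_FLAGS for f in flags):
        issues.append("-O2/-O3/-Ofast")
    if not any(f.startswith("-march=") for f in flags):
        issues.append("-march=(target)")
    return issues

if __name__ == "__main__":
    print(missing_flags(["-O3", "-march=native", "-g"]))  # nothing missing
    print(missing_flags(["-O1"]))                          # all three missing
```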
[Charts not reproduced: Loop Path Count Profile, Cumulated Speedup If No Scalar Integer, Cumulated Speedup If FP Vectorized, Cumulated Speedup If Fully Vectorized, Cumulated Speedup If FP Arithmetic Only]
Experiment Summary
Experiment Name: (none)
Application: ./xhpl_intel64_dynamic
Timestamp: 2025-07-07 17:02:59
Universal Timestamp: 1751900579
Number of processes observed: 6
Number of threads observed: 192
Experiment Type: MPI; OpenMP
Machine: isix06.benchmarkcenter.megware.com
Model Name: Intel(R) Xeon(R) 6972P
Architecture: x86_64
Micro Architecture: GRANITE_RAPIDS
Cache Size: 491520 KB
Number of Cores: 96
OS Version: Linux 5.14.0-503.19.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jan 7 17:08:27 EST 2025
Architecture used during static analysis: x86_64
Micro Architecture used during static analysis: GRANITE_RAPIDS
Frequency Driver: intel_pstate
Frequency Governor: powersave
Huge Pages: always
Hyperthreading: on
Number of sockets: 2
Number of cores per socket: 96
Compilation Options (xhpl_intel64_dynamic): N/A
Comments: HPL benchmark compiled with Intel oneAPI 2025.0, using Intel MPI and MKL. Matrix order (N): 30K; 5 reruns; block size (NB): 384. Run on an Intel Granite Rapids (GNR) system with 6 NUMA nodes and 32 cores per NUMA node.
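Two fields in this summary can be cross-checked with a little arithmetic. The Universal Timestamp 1751900579 is 2025-07-07 15:02:59 UTC, so the local Timestamp of 17:02:59 is consistent with a UTC+2 clock (CEST). That offset is an inference, not something the report states. The counts are also consistent with each other: 2 sockets × 96 cores = 6 NUMA nodes × 32 cores = 192, matching the 192 observed threads. That gives 32 threads for each of the 6 processes, which fits one MPI rank per NUMA node, although the report does not say how ranks were pinned. The helper names below are hypothetical.

```python
# Cross-checks on the Experiment Summary fields. Assumptions: the local
# Timestamp uses a fixed UTC+2 offset, and "Number of cores per socket" (96)
# is the per-socket count. Helper names are hypothetical.
from datetime import datetime, timedelta, timezone

def utc_time(universal_ts):
    """Convert a Unix timestamp (seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(universal_ts, tz=timezone.utc)

def in_offset(dt, hours):
    """Express an aware datetime at a fixed UTC offset given in hours."""
    return dt.astimezone(timezone(timedelta(hours=hours)))

def topology_check(processes, threads, numa_nodes, cores_per_numa,
                   sockets, cores_per_socket):
    """Return (threads per process, whether thread and core counts agree)."""
    threads_per_process = threads // processes
    total_cores = sockets * cores_per_socket
    consistent = (total_cores == numa_nodes * cores_per_numa == threads
                  and threads_per_process == cores_per_numa)
    return threads_per_process, consistent

if __name__ == "__main__":
    ts = utc_time(1751900579)
    print(ts.isoformat())  # 2025-07-07T15:02:59+00:00
    print(in_offset(ts, 2).strftime("%Y-%m-%d %H:%M:%S"))  # 2025-07-07 17:02:59
    print(topology_check(6, 192, 6, 32, 2, 96))  # (32, True)
```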