- r0: 2x1
- r1: 2x2
- r2: 2x4
- r3: 2x8
- r4: 2x16
- r5: 2x32
- r6: 2x64
- r7: 2x96
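The run labels above can be turned into thread counts with a few lines of Python. This is a minimal sketch, not part of the report: reading `PxT` as "MPI processes x OpenMP threads per process" is an assumption (it is consistent with the 2 processes reported for every run), and `total_threads` is a hypothetical helper name.

```python
# Run labels copied from the list above.
# ASSUMPTION: "PxT" means P MPI processes x T OpenMP threads per process.
RUNS = {
    "r0": "2x1",
    "r1": "2x2",
    "r2": "2x4",
    "r3": "2x8",
    "r4": "2x16",
    "r5": "2x32",
    "r6": "2x64",
    "r7": "2x96",
}


def total_threads(config: str) -> int:
    """Return processes * threads for a label such as '2x16'."""
    processes, threads = (int(part) for part in config.split("x"))
    return processes * threads


if __name__ == "__main__":
    for run, config in RUNS.items():
        print(f"{run}: {config} -> {total_threads(config)} worker threads")
```

The "Number of threads observed" row further down reports slightly higher counts than this product for most runs, so the value computed here should be read as the requested worker threads only.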
Metric | r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 |
---|---|---|---|---|---|---|---|---|
Total Time (s) | 37.10 | 38.71 | 39.89 | 41.88 | 62.86 | 74.65 | 146.82 | 207.56 |
Profiled Time (s) | 34.96 | 36.69 | 37.76 | 39.72 | 60.23 | 72.06 | 143.46 | 202.78 |
GFLOPS | 12.693 | 24.237 | 47.173 | 89.842 | 119.762 | 201.730 | 205.202 | 217.734 |
Time in analyzed loops (%) | 0.06 | 0.05 | 0.08 | 0.07 | 0.10 | 0.12 | 0.14 | 0.12 |
Time in analyzed innermost loops (%) | 0.01 | 0.01 | 0.03 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 |
Time in user code (%) | 0.06 | 0.07 | 0.09 | 0.09 | 0.10 | 0.12 | 0.14 | 0.12 |
Compilation Options Score (%) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
Array Access Efficiency (%) | 72.4 | 72.4 | 72.4 | 72.4 | 72.4 | 72.4 | 72.4 | 72.4 |
Scalability - Gap | 1.00 | 2.61 | 4.84 | 10.16 | 28.81 | 66.41 | 257.26 | 542.73 |
Potential Speedups |
Perfect Flow Complexity | Not Available | Not Available | Not Available | Not Available | Not Available | Not Available | Not Available | Not Available |
Perfect OpenMP + MPI + Pthread | 1.01 | 1.01 | 1.00 | 1.00 | 1.01 | 1.00 | 1.01 | 1.00 |
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 1.01 | 1.28 | 1.16 | 1.18 | 1.32 | 1.15 | 1.17 | 1.11 |
No Scalar Integer: Potential Speedup | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
No Scalar Integer: Nb Loops to get 80% | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
FP Vectorised: Potential Speedup | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
FP Vectorised: Nb Loops to get 80% | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Fully Vectorised: Potential Speedup | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Fully Vectorised: Nb Loops to get 80% | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Only FP Arithmetic: Potential Speedup | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Only FP Arithmetic: Nb Loops to get 80% | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
OpenMP perfectly balanced: Potential Speedup | 1.00 | 1.01 | 1.03 | 1.03 | 1.15 | 1.11 | 1.15 | 1.10 |
OpenMP perfectly balanced: Nb Loops to get 80% | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
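The GFLOPS row above can be condensed into a rough parallel-efficiency figure per run. The following is a minimal sketch, not a MAQAO metric: it assumes the run labels mean 2 MPI processes times 1, 2, 4, ... 96 OpenMP threads, and it defines efficiency as the GFLOPS speedup over r0 divided by the thread-count ratio over r0. The function names are hypothetical.

```python
# Values copied from the metrics table above (r0..r7).
# ASSUMPTION: OpenMP threads per MPI process, read from the run labels.
THREADS_PER_PROCESS = [1, 2, 4, 8, 16, 32, 64, 96]
GFLOPS = [12.693, 24.237, 47.173, 89.842, 119.762, 201.730, 205.202, 217.734]


def gflops_speedup(gflops):
    """Throughput of each run relative to the first run (r0)."""
    baseline = gflops[0]
    return [value / baseline for value in gflops]


def parallel_efficiency(gflops, threads):
    """GFLOPS speedup divided by the thread-count ratio, both relative to r0."""
    speedups = gflops_speedup(gflops)
    return [s / (t / threads[0]) for s, t in zip(speedups, threads)]


if __name__ == "__main__":
    efficiencies = parallel_efficiency(GFLOPS, THREADS_PER_PROCESS)
    for index, (speedup, eff) in enumerate(zip(gflops_speedup(GFLOPS), efficiencies)):
        print(f"r{index}: speedup {speedup:6.2f}, efficiency {eff:5.1%}")
```

Under this definition r1 keeps about 95% of ideal scaling while r7 drops to about 18%, which matches the ratio of the r0 and r7 total times (37.10 s / 207.56 s).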
Source Object | Issue |
▼exec– | |
▼miniqmc.cpp– | |
○ | |
| r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 |
---|---|---|---|---|---|---|---|---|
Experiment Name | | | | | | | | |
Application | /beegfs/hackathon/users/eoseret/qaas_runs/170-855-3059/intel/miniqmc/run/binaries/aocc_13/exec | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Timestamp | 2024-02-23 11:56:10 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Experiment Type | MPI; | MPI; OpenMP; | same as r1 | same as r1 | same as r1 | same as r1 | same as r1 | same as r1 |
Machine | ins01.benchmarkcenter.megware.com | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Architecture | x86_64 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Micro Architecture | ZEN_V4 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Model Name | AMD EPYC 9654 96-Core Processor | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Cache Size | 1024 KB | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of Cores | 96 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Maximal Frequency | 3.707812 GHz | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
OS Version | Linux 5.14.0-362.13.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Dec 21 07:12:43 EST 2023 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Architecture used during static analysis | x86_64 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Micro Architecture used during static analysis | ZEN_V4 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Compilation Options | exec: AMD clang version 16.0.3 (CLANG: AOCC_4.1.0-Build#270 2023_07_10) /cluster/comp/aocc/4.1.0/bin/clang-16 --driver-mode=g++ -D ADD_ -D H5_USE_16_API -D HAVE_CONFIG_H -D MPICH_SKIP_MPICXX -D OMPI_SKIP_MPICXX -D _MPICC_H -D restrict=__restrict__ -I /beegfs/hackathon/users/eoseret/qaas_runs/170-855-3059/intel/miniqmc/build/miniqmc/src -I src -I /beegfs/hackathon/users/eoseret/qaas_runs/170-855-3059/intel/miniqmc/build/miniqmc/src/Particle -I /beegfs/hackathon/users/eoseret/qaas_runs/170-855-3059/intel/miniqmc/build/miniqmc/src/Utilities -I /beegfs/hackathon/users/eoseret/qaas_runs/170-855-3059/intel/miniqmc/build/miniqmc/src/Platforms -I /beegfs/hackathon/users/eoseret/qaas_runs/170-855-3059/intel/miniqmc/build/miniqmc/src/Platforms/Host -O2 -march=znver4 -flto=full -g -grecord-command-line -fno-omit-frame-pointer -fcf-protection=none -nopie -fopenmp -fstrict-aliasing -Wvla -Wall -Wno-unused-variable -Wno-overloaded-virtual -Wno-unused-private-field -Wno-unused-local-typedef -Wno-unknown-pragmas -Wmisleading-indentation -ffast-math -D NDEBUG -std=c++17 -MD -MT src/Drivers/CMakeFiles/miniqmc.dir/miniqmc.cpp.o -MF src/Drivers/CMakeFiles/miniqmc.dir/miniqmc.cpp.o.d -o src/Drivers/CMakeFiles/miniqmc.dir/miniqmc.cpp.o -c /beegfs/hackathon/users/eoseret/qaas_runs/170-855-3059/intel/miniqmc/build/miniqmc/src/Drivers/miniqmc.cpp -I /cluster/intel/oneapi/2024.0.0/mpi/2021.11/include | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of processes observed | 2 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of threads observed | 2 | 5 | 9 | 18 | 34 | 66 | 130 | 194 |
Frequency Driver | acpi-cpufreq | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Frequency Governor | performance | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Huge Pages | always | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Hyperthreading | on | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of sockets | 2 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of cores per socket | 96 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
MAQAO version | 2.19.1 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
MAQAO build | e26c8ffcefb997f114892e36591c060f98f53e6a::20240206-190005 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Comments | Execution on the Megware (https://www.megware.com/en/) benchmarking cluster | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
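The configuration table above abbreviates repeated values as "same as rN". When post-processing the report, those cells can be expanded back to explicit values. The sketch below is one possible approach, assuming each row has already been split into its eight per-run cells; `expand_row` is a hypothetical helper, not part of MAQAO.

```python
SAME_AS_PREFIX = "same as r"


def expand_row(cells):
    """Replace 'same as rN' cells with the value found in column N.

    ASSUMPTION: a reference always points to an earlier column, as in
    the table above, so the target has already been expanded.
    """
    expanded = []
    for cell in cells:
        text = cell.strip()
        if text.startswith(SAME_AS_PREFIX):
            target = int(text[len(SAME_AS_PREFIX):])
            expanded.append(expanded[target])
        else:
            expanded.append(text)
    return expanded


if __name__ == "__main__":
    # "Experiment Type" row copied from the table above.
    row = ["MPI;", "MPI; OpenMP;"] + ["same as r1"] * 6
    print(expand_row(row))
```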