OV - gmx_mpi - Global

gmx_mpi - 2023-08-08 09:56:51 - MAQAO 2.17.7

Global Metrics

Total Time (s)		56.97
Profiled Time (s)		48.48
Time in analyzed loops (%)		80.8
Time in analyzed innermost loops (%)		64.4
Time in user code (%)		82.4
Compilation Options Score (%)		75.0
Perfect Flow Complexity		1.05
Array Access Efficiency (%)		58.3
GFLOPS		1.26 E3
Perfect OpenMP + MPI + Pthread		1.04
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution		1.17
No Scalar Integer	Potential Speedup	1.06
No Scalar Integer	Nb Loops to get 80%	10
FP Vectorised	Potential Speedup	1.07
FP Vectorised	Nb Loops to get 80%	10
Fully Vectorised	Potential Speedup	1.39
Fully Vectorised	Nb Loops to get 80%	21
FP Arithmetic Only	Potential Speedup	1.26
FP Arithmetic Only	Nb Loops to get 80%	21
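The figures above can be cross-checked with a little arithmetic. The sketch below computes the share of the total time that was profiled and, for each CQA scenario, the run time that would result if the quoted potential speedup applied uniformly to the profiled time. That uniform-application reading is an assumption made for illustration; MAQAO's exact definition of the speedup base may differ. The function names are hypothetical.

```python
# Values copied from the Global Metrics table.
total_time_s = 56.97
profiled_time_s = 48.48
speedups = {
    "No Scalar Integer": 1.06,
    "FP Vectorised": 1.07,
    "Fully Vectorised": 1.39,
    "FP Arithmetic Only": 1.26,
}


def profiled_fraction(total, profiled):
    """Percentage of the total run time covered by the profiler."""
    return 100.0 * profiled / total


def projected_time(time_s, speedup):
    """Time after applying a speedup factor (assumed to apply uniformly)."""
    return time_s / speedup


coverage = profiled_fraction(total_time_s, profiled_time_s)
print(f"Profiled share of total time: {coverage:.1f}%")

for name, factor in speedups.items():
    t = projected_time(profiled_time_s, factor)
    saved = profiled_time_s - t
    print(f"{name}: {t:.2f} s projected ({saved:.2f} s saved)")
```

Under this reading, the "Fully Vectorised" scenario is the largest lever in the table, but it also needs 21 loops to reach 80% of its gain, against 10 loops for the two cheaper scenarios.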

CQA Potential Speedups Summary

Loop Based Profile

Innermost Loop Based Profile

Application Categorization

Compilation Options

Source Object	Issue
libgromacs_mpi.so.7
○pairlist_simd_4xm.h	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○fft5d.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○threaded_force_buffer.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○pme_gather.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○listed_forces.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○kernel_outer.h	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○manage_threading.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○kernel_prune.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○pairs.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○pairlist.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○update.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○md_support.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○pme.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○kernel_common.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○mdatoms.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○lincs.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○pbc.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○constr.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○pme_grid.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○localtopology.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○vector.tcc	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○pme_solve.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○pme_spread.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○calc_verletbuf.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○bonded.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○vec.h	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○sim_util.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○grid.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○partition.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○arrayref.h	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○domdec_constraints.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○settle.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○kerneldispatch.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
○atomdata.cpp	-march=x86-64 is used but it should be replaced by a more architecture specific option or -march=native.
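Every row above reports the same issue, so the check can be reproduced on the compiler option string itself. The following is a minimal sketch, not MAQAO's own logic: it scans an option string for the generic values `-march=x86-64` and `-mtune=generic`. The helper name `find_generic_flags` and the table `GENERIC_VALUES` are hypothetical; the option string is the one recorded for libgromacs_mpi.so.7 in the Experiment Summary.

```python
import shlex

# Flag values treated as "generic" (assumption chosen for this sketch).
GENERIC_VALUES = {"-march": {"x86-64"}, "-mtune": {"generic"}}


def find_generic_flags(options):
    """Return the tokens in `options` that select a generic target."""
    hits = []
    for token in shlex.split(options):
        flag, sep, value = token.partition("=")
        # Only flags of the form -name=value can match.
        if sep and value in GENERIC_VALUES.get(flag, set()):
            hits.append(token)
    return hits


options = (
    "-mavx2 -mfma -mtune=generic -march=x86-64 -g -g -O2 -std=c++17 "
    "-fno-omit-frame-pointer -fcf-protection=none -fPIC "
    "-fexcess-precision=fast -funroll-all-loops -fopenmp -fexceptions"
)
print(find_generic_flags(options))
```

On the build machine itself, `-march=native` is the replacement the report suggests; for this Zen 3 host, GCC's `-march=znver3` would be the architecture-specific equivalent. Note that `-mavx2 -mfma` are already present, so the SIMD kernels are not limited to baseline x86-64 instructions even with the generic `-march`.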

Loop Path Count Profile

Cumulated Speedup If No Scalar Integer

Cumulated Speedup If FP Vectorized

Cumulated Speedup If Fully Vectorized

Cumulated Speedup If FP Arithmetic Only

Experiment Summary

Application	/ccc/work/cont001/ocre/oserete/gromacs-2022.4-install-gcc-ompi/bin/gmx_mpi
Timestamp	2023-08-08 09:56:51	Universal Timestamp	1691481411
Number of processes observed	1	Number of threads observed	52
Experiment Type	MPI; OpenMP;
Machine	inti6206
Model Name	AMD EPYC 7763 64-Core Processor
Architecture	x86_64	Micro Architecture	ZEN_V3
Cache Size	512 KB	Number of Cores	64
OS Version	Linux 4.18.0-305.88.1.el8_4.x86_64 #1 SMP Thu Apr 6 10:22:46 EDT 2023
Architecture used during static analysis	x86_64	Micro Architecture used during static analysis	ZEN_V3
Frequency Driver	acpi-cpufreq	Frequency Governor	performance
Huge Pages	always	Hyperthreading	on
Number of sockets	2	Number of cores per socket	64
Compilation Options	libgromacs_mpi.so.7: GNU C++17 12.2.0 -mavx2 -mfma -mtune=generic -march=x86-64 -g -g -O2 -std=c++17 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp -fexceptions
Comments	GROMACS compiled with gcc 12.2.0 + OpenMPI, Zen 3, OV1, 10000 steps, 52 cores

Configuration Summary

Dataset
Run Command	<executable> mdrun -s ion_channel.tpr -nsteps 10000 -pin on -deffnm gcc
MPI Command	ccc_mprun -p milan-bxi -T 1200 -n <number_processes> -x -E --enable_perf -c 128
Number Processes	1
Number Nodes	1
Number Processes per Nodes	1
Filter	Not Used
Profile Start	Not Used
Maximal Path Number	4
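The Run Command and MPI Command entries are templates containing the placeholders `<executable>` and `<number_processes>`. The sketch below shows one way the two could be expanded into a single argument list, using the application path and process count from this report. It is an illustration only: how MAQAO performs the substitution internally, and how it wraps the executable for profiling, is not shown in the report, and the helper name `build_command` is hypothetical.

```python
import shlex

# Templates copied from the Configuration Summary.
MPI_TEMPLATE = (
    "ccc_mprun -p milan-bxi -T 1200 -n <number_processes> "
    "-x -E --enable_perf -c 128"
)
RUN_TEMPLATE = (
    "<executable> mdrun -s ion_channel.tpr -nsteps 10000 -pin on -deffnm gcc"
)

# Application path from the Experiment Summary.
EXECUTABLE = (
    "/ccc/work/cont001/ocre/oserete/"
    "gromacs-2022.4-install-gcc-ompi/bin/gmx_mpi"
)


def build_command(mpi_template, run_template, executable, number_processes):
    """Fill in both placeholders and return the command as an argv list."""
    mpi = mpi_template.replace("<number_processes>", str(number_processes))
    run = run_template.replace("<executable>", shlex.quote(executable))
    return shlex.split(f"{mpi} {run}")


argv = build_command(MPI_TEMPLATE, RUN_TEMPLATE, EXECUTABLE, 1)
print(" ".join(argv))
```

With one process and `-c 128`, the single MPI rank is allotted a full node, which is consistent with the 1 process and 52 OpenMP threads observed in the Experiment Summary.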
