* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 10524)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 10529)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 1
Number of walkers per rank = 1
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.1506 0.1506 1 0.150640481
  ParticleSet:::update 0.0000 0.0000 1 0.000004870
Total 34.5353 0.0003 1 34.535317218
  Diffusion 23.8612 0.0165 5 4.772236134
    Complete Updates 0.2103 0.0000 5 0.042050944
      DeterminantRef::update 0.2102 0.2102 10 0.021022682
    Current Gradient 0.5728 0.0098 30720 0.000018647
      DeterminantRef::ratio 0.5580 0.5580 30720 0.000018165
      OneBodyJastrowRef 0.0028 0.0028 30720 0.000000090
      TwoBodyJastrowRef 0.0023 0.0023 30720 0.000000073
    Kinetic Energy 0.1192 0.1189 5 0.023832527
      OneBodyJastrowRef 0.0002 0.0002 5 0.000036379
      TwoBodyJastrowRef 0.0001 0.0001 5 0.000022498
    New Gradient 4.1935 0.0138 30720 0.000136507
      DeterminantRef::ratio 0.0487 0.0487 30720 0.000001584
      DeterminantRef::spovgl 3.7604 0.2605 30720 0.000122407
        Single-Particle Orbitals 3.4999 3.4999 30720 0.000113929
      OneBodyJastrowRef 0.0263 0.0263 30720 0.000000856
      TwoBodyJastrowRef 0.3444 0.3444 30720 0.000011211
    ParticleSet:::acceptMove 0.4212 0.0045 15371 0.000027399
      DTAAOMPTarget::update_e_e 0.4081 0.4081 15371 0.000026550
      DTABOMPTarget::update_ion_e 0.0085 0.0085 15371 0.000000554
    ParticleSet:::computeNewPosDT 0.4734 0.0071 30720 0.000015411
      DTAAOMPTarget::move_e_e 0.4191 0.4191 30720 0.000013641
      DTABOMPTarget::move_ion_e 0.0473 0.0473 30720 0.000001540
    ParticleSet:::donePbyP 0.0000 0.0000 5 0.000002246
    Update 17.8543 0.0078 15371 0.001161556
      DeterminantRef::update 17.5249 17.5249 15371 0.001140129
      OneBodyJastrowRef 0.0016 0.0016 15371 0.000000105
      TwoBodyJastrowRef 0.3199 0.3199 15371 0.000020814
  Initialization 4.4457 0.1665 1 4.445664048
    DeterminantRef::inverse 3.4064 3.4064 2 1.703216794
    DeterminantRef::spovgl 0.7478 0.0537 2 0.373914075
      Single-Particle Orbitals 0.6941 0.6941 6144 0.000112973
    OneBodyJastrowRef 0.0055 0.0055 1 0.005498877
    ParticleSet:::update 0.0599 0.0091 2 0.029962859
      DTAAOMPTarget::evaluate_e_e 0.0463 0.0463 1 0.046334390
      DTABOMPTarget::evaluate_ion_e 0.0045 0.0001 1 0.004516940
        DTABOMPTarget::offload_ion_e 0.0045 0.0045 1 0.004455161
    TwoBodyJastrowRef 0.0595 0.0595 1 0.059496677
  Pseudopotential 6.2282 0.0317 5 1.245634114
    DeterminantRef::spoval 4.6897 0.1534 10215 0.000459101
      Single-Particle Orbitals 4.5363 4.5363 122580 0.000037007
    OneBodyJastrowRef 0.0123 0.0123 10215 0.000001202
    ParticleSet:::update 0.8996 0.0058 10215 0.000088069
      DTABOMPTarget::evaluate_e_virtual 0.8171 0.0017 10215 0.000079986
        DTABOMPTarget::offload_e_virtual 0.8153 0.8153 10215 0.000079815
      DTABOMPTarget::evaluate_ion_virtual 0.0768 0.0018 10215 0.000007515
        DTABOMPTarget::offload_ion_virtual 0.0750 0.0750 10215 0.000007341
    TwoBodyJastrowRef 0.5948 0.5948 10215 0.000058229
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 1.34314e+10
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 1.94398e+10
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 1.21219e+07
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 10524)
* Warning: (host ins01.benchmarkcenter.megware.com, process 10524) Observed more threads (2) than expected (1): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=2.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 10529)
* Warning: (host ins01.benchmarkcenter.megware.com, process 10529) Observed more threads (2) than expected (1): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=2.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_0
To display your profiling results:
##########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_0 #
##########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 10590)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 10595)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 2
Number of walkers per rank = 2
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.0821 0.0821 1 0.082144740
  ParticleSet:::update 0.0000 0.0000 1 0.000005570
Total 35.0532 0.0232 1 35.053229609
  Diffusion 23.7142 0.0185 5 4.742836491
    Complete Updates 0.2149 0.0000 5 0.042978076
      DeterminantRef::update 0.2149 0.2149 10 0.021486858
    Current Gradient 0.5859 0.0113 30720 0.000019074
      DeterminantRef::ratio 0.5698 0.5698 30720 0.000018548
      OneBodyJastrowRef 0.0027 0.0027 30720 0.000000087
      TwoBodyJastrowRef 0.0021 0.0021 30720 0.000000070
    Kinetic Energy 0.1224 0.1222 5 0.024482301
      OneBodyJastrowRef 0.0001 0.0001 5 0.000022316
      TwoBodyJastrowRef 0.0001 0.0001 5 0.000021870
    New Gradient 3.8729 0.0156 30720 0.000126071
      DeterminantRef::ratio 0.0484 0.0484 30720 0.000001574
      DeterminantRef::spovgl 3.4402 0.2651 30720 0.000111986
        Single-Particle Orbitals 3.1751 3.1751 30720 0.000103357
      OneBodyJastrowRef 0.0270 0.0270 30720 0.000000878
      TwoBodyJastrowRef 0.3417 0.3417 30720 0.000011123
    ParticleSet:::acceptMove 0.4203 0.0054 15371 0.000027345
      DTAAOMPTarget::update_e_e 0.4062 0.4062 15371 0.000026426
      DTABOMPTarget::update_ion_e 0.0087 0.0087 15371 0.000000567
    ParticleSet:::computeNewPosDT 0.4914 0.0072 30720 0.000015997
      DTAAOMPTarget::move_e_e 0.4298 0.4298 30720 0.000013992
      DTABOMPTarget::move_ion_e 0.0544 0.0544 30720 0.000001770
    ParticleSet:::donePbyP 0.0000 0.0000 5 0.000001840
    Update 17.9878 0.0080 15371 0.001170245
      DeterminantRef::update 17.6581 17.6581 15371 0.001148791
      OneBodyJastrowRef 0.0015 0.0015 15371 0.000000097
      TwoBodyJastrowRef 0.3203 0.3203 15371 0.000020836
  Initialization 4.4179 0.1878 1 4.417923617
    DeterminantRef::inverse 3.4000 3.4000 2 1.700014999
    DeterminantRef::spovgl 0.7021 0.0541 2 0.351029833
      Single-Particle Orbitals 0.6480 0.6480 6144 0.000105461
    OneBodyJastrowRef 0.0062 0.0062 1 0.006239878
    ParticleSet:::update 0.0615 0.0099 2 0.030757753
      DTAAOMPTarget::evaluate_e_e 0.0464 0.0464 1 0.046381683
      DTABOMPTarget::evaluate_ion_e 0.0052 0.0001 1 0.005217510
        DTABOMPTarget::offload_ion_e 0.0052 0.0052 1 0.005160551
    TwoBodyJastrowRef 0.0603 0.0603 1 0.060253157
  Pseudopotential 6.8980 0.0333 5 1.379590129
    DeterminantRef::spoval 5.3548 0.1542 10215 0.000524212
      Single-Particle Orbitals 5.2006 5.2006 122580 0.000042426
    OneBodyJastrowRef 0.0127 0.0127 10215 0.000001245
    ParticleSet:::update 0.9001 0.0060 10215 0.000088115
      DTABOMPTarget::evaluate_e_virtual 0.8172 0.0019 10215 0.000079996
        DTABOMPTarget::offload_e_virtual 0.8153 0.8153 10215 0.000079810
      DTABOMPTarget::evaluate_ion_virtual 0.0770 0.0018 10215 0.000007537
        DTABOMPTarget::offload_ion_virtual 0.0752 0.0752 10215 0.000007358
    TwoBodyJastrowRef 0.5970 0.5970 10215 0.000058443
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 2.64658e+10
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 3.91206e+10
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 2.18898e+07
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 10590)
* Warning: (host ins01.benchmarkcenter.megware.com, process 10590) Observed more threads (3) than expected (2): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=3.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 10595)
* Warning: (host ins01.benchmarkcenter.megware.com, process 10595) Observed more threads (3) than expected (2): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=3.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_1
To display your profiling results:
##########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_1 #
##########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 10670)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 10675)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 4
Number of walkers per rank = 4
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.0467 0.0467 1 0.046659425
  ParticleSet:::update 0.0000 0.0000 1 0.000004320
Total 37.9215 0.7275 1 37.921520932
  Diffusion 24.1738 0.0216 5 4.834754682
    Complete Updates 0.2205 0.0000 5 0.044093560
      DeterminantRef::update 0.2204 0.2204 10 0.022043915
    Current Gradient 0.5903 0.0129 30720 0.000019215
      DeterminantRef::ratio 0.5723 0.5723 30720 0.000018630
      OneBodyJastrowRef 0.0029 0.0029 30720 0.000000095
      TwoBodyJastrowRef 0.0022 0.0022 30720 0.000000070
    Kinetic Energy 0.1328 0.1326 5 0.026564228
      OneBodyJastrowRef 0.0001 0.0001 5 0.000026509
      TwoBodyJastrowRef 0.0001 0.0001 5 0.000022528
    New Gradient 4.2954 0.0172 30720 0.000139823
      DeterminantRef::ratio 0.0489 0.0489 30720 0.000001591
      DeterminantRef::spovgl 3.8560 0.2760 30720 0.000125522
        Single-Particle Orbitals 3.5801 3.5801 30720 0.000116539
      OneBodyJastrowRef 0.0289 0.0289 30720 0.000000940
      TwoBodyJastrowRef 0.3444 0.3444 30720 0.000011210
    ParticleSet:::acceptMove 0.4254 0.0052 15371 0.000027674
      DTAAOMPTarget::update_e_e 0.4114 0.4114 15371 0.000026764
      DTABOMPTarget::update_ion_e 0.0088 0.0088 15371 0.000000573
    ParticleSet:::computeNewPosDT 0.4972 0.0072 30720 0.000016186
      DTAAOMPTarget::move_e_e 0.4385 0.4385 30720 0.000014273
      DTABOMPTarget::move_ion_e 0.0516 0.0516 30720 0.000001680
    ParticleSet:::donePbyP 0.0000 0.0000 5 0.000002402
    Update 17.9907 0.0086 15371 0.001170429
      DeterminantRef::update 17.6551 17.6551 15371 0.001148595
      OneBodyJastrowRef 0.0017 0.0017 15371 0.000000108
      TwoBodyJastrowRef 0.3253 0.3253 15371 0.000021163
  Initialization 4.5935 0.3649 1 4.593472626
    DeterminantRef::inverse 3.3310 3.3310 2 1.665502763
    DeterminantRef::spovgl 0.7673 0.0532 2 0.383671976
      Single-Particle Orbitals 0.7141 0.7141 6144 0.000116235
    OneBodyJastrowRef 0.0058 0.0058 1 0.005758439
    ParticleSet:::update 0.0654 0.0119 2 0.032705549
      DTAAOMPTarget::evaluate_e_e 0.0465 0.0465 1 0.046548976
      DTABOMPTarget::evaluate_ion_e 0.0069 0.0002 1 0.006921424
        DTABOMPTarget::offload_ion_e 0.0068 0.0068 1 0.006759738
    TwoBodyJastrowRef 0.0591 0.0591 1 0.059090752
  Pseudopotential 8.4267 0.0373 5 1.685344981
    DeterminantRef::spoval 6.8417 0.1562 10215 0.000669769
      Single-Particle Orbitals 6.6855 6.6855 122580 0.000054540
    OneBodyJastrowRef 0.0130 0.0130 10215 0.000001276
    ParticleSet:::update 0.9377 0.0059 10215 0.000091797
      DTABOMPTarget::evaluate_e_virtual 0.8481 0.0019 10215 0.000083021
        DTABOMPTarget::offload_e_virtual 0.8461 0.8461 10215 0.000082832
      DTABOMPTarget::evaluate_ion_virtual 0.0838 0.0018 10215 0.000008204
        DTABOMPTarget::offload_ion_virtual 0.0820 0.0820 10215 0.000008026
    TwoBodyJastrowRef 0.5970 0.5970 10215 0.000058445
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 4.8928e+10
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 7.67537e+10
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 3.58372e+07
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 10675)
* Warning: (host ins01.benchmarkcenter.megware.com, process 10675) Observed more threads (5) than expected (4): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=5.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 10670)
* Warning: (host ins01.benchmarkcenter.megware.com, process 10670) Observed more threads (5) than expected (4): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=5.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_2
To display your profiling results:
##########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_2 #
##########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 10736)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 10742)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 8
Number of walkers per rank = 8
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.0277 0.0277 1 0.027731689
  ParticleSet:::update 0.0000 0.0000 1 0.000003930
Total 39.0489 1.4427 1 39.048934166
  Diffusion 23.9821 0.0225 5 4.796417655
    Complete Updates 0.2178 0.0000 5 0.043566485
      DeterminantRef::update 0.2178 0.2178 10 0.021781294
    Current Gradient 0.5933 0.0141 30720 0.000019314
      DeterminantRef::ratio 0.5737 0.5737 30720 0.000018674
      OneBodyJastrowRef 0.0031 0.0031 30720 0.000000099
      TwoBodyJastrowRef 0.0025 0.0025 30720 0.000000082
    Kinetic Energy 0.1283 0.1281 5 0.025659799
      OneBodyJastrowRef 0.0001 0.0001 5 0.000022730
      TwoBodyJastrowRef 0.0001 0.0001 5 0.000022632
    New Gradient 3.9859 0.0178 30720 0.000129749
      DeterminantRef::ratio 0.0486 0.0486 30720 0.000001581
      DeterminantRef::spovgl 3.5465 0.2606 30720 0.000115445
        Single-Particle Orbitals 3.2859 3.2859 30720 0.000106963
      OneBodyJastrowRef 0.0306 0.0306 30720 0.000000997
      TwoBodyJastrowRef 0.3424 0.3424 30720 0.000011147
    ParticleSet:::acceptMove 0.4328 0.0047 15371 0.000028157
      DTAAOMPTarget::update_e_e 0.4193 0.4193 15371 0.000027279
      DTABOMPTarget::update_ion_e 0.0088 0.0088 15371 0.000000572
    ParticleSet:::computeNewPosDT 0.4986 0.0074 30720 0.000016230
      DTAAOMPTarget::move_e_e 0.4171 0.4171 30720 0.000013577
      DTABOMPTarget::move_ion_e 0.0741 0.0741 30720 0.000002414
    ParticleSet:::donePbyP 0.0000 0.0000 5 0.000002314
    Update 18.1028 0.0090 15371 0.001177726
      DeterminantRef::update 17.7679 17.7679 15371 0.001155940
      OneBodyJastrowRef 0.0017 0.0017 15371 0.000000113
      TwoBodyJastrowRef 0.3241 0.3241 15371 0.000021088
  Initialization 4.6790 0.4955 1 4.678958897
    DeterminantRef::inverse 3.3333 3.3333 2 1.666657822
    DeterminantRef::spovgl 0.7193 0.0543 2 0.359636827
      Single-Particle Orbitals 0.6650 0.6650 6144 0.000108236
    OneBodyJastrowRef 0.0059 0.0059 1 0.005860267
    ParticleSet:::update 0.0652 0.0132 2 0.032587268
      DTAAOMPTarget::evaluate_e_e 0.0464 0.0464 1 0.046408928
      DTABOMPTarget::evaluate_ion_e 0.0056 0.0002 1 0.005551654
        DTABOMPTarget::offload_ion_e 0.0054 0.0054 1 0.005365528
    TwoBodyJastrowRef 0.0599 0.0599 1 0.059851527
  Pseudopotential 8.9452 0.0433 5 1.789038936
    DeterminantRef::spoval 7.3072 0.1580 10215 0.000715337
      Single-Particle Orbitals 7.1492 7.1492 122580 0.000058323
    OneBodyJastrowRef 0.0135 0.0135 10215 0.000001323
    ParticleSet:::update 0.9804 0.0072 10215 0.000095979
      DTABOMPTarget::evaluate_e_virtual 0.8855 0.0024 10215 0.000086683
        DTABOMPTarget::offload_e_virtual 0.8831 0.8831 10215 0.000086451
      DTABOMPTarget::evaluate_ion_virtual 0.0878 0.0023 10215 0.000008596
        DTABOMPTarget::offload_ion_virtual 0.0855 0.0855 10215 0.000008373
    TwoBodyJastrowRef 0.6007 0.6007 10215 0.000058809
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 9.50308e+10
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 1.54734e+11
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 6.752e+07
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 10736)
* Warning: (host ins01.benchmarkcenter.megware.com, process 10736) Observed more threads (9) than expected (8): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=9.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 10742)
* Warning: (host ins01.benchmarkcenter.megware.com, process 10742) Observed more threads (9) than expected (8): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=9.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_3
To display your profiling results:
##########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_3 #
##########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 10842)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 10847)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 16
Number of walkers per rank = 16
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.0390 0.0390 1 0.038969086
  ParticleSet:::update 0.0000 0.0000 1 0.000004110
Total 60.8123 2.5198 1 60.812255605
  Diffusion 39.7634 0.0242 5 7.952670049
    Complete Updates 0.6626 0.0000 5 0.132518393
      DeterminantRef::update 0.6626 0.6626 10 0.066255673
    Current Gradient 0.6791 0.0159 30720 0.000022106
      DeterminantRef::ratio 0.6576 0.6576 30720 0.000021406
      OneBodyJastrowRef 0.0031 0.0031 30720 0.000000100
      TwoBodyJastrowRef 0.0026 0.0026 30720 0.000000083
    Kinetic Energy 0.3265 0.3262 5 0.065301688
      OneBodyJastrowRef 0.0002 0.0002 5 0.000038933
      TwoBodyJastrowRef 0.0001 0.0001 5 0.000026736
    New Gradient 5.5642 0.0185 30720 0.000181127
      DeterminantRef::ratio 0.0496 0.0496 30720 0.000001616
      DeterminantRef::spovgl 5.1077 0.2809 30720 0.000166266
        Single-Particle Orbitals 4.8267 4.8267 30720 0.000157120
      OneBodyJastrowRef 0.0302 0.0302 30720 0.000000982
      TwoBodyJastrowRef 0.3582 0.3582 30720 0.000011661
    ParticleSet:::acceptMove 0.7262 0.0055 15371 0.000047247
      DTAAOMPTarget::update_e_e 0.7100 0.7100 15371 0.000046191
      DTABOMPTarget::update_ion_e 0.0107 0.0107 15371 0.000000698
    ParticleSet:::computeNewPosDT 0.4976 0.0076 30720 0.000016198
      DTAAOMPTarget::move_e_e 0.4280 0.4280 30720 0.000013931
      DTABOMPTarget::move_ion_e 0.0621 0.0621 30720 0.000002021
    ParticleSet:::donePbyP 0.0000 0.0000 5 0.000002712
    Update 31.2828 0.0106 15371 0.002035186
      DeterminantRef::update 30.8737 30.8737 15371 0.002008571
      OneBodyJastrowRef 0.0018 0.0018 15371 0.000000115
      TwoBodyJastrowRef 0.3968 0.3968 15371 0.000025813
  Initialization 5.1519 0.4056 1 5.151923367
    DeterminantRef::inverse 3.4984 3.4984 2 1.749183001
    DeterminantRef::spovgl 1.0602 0.0647 2 0.530083384
      Single-Particle Orbitals 0.9954 0.9954 6144 0.000162016
    OneBodyJastrowRef 0.0055 0.0055 1 0.005517164
    ParticleSet:::update 0.1231 0.0213 2 0.061530898
      DTAAOMPTarget::evaluate_e_e 0.0915 0.0915 1 0.091466049
      DTABOMPTarget::evaluate_ion_e 0.0103 0.0001 1 0.010287650
        DTABOMPTarget::offload_ion_e 0.0102 0.0102 1 0.010218093
    TwoBodyJastrowRef 0.0592 0.0592 1 0.059207473
  Pseudopotential 13.3772 0.0646 5 2.675446116
    DeterminantRef::spoval 10.8740 0.1989 10215 0.001064513
      Single-Particle Orbitals 10.6751 10.6751 122580 0.000087087
    OneBodyJastrowRef 0.0259 0.0259 10215 0.000002531
    ParticleSet:::update 1.7148 0.0127 10215 0.000167868
      DTABOMPTarget::evaluate_e_virtual 1.5008 0.0035 10215 0.000146922
        DTABOMPTarget::offload_e_virtual 1.4973 1.4973 10215 0.000146579
      DTABOMPTarget::evaluate_ion_virtual 0.2013 0.0045 10215 0.000019703
        DTABOMPTarget::offload_ion_virtual 0.1968 0.1968 10215 0.000019263
    TwoBodyJastrowRef 0.6980 0.6980 10215 0.000068332
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 1.22043e+11
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 1.86647e+11
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 9.02997e+07
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 10847)
* Warning: (host ins01.benchmarkcenter.megware.com, process 10847) Observed more threads (17) than expected (16): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=17.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 10842)
* Warning: (host ins01.benchmarkcenter.megware.com, process 10842) Observed more threads (17) than expected (16): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=17.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_4
To display your profiling results:
##########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_4 #
##########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 10980)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 10985)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 32
Number of walkers per rank = 32
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.0471 0.0471 1 0.047076992
ParticleSet:::update 0.0000 0.0000 1 0.000004590
Total 72.4020 2.4148 1 72.401972073
Diffusion 43.6819 0.0284 5 8.736384735
Complete Updates 0.6191 0.0001 5 0.123820062
DeterminantRef::update 0.6190 0.6190 10 0.061904442
Current Gradient 0.7470 0.0208 30720 0.000024315
DeterminantRef::ratio 0.7204 0.7204 30720 0.000023452
OneBodyJastrowRef 0.0033 0.0033 30720 0.000000108
TwoBodyJastrowRef 0.0024 0.0024 30720 0.000000077
Kinetic Energy 0.3696 0.3690 5 0.073920943
OneBodyJastrowRef 0.0004 0.0004 5 0.000078303
TwoBodyJastrowRef 0.0002 0.0002 5 0.000037685
New Gradient 4.9327 0.0229 30720 0.000160569
DeterminantRef::ratio 0.0511 0.0511 30720 0.000001662
DeterminantRef::spovgl 4.4337 0.2925 30720 0.000144327
Single-Particle Orbitals 4.1413 4.1413 30720 0.000134807
OneBodyJastrowRef 0.0358 0.0358 30720 0.000001165
TwoBodyJastrowRef 0.3892 0.3892 30720 0.000012669
ParticleSet:::acceptMove 0.9819 0.0066 15371 0.000063880
DTAAOMPTarget::update_e_e 0.9620 0.9620 15371 0.000062586
DTABOMPTarget::update_ion_e 0.0133 0.0133 15371 0.000000865
ParticleSet:::computeNewPosDT 0.5195 0.0079 30720 0.000016909
DTAAOMPTarget::move_e_e 0.4447 0.4447 30720 0.000014476
DTABOMPTarget::move_ion_e 0.0669 0.0669 30720 0.000002177
ParticleSet:::donePbyP 0.0000 0.0000 5 0.000006526
Update 35.4838 0.0133 15371 0.002308487
DeterminantRef::update 35.0338 35.0338 15371 0.002279213
OneBodyJastrowRef 0.0018 0.0018 15371 0.000000118
TwoBodyJastrowRef 0.4348 0.4348 15371 0.000028289
Initialization 5.9679 0.9192 1 5.967853675
DeterminantRef::inverse 3.5327 3.5327 2 1.766365794
DeterminantRef::spovgl 1.3056 0.1092 2 0.652819295
Single-Particle Orbitals 1.1964 1.1964 6144 0.000194726
OneBodyJastrowRef 0.0075 0.0075 1 0.007520984
ParticleSet:::update 0.1395 0.0319 2 0.069737106
DTAAOMPTarget::evaluate_e_e 0.0913 0.0913 1 0.091260234
DTABOMPTarget::evaluate_ion_e 0.0163 0.0001 1 0.016271872
DTABOMPTarget::offload_ion_e 0.0162 0.0162 1 0.016174114
TwoBodyJastrowRef 0.0633 0.0633 1 0.063312495
Pseudopotential 20.3374 0.1086 5 4.067480435
DeterminantRef::spoval 16.9228 0.4312 10215 0.001656663
Single-Particle Orbitals 16.4916 16.4916 122580 0.000134537
OneBodyJastrowRef 0.0494 0.0494 10215 0.000004836
ParticleSet:::update 2.4007 0.0240 10215 0.000235019
DTABOMPTarget::evaluate_e_virtual 2.0929 0.0058 10215 0.000204888
DTABOMPTarget::offload_e_virtual 2.0871 2.0871 10215 0.000204315
DTABOMPTarget::evaluate_ion_virtual 0.2838 0.0070 10215 0.000027787
DTABOMPTarget::offload_ion_virtual 0.2769 0.2769 10215 0.000027106
TwoBodyJastrowRef 0.8558 0.8558 10215 0.000083781
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 2.05014e+11
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 3.39807e+11
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 1.18792e+08
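The throughput figures above can be re-derived from the timer table. The following is a minimal sketch, not miniqmc's own code: the helper name `throughput` is hypothetical, and it assumes that N_walkers counts walkers across all MPI ranks (walkers per rank times ranks), which is the reading that reproduces the printed values for this 32-thread run.

```python
def throughput(walkers_per_rank, ranks, n_elec, seconds, power):
    """Return N_walkers * N_elec**power / seconds.

    Assumption: N_walkers is the walker count summed over all MPI ranks.
    """
    n_walkers = walkers_per_rank * ranks
    return n_walkers * n_elec**power / seconds


# Inputs copied from the 32-thread run: 2 ranks, 32 walkers per rank,
# 6144 electrons, and the inclusive times of Total, Diffusion, Pseudopotential.
total = throughput(32, 2, 6144, 72.4020, 3)
diffusion = throughput(32, 2, 6144, 43.6819, 3)
pseudopotential = throughput(32, 2, 6144, 20.3374, 2)

print(f"Total           {total:.5e}")
print(f"Diffusion       {diffusion:.5e}")
print(f"Pseudopotential {pseudopotential:.5e}")
```

The results agree with the reported 2.05014e+11, 3.39807e+11 and 1.18792e+08 to within the rounding of the four-decimal timer values.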
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 10985)
* Warning: (host ins01.benchmarkcenter.megware.com, process 10985) Observed more threads (33) than expected (32): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=33.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 10980)
* Warning: (host ins01.benchmarkcenter.megware.com, process 10980) Observed more threads (33) than expected (32): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=33.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_5
To display your profiling results:
##########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_5 #
##########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11192)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11197)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 64
Number of walkers per rank = 64
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.0819 0.0819 1 0.081878342
ParticleSet:::update 0.0000 0.0000 1 0.000004240
Total 144.4898 9.0358 1 144.489822856
Diffusion 85.1058 0.0385 5 17.021163321
Complete Updates 1.4654 0.0001 5 0.293073003
DeterminantRef::update 1.4653 1.4653 10 0.146530283
Current Gradient 1.1890 0.0294 30720 0.000038705
DeterminantRef::ratio 1.1519 1.1519 30720 0.000037496
OneBodyJastrowRef 0.0044 0.0044 30720 0.000000144
TwoBodyJastrowRef 0.0033 0.0033 30720 0.000000107
Kinetic Energy 0.7818 0.7809 5 0.156365937
OneBodyJastrowRef 0.0006 0.0006 5 0.000122264
TwoBodyJastrowRef 0.0003 0.0003 5 0.000069347
New Gradient 8.1921 0.0348 30720 0.000266669
DeterminantRef::ratio 0.0669 0.0669 30720 0.000002179
DeterminantRef::spovgl 7.4446 0.4268 30720 0.000242337
Single-Particle Orbitals 7.0178 7.0178 30720 0.000228443
OneBodyJastrowRef 0.0724 0.0724 30720 0.000002356
TwoBodyJastrowRef 0.5734 0.5734 30720 0.000018664
ParticleSet:::acceptMove 1.8804 0.0096 15371 0.000122333
DTAAOMPTarget::update_e_e 1.8445 1.8445 15371 0.000120001
DTABOMPTarget::update_ion_e 0.0263 0.0263 15371 0.000001710
ParticleSet:::computeNewPosDT 0.7151 0.0112 30720 0.000023279
DTAAOMPTarget::move_e_e 0.5901 0.5901 30720 0.000019207
DTABOMPTarget::move_ion_e 0.1139 0.1139 30720 0.000003708
ParticleSet:::donePbyP 0.0000 0.0000 5 0.000008480
Update 70.8435 0.0156 15371 0.004608907
DeterminantRef::update 70.0874 70.0874 15371 0.004559717
OneBodyJastrowRef 0.0028 0.0028 15371 0.000000183
TwoBodyJastrowRef 0.7377 0.7377 15371 0.000047993
Initialization 8.7707 2.2029 1 8.770723829
DeterminantRef::inverse 4.2089 4.2089 2 2.104468837
DeterminantRef::spovgl 2.0215 0.1668 2 1.010753705
Single-Particle Orbitals 1.8547 1.8547 6144 0.000301866
OneBodyJastrowRef 0.0073 0.0073 1 0.007268682
ParticleSet:::update 0.2685 0.0847 2 0.134233576
DTAAOMPTarget::evaluate_e_e 0.1518 0.1518 1 0.151788628
DTABOMPTarget::evaluate_ion_e 0.0320 0.0015 1 0.032019418
DTABOMPTarget::offload_ion_e 0.0305 0.0305 1 0.030495178
TwoBodyJastrowRef 0.0617 0.0617 1 0.061657174
Pseudopotential 41.5775 0.2524 5 8.315491395
DeterminantRef::spoval 33.3096 1.1568 10215 0.003260849
Single-Particle Orbitals 32.1527 32.1527 122580 0.000262300
OneBodyJastrowRef 0.1266 0.1266 10215 0.000012392
ParticleSet:::update 5.9741 0.0515 10215 0.000584835
DTABOMPTarget::evaluate_e_virtual 5.3686 0.0158 10215 0.000525558
DTABOMPTarget::offload_e_virtual 5.3528 5.3528 10215 0.000524011
DTABOMPTarget::evaluate_ion_virtual 0.5540 0.0177 10215 0.000054235
DTABOMPTarget::offload_ion_virtual 0.5363 0.5363 10215 0.000052500
TwoBodyJastrowRef 1.9148 1.9148 10215 0.000187450
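The Inclusive_time and Exclusive_time columns can be cross-checked against each other. The sketch below assumes that the three rows printed directly after "Update" in the 64-thread table (DeterminantRef::update, OneBodyJastrowRef, TwoBodyJastrowRef) are its nested children; the indentation that would confirm this is not preserved in this log, so the nesting is an inference from the numbers.

```python
# Values copied from the 64-thread stack timer profile.
update_inclusive = 70.8435
update_exclusive = 0.0156

# Assumed children of "Update" (nesting inferred, not shown in the log).
children_inclusive = {
    "DeterminantRef::update": 70.0874,
    "OneBodyJastrowRef": 0.0028,
    "TwoBodyJastrowRef": 0.7377,
}

# Exclusive time should be what remains of the parent's inclusive time
# once the children's inclusive times are subtracted.
residual = update_inclusive - sum(children_inclusive.values())

print(f"inclusive - children = {residual:.4f} s")
print(f"reported exclusive   = {update_exclusive:.4f} s")
```

Under that assumption the residual matches the reported exclusive time to within the four-decimal rounding of the table, which also shows that almost all of "Update" is spent in DeterminantRef::update.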
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 2.0546e+11
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 3.48822e+11
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 1.16213e+08
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11197)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11197) Observed more threads (65) than expected (64): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=65.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11192)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11192) Observed more threads (65) than expected (64): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=65.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_6
To display your profiling results:
##########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_6 #
##########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11529)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11534)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 96
Number of walkers per rank = 96
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.0975 0.0975 1 0.097513502
ParticleSet:::update 0.0000 0.0000 1 0.000005160
Total 204.9459 16.3330 1 204.945875173
Diffusion 118.0819 0.0499 5 23.616389721
Complete Updates 2.2506 0.0001 5 0.450122448
DeterminantRef::update 2.2505 2.2505 10 0.225048685
Current Gradient 1.8085 0.0402 30720 0.000058870
DeterminantRef::ratio 1.7543 1.7543 30720 0.000057107
OneBodyJastrowRef 0.0085 0.0085 30720 0.000000278
TwoBodyJastrowRef 0.0055 0.0055 30720 0.000000178
Kinetic Energy 1.2516 1.2500 5 0.250310493
OneBodyJastrowRef 0.0009 0.0009 5 0.000188035
TwoBodyJastrowRef 0.0006 0.0006 5 0.000119160
New Gradient 11.0659 0.0535 30720 0.000360217
DeterminantRef::ratio 0.0899 0.0899 30720 0.000002927
DeterminantRef::spovgl 9.9723 0.5091 30720 0.000324620
Single-Particle Orbitals 9.4632 9.4632 30720 0.000308047
OneBodyJastrowRef 0.1156 0.1156 30720 0.000003762
TwoBodyJastrowRef 0.8346 0.8346 30720 0.000027167
ParticleSet:::acceptMove 2.6355 0.0185 15371 0.000171459
DTAAOMPTarget::update_e_e 2.5744 2.5744 15371 0.000167485
DTABOMPTarget::update_ion_e 0.0426 0.0426 15371 0.000002769
ParticleSet:::computeNewPosDT 1.1022 0.0189 30720 0.000035878
DTAAOMPTarget::move_e_e 0.9338 0.9338 30720 0.000030398
DTABOMPTarget::move_ion_e 0.1494 0.1494 30720 0.000004864
ParticleSet:::donePbyP 0.0000 0.0000 5 0.000009194
Update 97.9178 0.0255 15371 0.006370294
DeterminantRef::update 96.8341 96.8341 15371 0.006299792
OneBodyJastrowRef 0.0053 0.0053 15371 0.000000343
TwoBodyJastrowRef 1.0529 1.0529 15371 0.000068497
Initialization 12.1054 3.1356 1 12.105418958
DeterminantRef::inverse 6.3592 6.3592 2 3.179622952
DeterminantRef::spovgl 2.0605 0.1380 2 1.030261826
Single-Particle Orbitals 1.9226 1.9226 6144 0.000312917
OneBodyJastrowRef 0.0056 0.0056 1 0.005555297
ParticleSet:::update 0.4855 0.1014 2 0.242737742
DTAAOMPTarget::evaluate_e_e 0.3109 0.3109 1 0.310943563
DTABOMPTarget::evaluate_ion_e 0.0731 0.0133 1 0.073110754
DTABOMPTarget::offload_ion_e 0.0598 0.0598 1 0.059790068
TwoBodyJastrowRef 0.0590 0.0590 1 0.058994009
Pseudopotential 58.4255 0.3037 5 11.685109360
DeterminantRef::spoval 48.0185 1.8013 10215 0.004700788
Single-Particle Orbitals 46.2173 46.2173 122580 0.000377037
OneBodyJastrowRef 0.1710 0.1710 10215 0.000016740
ParticleSet:::update 7.4549 0.0658 10215 0.000729798
DTABOMPTarget::evaluate_e_virtual 6.6616 0.0262 10215 0.000652142
DTABOMPTarget::offload_e_virtual 6.6354 6.6354 10215 0.000649577
DTABOMPTarget::evaluate_ion_virtual 0.7274 0.0224 10215 0.000071212
DTABOMPTarget::offload_ion_virtual 0.7050 0.7050 10215 0.000069020
TwoBodyJastrowRef 2.4775 2.4775 10215 0.000242532
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 2.17278e+11
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 3.77113e+11
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 1.24051e+08
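Since the walker count per rank grows with the OpenMP thread count in these runs, the total throughput values can be compared across the 32-, 64- and 96-thread runs. The sketch below is one way to do that; the "efficiency" metric is an illustrative definition introduced here, not something the tool reports, and it assumes ideal throughput would grow in proportion to the number of threads (and walkers) per rank.

```python
# Threads per rank -> reported "Total throughput" for each run in this log.
total_throughput = {
    32: 2.05014e11,
    64: 2.0546e11,
    96: 2.17278e11,
}

base_threads = 32
base_value = total_throughput[base_threads]

efficiency = {}
for threads, value in total_throughput.items():
    gain = value / base_value          # measured throughput gain vs. 32 threads
    ideal = threads / base_threads     # gain if throughput scaled linearly
    efficiency[threads] = gain / ideal
    print(f"{threads:3d} threads: gain {gain:.3f}, "
          f"ideal {ideal:.1f}, efficiency {efficiency[threads]:.3f}")
```

With these numbers the throughput is nearly flat from 32 to 96 threads per rank, so the efficiency relative to the 32-thread run falls to roughly one half at 64 threads and roughly one third at 96.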
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11534)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11534) Observed more threads (97) than expected (96): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=97.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11529)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11529) Observed more threads (97) than expected (96): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=97.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_7
To display your profiling results:
##########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
##########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/aocc_13/oneview_results_scal/tools/lprof_npsu_run_7 #
##########################################################################################################################################################################################################