* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11081)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11087)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 1
Number of walkers per rank = 1
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.1516 0.1516 1 0.151611170
ParticleSet:::update 0.0000 0.0000 1 0.000004450
Total 45.7749 0.0004 1 45.774878881
Diffusion 29.8888 0.0153 5 5.977753402
Complete Updates 0.2120 0.0000 5 0.042392849
DeterminantRef::update 0.2119 0.2119 10 0.021194600
Current Gradient 0.5826 0.0095 30720 0.000018965
DeterminantRef::ratio 0.5686 0.5686 30720 0.000018509
OneBodyJastrowRef 0.0025 0.0025 30720 0.000000081
TwoBodyJastrowRef 0.0020 0.0020 30720 0.000000066
Kinetic Energy 0.1274 0.1271 5 0.025485379
OneBodyJastrowRef 0.0002 0.0002 5 0.000036339
TwoBodyJastrowRef 0.0001 0.0001 5 0.000020597
New Gradient 10.0924 0.0141 30720 0.000328529
DeterminantRef::ratio 0.0578 0.0578 30720 0.000001882
DeterminantRef::spovgl 9.6643 0.1508 30720 0.000314594
Single-Particle Orbitals 9.5136 9.5136 30720 0.000309686
OneBodyJastrowRef 0.0290 0.0290 30720 0.000000945
TwoBodyJastrowRef 0.3272 0.3272 30720 0.000010650
ParticleSet:::acceptMove 0.4468 0.0047 15371 0.000029067
DTAAOMPTarget::update_e_e 0.4334 0.4334 15371 0.000028195
DTABOMPTarget::update_ion_e 0.0087 0.0087 15371 0.000000566
ParticleSet:::computeNewPosDT 0.4571 0.0068 30720 0.000014879
DTAAOMPTarget::move_e_e 0.4040 0.4040 30720 0.000013151
DTABOMPTarget::move_ion_e 0.0463 0.0463 30720 0.000001508
ParticleSet:::donePbyP 0.0000 0.0000 5 0.000001050
Update 17.9551 0.0078 15371 0.001168118
DeterminantRef::update 17.5852 17.5852 15371 0.001144053
OneBodyJastrowRef 0.0014 0.0014 15371 0.000000091
TwoBodyJastrowRef 0.3607 0.3607 15371 0.000023469
Initialization 5.6411 0.1638 1 5.641131214
DeterminantRef::inverse 3.4098 3.4098 2 1.704919811
DeterminantRef::spovgl 1.9292 0.0320 2 0.964575216
Single-Particle Orbitals 1.8971 1.8971 6144 0.000308777
OneBodyJastrowRef 0.0055 0.0055 1 0.005538173
ParticleSet:::update 0.0754 0.0091 2 0.037718681
DTAAOMPTarget::evaluate_e_e 0.0460 0.0460 1 0.045992610
DTABOMPTarget::evaluate_ion_e 0.0203 0.0001 1 0.020330647
DTABOMPTarget::offload_ion_e 0.0203 0.0203 1 0.020274098
TwoBodyJastrowRef 0.0574 0.0574 1 0.057359412
Pseudopotential 10.2446 0.0302 5 2.048921322
DeterminantRef::spoval 4.6111 0.1539 10215 0.000451408
Single-Particle Orbitals 4.4573 4.4573 122580 0.000036362
OneBodyJastrowRef 0.0127 0.0127 10215 0.000001242
ParticleSet:::update 5.0823 0.0057 10215 0.000497535
DTABOMPTarget::evaluate_e_virtual 4.6825 0.0015 10215 0.000458397
DTABOMPTarget::offload_e_virtual 4.6811 4.6811 10215 0.000458254
DTABOMPTarget::evaluate_ion_virtual 0.3941 0.0015 10215 0.000038584
DTABOMPTarget::offload_ion_virtual 0.3927 0.3927 10215 0.000038440
TwoBodyJastrowRef 0.5083 0.5083 10215 0.000049758
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 1.01334e+10
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 1.55194e+10
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 7.36948e+06
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11081)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11081) Observed more threads (2) than expected (1): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=2.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11087)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11087) Observed more threads (2) than expected (1): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=2.
Info: 1/2 lprof instances finished
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11164)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11170)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 2
Number of walkers per rank = 2
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.0804 0.0804 1 0.080398397
ParticleSet:::update 0.0000 0.0000 1 0.000014009
Total 47.3440 0.0008 1 47.343981961
Diffusion 31.2085 0.0159 5 6.241696669
Complete Updates 0.2177 0.0000 5 0.043535747
DeterminantRef::update 0.2177 0.2177 10 0.021765949
Current Gradient 0.5891 0.0106 30720 0.000019177
DeterminantRef::ratio 0.5737 0.5737 30720 0.000018676
OneBodyJastrowRef 0.0027 0.0027 30720 0.000000088
TwoBodyJastrowRef 0.0021 0.0021 30720 0.000000067
Kinetic Energy 0.1283 0.1280 5 0.025662500
OneBodyJastrowRef 0.0002 0.0002 5 0.000034327
TwoBodyJastrowRef 0.0001 0.0001 5 0.000020546
New Gradient 11.0928 0.0157 30720 0.000361092
DeterminantRef::ratio 0.0578 0.0578 30720 0.000001883
DeterminantRef::spovgl 10.6616 0.1530 30720 0.000347057
Single-Particle Orbitals 10.5086 10.5086 30720 0.000342076
OneBodyJastrowRef 0.0300 0.0300 30720 0.000000978
TwoBodyJastrowRef 0.3276 0.3276 30720 0.000010663
ParticleSet:::acceptMove 0.4488 0.0052 15371 0.000029199
DTAAOMPTarget::update_e_e 0.4342 0.4342 15371 0.000028248
DTABOMPTarget::update_ion_e 0.0094 0.0094 15371 0.000000613
ParticleSet:::computeNewPosDT 0.4784 0.0068 30720 0.000015571
DTAAOMPTarget::move_e_e 0.4237 0.4237 30720 0.000013794
DTABOMPTarget::move_ion_e 0.0478 0.0478 30720 0.000001555
ParticleSet:::donePbyP 0.0000 0.0000 5 0.000000868
Update 18.2375 0.0084 15371 0.001186490
DeterminantRef::update 17.8652 17.8652 15371 0.001162265
OneBodyJastrowRef 0.0015 0.0015 15371 0.000000098
TwoBodyJastrowRef 0.3624 0.3624 15371 0.000023580
Initialization 5.8026 0.1668 1 5.802601303
DeterminantRef::inverse 3.4015 3.4015 2 1.700745887
DeterminantRef::spovgl 2.0961 0.0315 2 1.048046377
Single-Particle Orbitals 2.0646 2.0646 6144 0.000336038
OneBodyJastrowRef 0.0054 0.0054 1 0.005423816
ParticleSet:::update 0.0755 0.0093 2 0.037735786
DTAAOMPTarget::evaluate_e_e 0.0460 0.0460 1 0.045973792
DTABOMPTarget::evaluate_ion_e 0.0201 0.0001 1 0.020149256
DTABOMPTarget::offload_ion_e 0.0201 0.0201 1 0.020089157
TwoBodyJastrowRef 0.0573 0.0573 1 0.057310913
Pseudopotential 10.3321 0.0314 5 2.066425075
DeterminantRef::spoval 4.6742 0.1545 10215 0.000457586
Single-Particle Orbitals 4.5197 4.5197 122580 0.000036872
OneBodyJastrowRef 0.0128 0.0128 10215 0.000001253
ParticleSet:::update 5.1072 0.0055 10215 0.000499970
DTABOMPTarget::evaluate_e_virtual 4.6984 0.0016 10215 0.000459951
DTABOMPTarget::offload_e_virtual 4.6968 4.6968 10215 0.000459794
DTABOMPTarget::evaluate_ion_virtual 0.4033 0.0013 10215 0.000039478
DTABOMPTarget::offload_ion_virtual 0.4020 0.4020 10215 0.000039352
TwoBodyJastrowRef 0.5065 0.5065 10215 0.000049581
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 1.95952e+10
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 2.97263e+10
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 1.46141e+07
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11170)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11170) Observed more threads (3) than expected (2): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=3.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11164)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11164) Observed more threads (3) than expected (2): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=3.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11250)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11255)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 4
Number of walkers per rank = 4
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.0431 0.0431 1 0.043110393
ParticleSet:::update 0.0000 0.0000 1 0.000006950
Total 48.3651 0.0008 1 48.365076335
Diffusion 31.4730 0.0172 5 6.294608016
Complete Updates 0.2200 0.0000 5 0.043997558
DeterminantRef::update 0.2200 0.2200 10 0.021996746
Current Gradient 0.5895 0.0125 30720 0.000019189
DeterminantRef::ratio 0.5720 0.5720 30720 0.000018619
OneBodyJastrowRef 0.0028 0.0028 30720 0.000000092
TwoBodyJastrowRef 0.0022 0.0022 30720 0.000000072
Kinetic Energy 0.1275 0.1272 5 0.025494411
OneBodyJastrowRef 0.0001 0.0001 5 0.000023961
TwoBodyJastrowRef 0.0001 0.0001 5 0.000021722
New Gradient 11.3098 0.0185 30720 0.000368158
DeterminantRef::ratio 0.0592 0.0592 30720 0.000001927
DeterminantRef::spovgl 10.8725 0.1614 30720 0.000353921
Single-Particle Orbitals 10.7111 10.7111 30720 0.000348668
OneBodyJastrowRef 0.0299 0.0299 30720 0.000000974
TwoBodyJastrowRef 0.3297 0.3297 30720 0.000010734
ParticleSet:::acceptMove 0.4172 0.0047 15371 0.000027145
DTAAOMPTarget::update_e_e 0.4040 0.4040 15371 0.000026286
DTABOMPTarget::update_ion_e 0.0085 0.0085 15371 0.000000555
ParticleSet:::computeNewPosDT 0.4868 0.0063 30720 0.000015846
DTAAOMPTarget::move_e_e 0.4289 0.4289 30720 0.000013962
DTABOMPTarget::move_ion_e 0.0516 0.0516 30720 0.000001679
ParticleSet:::donePbyP 0.0000 0.0000 5 0.000001024
Update 18.3051 0.0089 15371 0.001190883
DeterminantRef::update 17.9324 17.9324 15371 0.001166637
OneBodyJastrowRef 0.0016 0.0016 15371 0.000000103
TwoBodyJastrowRef 0.3623 0.3623 15371 0.000023568
Initialization 6.0218 0.1735 1 6.021833767
DeterminantRef::inverse 3.4248 3.4248 2 1.712401374
DeterminantRef::spovgl 2.2822 0.0905 2 1.141117102
Single-Particle Orbitals 2.1918 2.1918 6144 0.000356734
OneBodyJastrowRef 0.0054 0.0054 1 0.005425576
ParticleSet:::update 0.0776 0.0114 2 0.038816313
DTAAOMPTarget::evaluate_e_e 0.0459 0.0459 1 0.045872135
DTABOMPTarget::evaluate_ion_e 0.0203 0.0001 1 0.020310562
DTABOMPTarget::offload_ion_e 0.0202 0.0202 1 0.020160795
TwoBodyJastrowRef 0.0582 0.0582 1 0.058220775
Pseudopotential 10.8694 0.0305 5 2.173876791
DeterminantRef::spoval 5.1669 0.1549 10215 0.000505813
Single-Particle Orbitals 5.0120 5.0120 122580 0.000040887
OneBodyJastrowRef 0.0127 0.0127 10215 0.000001244
ParticleSet:::update 5.1486 0.0059 10215 0.000504023
DTABOMPTarget::evaluate_e_virtual 4.7470 0.0015 10215 0.000464708
DTABOMPTarget::offload_e_virtual 4.7454 4.7454 10215 0.000464557
DTABOMPTarget::evaluate_ion_virtual 0.3958 0.0015 10215 0.000038742
DTABOMPTarget::offload_ion_virtual 0.3943 0.3943 10215 0.000038598
TwoBodyJastrowRef 0.5107 0.5107 10215 0.000049999
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 3.83629e+10
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 5.89529e+10
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 2.77835e+07
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11255)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11255) Observed more threads (5) than expected (4): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=5.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11250)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11250) Observed more threads (5) than expected (4): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=5.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11343)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11348)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 8
Number of walkers per rank = 8
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.0271 0.0271 1 0.027126309
ParticleSet:::update 0.0000 0.0000 1 0.000006650
Total 49.2135 3.7971 1 49.213472019
Diffusion 27.8492 0.0201 5 5.569837532
Complete Updates 0.2140 0.0000 5 0.042793557
DeterminantRef::update 0.2140 0.2140 10 0.021395153
Current Gradient 0.5832 0.0159 30720 0.000018983
DeterminantRef::ratio 0.5629 0.5629 30720 0.000018325
OneBodyJastrowRef 0.0024 0.0024 30720 0.000000077
TwoBodyJastrowRef 0.0019 0.0019 30720 0.000000063
Kinetic Energy 0.1251 0.1249 5 0.025024922
OneBodyJastrowRef 0.0001 0.0001 5 0.000017318
TwoBodyJastrowRef 0.0001 0.0001 5 0.000021623
New Gradient 7.7493 0.0238 30720 0.000252257
DeterminantRef::ratio 0.0580 0.0580 30720 0.000001887
DeterminantRef::spovgl 7.3109 0.1264 30720 0.000237984
Single-Particle Orbitals 7.1845 7.1845 30720 0.000233869
OneBodyJastrowRef 0.0272 0.0272 30720 0.000000887
TwoBodyJastrowRef 0.3295 0.3295 30720 0.000010725
ParticleSet:::acceptMove 0.4210 0.0050 15371 0.000027388
DTAAOMPTarget::update_e_e 0.4077 0.4077 15371 0.000026522
DTABOMPTarget::update_ion_e 0.0083 0.0083 15371 0.000000539
ParticleSet:::computeNewPosDT 0.4924 0.0065 30720 0.000016030
DTAAOMPTarget::move_e_e 0.4403 0.4403 30720 0.000014334
DTABOMPTarget::move_ion_e 0.0456 0.0456 30720 0.000001483
ParticleSet:::donePbyP 0.0000 0.0000 5 0.000000934
Update 18.2441 0.0095 15371 0.001186916
DeterminantRef::update 17.8737 17.8737 15371 0.001162818
OneBodyJastrowRef 0.0015 0.0015 15371 0.000000095
TwoBodyJastrowRef 0.3594 0.3594 15371 0.000023384
Initialization 6.0059 0.9717 1 6.005892711
DeterminantRef::inverse 3.4059 3.4059 2 1.702952448
DeterminantRef::spovgl 1.4877 0.0636 2 0.743826837
Single-Particle Orbitals 1.4241 1.4241 6144 0.000231782
OneBodyJastrowRef 0.0054 0.0054 1 0.005399856
ParticleSet:::update 0.0771 0.0099 2 0.038574248
DTAAOMPTarget::evaluate_e_e 0.0461 0.0461 1 0.046088010
DTABOMPTarget::evaluate_ion_e 0.0211 0.0001 1 0.021114266
DTABOMPTarget::offload_ion_e 0.0210 0.0210 1 0.021045217
TwoBodyJastrowRef 0.0581 0.0581 1 0.058074208
Pseudopotential 11.5613 0.0288 5 2.312258595
DeterminantRef::spoval 5.9343 0.1587 10215 0.000580940
Single-Particle Orbitals 5.7756 5.7756 122580 0.000047117
OneBodyJastrowRef 0.0126 0.0126 10215 0.000001235
ParticleSet:::update 5.0762 0.0057 10215 0.000496936
DTABOMPTarget::evaluate_e_virtual 4.6715 0.0017 10215 0.000457320
DTABOMPTarget::offload_e_virtual 4.6698 4.6698 10215 0.000457151
DTABOMPTarget::evaluate_ion_virtual 0.3990 0.0019 10215 0.000039061
DTABOMPTarget::offload_ion_virtual 0.3971 0.3971 10215 0.000038874
TwoBodyJastrowRef 0.5094 0.5094 10215 0.000049863
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 7.54032e+10
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 1.33248e+11
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 5.22415e+07
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11348)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11348) Observed more threads (9) than expected (8): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=9.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11343)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11343) Observed more threads (9) than expected (8): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=9.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11450)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11455)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 16
Number of walkers per rank = 16
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.0360 0.0360 1 0.035961313
ParticleSet:::update 0.0000 0.0000 1 0.000003170
Total 64.4561 0.0012 1 64.456058504
Diffusion 45.3213 0.0212 5 9.064257253
Complete Updates 0.3894 0.0000 5 0.077874635
DeterminantRef::update 0.3894 0.3894 10 0.038935324
Current Gradient 0.6546 0.0171 30720 0.000021309
DeterminantRef::ratio 0.6324 0.6324 30720 0.000020585
OneBodyJastrowRef 0.0029 0.0029 30720 0.000000094
TwoBodyJastrowRef 0.0022 0.0022 30720 0.000000072
Kinetic Energy 0.1962 0.1959 5 0.039242888
OneBodyJastrowRef 0.0001 0.0001 5 0.000025721
TwoBodyJastrowRef 0.0002 0.0002 5 0.000031894
New Gradient 17.1338 0.0273 30720 0.000557741
DeterminantRef::ratio 0.0623 0.0623 30720 0.000002029
DeterminantRef::spovgl 16.5998 0.2542 30720 0.000540359
Single-Particle Orbitals 16.3456 16.3456 30720 0.000532083
OneBodyJastrowRef 0.0525 0.0525 30720 0.000001708
TwoBodyJastrowRef 0.3919 0.3919 30720 0.000012758
ParticleSet:::acceptMove 0.6478 0.0067 15371 0.000042147
DTAAOMPTarget::update_e_e 0.6281 0.6281 15371 0.000040863
DTABOMPTarget::update_ion_e 0.0130 0.0130 15371 0.000000845
ParticleSet:::computeNewPosDT 0.5467 0.0079 30720 0.000017798
DTAAOMPTarget::move_e_e 0.4802 0.4802 30720 0.000015631
DTABOMPTarget::move_ion_e 0.0587 0.0587 30720 0.000001911
ParticleSet:::donePbyP 0.0000 0.0000 5 0.000001260
Update 25.7315 0.0115 15371 0.001674027
DeterminantRef::update 25.2489 25.2489 15371 0.001642631
OneBodyJastrowRef 0.0017 0.0017 15371 0.000000108
TwoBodyJastrowRef 0.4694 0.4694 15371 0.000030538
Initialization 7.2625 0.3245 1 7.262494668
DeterminantRef::inverse 3.5178 3.5178 2 1.758911836
DeterminantRef::spovgl 3.2662 0.0340 2 1.633097948
Single-Particle Orbitals 3.2322 3.2322 6144 0.000526078
OneBodyJastrowRef 0.0055 0.0055 1 0.005484015
ParticleSet:::update 0.0903 0.0139 2 0.045141225
DTAAOMPTarget::evaluate_e_e 0.0557 0.0557 1 0.055711248
DTABOMPTarget::evaluate_ion_e 0.0207 0.0001 1 0.020683845
DTABOMPTarget::offload_ion_e 0.0206 0.0206 1 0.020625696
TwoBodyJastrowRef 0.0582 0.0582 1 0.058227025
Pseudopotential 11.8711 0.0335 5 2.374210953
DeterminantRef::spoval 6.1868 0.1783 10215 0.000605656
Single-Particle Orbitals 6.0085 6.0085 122580 0.000049017
OneBodyJastrowRef 0.0165 0.0165 10215 0.000001612
ParticleSet:::update 5.1113 0.0082 10215 0.000500372
DTABOMPTarget::evaluate_e_virtual 4.6761 0.0021 10215 0.000457770
DTABOMPTarget::offload_e_virtual 4.6740 4.6740 10215 0.000457566
DTABOMPTarget::evaluate_ion_virtual 0.4269 0.0027 10215 0.000041796
DTABOMPTarget::offload_ion_virtual 0.4242 0.4242 10215 0.000041528
TwoBodyJastrowRef 0.5230 0.5230 10215 0.000051202
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 1.15144e+11
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 1.63758e+11
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 1.01757e+08
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11455)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11455) Observed more threads (17) than expected (16): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=17.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11450)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11450) Observed more threads (17) than expected (16): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=17.
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11590)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11595)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 32
Number of walkers per rank = 32
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.0448 0.0448 1 0.044788567
ParticleSet:::update 0.0000 0.0000 1 0.000003940
Total 75.4251 2.2082 1 75.425137205
Diffusion 45.7184 0.0211 5 9.143687038
Complete Updates 0.8356 0.0000 5 0.167125090
DeterminantRef::update 0.8356 0.8356 10 0.083559517
Current Gradient 0.7484 0.0223 30720 0.000024361
DeterminantRef::ratio 0.7211 0.7211 30720 0.000023475
OneBodyJastrowRef 0.0027 0.0027 30720 0.000000088
TwoBodyJastrowRef 0.0023 0.0023 30720 0.000000074
Kinetic Energy 0.4400 0.4393 5 0.087998013
OneBodyJastrowRef 0.0003 0.0003 5 0.000053881
TwoBodyJastrowRef 0.0004 0.0004 5 0.000082724
New Gradient 9.7853 0.0262 30720 0.000318533
DeterminantRef::ratio 0.0614 0.0614 30720 0.000001998
DeterminantRef::spovgl 9.2763 0.1947 30720 0.000301963
Single-Particle Orbitals 9.0816 9.0816 30720 0.000295626
OneBodyJastrowRef 0.0391 0.0391 30720 0.000001272
TwoBodyJastrowRef 0.3824 0.3824 30720 0.000012447
ParticleSet:::acceptMove 0.8161 0.0104 15371 0.000053093
DTAAOMPTarget::update_e_e 0.7918 0.7918 15371 0.000051515
DTABOMPTarget::update_ion_e 0.0139 0.0139 15371 0.000000903
ParticleSet:::computeNewPosDT 0.5169 0.0082 30720 0.000016826
DTAAOMPTarget::move_e_e 0.4478 0.4478 30720 0.000014577
DTABOMPTarget::move_ion_e 0.0609 0.0609 30720 0.000001983
ParticleSet:::donePbyP 0.0000 0.0000 5 0.000001846
Update 32.5550 0.0139 15371 0.002117949
DeterminantRef::update 32.0682 32.0682 15371 0.002086280
OneBodyJastrowRef 0.0017 0.0017 15371 0.000000108
TwoBodyJastrowRef 0.4712 0.4712 15371 0.000030657
Initialization 6.4961 0.8001 1 6.496081536
DeterminantRef::inverse 3.6053 3.6053 2 1.802648391
DeterminantRef::spovgl 1.8852 0.0504 2 0.942598839
Single-Particle Orbitals 1.8348 1.8348 6144 0.000298631
OneBodyJastrowRef 0.0052 0.0052 1 0.005188051
ParticleSet:::update 0.1429 0.0266 2 0.071438716
DTAAOMPTarget::evaluate_e_e 0.0939 0.0939 1 0.093864344
DTABOMPTarget::evaluate_ion_e 0.0224 0.0008 1 0.022363419
DTABOMPTarget::offload_ion_e 0.0215 0.0215 1 0.021514408
TwoBodyJastrowRef 0.0574 0.0574 1 0.057410492
Pseudopotential 21.0024 0.0805 5 4.200488367
DeterminantRef::spoval 14.6321 0.3429 10215 0.001432411
Single-Particle Orbitals 14.2892 14.2892 122580 0.000116570
OneBodyJastrowRef 0.0386 0.0386 10215 0.000003780
ParticleSet:::update 5.5171 0.0171 10215 0.000540095
DTABOMPTarget::evaluate_e_virtual 4.9747 0.0077 10215 0.000487003
DTABOMPTarget::offload_e_virtual 4.9671 4.9671 10215 0.000486252
DTABOMPTarget::evaluate_ion_virtual 0.5253 0.0050 10215 0.000051421
DTABOMPTarget::offload_ion_virtual 0.5203 0.5203 10215 0.000050934
TwoBodyJastrowRef 0.7341 0.7341 10215 0.000071869
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 1.96797e+11
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 3.2467e+11
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 1.1503e+08
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11590)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11590) Observed more threads (33) than expected (32): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=33.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11595)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11595) Observed more threads (33) than expected (32): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=33.
Info: 1/2 lprof instances finished
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11799)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 11804)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 64
Number of walkers per rank = 64
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.0702 0.0702 1 0.070203591
ParticleSet:::update 0.0000 0.0000 1 0.000002950
Total 142.2907 8.4823 1 142.290663146
Diffusion 89.5160 0.0342 5 17.903201130
Complete Updates 1.7376 0.0001 5 0.347515113
DeterminantRef::update 1.7375 1.7375 10 0.173747957
Current Gradient 1.2402 0.0331 30720 0.000040370
DeterminantRef::ratio 1.1981 1.1981 30720 0.000039001
OneBodyJastrowRef 0.0054 0.0054 30720 0.000000177
TwoBodyJastrowRef 0.0035 0.0035 30720 0.000000113
Kinetic Energy 0.9292 0.9280 5 0.185831850
OneBodyJastrowRef 0.0006 0.0006 5 0.000121054
TwoBodyJastrowRef 0.0006 0.0006 5 0.000110644
New Gradient 13.2633 0.0434 30720 0.000431747
DeterminantRef::ratio 0.0877 0.0877 30720 0.000002853
DeterminantRef::spovgl 12.4020 0.4248 30720 0.000403711
Single-Particle Orbitals 11.9772 11.9772 30720 0.000389883
OneBodyJastrowRef 0.0842 0.0842 30720 0.000002742
TwoBodyJastrowRef 0.6460 0.6460 30720 0.000021027
ParticleSet:::acceptMove 1.7830 0.0154 15371 0.000115997
DTAAOMPTarget::update_e_e 1.7404 1.7404 15371 0.000113227
DTABOMPTarget::update_ion_e 0.0272 0.0272 15371 0.000001767
ParticleSet:::computeNewPosDT 0.8172 0.0120 30720 0.000026601
DTAAOMPTarget::move_e_e 0.6894 0.6894 30720 0.000022442
DTABOMPTarget::move_ion_e 0.1158 0.1158 30720 0.000003769
ParticleSet:::donePbyP 0.0000 0.0000 5 0.000004698
Update 69.7115 0.0212 15371 0.004535261
DeterminantRef::update 68.8802 68.8802 15371 0.004481178
OneBodyJastrowRef 0.0028 0.0028 15371 0.000000180
TwoBodyJastrowRef 0.8074 0.8074 15371 0.000052526
Initialization 8.8999 1.7660 1 8.899911474
DeterminantRef::inverse 4.0682 4.0682 2 2.034123504
DeterminantRef::spovgl 2.8451 0.1307 2 1.422561583
Single-Particle Orbitals 2.7144 2.7144 6144 0.000441803
OneBodyJastrowRef 0.0057 0.0057 1 0.005746888
ParticleSet:::update 0.1548 0.0536 2 0.077410615
DTAAOMPTarget::evaluate_e_e 0.0753 0.0753 1 0.075279792
DTABOMPTarget::evaluate_ion_e 0.0260 0.0034 1 0.025978575
DTABOMPTarget::offload_ion_e 0.0225 0.0225 1 0.022543594
TwoBodyJastrowRef 0.0600 0.0600 1 0.059958557
Pseudopotential 35.3924 0.1757 5 7.078479803
DeterminantRef::spoval 26.4103 0.9399 10215 0.002585446
Single-Particle Orbitals 25.4704 25.4704 122580 0.000207786
OneBodyJastrowRef 0.1057 0.1057 10215 0.000010352
ParticleSet:::update 7.2547 0.0346 10215 0.000710199
DTABOMPTarget::evaluate_e_virtual 6.4741 0.0121 10215 0.000633780
DTABOMPTarget::offload_e_virtual 6.4619 6.4619 10215 0.000632592
DTABOMPTarget::evaluate_ion_virtual 0.7460 0.0118 10215 0.000073034
DTABOMPTarget::offload_ion_virtual 0.7342 0.7342 10215 0.000071878
TwoBodyJastrowRef 1.4460 1.4460 10215 0.000141554
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 2.08635e+11
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 3.31637e+11
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 1.36522e+08
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11804)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11804) Observed more threads (65) than expected (64): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=65.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 11799)
* Warning: (host ins01.benchmarkcenter.megware.com, process 11799) Observed more threads (65) than expected (64): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=65.
Info: 1/2 lprof instances finished
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6 #
########################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node ins01.benchmarkcenter.megware.com
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 12137)
* Info: "ref-cycles" not supported on ins01.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host ins01.benchmarkcenter.megware.com, process 12142)
miniqmc not built from git repository
number of ranks : 2, number of accelerators : 0
Number of orbitals/splines = 3072
Tile size = 3072
Number of tiles = 1
Number of electrons = 6144
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
MPI processes = 2
OpenMP threads = 96
Number of walkers per rank = 96
SPO coefficients size = 1572864000 bytes (1500 MB)
delayed update rank = 32
Using the reference implementation for Jastrow,
determinant update, and distance table + einspline of the
reference implementation
==================================
Use --enable-timers= command line option to increase or decrease level of timing information
Stack timer profile
Timer Inclusive_time Exclusive_time Calls Time_per_call
Setup 0.0978 0.0978 1 0.097807028
ParticleSet:::update 0.0000 0.0000 1 0.000002660
Total 205.3336 14.2854 1 205.333607008
Diffusion 122.2440 0.0495 5 24.448809589
Complete Updates 2.3109 0.0001 5 0.462189917
DeterminantRef::update 2.3108 2.3108 10 0.231081896
Current Gradient 1.8757 0.0451 30720 0.000061058
DeterminantRef::ratio 1.8162 1.8162 30720 0.000059122
OneBodyJastrowRef 0.0091 0.0091 30720 0.000000295
TwoBodyJastrowRef 0.0053 0.0053 30720 0.000000172
Kinetic Energy 1.2691 1.2674 5 0.253814402
OneBodyJastrowRef 0.0008 0.0008 5 0.000157539
TwoBodyJastrowRef 0.0009 0.0009 5 0.000181997
New Gradient 13.8494 0.0640 30720 0.000450828
DeterminantRef::ratio 0.1064 0.1064 30720 0.000003463
DeterminantRef::spovgl 12.6547 0.6051 30720 0.000411938
Single-Particle Orbitals 12.0497 12.0497 30720 0.000392242
OneBodyJastrowRef 0.1448 0.1448 30720 0.000004712
TwoBodyJastrowRef 0.8795 0.8795 30720 0.000028630
ParticleSet:::acceptMove 2.8177 0.0280 15371 0.000183310
DTAAOMPTarget::update_e_e 2.7469 2.7469 15371 0.000178705
DTABOMPTarget::update_ion_e 0.0428 0.0428 15371 0.000002783
ParticleSet:::computeNewPosDT 1.2237 0.0243 30720 0.000039834
DTAAOMPTarget::move_e_e 1.0175 1.0175 30720 0.000033120
DTABOMPTarget::move_ion_e 0.1819 0.1819 30720 0.000005922
ParticleSet:::donePbyP 0.0000 0.0000 5 0.000008095
Update 98.8480 0.0341 15371 0.006430810
DeterminantRef::update 97.6251 97.6251 15371 0.006351251
OneBodyJastrowRef 0.0044 0.0044 15371 0.000000290
TwoBodyJastrowRef 1.1843 1.1843 15371 0.000077049
Initialization 11.7637 2.0175 1 11.763665716
DeterminantRef::inverse 6.1008 6.1008 2 3.050411111
DeterminantRef::spovgl 3.0406 0.1366 2 1.520295783
Single-Particle Orbitals 2.9040 2.9040 6144 0.000472651
OneBodyJastrowRef 0.0051 0.0051 1 0.005051830
ParticleSet:::update 0.5426 0.1209 2 0.271303232
DTAAOMPTarget::evaluate_e_e 0.2755 0.2755 1 0.275480958
DTABOMPTarget::evaluate_ion_e 0.1462 0.0815 1 0.146208536
DTABOMPTarget::offload_ion_e 0.0647 0.0647 1 0.064733184
TwoBodyJastrowRef 0.0571 0.0571 1 0.057094746
Pseudopotential 57.0405 0.3019 5 11.408102723
DeterminantRef::spoval 44.0140 1.7631 10215 0.004308763
Single-Particle Orbitals 42.2510 42.2510 122580 0.000344681
OneBodyJastrowRef 0.1804 0.1804 10215 0.000017663
ParticleSet:::update 10.1532 0.0715 10215 0.000993945
DTABOMPTarget::evaluate_e_virtual 9.0146 0.0244 10215 0.000882483
DTABOMPTarget::offload_e_virtual 8.9902 8.9902 10215 0.000880094
DTABOMPTarget::evaluate_ion_virtual 1.0671 0.0201 10215 0.000104467
DTABOMPTarget::offload_ion_virtual 1.0470 1.0470 10215 0.000102495
TwoBodyJastrowRef 2.3910 2.3910 10215 0.000234072
========== Throughput ============
Total throughput ( N_walkers * N_elec^3 / Total time ) = 2.16868e+11
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 3.64273e+11
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 1.27063e+08
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 12142)
* Warning: (host ins01.benchmarkcenter.megware.com, process 12142) Observed more threads (97) than expected (96): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=97.
* Info: Process finished (host ins01.benchmarkcenter.megware.com, process 12137)
* Warning: (host ins01.benchmarkcenter.megware.com, process 12137) Observed more threads (97) than expected (96): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=97.
Info: 1/2 lprof instances finished
Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7
To display your profiling results:
########################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
########################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-node | maqao lprof -df -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-process | maqao lprof -df -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-thread | maqao lprof -df -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Cluster-wide | maqao lprof -dl xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-node | maqao lprof -dl -dn xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-process | maqao lprof -dl -dp xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/beegfs/hackathon/users/eoseret/qaas_runs/miniqmc/intel/miniqmc/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7 #
########################################################################################################################################################################################################