| | | | | | | requested parallelism | | | | | | | walltime sum (s) | | | | | | | nb instances | | | | | | | any sync average per thread time (s) | | | | | | | any wait average per thread time (s) | | | | | | | parallelism overhead (%) | | | | | | | local speedup if perfectly balanced | | | | | | | global speedup if perfectly balanced | | | | | | |
start addr | function name | source location | level | ancestor thread num | invoker | parallel or teams | run_0 | run_1 | run_2 | run_3 | run_4 | run_5 | run_6 | run_0 | run_1 | run_2 | run_3 | run_4 | run_5 | run_6 | run_0 | run_1 | run_2 | run_3 | run_4 | run_5 | run_6 | run_0 | run_1 | run_2 | run_3 | run_4 | run_5 | run_6 | run_0 | run_1 | run_2 | run_3 | run_4 | run_5 | run_6 | run_0 | run_1 | run_2 | run_3 | run_4 | run_5 | run_6 | run_0 | run_1 | run_2 | run_3 | run_4 | run_5 | run_6 | run_0 | run_1 | run_2 | run_3 | run_4 | run_5 | run_6 |
libqmckl.so.0.0.0:0x3f687 | qmckl_compute_mo_basis_mo_vgl_hpc | qmckl_mo.c:1246 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 26 | 52 | 32.752 | 14.776 | 7.805 | 3.875 | 1.995 | 1.234 | 0.670 | 4.00 | 4.00 | 4.00 | 4.00 | 4.00 | 4.00 | 4.00 | 19.2 E-6 | 15.3 E-3 | 0.245 | 0.134 | 0.112 | 76.4 E-3 | 47.7 E-3 | 7.35 E-6 | 15.2 E-3 | 0.245 | 0.134 | 0.112 | 76.4 E-3 | 47.7 E-3 | 0.00 | 0.10 | 3.13 | 3.47 | 5.61 | 6.19 | 7.12 | 1.000 | 1.001 | 1.032 | 1.036 | 1.059 | 1.066 | 1.077 | 1.000 | 1.001 | 1.017 | 1.016 | 1.021 | 1.018 | 1.014 |
libqmckl.so.0.0.0:0x3a323 | qmckl_compute_mo_basis_mo_value_hpc | qmckl_mo.c:931 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 26 | 52 | 14.965 | 5.639 | 2.883 | 1.457 | 0.797 | 0.516 | 0.287 | 3.00 | 3.00 | 3.00 | 3.00 | 3.00 | 3.00 | 3.00 | 5.53 E-6 | 10.7 E-3 | 87.1 E-3 | 0.144 | 56.1 E-3 | 38.6 E-3 | 9.50 E-3 | 1.14 E-6 | 10.7 E-3 | 87.1 E-3 | 0.144 | 56.1 E-3 | 38.6 E-3 | 9.49 E-3 | 0.00 | 0.19 | 3.02 | 9.87 | 7.04 | 7.47 | 3.31 | 1.000 | 1.002 | 1.031 | 1.109 | 1.076 | 1.081 | 1.034 | 1.000 | 1.000 | 1.006 | 1.018 | 1.010 | 1.009 | 1.003 |
libqmckl.so.0.0.0:0x15132 | qmckl_compute_ao_vgl_hpc_gaussian | qmckl_ao.c:3279 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 26 | 52 | 2.949 | 1.501 | 0.947 | 0.502 | 0.290 | 0.187 | 0.184 | 4.00 | 4.00 | 4.00 | 4.00 | 4.00 | 4.00 | 4.00 | 13.3 E-6 | 1.68 E-3 | 0.134 | 76.8 E-3 | 75.9 E-3 | 52.0 E-3 | 42.6 E-3 | 1.25 E-6 | 1.66 E-3 | 0.134 | 76.8 E-3 | 75.9 E-3 | 51.9 E-3 | 42.6 E-3 | 0.00 | 0.11 | 14.1 | 15.3 | 26.2 | 27.8 | 23.1 | 1.000 | 1.001 | 1.165 | 1.181 | 1.354 | 1.386 | 1.301 | 1.000 | 1.000 | 1.009 | 1.009 | 1.014 | 1.012 | 1.012 |
libqmckl.so.0.0.0:0x1c8cc | qmckl_compute_ao_value_hpc_gaussian | qmckl_ao.c:2781 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 26 | 52 | 1.068 | 0.536 | 0.282 | 0.140 | 74.1 E-3 | 43.9 E-3 | 34.9 E-3 | 3.00 | 3.00 | 3.00 | 3.00 | 3.00 | 3.00 | 3.00 | 4.43 E-6 | 21.1 E-6 | 28.0 E-6 | 31.9 E-6 | 34.2 E-6 | 63.7 E-6 | 85.5 E-6 | 716 E-9 | 18.4 E-6 | 23.9 E-6 | 29.8 E-6 | 32.0 E-6 | 61.3 E-6 | 81.6 E-6 | 0.00 | 0.00 | 0.01 | 0.02 | 0.05 | 0.15 | 0.24 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.001 | 1.002 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
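
The derived columns appear to follow from the measured ones. In the rows above, the parallelism overhead matches the average sync time per thread divided by the walltime sum, and the local speedup if perfectly balanced matches walltime / (walltime - sync time). The Python sketch below checks that reading against one cell and computes the observed speedup and parallel efficiency from the walltime column. The formulas are inferred from the numbers in the table, not taken from the profiler's documentation, and every function name here is hypothetical.

```python
# Sketch only: formulas are inferred from the table values, not from the
# profiling tool's documentation. All function names are hypothetical.

def parse_cell(cell: str) -> float:
    """Convert a report cell such as '76.4 E-3' or '7.805' to a float."""
    return float(cell.strip().replace(" E", "e"))


def overhead_percent(walltime_s: float, sync_s: float) -> float:
    """Assumed definition: share of the walltime spent in synchronization."""
    return 100.0 * sync_s / walltime_s


def balanced_speedup(walltime_s: float, sync_s: float) -> float:
    """Assumed definition: speedup of the region if the sync time vanished."""
    return walltime_s / (walltime_s - sync_s)


def scaling(threads, walltimes):
    """Return (threads, speedup, efficiency) relative to the first run."""
    base_threads, base_time = threads[0], walltimes[0]
    result = []
    for n, t in zip(threads, walltimes):
        speedup = base_time / t
        efficiency = speedup / (n / base_threads)
        result.append((n, speedup, efficiency))
    return result


if __name__ == "__main__":
    # Values copied from the qmckl_compute_mo_basis_mo_vgl_hpc row.
    threads = [1, 2, 4, 8, 16, 26, 52]
    walltime = [parse_cell(c) for c in
                ["32.752", "14.776", "7.805", "3.875", "1.995", "1.234", "0.670"]]
    sync = [parse_cell(c) for c in
            ["19.2 E-6", "15.3 E-3", "0.245", "0.134", "0.112", "76.4 E-3", "47.7 E-3"]]

    for (n, speedup, eff), w, s in zip(scaling(threads, walltime), walltime, sync):
        print(f"{n:2d} threads: speedup {speedup:6.2f}, efficiency {eff:5.2f}, "
              f"overhead {overhead_percent(w, s):5.2f} %, "
              f"balanced speedup {balanced_speedup(w, s):.3f}")
```

Because the inputs are the rounded values printed in the report, the recomputed overhead and speedup can differ from the table in the last digit.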