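start addr | function name | source location | level | ancestor thread num | invoker | parallel or teams | requested parallelism, m1o1 | requested parallelism, m1o2 | requested parallelism, m1o4 | requested parallelism, m1o8 | requested parallelism, m1o16 | requested parallelism, m1o26 | requested parallelism, m1o52 | walltime sum (s), m1o1 | walltime sum (s), m1o2 | walltime sum (s), m1o4 | walltime sum (s), m1o8 | walltime sum (s), m1o16 | walltime sum (s), m1o26 | walltime sum (s), m1o52 | number of instances, m1o1 | number of instances, m1o2 | number of instances, m1o4 | number of instances, m1o8 | number of instances, m1o16 | number of instances, m1o26 | number of instances, m1o52 | any sync average per thread time (s), m1o1 | any sync average per thread time (s), m1o2 | any sync average per thread time (s), m1o4 | any sync average per thread time (s), m1o8 | any sync average per thread time (s), m1o16 | any sync average per thread time (s), m1o26 | any sync average per thread time (s), m1o52 | any wait average per thread time (s), m1o1 | any wait average per thread time (s), m1o2 | any wait average per thread time (s), m1o4 | any wait average per thread time (s), m1o8 | any wait average per thread time (s), m1o16 | any wait average per thread time (s), m1o26 | any wait average per thread time (s), m1o52 | parallelism overhead (%), m1o1 | parallelism overhead (%), m1o2 | parallelism overhead (%), m1o4 | parallelism overhead (%), m1o8 | parallelism overhead (%), m1o16 | parallelism overhead (%), m1o26 | parallelism overhead (%), m1o52 | local speedup if perfectly balanced, m1o1 | local speedup if perfectly balanced, m1o2 | local speedup if perfectly balanced, m1o4 | local speedup if perfectly balanced, m1o8 | local speedup if perfectly balanced, m1o16 | local speedup if perfectly balanced, m1o26 | local speedup if perfectly balanced, m1o52 | global speedup if perfectly balanced, m1o1 | global speedup if perfectly balanced, m1o2 | global speedup if perfectly balanced, m1o4 | global speedup if perfectly balanced, m1o8 | global speedup if perfectly balanced, m1o16 | global speedup if perfectly balanced, m1o26 | global speedup if perfectly balanced, m1o52 |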
libqmckl.so.0.0.0:0x1c117 | qmckl_compute_mo_basis_mo_vgl_hpc | qmckl_mo.c:1246 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 26 | 52 | 11.715 | 5.611 | 2.952 | 1.466 | 0.731 | 0.445 | 0.263 | 16.0 | 16.0 | 16.0 | 16.0 | 16.0 | 16.0 | 16.0 | 12.0 E-6 | 4.26 E-3 | 0.114 | 39.5 E-3 | 18.9 E-3 | 7.50 E-3 | 24.3 E-3 | 2.26 E-6 | 4.24 E-3 | 0.114 | 39.5 E-3 | 18.9 E-3 | 7.49 E-3 | 24.3 E-3 | 0.00 | 0.08 | 3.87 | 2.69 | 2.59 | 1.69 | 9.24 | 1.000 | 1.001 | 1.040 | 1.028 | 1.027 | 1.017 | 1.102 | 1.000 | 1.000 | 1.022 | 1.015 | 1.012 | 1.007 | 1.028 |
libqmckl.so.0.0.0:0x16db3 | qmckl_compute_mo_basis_mo_value_hpc | qmckl_mo.c:931 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 26 | 52 | 5.183 | 2.266 | 1.189 | 0.586 | 0.291 | 0.177 | 98.7 E-3 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 9.71 E-6 | 4.81 E-3 | 53.8 E-3 | 16.0 E-3 | 6.86 E-3 | 3.84 E-3 | 4.04 E-3 | 1.16 E-6 | 4.80 E-3 | 53.8 E-3 | 16.0 E-3 | 6.86 E-3 | 3.83 E-3 | 4.04 E-3 | 0.00 | 0.21 | 4.52 | 2.73 | 2.36 | 2.17 | 4.09 | 1.000 | 1.002 | 1.047 | 1.028 | 1.024 | 1.022 | 1.043 | 1.000 | 1.000 | 1.010 | 1.006 | 1.004 | 1.004 | 1.005 |
libqmckl.so.0.0.0:0x29d83 | qmckl_compute_ao_vgl_hpc_gaussian | qmckl_ao.c:3283 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 26 | 52 | 1.906 | 0.962 | 0.499 | 0.260 | 0.139 | 98.2 E-3 | 0.109 | 16.0 | 16.0 | 16.0 | 16.0 | 16.0 | 16.0 | 16.0 | 21.4 E-6 | 3.05 E-3 | 15.6 E-3 | 13.6 E-3 | 10.8 E-3 | 5.60 E-3 | 26.7 E-3 | 1.21 E-6 | 3.00 E-3 | 15.6 E-3 | 13.6 E-3 | 10.7 E-3 | 5.57 E-3 | 26.6 E-3 | 0.00 | 0.32 | 3.13 | 5.22 | 7.73 | 5.71 | 24.6 | 1.000 | 1.003 | 1.032 | 1.055 | 1.084 | 1.061 | 1.325 | 1.000 | 1.000 | 1.003 | 1.005 | 1.007 | 1.005 | 1.030 |
libqmckl.so.0.0.0:0x31620 | qmckl_compute_ao_value_hpc_gaussian | qmckl_ao.c:2781 | 0 | 0 | runtime | parallel | 1 | 2 | 4 | 8 | 16 | 26 | 52 | 0.917 | 0.461 | 0.237 | 0.122 | 61.9 E-3 | 37.7 E-3 | 20.9 E-3 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 | 8.92 E-6 | 77.8 E-6 | 123 E-6 | 117 E-6 | 170 E-6 | 168 E-6 | 257 E-6 | 1.12 E-6 | 62.6 E-6 | 114 E-6 | 110 E-6 | 162 E-6 | 160 E-6 | 244 E-6 | 0.00 | 0.02 | 0.05 | 0.10 | 0.27 | 0.44 | 1.23 | 1.000 | 1.000 | 1.001 | 1.001 | 1.003 | 1.004 | 1.012 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
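
The derived columns in the table can be cross-checked against the measured ones. The sketch below assumes, based only on the numbers shown here and not on the profiler's documentation, that "parallelism overhead (%)" is the "any sync average per thread time" divided by the "walltime sum", and that "local speedup if perfectly balanced" is the walltime divided by the walltime minus that sync time. The function names are hypothetical. Because the table values are rounded for display, small differences in the last digit are expected (for example in the `qmckl_compute_ao_vgl_hpc_gaussian` row at m1o52).

```python
# Sketch only: both formulas are inferred from the table values,
# not taken from the tool that produced the report.

def parallelism_overhead_pct(walltime_s: float, sync_avg_s: float) -> float:
    """Share of the region's walltime spent synchronizing, in percent."""
    return 100.0 * sync_avg_s / walltime_s


def local_speedup_if_balanced(walltime_s: float, sync_avg_s: float) -> float:
    """Region speedup if the per-thread synchronization time vanished."""
    return walltime_s / (walltime_s - sync_avg_s)


# (walltime sum, any sync average per thread time) in seconds,
# copied from the qmckl_compute_mo_basis_mo_vgl_hpc row.
mo_vgl = {
    "m1o8": (1.466, 39.5e-3),
    "m1o16": (0.731, 18.9e-3),
    "m1o52": (0.263, 24.3e-3),
}

if __name__ == "__main__":
    for run, (walltime, sync) in mo_vgl.items():
        overhead = parallelism_overhead_pct(walltime, sync)
        speedup = local_speedup_if_balanced(walltime, sync)
        print(f"{run}: overhead {overhead:.2f} %, local speedup {speedup:.3f}")
```

Comparing the printed values with the corresponding cells of the table is a quick way to confirm, or refute, the assumed definitions before relying on them for other regions.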