
Executable Output

* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com

* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9469)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9474)
   _  __       _         _
  | |/ /      (_)       | |
  | ' /  _ __  _  _ __  | | __ ___
  |  <  | '__|| || '_ \ | |/ // _ \ 
  | . \ | |   | || |_) ||   <|  __/
  |_|\_\|_|   |_|| .__/ |_|\_\\___|
                 | |
                 |_|        Version 1.2.4


Copyright (c) 2014-2019, Lawrence Livermore National Security, LLC

Kripke is released under the BSD 3-Clause License, please see the
LICENSE file for the full license

This work was produced under the auspices of the U.S. Department of
Energy by Lawrence Livermore National Laboratory under Contract

Author: Adam J. Kunen 

Compilation Options:
  Architecture:           OpenMP
  Compiler:               /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
  Compiler Flags:         "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++     -Wall -Wextra  "
  Linker Flags:           " "
  CHAI Enabled:           No
  CUDA Enabled:           No
  MPI Enabled:            Yes
  OpenMP Enabled:         Yes
  Caliper Enabled:        No

OpenMP Thread->Core mapping for 1 threads on rank 0
    0->  0

Input Parameters

  Problem Size:
    Zones:                 16 x 16 x 16  (4096 total)
    Groups:                1024
    Legendre Order:        4
    Quadrature Set:        Dummy S2 with 96 points

  Physical Properties:
    Total X-Sec:           sigt=[0.100000, 0.000100, 0.100000]
    Scattering X-Sec:      sigs=[0.050000, 0.000050, 0.050000]

  Solver Options:
    Number iterations:     10

  MPI Decomposition Options:
    Total MPI tasks:       2
    Spatial decomp:        2 x 1 x 1 MPI tasks
    Block solve method:    Sweep

  Per-Task Options:
    DirSets/Directions:    8 sets, 12 directions/set
    GroupSet/Groups:       2 sets, 512 groups/set
    Zone Sets:             1 x 1 x 1
    Architecture:          OpenMP
    Data Layout:           DGZ

Generating Problem

  Decomposition Space:   Procs:      Subdomains (local/global):
  ---------------------  ----------  --------------------------
  (P) Energy:            1           2 / 2
  (Q) Direction:         1           8 / 8
  (R) Space:             2           1 / 2
  (Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
  (PQR) TOTAL:           2           16 / 32

  Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]

  Memory breakdown of Field variables:
  Field Variable            Num Elements    Megabytes
  --------------            ------------    ---------
  data/sigs                     15728640      120.000
  dx                                  16        0.000
  dy                                  16        0.000
  dz                                  16        0.000
  ell                               2400        0.018
  ell_plus                          2400        0.018
  i_plane                       25165824      192.000
  j_plane                       25165824      192.000
  k_plane                       25165824      192.000
  mixelem_to_fraction               4352        0.033
  phi                          104857600      800.000
  phi_out                      104857600      800.000
  psi                          402653184     3072.000
  quadrature/w                        96        0.001
  quadrature/xcos                     96        0.001
  quadrature/ycos                     96        0.001
  quadrature/zcos                     96        0.001
  rhs                          402653184     3072.000
  sigt_zonal                     4194304       32.000
  volume                            4096        0.031
  --------                  ------------    ---------
  TOTAL                       1110455664     8472.104

  Generation Complete!

Steady State Solve

  iter 0: particle count=1.197998e+09, change=1.000000e+00
  iter 1: particle count=1.801368e+09, change=3.349511e-01
  iter 2: particle count=2.102278e+09, change=1.431351e-01
  iter 3: particle count=2.251810e+09, change=6.640521e-02
  iter 4: particle count=2.325888e+09, change=3.184924e-02
  iter 5: particle count=2.362467e+09, change=1.548355e-02
  iter 6: particle count=2.380471e+09, change=7.563193e-03
  iter 7: particle count=2.389305e+09, change=3.697158e-03
  iter 8: particle count=2.393627e+09, change=1.805479e-03
  iter 9: particle count=2.395735e+09, change=8.801810e-04
  Solver terminated


  Timer                    Count       Seconds
  ----------------  ------------  ------------
  Generate                     1       0.02985
  LPlusTimes                  10      17.01816
  LTimes                      10      25.68544
  Population                  10       1.67824
  Scattering                  10    1025.60752
  Solve                        1    1107.14039
  Source                      10       0.04474
  SweepSolver                 10      36.24844
  SweepSubdomain             160      18.97942


Figures of Merit

  Throughput:         3.636876e+06 [unknowns/(second/iteration)]
  Grind time :        2.749613e-07 [(seconds/iteration)/unknowns]
  Sweep efficiency :  52.35926 [100.0 * SweepSubdomain time / SweepSolver time]
  Number of unknowns: 402653184
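
Note: the figures of merit above can be cross-checked against the timer table of the same run. The sketch below is not Kripke's own code. It assumes that throughput and grind time are derived from the Solve timer divided by the iteration count, which matches the printed values to the precision shown. The function name `figures_of_merit` is hypothetical. The sweep efficiency formula is the one printed in the log.

```python
# Values copied from the 1-thread-per-rank run above (timer table and summary).
timers = {
    "Solve": 1107.14039,
    "SweepSolver": 36.24844,
    "SweepSubdomain": 18.97942,
}
iterations = 10
unknowns = 402653184


def figures_of_merit(timers, iterations, unknowns):
    """Recompute the figures of merit from the timer table.

    Assumption: throughput and grind time use the Solve timer.
    """
    seconds_per_iteration = timers["Solve"] / iterations
    throughput = unknowns / seconds_per_iteration   # unknowns/(second/iteration)
    grind_time = seconds_per_iteration / unknowns   # (seconds/iteration)/unknowns
    # Formula as printed in the log: 100.0 * SweepSubdomain time / SweepSolver time
    sweep_efficiency = 100.0 * timers["SweepSubdomain"] / timers["SweepSolver"]
    return throughput, grind_time, sweep_efficiency


throughput, grind_time, sweep_efficiency = figures_of_merit(timers, iterations, unknowns)
print(f"Throughput:       {throughput:.6e}")
print(f"Grind time:       {grind_time:.6e}")
print(f"Sweep efficiency: {sweep_efficiency:.5f}")
```

Substituting another run's Solve, SweepSolver and SweepSubdomain timers should reproduce that run's reported figures in the same way.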


* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9469)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9469) Observed more threads (2) than expected (1): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=2.

* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9474)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9474) Observed more threads (2) than expected (1): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=2.

Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_0

To display your profiling results:
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com

* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9638)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9643)
Compilation Options:
  Architecture:           OpenMP
  Compiler:               /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
  Compiler Flags:         "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++     -Wall -Wextra  "
  Linker Flags:           " "
  CHAI Enabled:           No
  CUDA Enabled:           No
  MPI Enabled:            Yes
  OpenMP Enabled:         Yes
  Caliper Enabled:        No

OpenMP Thread->Core mapping for 2 threads on rank 0
    0->  0    1-> 16

Input Parameters

  Problem Size:
    Zones:                 16 x 16 x 16  (4096 total)
    Groups:                1024
    Legendre Order:        4
    Quadrature Set:        Dummy S2 with 96 points

  Physical Properties:
    Total X-Sec:           sigt=[0.100000, 0.000100, 0.100000]
    Scattering X-Sec:      sigs=[0.050000, 0.000050, 0.050000]

  Solver Options:
    Number iterations:     10

  MPI Decomposition Options:
    Total MPI tasks:       2
    Spatial decomp:        2 x 1 x 1 MPI tasks
    Block solve method:    Sweep

  Per-Task Options:
    DirSets/Directions:    8 sets, 12 directions/set
    GroupSet/Groups:       2 sets, 512 groups/set
    Zone Sets:             1 x 1 x 1
    Architecture:          OpenMP
    Data Layout:           DGZ

Generating Problem

  Decomposition Space:   Procs:      Subdomains (local/global):
  ---------------------  ----------  --------------------------
  (P) Energy:            1           2 / 2
  (Q) Direction:         1           8 / 8
  (R) Space:             2           1 / 2
  (Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
  (PQR) TOTAL:           2           16 / 32

  Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]

  Memory breakdown of Field variables:
  Field Variable            Num Elements    Megabytes
  --------------            ------------    ---------
  data/sigs                     15728640      120.000
  dx                                  16        0.000
  dy                                  16        0.000
  dz                                  16        0.000
  ell                               2400        0.018
  ell_plus                          2400        0.018
  i_plane                       25165824      192.000
  j_plane                       25165824      192.000
  k_plane                       25165824      192.000
  mixelem_to_fraction               4352        0.033
  phi                          104857600      800.000
  phi_out                      104857600      800.000
  psi                          402653184     3072.000
  quadrature/w                        96        0.001
  quadrature/xcos                     96        0.001
  quadrature/ycos                     96        0.001
  quadrature/zcos                     96        0.001
  rhs                          402653184     3072.000
  sigt_zonal                     4194304       32.000
  volume                            4096        0.031
  --------                  ------------    ---------
  TOTAL                       1110455664     8472.104

  Generation Complete!

Steady State Solve

  iter 0: particle count=1.197998e+09, change=1.000000e+00
  iter 1: particle count=1.801368e+09, change=3.349511e-01
  iter 2: particle count=2.102278e+09, change=1.431351e-01
  iter 3: particle count=2.251810e+09, change=6.640521e-02
  iter 4: particle count=2.325888e+09, change=3.184924e-02
  iter 5: particle count=2.362467e+09, change=1.548355e-02
  iter 6: particle count=2.380471e+09, change=7.563193e-03
  iter 7: particle count=2.389305e+09, change=3.697158e-03
  iter 8: particle count=2.393627e+09, change=1.805479e-03
  iter 9: particle count=2.395735e+09, change=8.801810e-04
  Solver terminated


  Timer                    Count       Seconds
  ----------------  ------------  ------------
  Generate                     1       0.02002
  LPlusTimes                  10      10.28881
  LTimes                      10      14.09989
  Population                  10       0.84334
  Scattering                  10     515.73824
  Solve                        1     564.03207
  Source                      10       0.02224
  SweepSolver                 10      22.20197
  SweepSubdomain             160       9.56857


Figures of Merit

  Throughput:         7.138835e+06 [unknowns/(second/iteration)]
  Grind time :        1.400789e-07 [(seconds/iteration)/unknowns]
  Sweep efficiency :  43.09784 [100.0 * SweepSubdomain time / SweepSolver time]
  Number of unknowns: 402653184


* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9643)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9643) Observed more threads (3) than expected (2): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=3.

* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9638)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9638) Observed more threads (3) than expected (2): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=3.

Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_1

To display your profiling results:
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com

* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9757)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9762)
Compilation Options:
  Architecture:           OpenMP
  Compiler:               /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
  Compiler Flags:         "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++     -Wall -Wextra  "
  Linker Flags:           " "
  CHAI Enabled:           No
  CUDA Enabled:           No
  MPI Enabled:            Yes
  OpenMP Enabled:         Yes
  Caliper Enabled:        No

OpenMP Thread->Core mapping for 4 threads on rank 0
    0->  0    1->  8    2-> 16    3->120

Input Parameters

  Problem Size:
    Zones:                 16 x 16 x 16  (4096 total)
    Groups:                1024
    Legendre Order:        4
    Quadrature Set:        Dummy S2 with 96 points

  Physical Properties:
    Total X-Sec:           sigt=[0.100000, 0.000100, 0.100000]
    Scattering X-Sec:      sigs=[0.050000, 0.000050, 0.050000]

  Solver Options:
    Number iterations:     10

  MPI Decomposition Options:
    Total MPI tasks:       2
    Spatial decomp:        2 x 1 x 1 MPI tasks
    Block solve method:    Sweep

  Per-Task Options:
    DirSets/Directions:    8 sets, 12 directions/set
    GroupSet/Groups:       2 sets, 512 groups/set
    Zone Sets:             1 x 1 x 1
    Architecture:          OpenMP
    Data Layout:           DGZ

Generating Problem

  Decomposition Space:   Procs:      Subdomains (local/global):
  ---------------------  ----------  --------------------------
  (P) Energy:            1           2 / 2
  (Q) Direction:         1           8 / 8
  (R) Space:             2           1 / 2
  (Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
  (PQR) TOTAL:           2           16 / 32

  Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]

  Memory breakdown of Field variables:
  Field Variable            Num Elements    Megabytes
  --------------            ------------    ---------
  data/sigs                     15728640      120.000
  dx                                  16        0.000
  dy                                  16        0.000
  dz                                  16        0.000
  ell                               2400        0.018
  ell_plus                          2400        0.018
  i_plane                       25165824      192.000
  j_plane                       25165824      192.000
  k_plane                       25165824      192.000
  mixelem_to_fraction               4352        0.033
  phi                          104857600      800.000
  phi_out                      104857600      800.000
  psi                          402653184     3072.000
  quadrature/w                        96        0.001
  quadrature/xcos                     96        0.001
  quadrature/ycos                     96        0.001
  quadrature/zcos                     96        0.001
  rhs                          402653184     3072.000
  sigt_zonal                     4194304       32.000
  volume                            4096        0.031
  --------                  ------------    ---------
  TOTAL                       1110455664     8472.104

  Generation Complete!

Steady State Solve

  iter 0: particle count=1.197998e+09, change=1.000000e+00
  iter 1: particle count=1.801368e+09, change=3.349511e-01
  iter 2: particle count=2.102278e+09, change=1.431351e-01
  iter 3: particle count=2.251810e+09, change=6.640521e-02
  iter 4: particle count=2.325888e+09, change=3.184924e-02
  iter 5: particle count=2.362467e+09, change=1.548355e-02
  iter 6: particle count=2.380471e+09, change=7.563193e-03
  iter 7: particle count=2.389305e+09, change=3.697158e-03
  iter 8: particle count=2.393627e+09, change=1.805479e-03
  iter 9: particle count=2.395735e+09, change=8.801810e-04
  Solver terminated


  Timer                    Count       Seconds
  ----------------  ------------  ------------
  Generate                     1       0.01927
  LPlusTimes                  10       6.20350
  LTimes                      10       9.02198
  Population                  10       3.00649
  Scattering                  10     260.31054
  Solve                        1     285.30179
  Source                      10       0.01237
  SweepSolver                 10       5.92256
  SweepSubdomain             160       5.47562


Figures of Merit

  Throughput:         1.411324e+07 [unknowns/(second/iteration)]
  Grind time :        7.085546e-08 [(seconds/iteration)/unknowns]
  Sweep efficiency :  92.45365 [100.0 * SweepSubdomain time / SweepSolver time]
  Number of unknowns: 402653184


* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9762)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9762) Observed more threads (5) than expected (4): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=5.

* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9757)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9757) Observed more threads (5) than expected (4): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=5.

Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_2

To display your profiling results:
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com

* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9858)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9863)
Compilation Options:
  Architecture:           OpenMP
  Compiler:               /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
  Compiler Flags:         "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++     -Wall -Wextra  "
  Linker Flags:           " "
  CHAI Enabled:           No
  CUDA Enabled:           No
  MPI Enabled:            Yes
  OpenMP Enabled:         Yes
  Caliper Enabled:        No

OpenMP Thread->Core mapping for 8 threads on rank 0
    0->  0    1-> 28    2->  8    3-> 36    4-> 16    5-> 44    6->120    7->132

Input Parameters

  Problem Size:
    Zones:                 16 x 16 x 16  (4096 total)
    Groups:                1024
    Legendre Order:        4
    Quadrature Set:        Dummy S2 with 96 points

  Physical Properties:
    Total X-Sec:           sigt=[0.100000, 0.000100, 0.100000]
    Scattering X-Sec:      sigs=[0.050000, 0.000050, 0.050000]

  Solver Options:
    Number iterations:     10

  MPI Decomposition Options:
    Total MPI tasks:       2
    Spatial decomp:        2 x 1 x 1 MPI tasks
    Block solve method:    Sweep

  Per-Task Options:
    DirSets/Directions:    8 sets, 12 directions/set
    GroupSet/Groups:       2 sets, 512 groups/set
    Zone Sets:             1 x 1 x 1
    Architecture:          OpenMP
    Data Layout:           DGZ

Generating Problem

  Decomposition Space:   Procs:      Subdomains (local/global):
  ---------------------  ----------  --------------------------
  (P) Energy:            1           2 / 2
  (Q) Direction:         1           8 / 8
  (R) Space:             2           1 / 2
  (Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
  (PQR) TOTAL:           2           16 / 32

  Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]

  Memory breakdown of Field variables:
  Field Variable            Num Elements    Megabytes
  --------------            ------------    ---------
  data/sigs                     15728640      120.000
  dx                                  16        0.000
  dy                                  16        0.000
  dz                                  16        0.000
  ell                               2400        0.018
  ell_plus                          2400        0.018
  i_plane                       25165824      192.000
  j_plane                       25165824      192.000
  k_plane                       25165824      192.000
  mixelem_to_fraction               4352        0.033
  phi                          104857600      800.000
  phi_out                      104857600      800.000
  psi                          402653184     3072.000
  quadrature/w                        96        0.001
  quadrature/xcos                     96        0.001
  quadrature/ycos                     96        0.001
  quadrature/zcos                     96        0.001
  rhs                          402653184     3072.000
  sigt_zonal                     4194304       32.000
  volume                            4096        0.031
  --------                  ------------    ---------
  TOTAL                       1110455664     8472.104

  Generation Complete!

Steady State Solve

  iter 0: particle count=1.197998e+09, change=1.000000e+00
  iter 1: particle count=1.801368e+09, change=3.349511e-01
  iter 2: particle count=2.102278e+09, change=1.431351e-01
  iter 3: particle count=2.251810e+09, change=6.640521e-02
  iter 4: particle count=2.325888e+09, change=3.184924e-02
  iter 5: particle count=2.362467e+09, change=1.548355e-02
  iter 6: particle count=2.380471e+09, change=7.563193e-03
  iter 7: particle count=2.389305e+09, change=3.697158e-03
  iter 8: particle count=2.393627e+09, change=1.805479e-03
  iter 9: particle count=2.395735e+09, change=8.801810e-04
  Solver terminated


  Timer                    Count       Seconds
  ----------------  ------------  ------------
  Generate                     1       0.01960
  LPlusTimes                  10       3.21406
  LTimes                      10       4.63567
  Population                  10       0.21309
  Scattering                  10     128.72984
  Solve                        1     142.41169
  Source                      10       0.00637
  SweepSolver                 10       4.83026
  SweepSubdomain             160       2.81619


Figures of Merit

  Throughput:         2.827388e+07 [unknowns/(second/iteration)]
  Grind time :        3.536833e-08 [(seconds/iteration)/unknowns]
  Sweep efficiency :  58.30296 [100.0 * SweepSubdomain time / SweepSolver time]
  Number of unknowns: 402653184


* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9858)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9858) Observed more threads (9) than expected (8): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=9.

* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9863)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9863) Observed more threads (9) than expected (8): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=9.

Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_3

To display your profiling results:
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com

* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9968)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 9973)
Compilation Options:
  Architecture:           OpenMP
  Compiler:               /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
  Compiler Flags:         "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++     -Wall -Wextra  "
  Linker Flags:           " "
  CHAI Enabled:           No
  CUDA Enabled:           No
  MPI Enabled:            Yes
  OpenMP Enabled:         Yes
  Caliper Enabled:        No

OpenMP Thread->Core mapping for 16 threads on rank 0
    0->  0    1->  6    2-> 28    3-> 50    4->  8    5-> 14    6-> 36    7-> 58
    8-> 16    9-> 22   10-> 44   11-> 66   12->120   13->126   14->132   15->138

Input Parameters

  Problem Size:
    Zones:                 16 x 16 x 16  (4096 total)
    Groups:                1024
    Legendre Order:        4
    Quadrature Set:        Dummy S2 with 96 points

  Physical Properties:
    Total X-Sec:           sigt=[0.100000, 0.000100, 0.100000]
    Scattering X-Sec:      sigs=[0.050000, 0.000050, 0.050000]

  Solver Options:
    Number iterations:     10

  MPI Decomposition Options:
    Total MPI tasks:       2
    Spatial decomp:        2 x 1 x 1 MPI tasks
    Block solve method:    Sweep

  Per-Task Options:
    DirSets/Directions:    8 sets, 12 directions/set
    GroupSet/Groups:       2 sets, 512 groups/set
    Zone Sets:             1 x 1 x 1
    Architecture:          OpenMP
    Data Layout:           DGZ

Generating Problem

  Decomposition Space:   Procs:      Subdomains (local/global):
  ---------------------  ----------  --------------------------
  (P) Energy:            1           2 / 2
  (Q) Direction:         1           8 / 8
  (R) Space:             2           1 / 2
  (Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
  (PQR) TOTAL:           2           16 / 32

  Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]

  Memory breakdown of Field variables:
  Field Variable            Num Elements    Megabytes
  --------------            ------------    ---------
  data/sigs                     15728640      120.000
  dx                                  16        0.000
  dy                                  16        0.000
  dz                                  16        0.000
  ell                               2400        0.018
  ell_plus                          2400        0.018
  i_plane                       25165824      192.000
  j_plane                       25165824      192.000
  k_plane                       25165824      192.000
  mixelem_to_fraction               4352        0.033
  phi                          104857600      800.000
  phi_out                      104857600      800.000
  psi                          402653184     3072.000
  quadrature/w                        96        0.001
  quadrature/xcos                     96        0.001
  quadrature/ycos                     96        0.001
  quadrature/zcos                     96        0.001
  rhs                          402653184     3072.000
  sigt_zonal                     4194304       32.000
  volume                            4096        0.031
  --------                  ------------    ---------
  TOTAL                       1110455664     8472.104

  Generation Complete!

Steady State Solve

  iter 0: particle count=1.197998e+09, change=1.000000e+00
  iter 1: particle count=1.801368e+09, change=3.349511e-01
  iter 2: particle count=2.102278e+09, change=1.431351e-01
  iter 3: particle count=2.251810e+09, change=6.640521e-02
  iter 4: particle count=2.325888e+09, change=3.184924e-02
  iter 5: particle count=2.362467e+09, change=1.548355e-02
  iter 6: particle count=2.380471e+09, change=7.563193e-03
  iter 7: particle count=2.389305e+09, change=3.697158e-03
  iter 8: particle count=2.393627e+09, change=1.805479e-03
  iter 9: particle count=2.395735e+09, change=8.801810e-04
  Solver terminated


  Timer                    Count       Seconds
  ----------------  ------------  ------------
  Generate                     1       0.02015
  LPlusTimes                  10       2.32961
  LTimes                      10       2.86083
  Population                  10       0.47778
  Scattering                  10      64.92937
  Solve                        1      73.71058
  Source                      10       0.00342
  SweepSolver                 10       2.34543
  SweepSubdomain             160       1.44526


Figures of Merit

  Throughput:         5.462624e+07 [unknowns/(second/iteration)]
  Grind time :        1.830622e-08 [(seconds/iteration)/unknowns]
  Sweep efficiency :  61.62002 [100.0 * SweepSubdomain time / SweepSolver time]
  Number of unknowns: 402653184
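
Note: the Solve timers reported so far (1, 2, 4, 8 and 16 OpenMP threads per rank, always with 2 MPI tasks) can be turned into a strong-scaling summary. This is a minimal sketch using only the Solve times printed in this log. The name `scaling_table` is hypothetical, and taking the 1-thread run as the baseline is an assumption made here.

```python
# Solve timer (seconds) per OpenMP thread count per rank, copied from the runs above.
solve_seconds = {
    1: 1107.14039,
    2: 564.03207,
    4: 285.30179,
    8: 142.41169,
    16: 73.71058,
}


def scaling_table(solve_seconds):
    """Return (threads, speedup, parallel efficiency in %) rows.

    The run with the fewest threads is used as the baseline.
    """
    base_threads = min(solve_seconds)
    base_time = solve_seconds[base_threads]
    rows = []
    for threads in sorted(solve_seconds):
        speedup = base_time / solve_seconds[threads]
        efficiency = 100.0 * speedup / (threads / base_threads)
        rows.append((threads, speedup, efficiency))
    return rows


rows = scaling_table(solve_seconds)
for threads, speedup, efficiency in rows:
    print(f"{threads:3d} threads: speedup {speedup:6.2f}, efficiency {efficiency:5.1f} %")
```

Because the Scattering timer makes up most of the Solve time in every run shown, the speedup computed this way mostly reflects how that kernel scales.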


* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9973)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9973) Observed more threads (17) than expected (16): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=17.

* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 9968)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 9968) Observed more threads (17) than expected (16): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=17.

Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_4

To display your profiling results:
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com

* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 10110)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 10115)
Compilation Options:
  Architecture:           OpenMP
  Compiler:               /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
  Compiler Flags:         "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++     -Wall -Wextra  "
  Linker Flags:           " "
  CHAI Enabled:           No
  CUDA Enabled:           No
  MPI Enabled:            Yes
  OpenMP Enabled:         Yes
  Caliper Enabled:        No

OpenMP Thread->Core mapping for 32 threads on rank 0
    0->  0    1->  3    2->  6    3-> 25    4-> 28    5-> 31    6-> 50    7-> 53
    8->  8    9-> 11   10-> 14   11-> 33   12-> 36   13-> 39   14-> 58   15-> 61
   16-> 16   17-> 19   18-> 22   19-> 41   20-> 44   21-> 47   22-> 66   23-> 69
   24->120   25->123   26->126   27->129   28->132   29->135   30->138   31->141

Input Parameters

  Problem Size:
    Zones:                 16 x 16 x 16  (4096 total)
    Groups:                1024
    Legendre Order:        4
    Quadrature Set:        Dummy S2 with 96 points

  Physical Properties:
    Total X-Sec:           sigt=[0.100000, 0.000100, 0.100000]
    Scattering X-Sec:      sigs=[0.050000, 0.000050, 0.050000]

  Solver Options:
    Number iterations:     10

  MPI Decomposition Options:
    Total MPI tasks:       2
    Spatial decomp:        2 x 1 x 1 MPI tasks
    Block solve method:    Sweep

  Per-Task Options:
    DirSets/Directions:    8 sets, 12 directions/set
    GroupSet/Groups:       2 sets, 512 groups/set
    Zone Sets:             1 x 1 x 1
    Architecture:          OpenMP
    Data Layout:           DGZ

Generating Problem

  Decomposition Space:   Procs:      Subdomains (local/global):
  ---------------------  ----------  --------------------------
  (P) Energy:            1           2 / 2
  (Q) Direction:         1           8 / 8
  (R) Space:             2           1 / 2
  (Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
  (PQR) TOTAL:           2           16 / 32

  Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]

  Memory breakdown of Field variables:
  Field Variable            Num Elements    Megabytes
  --------------            ------------    ---------
  data/sigs                     15728640      120.000
  dx                                  16        0.000
  dy                                  16        0.000
  dz                                  16        0.000
  ell                               2400        0.018
  ell_plus                          2400        0.018
  i_plane                       25165824      192.000
  j_plane                       25165824      192.000
  k_plane                       25165824      192.000
  mixelem_to_fraction               4352        0.033
  phi                          104857600      800.000
  phi_out                      104857600      800.000
  psi                          402653184     3072.000
  quadrature/w                        96        0.001
  quadrature/xcos                     96        0.001
  quadrature/ycos                     96        0.001
  quadrature/zcos                     96        0.001
  rhs                          402653184     3072.000
  sigt_zonal                     4194304       32.000
  volume                            4096        0.031
  --------                  ------------    ---------
  TOTAL                       1110455664     8472.104

  Generation Complete!

Steady State Solve

  iter 0: particle count=1.197998e+09, change=1.000000e+00
  iter 1: particle count=1.801368e+09, change=3.349511e-01
  iter 2: particle count=2.102278e+09, change=1.431351e-01
  iter 3: particle count=2.251810e+09, change=6.640521e-02
  iter 4: particle count=2.325888e+09, change=3.184924e-02
  iter 5: particle count=2.362467e+09, change=1.548355e-02
  iter 6: particle count=2.380471e+09, change=7.563193e-03
  iter 7: particle count=2.389305e+09, change=3.697158e-03
  iter 8: particle count=2.393627e+09, change=1.805479e-03
  iter 9: particle count=2.395735e+09, change=8.801810e-04
  Solver terminated


  Timer                    Count       Seconds
  ----------------  ------------  ------------
  Generate                     1       0.02008
  LPlusTimes                  10       1.95204
  LTimes                      10       2.38741
  Population                  10       0.15799
  Scattering                  10      32.69943
  Solve                        1      39.76690
  Source                      10       0.00187
  SweepSolver                 10       1.82181
  SweepSubdomain             160       0.77328


Figures of Merit

  Throughput:         1.012533e+08 [unknowns/(second/iteration)]
  Grind time :        9.876217e-09 [(seconds/iteration)/unknowns]
  Sweep efficiency :  42.44560 [100.0 * SweepSubdomain time / SweepSolver time]
  Number of unknowns: 402653184


* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 10110)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 10110) Observed more threads (33) than expected (32): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=33.

* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 10115)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 10115) Observed more threads (33) than expected (32): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=33.

Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_5

To display your profiling results:
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com

* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 10292)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 10297)
Compilation Options:
  Architecture:           OpenMP
  Compiler:               /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
  Compiler Flags:         "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++     -Wall -Wextra  "
  Linker Flags:           " "
  CHAI Enabled:           No
  CUDA Enabled:           No
  MPI Enabled:            Yes
  OpenMP Enabled:         Yes
  Caliper Enabled:        No

OpenMP Thread->Core mapping for 64 threads on rank 0
    0->  0    1->193    2->  3    3->196    4->  6    5->199    6-> 25    7->218
    8-> 28    9->221   10-> 31   11->240   12-> 50   13->243   14-> 53   15->246
   16->  8   17->201   18-> 11   19->204   20-> 14   21->207   22-> 33   23->226
   24-> 36   25->229   26-> 39   27->248   28-> 58   29->251   30-> 61   31->254
   32-> 16   33->209   34-> 19   35->212   36-> 22   37->215   38-> 41   39->234
   40-> 44   41->237   42-> 47   43->256   44-> 66   45->259   46-> 69   47->262
   48->120   49->313   50->123   51->316   52->126   53->319   54->129   55->322
   56->132   57->325   58->135   59->328   60->138   61->331   62->141   63->334

Input Parameters

  Problem Size:
    Zones:                 16 x 16 x 16  (4096 total)
    Groups:                1024
    Legendre Order:        4
    Quadrature Set:        Dummy S2 with 96 points

  Physical Properties:
    Total X-Sec:           sigt=[0.100000, 0.000100, 0.100000]
    Scattering X-Sec:      sigs=[0.050000, 0.000050, 0.050000]

  Solver Options:
    Number iterations:     10

  MPI Decomposition Options:
    Total MPI tasks:       2
    Spatial decomp:        2 x 1 x 1 MPI tasks
    Block solve method:    Sweep

  Per-Task Options:
    DirSets/Directions:    8 sets, 12 directions/set
    GroupSet/Groups:       2 sets, 512 groups/set
    Zone Sets:             1 x 1 x 1
    Architecture:          OpenMP
    Data Layout:           DGZ

Generating Problem

  Decomposition Space:   Procs:      Subdomains (local/global):
  ---------------------  ----------  --------------------------
  (P) Energy:            1           2 / 2
  (Q) Direction:         1           8 / 8
  (R) Space:             2           1 / 2
  (Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
  (PQR) TOTAL:           2           16 / 32

  Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]

  Memory breakdown of Field variables:
  Field Variable            Num Elements    Megabytes
  --------------            ------------    ---------
  data/sigs                     15728640      120.000
  dx                                  16        0.000
  dy                                  16        0.000
  dz                                  16        0.000
  ell                               2400        0.018
  ell_plus                          2400        0.018
  i_plane                       25165824      192.000
  j_plane                       25165824      192.000
  k_plane                       25165824      192.000
  mixelem_to_fraction               4352        0.033
  phi                          104857600      800.000
  phi_out                      104857600      800.000
  psi                          402653184     3072.000
  quadrature/w                        96        0.001
  quadrature/xcos                     96        0.001
  quadrature/ycos                     96        0.001
  quadrature/zcos                     96        0.001
  rhs                          402653184     3072.000
  sigt_zonal                     4194304       32.000
  volume                            4096        0.031
  --------                  ------------    ---------
  TOTAL                       1110455664     8472.104

  Generation Complete!

Steady State Solve

  iter 0: particle count=1.197998e+09, change=1.000000e+00
  iter 1: particle count=1.801368e+09, change=3.349511e-01
  iter 2: particle count=2.102278e+09, change=1.431351e-01
  iter 3: particle count=2.251810e+09, change=6.640521e-02
  iter 4: particle count=2.325888e+09, change=3.184924e-02
  iter 5: particle count=2.362467e+09, change=1.548355e-02
  iter 6: particle count=2.380471e+09, change=7.563193e-03
  iter 7: particle count=2.389305e+09, change=3.697158e-03
  iter 8: particle count=2.393627e+09, change=1.805479e-03
  iter 9: particle count=2.395735e+09, change=8.801810e-04
  Solver terminated


  Timer                    Count       Seconds
  ----------------  ------------  ------------
  Generate                     1       0.01959
  LPlusTimes                  10       2.24294
  LTimes                      10       3.50936
  Population                  10       0.08517
  Scattering                  10      16.91393
  Solve                        1      29.32880
  Source                      10       0.00119
  SweepSolver                 10       5.80821
  SweepSubdomain             160       0.58777


Figures of Merit

  Throughput:         1.372894e+08 [unknowns/(second/iteration)]
  Grind time :        7.283886e-09 [(seconds/iteration)/unknowns]
  Sweep efficiency :  10.11966 [100.0 * SweepSubdomain time / SweepSolver time]
  Number of unknowns: 402653184


* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 10297)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 10297) Observed more threads (65) than expected (64): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=65.

* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 10292)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 10292) Observed more threads (65) than expected (64): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=65.

Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_6

To display your profiling results:
* Info: Selecting the 'perf-high-ppn' engine for node gmz16.benchmarkcenter.megware.com

* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 10621)
* Info: "ref-cycles" not supported on gmz16.benchmarkcenter.megware.com: fallback to "cpu-clock"
* Warning: Found no event able to derive walltime: prepending cpu-clock
* Info: Process launched (host gmz16.benchmarkcenter.megware.com, process 10627)
Compilation Options:
  Architecture:           OpenMP
  Compiler:               /cluster/intel/oneapi/2024.0.0/mpi/2021.11/bin/mpiicpc
  Compiler Flags:         "-O3 -march=native -O3 -march=znver4 -mprefer-vector-width=512 -flto -g -grecord-gcc-switches -fno-omit-frame-pointer -fcf-protection=none -no-pie -cxx=clang++     -Wall -Wextra  "
  Linker Flags:           " "
  CHAI Enabled:           No
  CUDA Enabled:           No
  MPI Enabled:            Yes
  OpenMP Enabled:         Yes
  Caliper Enabled:        No

OpenMP Thread->Core mapping for 96 threads on rank 0
    0->  0    1->  1    2->  2    3->  3    4->  4    5->  5    6->  6    7->  7
    8-> 24    9-> 25   10-> 26   11-> 27   12-> 28   13-> 29   14-> 30   15-> 31
   16-> 48   17-> 49   18-> 50   19-> 51   20-> 52   21-> 53   22-> 54   23-> 55
   24->  8   25->  9   26-> 10   27-> 11   28-> 12   29-> 13   30-> 14   31-> 15
   32-> 32   33-> 33   34-> 34   35-> 35   36-> 36   37-> 37   38-> 38   39-> 39
   40-> 56   41-> 57   42-> 58   43-> 59   44-> 60   45-> 61   46-> 62   47-> 63
   48-> 16   49-> 17   50-> 18   51-> 19   52-> 20   53-> 21   54-> 22   55-> 23
   56-> 40   57-> 41   58-> 42   59-> 43   60-> 44   61-> 45   62-> 46   63-> 47
   64-> 64   65-> 65   66-> 66   67-> 67   68-> 68   69-> 69   70-> 70   71-> 71
   72->120   73->121   74->122   75->123   76->124   77->125   78->126   79->127
   80->128   81->129   82->130   83->131   84->132   85->133   86->134   87->135
   88->136   89->137   90->138   91->139   92->140   93->141   94->142   95->143

Input Parameters

  Problem Size:
    Zones:                 16 x 16 x 16  (4096 total)
    Groups:                1024
    Legendre Order:        4
    Quadrature Set:        Dummy S2 with 96 points

  Physical Properties:
    Total X-Sec:           sigt=[0.100000, 0.000100, 0.100000]
    Scattering X-Sec:      sigs=[0.050000, 0.000050, 0.050000]

  Solver Options:
    Number iterations:     10

  MPI Decomposition Options:
    Total MPI tasks:       2
    Spatial decomp:        2 x 1 x 1 MPI tasks
    Block solve method:    Sweep

  Per-Task Options:
    DirSets/Directions:    8 sets, 12 directions/set
    GroupSet/Groups:       2 sets, 512 groups/set
    Zone Sets:             1 x 1 x 1
    Architecture:          OpenMP
    Data Layout:           DGZ

Generating Problem

  Decomposition Space:   Procs:      Subdomains (local/global):
  ---------------------  ----------  --------------------------
  (P) Energy:            1           2 / 2
  (Q) Direction:         1           8 / 8
  (R) Space:             2           1 / 2
  (Rx,Ry,Rz) R in XYZ:   2x1x1       1x1x1 / 2x1x1
  (PQR) TOTAL:           2           16 / 32

  Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]

  Memory breakdown of Field variables:
  Field Variable            Num Elements    Megabytes
  --------------            ------------    ---------
  data/sigs                     15728640      120.000
  dx                                  16        0.000
  dy                                  16        0.000
  dz                                  16        0.000
  ell                               2400        0.018
  ell_plus                          2400        0.018
  i_plane                       25165824      192.000
  j_plane                       25165824      192.000
  k_plane                       25165824      192.000
  mixelem_to_fraction               4352        0.033
  phi                          104857600      800.000
  phi_out                      104857600      800.000
  psi                          402653184     3072.000
  quadrature/w                        96        0.001
  quadrature/xcos                     96        0.001
  quadrature/ycos                     96        0.001
  quadrature/zcos                     96        0.001
  rhs                          402653184     3072.000
  sigt_zonal                     4194304       32.000
  volume                            4096        0.031
  --------                  ------------    ---------
  TOTAL                       1110455664     8472.104

  Generation Complete!

Steady State Solve

  iter 0: particle count=1.197998e+09, change=1.000000e+00
  iter 1: particle count=1.801368e+09, change=3.349511e-01
  iter 2: particle count=2.102278e+09, change=1.431351e-01
  iter 3: particle count=2.251810e+09, change=6.640521e-02
  iter 4: particle count=2.325888e+09, change=3.184924e-02
  iter 5: particle count=2.362467e+09, change=1.548355e-02
  iter 6: particle count=2.380471e+09, change=7.563193e-03
  iter 7: particle count=2.389305e+09, change=3.697158e-03
  iter 8: particle count=2.393627e+09, change=1.805479e-03
  iter 9: particle count=2.395735e+09, change=8.801810e-04
  Solver terminated


  Timer                    Count       Seconds
  ----------------  ------------  ------------
  Generate                     1       0.01932
  LPlusTimes                  10       2.24593
  LTimes                      10       2.67905
  Population                  10       0.05873
  Scattering                  10      12.88735
  Solve                        1      26.16114
  Source                      10       0.00083
  SweepSolver                 10       7.48809
  SweepSubdomain             160       0.53343


Figures of Merit

  Throughput:         1.539127e+08 [unknowns/(second/iteration)]
  Grind time :        6.497190e-09 [(seconds/iteration)/unknowns]
  Sweep efficiency :  7.12378 [100.0 * SweepSubdomain time / SweepSolver time]
  Number of unknowns: 402653184
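
Note on the Figures of Merit: the sketch below recomputes the three figures from the Timer tables of the 32-, 64- and 96-thread runs in this log. The sweep-efficiency formula is the one printed in the log. The throughput formula (unknowns * iterations / Solve seconds) is an assumption inferred from the printed numbers, not taken from Kripke's source, and grind time is taken as its reciprocal. The names RUNS and figures_of_merit are hypothetical helpers used only for illustration.

```python
# Timer values (seconds) copied from the three runs in this log,
# keyed by the OpenMP thread count reported for rank 0.
RUNS = {
    32: {"solve": 39.76690, "sweep_solver": 1.82181, "sweep_subdomain": 0.77328},
    64: {"solve": 29.32880, "sweep_solver": 5.80821, "sweep_subdomain": 0.58777},
    96: {"solve": 26.16114, "sweep_solver": 7.48809, "sweep_subdomain": 0.53343},
}
UNKNOWNS = 402653184   # "Number of unknowns" from the log
ITERATIONS = 10        # "Number iterations" from the log


def figures_of_merit(timers, unknowns=UNKNOWNS, iterations=ITERATIONS):
    """Return (throughput, grind_time, sweep_efficiency) for one run.

    Assumption: throughput = unknowns * iterations / Solve time.
    Sweep efficiency follows the formula printed in the log:
    100.0 * SweepSubdomain time / SweepSolver time.
    """
    throughput = unknowns * iterations / timers["solve"]
    grind_time = 1.0 / throughput
    sweep_eff = 100.0 * timers["sweep_subdomain"] / timers["sweep_solver"]
    return throughput, grind_time, sweep_eff


if __name__ == "__main__":
    base_solve = RUNS[32]["solve"]
    for threads, timers in sorted(RUNS.items()):
        thr, grind, eff = figures_of_merit(timers)
        speedup = base_solve / timers["solve"]  # Solve-time speedup vs. 32 threads
        print(f"{threads:3d} threads: throughput={thr:.6e} "
              f"grind={grind:.6e} sweep_eff={eff:.5f} speedup={speedup:.2f}x")
```

Read this way, the Solve time keeps dropping from 32 to 96 threads while the reported sweep efficiency falls, because SweepSolver time grows as SweepSubdomain time shrinks.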


* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 10621)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 10621) Observed more threads (97) than expected (96): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=97.

* Info: Process finished (host gmz16.benchmarkcenter.megware.com, process 10627)
* Warning: (host gmz16.benchmarkcenter.megware.com, process 10627) Observed more threads (97) than expected (96): in case of high IO overhead or suspicious profile, rerun with maximum-threads-per-process=97.

Your experiment path is /beegfs/hackathon/users/eoseret/qaas_runs/kripke/intel/Kripke/run/oneview_runs/compilers/aocc_10/oneview_results_scal/tools/lprof_npsu_run_7

To display your profiling results:
