OV - - Outputs

Executable Output


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5115)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 1
Number of Threads counted = 1
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 418416 microseconds.
   (= 418416 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           13839.6     0.995896     0.994430     0.997401
Scale:          13984.9     0.985712     0.984102     0.986495
Add:            17215.7     1.200755     1.199128     1.203242
Triad:          17174.4     1.203057     1.202009     1.203596
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5115)
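
The "Best Rate MB/s" column of the 1-thread run above can be cross-checked from the array size and the "Min time" column. The sketch below is not part of the tool output. It assumes STREAM's usual byte counting (Copy and Scale touch two arrays per iteration, Add and Triad touch three) and that MB means 10^6 bytes; neither is stated in this log. The names `ARRAYS_TOUCHED`, `best_rate_mb_s` and `mem_per_array_mib` are hypothetical helpers introduced for illustration.

```python
# Values copied from the 1-thread STREAM run in this log.
ARRAY_ELEMENTS = 860_160_000   # "Array size"
BYTES_PER_ELEMENT = 8          # "This system uses 8 bytes per array element."

# Assumption: arrays read or written per kernel iteration (STREAM convention).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column, in seconds.
MIN_TIME_S = {
    "Copy": 0.994430,
    "Scale": 0.984102,
    "Add": 1.199128,
    "Triad": 1.202009,
}


def best_rate_mb_s(kernel: str) -> float:
    """Bytes moved per iteration divided by the best time, in 10^6 bytes/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return 1.0e-6 * bytes_moved / MIN_TIME_S[kernel]


def mem_per_array_mib() -> float:
    """Size of one array in MiB (2^20 bytes)."""
    return ARRAY_ELEMENTS * BYTES_PER_ELEMENT / 2**20


if __name__ == "__main__":
    for kernel in MIN_TIME_S:
        print(f"{kernel:<6} {best_rate_mb_s(kernel):10.1f} MB/s")
    print(f"Memory per array: {mem_per_array_mib():.1f} MiB")
    print(f"Total memory:     {3 * mem_per_array_mib():.1f} MiB")
```

Because "Min time" is printed with only six decimals, the recomputed rates can differ from the logged ones in the last digit.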

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0

To display your profiling results:
######################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                               COMMAND                                                                               #
######################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0  #
######################################################################################################################################################################################################


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5253)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads counted = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 209323 microseconds.
   (= 209323 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           27820.1     0.495319     0.494699     0.495869
Scale:          28101.8     0.490455     0.489739     0.490699
Add:            34393.7     0.601037     0.600222     0.601406
Triad:          34315.9     0.602082     0.601582     0.604469
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5253)

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1

To display your profiling results:
######################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                               COMMAND                                                                               #
######################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1  #
######################################################################################################################################################################################################


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5324)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 104414 microseconds.
   (= 104414 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           48454.3     0.284339     0.284032     0.284402
Scale:          48988.1     0.281104     0.280937     0.281237
Add:            64683.0     0.319407     0.319154     0.319936
Triad:          64572.1     0.320010     0.319702     0.320912
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5324)

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2

To display your profiling results:
######################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                               COMMAND                                                                               #
######################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2  #
######################################################################################################################################################################################################


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5394)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads counted = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 52735 microseconds.
   (= 52735 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           97075.3     0.141917     0.141772     0.141978
Scale:          98140.0     0.140302     0.140234     0.140320
Add:           129323.1     0.159768     0.159630     0.159942
Triad:         129073.2     0.160086     0.159939     0.160385
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5394)

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3

To display your profiling results:
######################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                               COMMAND                                                                               #
######################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3  #
######################################################################################################################################################################################################


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5471)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 16
Number of Threads counted = 16
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 35154 microseconds.
   (= 35154 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          190572.3     0.072448     0.072217     0.073775
Scale:         192969.2     0.071578     0.071320     0.073694
Add:           248843.9     0.083063     0.082959     0.083656
Triad:         248245.4     0.083274     0.083159     0.083811
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5471)

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4

To display your profiling results:
######################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                               COMMAND                                                                               #
######################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4  #
######################################################################################################################################################################################################


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5549)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 32
Number of Threads counted = 32
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 28860 microseconds.
   (= 28860 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          343737.4     0.040468     0.040038     0.040710
Scale:         347820.5     0.039684     0.039568     0.040766
Add:           418705.2     0.049672     0.049304     0.049789
Triad:         418382.7     0.050300     0.049342     0.050464
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5549)

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5

To display your profiling results:
######################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                               COMMAND                                                                               #
######################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5  #
######################################################################################################################################################################################################


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5673)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 48
Number of Threads counted = 48
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 28336 microseconds.
   (= 28336 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          420577.6     0.033525     0.032723     0.033768
Scale:         419360.1     0.033022     0.032818     0.033163
Add:           478486.9     0.043386     0.043144     0.043446
Triad:         477645.5     0.043486     0.043220     0.043933
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5673)
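
With the 1- to 48-thread runs complete, the Triad rates logged so far give a rough picture of how bandwidth scales with thread count. The sketch below is not part of the tool output. It takes the 1-thread Triad rate as the baseline and reports speedup and parallel efficiency (speedup divided by thread count); `TRIAD_MB_S`, `speedup` and `efficiency` are hypothetical names introduced for illustration.

```python
# "Best Rate MB/s" for Triad, copied from the runs in this log,
# keyed by "Number of Threads counted".
TRIAD_MB_S = {
    1: 17174.4,
    2: 34315.9,
    4: 64572.1,
    8: 129073.2,
    16: 248245.4,
    32: 418382.7,
    48: 477645.5,
}


def speedup(threads: int) -> float:
    """Triad bandwidth relative to the 1-thread run."""
    return TRIAD_MB_S[threads] / TRIAD_MB_S[1]


def efficiency(threads: int) -> float:
    """Speedup divided by thread count (1.0 means perfect scaling)."""
    return speedup(threads) / threads


if __name__ == "__main__":
    print("threads  speedup  efficiency")
    for n in sorted(TRIAD_MB_S):
        print(f"{n:7d}  {speedup(n):7.2f}  {efficiency(n):10.2f}")
```

A falling efficiency at higher thread counts is what one would expect once the memory subsystem, rather than the number of cores, limits throughput; the log itself does not state the cause.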

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6

To display your profiling results:
######################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                               COMMAND                                                                               #
######################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6  #
######################################################################################################################################################################################################


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5805)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 96
Number of Threads counted = 96
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 28894 microseconds.
   (= 28894 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          467955.1     0.029832     0.029410     0.029914
Scale:         464370.9     0.029903     0.029637     0.029941
Add:           481590.1     0.043006     0.042866     0.043162
Triad:         482242.6     0.042946     0.042808     0.043061
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
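
The "Best Rate MB/s" column above can be cross-checked from the array size and the "Min time" column. The sketch below is not part of the tool output. It assumes STREAM's usual counting convention (Copy and Scale touch two arrays per iteration, Add and Triad touch three) and that 1 MB = 10^6 bytes. The dictionary and function names are illustrative only. Because the printed minimum times are rounded to six decimals, the recomputed rates should agree with the reported ones only to within a small rounding error.

```python
# Sketch: recompute STREAM's "Best Rate MB/s" for the 96-thread run above.
# Assumption: bytes moved per kernel = arrays touched * bytes per element * array size.

BYTES_PER_ELEMENT = 8          # "This system uses 8 bytes per array element."
ARRAY_ELEMENTS = 860_160_000   # "Array size = 860160000 (elements)"

# Arrays read plus arrays written by each kernel (assumed STREAM convention).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column (seconds), copied from the table above.
MIN_TIME_S = {"Copy": 0.029410, "Scale": 0.029637, "Add": 0.042866, "Triad": 0.042808}

# "Best Rate MB/s" column, copied from the table above, for comparison.
REPORTED_MB_S = {"Copy": 467955.1, "Scale": 464370.9, "Add": 481590.1, "Triad": 482242.6}


def best_rate_mb_s(kernel: str) -> float:
    """Return the bandwidth in MB/s (10^6 bytes/s) implied by the best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return 1.0e-6 * bytes_moved / MIN_TIME_S[kernel]


rates = {kernel: best_rate_mb_s(kernel) for kernel in MIN_TIME_S}

for kernel, rate in rates.items():
    print(f"{kernel + ':':<8}{rate:>12.1f} MB/s (reported {REPORTED_MB_S[kernel]:.1f})")
```

The same formula applies to any other run in this log by substituting that run's "Min time" values.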

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5805)

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7

To display your profiling results:
######################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                               COMMAND                                                                               #
######################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7  #
######################################################################################################################################################################################################

Report Configuration

Executable Output