Executable Output


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 6570)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 1
Number of Threads counted = 1
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 398017 microseconds.
   (= 398017 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           24556.9     0.561044     0.560436     0.562432
Scale:          24374.1     0.565099     0.564638     0.566748
Add:            21307.3     0.969510     0.968862     0.970908
Triad:          21332.1     0.968443     0.967734     0.970362
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
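The "Best Rate MB/s" column above can be reproduced from the array size and the "Min time" column. The following is a minimal sketch, assuming the usual STREAM byte counts (two arrays moved per element for Copy and Scale, three for Add and Triad) and MB = 10^6 bytes; the names `ARRAYS_TOUCHED`, `best_rate_mb_s` and `array_mib` are illustrative and do not come from STREAM or MAQAO.

```python
# Values copied from the 1-thread run in this log.
ARRAY_ELEMENTS = 860_160_000   # "Array size = 860160000 (elements)"
BYTES_PER_ELEMENT = 8          # "This system uses 8 bytes per array element."

# Assumption: arrays read or written per element by each kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column (seconds) of the 1-thread run.
MIN_TIME_S = {
    "Copy": 0.560436,
    "Scale": 0.564638,
    "Add": 0.968862,
    "Triad": 0.967734,
}


def best_rate_mb_s(kernel: str, min_time_s: float) -> float:
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return 1.0e-6 * bytes_moved / min_time_s


def array_mib() -> float:
    """Size of one array in MiB (1 MiB = 2**20 bytes)."""
    return ARRAY_ELEMENTS * BYTES_PER_ELEMENT / 2**20


if __name__ == "__main__":
    print(f"Memory per array: {array_mib():.1f} MiB")
    for kernel, t in MIN_TIME_S.items():
        print(f"{kernel:6s} {best_rate_mb_s(kernel, t):10.1f} MB/s")
```

Because the printed minimum times are rounded to six decimals, the recomputed rates can differ from the logged ones in the last digit.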

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 6570)

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0

To display your profiling results:
#####################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                              COMMAND                                                                               #
#####################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_0  #
#####################################################################################################################################################################################################


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 6649)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads counted = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 198987 microseconds.
   (= 198987 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           50695.3     0.271878     0.271476     0.274315
Scale:          50274.4     0.273947     0.273749     0.275223
Add:            43847.0     0.471054     0.470815     0.471904
Triad:          43905.4     0.470539     0.470189     0.471637
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
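The result tables in this log share a fixed column layout, so their rows can be extracted with a small regular expression. This is a minimal sketch: `parse_stream_table` is a hypothetical helper name (not part of STREAM or MAQAO), and the sample text is copied from the 2-thread run above.

```python
import re

# Sample copied from the 2-thread run in this log.
SAMPLE = """\
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           50695.3     0.271878     0.271476     0.274315
Scale:          50274.4     0.273947     0.273749     0.275223
Add:            43847.0     0.471054     0.470815     0.471904
Triad:          43905.4     0.470539     0.470189     0.471637
"""

# One table row: kernel name, then four decimal numbers.
ROW_RE = re.compile(
    r"^(Copy|Scale|Add|Triad):\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$",
    re.MULTILINE,
)


def parse_stream_table(text: str) -> dict:
    """Return {kernel: {column: value}} for every result row found in text."""
    rows = {}
    for kernel, rate, avg, tmin, tmax in ROW_RE.findall(text):
        rows[kernel] = {
            "best_rate_mb_s": float(rate),
            "avg_time_s": float(avg),
            "min_time_s": float(tmin),
            "max_time_s": float(tmax),
        }
    return rows


if __name__ == "__main__":
    for kernel, cols in parse_stream_table(SAMPLE).items():
        print(kernel, cols)
```

If a log contains several runs, as this one does, the text would need to be split per run first, since later rows with the same kernel name overwrite earlier ones in this sketch.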

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 6649)

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1

To display your profiling results:
#####################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                              COMMAND                                                                               #
#####################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_1  #
#####################################################################################################################################################################################################


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 6717)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 99206 microseconds.
   (= 99206 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           98356.7     0.152490     0.139925     0.153822
Scale:         100801.8     0.153567     0.136531     0.154524
Add:            87302.6     0.241355     0.236463     0.242316
Triad:          87137.2     0.241137     0.236912     0.242346
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 6717)

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2

To display your profiling results:
#####################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                              COMMAND                                                                               #
#####################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_2  #
#####################################################################################################################################################################################################


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 6787)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads counted = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 49523 microseconds.
   (= 49523 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          184184.0     0.077286     0.074722     0.077533
Scale:         197142.7     0.077375     0.069810     0.077673
Add:           171591.7     0.123845     0.120308     0.124316
Triad:         171631.5     0.123843     0.120280     0.124527
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 6787)

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3

To display your profiling results:
#####################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                              COMMAND                                                                               #
#####################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_3  #
#####################################################################################################################################################################################################


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 6840)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 16
Number of Threads counted = 16
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 25373 microseconds.
   (= 25373 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          395147.7     0.035052     0.034829     0.038790
Scale:         394084.8     0.035164     0.034923     0.039180
Add:           344592.9     0.060084     0.059908     0.061236
Triad:         344506.5     0.060062     0.059923     0.061386
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 6840)

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4

To display your profiling results:
#####################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                              COMMAND                                                                               #
#####################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_4  #
#####################################################################################################################################################################################################


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 6928)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 32
Number of Threads counted = 32
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 14379 microseconds.
   (= 14379 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          760031.1     0.018979     0.018108     0.020222
Scale:         745860.2     0.019161     0.018452     0.020496
Add:           653182.6     0.032596     0.031605     0.033208
Triad:         652193.7     0.032587     0.031653     0.033143
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 6928)

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5

To display your profiling results:
#####################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                              COMMAND                                                                               #
#####################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_5  #
#####################################################################################################################################################################################################


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 7027)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 48
Number of Threads counted = 48
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 10379 microseconds.
   (= 10379 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          945030.6     0.014783     0.014563     0.015597
Scale:         939738.2     0.014860     0.014645     0.015532
Add:           867678.9     0.024465     0.023792     0.025085
Triad:         868627.6     0.024434     0.023766     0.025167
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
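The Triad rates reported for the runs from 1 to 48 threads can be turned into speedup and parallel efficiency figures. The sketch below uses the common definitions (speedup = rate / 1-thread rate, efficiency = speedup / thread count); these definitions and the helper name `scaling` are assumptions for illustration, not something MAQAO prints in this log.

```python
# Triad "Best Rate MB/s" per thread count, copied from the runs above.
TRIAD_BEST_RATE_MB_S = {
    1: 21332.1,
    2: 43905.4,
    4: 87137.2,
    8: 171631.5,
    16: 344506.5,
    32: 652193.7,
    48: 868627.6,
}


def scaling(rates: dict) -> dict:
    """Map thread count -> (speedup, efficiency) relative to the smallest run."""
    base_threads = min(rates)
    base_rate = rates[base_threads]
    result = {}
    for threads in sorted(rates):
        speedup = rates[threads] / base_rate
        efficiency = speedup / (threads / base_threads)
        result[threads] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    print(f"{'threads':>7} {'speedup':>8} {'efficiency':>10}")
    for threads, (speedup, efficiency) in scaling(TRIAD_BEST_RATE_MB_S).items():
        print(f"{threads:>7} {speedup:>8.2f} {efficiency:>10.2f}")
```

The same function can be applied to the Copy, Scale or Add columns by substituting their rates.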

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 7027)

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6

To display your profiling results:
#####################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                              COMMAND                                                                               #
#####################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_6  #
#####################################################################################################################################################################################################


* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com

* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 7159)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 96
Number of Threads counted = 96
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 8213 microseconds.
   (= 8213 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:         1463785.0     0.010153     0.009402     0.011155
Scale:        1502495.1     0.010120     0.009160     0.011241
Add:          1390144.5     0.015846     0.014850     0.043723
Triad:        1366149.3     0.015753     0.015111     0.017295
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 7159)

Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7

To display your profiling results:
#####################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                              COMMAND                                                                               #
#####################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_HBM/intel/stream/run/oneview_runs/compilers/gcc_2/oneview_results_scal/tools/lprof_npsu_run_7  #
#####################################################################################################################################################################################################
