options

Executable Output


* Info: Selecting the 'perf-low-ppn' engine for node o401

* Info: Process launched (host o401, process 19129)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 1
Number of Threads counted = 1
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 507839 microseconds.
   (= 507839 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           12405.0     1.110495     1.109440     1.110877
Scale:          12453.0     1.106691     1.105160     1.108153
Add:            15573.5     1.326685     1.325576     1.327270
Triad:          15566.9     1.327069     1.326140     1.327535
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host o401, process 19129)
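
The "Best Rate MB/s" column above can be reproduced from the reported array size and the "Min time" column. The following is a minimal sketch, not STREAM's own code: it assumes STREAM's usual byte counting (Copy and Scale touch two arrays per iteration, Add and Triad touch three) and that MB means 10^6 bytes. The names `best_rate_mb_s` and `memory_per_array_mib` are hypothetical helpers introduced only for illustration.

```python
# Values taken from the 1-thread STREAM run above.
ARRAY_ELEMENTS = 860_160_000   # "Array size = 860160000 (elements)"
BYTES_PER_ELEMENT = 8          # "This system uses 8 bytes per array element."

# Assumption: arrays read or written per loop iteration, per kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def best_rate_mb_s(kernel: str, min_time_s: float) -> float:
    """Bandwidth in MB/s (1 MB = 1e6 bytes) from the best (minimum) time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return bytes_moved / min_time_s / 1e6


def memory_per_array_mib() -> float:
    """Size of one array in MiB (1 MiB = 2**20 bytes)."""
    return ARRAY_ELEMENTS * BYTES_PER_ELEMENT / 2**20


if __name__ == "__main__":
    # "Min time" column of the 1-thread run.
    min_times = {
        "Copy": 1.109440,
        "Scale": 1.105160,
        "Add": 1.325576,
        "Triad": 1.326140,
    }
    print(f"Memory per array: {memory_per_array_mib():.1f} MiB")
    for kernel, t in min_times.items():
        print(f"{kernel + ':':<8}{best_rate_mb_s(kernel, t):>12.1f} MB/s")
```

Under these assumptions the computed rates agree with the logged values to within rounding, which is a quick way to check that a pasted table has not been garbled.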

Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0

To display your profiling results:
########################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                COMMAND                                                                                #
########################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_0  #
########################################################################################################################################################################################################


* Info: Selecting the 'perf-low-ppn' engine for node o401

* Info: Process launched (host o401, process 19472)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads counted = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 247635 microseconds.
   (= 247635 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           24791.8     0.555340     0.555125     0.555423
Scale:          24928.1     0.552842     0.552091     0.553257
Add:            31131.9     0.663422     0.663108     0.664081
Triad:          31099.7     0.664171     0.663796     0.665541
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host o401, process 19472)

Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1

To display your profiling results:
########################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                COMMAND                                                                                #
########################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_1  #
########################################################################################################################################################################################################


* Info: Selecting the 'perf-low-ppn' engine for node o401

* Info: Process launched (host o401, process 19781)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 124086 microseconds.
   (= 124086 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           43803.6     0.314404     0.314188     0.314514
Scale:          44356.0     0.310370     0.310275     0.310449
Add:            58841.2     0.351077     0.350840     0.351293
Triad:          58627.0     0.352319     0.352122     0.352830
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host o401, process 19781)

Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2

To display your profiling results:
########################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                COMMAND                                                                                #
########################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_2  #
########################################################################################################################################################################################################


* Info: Selecting the 'perf-low-ppn' engine for node o401

* Info: Process launched (host o401, process 20096)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads counted = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 61951 microseconds.
   (= 61951 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           87390.2     0.157621     0.157484     0.157692
Scale:          88554.8     0.155548     0.155413     0.155654
Add:           117553.7     0.175733     0.175612     0.175942
Triad:         117176.7     0.176291     0.176177     0.176550
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host o401, process 20096)

Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3

To display your profiling results:
########################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                COMMAND                                                                                #
########################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_3  #
########################################################################################################################################################################################################


* Info: Selecting the 'perf-low-ppn' engine for node o401

* Info: Process launched (host o401, process 20420)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 16
Number of Threads counted = 16
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 37042 microseconds.
   (= 37042 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          172731.6     0.079749     0.079676     0.079787
Scale:         174392.8     0.078972     0.078917     0.079004
Add:           229210.5     0.090152     0.090065     0.090194
Triad:         228877.6     0.090249     0.090196     0.090327
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host o401, process 20420)

Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4

To display your profiling results:
########################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                COMMAND                                                                                #
########################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_4  #
########################################################################################################################################################################################################


* Info: Selecting the 'perf-low-ppn' engine for node o401

* Info: Process launched (host o401, process 20764)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 32
Number of Threads counted = 32
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 31038 microseconds.
   (= 31038 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          318452.5     0.043329     0.043217     0.043913
Scale:         321217.4     0.042998     0.042845     0.043980
Add:           407304.9     0.050820     0.050684     0.051470
Triad:         405903.4     0.050968     0.050859     0.051554
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host o401, process 20764)

Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5

To display your profiling results:
########################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                COMMAND                                                                                #
########################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_5  #
########################################################################################################################################################################################################


* Info: Selecting the 'perf-low-ppn' engine for node o401

* Info: Process launched (host o401, process 21153)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 64
Number of Threads counted = 64
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 29479 microseconds.
   (= 29479 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          448876.7     0.030729     0.030660     0.030734
Scale:         449683.4     0.030760     0.030605     0.030977
Add:           496843.3     0.041696     0.041550     0.041712
Triad:         496222.3     0.041657     0.041602     0.041704
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

* Info: Process finished (host o401, process 21153)

Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6

To display your profiling results:
########################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                COMMAND                                                                                #
########################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_6  #
########################################################################################################################################################################################################


* Info: Selecting the 'perf-low-ppn' engine for node o401

* Info: Process launched (host o401, process 21642)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 112
Number of Threads counted = 112
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 29622 microseconds.
   (= 29622 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          463323.5     0.029909     0.029704     0.029907
Scale:         462405.0     0.029908     0.029763     0.029962
Add:           486802.7     0.042497     0.042407     0.042546
Triad:         487147.3     0.042461     0.042377     0.042512
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
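
Note on the "Best Rate MB/s" column: the log states that the best (minimum) time per kernel is used to compute the reported bandwidth. The sketch below reproduces the four rates of this 112-thread run from the printed array size, element width and "Min time" values. It assumes the usual STREAM 5.10 byte counting (Copy and Scale touch two arrays per iteration, Add and Triad touch three) and 1 MB = 10^6 bytes. The names `ARRAYS_TOUCHED` and `best_rate_mb_s` are made up for this illustration and do not come from STREAM or MAQAO.

```python
# Values copied from the STREAM output above (112-thread run).
BYTES_PER_ELEMENT = 8
ARRAY_ELEMENTS = 860_160_000

# Assumption: standard STREAM byte counting per kernel
# (Copy/Scale: 1 read + 1 write, Add/Triad: 2 reads + 1 write).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column, in seconds, as printed (rounded to 6 decimals).
MIN_TIME_S = {
    "Copy": 0.029704,
    "Scale": 0.029763,
    "Add": 0.042407,
    "Triad": 0.042377,
}


def best_rate_mb_s(kernel: str, min_time_s: float) -> float:
    """Return bandwidth in MB/s (1 MB = 1e6 bytes) for one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return 1.0e-6 * bytes_moved / min_time_s


if __name__ == "__main__":
    for kernel, t in MIN_TIME_S.items():
        print(f"{kernel + ':':<8}{best_rate_mb_s(kernel, t):>14.1f} MB/s")
```

Because the printed minimum times are rounded, the recomputed rates can differ from the table in the last digit or so.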

* Info: Process finished (host o401, process 21642)

Your experiment path is /scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7

To display your profiling results:
########################################################################################################################################################################################################
#    LEVEL    |     REPORT     |                                                                                COMMAND                                                                                #
########################################################################################################################################################################################################
#  Functions  |  Cluster-wide  |  maqao lprof -df xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7      #
#  Functions  |  Per-node      |  maqao lprof -df -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Functions  |  Per-process   |  maqao lprof -df -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Functions  |  Per-thread    |  maqao lprof -df -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Loops      |  Cluster-wide  |  maqao lprof -dl xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7      #
#  Loops      |  Per-node      |  maqao lprof -dl -dn xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Loops      |  Per-process   |  maqao lprof -dl -dp xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7  #
#  Loops      |  Per-thread    |  maqao lprof -dl -dt xp=/scratch_na/users/xoserete/qaas_runs/171-317-3283/intel/stream/run/oneview_runs/compilers/icx_14/oneview_results_scal/tools/lprof_npsu_run_7  #
########################################################################################################################################################################################################
