* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5115)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 1
Number of Threads counted = 1
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 418416 microseconds.
(= 418416 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           13839.6     0.995896     0.994430     0.997401
Scale:          13984.9     0.985712     0.984102     0.986495
Add:            17215.7     1.200755     1.199128     1.203242
Triad:          17174.4     1.203057     1.202009     1.203596
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5115)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0
To display your profiling results:
######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_0 #
######################################################################################################################################################################################################
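The sizes and rates reported in the run above can be cross-checked from the array size and the minimum times. The sketch below assumes STREAM's usual accounting: Copy and Scale touch two arrays, Add and Triad touch three, and 1 MB is 10^6 bytes. The function names are illustrative only and are not part of STREAM or MAQAO.

```python
# Arrays read or written per kernel, as assumed for STREAM's byte accounting.
ARRAYS_PER_KERNEL = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def array_mib(n_elements, bytes_per_element=8):
    """Size of one array in MiB (1 MiB = 2**20 bytes)."""
    return n_elements * bytes_per_element / 2**20


def stream_rate_mb_s(kernel, n_elements, min_time_s, bytes_per_element=8):
    """Best rate in MB/s (1 MB = 10**6 bytes) for one kernel."""
    bytes_moved = ARRAYS_PER_KERNEL[kernel] * bytes_per_element * n_elements
    return 1.0e-6 * bytes_moved / min_time_s


if __name__ == "__main__":
    n = 860160000  # "Array size" from the log
    print(f"Memory per array: {array_mib(n):.1f} MiB")
    print(f"Total memory:     {3 * array_mib(n):.1f} MiB")
    # Minimum times taken from the 1-thread run.
    print(f"Copy:  {stream_rate_mb_s('Copy', n, 0.994430):.1f} MB/s")
    print(f"Triad: {stream_rate_mb_s('Triad', n, 1.202009):.1f} MB/s")
```

With the 1-thread minimum times, the computed rates agree with the logged Best Rate values to within rounding of the printed times.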
* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5253)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads counted = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 209323 microseconds.
(= 209323 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           27820.1     0.495319     0.494699     0.495869
Scale:          28101.8     0.490455     0.489739     0.490699
Add:            34393.7     0.601037     0.600222     0.601406
Triad:          34315.9     0.602082     0.601582     0.604469
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5253)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1
To display your profiling results:
######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_1 #
######################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5324)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 104414 microseconds.
(= 104414 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           48454.3     0.284339     0.284032     0.284402
Scale:          48988.1     0.281104     0.280937     0.281237
Add:            64683.0     0.319407     0.319154     0.319936
Triad:          64572.1     0.320010     0.319702     0.320912
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5324)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2
To display your profiling results:
######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_2 #
######################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5394)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads counted = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 52735 microseconds.
(= 52735 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           97075.3     0.141917     0.141772     0.141978
Scale:          98140.0     0.140302     0.140234     0.140320
Add:           129323.1     0.159768     0.159630     0.159942
Triad:         129073.2     0.160086     0.159939     0.160385
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5394)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3
To display your profiling results:
######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_3 #
######################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5471)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 16
Number of Threads counted = 16
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 35154 microseconds.
(= 35154 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          190572.3     0.072448     0.072217     0.073775
Scale:         192969.2     0.071578     0.071320     0.073694
Add:           248843.9     0.083063     0.082959     0.083656
Triad:         248245.4     0.083274     0.083159     0.083811
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5471)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4
To display your profiling results:
######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_4 #
######################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5549)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 32
Number of Threads counted = 32
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 28860 microseconds.
(= 28860 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          343737.4     0.040468     0.040038     0.040710
Scale:         347820.5     0.039684     0.039568     0.040766
Add:           418705.2     0.049672     0.049304     0.049789
Triad:         418382.7     0.050300     0.049342     0.050464
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5549)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5
To display your profiling results:
######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_5 #
######################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5673)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 48
Number of Threads counted = 48
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 28336 microseconds.
(= 28336 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          420577.6     0.033525     0.032723     0.033768
Scale:         419360.1     0.033022     0.032818     0.033163
Add:           478486.9     0.043386     0.043144     0.043446
Triad:         477645.5     0.043486     0.043220     0.043933
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5673)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6
To display your profiling results:
######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_6 #
######################################################################################################################################################################################################
* Info: Selecting the 'perf-high-ppn' engine for node idp10.benchmarkcenter.megware.com
* Warning: Found no event able to derive walltime: prepending ref-cycles
* Info: Process launched (host idp10.benchmarkcenter.megware.com, process 5805)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 860160000 (elements), Offset = 0 (elements)
Memory per array = 6562.5 MiB (= 6.4 GiB).
Total memory required = 19687.5 MiB (= 19.2 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 96
Number of Threads counted = 96
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 28894 microseconds.
(= 28894 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          467955.1     0.029832     0.029410     0.029914
Scale:         464370.9     0.029903     0.029637     0.029941
Add:           481590.1     0.043006     0.042866     0.043162
Triad:         482242.6     0.042946     0.042808     0.043061
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
* Info: Process finished (host idp10.benchmarkcenter.megware.com, process 5805)
Your experiment path is /home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7
To display your profiling results:
######################################################################################################################################################################################################
# LEVEL | REPORT | COMMAND #
######################################################################################################################################################################################################
# Functions | Cluster-wide | maqao lprof -df xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-node | maqao lprof -df -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-process | maqao lprof -df -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7 #
# Functions | Per-thread | maqao lprof -df -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Cluster-wide | maqao lprof -dl xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-node | maqao lprof -dl -dn xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-process | maqao lprof -dl -dp xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7 #
# Loops | Per-thread | maqao lprof -dl -dt xp=/home/eoseret/qaas_runs_CPU_9468/STREAM-all_DDR/intel/stream/run/oneview_runs/compilers/icx_10/oneview_results_scal/tools/lprof_npsu_run_7 #
######################################################################################################################################################################################################
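The eight runs above form a thread-scaling series (1 to 96 threads). As a minimal sketch of how the series could be summarised, the script below extracts the thread count and Best Rate per kernel from concatenated STREAM output, then computes speedup and parallel efficiency relative to the smallest thread count. The helper names `parse_stream_log` and `scaling` are hypothetical and not part of STREAM or MAQAO. The embedded sample is a shortened excerpt of the 1-thread and 96-thread runs.

```python
import re

# "Number of Threads counted = 96"
THREADS = re.compile(r"^Number of Threads counted = (\d+)")
# "Triad: 17174.4 1.203057 1.202009 1.203596" (only the rate is captured)
ROW = re.compile(r"^(Copy|Scale|Add|Triad):\s+([\d.]+)\s")


def parse_stream_log(text):
    """Return {threads: {kernel: best_rate_mb_s}} from concatenated STREAM output."""
    results = {}
    threads = None
    for raw in text.splitlines():
        line = raw.strip()
        m = THREADS.match(line)
        if m:
            threads = int(m.group(1))
            results[threads] = {}
            continue
        m = ROW.match(line)
        if m and threads is not None:
            results[threads][m.group(1)] = float(m.group(2))
    return results


def scaling(results, kernel="Triad"):
    """Return (threads, rate, speedup, efficiency) tuples, sorted by thread count."""
    counts = sorted(results)
    base_threads = counts[0]
    base_rate = results[base_threads][kernel]
    rows = []
    for t in counts:
        rate = results[t][kernel]
        speedup = rate / base_rate
        efficiency = speedup / (t / base_threads)
        rows.append((t, rate, speedup, efficiency))
    return rows


# Shortened excerpt of the log above (1-thread and 96-thread runs only).
SAMPLE = """\
Number of Threads counted = 1
Function Best Rate MB/s Avg time Min time Max time
Copy: 13839.6 0.995896 0.994430 0.997401
Triad: 17174.4 1.203057 1.202009 1.203596
Number of Threads counted = 96
Function Best Rate MB/s Avg time Min time Max time
Copy: 467955.1 0.029832 0.029410 0.029914
Triad: 482242.6 0.042946 0.042808 0.043061
"""

if __name__ == "__main__":
    for t, rate, speedup, eff in scaling(parse_stream_log(SAMPLE)):
        print(f"{t:3d} threads  {rate:10.1f} MB/s  speedup {speedup:6.2f}  efficiency {eff:5.2f}")
```

On the logged values, Triad bandwidth at 96 threads is about 28 times the 1-thread rate. It barely changes between 48 and 96 threads (477645.5 vs 482242.6 MB/s), so the extra threads add almost no bandwidth.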