OV - K-Means scalability icpx-O3-funroll-soa 100000000 - Loops

MAQAO

Loops Index

Scalability Runs Description

Run run_1_thread
  Number processes: 1
  Number nodes: 1
  Run Command: <executable> input/100000000.in 1000 100000000 50 25
  MPI Command:
  Dataset:
  Run Directory: /home/fmusial/KMEANS_Benchmarks
  OMP_PROC_BIND: close
  OMP_NUM_THREADS: 1
  OMP_PLACES: cores

Run run_2_threads
  Number processes: 1
  Number nodes: 1
  Run Command: <executable> input/100000000.in 1000 100000000 50 25
  MPI Command:
  Dataset:
  Run Directory: /home/fmusial/KMEANS_Benchmarks
  OMP_PROC_BIND: close
  OMP_PLACES: cores
  OMP_NUM_THREADS: 2

Run run_4_threads
  Number processes: 1
  Number nodes: 1
  Run Command: <executable> input/100000000.in 1000 100000000 50 25
  MPI Command:
  Dataset:
  Run Directory: /home/fmusial/KMEANS_Benchmarks
  OMP_PROC_BIND: close
  OMP_PLACES: cores
  OMP_NUM_THREADS: 4

Run run_8_threads
  Number processes: 1
  Number nodes: 1
  Run Command: <executable> input/100000000.in 1000 100000000 50 25
  MPI Command:
  Dataset:
  Run Directory: /home/fmusial/KMEANS_Benchmarks
  OMP_PROC_BIND: close
  OMP_PLACES: cores
  OMP_NUM_THREADS: 8

Run run_16_threads
  Number processes: 1
  Number nodes: 1
  Run Command: <executable> input/100000000.in 1000 100000000 50 25
  MPI Command:
  Dataset:
  Run Directory: /home/fmusial/KMEANS_Benchmarks
  OMP_PROC_BIND: close
  OMP_PLACES: cores
  OMP_NUM_THREADS: 16

Run run_26_threads
  Number processes: 1
  Number nodes: 1
  Run Command: <executable> input/100000000.in 1000 100000000 50 25
  MPI Command:
  Dataset:
  Run Directory: /home/fmusial/KMEANS_Benchmarks
  OMP_PROC_BIND: close
  OMP_PLACES: cores
  OMP_NUM_THREADS: 26

Loop id	Source Location	Source Function	Level	Exclusive Coverage run_1_thread (%)	Exclusive Coverage run_2_threads (%)	Exclusive Coverage run_4_threads (%)	Exclusive Coverage run_8_threads (%)	Exclusive Coverage run_16_threads (%)	Exclusive Coverage run_26_threads (%)	Inclusive Coverage run_1_thread (%)	Inclusive Coverage run_2_threads (%)	Inclusive Coverage run_4_threads (%)	Inclusive Coverage run_8_threads (%)	Inclusive Coverage run_16_threads (%)	Inclusive Coverage run_26_threads (%)	Max Exclusive Time Over Threads run_1_thread (s)	Max Exclusive Time Over Threads run_2_threads (s)	Max Exclusive Time Over Threads run_4_threads (s)	Max Exclusive Time Over Threads run_8_threads (s)	Max Exclusive Time Over Threads run_16_threads (s)	Max Exclusive Time Over Threads run_26_threads (s)	Max Inclusive Time Over Threads run_1_thread (s)	Max Inclusive Time Over Threads run_2_threads (s)	Max Inclusive Time Over Threads run_4_threads (s)	Max Inclusive Time Over Threads run_8_threads (s)	Max Inclusive Time Over Threads run_16_threads (s)	Max Inclusive Time Over Threads run_26_threads (s)	Exclusive Time w.r.t. Wall Time run_1_thread (s)	Exclusive Time w.r.t. Wall Time run_2_threads (s)	Exclusive Time w.r.t. Wall Time run_4_threads (s)	Exclusive Time w.r.t. Wall Time run_8_threads (s)	Exclusive Time w.r.t. Wall Time run_16_threads (s)	Exclusive Time w.r.t. Wall Time run_26_threads (s)	Inclusive Time w.r.t. Wall Time run_1_thread (s)	Inclusive Time w.r.t. Wall Time run_2_threads (s)	Inclusive Time w.r.t. Wall Time run_4_threads (s)	Inclusive Time w.r.t. Wall Time run_8_threads (s)	Inclusive Time w.r.t. Wall Time run_16_threads (s)	Inclusive Time w.r.t. Wall Time run_26_threads (s)	Nb Threads run_1_thread	Nb Threads run_2_threads	Nb Threads run_4_threads	Nb Threads run_8_threads	Nb Threads run_16_threads	Nb Threads run_26_threads	GFLOPS run_1_thread	GFLOPS run_2_threads	GFLOPS run_4_threads	GFLOPS run_8_threads	GFLOPS run_16_threads	GFLOPS run_26_threads	Vectorization Ratio (%)	Vector Length Use (%)	Speedup If No Scalar Integer	Speedup If FP Vectorized	Speedup If Fully Vectorized	Speedup If Perfect Load Balancing run_1_thread	Speedup If Perfect Load Balancing run_2_threads	Speedup If Perfect Load Balancing run_4_threads	Speedup If Perfect Load Balancing run_8_threads	Speedup If Perfect Load Balancing run_16_threads	Speedup If Perfect Load Balancing run_26_threads	Stride 0	Stride 1	Stride n	Stride Unknown	Stride Indirect	Array Access Efficiency	(run_1_thread) Efficiency	(run_1_thread) Potential Speed-Up (%)	(run_2_threads) Efficiency	(run_2_threads) Potential Speed-Up (%)	(run_4_threads) Efficiency	(run_4_threads) Potential Speed-Up (%)	(run_8_threads) Efficiency	(run_8_threads) Potential Speed-Up (%)	(run_16_threads) Efficiency	(run_16_threads) Potential Speed-Up (%)	(run_26_threads) Efficiency	(run_26_threads) Potential Speed-Up (%)
15	kmeans-icpx-O3-funroll-soa - main_soa.cpp:58-70	k_means(int, point_t&, point_t&, int*, point_t&, int, int) [clone .extracted]	Outermost	49.03	45.10	42.33	37.02	30.06	24.34	91.70	88.07	82.21	72.47	58.68	47.41	65.71	30.22	15.37	7.81	3.88	2.47	122.90	58.89	29.45	14.78	7.36	4.54	65.71	31.59	17.15	9.60	5.55	3.81	122.90	61.70	33.30	18.79	10.83	7.41	1	2	4	8	16	26	5.62	11.71	21.36	38.15	65.15	96.75	39.47	20.39	1	1	4.98	1	1	1.01	1.04	1.03	1.06	NA	NA	NA	NA	NA	0.00	1	0	1.04	0	0.96	1.77	0.86	5.33	0.74	7.81	0.66	8.18
16	kmeans-icpx-O3-funroll-soa - main_soa.cpp:58-67	k_means(int, point_t&, point_t&, int*, point_t&, int, int) [clone .extracted]	Innermost	37.43	37.90	35.07	31.19	25.08	20.05	37.43	37.90	35.07	31.19	25.08	20.05	50.16	25.36	12.70	6.53	3.32	2.06	50.16	25.36	12.70	6.53	3.32	2.06	50.16	26.55	14.21	8.09	4.63	3.14	50.16	26.55	14.21	8.09	4.63	3.14	1	2	4	8	16	26	7.73	14.60	27.52	48.36	85.38	123.97	90	46.25	1	1.44	3.06	1	1	1.01	1.03	1.06	1.08	0	2	0	0	0	100.00	1	0	0.94	2.09	0.88	4.11	0.78	7	0.68	8.09	0.62	7.71
11	kmeans-icpx-O3-funroll-soa - main_soa.cpp:81-84	k_means(int, point_t&, point_t&, int*, point_t&, int, int)	Innermost	8.28	8.35	7.71	6.87	5.53	4.47	8.28	8.35	7.71	6.87	5.53	4.47	11.10	11.16	11.04	11.17	11.08	11.10	11.10	11.16	11.04	11.17	11.08	11.10	11.10	5.85	3.12	1.78	1.02	0.70	11.10	5.85	3.12	1.78	1.02	0.70	1	1	1	1	1	1	0.45	0.86	1.60	2.81	4.90	7.15	0	11.61	1.3	1.08	9.81	1	1	1	1	1	1	0	3	0	0	3	50.00	1	0	0.95	0.43	0.89	0.86	0.78	1.52	0.68	1.77	0.61	1.74
14	kmeans-icpx-O3-funroll-soa - main_soa.cpp:62-67	k_means(int, point_t&, point_t&, int*, point_t&, int, int) [clone .extracted]	Innermost	5.24	5.07	4.82	4.26	3.53	3.02	5.24	5.07	4.82	4.26	3.53	3.02	7.02	3.45	1.89	0.95	0.52	0.37	7.02	3.45	1.89	0.95	0.52	0.37	7.02	3.56	1.95	1.10	0.65	0.47	7.02	3.56	1.95	1.10	0.65	0.47	1	2	4	8	16	26	0.00	0.00	0.02	0.02	0.09	0.11	0	11.61	1	2.58	10	1	1.02	1.1	1.1	1.17	1.3	0	2	0	0	0	100.00	1	0	0.99	0.06	0.9	0.49	0.8	0.87	0.67	1.15	0.57	1.29
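
The per-run Efficiency columns above appear to match the usual strong-scaling definition, T(1) / (n * T(n)), applied to the "Exclusive Time w.r.t. Wall Time" values. This is an observation from the reported numbers, not a formula taken from MAQAO documentation. The following Python sketch recomputes speedup and efficiency under that assumption; the names THREADS, EXCLUSIVE_WALL_TIME, speedup and efficiency are illustrative and not part of the report.

```python
# Thread counts of the six runs, as listed in the "Nb Threads" columns.
THREADS = [1, 2, 4, 8, 16, 26]

# "Exclusive Time w.r.t. Wall Time" (s) per loop id, copied from the table.
EXCLUSIVE_WALL_TIME = {
    15: [65.71, 31.59, 17.15, 9.60, 5.55, 3.81],
    16: [50.16, 26.55, 14.21, 8.09, 4.63, 3.14],
    11: [11.10, 5.85, 3.12, 1.78, 1.02, 0.70],
    14: [7.02, 3.56, 1.95, 1.10, 0.65, 0.47],
}


def speedup(times):
    """Speedup of each run relative to the first (1-thread) run."""
    return [times[0] / t for t in times]


def efficiency(times, threads):
    """Assumed definition: T(1) / (n * T(n)) for each run."""
    return [times[0] / (n * t) for n, t in zip(threads, times)]


if __name__ == "__main__":
    for loop_id, times in EXCLUSIVE_WALL_TIME.items():
        eff = efficiency(times, THREADS)
        spd = speedup(times)
        print(f"loop {loop_id}")
        for n, s, e in zip(THREADS, spd, eff):
            print(f"  {n:>2} threads: speedup {s:6.2f}, efficiency {e:4.2f}")
```

Because the inputs are already rounded to two decimals, the recomputed efficiencies can differ from the reported ones in the last digit (for example loop 16 at 26 threads).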
