[LLVMdev] Enabling the SLP-vectorizer by default for -O3

Nadav Rotem nrotem at apple.com
Sun Jul 28 01:54:29 CDT 2013


Hi, 

Below are the updated benchmark results for the new SLP-vectorizer. There are a small number of compile-time regressions, a single major runtime regression*, and many performance gains. Code size increases slightly: 30k for the whole test-suite. Based on these numbers, I would like to enable the SLP-vectorizer by default for -O3. Please let me know if you have any concerns.

Thanks,
Nadav


* - I now understand the Olden/BH regression better. BH is slower because of a store-buffer stall: the store buffer fills up and the CPU has to wait for some stores to complete. I can think of two possible causes. First, our vectorized stores are followed by a memcpy that is expanded into a sequence of scalar reads and writes to the same addresses as the vector store. Maybe the processor can’t prune multiple stores of different sizes to the same address (Section 2.2.4 in the optimization guide has some info on this). Another (less likely) possibility is that we lengthen the critical path by adding a new pshufd instruction before the last vector store, and that somehow affects the store buffer. In any case, there is not much we can do at the IR level to predict this.
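
To make the first hypothesis concrete, here is a minimal C sketch of the general shape of code involved. It is not taken from the Olden/BH sources: the vec4 type and the add4 and add4_then_copy functions are hypothetical names invented for illustration, and whether a given compiler actually vectorizes the stores or expands the memcpy inline depends on the target and optimization level.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 4-float record; NOT taken from the Olden/BH sources. */
typedef struct { float v[4]; } vec4;

/* Four independent scalar adds with stores to consecutive addresses.
   Straight-line code of this shape is a candidate for an SLP vectorizer,
   which may merge it into one vector add and one 16-byte vector store. */
static void add4(vec4 *out, const vec4 *a, const vec4 *b) {
    out->v[0] = a->v[0] + b->v[0];
    out->v[1] = a->v[1] + b->v[1];
    out->v[2] = a->v[2] + b->v[2];
    out->v[3] = a->v[3] + b->v[3];
}

/* The copy runs right after the stores above. A compiler may expand a
   small fixed-size memcpy inline into narrower loads and stores, which
   then touch the bytes that were just written (assumed scenario). */
static void add4_then_copy(vec4 *dst, vec4 *tmp, const vec4 *a, const vec4 *b) {
    assert(dst != tmp);             /* memcpy requires non-overlapping objects */
    add4(tmp, a, b);                /* possibly one wide store to *tmp */
    memcpy(dst, tmp, sizeof *dst);  /* possibly several narrow accesses */
}
```

Calling add4_then_copy with a = {1, 2, 3, 4} and b = {10, 20, 30, 40} leaves {11, 22, 33, 44} in both tmp and dst. Compiling a file like this at -O3 and reading the generated assembly is one way to check whether a wide store is followed by narrower accesses to the same memory on a particular target.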



Performance Regressions - Compile Time	Δ	Previous	Current	σ
MultiSource/Benchmarks/VersaBench/beamformer/beamformer	18.98%	0.0722	0.0859	0.0003
MultiSource/Benchmarks/FreeBench/pifft/pifft	5.66%	0.5003	0.5286	0.0015
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt	4.85%	0.4084	0.4282	0.0014
MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt	4.36%	0.3856	0.4024	0.0018
MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt	2.62%	0.4424	0.4540	0.0019
External/SPEC/CINT2006/401_bzip2/401_bzip2	1.50%	1.0613	1.0772	0.0010
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4	1.23%	12.1337	12.2831	0.0296
MultiSource/Applications/kimwitu++/kc	1.15%	9.3690	9.4769	0.0186
SingleSource/Benchmarks/Misc-C++-EH/spirit	1.13%	3.2769	3.3139	0.0079
External/SPEC/CFP2000/188_ammp/188_ammp	1.01%	1.8632	1.8820	0.0059


Performance Regressions - Execution Time	Δ	Previous	Current	σ
MultiSource/Benchmarks/Olden/bh/bh	19.24%	1.1551	1.3773	0.0021
SingleSource/Benchmarks/SmallPT/smallpt	3.75%	5.8779	6.0983	0.0146
SingleSource/Benchmarks/Misc-C++/Large/ray	1.08%	1.8194	1.8390	0.0009


Performance Improvements - Execution Time	Δ	Previous	Current	σ
SingleSource/Benchmarks/Misc/matmul_f64_4x4	-53.67%	1.4064	0.6516	0.0007
External/Nurbs/nurbs	-19.47%	2.5389	2.0445	0.0029
MultiSource/Benchmarks/Olden/power/power	-18.49%	1.2572	1.0248	0.0004
SingleSource/Benchmarks/Misc/flops-4	-15.93%	0.7767	0.6530	0.0348
MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt	-14.72%	2.3925	2.0404	0.0013
SingleSource/Benchmarks/Misc/flops-6	-11.05%	1.1427	1.0164	0.0009
SingleSource/Benchmarks/Misc/flops-5	-10.43%	1.2771	1.1439	0.0015
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt	-8.10%	2.3468	2.1568	0.0195
SingleSource/Benchmarks/Misc/pi	-7.18%	0.6042	0.5608	0.0000
External/SPEC/CFP2006/444_namd/444_namd	-4.01%	9.6053	9.2200	0.0064
SingleSource/Benchmarks/Linpack/linpack-pc	-3.85%	95.5313	91.8522	1.1151
MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl	-3.52%	3.1962	3.0837	0.0063
MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl	-2.93%	2.9336	2.8477	0.0037
MultiSource/Benchmarks/VersaBench/beamformer/beamformer	-2.79%	0.8845	0.8598	0.0026
SingleSource/Benchmarks/Misc-C++/Large/sphereflake	-2.79%	1.8517	1.8001	0.0014
External/SPEC/CFP2000/177_mesa/177_mesa	-2.15%	1.7214	1.6844	0.0017
SingleSource/Benchmarks/CoyoteBench/fftbench	-2.05%	0.7280	0.7131	0.0049
MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl	-1.96%	3.1494	3.0878	0.0034
SingleSource/Benchmarks/Misc/oourafft	-1.70%	3.4625	3.4035	0.0009
SingleSource/Benchmarks/Misc/flops	-1.31%	7.0775	6.9845	0.0014
MultiSource/Applications/JM/lencod/lencod	-1.12%	4.5972	4.5455	0.0050
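
For readers checking the tables: the Δ column appears to be the relative change from Previous to Current, in percent. The helper below is a small sketch of that arithmetic; pct_delta is a hypothetical name, and the formula is inferred from the rows above rather than stated in the report.

```c
/* Relative change in percent, inferred from the table rows:
   positive means Current is larger (slower) than Previous. */
static double pct_delta(double previous, double current) {
    return (current - previous) / previous * 100.0;
}
```

For example, pct_delta(1.1551, 1.3773) is about 19.24 (the Olden/bh row), and pct_delta(1.4064, 0.6516) is about -53.67 (the matmul_f64_4x4 row).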
