This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Hello,

In the previous version I wrote that an unaligned read of the first 16 bytes is a bad tradeoff. While writing a faster strcpy header I realized that this was because I was doing a separate check for whether that read crosses a page. When I check only that the next 64 bytes do not cross a page, and do the unaligned 16-byte load first, it adds only a small overhead for larger strings.

This makes my implementation faster for a wider family of workloads. It speeds up the gcc benchmark and most other programs.

On unit tests the revised version is somewhat slower than the previous version. This is because the first-16-bytes path is chosen only rarely, which causes branch mispredictions.

I made two additional small improvements. The first is squashing in the padding patch. The second is that the page-cross test can be done as x%4096 < 4096-48 instead of x%4096 <= 4096-64, because I align x to 16 bytes.

I updated the benchmarks; the difference between the new and revised versions is at

http://kam.mff.cuni.cz/~ondra/benchmark_string/strlen_profile.html
http://kam.mff.cuni.cz/~ondra/strlen_profile.tar.bz2

Ondra

2013-01-31  Ondrej Bilka  <neleai@seznam.cz>

	* sysdeps/x86_64/strlen.S: Replace with new SSE2 based
	implementation which is faster on all x86_64 architectures.
	Tested on AMD, Intel Nehalem, SNB, IVB.
	* sysdeps/x86_64/strnlen.S: Likewise.
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
	all multiarch strlen and strnlen versions.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Update.  Remove
	strlen and strnlen related parts.
	* sysdeps/x86_64/multiarch/strcat-sse2-unaligned.S: Update.
	Inline strlen part.
	* sysdeps/x86_64/multiarch/strcat-ssse3.S: Likewise.
	* sysdeps/x86_64/multiarch/strlen.S: Remove.
	* sysdeps/x86_64/multiarch/strlen-sse2-no-bsf.S: Remove.
	* sysdeps/x86_64/multiarch/strlen-sse2-pminub.S: Remove.
	* sysdeps/x86_64/multiarch/rtld-strlen.S: Remove.
	* sysdeps/x86_64/multiarch/strlen-sse4.S: Remove.
	* sysdeps/x86_64/multiarch/strnlen.S: Remove.
	* sysdeps/x86_64/multiarch/strnlen-sse2-no-bsf.S: Remove.
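
As an illustration of the page-cross test mentioned in the message, here is a minimal C sketch of one reading of it: checking x%4096 < 4096-48 on the raw pointer gives the same answer as checking x%4096 <= 4096-64 on the pointer rounded down to 16 bytes. The helper names and the PAGE_BYTES constant are hypothetical and are not taken from the patch, which is written in assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BYTES 4096u  /* assumed page size, matching the 4096 in the message */

/* Offset of an address within its (assumed 4096-byte) page. */
static unsigned page_offset(uintptr_t x)
{
    return (unsigned)(x % PAGE_BYTES);
}

/* Test applied to x rounded down to a 16-byte boundary:
   true when 64 bytes starting at the aligned address stay in the page. */
static bool safe_after_aligning(uintptr_t x)
{
    uintptr_t aligned = x & ~(uintptr_t)15;
    return page_offset(aligned) <= PAGE_BYTES - 64;
}

/* Test applied directly to the unaligned pointer. */
static bool safe_raw(uintptr_t x)
{
    return page_offset(x) < PAGE_BYTES - 48;
}

/* Count the addresses over two pages where the two tests disagree. */
static unsigned count_disagreements(void)
{
    unsigned bad = 0;
    for (uintptr_t x = 0; x < 2 * (uintptr_t)PAGE_BYTES; x++)
        if (safe_after_aligning(x) != safe_raw(x))
            bad++;
    return bad;
}
```

Under these assumptions the two tests agree for every offset, so the check can be made on the raw pointer before any alignment is done. When safe_raw holds, the offset is at most 4047, so an unaligned 16-byte load starting at x also stays within the page.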
Attachment:
0001-Faster-strlen-on-x86-64.patch
Description: Text document