strlen() and similar functions use some cool magic to determine whether any byte of an integer is zero. This magic is explained in a long comment in all of these source files. Part of this comment reads: "1) Is this safe? Will it catch all the zero bytes? Suppose there is a byte with all zeros. Any carry bits propagating from its left will fall into the hole at its least significant bit and stop. [...]" The phrase "propagating from its left" is wrong; it should be "propagating from its right", because carries in an addition travel from less significant to more significant bits. In glibc-2.7 there are 14 files that contain this typo. Luckily the wording and the formatting of the paragraph are exactly the same everywhere. I am not sending a patch because it could easily become outdated or miss newly added files. Instead, please use a combination of grep and sed, or similar tools, to fix these.
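For readers following along, the carry behaviour described in the quoted comment can be checked with a small sketch. It assumes a 32-bit word and the constant 0x7efefeff, whose 0-bits ("holes") sit at bits 8, 16, 24 and 31; the function name `may_have_zero_byte` is made up for this illustration and is not a glibc identifier.

```c
#include <stdint.h>

/* Constant with 0-bits ("holes") at bits 8, 16, 24 and 31.
   Assumption: 32-bit words; a 64-bit variant would need a wider constant.  */
#define MAGIC UINT32_C (0x7efefeff)

/* Hypothetical helper, not a glibc function.
   Adding MAGIC makes every nonzero byte send a carry into the hole on
   its left.  A hole that did not receive a carry keeps its original
   value, and the XOR with ~word turns exactly those holes into 1-bits.
   A nonzero result means "a zero byte may be present".  */
static uint32_t
may_have_zero_byte (uint32_t word)
{
  return (uint32_t) (((word + MAGIC) ^ ~word) & ~MAGIC);
}
```

With this sketch, 0x41424344 yields 0, while 0x41004344 yields 0x01000000: the carry arriving from the lower-order bytes (the right) is absorbed by the zero byte, so no carry reaches the hole at bit 24.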
The comment doesn't match what the code is doing. The comments should all be removed when BZ #5807 is resolved.
Well, the comment does match what the code is doing now: not in strlen.c, but in all the other similar files where this comment appears (strchr.c, to mention just one). There is no need to fix the comments, of course, provided that you change the implementation in all fourteen files, not just in strlen.c, and throw out the old version rather than just putting it inside #if 0. If you keep the old version around, I recommend fixing the comments, because that can be done very easily and might help anyone trying to understand the code. The algorithm you linked from bug #5807 is definitely nicer than the current one: it is easier to understand and matches exactly, with no false positives.
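The algorithm linked from bug #5807 is not reproduced in this thread. As an illustration of what "exact, with no false positives" can mean, here is a sketch that contrasts the carry-based test with a widely known subtract-and-mask test, assuming 32-bit words. Whether the second function matches the linked algorithm is an assumption; both function names are hypothetical.

```c
#include <stdint.h>

#define MAGIC UINT32_C (0x7efefeff)
#define LSB   UINT32_C (0x01010101)   /* lowest bit of every byte */
#define MSB   UINT32_C (0x80808080)   /* highest bit of every byte */

/* Carry-based test (hypothetical name).  It can misfire when the top
   byte is 0x80, because bit 31 is a hole as well.  */
static int
may_have_zero_byte (uint32_t word)
{
  return ((uint32_t) (((word + MAGIC) ^ ~word) & ~MAGIC)) != 0;
}

/* Subtract-and-mask test (hypothetical name).  A byte's high bit
   survives only if the subtraction borrowed through it and the byte
   itself did not have its high bit set; the lowest such byte is
   always a genuine zero, so the boolean answer is exact.  */
static int
has_zero_byte_exact (uint32_t word)
{
  return ((uint32_t) ((word - LSB) & ~word & MSB)) != 0;
}
```

For the word 0x80424344, which contains no zero byte, the carry-based test reports a possible zero while the subtract-and-mask test does not; both agree on 0x41004344.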
Agreed, all fourteen files should be changed to use a sensible algorithm, and the old comments should be removed. This issue still depends on resolving #5807 first.
This is still in current git.
Code comments are not an issue with the manual.
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, azanella/generic-strings has been created at 72aa7602bb7fc7e54aaf3f1f49a18122676e138b (commit) - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=72aa7602bb7fc7e54aaf3f1f49a18122676e138b commit 72aa7602bb7fc7e54aaf3f1f49a18122676e138b Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Tue Feb 21 17:14:16 2017 -0300 sh: Add string-fzb.h and string-fzi.h Use the SH cmp/str on has_{zero,eq,zero_eq} and avoid use builtin count leading/trailing zero which for SH calls a libgcc function (expanding it to direct byte testing is better than a function call). * sysdeps/sh/string-fzb.h: New file. * sysdeps/sh/string-fzi.h: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0ce10e49871b2d759f6115bb1355883c31bd5959 commit 0ce10e49871b2d759f6115bb1355883c31bd5959 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:26:18 2017 -0200 powerpc: Add string-fza.h While ppc has the more important string functions in assembly, there are still a few generic routines used. Use the Power 6 CMPB insn for testing of zeros. * sysdeps/powerpc/power6/string-fza.h: New file. * sysdeps/powerpc/powerpc32/power6/string-fza.h: New file. * sysdeps/powerpc/powerpc64/power6/string-fza.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=55ddf7c70da6a8ae23507c41a34d00a127bc1308 commit 55ddf7c70da6a8ae23507c41a34d00a127bc1308 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:24:23 2017 -0200 arm: Add string-fza.h While arm has the more important string functions in assembly, there are still a few generic routines used. Use the UQSUB8 insn for testing of zeros. * sysdeps/arm/armv6t2/string-fza.h: New file. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bef385336624e51985182dc9401c702fdfc73817 commit bef385336624e51985182dc9401c702fdfc73817 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:23:27 2017 -0200 alpha: Add string-fzb.h and string-fzi.h While alpha has the more important string functions in assembly, there are still a few for find the generic routines are used. Use the CMPBGE insn, via the builtin, for testing of zeros. Use a simplified expansion of __builtin_ctz when the insn isn't available. * sysdeps/alpha/string-fza.h: New file. * sysdeps/alpha/string-fzb.h: New file. * sysdeps/alpha/string-fzi.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1da9832f154b026451369da54a7860266e691c95 commit 1da9832f154b026451369da54a7860266e691c95 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:22:39 2017 -0200 hppa: Add string-fzb.h and string-fzi.h Use UXOR,SBZ to test for a zero byte within a word. While we can get semi-decent code out of asm-goto, we would do slightly better with a compiler builtin. For index_zero et al, sequential testing of bytes is less expensive than any tricks that involve a count-leading-zeros insn that we don't have. * sysdeps/hppa/string-fza.h: New file. * sysdeps/hppa/string-fzb.h: New file. * sysdeps/hppa/string-fzi.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=6396dd6f4ead1ae41aa2e07103f0a68001f3e208 commit 6396dd6f4ead1ae41aa2e07103f0a68001f3e208 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:22:02 2017 -0200 hppa: Add memcopy.h GCC's combine pass cannot merge (x >> c | y << (32 - c)) into a double-word shift unless (1) the subtract is in the same basic block and (2) the result of the subtract is used exactly once. Neither condition is true for any use of MERGE. By forcing the use of a double-word shift, we not only reduce contention on SAR, but also allow the setting of SAR to be hoisted outside of a loop. * sysdeps/hppa/memcopy.h: New file. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=903270e20262487e0dbf1a12d36db172787ef2da commit 903270e20262487e0dbf1a12d36db172787ef2da Author: Adhemerval Zanella <adhemerval.zanella@linaro.com> Date: Wed Mar 8 16:56:17 2017 +0100 Improve generic strcpy New generic implementation tries to use word operations along with the new string-fz{b,i} functions even for inputs with different alignments (with still uses aligned access plus merge operation to get a correct word by word comparison). * string/strcpy.c: Rewrite using memcopy.h, string-fzb.h, string-fzi.h. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d8c455b1a050a8d9806b568b1d064fa46e12f634 commit d8c455b1a050a8d9806b568b1d064fa46e12f634 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:21:26 2017 -0200 Improve generic strcmp New generic implementation tries to use word operations along with the new string-fz{b,i} functions even for inputs with different alignments (with still uses aligned access plus merge operation to get a correct word by word comparison). * string/strcmp.c: Rewrite using memcopy.h, string-fzb.h, string-fzi.h. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2dba7f25afce8fe2f38ce333bfd6506105b89633 commit 2dba7f25afce8fe2f38ce333bfd6506105b89633 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Thu Feb 16 16:21:03 2017 -0200 Improve generic strnlen With an optimized memchr, new strnlen implementation basically calls memchr and adjust the result pointer value. [BZ #5806] * string/strnlen.c: Rewrite in terms of memchr. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=445306b7ebb5b544509313edde654315fc34c8a3 commit 445306b7ebb5b544509313edde654315fc34c8a3 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:20:35 2017 -0200 Improve generic memrchr New algorithm have the following key differences: - Use string-fz{b,i} functions. [BZ #5806] * string/memrchr.c: Use string-fzb.h, string-fzi.h. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=231a8739e99c60c6ba0804b22e18093c402d0b03 commit 231a8739e99c60c6ba0804b22e18093c402d0b03 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:19:40 2017 -0200 Improve generic strchrnul New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff function to remove unwanted data. This strategy follow assemble optimized ones for aarch64, powerpc and tile. - Use string-fz{b,i} functions. [BZ #5806] * string/strchrnul.c: Use string-fzb.h, string-fzi.h. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7fab7d55da5cbcccc034188a4bec35b8a0522402 commit 7fab7d55da5cbcccc034188a4bec35b8a0522402 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:19:12 2017 -0200 Improve generic memchr New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff function to remove unwanted data. This strategy follow assemble optimized ones for aarch64, powerpc and tile. - Use string-fz{b,i} and string-opthr functions. [BZ #5806] * string/memchr.c: Use string-fzb.h, string-fzi.h, string-opthr.h. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c1ed1c8d1b25ec9cc0d788070682e5db2827c147 commit c1ed1c8d1b25ec9cc0d788070682e5db2827c147 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:18:48 2017 -0200 Improve generic strchr New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff function to remove unwanted data. This strategy follow assemble optimized ones for aarch64 and powerpc. - Use string-fz{b,i} and string-extbyte function. [BZ #5806] * string/strchr.c: Use string-fzb.h, string-fzi.h, string-extbyte.h. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7f989a408cd9197d2b69fbf81a1408200d7efc40 commit 7f989a408cd9197d2b69fbf81a1408200d7efc40 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:18:24 2017 -0200 Improve generic strlen New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff function to remove unwanted data. This strategy follow assemble optimized ones for powerpc, sparc, and SH. - Extract has_zero and index_first_zero tests into headers that can be tailored for the architecture. [BZ #5806] * sysdeps/generic/string-fza.h: New file. * sysdeps/generic/string-fzb.h: New file. * sysdeps/generic/string-fzi.h: New file. * sysdeps/generic/string-extbyte.h: New file. * string/strlen.c: Use them. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d16d6a7e1fe19a5f252f721a3229c0daf8979a31 commit d16d6a7e1fe19a5f252f721a3229c0daf8979a31 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Thu Feb 23 18:45:54 2017 -0300 Add string-maskoff.h generic header Macros to operate on unaligned access for string operations, such as to create a bit mask to remove non wanted bytes from an unaligned read, and to repeat byte within a word. * sysdeps/generic/string-maskoff.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ad1d81b8ba31514579912a7ff52c5405c21b9726 commit ad1d81b8ba31514579912a7ff52c5405c21b9726 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:15:27 2017 -0200 Parameterize OP_T_THRES from memcopy.h Basically it moves OP_T_THRES out of memcopy.h to its own header and adjust each architecture that redefines it. * sysdeps/generic/memcopy.h (OP_T_THRES): Move... * sysdeps/generic/string-opthr.h: ... here; new file. * sysdeps/i386/memcopy.h (OP_T_THRES): Move... * sysdeps/i386/string-opthr.h: ... here; new file. * sysdeps/m68k/memcopy.h (OP_T_THRES): Remove. * sysdeps/powerpc/powerpc32/power4/memcopy.h (OP_T_THRES): Remove. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=6b9ecce2f78a2bebc2c1c21c0b21e32ccdad8862 commit 6b9ecce2f78a2bebc2c1c21c0b21e32ccdad8862 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:14:09 2017 -0200 Parameterize op_t from memcopy.h Basically moves op_t definition out to an specific header. * sysdeps/generic/string-optype.h: New file. * sysdeps/generic/memcopy.h: Include it. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=85893852d17ce5742f128308e55ead1f10e2beb0 commit 85893852d17ce5742f128308e55ead1f10e2beb0 Author: Adhemerval Zanella <adhemerval.zanella@linaro.com> Date: Thu Mar 9 15:00:27 2017 +0100 string: Remove __memchr definition Since memchr is a C90 function (so there are no external-linkage namespace issues), and not used in any macros defined in installed headers (so no block-scope namespace issues) it is safe to just remove its internal definition and just set all the arch specific implementation to just define memchr instead. Checked on x86_64-linux-gnu and with build-many-glibc.py. * string/memchr.c (__memchr): Redefine to memchr. * sysdeps/aarch64/memchr.S (__memchr): Likewise. * sysdeps/aarch64/rawmemchr.S (__memchr): Likewise. * sysdeps/i386/i686/multiarch/memchr.S (__memchr): Likewise. * sysdeps/i386/memchr.S (__memchr): Likewise. * sysdeps/ia64/memchr.S (__memchr): Likewise. * sysdeps/m68k/memchr.S (__memchr): Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/memchr-ppc32.c (__memchr): Likewise. * sysdeps/powerpc/powerpc32/power7/memchr.S (__memchr): Likewise. * sysdeps/powerpc/powerpc64/power7/memchr.S (__memchr): Likewise. * sysdeps/sparc/sparc32/memchr.S (__memchr): Likewise. * sysdeps/sparc/sparc64/memchr.S (__memchr): Likewise. * sysdeps/tile/tilegx/memchr.c (__memchr): Likewise. * sysdeps/tile/tilepro/memchr.c (__memchr): Likewise. * sysdeps/x86_64/memchr.S (__memchr): Likewise. -----------------------------------------------------------------------
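The "Improve generic strnlen" entry in the log above describes strnlen as a call to memchr followed by an adjustment of the returned pointer. A minimal sketch of that idea, using only standard C, could look like the following; the name `strnlen_via_memchr` is hypothetical and the code is not copied from the commit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration, not the glibc source.
   Search at most MAXLEN bytes for the terminating NUL.  If it is
   found, the length is the distance from the start of the string;
   otherwise the string is at least MAXLEN bytes long.  */
static size_t
strnlen_via_memchr (const char *str, size_t maxlen)
{
  const char *found = memchr (str, '\0', maxlen);
  return found != NULL ? (size_t) (found - str) : maxlen;
}
```

The appeal of this shape is that any word-at-a-time or architecture-specific optimization in memchr benefits strnlen without duplicating the search loop.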
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, azanella/generic-strings has been created at e24962bc9b04c0d43f02f036be079552e26ddc6a (commit) - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e24962bc9b04c0d43f02f036be079552e26ddc6a commit e24962bc9b04c0d43f02f036be079552e26ddc6a Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Tue Feb 21 17:14:16 2017 -0300 sh: Add string-fzb.h and string-fzi.h Use the SH cmp/str on has_{zero,eq,zero_eq}. Checked on sh4-linux-gnu. Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/sh/string-fzb.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=127ee46de04935699d5be9b2f8bc2f01ebf13a63 commit 127ee46de04935699d5be9b2f8bc2f01ebf13a63 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:26:18 2017 -0200 powerpc: Add string-fza.h While ppc has the more important string functions in assembly, there are still a few generic routines used. Use the Power 6 CMPB insn for testing of zeros. Checked on powerpc64le-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/powerpc/power6/string-fza.h: New file. * sysdeps/powerpc/powerpc32/power6/string-fza.h: New file. * sysdeps/powerpc/powerpc64/power6/string-fza.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=317e73df9e1224fe2ba829c3d0e6ab36858752eb commit 317e73df9e1224fe2ba829c3d0e6ab36858752eb Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:24:23 2017 -0200 arm: Add string-fza.h While arm has the more important string functions in assembly, there are still a few generic routines used. Use the UQSUB8 insn for testing of zeros. Checked on armv7-linux-gnueabihf Richard Henderson <rth@twiddle.net> * sysdeps/arm/armv6t2/string-fza.h: New file. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c5d9130b36ceb00fcf22e79cdaa58c3ddade294a commit c5d9130b36ceb00fcf22e79cdaa58c3ddade294a Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:23:27 2017 -0200 alpha: Add string-fzb.h and string-fzi.h While alpha has the more important string functions in assembly, there are still a few for find the generic routines are used. Use the CMPBGE insn, via the builtin, for testing of zeros. Use a simplified expansion of __builtin_ctz when the insn isn't available. Checked on alpha-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/alpha/string-fza.h: New file. * sysdeps/alpha/string-fzb.h: New file. * sysdeps/alpha/string-fzi.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8fec1f9ca69bb6e6d4952cee029c1a0bcee3b57d commit 8fec1f9ca69bb6e6d4952cee029c1a0bcee3b57d Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:22:39 2017 -0200 hppa: Add string-fzb.h and string-fzi.h Use UXOR,SBZ to test for a zero byte within a word. While we can get semi-decent code out of asm-goto, we would do slightly better with a compiler builtin. For index_zero et al, sequential testing of bytes is less expensive than any tricks that involve a count-leading-zeros insn that we don't have. Checked on hppa-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/hppa/string-fza.h: New file. * sysdeps/hppa/string-fzb.h: New file. * sysdeps/hppa/string-fzi.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=109c793ed2bce55328c897853b91ae628aa42c6d commit 109c793ed2bce55328c897853b91ae628aa42c6d Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:22:02 2017 -0200 hppa: Add memcopy.h GCC's combine pass cannot merge (x >> c | y << (32 - c)) into a double-word shift unless (1) the subtract is in the same basic block and (2) the result of the subtract is used exactly once. Neither condition is true for any use of MERGE. 
By forcing the use of a double-word shift, we not only reduce contention on SAR, but also allow the setting of SAR to be hoisted outside of a loop. Checked on hppa-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/hppa/memcopy.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e644d7431f23aab32f88a9c37ec58704cb4cc8e5 commit e644d7431f23aab32f88a9c37ec58704cb4cc8e5 Author: Adhemerval Zanella <adhemerval.zanella@linaro.com> Date: Wed Mar 8 16:56:17 2017 +0100 Improve generic strcpy New generic implementation tries to use word operations along with the new string-fz{b,i} functions even for inputs with different alignments (with still uses aligned access plus merge operation to get a correct word by word comparison). Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * string/strcpy.c: Rewrite using memcopy.h, string-fzb.h, string-fzi.h. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e0b21c7f76a0d88e519891fe71d2f83b0a4a0d27 commit e0b21c7f76a0d88e519891fe71d2f83b0a4a0d27 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:21:26 2017 -0200 Improve generic strcmp New generic implementation tries to use word operations along with the new string-fz{b,i} functions even for inputs with different alignments (with still uses aligned access plus merge operation to get a correct word by word comparison). Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * string/strcmp.c: Rewrite using memcopy.h, string-fzb.h, string-fzi.h. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7428f9e9c5d643ee84bf061fe641f9e2ecde372c commit 7428f9e9c5d643ee84bf061fe641f9e2ecde372c Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:19:40 2017 -0200 Improve generic strchrnul New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff function to remove unwanted data. This strategy follow assemble optimized ones for aarch64, powerpc and tile. - Use string-fz{b,i} functions. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/strchrnul.c: Use string-fzb.h, string-fzi.h. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2a441052e3b8ef83dd3e520bc1f1037a472b90fd commit 2a441052e3b8ef83dd3e520bc1f1037a472b90fd Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:18:48 2017 -0200 Improve generic strchr New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff function to remove unwanted data. This strategy follow assemble optimized ones for aarch64 and powerpc. - Use string-fz{b,i} and string-extbyte function. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/strchr.c: Use string-fzb.h, string-fzi.h, string-extbyte.h. * sysdeps/s390/multiarch/strchr-c.c: Redefine weak_alias. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=96d1ccd369e59c2c741adc81b0a339d90990ae8a commit 96d1ccd369e59c2c741adc81b0a339d90990ae8a Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Thu Feb 16 16:21:03 2017 -0200 Improve generic strnlen With an optimized memchr, new strnlen implementation basically calls memchr and adjust the result pointer value. 
It also cleanups the multiple inclusion by leaving the ifunc implementation to undef the weak_alias and libc_hidden_def. [BZ #5806] * string/strnlen.c: Rewrite in terms of memchr. * sysdeps/i386/i686/multiarch/strnlen-c.c: Redefine weak_alias and libc_hidden_def. * sysdeps/powerpc/powerpc32/power4/multiarch/strnlen-ppc32.c: Likewise. * sysdeps/s390/multiarch/strnlen-c.c: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9e7ec585ae87b5f4ac4e98e3eb8821784cd2b87a commit 9e7ec585ae87b5f4ac4e98e3eb8821784cd2b87a Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:20:35 2017 -0200 Improve generic memrchr New algorithm have the following key differences: - Use string-fz{b,i} functions. It also cleanups the multiple inclusion by leaving the ifunc implementation to undef the weak_alias. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/memrchr.c: Use string-fzb.h, string-fzi.h. * sysdeps/i386/i686/multiarch/memrchr-c.c: Redefined weak_alias. * sysdeps/s390/multiarch/memrchr-c.c: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3d0770c276c6db572cb550abf6124da52dfa34a9 commit 3d0770c276c6db572cb550abf6124da52dfa34a9 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:19:12 2017 -0200 Improve generic memchr New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff function to remove unwanted data. This strategy follow assemble optimized ones for aarch64, powerpc and tile. - Use string-fz{b,i} and string-opthr functions. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/memchr.c: Use string-fzb.h, string-fzi.h, string-opthr.h. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a13293ecf9828d582c77b97255835d075311a1a3 commit a13293ecf9828d582c77b97255835d075311a1a3 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:18:24 2017 -0200 string: Improve generic strlen New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff functions to remove unwanted data. This strategy follow assemble optimized ones for powerpc, sparc, and SH. - Use of has_zero and index_first_zero parametrized functions. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/strlen.c: Use them. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a7a46e4ac7a290e16700649fbd596998cea47c19 commit a7a46e4ac7a290e16700649fbd596998cea47c19 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Mon Jan 8 16:41:43 2018 -0200 Add string vectorized find and detection headers This patch adds generic string find and detection implementation meant to be used in generic vectorized string implementation. The idea is to decompose the basic string operation so each architecture can reimplement if it provides any specialized hardware instruction. The 'string-fza.h' provides zero byte detection functions (find_zero_low, find_zero_all, find_eq_low, find_eq_all, find_zero_eq_low, find_zero_eq_all, find_zero_ne_low, and find_zero_ne_all). They are used on both functions provided by 'string-fzb.h' and 'string-fzi'. The 'string-fzb.h' provides boolean zero byte detection with the functions: - has_zero: determine if any byte within a word is zero. - has_eq: determine byte equality between two words. - has_zero_eq: determine if any byte within a word is zero along with byte equality between two words. 
The 'string-fzi.h' provides zero byte detection along with its positions: - index_first_zero: return index of first zero byte within a word. - index_first_eq: return index of first byte different between two words. - index_first_zero_eq: return index of first zero byte within a word or first byte different between two words. - index_first_zero_ne: return index of first zero byte within a word or first byte equal between two words. - index_last_zero: return index of last zero byte within a word. - index_last_eq: return index of last byte different between two words. Also, to avoid libcalls in the '__builtin_c{t,l}z{l}' calls (which may add performance degradation), inline implementation based on De Bruijn sequences are added (enabled by a configure check). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * config.h.in (HAVE_BUILTIN_CTZ, HAVE_BUILTIN_CLZ): New defines. * configure.ac: Check for __builtin_ctz{l} with no external dependencies * sysdeps/generic/string-extbyte.h: New file. * sysdeps/generic/string-fza.h: Likewise. * sysdeps/generic/string-fzb.h: Likewise. * sysdeps/generic/string-fzi.h: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=040bb0502a08765e01ba31708719e657a8c113b1 commit 040bb0502a08765e01ba31708719e657a8c113b1 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Thu Feb 23 18:45:54 2017 -0300 Add string-maskoff.h generic header Macros to operate on unaligned access for string operations: - create_mask: create a mask based on pointer alignment to sets up non-zero bytes before the beginning of the word so a following operation (such as find zero) might ignore these bytes. - highbit_mask: create a mask with high bit of each byte being 1, and the low 7 bits being all the opposite of the input. These macros are meant to be used on optimized vectorized string implementations. 
Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/generic/string-maskoff.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=69030c746991d2d12d8b2c37ff84d350d6c412ee commit 69030c746991d2d12d8b2c37ff84d350d6c412ee Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:15:27 2017 -0200 Parameterize OP_T_THRES from memcopy.h Basically it moves OP_T_THRES out of memcopy.h to its own header and adjust each architecture that redefines it. Checked with a build and check with run-built-tests=no for all major Linux ABIs (alpha, aarch64, arm, hppa, i686, ia64, m68k, microblaze, mips, mips64, nios2, powerpc, powerpc64le, s390x, sh4, sparc64, tilegx, and x86_64). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/generic/memcopy.h (OP_T_THRES): Move... * sysdeps/generic/string-opthr.h: ... here; new file. * sysdeps/i386/memcopy.h (OP_T_THRES): Move... * sysdeps/i386/string-opthr.h: ... here; new file. * sysdeps/m68k/memcopy.h (OP_T_THRES): Remove. * string/memcmp.c (OP_T_THRES): Remove definition. * sysdeps/powerpc/powerpc32/power4/memcopy.h (OP_T_THRES): Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=aa7c9fa3a21562f9abdacac629b6fcef64551d6a commit aa7c9fa3a21562f9abdacac629b6fcef64551d6a Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:14:09 2017 -0200 Parameterize op_t from memcopy.h Basically moves op_t definition out to an specific header, adds the attribute 'may-alias', and cleanup its duplicated definitions. It lead to inclusion of tilegx32 gmp-mparam.h similar to x32 so op_t can be define as a long long (from _LONG_LONG_LIMB). Checked with a build and check with run-built-tests=no for all major Linux ABIs (alpha, aarch64, arm, hppa, i686, ia64, m68k, microblaze, mips, mips64, nios2, powerpc, powerpc64le, s390x, sh4, sparc64, tilegx, and x86_64). 
Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/generic/string-optype.h: New file. * sysdeps/generic/memcopy.h: Include it. * string/memcmp.c (op_t): Remove define. * sysdeps/tile/memcmp.c (op_t): Likewise. * sysdeps/tile/memcopy.h (op_t): Likewise. * sysdeps/tile/tilegx32/gmp-mparam.h: New file. -----------------------------------------------------------------------
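The log above describes has_zero and index_first_zero, and mentions an inline De Bruijn based replacement for the count-trailing-zeros builtin. The sketch below shows how these pieces can fit together. It assumes 32-bit words and little-endian byte numbering (byte 0 is the least significant byte); the names ending in `32` are hypothetical and the code is not taken from the string-fz*.h headers.

```c
#include <stdint.h>

#define LSB UINT32_C (0x01010101)
#define MSB UINT32_C (0x80808080)

/* Mask with the high bit set in the lowest zero byte of X.  Bytes
   above that one may be flagged spuriously, which does not matter
   when only the first zero byte is wanted.  */
static uint32_t
find_zero_low32 (uint32_t x)
{
  return (uint32_t) ((x - LSB) & ~x & MSB);
}

/* Boolean test: does X contain a zero byte?  */
static int
has_zero32 (uint32_t x)
{
  return find_zero_low32 (x) != 0;
}

/* Count trailing zeros without a builtin or libcall, using the
   32-bit De Bruijn sequence 0x077CB531.  V must be nonzero.  */
static unsigned int
ctz32_debruijn (uint32_t v)
{
  static const unsigned char table[32] = {
    0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
    31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
  };
  uint32_t lowest = v & (0u - v);   /* isolate the lowest set bit */
  return table[(uint32_t) (lowest * UINT32_C (0x077CB531)) >> 27];
}

/* Index (0..3, counted from the least significant byte) of the first
   zero byte in X.  X must contain a zero byte.  */
static unsigned int
index_first_zero32 (uint32_t x)
{
  return ctz32_debruijn (find_zero_low32 (x)) / 8;
}
```

A word-at-a-time loop would typically call has_zero32 on each word and, once it fires, add index_first_zero32 to the byte offset of that word.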
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, azanella/generic-strings has been created at 52a9cf06655e8b65b51de679c7550c2c1c2d837b (commit) - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=52a9cf06655e8b65b51de679c7550c2c1c2d837b commit 52a9cf06655e8b65b51de679c7550c2c1c2d837b Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Tue Feb 21 17:14:16 2017 -0300 sh: Add string-fzb.h Use the SH cmp/str on has_{zero,eq,zero_eq}. Checked on sh4-linux-gnu. Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/sh/string-fzb.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=24d01f48967662e8f356d8367658cbb192617667 commit 24d01f48967662e8f356d8367658cbb192617667 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:26:18 2017 -0200 powerpc: Add string-fza.h While ppc has the more important string functions in assembly, there are still a few generic routines used. Use the Power 6 CMPB insn for testing of zeros. Checked on powerpc64le-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/powerpc/power6/string-fza.h: New file. * sysdeps/powerpc/powerpc32/power6/string-fza.h: New file. * sysdeps/powerpc/powerpc64/power6/string-fza.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=6b67749f9df8400b8d0a768f78333302d7c5d9f6 commit 6b67749f9df8400b8d0a768f78333302d7c5d9f6 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:24:23 2017 -0200 arm: Add string-fza.h While arm has the more important string functions in assembly, there are still a few generic routines used. Use the UQSUB8 insn for testing of zeros. Checked on armv7-linux-gnueabihf Richard Henderson <rth@twiddle.net> * sysdeps/arm/armv6t2/string-fza.h: New file. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=099a39660c4d50187f30a87a52dc92cfca22df34 commit 099a39660c4d50187f30a87a52dc92cfca22df34 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:23:27 2017 -0200 alpha: Add string-fzb.h and string-fzi.h While alpha has the more important string functions in assembly, there are still a few for find the generic routines are used. Use the CMPBGE insn, via the builtin, for testing of zeros. Use a simplified expansion of __builtin_ctz when the insn isn't available. Checked on alpha-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/alpha/string-fza.h: New file. * sysdeps/alpha/string-fzb.h: New file. * sysdeps/alpha/string-fzi.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b6d266c6fee641c4c9724086dcb9e07c02c5a5bf commit b6d266c6fee641c4c9724086dcb9e07c02c5a5bf Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:22:39 2017 -0200 hppa: Add string-fzb.h and string-fzi.h Use UXOR,SBZ to test for a zero byte within a word. While we can get semi-decent code out of asm-goto, we would do slightly better with a compiler builtin. For index_zero et al, sequential testing of bytes is less expensive than any tricks that involve a count-leading-zeros insn that we don't have. Checked on hppa-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/hppa/string-fza.h: New file. * sysdeps/hppa/string-fzb.h: New file. * sysdeps/hppa/string-fzi.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ebb0353dba2ed38ff88bcf8ea03eb80bb41b3e4f commit ebb0353dba2ed38ff88bcf8ea03eb80bb41b3e4f Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:22:02 2017 -0200 hppa: Add memcopy.h GCC's combine pass cannot merge (x >> c | y << (32 - c)) into a double-word shift unless (1) the subtract is in the same basic block and (2) the result of the subtract is used exactly once. Neither condition is true for any use of MERGE. 
By forcing the use of a double-word shift, we not only reduce contention on SAR, but also allow the setting of SAR to be hoisted outside of a loop. Checked on hppa-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/hppa/memcopy.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5c29485b5e15798ef3ede33272b36df521e4cbec commit 5c29485b5e15798ef3ede33272b36df521e4cbec Author: Adhemerval Zanella <adhemerval.zanella@linaro.com> Date: Wed Mar 8 16:56:17 2017 +0100 string: Improve generic strcpy New generic implementation tries to use word operations along with the new string-fz{b,i} functions even for inputs with different alignments (with still uses aligned access plus merge operation to get a correct word by word comparison). Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * string/strcpy.c: Rewrite using memcopy.h, string-fzb.h, string-fzi.h. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=84bfd24e57385ce8e5d37e148443f22661e87490 commit 84bfd24e57385ce8e5d37e148443f22661e87490 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:21:26 2017 -0200 string: Improve generic strcmp New generic implementation tries to use word operations along with the new string-fz{b,i} functions even for inputs with different alignments (with still uses aligned access plus merge operation to get a correct word by word comparison). Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * string/strcmp.c: Rewrite using memcopy.h, string-fzb.h, string-fzi.h. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=572c3a397d79ab9efa89d131f72ef2d6b438eb69 commit 572c3a397d79ab9efa89d131f72ef2d6b438eb69 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:19:40 2017 -0200 string: Improve generic strchrnul New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff function to remove unwanted data. This strategy follow assemble optimized ones for aarch64, powerpc and tile. - Use string-fz{b,i} functions. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/strchrnul.c: Use string-fzb.h, string-fzi.h. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=500a7f1a861e801292bfd70d9e6c4539550f0f56 commit 500a7f1a861e801292bfd70d9e6c4539550f0f56 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:18:48 2017 -0200 string: Improve generic strchr New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff function to remove unwanted data. This strategy follow assemble optimized ones for aarch64 and powerpc. - Use string-fz{b,i} and string-extbyte function. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> [BZ #5806] * string/strchr.c: Use string-fzb.h, string-fzi.h, string-extbyte.h. * sysdeps/s390/multiarch/strchr-c.c: Redefine weak_alias. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2368dc1dcac592501eb07bc6c9d0fa6f9c43251b commit 2368dc1dcac592501eb07bc6c9d0fa6f9c43251b Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Thu Feb 16 16:21:03 2017 -0200 string: Improve generic strnlen With an optimized memchr, new strnlen implementation basically calls memchr and adjust the result pointer value. It also cleanups the multiple inclusion by leaving the ifunc implementation to undef the weak_alias and libc_hidden_def. Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> [BZ #5806] * string/strnlen.c: Rewrite in terms of memchr. * sysdeps/i386/i686/multiarch/strnlen-c.c: Redefine weak_alias and libc_hidden_def. * sysdeps/powerpc/powerpc32/power4/multiarch/strnlen-ppc32.c: Likewise. * sysdeps/s390/multiarch/strnlen-c.c: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8b5ae84584e867a592a6134c5f9165600b8ae7c1 commit 8b5ae84584e867a592a6134c5f9165600b8ae7c1 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:20:35 2017 -0200 string: Improve generic memrchr New algorithm have the following key differences: - Use string-fz{b,i} functions. It also cleanups the multiple inclusion by leaving the ifunc implementation to undef the weak_alias. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> [BZ #5806] * string/memrchr.c: Use string-fzb.h, string-fzi.h. * sysdeps/i386/i686/multiarch/memrchr-c.c: Redefined weak_alias. * sysdeps/s390/multiarch/memrchr-c.c: Likewise. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=15b7238287bebef55d6ef2191766990713d9b977 commit 15b7238287bebef55d6ef2191766990713d9b977 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:19:12 2017 -0200 string: Improve generic memchr New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff function to remove unwanted data. This strategy follow assemble optimized ones for aarch64, powerpc and tile. - Use string-fz{b,i} and string-opthr functions. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/memchr.c: Use string-fzb.h, string-fzi.h, string-opthr.h. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=096c5ae2a75e1b8b641c97015f9deaee5a17b2cc commit 096c5ae2a75e1b8b641c97015f9deaee5a17b2cc Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:18:24 2017 -0200 string: Improve generic strlen New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff functions to remove unwanted data. This strategy follow assemble optimized ones for powerpc, sparc, and SH. - Use of has_zero and index_first_zero parametrized functions. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/strlen.c: Use them. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c6766b5d52369eae2b64b4c0d23482afa76e0cd6 commit c6766b5d52369eae2b64b4c0d23482afa76e0cd6 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Mon Jan 8 16:41:43 2018 -0200 Add string vectorized find and detection functions This patch adds generic string find and detection implementation meant to be used in generic vectorized string implementation. 
The idea is to decompose the basic string operation so each architecture can reimplement if it provides any specialized hardware instruction. The 'string-fza.h' provides zero byte detection functions (find_zero_low, find_zero_all, find_eq_low, find_eq_all, find_zero_eq_low, find_zero_eq_all, find_zero_ne_low, and find_zero_ne_all). They are used on both functions provided by 'string-fzb.h' and 'string-fzi'. The 'string-fzb.h' provides boolean zero byte detection with the functions: - has_zero: determine if any byte within a word is zero. - has_eq: determine byte equality between two words. - has_zero_eq: determine if any byte within a word is zero along with byte equality between two words. The 'string-fzi.h' provides zero byte detection along with its positions: - index_first_zero: return index of first zero byte within a word. - index_first_eq: return index of first byte different between two words. - index_first_zero_eq: return index of first zero byte within a word or first byte different between two words. - index_first_zero_ne: return index of first zero byte within a word or first byte equal between two words. - index_last_zero: return index of last zero byte within a word. - index_last_eq: return index of last byte different between two words. Also, to avoid libcalls in the '__builtin_c{t,l}z{l}' calls (which may add performance degradation), inline implementation based on De Bruijn sequences are added (enabled by a configure check). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * config.h.in (HAVE_BUILTIN_CTZ, HAVE_BUILTIN_CLZ): New defines. * configure.ac: Check for __builtin_ctz{l} with no external dependencies * sysdeps/generic/string-extbyte.h: New file. * sysdeps/generic/string-fza.h: Likewise. * sysdeps/generic/string-fzb.h: Likewise. * sysdeps/generic/string-fzi.h: Likewise. 
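As a reading aid for the commit message above, here is a minimal sketch of how an exact zero-byte mask can back the boolean and index helpers it lists. This is not the glibc code: the `sketch_` names, the fixed 64-bit `word_t` (a stand-in for `op_t`), and the little-endian index convention are assumptions made for illustration, and `__builtin_ctzll` / `__builtin_clzll` are GCC/Clang builtins.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t word_t;                 /* hypothetical stand-in for op_t */

#define REPEAT_7F UINT64_C(0x7f7f7f7f7f7f7f7f)

/* Return 0x80 in every byte position where X holds a zero byte and 0
   elsewhere.  The per-byte addition cannot carry into the next byte
   (0x7f + 0x7f = 0xfe), so the result has no false positives.  */
static word_t sketch_find_zero_all(word_t x)
{
    return ~(((x & REPEAT_7F) + REPEAT_7F) | x | REPEAT_7F);
}

/* Boolean form: does any byte of X equal zero?  */
static int sketch_has_zero(word_t x)
{
    return sketch_find_zero_all(x) != 0;
}

/* Does any byte of X1 equal the byte at the same position in X2?
   Equal bytes become zero bytes after the XOR.  */
static int sketch_has_eq(word_t x1, word_t x2)
{
    return sketch_find_zero_all(x1 ^ x2) != 0;
}

/* Index of the first zero byte, assuming little-endian byte order
   (index 0 is the least significant byte).  X must contain a zero byte,
   because the builtin is undefined for a zero argument.  */
static unsigned sketch_index_first_zero(word_t x)
{
    return (unsigned) __builtin_ctzll(sketch_find_zero_all(x)) / 8;
}

/* Index of the last zero byte, under the same assumptions.  */
static unsigned sketch_index_last_zero(word_t x)
{
    return (63u - (unsigned) __builtin_clzll(sketch_find_zero_all(x))) / 8;
}

/* Small self-check: the word below has zero bytes at indices 2 and 4.  */
void sketch_self_check(void)
{
    word_t x = UINT64_C(0x4142430045004748);
    assert(sketch_has_zero(x));
    assert(sketch_index_first_zero(x) == 2);
    assert(sketch_index_last_zero(x) == 4);
}
```

On a big-endian target the roles of count-trailing-zeros and count-leading-zeros would be swapped, which is one reason to keep the index helpers separate from the mask computation.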
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=040bb0502a08765e01ba31708719e657a8c113b1 commit 040bb0502a08765e01ba31708719e657a8c113b1 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Thu Feb 23 18:45:54 2017 -0300 Add string-maskoff.h generic header Macros to operate on unaligned access for string operations: - create_mask: create a mask based on pointer alignment to sets up non-zero bytes before the beginning of the word so a following operation (such as find zero) might ignore these bytes. - highbit_mask: create a mask with high bit of each byte being 1, and the low 7 bits being all the opposite of the input. These macros are meant to be used on optimized vectorized string implementations. Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/generic/string-maskoff.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=69030c746991d2d12d8b2c37ff84d350d6c412ee commit 69030c746991d2d12d8b2c37ff84d350d6c412ee Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:15:27 2017 -0200 Parameterize OP_T_THRES from memcopy.h Basically it moves OP_T_THRES out of memcopy.h to its own header and adjust each architecture that redefines it. Checked with a build and check with run-built-tests=no for all major Linux ABIs (alpha, aarch64, arm, hppa, i686, ia64, m68k, microblaze, mips, mips64, nios2, powerpc, powerpc64le, s390x, sh4, sparc64, tilegx, and x86_64). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/generic/memcopy.h (OP_T_THRES): Move... * sysdeps/generic/string-opthr.h: ... here; new file. * sysdeps/i386/memcopy.h (OP_T_THRES): Move... * sysdeps/i386/string-opthr.h: ... here; new file. * sysdeps/m68k/memcopy.h (OP_T_THRES): Remove. * string/memcmp.c (OP_T_THRES): Remove definition. * sysdeps/powerpc/powerpc32/power4/memcopy.h (OP_T_THRES): Likewise. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=aa7c9fa3a21562f9abdacac629b6fcef64551d6a commit aa7c9fa3a21562f9abdacac629b6fcef64551d6a Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:14:09 2017 -0200 Parameterize op_t from memcopy.h Basically moves op_t definition out to an specific header, adds the attribute 'may-alias', and cleanup its duplicated definitions. It lead to inclusion of tilegx32 gmp-mparam.h similar to x32 so op_t can be define as a long long (from _LONG_LONG_LIMB). Checked with a build and check with run-built-tests=no for all major Linux ABIs (alpha, aarch64, arm, hppa, i686, ia64, m68k, microblaze, mips, mips64, nios2, powerpc, powerpc64le, s390x, sh4, sparc64, tilegx, and x86_64). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/generic/string-optype.h: New file. * sysdeps/generic/memcopy.h: Include it. * string/memcmp.c (op_t): Remove define. * sysdeps/tile/memcmp.c (op_t): Likewise. * sysdeps/tile/memcopy.h (op_t): Likewise. * sysdeps/tile/tilegx32/gmp-mparam.h: New file. -----------------------------------------------------------------------
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, azanella/generic-strings has been created at 776bf884cdc6d8b7560311db143d954da2e969bf (commit) - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=776bf884cdc6d8b7560311db143d954da2e969bf commit 776bf884cdc6d8b7560311db143d954da2e969bf Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Tue Feb 21 17:14:16 2017 -0300 sh: Add string-fzb.h Use the SH cmp/str on has_{zero,eq,zero_eq}. Checked on sh4-linux-gnu. Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/sh/string-fzb.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3f4eb93d500527cb988bca89b3ca1c55eaa7528b commit 3f4eb93d500527cb988bca89b3ca1c55eaa7528b Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:26:18 2017 -0200 powerpc: Add string-fza.h While ppc has the more important string functions in assembly, there are still a few generic routines used. Use the Power 6 CMPB insn for testing of zeros. Checked on powerpc64le-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/powerpc/string-fza.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=faf34eb8e865d099db877561b20d6bd82793ee5d commit faf34eb8e865d099db877561b20d6bd82793ee5d Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:24:23 2017 -0200 arm: Add string-fza.h While arm has the more important string functions in assembly, there are still a few generic routines used. Use the UQSUB8 insn for testing of zeros. Checked on armv7-linux-gnueabihf Richard Henderson <rth@twiddle.net> * sysdeps/arm/armv6t2/string-fza.h: New file. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5115c06c369958195d7200f1fefadec6a7194c16 commit 5115c06c369958195d7200f1fefadec6a7194c16 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:23:27 2017 -0200 alpha: Add string-fzb.h and string-fzi.h While alpha has the more important string functions in assembly, there are still a few for find the generic routines are used. Use the CMPBGE insn, via the builtin, for testing of zeros. Use a simplified expansion of __builtin_ctz when the insn isn't available. Checked on alpha-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/alpha/string-fzb.h: New file. * sysdeps/alpha/string-fzi.h: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=eb974666a0cb7f1c834e23c598bd70ea5467d85b commit eb974666a0cb7f1c834e23c598bd70ea5467d85b Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:22:39 2017 -0200 hppa: Add string-fzb.h and string-fzi.h Use UXOR,SBZ to test for a zero byte within a word. While we can get semi-decent code out of asm-goto, we would do slightly better with a compiler builtin. For index_zero et al, sequential testing of bytes is less expensive than any tricks that involve a count-leading-zeros insn that we don't have. Checked on hppa-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/hppa/string-fzb.h: New file. * sysdeps/hppa/string-fzi.h: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=44c7540c69f6c01d19e34b8c6d06dc8eabef8ef9 commit 44c7540c69f6c01d19e34b8c6d06dc8eabef8ef9 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:22:02 2017 -0200 hppa: Add memcopy.h GCC's combine pass cannot merge (x >> c | y << (32 - c)) into a double-word shift unless (1) the subtract is in the same basic block and (2) the result of the subtract is used exactly once. Neither condition is true for any use of MERGE. By forcing the use of a double-word shift, we not only reduce contention on SAR, but also allow the setting of SAR to be hoisted outside of a loop. 
Checked on hppa-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/hppa/memcopy.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=fc8a77c5b25196847a405781ffe472ff9e8eb6ce commit fc8a77c5b25196847a405781ffe472ff9e8eb6ce Author: Adhemerval Zanella <adhemerval.zanella@linaro.com> Date: Wed Mar 8 16:56:17 2017 +0100 string: Improve generic strcpy New generic implementation tries to use word operations along with the new string-fz{b,i} functions even for inputs with different alignments (with still uses aligned access plus merge operation to get a correct word by word comparison). Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * string/strcpy.c: Rewrite using memcopy.h, string-fzb.h, string-fzi.h. * string/test-strcpy.c (test_main): Add more coverage. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0c653b42a1f2ef3b52be833b56596aefa8ad5736 commit 0c653b42a1f2ef3b52be833b56596aefa8ad5736 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:21:26 2017 -0200 string: Improve generic strcmp New generic implementation tries to use word operations along with the new string-fz{b,i} functions even for inputs with different alignments (with still uses aligned access plus merge operation to get a correct word by word comparison). Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * string/strcmp.c: Rewrite using memcopy.h, string-fzb.h, string-fzi.h. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c79804d028a9ebd21de6ae1442ce706fa3c1d7c2 commit c79804d028a9ebd21de6ae1442ce706fa3c1d7c2 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:19:40 2017 -0200 string: Improve generic strchrnul New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff function to remove unwanted data. This strategy follow assemble optimized ones for aarch64, powerpc and tile. - Use string-fz{b,i} functions. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/strchrnul.c: Use string-fzb.h, string-fzi.h. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1751ee1c285284a28725b820c18f8d2b7b5b9258 commit 1751ee1c285284a28725b820c18f8d2b7b5b9258 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:18:48 2017 -0200 string: Improve generic strchr New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff function to remove unwanted data. This strategy follow assemble optimized ones for aarch64 and powerpc. - Use string-fz{b,i} and string-extbyte function. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> [BZ #5806] * string/strchr.c: Use string-fzb.h, string-fzi.h, string-extbyte.h. * sysdeps/s390/multiarch/strchr-c.c: Redefine weak_alias. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=6d2690c0cb8b6b73ff6eb1309d231467c688aafd commit 6d2690c0cb8b6b73ff6eb1309d231467c688aafd Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Thu Feb 16 16:21:03 2017 -0200 string: Improve generic strnlen With an optimized memchr, new strnlen implementation basically calls memchr and adjust the result pointer value. It also cleanups the multiple inclusion by leaving the ifunc implementation to undef the weak_alias and libc_hidden_def. Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> [BZ #5806] * string/strnlen.c: Rewrite in terms of memchr. * sysdeps/i386/i686/multiarch/strnlen-c.c: Redefine weak_alias and libc_hidden_def. * sysdeps/powerpc/powerpc32/power4/multiarch/strnlen-ppc32.c: Likewise. * sysdeps/s390/multiarch/strnlen-c.c: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=6702dd345ea55ccb468684bd809835a6c719c198 commit 6702dd345ea55ccb468684bd809835a6c719c198 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:20:35 2017 -0200 string: Improve generic memrchr New algorithm have the following key differences: - Use string-fz{b,i} functions. It also cleanups the multiple inclusion by leaving the ifunc implementation to undef the weak_alias. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> [BZ #5806] * string/memrchr.c: Use string-fzb.h, string-fzi.h. * sysdeps/i386/i686/multiarch/memrchr-c.c: Redefined weak_alias. * sysdeps/s390/multiarch/memrchr-c.c: Likewise. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1c12833514864aba99b01b2f173e1a98ec1f9658 commit 1c12833514864aba99b01b2f173e1a98ec1f9658 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:19:12 2017 -0200 string: Improve generic memchr New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff function to remove unwanted data. This strategy follow assemble optimized ones for aarch64, powerpc and tile. - Use string-fz{b,i} and string-opthr functions. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/memchr.c: Use string-fzb.h, string-fzi.h, string-opthr.h. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=96419fb9b7feee1c05dd99ba4afdc89d94ef4aad commit 96419fb9b7feee1c05dd99ba4afdc89d94ef4aad Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:18:24 2017 -0200 string: Improve generic strlen New algorithm have the following key differences: - Reads first word unaligned and use string-maskoff functions to remove unwanted data. This strategy follow assemble optimized ones for powerpc, sparc, and SH. - Use of has_zero and index_first_zero parametrized functions. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/strlen.c: Use them. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=438e5fcc0aff82752229bd88bbaffafc63ec6b81 commit 438e5fcc0aff82752229bd88bbaffafc63ec6b81 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Mon Jan 8 16:41:43 2018 -0200 Add string vectorized find and detection functions This patch adds generic string find and detection implementation meant to be used in generic vectorized string implementation. 
The idea is to decompose the basic string operation so each architecture can reimplement if it provides any specialized hardware instruction. The 'string-fza.h' provides zero byte detection functions (find_zero_low, find_zero_all, find_eq_low, find_eq_all, find_zero_eq_low, find_zero_eq_all, find_zero_ne_low, and find_zero_ne_all). They are used on both functions provided by 'string-fzb.h' and 'string-fzi'. The 'string-fzb.h' provides boolean zero byte detection with the functions: - has_zero: determine if any byte within a word is zero. - has_eq: determine byte equality between two words. - has_zero_eq: determine if any byte within a word is zero along with byte equality between two words. The 'string-fzi.h' provides zero byte detection along with its positions: - index_first_zero: return index of first zero byte within a word. - index_first_eq: return index of first byte different between two words. - index_first_zero_eq: return index of first zero byte within a word or first byte different between two words. - index_first_zero_ne: return index of first zero byte within a word or first byte equal between two words. - index_last_zero: return index of last zero byte within a word. - index_last_eq: return index of last byte different between two words. Also, to avoid libcalls in the '__builtin_c{t,l}z{l}' calls (which may add performance degradation), inline implementation based on De Bruijn sequences are added (enabled by a configure check). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * config.h.in (HAVE_BUILTIN_CTZ, HAVE_BUILTIN_CLZ): New defines. * configure.ac: Check for __builtin_ctz{l} with no external dependencies * sysdeps/generic/string-extbyte.h: New file. * sysdeps/generic/string-fza.h: Likewise. * sysdeps/generic/string-fzb.h: Likewise. * sysdeps/generic/string-fzi.h: Likewise. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=447f15451d1761d9d1aad2b8440b19eb189e8811 commit 447f15451d1761d9d1aad2b8440b19eb189e8811 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Thu Feb 23 18:45:54 2017 -0300 Add string-maskoff.h generic header Macros to operate on unaligned access for string operations: - create_mask: create a mask based on pointer alignment to sets up non-zero bytes before the beginning of the word so a following operation (such as find zero) might ignore these bytes. - highbit_mask: create a mask with high bit of each byte being 1, and the low 7 bits being all the opposite of the input. These macros are meant to be used on optimized vectorized string implementations. Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/generic/string-maskoff.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c43ccd7a9f279b15f3e7241ecc35e49ea6330a81 commit c43ccd7a9f279b15f3e7241ecc35e49ea6330a81 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:15:27 2017 -0200 Parameterize OP_T_THRES from memcopy.h Basically it moves OP_T_THRES out of memcopy.h to its own header and adjust each architecture that redefines it. Checked with a build and check with run-built-tests=no for all major Linux ABIs (alpha, aarch64, arm, hppa, i686, ia64, m68k, microblaze, mips, mips64, nios2, powerpc, powerpc64le, s390x, sh4, sparc64, tilegx, and x86_64). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/generic/memcopy.h (OP_T_THRES): Move... * sysdeps/generic/string-opthr.h: ... here; new file. * sysdeps/i386/memcopy.h (OP_T_THRES): Move... * sysdeps/i386/string-opthr.h: ... here; new file. * sysdeps/m68k/memcopy.h (OP_T_THRES): Remove. * string/memcmp.c (OP_T_THRES): Remove definition. * sysdeps/powerpc/powerpc32/power4/memcopy.h (OP_T_THRES): Likewise. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1cca35adf040a1a22f48138569709ab71fd5bb32 commit 1cca35adf040a1a22f48138569709ab71fd5bb32 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:14:09 2017 -0200 Parameterize op_t from memcopy.h Basically moves op_t definition out to an specific header, adds the attribute 'may-alias', and cleanup its duplicated definitions. It lead to inclusion of tilegx32 gmp-mparam.h similar to x32 so op_t can be define as a long long (from _LONG_LONG_LIMB). Checked with a build and check with run-built-tests=no for all major Linux ABIs (alpha, aarch64, arm, hppa, i686, ia64, m68k, microblaze, mips, mips64, nios2, powerpc, powerpc64le, s390x, sh4, sparc64, tilegx, and x86_64). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/generic/string-optype.h: New file. * sysdeps/generic/memcopy.h: Include it. * string/memcmp.c (op_t): Remove define. * sysdeps/tile/memcmp.c (op_t): Likewise. * sysdeps/tile/memcopy.h (op_t): Likewise. * sysdeps/tile/tilegx32/gmp-mparam.h: New file. -----------------------------------------------------------------------
The comment is still wrong, but for another reason: In 71a5bd3e177e7748cf8993b0577d65d8986b44bc Ulrich Drepper <drepper@redhat.com> 2009-03-15 10:03:38 replaced the implementation, but not the comments. Today, the code is the hack taken from Alan Mycroft's HAKMEMC postings, but the comments describe an old implementation. It could be fixed by doing what Stas Yakovlev proposed on 2008-04-17 19:55:59 IST in https://sourceware.org/bugzilla/attachment.cgi?id=2703&action=diff.
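For anyone trying to follow the discussion, the zero-byte test usually attributed to Alan Mycroft can be sketched as below. This is an illustration, not a copy of any glibc file: the `mycroft_` names and the fixed 64-bit word are assumptions. It also shows why the direction of propagation matters: the subtraction borrows from a zero byte into the bytes above it (to its left), so a 0x01 byte directly above a zero byte is flagged too. The test is reliable as a yes/no answer, and its lowest flagged byte is always a real zero.

```c
#include <assert.h>
#include <stdint.h>

#define ONES  UINT64_C(0x0101010101010101)   /* 0x01 in every byte */
#define HIGHS UINT64_C(0x8080808080808080)   /* 0x80 in every byte */

/* Nonzero if and only if at least one byte of X is zero.  A byte keeps
   its flag only if subtracting 1 (plus any incoming borrow) set its high
   bit while the original high bit was clear.  */
static uint64_t mycroft_zero_test(uint64_t x)
{
    return (x - ONES) & ~x & HIGHS;
}

/* Small self-check covering the three cases discussed above.  */
void mycroft_self_check(void)
{
    /* No zero byte: nothing is flagged.  */
    assert(mycroft_zero_test(UINT64_C(0x4142434445464748)) == 0);

    /* One zero byte in the middle: the result is nonzero.  */
    assert(mycroft_zero_test(UINT64_C(0x4142430045464748)) != 0);

    /* Byte 0 is zero and byte 1 is 0x01: the borrow out of byte 0 makes
       byte 1 look like a zero as well, so both flags are set.  */
    assert(mycroft_zero_test(UINT64_C(0x4142434445460100))
           == UINT64_C(0x0000000000008080));
}
```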
In my view, the right approach for this issue is a general overhaul of the generic string functions as in Richard Henderson's / Adhemerval Zanella's patchset <https://sourceware.org/ml/libc-alpha/2018-01/msg00318.html> (that patchset may not address all instances of that comment, but it provides the infrastructure for doing so).
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, azanella/generic-strings has been created at af69c5ba72b80b2bc937243801349eb197ad5553 (commit) - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=af69c5ba72b80b2bc937243801349eb197ad5553 commit af69c5ba72b80b2bc937243801349eb197ad5553 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Tue Feb 21 17:14:16 2017 -0300 sh: Add string-fzb.h Use the SH cmp/str on has_{zero,eq,zero_eq}. Checked on sh4-linux-gnu. Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/sh/string-fzb.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=16f899cfa103a17c08c3b2f61a8d40f4f5e914a6 commit 16f899cfa103a17c08c3b2f61a8d40f4f5e914a6 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:26:18 2017 -0200 powerpc: Add string-fza.h While ppc has the more important string functions in assembly, there are still a few generic routines used. Use the Power 6 CMPB insn for testing of zeros. Checked on powerpc64le-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/powerpc/string-fza.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4de86538e47d0b7fe56678b22f05550900e3b87a commit 4de86538e47d0b7fe56678b22f05550900e3b87a Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:24:23 2017 -0200 arm: Add string-fza.h While arm has the more important string functions in assembly, there are still a few generic routines used. Use the UQSUB8 insn for testing of zeros. Checked on armv7-linux-gnueabihf Richard Henderson <rth@twiddle.net> * sysdeps/arm/armv6t2/string-fza.h: New file. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=38cd3552d8b6df18b14b4625ef29ae9e27f9a704 commit 38cd3552d8b6df18b14b4625ef29ae9e27f9a704 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:23:27 2017 -0200 alpha: Add string-fzb.h and string-fzi.h While alpha has the more important string functions in assembly, there are still a few for find the generic routines are used. Use the CMPBGE insn, via the builtin, for testing of zeros. Use a simplified expansion of __builtin_ctz when the insn isn't available. Checked on alpha-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/alpha/string-fzb.h: New file. * sysdeps/alpha/string-fzi.h: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=70ea2db81287ba26a33960568f325ebfbd7479ef commit 70ea2db81287ba26a33960568f325ebfbd7479ef Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:22:39 2017 -0200 hppa: Add string-fzb.h and string-fzi.h Use UXOR,SBZ to test for a zero byte within a word. While we can get semi-decent code out of asm-goto, we would do slightly better with a compiler builtin. For index_zero et al, sequential testing of bytes is less expensive than any tricks that involve a count-leading-zeros insn that we don't have. Checked on hppa-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/hppa/string-fzb.h: New file. * sysdeps/hppa/string-fzi.h: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7f84429390ba38906bc7438f0035f81d207f2ab0 commit 7f84429390ba38906bc7438f0035f81d207f2ab0 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:22:02 2017 -0200 hppa: Add memcopy.h GCC's combine pass cannot merge (x >> c | y << (32 - c)) into a double-word shift unless (1) the subtract is in the same basic block and (2) the result of the subtract is used exactly once. Neither condition is true for any use of MERGE. By forcing the use of a double-word shift, we not only reduce contention on SAR, but also allow the setting of SAR to be hoisted outside of a loop. 
Checked on hppa-linux-gnu. Richard Henderson <rth@twiddle.net> * sysdeps/hppa/memcopy.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c9181a2ed5c336e080c6751a117e2f5e533ca06a commit c9181a2ed5c336e080c6751a117e2f5e533ca06a Author: Adhemerval Zanella <adhemerval.zanella@linaro.com> Date: Wed Mar 8 16:56:17 2017 +0100 string: Improve generic strcpy The new generic implementation tries to use word operations along with the new string-fz{b,i} functions even for inputs with different alignments (it still uses aligned accesses plus a merge operation to get a correct word-by-word comparison). Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * string/strcpy.c: Rewrite using memcopy.h, string-fzb.h, string-fzi.h. * string/test-strcpy.c (test_main): Add more coverage. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=550081c1e4d6263b1b641a0ef50eb6002ada4019 commit 550081c1e4d6263b1b641a0ef50eb6002ada4019 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:21:26 2017 -0200 string: Improve generic strcmp The new generic implementation tries to use word operations along with the new string-fz{b,i} functions even for inputs with different alignments (it still uses aligned accesses plus a merge operation to get a correct word-by-word comparison). Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * string/strcmp.c: Rewrite using memcopy.h, string-fzb.h, string-fzi.h. 
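The "aligned access plus merge operation" mentioned in the strcpy and strcmp commits, and the (x >> c | y << (32 - c)) expression from the hppa memcopy.h commit, can be illustrated with a small sketch. It assumes a 32-bit word and little-endian byte order; the names word_t and merge_le are hypothetical and this is not the glibc MERGE macro itself.

```c
#include <stdint.h>

/* Hypothetical 32-bit stand-in for the word type used by memcopy.h.  */
typedef uint32_t word_t;

/* Combine two adjacent aligned words into the unaligned word that starts
   SHIFT bits into W0, assuming little-endian byte order.  SHIFT must be
   in the range 1..31: shifting a 32-bit value by 32 is undefined in C,
   so the already-aligned case (SHIFT == 0) needs its own code path.  */
static inline word_t
merge_le (word_t w0, word_t w1, unsigned int shift)
{
  return (w0 >> shift) | (w1 << (32 - shift));
}
```

For a string that starts one byte past an aligned address, merge_le (w0, w1, 8) yields the four bytes beginning at that address, so the caller only ever performs aligned loads.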
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=b9121635b413aa6584b385b7135803f868cf6673 commit b9121635b413aa6584b385b7135803f868cf6673 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:19:40 2017 -0200 string: Improve generic strchrnul The new algorithm has the following key differences: - Reads the first word unaligned and uses the string-maskoff functions to remove unwanted data. This strategy follows the assembly-optimized implementations for aarch64, powerpc and tile. - Uses the string-fz{b,i} functions. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/strchrnul.c: Use string-fzb.h, string-fzi.h. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5e8d99f2a619b38f27092f286ad06ccf690f5d0b commit 5e8d99f2a619b38f27092f286ad06ccf690f5d0b Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:18:48 2017 -0200 string: Improve generic strchr The new algorithm has the following key differences: - Reads the first word unaligned and uses the string-maskoff functions to remove unwanted data. This strategy follows the assembly-optimized implementations for aarch64 and powerpc. - Uses the string-fz{b,i} and string-extbyte functions. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> [BZ #5806] * string/strchr.c: Use string-fzb.h, string-fzi.h, string-extbyte.h. * sysdeps/s390/multiarch/strchr-c.c: Redefine weak_alias. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=df07959c3721e083e10aaa4c3c1b9a483f9955fa commit df07959c3721e083e10aaa4c3c1b9a483f9955fa Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Thu Feb 16 16:21:03 2017 -0200 string: Improve generic strnlen With an optimized memchr, the new strnlen implementation basically calls memchr and adjusts the result pointer value. It also cleans up the multiple inclusion by leaving it to the ifunc implementation to undef weak_alias and libc_hidden_def. Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> [BZ #5806] * string/strnlen.c: Rewrite in terms of memchr. * sysdeps/i386/i686/multiarch/strnlen-c.c: Redefine weak_alias and libc_hidden_def. * sysdeps/powerpc/powerpc32/power4/multiarch/strnlen-ppc32.c: Likewise. * sysdeps/s390/multiarch/strnlen-c.c: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bdd041c15306e91a9cc4d26975e73a7a7d84742b commit bdd041c15306e91a9cc4d26975e73a7a7d84742b Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:20:35 2017 -0200 string: Improve generic memrchr The new algorithm has the following key difference: - Uses the string-fz{b,i} functions. It also cleans up the multiple inclusion by leaving it to the ifunc implementation to undef weak_alias. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> [BZ #5806] * string/memrchr.c: Use string-fzb.h, string-fzi.h. * sysdeps/i386/i686/multiarch/memrchr-c.c: Redefine weak_alias. * sysdeps/s390/multiarch/memrchr-c.c: Likewise. 
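The strnlen commit above describes the new implementation as "calls memchr and adjusts the result pointer". A minimal sketch of that idea follows; the name sketch_strnlen is hypothetical (chosen to avoid clashing with the libc symbol) and the code is not copied from the branch.

```c
#include <stddef.h>
#include <string.h>

/* Return the length of S, but never look at more than MAXLEN bytes.
   memchr stops at the first NUL it finds; if it finds none within
   MAXLEN bytes, the answer is MAXLEN itself.  */
size_t
sketch_strnlen (const char *s, size_t maxlen)
{
  const char *end = memchr (s, '\0', maxlen);
  return end == NULL ? maxlen : (size_t) (end - s);
}
```

With this shape, any speed-up in memchr carries over to strnlen without a second word-at-a-time loop to maintain.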
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=01c324107db27294a46141414fdc1094686f5508 commit 01c324107db27294a46141414fdc1094686f5508 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:19:12 2017 -0200 string: Improve generic memchr The new algorithm has the following key differences: - Reads the first word unaligned and uses the string-maskoff functions to remove unwanted data. This strategy follows the assembly-optimized implementations for aarch64, powerpc and tile. - Uses the string-fz{b,i} and string-opthr functions. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/memchr.c: Use string-fzb.h, string-fzi.h, string-opthr.h. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c1dd2be8155f7277d5604f5d91fc2d4a0fc03fd4 commit c1dd2be8155f7277d5604f5d91fc2d4a0fc03fd4 Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:18:24 2017 -0200 string: Improve generic strlen The new algorithm has the following key differences: - Reads the first word unaligned and uses the string-maskoff functions to remove unwanted data. This strategy follows the assembly-optimized implementations for powerpc, sparc, and SH. - Uses the parametrized has_zero and index_first_zero functions. Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). [BZ #5806] * string/strlen.c: Use them. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4725ea128cef7de4faf023898b2b04f9176a2604 commit 4725ea128cef7de4faf023898b2b04f9176a2604 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Mon Jan 8 16:41:43 2018 -0200 Add string vectorized find and detection functions This patch adds generic string find and detection implementations meant to be used in the generic vectorized string implementations. 
The idea is to decompose the basic string operations so that each architecture can reimplement them if it provides a specialized hardware instruction. The 'string-fza.h' header provides zero byte detection functions (find_zero_low, find_zero_all, find_eq_low, find_eq_all, find_zero_eq_low, find_zero_eq_all, find_zero_ne_low, and find_zero_ne_all). They are used by the functions provided by both 'string-fzb.h' and 'string-fzi.h'. The 'string-fzb.h' header provides boolean zero byte detection with the functions: - has_zero: determine if any byte within a word is zero. - has_eq: determine byte equality between two words. - has_zero_eq: determine if any byte within a word is zero along with byte equality between two words. The 'string-fzi.h' header provides zero byte detection along with its position: - index_first_zero: return index of first zero byte within a word. - index_first_eq: return index of first byte equal between two words. - index_first_zero_eq: return index of first zero byte within a word or first byte equal between two words. - index_first_zero_ne: return index of first zero byte within a word or first byte different between two words. - index_last_zero: return index of last zero byte within a word. - index_last_eq: return index of last byte equal between two words. Also, to avoid libcalls in the '__builtin_c{t,l}z{l}' calls (which may degrade performance), inline implementations based on De Bruijn sequences are added (enabled by a configure check). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * config.h.in (HAVE_BUILTIN_CTZ, HAVE_BUILTIN_CLZ): New defines. * configure.ac: Check for __builtin_ctz{l} with no external dependencies * sysdeps/generic/string-extbyte.h: New file. * sysdeps/generic/string-fza.h: Likewise. * sysdeps/generic/string-fzb.h: Likewise. * sysdeps/generic/string-fzi.h: Likewise. 
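To make the has_zero / has_eq / index_first_zero split and the De Bruijn fallback described in this commit message more concrete, here is a rough sketch. It is an illustration under stated assumptions, not the code from the branch: it uses a 32-bit word (word_t is a hypothetical stand-in for op_t), assumes little-endian byte order so that the "first" byte is the least significant one, and uses an exact (no false positive) zero-byte test. ctz32_debruijn is a hypothetical name for a table-driven count-trailing-zeros routine.

```c
#include <stdint.h>

typedef uint32_t word_t;  /* hypothetical stand-in for op_t */

/* Return a word with 0x80 in every byte position where X has a zero
   byte and 0x00 elsewhere.  No carry crosses a byte boundary, so the
   result is exact.  */
static inline word_t
find_zero_all (word_t x)
{
  const word_t m = 0x7f7f7f7fu;
  return ~(((x & m) + m) | x | m);
}

/* Boolean tests, in the spirit of string-fzb.h.  */
static inline int
has_zero (word_t x)
{
  return find_zero_all (x) != 0;
}

static inline int
has_eq (word_t x1, word_t x2)
{
  /* Equal bytes become zero bytes after the XOR.  */
  return has_zero (x1 ^ x2);
}

/* Count trailing zeros of a non-zero V without a compiler builtin,
   using a 32-bit De Bruijn sequence and a lookup table.  */
static inline unsigned int
ctz32_debruijn (uint32_t v)
{
  static const unsigned char table[32] = {
    0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
    31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
  };
  uint32_t lowest = v & (~v + 1u);  /* isolate the lowest set bit */
  return table[(uint32_t) (lowest * 0x077CB531u) >> 27];
}

/* Index of the first zero byte, in the spirit of string-fzi.h.
   X must contain a zero byte; little-endian order is assumed.  */
static inline unsigned int
index_first_zero (word_t x)
{
  return ctz32_debruijn (find_zero_all (x)) / 8;
}
```

Because the mask marks only bytes that really are zero, the index computation needs nothing more than a count-trailing-zeros step, which is the piece an architecture would typically swap for a dedicated instruction.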
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=eed029d7eeafd01c8488b4cbadef4ec9dad10164 commit eed029d7eeafd01c8488b4cbadef4ec9dad10164 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Thu Feb 23 18:45:54 2017 -0300 Add string-maskoff.h generic header Macros to operate on unaligned access for string operations: - create_mask: create a mask based on pointer alignment that sets non-zero bytes before the beginning of the word, so that a following operation (such as find zero) can ignore these bytes. - highbit_mask: create a mask with the high bit of each byte being 1 and the low 7 bits being the opposite of the input. These macros are meant to be used in optimized vectorized string implementations. Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/generic/string-maskoff.h: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=aedf9dd6eb0995c4f805b0e1a76509ad0379a46d commit aedf9dd6eb0995c4f805b0e1a76509ad0379a46d Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:15:27 2017 -0200 Parameterize OP_T_THRES from memcopy.h Basically it moves OP_T_THRES out of memcopy.h to its own header and adjusts each architecture that redefines it. Checked with a build and check with run-built-tests=no for all major Linux ABIs (alpha, aarch64, arm, hppa, i686, ia64, m68k, microblaze, mips, mips64, nios2, powerpc, powerpc64le, s390x, sh4, sparc64, tilegx, and x86_64). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/generic/memcopy.h (OP_T_THRES): Move... * sysdeps/generic/string-opthr.h: ... here; new file. * sysdeps/i386/memcopy.h (OP_T_THRES): Move... * sysdeps/i386/string-opthr.h: ... here; new file. * sysdeps/m68k/memcopy.h (OP_T_THRES): Remove. * string/memcmp.c (OP_T_THRES): Remove definition. * sysdeps/powerpc/powerpc32/power4/memcopy.h (OP_T_THRES): Likewise. 
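The create_mask and highbit_mask descriptions in the string-maskoff.h commit above can be read as the following sketch. This is one interpretation of the commit message, not the header's actual contents: it assumes a 64-bit word and little-endian byte order, and the names word_t, create_mask_le and highbit_mask_sketch are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

typedef uint64_t word_t;  /* hypothetical stand-in for op_t */

/* Build a mask whose low (ADDR % sizeof (word_t)) bytes are 0xff.
   OR-ing it into the aligned word that contains ADDR makes every byte
   located before ADDR non-zero, so a zero-byte search skips them.
   Little-endian byte order is assumed.  */
static inline word_t
create_mask_le (uintptr_t addr)
{
  unsigned int misalign = (unsigned int) (addr % sizeof (word_t));
  return ~(~(word_t) 0 << (misalign * CHAR_BIT));
}

/* High bit of each byte set to 1; low 7 bits are the inverse of M.  */
static inline word_t
highbit_mask_sketch (word_t m)
{
  const word_t low7 = 0x7f7f7f7f7f7f7f7fULL;
  return (~m & low7) | ~low7;
}
```

The shift count is at most 56 bits here, so the aligned case (misalignment 0) simply yields an all-zero mask and needs no special handling.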
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e5364d387c6a5436015f9083a2b26d68eab0b2ec commit e5364d387c6a5436015f9083a2b26d68eab0b2ec Author: Richard Henderson <rth@twiddle.net> Date: Thu Feb 16 16:14:09 2017 -0200 Parameterize op_t from memcopy.h Basically moves the op_t definition out to a specific header, adds the attribute 'may-alias', and cleans up its duplicated definitions. It leads to the inclusion of a tilegx32 gmp-mparam.h, similar to x32, so op_t can be defined as a long long (from _LONG_LONG_LIMB). Checked with a build and check with run-built-tests=no for all major Linux ABIs (alpha, aarch64, arm, hppa, i686, ia64, m68k, microblaze, mips, mips64, nios2, powerpc, powerpc64le, s390x, sh4, sparc64, tilegx, and x86_64). Richard Henderson <rth@twiddle.net> Adhemerval Zanella <adhemerval.zanella@linaro.org> * sysdeps/generic/string-optype.h: New file. * sysdeps/generic/memcopy.h: Include it. * string/memcmp.c (op_t): Remove define. * sysdeps/tile/memcmp.c (op_t): Likewise. * sysdeps/tile/memcopy.h (op_t): Likewise. * sysdeps/tile/tilegx32/gmp-mparam.h: New file. -----------------------------------------------------------------------
(In reply to joseph@codesourcery.com from comment #11) > In my view, the right approach for this issue is a general overhaul of the > generic string functions as in Richard Henderson's / Adhemerval Zanella's > patchset <https://sourceware.org/ml/libc-alpha/2018-01/msg00318.html> > (that patchset may not address all instances of that comment, but it > provides the infrastructure for doing so). I updated my personal branch and rebased it against master (this removed the tile changes and fixed some build issues).
The original comment is no longer present in the generic implementations, only in some asm optimizations for i386 (which I think we can just remove).