Created attachment 7444 [details] make string/strtok match x86 The strtok.S implementations for x86_64 and i386 vary from the generic string/strtok.c version. In the former case, if str == NULL, and the saved string is also NULL, the strtok call returns NULL. In contrast, the string/strtok.c call proceeds to pass s = olds = NULL to strspn which consequently crashes. While this behaviour is probably permissible, it results in odd portability issues where the behaviour can't be reproduced on x86_64. As well, the generic versions in the BSD libc I looked at (which appears to date back to 4.3BSD or earlier...) also checks for the (s = olds) == NULL condition and handles it, so we have a bit of precedent here. Attached is a patch which brings the generic string/strtok.c in-line with i386, x86_64 and BSD. It seems better to do that, rather than suddenly make working code SIGSEGV on x86_64... regards, Kyle
(In reply to Kyle McMartin from comment #0) > Created attachment 7444 [details] > make string/strtok match x86 > > The strtok.S implementations for x86_64 and i386 vary from the generic > string/strtok.c version. In the former case, if str == NULL, and the saved > string is also NULL, the strtok call returns NULL. > > In contrast, the string/strtok.c call proceeds to pass s = olds = NULL to > strspn which consequently crashes. > > While this behaviour is probably permissible, it results in odd portability > issues where the behaviour can't be reproduced on x86_64. As well, the > generic versions in the BSD libc I looked at (which appears to date back to > 4.3BSD or earlier...) also checks for the (s = olds) == NULL condition and > handles it, so we have a bit of precedent here. > > Attached is a patch which brings the generic string/strtok.c in-line with > i386, x86_64 and BSD. It seems better to do that, rather than suddenly make > working code SIGSEGV on x86_64... The x86_64 and i386 implementations are wrong, they should also fault. This issue hs come up so many times we've enshrined it into our coding style and guidlines: https://sourceware.org/glibc/wiki/Style_and_Conventions#Invalid_pointers ~~~ 9. Invalid pointers The GNU C library considers it a QoI feature not to mask user bugs by detecting invalid pointers and returning EINVAL (unless the API is standardized and says it does that). If passing a bad pointer has undefined behavior, it is far more useful in the long run if it crashes quickly rather than diagnosing an error that is probably ignored by the flaky caller. 9.1. NULL pointers If you're going to check for NULL pointer arguments where you have not entered into a contract to accept and interpret them, do so with an assert, not a conditional error return. This way the bugs in the caller will be immediately detected and can be fixed, and it makes it easy to disable the overhead in production builds. The assert can be valuable as code documentation. However, a segfault from dereferencing the NULL pointer is just as effective for debugging. If you return an error code to a caller which has already proven itself buggy, the most likely result is that the caller will ignore the error, and bad things will happen much later down the line when the original cause of the error has become difficult or impossible to track down. Why is it reasonable to assume the caller will ignore the error you return? Because the caller already ignored the error return of malloc or fopen or some other library-specific allocation function which returned NULL to indicate an error. In summary: If you have no contract to accept NULL and you don't immediately dereference the pointer then use an assert to raise an error when NULL is passed as an invalid argument. If you have no contract to accept NULL and immediately dereference the pointer then the segfault is sufficient to indicate the error. If you have a contract to accept NULL then do so. ~~~ In this case strtok has no contract to accept a NULL for s2 and therefore segfaulting is the best option. I'd accept a patch to have the x86 and x86-64 assembly crash since a NULL s2 has no valid meaning.
On Thu, Feb 27, 2014 at 06:54:43AM +0000, carlos at redhat dot com wrote: > https://sourceware.org/bugzilla/show_bug.cgi?id=16640 > > Carlos O'Donell <carlos at redhat dot com> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > Status|NEW |WAITING > CC| |carlos at redhat dot com > > --- Comment #1 from Carlos O'Donell <carlos at redhat dot com> --- > (In reply to Kyle McMartin from comment #0) > > Created attachment 7444 [details] > > make string/strtok match x86 > > > > The strtok.S implementations for x86_64 and i386 vary from the generic > > string/strtok.c version. In the former case, if str == NULL, and the saved > > string is also NULL, the strtok call returns NULL. > > > > In contrast, the string/strtok.c call proceeds to pass s = olds = NULL to > > strspn which consequently crashes. > > > > While this behaviour is probably permissible, it results in odd portability > > issues where the behaviour can't be reproduced on x86_64. As well, the > > generic versions in the BSD libc I looked at (which appears to date back to > > 4.3BSD or earlier...) also checks for the (s = olds) == NULL condition and > > handles it, so we have a bit of precedent here. > > > > Attached is a patch which brings the generic string/strtok.c in-line with > > i386, x86_64 and BSD. It seems better to do that, rather than suddenly make > > working code SIGSEGV on x86_64... > > The x86_64 and i386 implementations are wrong, they should also fault. > I looked to implementations and they are outdated, a generic implementation with sse4_2 strpbrk should be faster here. I will send patch to remove these once I check performance impact.
(In reply to Ondrej Bilka from comment #2) > > The x86_64 and i386 implementations are wrong, they should also fault. > > > I looked to implementations and they are outdated, a generic > implementation with sse4_2 strpbrk should be faster here. > > I will send patch to remove these once I check performance impact. Thanks Ondrej!
Meh. Your manpage says "conforming to ... 4.3BSD" so I'd expect behaviour consistent with that, which the assembly versions provide. But I don't really care one way or another as long as it's consistent.
(In reply to Kyle McMartin from comment #4) > Meh. Your manpage says "conforming to ... 4.3BSD" so I'd expect behaviour > consistent with that, which the assembly versions provide. But I don't > really care one way or another as long as it's consistent. I'd consider this also a bug in BSD.
ok, i'm done arguing, but i just went to check system III, and that also checks this case... so basically glibc is the only place being difficult. feel free to close this, or re-use it for the x86_64/i386 asm removal or whatever you think is best. ;-)
Fixed on 2.26 (f2d7f23a300f57).
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, master has been updated via f2d7f23a300f57e36cd849ce80a93ccbcebd9968 (commit) from 841a67a0ade3ea9e3d10c044524a199fa608cc7e (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f2d7f23a300f57e36cd849ce80a93ccbcebd9968 commit f2d7f23a300f57e36cd849ce80a93ccbcebd9968 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Thu Jan 5 19:43:25 2017 -0200 Remove i686, x86_64, and powerpc strtok implementations Based on comments on previous attempt to address BZ#16640 [1], the idea is not support invalid use of strtok (the original bug report proposal). This leader to a new strtok optimized strtok implementation [2]. The idea of this patch is to fix BZ#16640 to align all the implementations to a same contract. However, with newer strtok code it is better to get remove the old assembly ones instead of fix them. For x86 is a gain in all cases since the new implementation can potentially use sse2/sse42 implementation for strspn and strcspn. This shows a better performance on both i686 and x86_64 using the string benchtests. On powerpc64 the gains are mixed, where only for larger inputs or keys some gains are showns (based on benchtest it seems that it shows some gains for keys larger than 10 and inputs larger than 32). I would prefer to remove the optimized implementation based on first code simplicity and second because some more gain could be optimized using a better optimized strcspn/strspn code (as for x86). However if powerpc arch maintainers prefer I can send a v2 with the assembly code adjusted instead. Checked on x86_64-linux-gnu, i686-linux-gnu, and powerpc64le-linux-gnu. [BZ #16640] * sysdeps/i386/i686/strtok.S: Remove file. * sysdeps/i386/i686/strtok_r.S: Likewise. * sysdeps/i386/strtok.S: Likewise. * sysdeps/i386/strtok_r.S: Likewise. * sysdeps/powerpc/powerpc64/strtok.S: Likewise. * sysdeps/powerpc/powerpc64/strtok_r.S: Likewise. * sysdeps/x86_64/strtok.S: Likewise. * sysdeps/x86_64/strtok_r.S: Likewise. [1] https://sourceware.org/ml/libc-alpha/2016-10/msg00411.html [2] https://sourceware.org/ml/libc-alpha/2016-12/msg00461.html ----------------------------------------------------------------------- Summary of changes: ChangeLog | 10 + sysdeps/i386/i686/strtok.S | 244 ----------------------- sysdeps/i386/i686/strtok_r.S | 5 - sysdeps/i386/strtok.S | 358 ---------------------------------- sysdeps/i386/strtok_r.S | 5 - sysdeps/powerpc/powerpc64/strtok.S | 226 --------------------- sysdeps/powerpc/powerpc64/strtok_r.S | 24 --- sysdeps/x86_64/strtok.S | 208 -------------------- sysdeps/x86_64/strtok_r.S | 5 - 9 files changed, 10 insertions(+), 1075 deletions(-) delete mode 100644 sysdeps/i386/i686/strtok.S delete mode 100644 sysdeps/i386/i686/strtok_r.S delete mode 100644 sysdeps/i386/strtok.S delete mode 100644 sysdeps/i386/strtok_r.S delete mode 100644 sysdeps/powerpc/powerpc64/strtok.S delete mode 100644 sysdeps/powerpc/powerpc64/strtok_r.S delete mode 100644 sysdeps/x86_64/strtok.S delete mode 100644 sysdeps/x86_64/strtok_r.S