Bug 16640 - string/strtok.c: undefined behaviour inconsistent between x86 and other generic code
Summary: string/strtok.c: undefined behaviour inconsistent between x86 and other gener...
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: string (show other bugs)
Version: unspecified
: P2 minor
Target Milestone: 2.26
Assignee: Adhemerval Zanella
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-26 20:35 UTC by Kyle McMartin
Modified: 2017-02-06 12:46 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
make string/strtok match x86 (196 bytes, patch)
2014-02-26 20:35 UTC, Kyle McMartin
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kyle McMartin 2014-02-26 20:35:29 UTC
Created attachment 7444 [details]
make string/strtok match x86

The strtok.S implementations for x86_64 and i386 vary from the generic string/strtok.c version. In the former case, if str == NULL, and the saved string is also NULL, the strtok call returns NULL.

In contrast, the string/strtok.c call proceeds to pass s = olds = NULL to strspn which consequently crashes.

While this behaviour is probably permissible, it results in odd portability issues where the behaviour can't be reproduced on x86_64. As well, the generic versions in the BSD libc I looked at (which appears to date back to 4.3BSD or earlier...) also checks for the (s = olds) == NULL condition and handles it, so we have a bit of precedent here.

Attached is a patch which brings the generic string/strtok.c in-line with i386, x86_64 and BSD. It seems better to do that, rather than suddenly make working code SIGSEGV on x86_64...

regards, Kyle
Comment 1 Carlos O'Donell 2014-02-27 06:54:43 UTC
(In reply to Kyle McMartin from comment #0)
> Created attachment 7444 [details]
> make string/strtok match x86
> 
> The strtok.S implementations for x86_64 and i386 vary from the generic
> string/strtok.c version. In the former case, if str == NULL, and the saved
> string is also NULL, the strtok call returns NULL.
> 
> In contrast, the string/strtok.c call proceeds to pass s = olds = NULL to
> strspn which consequently crashes.
> 
> While this behaviour is probably permissible, it results in odd portability
> issues where the behaviour can't be reproduced on x86_64. As well, the
> generic versions in the BSD libc I looked at (which appears to date back to
> 4.3BSD or earlier...) also checks for the (s = olds) == NULL condition and
> handles it, so we have a bit of precedent here.
> 
> Attached is a patch which brings the generic string/strtok.c in-line with
> i386, x86_64 and BSD. It seems better to do that, rather than suddenly make
> working code SIGSEGV on x86_64...

The x86_64 and i386 implementations are wrong, they should also fault.

This issue hs come up so many times we've enshrined it into our coding style and guidlines:
https://sourceware.org/glibc/wiki/Style_and_Conventions#Invalid_pointers
~~~
9. Invalid pointers

The GNU C library considers it a QoI feature not to mask user bugs by detecting invalid pointers and returning EINVAL (unless the API is standardized and says it does that). If passing a bad pointer has undefined behavior, it is far more useful in the long run if it crashes quickly rather than diagnosing an error that is probably ignored by the flaky caller.

9.1. NULL pointers

If you're going to check for NULL pointer arguments where you have not entered into a contract to accept and interpret them, do so with an assert, not a conditional error return. This way the bugs in the caller will be immediately detected and can be fixed, and it makes it easy to disable the overhead in production builds. The assert can be valuable as code documentation. However, a segfault from dereferencing the NULL pointer is just as effective for debugging. If you return an error code to a caller which has already proven itself buggy, the most likely result is that the caller will ignore the error, and bad things will happen much later down the line when the original cause of the error has become difficult or impossible to track down. Why is it reasonable to assume the caller will ignore the error you return? Because the caller already ignored the error return of malloc or fopen or some other library-specific allocation function which returned NULL to indicate an error.

In summary:

    If you have no contract to accept NULL and you don't immediately dereference the pointer then use an assert to raise an error when NULL is passed as an invalid argument.
    If you have no contract to accept NULL and immediately dereference the pointer then the segfault is sufficient to indicate the error.
    If you have a contract to accept NULL then do so. 
~~~

In this case strtok has no contract to accept a NULL for s2 and therefore segfaulting is the best option.

I'd accept a patch to have the x86 and x86-64 assembly crash since a NULL s2 has no valid meaning.
Comment 2 Ondrej Bilka 2014-02-27 12:14:49 UTC
On Thu, Feb 27, 2014 at 06:54:43AM +0000, carlos at redhat dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=16640
> 
> Carlos O'Donell <carlos at redhat dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|NEW                         |WAITING
>                  CC|                            |carlos at redhat dot com
> 
> --- Comment #1 from Carlos O'Donell <carlos at redhat dot com> ---
> (In reply to Kyle McMartin from comment #0)
> > Created attachment 7444 [details]
> > make string/strtok match x86
> > 
> > The strtok.S implementations for x86_64 and i386 vary from the generic
> > string/strtok.c version. In the former case, if str == NULL, and the saved
> > string is also NULL, the strtok call returns NULL.
> > 
> > In contrast, the string/strtok.c call proceeds to pass s = olds = NULL to
> > strspn which consequently crashes.
> > 
> > While this behaviour is probably permissible, it results in odd portability
> > issues where the behaviour can't be reproduced on x86_64. As well, the
> > generic versions in the BSD libc I looked at (which appears to date back to
> > 4.3BSD or earlier...) also checks for the (s = olds) == NULL condition and
> > handles it, so we have a bit of precedent here.
> > 
> > Attached is a patch which brings the generic string/strtok.c in-line with
> > i386, x86_64 and BSD. It seems better to do that, rather than suddenly make
> > working code SIGSEGV on x86_64...
> 
> The x86_64 and i386 implementations are wrong, they should also fault.
> 
I looked to implementations and they are outdated, a generic
implementation with sse4_2 strpbrk should be faster here.

I will send patch to remove these once I check performance impact.
Comment 3 Carlos O'Donell 2014-02-27 14:59:56 UTC
(In reply to Ondrej Bilka from comment #2)
> > The x86_64 and i386 implementations are wrong, they should also fault.
> > 
> I looked to implementations and they are outdated, a generic
> implementation with sse4_2 strpbrk should be faster here.
> 
> I will send patch to remove these once I check performance impact.

Thanks Ondrej!
Comment 4 Kyle McMartin 2014-02-27 17:19:17 UTC
Meh. Your manpage says "conforming to ... 4.3BSD" so I'd expect behaviour consistent with that, which the assembly versions provide. But I don't really care one way or another as long as it's consistent.
Comment 5 Carlos O'Donell 2014-02-27 18:44:40 UTC
(In reply to Kyle McMartin from comment #4)
> Meh. Your manpage says "conforming to ... 4.3BSD" so I'd expect behaviour
> consistent with that, which the assembly versions provide. But I don't
> really care one way or another as long as it's consistent.

I'd consider this also a bug in BSD.
Comment 6 Kyle McMartin 2014-02-27 18:48:30 UTC
ok, i'm done arguing, but i just went to check system III, and that also checks this case... so basically glibc is the only place being difficult.

feel free to close this, or re-use it for the x86_64/i386 asm removal or whatever you think is best. ;-)
Comment 7 Adhemerval Zanella 2017-02-06 12:45:56 UTC
Fixed on 2.26 (f2d7f23a300f57).
Comment 8 cvs-commit@gcc.gnu.org 2017-02-06 12:46:14 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  f2d7f23a300f57e36cd849ce80a93ccbcebd9968 (commit)
      from  841a67a0ade3ea9e3d10c044524a199fa608cc7e (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f2d7f23a300f57e36cd849ce80a93ccbcebd9968

commit f2d7f23a300f57e36cd849ce80a93ccbcebd9968
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu Jan 5 19:43:25 2017 -0200

    Remove i686, x86_64, and powerpc strtok implementations
    
    Based on comments on previous attempt to address BZ#16640 [1],
    the idea is not support invalid use of strtok (the original
    bug report proposal).  This leader to a new strtok optimized
    strtok implementation [2].
    
    The idea of this patch is to fix BZ#16640 to align all the
    implementations to a same contract.  However, with newer strtok
    code it is better to get remove the old assembly ones instead of
    fix them.
    
    For x86 is a gain in all cases since the new implementation can
    potentially use sse2/sse42 implementation for strspn and strcspn.
    This shows a better performance on both i686 and x86_64 using
    the string benchtests.
    
    On powerpc64 the gains are mixed, where only for larger inputs
    or keys some gains are showns (based on benchtest it seems that
    it shows some gains for keys larger than 10 and inputs larger
    than 32).  I would prefer to remove the optimized implementation
    based on first code simplicity and second because some more gain
    could be optimized using a better optimized strcspn/strspn
    code (as for x86).  However if powerpc arch maintainers prefer I
    can send a v2 with the assembly code adjusted instead.
    
    Checked on x86_64-linux-gnu, i686-linux-gnu, and powerpc64le-linux-gnu.
    
    	[BZ #16640]
    	* sysdeps/i386/i686/strtok.S: Remove file.
    	* sysdeps/i386/i686/strtok_r.S: Likewise.
    	* sysdeps/i386/strtok.S: Likewise.
    	* sysdeps/i386/strtok_r.S: Likewise.
    	* sysdeps/powerpc/powerpc64/strtok.S: Likewise.
    	* sysdeps/powerpc/powerpc64/strtok_r.S: Likewise.
    	* sysdeps/x86_64/strtok.S: Likewise.
    	* sysdeps/x86_64/strtok_r.S: Likewise.
    
    [1] https://sourceware.org/ml/libc-alpha/2016-10/msg00411.html
    [2] https://sourceware.org/ml/libc-alpha/2016-12/msg00461.html

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                            |   10 +
 sysdeps/i386/i686/strtok.S           |  244 -----------------------
 sysdeps/i386/i686/strtok_r.S         |    5 -
 sysdeps/i386/strtok.S                |  358 ----------------------------------
 sysdeps/i386/strtok_r.S              |    5 -
 sysdeps/powerpc/powerpc64/strtok.S   |  226 ---------------------
 sysdeps/powerpc/powerpc64/strtok_r.S |   24 ---
 sysdeps/x86_64/strtok.S              |  208 --------------------
 sysdeps/x86_64/strtok_r.S            |    5 -
 9 files changed, 10 insertions(+), 1075 deletions(-)
 delete mode 100644 sysdeps/i386/i686/strtok.S
 delete mode 100644 sysdeps/i386/i686/strtok_r.S
 delete mode 100644 sysdeps/i386/strtok.S
 delete mode 100644 sysdeps/i386/strtok_r.S
 delete mode 100644 sysdeps/powerpc/powerpc64/strtok.S
 delete mode 100644 sysdeps/powerpc/powerpc64/strtok_r.S
 delete mode 100644 sysdeps/x86_64/strtok.S
 delete mode 100644 sysdeps/x86_64/strtok_r.S