This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: memcmp-sse4.S EqualHappy bug

From: Torvald Riegel <triegel at redhat dot com>
To: "Dr. David Alan Gilbert" <dgilbert at redhat dot com>
Cc: Simo Sorce <ssorce at redhat dot com>, Andrea Arcangeli <aarcange at redhat dot com>, "Carlos O'Donell" <carlos at redhat dot com>, Szabolcs Nagy <nsz at port70 dot net>, libc-alpha at sourceware dot org, "H.J. Lu" <hongjiu dot lu at intel dot com>
Date: Sat, 20 Jun 2015 15:00:00 +0200
Subject: Re: memcmp-sse4.S EqualHappy bug
Authentication-results: sourceware.org; auth=none
References: <20150618161943 dot GN14955 at redhat dot com> <20150618172231 dot GS14955 at redhat dot com> <1434649785 dot 30819 dot 37 dot camel at localhost dot localdomain> <20150618181219 dot GL2248 at work-vm> <55841B6B dot 10104 at redhat dot com> <20150619140710 dot GA14955 at redhat dot com> <1434724946 dot 2716 dot 88 dot camel at willson dot usersys dot redhat dot com> <1434727945 dot 3061 dot 51 dot camel at localhost dot localdomain> <20150619155907 dot GF2147 at work-vm> <1434733019 dot 3061 dot 68 dot camel at localhost dot localdomain> <20150619180635 dot GL2147 at work-vm>

On Fri, 2015-06-19 at 19:06 +0100, Dr. David Alan Gilbert wrote:
> * Torvald Riegel (triegel@redhat.com) wrote:
> > On Fri, 2015-06-19 at 16:59 +0100, Dr. David Alan Gilbert wrote:
> > > * Torvald Riegel (triegel@redhat.com) wrote:
> > > > On Fri, 2015-06-19 at 10:42 -0400, Simo Sorce wrote:
> > > > > On Fri, 2015-06-19 at 16:07 +0200, Andrea Arcangeli wrote:
> > > > > > On Fri, Jun 19, 2015 at 09:38:51AM -0400, Carlos O'Donell wrote:
> > > > > > > I agree with aborting, but only as long as the hot path's performance
> > > > > > > is not impacted and I haven't thought about how to do that.
> > > > > > > 
> > > 
> > > > > Clearly people are better using atomic comparisons on canary values
> > > > > instead, but it seem easy to avoid false positives (returning 0 when
> > > > > memory is clearly different) and keep these things working, so why not
> > > > > do it ?
> > > > 
> > > > I see two separate issues here.  First, where do we draw the line, and
> > > > what do we guarantee.  I strongly believe that programs must not have
> > > > data races, and that they should use atomics or other synchronization
> > > > properly (this doesn't mean locks, but relaxed memory order atomics, for
> > > > example).
> > > > Second, do we try to keep buggy programs working.  If it has no cost to
> > > > do so (e.g., like it might be true in this case), then doing that can
> > > > help to trigger less errors.  But that doesn't mean the buggy programs
> > > > should get fixed eventually.
> > > 
> > > I find it difficult to understand the boundaries of what the C library
> > > is allowed to do in this type of optimisation.
> > > 
> > > For example, consider the following:
> > > 
> > >     char a[128];
> > >     char b[128];
> > > 
> > >     put some static data in a[0-63]
> > >     put some static data in b[0-63]
> > >     a[64]=0;
> > >     b[64]=0;
> > >     start a thread doing stuff in a[65..]
> > >     start a thread doing stuff in b[65..]
> > >     
> > >     if (!strcmp(a,b)) {
> > >       /* Do something */
> > >     }
> > > 
> > >    a) Is that behaviour defined?
> > 
> > Good question.  I think it should be.  This depends on both the data
> > race definition and what strcmp/strncmp/memcmp are specified to do using
> > the abstract machine.
> > 
> > The data race definition uses memory locations as granularity, which is
> > in 3.14 in C11.  Separate characters in an array should be separate
> > memory locations.
> > 
> > C11 isn't very specific regarding strcmp, and just says that it
> > "compares the string[s]" (7.24.4.2).  C++14 is a bit more specific
> > regarding basic_string::compare (21.4.7.9), saying that first the length
> > of the strings are determined, and then a strncmp is run using the
> > smaller of the two lengths.
> > 
> > Assuming the C++ specs, only the array indices [0..64] should be
> > accessed by the abstract machine.  So no data race with the stuff going
> > on in [65..).
> > 
> > >    b) Is it defined with strncmp(a,b,64) ?
> > 
> > Yes.
> > 
> > >    c) Is it defined with strncmp(a,b,128)?
> > 
> > Not sure.  C11 says that not more than "n characters" are compared, and
> > characters that follow the a null character aren't compared either.
> > This indicates it wouldn't be different from strncmp(a,b,64) in the
> > particular case.
> > Regarding C++11, I'm not sure.  The closest copies a substring
> > (conceptually) and then compares, but the substring copying has to
> > determine length of the string and then subtracting the max length.
> > This would do a strlen first, which wouldn't access past index 64.
> > Thus, should be fine too.
> > 
> > >    d) Is it defined with memcmp(a,b,64)?
> > 
> > No data race, IMO.
> 
> OK, so you mean that's defined behavior?

Yes, there's no undefined behavior, so memcmp should do what it's
specified to do in the normal case.

> > >    e) Is it defined with memcmp(a,b,128)?
> > 
> > Data race.  Undefined behavior.
> 
> OK, so it's interesting your answer to (e) is different from (c);
> aren't there compiler optimisations that replace some strcmp s with
> memcmp s? (I'm not sure whether it would in this case?)

I believe it only does that when the length of the strings is known.

> > >    f) If I moved that boundary off a nice round % 8 mark would
> > >       it matter?
> > 
> > No difference as far as the standard is concerned.
> 
> OK, and I've not looked at this x86 code in detail, but are we sure
> that where the memcmp length finishes in the middle of a word
> the bytes after the end can't cause a match/mismatch that would
> later change and be classed as undefined?

I haven't looked at the code either, but this would be a bug.  It could
also introduce data races, which the implementation must not do :)

References:
- Re: memcmp-sse4.S EqualHappy bug
  - From: Andrea Arcangeli
- Re: memcmp-sse4.S EqualHappy bug
  - From: Andrea Arcangeli
- Re: memcmp-sse4.S EqualHappy bug
  - From: Torvald Riegel
- Re: memcmp-sse4.S EqualHappy bug
  - From: Dr. David Alan Gilbert
- Re: memcmp-sse4.S EqualHappy bug
  - From: Carlos O'Donell
- Re: memcmp-sse4.S EqualHappy bug
  - From: Andrea Arcangeli
- Re: memcmp-sse4.S EqualHappy bug
  - From: Simo Sorce
- Re: memcmp-sse4.S EqualHappy bug
  - From: Torvald Riegel
- Re: memcmp-sse4.S EqualHappy bug
  - From: Dr. David Alan Gilbert
- Re: memcmp-sse4.S EqualHappy bug
  - From: Torvald Riegel
- Re: memcmp-sse4.S EqualHappy bug
  - From: Dr. David Alan Gilbert

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]