This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: memcmp-sse4.S EqualHappy bug

From: Andrea Arcangeli <aarcange at redhat dot com>
To: Torvald Riegel <triegel at redhat dot com>
Cc: Szabolcs Nagy <nsz at port70 dot net>, libc-alpha at sourceware dot org, "H.J. Lu" <hongjiu dot lu at intel dot com>
Date: Thu, 18 Jun 2015 16:52:02 +0200
Subject: Re: memcmp-sse4.S EqualHappy bug
Authentication-results: sourceware.org; auth=none
References: <20150617172903 dot GC4317 at redhat dot com> <20150617185952 dot GE22285 at port70 dot net> <20150617210612 dot GB14955 at redhat dot com> <1434621040 dot 5250 dot 212 dot camel at localhost dot localdomain> <20150618124900 dot GD14955 at redhat dot com> <1434637415 dot 5250 dot 271 dot camel at localhost dot localdomain>

On Thu, Jun 18, 2015 at 04:23:35PM +0200, Torvald Riegel wrote:
> There are a few differences to your memcmp test case though.  If
> barrier() is a compiler barrier, this should "remove" prior knowledge
> the compiler has about memory.  Second, you access volatile-qualified
> data, which at least requires to do the accesses byte-wise or similar.
> Does the memcmp test case also fail when you acess volatiles?

Actually the kernel code in the __builtin_memcpy case removes the
volatile qualifier.

On a side note it's not normal to use READ_ONCE/WRITE_ONCE on
objects larger than sizeof(long) and it would warn at build time but
it would build successfully.

Changing my testcase like this just creates warning about volatile
warning: passing argument 1 of âmemsetâ discards âvolatileâ qualifier
from pointer target type, etc... it still builds and fail at runtime
like before. I added another barrier as well but it can't make a
difference, the problem is very clear and how to prevent this
practical problem is clear too. As long as memcmp-sse4.S is called
there's nothing the compiler can do to prevent this problem from
materializing.

diff --git a/equalhappybug.c b/equalhappybug.c
index 6c5fac5..ed51fd7 100644
--- a/equalhappybug.c
+++ b/equalhappybug.c
@@ -18,7 +18,7 @@

 static long nr_cpus, page_size;
 #define NON_ZERO_OFFSET 16U
-static char *page, *zeropage;
+static volatile char *page, *zeropage;
 volatile int finished;

 static int my_bcmp(char *str1, char *str2, unsigned long n)
@@ -47,6 +47,7 @@ static void *locking_thread(void *arg)
 #else
 		unsigned long loops = 0;
 		//while (!bcmp(page, zeropage, page_size))
+		asm volatile("" : : : "memory");
 		while (!memcmp(page, zeropage, page_size)) {
 			loops += 1;
 			asm volatile("" : : : "memory");

> Note that I'm not saying that this is guaranteed to work.  It's still UB
> if there is a data race.  However, in practice, this might just continue
> to work.

READ_ONCE/WRITE_ONCE are only used if there is going to be a data
race, otherwise they're never used.

> Userland programs should really try to transition to atomics and the
> C11/C++11 memory model or just the __atomic builtins, if possible.

We should look into that.

Without a way to access non-atomic regions with C, RCU in the kernel
cannot work. But if memcpy in the kernel has to behave sane (and not
break out of the loop too happily) in presence of not-atomic changes
to the memory, then I don't see why memcmp in userland is different.

Also note there is also an userspace RCU, I don't know exactly how
that works because I never used it, but the whole concept of RCU is to
allow C language to access not-atomic regions of memory and to get
away with it. And if that way is memcpy, I don't see what's
fundamentally different about memcpy being required to cope with that,
but memcmp not.

I totally see why memcmp-sse4.S is safe by the standard, just the
standard is not the only way to gain performance, RCU is required to
scale 100% in SMP, pthread_mutex_rwlock wouldn't scale, so it's
unthinkable to get rid of READ_ONCE/WRITE_ONCE. Switching to __atomic
would be fine, and if __atomic can make memcpy (and IMHO memcmp)
behave in a way they won't happily return too soon on __atomic
builtins, it'd be great.

Follow-Ups:
- Re: memcmp-sse4.S EqualHappy bug
  - From: Torvald Riegel

References:
- memcmp-sse4.S EqualHappy bug
  - From: Andrea Arcangeli
- Re: memcmp-sse4.S EqualHappy bug
  - From: Szabolcs Nagy
- Re: memcmp-sse4.S EqualHappy bug
  - From: Andrea Arcangeli
- Re: memcmp-sse4.S EqualHappy bug
  - From: Torvald Riegel
- Re: memcmp-sse4.S EqualHappy bug
  - From: Andrea Arcangeli
- Re: memcmp-sse4.S EqualHappy bug
  - From: Torvald Riegel

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]