This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
* Anton Blanchard:

> Hi Florian,
>
>> > I see a 16% regression on ppc64le with a simple threaded malloc test
>> > case. I guess the C11 atomics aren't as good as what we have in
>> > glibc.
>>
>> Uh-oh.  Would you please check if replacing the two
>> atomic_load_acquire with atomic_load_relaxed restore the previous
>> performance?
>
> As you suspect, doing this does restore the performance. The two lwsync
> barrier instructions must be causing the slow down.

Okay, I'll post a patch to revert that commit.

However, the old code had this in _int_malloc (where the arena lock is
acquired):

#define REMOVE_FB(fb, victim, pp)			\
  do							\
    {							\
      victim = pp;					\
      if (victim == NULL)				\
	break;						\
    }							\
  while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim)) \
	 != victim);

…

	  REMOVE_FB (fb, pp, victim);

And this in _int_free (without the arena lock):

    do
      {
	/* Check that the top of the bin is not the record we are going
	   to add (i.e., double free).  */
	if (__builtin_expect (old == p, 0))
	  malloc_printerr ("double free or corruption (fasttop)");
	p->fd = old2 = old;
      }
    while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2))
	   != old2);

I really don't see what makes sure that the store of p->fd happens
before the load of victim->fd.  It works out on POWER for some reason.

I'm attaching a test case that should exercise these two code paths.

Thanks,
Florian
Attachment:
parallel-free.c
Description: Text document