This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
* Anton Blanchard:

> Hi Florian,
>
>> > I see a 16% regression on ppc64le with a simple threaded malloc test
>> > case. I guess the C11 atomics aren't as good as what we have in
>> > glibc.
>>
>> Uh-oh.  Would you please check if replacing the two
>> atomic_load_acquire with atomic_load_relaxed restore the previous
>> performance?
>
> As you suspect, doing this does restore the performance. The two lwsync
> barrier instructions must be causing the slow down.

Okay, I'll post a patch to revert that commit.

However, the old code had this in _int_malloc (where the arena lock is
acquired):

#define REMOVE_FB(fb, victim, pp)			\
  do							\
    {							\
      victim = pp;					\
      if (victim == NULL)				\
	break;						\
    }							\
  while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim)) \
	 != victim);

…

	  REMOVE_FB (fb, pp, victim);

And this in _int_free (without the arena lock):

    do
      {
	/* Check that the top of the bin is not the record we are going
	   to add (i.e., double free).  */
	if (__builtin_expect (old == p, 0))
	  malloc_printerr ("double free or corruption (fasttop)");
	p->fd = old2 = old;
      }
    while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2))
	   != old2);

I really don't see what makes sure that the store of p->fd happens
before the load of victim->fd.  It works out on POWER for some reason.

I'm attaching a test case that should exercise these two code paths.

Thanks,
Florian
Attachment:
parallel-free.c
Description: Text document