This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] malloc: Use current (C11-style) atomics for fastbin access


* Anton Blanchard:

> Hi Florian,
>
>> > I see a 16% regression on ppc64le with a simple threaded malloc test
>> > case. I guess the C11 atomics aren't as good as what we have in
>> > glibc.  
>> 
>> Uh-oh.  Would you please check if replacing the two
>> atomic_load_acquire with atomic_load_relaxed restore the previous
>> performance?
>
> As you suspect, doing this does restore the performance. The two lwsync
> barrier instructions must be causing the slow down.

Okay, I'll post a patch to revert that commit.

However, the old code had this in _int_malloc (where the arena lock is
acquired):

#define REMOVE_FB(fb, victim, pp)                       \
  do                                                    \
    {                                                   \
      victim = pp;                                      \
      if (victim == NULL)                               \
        break;                                          \
    }                                                   \
  while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim)) \
         != victim);                                    \
…
            REMOVE_FB (fb, pp, victim);

And this in _int_free (without the arena lock):

      do
        {
          /* Check that the top of the bin is not the record we are going to
             add (i.e., double free).  */
          if (__builtin_expect (old == p, 0))
            malloc_printerr ("double free or corruption (fasttop)");
          p->fd = old2 = old;
        }
      while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2))
             != old2);

I really don't see what makes sure that the store of p->fd happens
before the load of victim->fd.

It works out on POWER for some reason.  I'm attaching a test case that
should exercise these two code paths.

Thanks,
Florian

Attachment: parallel-free.c
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]