This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC/PoC] malloc: use wfcqueue to speed up remote frees
- From: Eric Wong <normalperson at yhbt dot net>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Wed, 1 Aug 2018 09:26:26 +0000
- Subject: Re: [RFC/PoC] malloc: use wfcqueue to speed up remote frees
- References: <20180731084936.g4yw6wnvt677miti@dcvr> <0cfdccea-d173-486c-85f4-27e285a30a1a@redhat.com> <20180731231819.57xsqvdfdyfxrzy5@whir> <c061de55-cc2a-88fe-564b-2ea9c4a7e632@redhat.com> <20180801062352.rlrjqmsszntkzlfe@untitled> <aa9b36d4-02a1-e3cc-30e8-eca9f0d8b6eb@redhat.com>
Carlos O'Donell <carlos@redhat.com> wrote:
> On 08/01/2018 02:23 AM, Eric Wong wrote:
> > Carlos O'Donell <carlos@redhat.com> wrote:
> >> On 07/31/2018 07:18 PM, Eric Wong wrote:
> >>> Also, if I spawn a bunch of threads and get a bunch of
> >>> arenas early in the program lifetime; and then only have few
> >>> threads later, there can be a lot of idle arenas.
> >>
> >> Yes. That is true. We don't coalesce arenas to match the thread
> >> demand.
> >
> > Eep :< If contention can be avoided (which tcache seems to
> > work well for), limiting arenas to CPU count seems desirable and
> > worth trying.
>
> Agreed.
>
> In general it is not as bad as you think.
>
> An arena is made up of a chain of heaps, each an mmap'd block, and
> if we can manage to free an entire heap then we unmap the heap,
> and if we're lucky we can manage to free down the entire arena
> (_int_free -> large chunk / consolidate -> heap_trim -> shrink_heap).
>
> So we might just end up with a large number of arenas that don't
> have very much allocated at all, but are all on the arena free list
> waiting for a thread to attach to them to reduce overall contention.
>
> I agree that it would be *better* if we had one arena per CPU and
> each thread could easily determine the CPU it was on (via a
> restartable sequence) and then allocate CPU-local memory to work
> with (the best you can do; ignoring NUMA effects).
Thanks for the info on arenas. One problem for Ruby is we get
many threads[1], and they create allocations of varying
lifetimes. All this happens even though malloc contention is
rarely a problem in Ruby, because of the global VM lock (GVL).
Even without restartable sequences, I was wondering if lfstack
(also in urcu) could even be used for sharing/distributing
arenas between threads. This would require tcache to avoid
retries on lfstack pop/push.
Much less straightforward than using wfcqueue for frees with
this patch, though :)
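For illustration, the idea of handing idle arenas between threads through a lock-free stack could be sketched roughly as below. This is not urcu's lfstack API and not glibc code: it is a plain Treiber stack written with C11 atomics, and `arena_node`, `arena_stack` and the function names are hypothetical. The single-element pop is ABA-prone when several threads pop concurrently, so this sketch assumes a single popper or some external protection for pop (urcu handles this with its own synchronization).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an idle arena; not glibc's malloc_state. */
struct arena_node {
    struct arena_node *next;
    int id;
};

/* Hypothetical lock-free stack of idle arenas. */
struct arena_stack {
    _Atomic(struct arena_node *) head;
};

static void arena_stack_init(struct arena_stack *s)
{
    atomic_init(&s->head, NULL);
}

/* Lock-free push: retry the CAS until head points at the new node. */
static void arena_stack_push(struct arena_stack *s, struct arena_node *n)
{
    assert(n != NULL);
    struct arena_node *old =
        atomic_load_explicit(&s->head, memory_order_relaxed);
    do {
        n->next = old;   /* old is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak_explicit(
                 &s->head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Pop one node, or return NULL if the stack is empty.
   Assumption: only one thread pops at a time; with concurrent
   poppers this loop is exposed to the ABA problem. */
static struct arena_node *arena_stack_pop(struct arena_stack *s)
{
    struct arena_node *old =
        atomic_load_explicit(&s->head, memory_order_acquire);
    while (old != NULL &&
           !atomic_compare_exchange_weak_explicit(
               &s->head, &old, old->next,
               memory_order_acquire, memory_order_acquire))
        ;   /* CAS failed: old now holds the current head, retry */
    return old;
}
```

A thread that exits would push its arena, and a thread that needs one would pop. Each failed CAS is a retry, which is why a per-thread cache in front of the stack would matter.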
[1] we only had green threads back in Ruby 1.8, and I guess many
Rubyists got used to the idea that they could have many
threads cheaply. Ruby 1.9+ moved to 100% native threads,
so I'm also trying to reintroduce green threads as an option
back into Ruby (but still keeping native threads)
> > OK, I noticed my patch fails conformance tests because
> > (despite my use of __cds_wfcq_splice_nonblocking) it references
> > poll(), despite poll() being in an impossible code path:
> >
> > __cds_wfcq_splice_nonblocking -> ___cds_wfcq_splice
> > -> ___cds_wfcq_busy_wait -> poll
> >
> > The poll call is impossible because the `blocking' parameter is 0;
> > but I guess the linker doesn't know that?
>
> Correct. We can fix that easily at a later date. Don't worry about it.
Heh, a bit dirty, but #define-ing poll away seems to work :)
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 40d61e45db..89e675c7a0 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -247,6 +247,11 @@
 /* For SINGLE_THREAD_P. */
 #include <sysdep-cancel.h>
+/* prevent wfcqueue.h from including poll.h and linking to it */
+#include <poll.h>
+#undef poll
+#define poll(a,b,c) assert(0 && "should not be called")
+
 #define _LGPL_SOURCE /* allows inlines */
 #include <urcu/wfcqueue.h>
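A minimal, self-contained sketch of the same trick outside of glibc, for illustration only. `busy_wait_sketch` is a hypothetical helper loosely modelled on the busy-wait path described above, not the real urcu function, and the retry count and timeout are made-up values. Including `<poll.h>` first means its include guard turns any later include into a no-op, so the macro stays in effect for code that follows.

```c
#include <assert.h>
#include <poll.h>   /* include first so a later #include <poll.h> is a no-op */

/* Replace every later textual call to poll() with a failing assert,
   so no reference to the poll symbol is emitted from this file. */
#undef poll
#define poll(fds, nfds, timeout) assert(0 && "should not be called")

/* Hypothetical stand-in for a busy-wait helper: returns 1 ("would
   block") right away in non-blocking mode, otherwise sleeps via
   poll() after a few spins. */
static inline int busy_wait_sketch(int *attempt, int blocking)
{
    if (!blocking)
        return 1;                 /* non-blocking: never reaches poll() */
    if (++(*attempt) >= 10) {     /* 10 is an arbitrary retry count */
        (void) poll(NULL, 0, 10); /* expands to the assert above */
        *attempt = 0;
    }
    return 0;
}
```

With `blocking` set to 0 the helper returns before the macro-expanded assert can run, which matches the claim that the poll path is unreachable for the non-blocking splice.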