This is the mail archive of the
mailing list for the glibc project.
Re: The direction of malloc?
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, libc-alpha at sourceware dot org
- Date: Wed, 18 Dec 2013 15:05:01 +0100
- Subject: Re: The direction of malloc?
- Authentication-results: sourceware.org; auth=none
- References: <20131210121622 dot GA5416 at domone dot podge> <52A75502 dot 6040500 at linux dot vnet dot ibm dot com> <20131210210541 dot GA19161 at domone dot podge> <1387213140 dot 23049 dot 8010 dot camel at triegel dot csb> <20131216212334 dot GA21284 at domone dot podge> <1387285197 dot 23049 dot 9075 dot camel at triegel dot csb> <20131217190817 dot GA32756 at domone dot podge> <1387324884 dot 23049 dot 10462 dot camel at triegel dot csb> <20131218101110 dot GB5472 at domone dot podge> <1387368668 dot 23049 dot 10661 dot camel at triegel dot csb>
On Wed, Dec 18, 2013 at 01:11:08PM +0100, Torvald Riegel wrote:
> > > > The real implementation will be a bit faster, as dynamic TLS slows this
> > > > down a bit.
> > > >
> > > > Memory system effects are not a factor here, as the allocation pattern is
> > > > identical (stack in both cases).
> > >
> > > I was referring to memory system effects when the data is actually
> > > accessed by an application. It does matter on which page allocated data
> > > ends up (and where on a page relative to other allocations).
> > And since the algorithms are identical, the data ends up at the same
> > virtual address, so this does not matter here.
> You seemed to say that you want to move from concurrent code with
> synchronization to nonconcurrent code. As far as I was able to
> interpret what you wrote, it seemed that you wanted to move from
> (potentially) concurrent allocation from fastbins(?) to strictly
> per-thread allocation bins. Unless the former is concurrent and uses
> synchronization for no reason, it should be possible that you can have
> situations in which threads allocate from more areas than before.
Not necessarily. You could add a check whether the chunk belongs to the
current thread and use the standard free if it does not. I wrote a bit more
about that here, mainly to get comments.
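
The ownership check described above might look roughly like the following. This is only a sketch under stated assumptions, not the glibc implementation: `struct chunk`, `struct thread_cache`, `cache_alloc`, `cache_free` and `PAYLOAD` are hypothetical names, and the cache is passed explicitly where a real allocator would fetch it from TLS.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAYLOAD 64              /* hypothetical fixed payload size */

struct thread_cache;            /* one per thread in a real allocator */

/* Hypothetical chunk header: remembers which thread's cache created it. */
struct chunk {
  struct thread_cache *owner;
  struct chunk *next;
};

struct thread_cache {
  struct chunk *free_list;      /* touched only by the owning thread */
  size_t local_frees;           /* frees served without synchronization */
  size_t remote_frees;          /* frees handed to the standard free */
};

/* Allocate from the caller's own cache, falling back to malloc. */
void *cache_alloc (struct thread_cache *self)
{
  struct chunk *c = self->free_list;
  if (c)
    self->free_list = c->next;
  else if (!(c = malloc (sizeof *c + PAYLOAD)))
    return NULL;
  c->owner = self;
  return c + 1;                 /* user data starts right after the header */
}

/* Free: unsynchronized fast path only if the chunk belongs to this cache. */
void cache_free (struct thread_cache *self, void *p)
{
  if (!p)
    return;
  struct chunk *c = (struct chunk *) p - 1;
  if (c->owner == self)
    {
      c->next = self->free_list;
      self->free_list = c;
      self->local_frees++;
    }
  else
    {
      self->remote_frees++;
      free (c);                 /* standard, synchronized path */
    }
}
```

With this shape, a chunk freed by the thread that allocated it never touches shared state, while a chunk freed by another thread goes through the ordinary path, so no thread's free list is ever modified concurrently.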
> > > The speed
> > > of allocations is just one thing. For example, just to illustrate, if
> > > you realloc by copying to a different arena, this can slow down programs
> > > due to NUMA effects if those programs expect the allocation to likely
> > > remain on the same page after a realloc; such effects can be much more
> > > costly than a somewhat slower allocation.
> > You cannot optimize code for the unlikely case.
> We do NOT know what the unlikely case is, currently. This is why I
> suggested to start with analyzing application workloads and access
> patterns, building a model of it (ie, informal but at a level of detail
> that is sufficient to actually agree on a clear set of assumptions and
> not just handwaving), document it, and discuss it with the rest of the
> > When memory is allocated
> > in thread A and reallocated in thread B, there are three possible cases:
> > 1) Memory is passed to thread B, which primarily accesses it.
> > 2) Memory is shared between A and B.
> > 3) Memory is primarily accessed by thread A.
> > As the effect of cases 1) and 3) is symmetrical
> Yes, both can happen, and there might always be a trade-off, and however
> you decide, you might decrease performance in some situations.
> > it suffices to estimate which
> > one is more likely, and case 1 seems the best candidate.
> We do NOT know that. If you do, please show the evidence.
> > Realloc definitely does move in most cases, as a common usage pattern is
> > doubling the allocated size, and since we use best fit there is not enough room.
> How do you know it's really a common usage pattern? And, why should it
> not just be common but one of the most common usage patterns? What is
> common? Which applications? And so on...
It is about the only way to avoid quadratic slowdown when repeatedly reallocating.
Ideally this should not be necessary, as realloc could preallocate twice the
requested size, but we do not do this yet. This affects gcc, which repeatedly
tries to extend a buffer by 8.
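
To make the quadratic argument concrete, here is a small sketch that counts how many bytes get copied while a buffer grows, assuming the worst case in which every realloc moves the block. The function name `bytes_copied` and the 8-byte starting size are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Bytes copied while growing a buffer from 8 bytes to at least `target`,
   assuming every realloc has to move the block (worst case).
   doubling != 0: capacity doubles each step; otherwise it grows by 8.  */
size_t bytes_copied (size_t target, int doubling)
{
  size_t cap = 8, copied = 0;
  while (cap < target)
    {
      copied += cap;            /* a moving realloc copies the old contents */
      cap = doubling ? cap * 2 : cap + 8;
    }
  return copied;
}
```

For a 1024-byte target, growing by 8 copies 65024 bytes in total, while doubling copies 1016. The first figure grows with the square of the target size and the second grows linearly.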
To test this, use the following program.
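#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
/* Assumed 64-bit glibc chunk header, located just before the user pointer.  */
struct header { size_t prev_size, size; };
int moved, unmoved;
void *(*reallocp)(void *, size_t);
void __attribute__ ((constructor))
init_reallocp (void) { reallocp = dlsym (RTLD_NEXT, "realloc"); }
void *realloc (void *old, size_t size)
{
  if (!old) return malloc (size);
  struct header *h = (struct header *) old - 1;
  size_t oldsize = (h->size & (~15)) - 16;
  void *n = reallocp (old, size);
  fprintf (stderr, "ptr: %p old: %zu new: %zu moved: %i\n", old, oldsize, size, old != n);
  if (old != n) moved++; else unmoved++;
  return n;
}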
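
A standalone variant that needs no interposition can give a quick first impression of how often realloc moves a block under the grow-by-a-constant pattern. This is a sketch only: `struct move_stats` and `count_moves` are hypothetical names, and the counts depend entirely on the allocator and on what else is on the heap.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct move_stats { size_t moved, unmoved; };

/* Grow one buffer `steps` times by `step` bytes and record how often
   realloc returned a different address.  Returns 0 on success, -1 if
   an allocation failed.  */
int count_moves (size_t steps, size_t step, struct move_stats *st)
{
  size_t size = step;
  char *buf = malloc (size);
  if (!buf)
    return -1;
  st->moved = st->unmoved = 0;
  for (size_t i = 0; i < steps; i++)
    {
      uintptr_t before = (uintptr_t) buf;   /* save address before realloc */
      char *n = realloc (buf, size + step);
      if (!n)
        {
          free (buf);
          return -1;
        }
      size += step;
      if ((uintptr_t) n != before)
        st->moved++;
      else
        st->unmoved++;
      buf = n;
    }
  free (buf);
  return 0;
}
```

Calling `count_moves (1000, 8, &st)` mimics the extend-by-8 pattern. A single growing buffer with nothing allocated after it is the friendliest case for in-place growth, so real programs are likely to see more moves than this reports.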