This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Dynamic growable arrays for internal use
- From: Jeff Law <law at redhat dot com>
- To: Carlos O'Donell <carlos at redhat dot com>, Paul Eggert <eggert at cs dot ucla dot edu>, Florian Weimer <fweimer at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Thu, 1 Jun 2017 11:09:48 -0600
- Subject: Re: [PATCH] Dynamic growable arrays for internal use
- Authentication-results: sourceware.org; auth=none
- Authentication-results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
- Authentication-results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=law at redhat dot com
- Dkim-filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 7A0401201D
- Dmarc-filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 7A0401201D
- References: <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <email@example.com>
On 06/01/2017 10:13 AM, Carlos O'Donell wrote:
> On 05/20/2017 08:43 PM, Paul Eggert wrote:
>> Carlos O'Donell wrote:
>>> The dynamic array growth factor appears to be a value of 2, and
>>> that doesn't match best practice nor academic literature. I would
>>> be happier with a growth factor closer to 1.5.
>> For what it's worth, Gnulib (which has supported dynamically
>> growabable arrays for some time) generates the new size via "n += n /
>> 2 + 1;" by default. This guarantees growth even if N is zero, and
>> approximates the growth factor of 1.5 that you're suggesting. (The
>> code also checks for integer overflow of course.) The caller can ask
>> for a different amount of growth, if necessary.
> The growth factor is related to the underlying storage being used by
> the dynamic array. If the storage had such properties that the array
> could be grown in-place and to any size with zero cost, then the growth
> could be incredibly small and size kept optimal. Since no underlying storage
> has those properties at all times we choose a growth factor that maps to the
> properties that are present. Here, those properties are a function
> of the underlying system allocator, how it calls mmap, when, how often,
> and the costs associated with it. In general I'm surprised to see that 1.5
> is a universally good constant, then again, most OS have similar subsystems
> when it comes to allocators and virtual memory.
FWIW, we use a 1.5 growth factor for similar situations within GCC.
> I don't know that having the caller ask for different growth is going to be
> initially useful in glibc so I'm not going to ask for this in the API.
>>> Can we make the choice to have dynarray always heap allocated?
>> In Gnulib we've found it better to support initial buffers on the
>> stack, as an option. Commonly these initial buffers are already big
>> enough, which is a performance win in the typical case.
> OK. Thanks for the data point here.
And I'll chime in here that based on experiences within glibc that
alloca and vlas are ultimately a bad idea. Yes, they can be orders of
magnitude more efficient than hitting the heap, but time and time again
they've been mis-managed leading to security vulnerabilities. Walk
through the CVEs for glibc over the last 10 years and count how many are
related to dynamic stack allocations.
IMHO dynamic stack allocations should not be exposed to developers. We
simply get them wrong too often.
I'll climb down off the soapbox now...
>> Why impose this requirement? In Gnulib the caller can specify an
>> initial size, but this is optional. The default initial size is
>> chosen to be efficient for the underlying malloc implementation. It
>> would seem odd to require callers to know this implementation
> I see your point. Having the caller convey that no information is relevant
> in the context allows the implementation to choose something optimal given
> the backing store. And we have some use cases like this in the loader.
> At most we know we have say one link map, but how many more there are after
> is completely unknown (unless we had statistical data to guide us).
We've found in GCC that an initial size is useful for growable arrays.
Often we know enough to guess the common size for particular objects.