This is the mail archive of the libc-alpha
mailing list for the glibc project.
Re: [RFC][BZ #16009] Memory handling in strxfrm_l()
- From: Leonhard Holz <leonhard dot holz at web dot de>
- To: libc-alpha at sourceware dot org
- Date: Mon, 24 Nov 2014 10:26:07 +0100
- Subject: Re: [RFC][BZ #16009] Memory handling in strxfrm_l()
- References: <546DCC25 dot 4020808 at web dot de> <20141121040317 dot GT22465 at brightrain dot aerifal dot cx> <546F12A3 dot 5030204 at web dot de> <20141122233907 dot GA2516 at domone>
No, the function is not permitted to return an error; it's required by
ISO C to produce a result. Falsely reporting that it needs more space
for the result, and thereby causing the caller to keep allocating
larger and larger buffers until it runs out of memory itself, is not
valid; in particular, it could report different needed lengths for the
same string at different calls in the same program with the same
If strcoll_l is using an algorithm that requires allocation, this
needs to be fixed -- there's no fundamental reason it needs to
Ok. It is no big deal to add a non-allocating path but the question
then is when to use it. We could stick to the current implementation
and just try to malloc() if the stack is not available but
personally I would not want strxfrm to even try to allocate memory
beyond a certain amount. Considering that __MAX_ALLOCA_CUTOFF is
actually 64KB, so that strings up to 12.8KB could have a stack-based
index & rules cache, one could maybe avoid malloc() altogether without
hurting most real-world use cases.
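
The 12.8KB figure follows from dividing the stack budget by the cache
cost per character. A minimal sketch of that arithmetic, assuming the
cache holds one int32_t weight index plus one unsigned char rule per
character (5 bytes); the names BYTES_PER_CHAR and cache_capacity are
made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: each cached character costs one int32_t weight index
   plus one unsigned char rule, i.e. 5 bytes in total.  */
enum { BYTES_PER_CHAR = sizeof (int32_t) + sizeof (unsigned char) };

/* Hypothetical helper: how many characters fit into a cache budget
   of the given number of bytes.  */
static size_t
cache_capacity (size_t budget_bytes)
{
  return budget_bytes / BYTES_PER_CHAR;
}
```

Under that assumption, cache_capacity (64 * 1024) gives 13107
characters, which is roughly 12.8 * 1024.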
You could also cache only the last 16k characters on the stack, and if
the function goes beyond that, recompute them / switch to the uncached
version.
Thank you all for the feedback. There are two things I overlooked:
strxfrm needs to process the whole src string because it has to return
the needed dest length in any case, and the weight-indices cache is
modified while traversing the string. So it's not possible to use a
sliding-window approach or restrict the cache size based on dest length.
I also agree that strxfrm is a function for pre-computing things that
need to be fast somewhere else, so performance is not the highest
priority. Anyway, the "faster" approach is implemented, so why not reuse it?
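
To illustrate why the return value must always be the full needed
length: callers typically call strxfrm twice, first with a zero-size
buffer to learn the length and then again to fill a buffer of that
size. A minimal caller-side sketch; the helper name make_sort_key is
made up for illustration:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a malloc'ed sort key for SRC in the
   current locale.  Returns NULL only if malloc fails; strxfrm itself
   has no way to report an error.  */
static char *
make_sort_key (const char *src)
{
  /* With a size of 0 the destination may be NULL; the return value
     is the key length without the terminating null byte.  */
  size_t needed = strxfrm (NULL, src, 0);

  char *key = malloc (needed + 1);
  if (key == NULL)
    return NULL;

  /* The second call writes the key, which can then be compared with
     plain strcmp.  */
  strxfrm (key, src, needed + 1);
  return key;
}
```

If the first call reported an inflated or inconsistent length, a
caller written like this would over-allocate or loop, which is the
failure mode described at the top of the thread.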
My proposal now is the following:
* allocate a fixed-size cache array on the stack (e.g. 20 KB, supporting
strings up to 4000 characters)
* fill it with values until either the end of the string is reached or
the cache is full
* go with the cached version if end of string is reached
* go with the uncached version if not
This avoids strlen() + malloc() and is "fast" for standard real-world
use cases like word sorting and robust for large strings.
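
The steps above can be sketched roughly as follows. This is not the
glibc code: lookup_index and lookup_rule are made-up stand-ins for the
locale table lookup, the sketch treats every byte as one character,
and CACHE_CHARS, fill_cache and choose_path are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_CHARS 4000  /* about 20 KB at 5 bytes per character */

/* Made-up stand-ins for the locale's weight table lookup.  */
static int32_t lookup_index (unsigned char c) { return (int32_t) c; }
static unsigned char lookup_rule (unsigned char c) { (void) c; return 0; }

/* Fill the caches until the end of SRC or until CAP entries are used.
   Returns true if the end of the string was reached.  */
static bool
fill_cache (const char *src, int32_t *idx, unsigned char *rule,
            size_t cap, size_t *used)
{
  size_t n = 0;
  while (src[n] != '\0' && n < cap)
    {
      idx[n] = lookup_index ((unsigned char) src[n]);
      rule[n] = lookup_rule ((unsigned char) src[n]);
      n++;
    }
  *used = n;
  return src[n] == '\0';
}

typedef enum { PATH_CACHED, PATH_UNCACHED } xfrm_path;

/* Decide which path to take without calling strlen() or malloc().  */
static xfrm_path
choose_path (const char *src)
{
  int32_t idx[CACHE_CHARS];
  unsigned char rule[CACHE_CHARS];
  size_t used;

  if (fill_cache (src, idx, rule, CACHE_CHARS, &used))
    return PATH_CACHED;    /* whole string fits: use the cached values */
  return PATH_UNCACHED;    /* too long: redo the lookups on the fly */
}
```

A short word takes the cached path, while a string longer than
CACHE_CHARS falls through to the uncached path after at most
CACHE_CHARS lookups.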