This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [RFC][BZ #16009] Memory handling in strxfrm_l()

From: Leonhard Holz <leonhard dot holz at web dot de>
To: libc-alpha at sourceware dot org
Date: Mon, 24 Nov 2014 10:26:07 +0100
Subject: Re: [RFC][BZ #16009] Memory handling in strxfrm_l()
Authentication-results: sourceware.org; auth=none
References: <546DCC25 dot 4020808 at web dot de> <20141121040317 dot GT22465 at brightrain dot aerifal dot cx> <546F12A3 dot 5030204 at web dot de> <20141122233907 dot GA2516 at domone>


No, the function is not permitted to return an error; it's required by
ISO C to produce a result. Falsely reporting that it needs more space
for the result, and thereby causing the caller to keep allocating
larger and larger buffers until it runs out of memory itself, is not
valid; in particular, it could report different needed lengths for the
same string at different calls in the same program with the same
locale.

If strcoll_l is using an algorithm that requires allocation, this
needs to be fixed -- there's no fundamental reason it needs to
allocate.


Ok. It is no big deal to add a non-allocating path, but the question
then is when to use it. We could stick to the current implementation
and just try to malloc() if the stack is not available, but
personally I would not want strxfrm to even try to allocate memory
beyond a certain amount. Considering that __MAX_ALLOCA_CUTOFF is
actually 64KB, so that strings up to 12.8KB could have a stack-based
index & rules cache, one could maybe avoid malloc() altogether without
hurting most real-world use cases.


You could also cache only the last 16k characters on the stack and, if the
function goes beyond that, recompute them or switch to the uncached version.

Thank you all for the feedback. There are two things I overlooked:
strxfrm needs to compute the whole src string because it has to return
the needed dest length in any case and the weight-indices-cache is
modified while traversing the string. So it's not possible to use a
sliding-window-approach or restrict the cache size based on dest length.
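To illustrate why the whole src string has to be processed: callers commonly rely on the return value to size the destination buffer. Below is a minimal sketch of that two-call idiom using only the standard strxfrm() interface; the helper name xfrm_dup is hypothetical and not part of glibc.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc()ed transformed copy of src,
   or NULL if the allocation fails. */
char *xfrm_dup(const char *src)
{
    /* With n == 0, dest may be NULL. The return value is the length
       of the full transformed string, excluding the terminator, so
       the implementation has to walk all of src to compute it. */
    size_t need = strxfrm(NULL, src, 0);

    char *dest = malloc(need + 1);
    if (dest == NULL)
        return NULL;

    /* Second call performs the actual transformation. */
    strxfrm(dest, src, need + 1);
    return dest;
}
```

Because the first call must report the exact length even when nothing is written, restricting the work to what fits in dest is not an option.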

I also agree that strxfrm is a function for pre-computing things that
need to be fast somewhere else, so performance is not the highest
priority. Anyway, the "faster" approach is implemented, so why not reuse it.


My proposal now is the following:

* allocate a fixed size cache array on the stack (e.g. 20kb supporting
  strings up to 4000 characters)
* fill it with values until either the end of the string is reached or
  the cache is full
* go with the cached version if end of string is reached
* go with the uncached version if not

This avoids strlen() + malloc(), is "fast" for standard real-world
use cases like word sorting, and is solid for large strings.
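The control flow of this proposal could be sketched roughly as follows. This is a toy illustration, not the glibc code: toy_weight() is a hypothetical stand-in for the real weight lookup, the cache holds one byte per character instead of index and rule data, and the real cache is modified during traversal, which the sketch ignores. CACHE_SLOTS echoes the 4000-character figure above.

```c
#include <stddef.h>

enum { CACHE_SLOTS = 4000 };  /* assumed capacity, from the figure above */

/* Hypothetical stand-in for a collation weight lookup:
   folds ASCII upper case to lower case. */
static unsigned char toy_weight(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Writes at most n bytes (terminator included) to dest and returns the
   full length needed, as strxfrm does. *used_cache reports which path
   was taken. dest may be NULL when n == 0. */
size_t toy_xfrm(char *dest, const char *src, size_t n, int *used_cache)
{
    unsigned char cache[CACHE_SLOTS];  /* fixed size, on the stack */
    size_t filled = 0;

    /* Fill until the end of the string or until the cache is full.
       No strlen() and no malloc() are needed. */
    while (filled < CACHE_SLOTS && src[filled] != '\0') {
        cache[filled] = toy_weight((unsigned char)src[filled]);
        filled++;
    }

    size_t len = 0;
    *used_cache = (src[filled] == '\0');
    if (*used_cache) {
        /* Cached path: the whole string fit into the cache. */
        for (; len < filled; len++)
            if (len + 1 < n)
                dest[len] = (char)cache[len];
    } else {
        /* Uncached path: recompute every weight on the fly. */
        for (; src[len] != '\0'; len++)
            if (len + 1 < n)
                dest[len] = (char)toy_weight((unsigned char)src[len]);
    }

    if (n > 0)
        dest[len < n ? len : n - 1] = '\0';
    return len;
}
```

Both paths return the same length for the same input, so the choice of path is invisible to the caller.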


Leonhard

Follow-Ups:
- Re: [RFC][BZ #16009] Memory handling in strxfrm_l()
  - From: Paul Eggert

References:
- [RFC][BZ #16009] Memory handling in strxfrm_l()
  - From: Leonhard Holz
- Re: [RFC][BZ #16009] Memory handling in strxfrm_l()
  - From: Rich Felker
- Re: [RFC][BZ #16009] Memory handling in strxfrm_l()
  - From: Leonhard Holz
- Re: [RFC][BZ #16009] Memory handling in strxfrm_l()
  - From: Ondřej Bílka
