strxfrm output stability
Paul Eggert
eggert@cs.ucla.edu
Wed Sep 9 19:05:00 GMT 2015
On 09/09/2015 09:54 AM, Zack Weinberg wrote:
> https://lwn.net/Articles/653411/ indicates that PostgreSQL's use of
> strxfrm is new in 9.5
Ah, I was looking at the latest stable PostgreSQL release (9.4.4), so it
sounds like they've cleaned up the strxfrm-using code. Yet another
reason for me to rethink my position that strxfrm is silly.
When I tried a decade ago to use strxfrm in GNU sort to speed it up, I
ran into two problems. First, glibc strxfrm generates lengthy output,
and since I wanted only the first 8 bytes or so I got relatively little
information out of the prefix and overall it was a performance loss on
my benchmarks. As I understand it the new PostgreSQL code works around
this problem by doing a collation-preserving compression of the strxfrm
output. It would be better, I expect, if GNU libc strxfrm did that
compression itself rather than require every caller who cares about
efficiency to do it.
The second problem was more of an irritation: I wanted the first (say) 8
bytes of strxfrm output, but there's no standard way of getting them
without generating all the strxfrm output bytes and then discarding the
unwanted trailing bytes, which is slow. (I hacked around this problem
by asking for the first 8+DELTA bytes, where DELTA is adjusted so that
it works in practice, but this is clearly an unportable-in-theory
hack.) It would be nicer if glibc strxfrm would guarantee to fill up
its output buffer even when truncated.
Perhaps if I stold the idea of collation-preserving compression from
PostgreSQL and used that idea in GNU sort, there would be enough
performance gain to make strxfrm worthwhile there. Or perhaps we could
fix glibc strxfrm to perform better as described above -- though this
would mean strxfrm's output wouldn't be stable from the current glibc
release to the next one.
More information about the Libc-alpha
mailing list