strxfrm output stability

Paul Eggert eggert@cs.ucla.edu
Wed Sep 9 19:05:00 GMT 2015


On 09/09/2015 09:54 AM, Zack Weinberg wrote:
> https://lwn.net/Articles/653411/  indicates that PostgreSQL's use of
> strxfrm is new in 9.5

Ah, I was looking at the latest stable PostgreSQL release (9.4.4), so it 
sounds like they've cleaned up the strxfrm-using code.  Yet another 
reason for me to rethink my position that strxfrm is silly.

When I tried a decade ago to use strxfrm in GNU sort to speed it up, I 
ran into two problems.  First, glibc strxfrm generates lengthy output, 
and since I wanted only the first 8 bytes or so I got relatively little 
information out of the prefix and overall it was a performance loss on 
my benchmarks.  As I understand it the new PostgreSQL code works around 
this problem by doing a collation-preserving compression of the strxfrm 
output.  It would be better, I expect, if GNU libc strxfrm did that 
compression itself rather than require every caller who cares about 
efficiency to do it.

The second problem was more of an irritation: I wanted the first (say) 8 
bytes of strxfrm output, but there's no standard way of getting them 
without generating all the strxfrm output bytes and then discarding the 
unwanted trailing bytes, which is slow.  (I hacked around this problem 
by asking for the first 8+DELTA bytes, where DELTA is adjusted so that 
it works in practice, but this is clearly an unportable-in-theory 
hack.)  It would be nicer if glibc strxfrm would guarantee to fill up 
its output buffer even when truncated.

Perhaps if I stold the idea of collation-preserving compression from 
PostgreSQL and used that idea in GNU sort, there would be enough 
performance gain to make strxfrm worthwhile there.  Or perhaps we could 
fix glibc strxfrm to perform better as described above -- though this 
would mean strxfrm's output wouldn't be stable from the current glibc 
release to the next one.



More information about the Libc-alpha mailing list