Most functions similar to strlen() that have to detect whether any byte of an
integer is zero do so very efficiently. However, in glibc-2.7/string/strlen.c
this efficient code, which is used in lots of other functions, is surrounded by
an #if 0, and instead a trivial test is used which exits the loop and examines
each byte separately if any of the bytes is 0 or within the range 129-255. That
is 15/16 of all random cases with 4-byte words and even more with 8-byte words.
Hence I think this function is hardly any more efficient than reading one
long int and then simply examining all its bytes separately.
Is there any reason why the code that looks far more efficient, and is used in
many other source files, is disabled here?
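For illustration, here is a minimal sketch of the weak test described above,
assuming 32-bit words. The names weak_test() and count_firing_bytes() are
hypothetical and do not come from glibc; the constants are the 32-bit forms of
the low and high byte masks.

```c
#include <stdint.h>

/* Weak test (sketch): nonzero when some byte of x is 0 or lies in
   129..255. A hit only means "inspect the bytes one by one". */
static int weak_test(uint32_t x)
{
    uint32_t d = (uint32_t)(x - UINT32_C(0x01010101));
    return (d & UINT32_C(0x80808080)) != 0;
}

/* Count the byte values 0..255 that make the weak test fire when
   placed in the low byte of a word whose other bytes are 0x01. */
static int count_firing_bytes(void)
{
    int n = 0;
    for (uint32_t b = 0; b < 256; b++)
        if (weak_test(UINT32_C(0x01010100) | b))
            n++;
    return n;
}
```

count_firing_bytes() finds that 128 of the 256 byte values trigger the test.
A word with no zero byte produces no borrows, so the bytes act independently:
a random 4-byte word slips through with probability (1/2)^4 = 1/16, and an
8-byte word with probability 1/256.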
The math is wrong.
It looks like glibc has a broken version of Alan Mycroft's HAKMEMC postings.
The solution is "((x - 0x01010101) & ~x & 0x80808080)", but the "& ~x" is
missing from the glibc version.
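A minimal sketch of how the expression quoted above could be used, assuming
32-bit words. has_zero_byte() and sketch_strnlen() are hypothetical names, not
glibc functions, and the caller is assumed to guarantee that maxlen bytes are
readable, so the word-sized reads stay inside the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero exactly when at least one byte of x is zero. */
static int has_zero_byte(uint32_t x)
{
    uint32_t d = (uint32_t)(x - UINT32_C(0x01010101));
    return (d & ~x & UINT32_C(0x80808080)) != 0;
}

/* Bounded length scan (sketch): assumes s points to at least maxlen
   readable bytes. Skips four bytes at a time while no byte is zero. */
static size_t sketch_strnlen(const char *s, size_t maxlen)
{
    size_t i = 0;

    while (maxlen - i >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, s + i, sizeof w);   /* avoids alignment problems */
        if (has_zero_byte(w))
            break;                     /* terminator is in this word */
        i += sizeof w;
    }
    /* Finish byte by byte: locate the terminator or hit maxlen. */
    while (i < maxlen && s[i] != '\0')
        i++;
    return i;
}
```

With the "& ~x" term in place, bytes in the range 128-255 no longer trigger
the test, so the word loop only stops on a real zero byte.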
The "#if 0" was added Tue Jan 21 03:39:54 1992 UTC (16 years, 1 month ago) by
roland, and the patch looked like this:
I can reproduce this on cvs head. The generic strlen function is horribly
Roland, can you comment on this?
What's the legal status of using that algorithm?
Note: All the string/* operations should use the corrected algorithm, and the
old comments should be removed.
Created attachment 2703
Alan Mycroft's hack for strlen()
I know that you only accept patches if the contributor signs an assignment.
But do you accept a small bug fix without an assignment, like the one I
attached to this bug?
This is a fix for strlen() only (I can do it for all the other functions if
you can accept patches like this). It deletes the old comments and adds
Alan Mycroft's hack.
The changes are more than trivial; you would need a copyright assignment for
glibc on file with the FSF.
Thanks for the reply.
I changed the code, although nobody seems to use it.