Generic strlen

Eric Blake
Mon Nov 1 21:21:00 GMT 2010

On 10/29/2010 04:46 PM, David A. Ramos wrote:
>> As long as reading beyond the end of a string does not fault, you can't
>> detect the violation of the standard, so the as-if rule applies.  Prove
>> to me that there is an architecture that can fault on anything less than
>> a word boundary, and then we'll talk about changing the code.  Until
>> then, this implementation may violate strict C89, but it is by all means
>> portable to all possible platforms that newlib will ever target.
> Take a look at the February 2008 edition of the Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3B: System Programming Guide, Part 2, Section 18.2: Debug Registers:
> "For each breakpoint, the following information can be specified:
> - The linear address where the breakpoint is to occur.
> - The length of the breakpoint location (1, 2, or 4 bytes)."

Running under a debugger is not the normal case, and you are naive if
you expect libc to use byte accesses when word accesses are much
faster.

> "When the DE flag is set, the processor interprets bits as follows:
> 11 - Break on data reads or writes but not instruction fetches."
> Using this version of strlen precludes a developer from setting a watchpoint on a byte within the same word as the end of a string. It would, in fact, fault erroneously and make debugging difficult.

If you're going to the extremes of setting watchpoints on the tail of a
string, then you should either be prepared to watch all possible word
read sizes, or supply your own strlen() implementation, overriding libc,
that does the naive (and SLOW) byte-wise access to guarantee that your
debugging session will hit what you want.  But we should not penalize
libc for this non-typical use.
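
For illustration, such an override could be as small as the sketch
below.  The name bytewise_strlen is hypothetical (chosen so it does not
clash with libc's strlen); it is not code from newlib or glibc.  It
reads exactly one byte per iteration and never touches memory past the
terminating NUL, so a watchpoint on a neighbouring byte will not fire.

```c
#include <stddef.h>

/* Hypothetical debugging replacement for strlen(): one byte per read,
   and no read ever goes past the terminating NUL. */
size_t bytewise_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

How you make the program call this instead of libc's strlen() (a
macro, a linker override, or renaming it) depends on your toolchain.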

glibc's generic implementations do _the exact same thing_, reading
beyond string bounds in a lot of their str* functions, and I don't see
anyone asking glibc to change its generic versions.  Word accesses might
make debugging a bit more difficult, and you may have to add exceptions
to your memory tracer tools to skip known safe patterns, like strlen()
reading an entire aligned word even though it extends past the end of
the string ending in that word.  Neither of those means that we should
pessimize the code.
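
To make the pattern concrete, here is a minimal sketch of the general
word-at-a-time technique.  It is not the actual newlib or glibc source;
wordwise_strlen, word_t, has_zero_byte and wordwise_selfcheck are names
made up for this example, and it assumes 8-bit bytes and that unsigned
long is the natural word.  Note that the aligned word read may include
bytes past the NUL, which is exactly the strict-standard violation
being discussed.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef unsigned long word_t;   /* assumption: natural machine word */

/* A byte value repeated across the word: 0x0101...01 and 0x8080...80
   (assumes CHAR_BIT == 8). */
#define ONES  ((word_t)-1 / UCHAR_MAX)
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))

/* Nonzero if and only if at least one byte of w is zero. */
static int has_zero_byte(word_t w)
{
    return ((w - ONES) & ~w & HIGHS) != 0;
}

size_t wordwise_strlen(const char *s)
{
    const char *p = s;
    word_t w;

    /* Byte-wise until p is word-aligned. */
    while ((uintptr_t)p % sizeof(word_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Aligned word reads.  The last one may cover bytes beyond the
       NUL, but an aligned word never straddles a page boundary. */
    for (;;) {
        memcpy(&w, p, sizeof w);    /* compiles to a single load */
        if (has_zero_byte(w))
            break;
        p += sizeof w;
    }

    /* Locate the NUL inside the final word. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Self-check on a padded, word-aligned buffer, so the check itself
   never reads outside the object.  Returns 1 on success. */
int wordwise_selfcheck(void)
{
    static union { word_t w[8]; char c[8 * sizeof(word_t)]; } buf;
    size_t len, off;

    for (len = 0; len < 4 * sizeof(word_t); len++) {
        memset(buf.c, 'x', sizeof buf.c);
        buf.c[len] = '\0';
        for (off = 0; off <= len; off++)
            if (wordwise_strlen(buf.c + off) != len - off)
                return 0;
    }
    return 1;
}
```

A memory tracer that sees the word load in the middle loop touch bytes
after the NUL is seeing this pattern, not a real overrun.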

Eric Blake    +1-801-349-2682
Libvirt virtualization library
