Generic strlen

Jeff Johnston
Thu Nov 4 01:07:00 GMT 2010

On 10/29/2010 06:53 PM, Eric Blake wrote:
> On 10/29/2010 04:46 PM, David A. Ramos wrote:
>>> As long as reading beyond the end of a string does not fault, you can't
>>> detect the violation of the standard, so the as-if rule applies.  Prove
>>> to me that there is an architecture that can fault on anything less than
>>> a word boundary, and then we'll talk about changing the code.  Until
>>> then, this implementation may violate strict C89, but it is by all means
>>> portable to all possible platforms that newlib will ever target.
>> Take a look at the February 2008 edition of the Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide, Part 2, Section 18.2: Debug Registers:
>> "For each breakpoint, the following information can be specified:
>> - The linear address where the breakpoint is to occur.
>> - The length of the breakpoint location (1, 2, or 4 bytes)."
> Running under a debugger is not a normal expectation, and you are naive
> if you expect that libc will be using byte accesses when it is much
> faster to use word accesses.
>> "When the DE flag is set, the processor interprets bits as follows:
>> 11 - Break on data reads or writes but not instruction fetches."
>> Using this version of strlen precludes a developer from setting a watchpoint on a byte within the same word as the end of a string. It would, in fact, fault erroneously and make debugging difficult.
> If you're going to the extremes of setting watchpoints on the tail of a
> string, then you should either be prepared to watch all possible word
> read sizes, or supply your own strlen() implementation, overriding libc,
> that does the naive (and SLOW) byte-wise access to guarantee that your
> debugging session will hit what you want.  But we should not penalize
> libc for this non-typical use.
> glibc's generic code versions do _the exact same thing_ of reading
> beyond string bounds in a lot of their str* functions, and I don't see
> anyone asking glibc to change their generic version.  Just because word
> accesses might make debugging a bit more difficult, and just because you
> have to add exceptions to your memory tracer tools to skip known safe
> patterns like strlen() reading an entire aligned word even though it
> exceeds the bounds of the string ending in that word, does not mean that
> we should pessimize the code.
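The word-at-a-time technique debated above can be illustrated with the minimal C sketch below. It was written for this note and is not newlib's or glibc's actual source. The name `word_strlen`, the use of `unsigned long` as the word type, and the assumption of 8-bit bytes are all illustrative choices. The sketch shows where the over-read happens: the final aligned word can include bytes past the NUL. Strict ISO C treats both that read and the type-punned load as undefined, which is the as-if argument made in the thread, and memory-checking tools may flag it.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: byte-addressed target with 8-bit bytes; unsigned long is
   used as the "word", as many generic C string routines do. */
typedef unsigned long word_t;

/* 0x0101...01 and 0x8080...80 for the width of word_t. */
#define ONES  ((word_t)-1 / 0xFF)
#define HIGHS (ONES * 0x80)

/* Nonzero iff some byte of x is zero (exact: no false positives). */
#define HAS_ZERO_BYTE(x) (((x) - ONES) & ~(x) & HIGHS)

/* Hypothetical name: word-at-a-time strlen sketch. */
size_t word_strlen(const char *s)
{
    const char *p = s;

    /* Byte-wise until p is word-aligned; never reads past the NUL. */
    while ((uintptr_t)p % sizeof(word_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Whole aligned words.  The word holding the NUL may also hold bytes
       past the end of the string (the over-read under discussion).  When
       the page size is a multiple of the word size, an aligned word never
       straddles two pages, so those bytes share the NUL's page. */
    const word_t *w = (const word_t *)p;
    while (!HAS_ZERO_BYTE(*w))
        w++;

    /* Find the exact NUL inside the final word. */
    p = (const char *)w;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The byte-wise prefix loop is what makes the later loads aligned. Without it, a word load starting near the end of a page could reach into the next, possibly unmapped, page.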

Agreed.  The glibc implementation of strlen uses the same algorithm.
One can already optionally build newlib with byte-wise versions of the
str* functions by using the --enable-target-optspace option.
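The byte-wise alternative Eric describes, roughly the kind of loop a size-optimized build favors, might look like the sketch below. The name `bytewise_strlen` is hypothetical. The `volatile` qualifier is an added assumption meant to keep the compiler from widening the loads or replacing the loop with a call to the library strlen. Whether and how such a function can be interposed as `strlen` itself depends on the toolchain and link setup, and that is not shown here.

```c
#include <stddef.h>

/* Hypothetical name: byte-at-a-time strlen.  Each iteration performs a
   single 1-byte load and stops at the NUL, so no byte past the end of the
   string is ever read (a 1-byte data watchpoint just past the string is
   not hit).  volatile forces each load to be an individual byte access. */
size_t bytewise_strlen(const char *s)
{
    const volatile char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

This trades speed for precision: one load and one compare per character, versus one per word in the word-at-a-time version.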

-- Jeff J.
