Generic strlen

David A. Ramos
Sat Oct 30 01:22:00 GMT 2010

> On 10/29/2010 11:49 AM, David A. Ramos wrote:
>> Hi newlib maintainers,
>> Hi newlib maintainers,
>> Our checking tool (KLEE) keeps complaining about newlib's generic strlen. It looks like it was patched back in May 2008 to include a speed hack that violates ISO C. It first word-aligns the pointer and then reads a word at a time to check for a NUL byte:
>> libc/string/strlen.c:
>>   /* If the string is word-aligned, we can check for the presence of
>>      a null in each word-sized block.  */
>>   aligned_addr = (unsigned long *)str;
>>   while (!DETECTNULL (*aligned_addr))
>>     aligned_addr++;
>> Obviously, this can read out of bounds whenever the memory allocated to the string ends before the end of the aligned word that contains its terminating NUL. While on most architectures this wouldn't actually cause a segfault, I don't think that's a safe assumption for the generic version of a libc routine. The same patch added an i386-specific version using the same algorithm, where it may be perfectly acceptable.
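To make the over-read concrete, here is a minimal sketch of the word-at-a-time technique described above. It is modeled on that description, not copied from newlib's source. The names `word_strlen` and `has_zero_byte` are hypothetical, the bit trick stands in for newlib's `DETECTNULL` macro, and 8-bit bytes are assumed. The extra `examined` output exists only to show how many bytes the scan touches. Like the code under discussion, phase 2 can read past the end of the object, so it is not strictly conforming ISO C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff some byte of w is zero.  Assumes 8-bit bytes.
   ones = 0x0101...01 and highs = 0x8080...80 for any width of unsigned long. */
static int has_zero_byte(unsigned long w)
{
    const unsigned long ones = ~0UL / 0xFF;
    const unsigned long highs = ones << 7;
    return ((w - ones) & ~w & highs) != 0;
}

/* Hypothetical word-at-a-time strlen.  *examined receives how many bytes,
   counted from s, were read, so the over-read past the NUL is visible. */
static size_t word_strlen(const char *s, size_t *examined)
{
    const size_t W = sizeof(unsigned long);
    const char *p = s;

    /* Phase 1: one byte at a time until p is word-aligned. */
    while ((uintptr_t)p % W != 0) {
        if (*p == '\0') {
            *examined = (size_t)(p - s) + 1;
            return (size_t)(p - s);
        }
        p++;
    }

    /* Phase 2: one aligned word per iteration.  The word containing the
       terminator may also contain bytes that lie past it. */
    for (;;) {
        unsigned long w;
        memcpy(&w, p, W);   /* memcpy sidesteps strict-aliasing issues */
        if (has_zero_byte(w))
            break;
        p += W;
    }
    *examined = (size_t)(p - s) + W;

    /* Phase 3: locate the exact terminator inside the final word. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

For a word-aligned "abc", `word_strlen` returns 3 but reports `sizeof(unsigned long)` bytes examined; with 8-byte longs, four of those bytes lie past the NUL. If the allocation were only 4 bytes long, those four reads would be out of bounds.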
>> Thoughts?

On Oct 29, 2010, at 11:02 AM, Eric Blake wrote:
> As long as reading beyond the end of a string does not fault, you can't
> detect the violation of the standard, so the as-if rule applies.  Prove
> to me that there is an architecture that can fault on anything less than
> a word boundary, and then we'll talk about changing the code.  Until
> then, this implementation may violate strict C89, but it is by all means
> portable to all possible platforms that newlib will ever target.

Take a look at the February 2008 edition of the Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B: System Programming Guide, Part 2, Section 18.2 (Debug Registers):

"For each breakpoint, the following information can be specified:
- The linear address where the breakpoint is to occur.
- The length of the breakpoint location (1, 2, or 4 bytes)."

"When the DE flag is set, the processor interprets bits as follows:
11 - Break on data reads or writes but not instruction fetches."

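Using this version of strlen precludes a developer from setting a read watchpoint on any byte that lies past the end of a string but within the same aligned word as its terminating NUL. Such a watchpoint would trigger erroneously inside strlen, even though the program never logically reads that byte, and make debugging difficult.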
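The arithmetic behind this can be sketched as follows. It assumes strlen works as described earlier in the thread (byte reads up to the first word boundary, then whole aligned words), and that a wider load overlapping a byte-sized data breakpoint triggers it. Under those assumptions, the bytes where a watchpoint would fire spuriously are those between the NUL and the end of the word containing it. `bytes_read_past_nul` is a hypothetical helper, not a newlib or debugger API, and the addresses in the checks are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: for a string at `start` of length `len`, return how
   many bytes AFTER its terminating NUL a word-at-a-time strlen reads.
   A 1-byte read watchpoint on any of them would be hit inside strlen.
   `word` is assumed to be a power of two. */
static size_t bytes_read_past_nul(uintptr_t start, size_t len, uintptr_t word)
{
    uintptr_t nul = start + len;                              /* NUL address */
    uintptr_t first_aligned = (start + word - 1) & ~(word - 1);

    if (nul < first_aligned)   /* NUL found during the byte-wise prologue */
        return 0;

    uintptr_t word_end = (nul & ~(word - 1)) + word;  /* one past final load */
    return (size_t)(word_end - (nul + 1));
}
```

For an 8-byte word and a 3-character string at an aligned address, four bytes past the NUL are read, so a watchpoint on any of them would fire inside strlen. If the same string starts one byte later, the NUL is found before the first aligned load and no extra bytes are read.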
