ctype macros broken on 64-bit builds?

Thu Jul 24 18:02:00 GMT 2008

Martijn van Buul wrote:
> Hello,
>
> I'm using newlib on x86_64-elf, and I've run into problems with the various
> is...() macros in ctype.h. According to C90 (and C99, and possibly earlier
> standards before that..) these macros/functions are required to accept an
> integer input with range [-1 .. 255]. It appears they are currently broken
> for 64-bit targets. As an example, I used isalpha(), but the others have
> exactly the same problem:
>
> isalpha() is defined in ctype.h as:
>
> #define	isalpha(c)  ((__ctype_ptr)[(unsigned)(c)]&(_U|_L))
>
> where __ctype_ptr points to element 1 of a 257-entry array, so 
> __ctype_ptr[-1] is actually valid.
>
> This works for targets with 32-bit pointers and 32-bit integers,
> as accessing element [-1] of an array will access exactly the same memory
> as accessing element [(unsigned)(-1)], since the address calculation
> implicitly overflows and wraps around:
>
> Assuming a char foo[10], I'd get:
>
> &foo[0]:              0xbd4de86
> &foo[-1]:             0xbd4de85
> &foo[(unsigned)(-1)]: 0xbd4de85
>
> However, this no longer works on a platform with 32-bit integers and 64-bit
> pointers (like x86_64..), since the implicit overflow will not occur:
>
> &foo[0]:              0x28000109c30
> &foo[-1]:             0x28000109c2f
> &foo[(unsigned)(-1)]: 0x28100109c2f
>
> Note how the [(unsigned)(-1)] address ended up 4 GB - 1 beyond the first
> element, instead of just before it.
>
> All in all, this means that using any of the ctype(3) macros with -1
> as an argument will cause a segmentation fault, where it should have been
> defined behaviour.
>
> It is the explicit cast to unsigned that's causing the problem here, as
> using (signed) would've yielded the expected result:
>
> &foo[(signed)(-1)]:   0x28000109c2f
>
> I rewrote all appropriate macros in ctype.h to cast to (signed) instead of
> (unsigned), with no adverse effects. My code no longer crashes now, but my
> testbed is limited so I don't know if this might break other targets.
>
> The alternative option would be to do what the rest of the world has been
> doing for a while (including the BSDs, from which this ctype.* seems to
> have borrowed quite a bit), and rewrite isalpha and friends to
>
> #define	isalpha(c)  ((__ctype_ptr)[(unsigned)((c) + 1)]&(_U|_L))
>
> with __ctype_ptr pointing at element 0 of the array in ctype/ctype_.c,
> instead of at element 1.
>
>   
Hi Martijn,

  Thanks for catching this.
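
  For reference, the index arithmetic behind the failure can be reproduced
with plain integers, without actually forming an out-of-bounds pointer.  This
is only a sketch: the helper names are hypothetical, and it assumes a 32-bit
int/unsigned and a char-sized array element, as in the report.

```c
#include <assert.h>
#include <stdint.h>

/* Offset added to the array base when the index has type unsigned int.
   Assumes a 32-bit unsigned int: -1 becomes 0xFFFFFFFF and is zero-extended. */
uint64_t offset_unsigned_index(int32_t c)
{
    return (uint64_t)(uint32_t)c;
}

/* Offset when the index stays signed: -1 is sign-extended to pointer width. */
uint64_t offset_signed_index(int32_t c)
{
    return (uint64_t)(int64_t)c;
}

/* Resulting element address for a char array, truncated to the pointer width. */
uint64_t element_address(uint64_t base, uint64_t offset, unsigned pointer_bits)
{
    uint64_t mask = (pointer_bits >= 64) ? UINT64_MAX
                                         : ((UINT64_C(1) << pointer_bits) - 1);
    return (base + offset) & mask;
}

/* Reproduces the addresses quoted in the report. */
void check_reported_addresses(void)
{
    /* 32-bit pointers: the sum wraps, so both index types hit the same byte. */
    assert(element_address(UINT64_C(0xbd4de86), offset_unsigned_index(-1), 32)
           == UINT64_C(0xbd4de85));
    assert(element_address(UINT64_C(0xbd4de86), offset_signed_index(-1), 32)
           == UINT64_C(0xbd4de85));

    /* 64-bit pointers: the unsigned index lands 4 GB - 1 past the base. */
    assert(element_address(UINT64_C(0x28000109c30), offset_unsigned_index(-1), 64)
           == UINT64_C(0x28100109c2f));
    assert(element_address(UINT64_C(0x28000109c30), offset_signed_index(-1), 64)
           == UINT64_C(0x28000109c2f));
}
```

  Calling check_reported_addresses() passes silently if the model matches the
addresses in the original mail.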

  I have checked in the accompanying patch which implements the 
alternative you mention above.  To prevent breakage in existing code, I 
have created a new pointer: __ctype_ptr__ and changed the ctype 
macros/functions to use it.
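
  To illustrate the idea (this is not the checked-in patch, just a minimal
sketch with made-up sketch_* / SK_* names and an ASCII-only table): the
pointer refers to element 0 of a 257-entry table, and the macro adds 1 to the
argument before the unsigned cast, so -1 maps to index 0 and 255 to index 256.

```c
#include <assert.h>

/* Hypothetical class bits, standing in for _U and _L. */
#define SK_U 01
#define SK_L 02

/* 257 entries: slot 0 is for -1 (EOF), slots 1..256 are for 0..255. */
static unsigned char sketch_table[257];

/* Points at element 0 of the table, not at element 1. */
static const unsigned char *sketch_ctype_ptr = sketch_table;

/* The +1 happens before the cast, so the index is never negative
   for arguments in the range [-1 .. 255]. */
#define sketch_isalpha(c) \
    ((sketch_ctype_ptr)[(unsigned)((c) + 1)] & (SK_U | SK_L))

/* Fill in the letters only; assumes an ASCII-compatible character set
   where 'A'..'Z' and 'a'..'z' are contiguous. */
void sketch_init(void)
{
    for (int ch = 'A'; ch <= 'Z'; ch++)
        sketch_table[ch + 1] = SK_U;
    for (int ch = 'a'; ch <= 'z'; ch++)
        sketch_table[ch + 1] = SK_L;
}

/* Counts the arguments in [-1 .. 255] that the sketch classifies as letters. */
int sketch_count_alpha(void)
{
    int count = 0;
    for (int c = -1; c <= 255; c++)
        if (sketch_isalpha(c))
            count++;
    return count;
}
```

  After sketch_init(), sketch_isalpha(-1) reads slot 0 and yields 0 regardless
of pointer width, since the index is already non-negative when it is cast.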

  Cygwin folks will probably need to add __ctype_ptr__ to the list of 
library globals.

  If anybody finds any problems, just let me know.

-- Jeff J.

-------------- next part --------------
Name: ctype.patch
URL: <http://sourceware.org/pipermail/newlib/attachments/20080724/a5e718aa/attachment.ksh>