ctype macros broken on 64-bit builds?

Martijn van Buul pino@dohd.org
Tue Jul 22 14:13:00 GMT 2008


I'm using newlib on x86_64-elf, and I've run into problems with the various
is...() macros in ctype.h. According to C90 (and C99, and possibly earlier
standards before that..) these macros/functions are required to accept any
int value that is representable as an unsigned char, or EOF. With 8-bit
chars and EOF defined as -1, that is the range [-1 .. 255]. It appears they
are currently broken for 64-bit targets. As an example, I use isalpha(),
but the others have exactly the same problem:

isalpha() is defined in ctype.h as:

#define	isalpha(c)  ((__ctype_ptr)[(unsigned)(c)]&(_U|_L))

where __ctype_ptr points to element 1 of a 257-entry array, so 
__ctype_ptr[-1] is actually valid.

This works for targets with 32-bit pointers and 32-bit integers,
as accessing element [-1] of an array will access exactly the same memory
as accessing element [(unsigned)(-1)], as there will be an implicit
overflow when the index is added to the base address.

Assuming a char foo[10], I'd get:

&foo[0]:              0xbd4de86
&foo[-1]:             0xbd4de85
&foo[(unsigned)(-1)]: 0xbd4de85

However, this no longer works on a platform with 32-bit integers and 64-bit
pointers (like x86_64..), since the implicit overflow will not occur:

&foo[0]:              0x28000109c30
&foo[-1]:             0x28000109c2f
&foo[(unsigned)(-1)]: 0x28100109c2f

Note how the [(unsigned)(-1)] address ended up 4 GB - 1 beyond the first
element, instead of just before it.
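
The arithmetic behind the two address listings can be reproduced without
touching any out-of-bounds memory. The following is only a sketch: the
helper names are made up for illustration, and uint32_t stands in for
"unsigned" on the assumption that int and unsigned are 32 bits wide.

```c
#include <stdint.h>

/* Element offset produced by an index of (unsigned)(c) once it is widened
 * to 64 bits: the value is zero-extended, so -1 becomes 4294967295. */
static int64_t offset_via_unsigned(int c)
{
    return (int64_t)(uint64_t)(uint32_t)c;
}

/* Element offset produced by a signed index: sign-extended, -1 stays -1. */
static int64_t offset_via_signed(int c)
{
    return (int64_t)c;
}

/* Address of a char element with 32-bit pointers: base + index wraps
 * modulo 2^32, which is what hides the problem on 32-bit targets. */
static uint32_t address32(uint32_t base, int c)
{
    return base + (uint32_t)c;
}

/* Address of a char element with 64-bit pointers: no wrap-around occurs,
 * so the access lands almost 4 GB past the base. */
static uint64_t address64(uint64_t base, int c)
{
    return base + (uint64_t)(uint32_t)c;
}
```

Feeding in the base addresses from the listings above, address32() yields
0xbd4de85 and address64() yields 0x28100109c2f for c == -1.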

All in all, this means that using any of the ctype(3) macros with -1
as an argument can cause a segmentation fault on such a target, where the
behaviour should have been well defined.

It is the explicit cast to unsigned that's causing the problem here, as
using (signed) would've yielded the expected result:

&foo[(signed)(-1)]:   0x28000109c2f

I rewrote all appropriate macros in ctype.h to cast to (signed) instead of
(unsigned), with no adverse effects. My code no longer crashes, but my
testbed is limited so I don't know if this might break other targets.
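
A minimal sketch of the signed-cast variant, for illustration only. The
demo_* names, the flag values and the way the table is filled are all
assumptions made up for this example; they are not newlib's actual table
or definitions.

```c
/* Hypothetical flag bits and table, standing in for _U, _L and the real
 * ctype table. */
#define DEMO_U 0x01
#define DEMO_L 0x02

static unsigned char demo_table[257];                   /* slot 0 is for EOF (-1) */
static const unsigned char *demo_ptr = demo_table + 1;  /* points at element 1 */

/* Signed cast: the index is sign-extended when added to the pointer, so an
 * argument of -1 reaches demo_table[0] on 32-bit and 64-bit targets alike. */
#define demo_isalpha(c) ((demo_ptr)[(int)(c)] & (DEMO_U | DEMO_L))

/* Fill in the letter entries; every other slot stays 0. */
static void demo_init(void)
{
    const char *p;

    for (p = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; *p != '\0'; p++)
        demo_table[(unsigned char)*p + 1] = DEMO_U;
    for (p = "abcdefghijklmnopqrstuvwxyz"; *p != '\0'; p++)
        demo_table[(unsigned char)*p + 1] = DEMO_L;
}
```

After demo_init(), demo_isalpha(-1) reads the EOF slot and evaluates to 0
instead of reaching far outside the table.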

The alternative option would be to do what the rest of the world has been
doing for a while (including the BSDs, from which this ctype.* seems to
have borrowed quite a bit), and rewrite isalpha and friends to

#define	isalpha(c)  ((__ctype_ptr)[(unsigned)((c) + 1)]&(_U|_L))

with __ctype_ptr pointing at element 0 of the array in ctype/ctype_.c,
instead of at element 1.
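
The index mapping used by this second approach can be sketched as below.
The demo_slot() and demo_flags() helpers are hypothetical names for this
example and take the table as a parameter; they are not part of newlib.

```c
/* Table layout assumed here: slot 0 is for EOF (-1), slots 1..256 are for
 * the character values 0..255. */
#define DEMO_TABLE_SIZE 257

/* Map an argument in [-1, 255] to a table slot in [0, 256]. Adding 1
 * before the cast means the unsigned value is never negative-derived, so
 * the width of the pointer no longer matters. */
static unsigned demo_slot(int c)
{
    return (unsigned)(c + 1);
}

/* Look up the flags through a pointer to element 0 of the table. */
static unsigned char demo_flags(const unsigned char *table, int c)
{
    return table[demo_slot(c)];
}
```

With this layout every valid argument maps to a non-negative index, so the
unsigned cast is harmless.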

Martijn van Buul - pino@dohd.org 
