ctype macros broken on 64-bit builds?
Martijn van Buul
pino@dohd.org
Tue Jul 22 14:13:00 GMT 2008
Hello,
I'm using newlib on x86_64-elf, and I've run into problems with the various
is...() macros in ctype.h. According to C90 (and C99, and possibly earlier
standards before that) these macros/functions are required to accept any int
value that is representable as an unsigned char or equal to EOF. With EOF
defined as -1, that is the range [-1 .. 255]. They appear to be broken for
64-bit targets. As an example I use isalpha(), but the others have exactly
the same problem:
isalpha() is defined in ctype.h as:
#define isalpha(c) ((__ctype_ptr)[(unsigned)(c)]&(_U|_L))
where __ctype_ptr points to element 1 of a 257-entry array, so
__ctype_ptr[-1] is actually valid.
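To make the layout concrete, here is a minimal sketch. The names
sketch_table, sketch_ptr and sketch_slot are made up for illustration and
are not newlib's; only the 257-entry size and the "pointer to element 1"
arrangement come from the description above.

```c
#include <stddef.h>

/* 257 slots: slot 0 serves EOF (-1), slots 1..256 serve values 0..255. */
enum { SKETCH_TABLE_SIZE = 257 };
static unsigned char sketch_table[SKETCH_TABLE_SIZE];

/* Points at element 1, so sketch_ptr[-1] is sketch_table[0]. */
static unsigned char *const sketch_ptr = sketch_table + 1;

/* Map a ctype argument in [-1 .. 255] to its slot in the underlying table.
 * The index is a plain int, so -1 really means "one element back". */
static ptrdiff_t sketch_slot(int c)
{
    return &sketch_ptr[c] - sketch_table;
}
```

With this layout, sketch_slot(-1) is 0 and sketch_slot(255) is 256, so every
legal argument lands inside the array.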
This works for targets with 32-bit pointers and 32-bit integers,
as accessing element [-1] of an array will access exactly the same memory
as accessing element [(unsigned)(-1)], because the address computation
wraps around modulo 2^32 (an implicit overflow):
Assuming a char foo[10], I'd get:
&foo[0]: 0xbd4de86
&foo[-1]: 0xbd4de85
&foo[(unsigned)(-1)]: 0xbd4de85
However, this no longer works on a platform with 32-bit integers and 64-bit
pointers (like x86_64), since the implicit overflow will not occur:
&foo[0]: 0x28000109c30
&foo[-1]: 0x28000109c2f
&foo[(unsigned)(-1)]: 0x28100109c2f
Note how the [(unsigned)(-1)] address ended up 4 GB - 1 bytes beyond the
first element, instead of just before it.
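The arithmetic behind those addresses can be reproduced without touching any
out-of-bounds memory. The following is only a model, assuming a 32-bit
unsigned int and byte-sized array elements; the helper names are
hypothetical, and the addresses are passed as plain integers rather than
real pointers.

```c
#include <stdint.h>

/* 32-bit pointers: base + (unsigned)c wraps around modulo 2^32. */
static uint32_t addr_ptr32(uint32_t base, int c)
{
    return base + (uint32_t)c;
}

/* 64-bit pointers: the 32-bit unsigned index is zero-extended first,
 * so (unsigned)(-1) becomes +0xFFFFFFFF and nothing wraps. */
static uint64_t addr_ptr64(uint64_t base, int c)
{
    return base + (uint64_t)(uint32_t)c;
}

/* 64-bit pointers with a signed index: -1 is sign-extended, so the
 * result is one byte before the base. */
static uint64_t addr_ptr64_signed(uint64_t base, int c)
{
    return base + (uint64_t)(int64_t)c;
}
```

Feeding in the base addresses from the listings above with c = -1 gives
0xbd4de85 for the 32-bit case, 0x28100109c2f for the zero-extended 64-bit
case and 0x28000109c2f for the sign-extended one.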
All in all, this means that using any of the ctype(3) macros with -1
as an argument reads memory almost 4 GB past the table, which typically
causes a segmentation fault, where it should have been defined behaviour.
It is the explicit cast to unsigned that's causing the problem here, as
using (signed) would've yielded the expected result:
&foo[(signed)(-1)]: 0x28000109c2f
I rewrote all appropriate macros in ctype.h to cast to (signed) instead of
(unsigned), with no adverse effects. My code no longer crashes, but my
testbed is limited so I don't know if this might break other targets.
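For reference, a self-contained sketch of what the signed-cast variant looks
like. This is not the actual patch: the sk_ names, the flag values and the
tiny initializer are assumptions made so the example compiles on its own,
and the letter loops assume an ASCII-like character set.

```c
#include <stdio.h>   /* EOF */

/* Illustrative flag bits standing in for _U and _L (values assumed). */
#define SK_U 01
#define SK_L 02

static unsigned char sk_table[257];
static const unsigned char *sk_ptr = sk_table + 1;  /* element 1 */

/* Signed index: -1 is sign-extended and addresses sk_table[0]. */
#define sk_isalpha(c) ((sk_ptr)[(int)(c)] & (SK_U | SK_L))

/* Fill in just the letters; slot 0 (EOF) stays 0. */
static void sk_init(void)
{
    for (int ch = 'A'; ch <= 'Z'; ch++) sk_table[ch + 1] = SK_U;
    for (int ch = 'a'; ch <= 'z'; ch++) sk_table[ch + 1] = SK_L;
}
```

After sk_init(), sk_isalpha('q') is non-zero while sk_isalpha('7') and
sk_isalpha(EOF) are both 0, regardless of pointer width.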
The alternative option would be to do what the rest of the world has been
doing for a while (including the BSDs, from which this ctype.* seems to
have borrowed quite a bit), and rewrite isalpha and friends as
#define isalpha(c) ((__ctype_ptr)[(unsigned)((c) + 1)]&(_U|_L))
with __ctype_ptr pointing at element 0 of the array in ctype/ctype_.c,
instead of at element 1.
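A sketch of this second variant, under the same caveats: the off_ names, the
flag values and the initializer are hypothetical and only there to make the
example stand on its own, and the letter loops assume an ASCII-like
character set.

```c
#include <stdio.h>   /* EOF */

/* Illustrative flag bits standing in for _U and _L (values assumed). */
#define OFF_U 01
#define OFF_L 02

static unsigned char off_table[257];
static const unsigned char *off_ptr = off_table;    /* element 0 */

/* The +1 happens before the cast, so the index is never negative:
 * EOF (-1) maps to 0 and 255 maps to 256. */
#define off_isalpha(c) ((off_ptr)[(unsigned)((c) + 1)] & (OFF_U | OFF_L))

/* Character ch lives in slot ch + 1; slot 0 (EOF) stays 0. */
static void off_init(void)
{
    for (int ch = 'A'; ch <= 'Z'; ch++) off_table[ch + 1] = OFF_U;
    for (int ch = 'a'; ch <= 'z'; ch++) off_table[ch + 1] = OFF_L;
}
```

Because the index always falls in [0 .. 256] for legal arguments, the
unsigned cast is harmless here on both 32-bit and 64-bit targets.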
--
Martijn van Buul - pino@dohd.org