This is the mail archive of the
mailing list for the glibc project.
Re: Inline function definitions for isdigit and isxdigit?
On 09/15/2016 08:02 PM, Joseph Myers wrote:
On Thu, 15 Sep 2016, Florian Weimer wrote:
Can we provide inline definitions for isdigit and isxdigit?
POSIX says (for the digit character class):
In a locale definition file, only the digits <zero>, <one>, <two>, <three>,
<four>, <five>, <six>, <seven>, <eight>, and <nine> shall be specified, and in
contiguous ascending sequence by numerical value. The digits <zero> to <nine>
of the portable character set are automatically included in this class.
This means it's fixed to '0' .. '9' for our purposes (our locales must be
ASCII-transparent at least as far as the digits are concerned).
localedef should then disallow charmap files that aren't ASCII-transparent
for digits, if it doesn't already.
Right. There is a warning already (“character map `%s' is not ASCII
compatible, locale not ISO C compliant”), but the triggering condition
is not quite clear to me.
I'm basing my assertion above on the fact that we want isdigit ('0'),
..., isdigit ('9') to all be true, based on the specification in C99
and C11. There are only ten decimal digits according to POSIX, and it
does not appear to be possible to have a symbolic name such as <zero> to
stand for more than one encoded character sequence, so the set of
decimal digits is clearly fixed.
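If the digit set really is fixed to '0' .. '9', an inline definition could be as small as the following. This is only a sketch under that assumption; the name sketch_isdigit is hypothetical and is not an existing glibc symbol or the form any actual patch takes.

```c
/* Hypothetical helper, not part of glibc: a locale-independent digit
   test that assumes the digit class is fixed to '0' .. '9'.  */
static inline int
sketch_isdigit (int c)
{
  /* ISO C guarantees that '0' .. '9' are contiguous and ascending, so a
     single unsigned comparison covers the whole range.  EOF and other
     negative values wrap to large unsigned values and are rejected.  */
  return (unsigned int) c - '0' <= 9u;
}
```

Because no locale table is consulted, the result cannot change with setlocale, which is exactly why the argument about the digit class being fixed matters.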
For xdigit, one can have more than two sequences of 'A' .. 'F' letters:
In a locale definition file, only the characters defined for the class digit
shall be specified, in contiguous ascending sequence by numerical value,
followed by one or more sets of six characters representing the hexadecimal
digits 10 to 15 inclusive, with each set in ascending order (for example, <A>,
<B>, <C>, <D>, <E>, <F>, <a>, <b>, <c>, <d>, <e>, <f>).
But I wonder how useful this is in practice. One might be tempted to define
It seems perfectly valid in accordance with POSIX, and we support users
defining locales, so we can't restrict functions to what's valid only for the
locales shipped with glibc (whereas support for alternative charmaps is
implementation-defined, so we can limit what we allow in charmap files).
For isxdigit, C99 and C11 make a final determination that only '0' …
'9', 'a' … 'f' and 'A' … 'F' are hexadecimal digits. But POSIX allows
more symbolic names in the xdigit character class. Much hand-waving is
still required to make this C99/C11 compliant because the standard only
lists 22 hexadecimal digits. One could perhaps argue that the
additional digits introduced by a locale are alternative representations
of the six letters.
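For comparison, an inline isxdigit limited to the 22 characters that C99 and C11 list might look like the sketch below. The name sketch_isxdigit is hypothetical, and the code assumes an ASCII-compatible encoding in which 'a' .. 'f' and 'A' .. 'F' are contiguous, which ISO C itself does not guarantee for letters.

```c
/* Hypothetical helper, not part of glibc: accepts only the 22
   hexadecimal digits named by C99/C11.  Assumes an ASCII-compatible
   encoding, so that 'a' .. 'f' and 'A' .. 'F' are each contiguous.  */
static inline int
sketch_isxdigit (int c)
{
  unsigned int u = (unsigned int) c;

  /* Each range check is a single unsigned comparison; negative inputs
     such as EOF wrap around and fail all three.  */
  return u - '0' <= 9u || u - 'a' <= 5u || u - 'A' <= 5u;
}
```

A definition like this would ignore any additional sets of six characters that a user-defined locale places in the xdigit class, which is the trade-off under discussion.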
We do have users of locales which are not ISO-C-compliant because they
are not completely ASCII-transparent. They are rather iffy from a
security perspective because string escaping is ambiguous if multi-byte
character sequences can contain characters which need escaping. But I'm
not sure if we have to extend this possibility to isdigit and isxdigit.