Inline function definitions for isdigit and isxdigit?

Florian Weimer fweimer@redhat.com
Fri Sep 16 12:06:00 GMT 2016


On 09/15/2016 08:02 PM, Joseph Myers wrote:
> On Thu, 15 Sep 2016, Florian Weimer wrote:
>
>> Can we provide inline definitions for isdigit and isxdigit?
>>
>> POSIX says (for the digit character class):
>>
>> “
>> In a locale definition file, only the digits <zero>, <one>, <two>, <three>,
>> <four>, <five>, <six>, <seven>, <eight>, and <nine> shall be specified, and in
>> contiguous ascending sequence by numerical value. The digits <zero> to <nine>
>> of the portable character set are automatically included in this class.
>> ”
>>
>> This means it's fixed to '0' .. '9' for our purposes (our locales must be
>> ASCII-transparent at least as far as the digits are concerned).
>
> localedef should then disallow charmap files that aren't ASCII-transparent
> for digits, if it doesn't already.

Right.  There is a warning already (“character map `%s' is not ASCII 
compatible, locale not ISO C compliant”), but the triggering condition 
is not quite clear to me.

I'm basing my assertion above on the fact that C99 and C11 require
isdigit ('0'), ..., isdigit ('9') to all be true.  There are only ten
decimal digits according to POSIX, and it does not appear to be
possible for a symbolic name such as <zero> to stand for more than one
encoded character sequence, so the set of decimal digits is fixed.
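
If the set of decimal digits really is fixed, an inline definition
could be as small as the sketch below.  The name inline_isdigit is
hypothetical and not an existing glibc interface; the code assumes only
that '0' .. '9' are encoded contiguously, which C99 and C11 guarantee.

```c
/* Hypothetical helper, not part of glibc.  Returns nonzero if C is one
   of the ten decimal digits '0' .. '9', independent of the locale.  */
static inline int
inline_isdigit (int c)
{
  /* The unsigned subtraction folds both range checks into a single
     comparison.  EOF and other negative values wrap around to large
     unsigned numbers and are rejected.  */
  return (unsigned int) c - '0' <= 9;
}
```

Unlike a table lookup, this does not consult the current locale at all,
which is only correct if locales cannot change the digit class.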

>> For xdigit, one can have more than two sequences of 'A' .. 'F' letters:
>>
>> “
>> In a locale definition file, only the characters defined for the class digit
>> shall be specified, in contiguous ascending sequence by numerical value,
>> followed by one or more sets of six characters representing the hexadecimal
>> digits 10 to 15 inclusive, with each set in ascending order (for example, <A>,
>> <B>, <C>, <D>, <E>, <F>, <a>, <b>, <c>, <d>, <e>, <f>).
>> ”
>>
>> But I wonder how useful this is in practice.  One might be tempted to define
>
> It seems perfectly valid in accordance with POSIX, and we support users
> defining locales, so can't restrict functions to what's valid only for the
> locales shipped with glibc (whereas support for alternative charmaps is
> implementation-defined, so we can limit what we allow in charmap files).

For isxdigit, C99 and C11 state that only '0' … '9', 'a' … 'f' and
'A' … 'F' are hexadecimal digits, but POSIX allows more symbolic names
in the xdigit character class.  Much hand-waving is required to
reconcile the two, because the C standard lists only 22 hexadecimal
digits.  One could perhaps argue that the additional digits introduced
by a locale are alternative representations of the six letters.
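
For comparison, a sketch of an inline test that accepts exactly the 22
hexadecimal digits listed by C99 and C11.  The name inline_isxdigit is
hypothetical, and the code assumes that 'a' .. 'f' and 'A' .. 'F' are
each encoded contiguously (true for ASCII-compatible charmaps).  It
would not recognize any extra sets of six characters that a
user-defined locale adds to the xdigit class, which is exactly the
open question here.

```c
/* Hypothetical helper, not part of glibc.  Returns nonzero if C is one
   of '0' .. '9', 'a' .. 'f' or 'A' .. 'F'.  Additional xdigit sets
   defined by a locale are deliberately ignored.  */
static inline int
inline_isxdigit (int c)
{
  unsigned int u = (unsigned int) c;

  /* Each unsigned subtraction is a single-comparison range check;
     negative inputs such as EOF wrap around and fail all three.  */
  return u - '0' <= 9 || u - 'a' <= 5 || u - 'A' <= 5;
}
```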

We do have users of locales which are not ISO-C-compliant because they
are not completely ASCII-transparent.  Such locales are rather iffy
from a security perspective because string escaping is ambiguous if
multi-byte character sequences can contain bytes which look like
characters that need escaping.  But I'm not sure if we have to extend
this possibility to isdigit and isxdigit.
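
To illustrate the escaping ambiguity, here is a sketch of what a
byte-wise escaper sees.  The function name is hypothetical, and the
example input assumes a Shift_JIS-style encoding in which the second
byte of a two-byte sequence can be 0x5c, the ASCII code of the
backslash.

```c
#include <stddef.h>

/* Hypothetical helper.  Counts the bytes that a naive, byte-wise
   escaper would treat as a backslash.  It has no notion of multi-byte
   sequences, so trail bytes with the value 0x5c are counted too.  */
static size_t
count_backslash_bytes (const unsigned char *s, size_t len)
{
  size_t count = 0;
  for (size_t i = 0; i < len; i++)
    if (s[i] == 0x5c)
      count++;
  return count;
}
```

For the two-byte input 0x95 0x5c this reports one backslash, although
under the assumed encoding the bytes form a single character and the
string contains no backslash at all.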

Florian


