Handling C2x binary integer I/O

Carlos O'Donell carlos@redhat.com
Fri Dec 4 20:37:11 GMT 2020


On 12/4/20 12:42 PM, Joseph Myers wrote:
> C2x has support for binary integer constants starting 0b (accepted at the 
> October WG14 meeting, not yet in the main branch in the C standard git 
> repository).  By itself that's a language feature not a library one, 
> except that strtol with base 0 accepts all unsuffixed integer constants, 
> so binary constants imply it needs to handle 0b, and so then does scanf 
> %i.  At today's WG14 meeting there was support for further related 
> features (strtol accepting optional 0b prefix in base 2, printf/scanf %b 
> for binary), though a further paper on that will be needed at the March 
> meeting to decide on those features.
> 
> How do we wish to handle these features in glibc?  New printf/scanf 
> formats pose no problems, but changes to strings accepted by strtol are in 
> principle an incompatible change: strings starting 0b are required to be 
> handled differently in standards before C2x than in C2x.  We don't have 
> different symbols in glibc to support pre-C99 and C99 strtod (C99 
> introduced support for hex input to strtod), but do have different symbols 
> for scanf %a (C99 feature, previously used as a GNU extension for memory 
> allocation for a string).
> 
> Keeping full compatibility with pre-C2x code would indicate having 
> separate versions of all affected symbols - presumably including those 
> that are extensions, not just those that are actually in the C2x standard, 
> as it would seem very confusing for e.g. strtol and strtol_l to behave 
> differently in this regard.  That is, there would be __isoc23_* versions 
> (C2x is expected to be published as C23) of the following 32 functions:

Agreed.
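
To make the incompatibility concrete, here is a minimal sketch of the
two behaviours for a base-0 parse of a string starting with 0b. The
helper demo_parse_base0 and its c2x_rules flag are hypothetical names
for illustration only, not a glibc interface, and the sketch models
nothing beyond the 0b question (no sign, whitespace, hex, octal or
overflow handling).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a glibc interface.  It models only how a
   base-0 parse treats a leading "0b"; signs, whitespace, hex, octal
   and overflow checking are deliberately left out.  */
static long
demo_parse_base0 (const char *s, int c2x_rules, const char **end)
{
  assert (s != NULL && end != NULL);

  if (c2x_rules && s[0] == '0' && (s[1] == 'b' || s[1] == 'B')
      && (s[2] == '0' || s[2] == '1'))
    {
      /* C2x: "0b" followed by a binary digit is a prefix.  */
      long value = 0;
      const char *p = s + 2;
      while (*p == '0' || *p == '1')
        value = value * 2 + (*p++ - '0');
      *end = p;
      return value;
    }

  if (s[0] == '0')
    {
      /* Pre-C2x: only the leading "0" is consumed; "b..." is left over.  */
      *end = s + 1;
      return 0;
    }

  /* Anything else is outside what this sketch models.  */
  *end = s;
  return 0;
}
```

With c2x_rules set to 0, "0b101" yields 0 and *end points at "b101";
with c2x_rules nonzero it yields 5 and *end points at the terminating
null. A program that relies on the first result would change behaviour
if the existing symbol silently adopted the second, which is the case
for separate symbols.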

>   strtol strtoll strtoul strtoull strtol_l strtoll_l strtoul_l strtoull_l
>    strtoimax strtoumax fscanf scanf sscanf vscanf vsscanf vfscanf
>   wcstol wcstoll wcstoul wcstoull wcstol_l wcstoll_l wcstoul_l wcstoull_l
>    wcstoimax wcstoumax fwscanf wscanf swscanf vfwscanf vwscanf vswscanf
> 
> (Platforms with two long double variants would have 44 new functions, and 
> powerpc64le would have 56 new functions, because the scanf functions also 
> need replicating for each long double variant.  The number of function 
> names could be reduced by 4, at the cost of more header complexity if e.g. 

I would not do this reduction.

I don't think the header complexity is worth the reduction in the number
of functions.

It is easier for developers to know that new functions exist and that we
model them in a logical, straightforward way for interposition.

The names will need interposition by the sanitizers and it is easier if
we expose logical symbol names in that case IMO.
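
As a rough sketch of what one-symbol-per-function looks like from the
header side, assuming GCC or Clang asm labels: every name below
(DEMO_REDIRECT, demo_parse, demo_isoc23_parse) is made up for
illustration, and the macro only imitates the general shape of a
__REDIRECT-style declaration rather than reproducing the glibc headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macros imitating the shape of a redirect declaration.
   __USER_LABEL_PREFIX__ is predefined by GCC and Clang.  */
#define DEMO_STR2(x) #x
#define DEMO_STR(x) DEMO_STR2 (x)
#define DEMO_REDIRECT(name, proto, alias) \
  name proto __asm__ (DEMO_STR (__USER_LABEL_PREFIX__) #alias)

/* "Header" side: callers write demo_parse, but the object file
   references the symbol demo_isoc23_parse.  */
extern long DEMO_REDIRECT (demo_parse, (const char *s), demo_isoc23_parse);

/* "Library" side: the new behaviour lives under its own symbol name,
   which is what an interposer would have to wrap.  This toy version
   accepts an optional "0b" prefix and then binary digits only.  */
long
demo_isoc23_parse (const char *s)
{
  assert (s != NULL);
  if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B'))
    s += 2;
  long value = 0;
  while (*s == '0' || *s == '1')
    value = value * 2 + (*s++ - '0');
  return value;
}
```

A call to demo_parse ("0b101") here resolves to demo_isoc23_parse. With
a one-to-one mapping like this, a tool that interposes on the function
can derive the symbol to wrap directly from the public name, without
special cases for names that were folded together.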

> strtoimax gets mapped to __isoc23_strtoll rather than needing 
> __isoc23_strtoimax; likewise, 8 more variants could be avoided on systems 
> where long and long long are both 64-bit, by using the same __isoc23 names 
> there.  But the long / long long case would only work with the correct 
> types given a real __REDIRECT implementation; the fallback #define in the 
> absence of __REDIRECT would give one function the wrong type.  Given that 
> the header support for missing __REDIRECT support is probably broken 
> anyway, that may not matter.)
> 
> There are also the following functions:
> 
>   __strtol_internal __strtoul_internal __strtoll_internal __strtoull_internal
>   __wcstol_internal __wcstoul_internal __wcstoll_internal __wcstoull_internal
> 
> The only public use of these (i.e. in installed headers) is for inline 
> versions of functions such as strtoimax in inttypes.h.  Those inlines were 
> left behind when such inlines for other strto* etc. functions were removed 
> in glibc 2.7.  Although they were apparently left behind deliberately, I 
> don't think it really makes sense to have inline versions of those few 
> inttypes.h functions (much more rarely used than the functions that are 
> not inlined); I think that rather than adding __isoc23_* versions of these 
> *_internal functions, the inlines should be removed.  (And we could 
> consider independently (a) whether those *_internal functions should 
> become compat symbols and (b) whether to make the *max functions into 
> proper aliases of the strtol/strtoll etc. functions rather than thin 
> wrappers round *_internal functions.)

Agreed. I would strongly consider them for compat symbols.

I don't see asan interceptors for these functions, only their out-of-line
variants and only for strtol/strtoll.

-- 
Cheers,
Carlos.


