Handling C2x binary integer I/O
Carlos O'Donell
carlos@redhat.com
Fri Dec 4 20:37:11 GMT 2020
On 12/4/20 12:42 PM, Joseph Myers wrote:
> C2x has support for binary integer constants starting 0b (accepted at the
> October WG14 meeting, not yet in the main branch in the C standard git
> repository). By itself that's a language feature not a library one,
> except that strtol with base 0 accepts all unsuffixed integer constants,
> so binary constants imply it needs to handle 0b, and so then does scanf
> %i. At today's WG14 meeting there was support for further related
> features (strtol accepting optional 0b prefix in base 2, printf/scanf %b
> for binary), though a further paper on that will be needed at the March
> meeting to decide on those features.
>
> How do we wish to handle these features in glibc? New printf/scanf
> formats pose no problems, but changes to strings accepted by strtol are in
> principle an incompatible change: strings starting 0b are required to be
> handled differently in standards before C2x than in C2x. We don't have
> different symbols in glibc to support pre-C99 and C99 strtod (C99
> introduced support for hex input to strtod), but do have different symbols
> for scanf %a (C99 feature, previously used as a GNU extension for memory
> allocation for a string).
>
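A minimal sketch of the base-0 incompatibility described above. This is not glibc code: parse_base0 and its accept_binary flag are hypothetical names made up for illustration, and the helper ignores leading whitespace, signs, and overflow, all of which the real strtol handles.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not glibc code: parse an unsigned integer the way
   a base-0 strtol picks its base, optionally accepting the C2x "0b"/"0B"
   prefix.  No whitespace, sign, or overflow handling.  */
static unsigned long
parse_base0 (const char *s, int accept_binary, const char **end)
{
  int base = 10;
  const char *p = s;

  if (p[0] == '0')
    {
      if ((p[1] == 'x' || p[1] == 'X') && isxdigit ((unsigned char) p[2]))
        {
          base = 16;
          p += 2;
        }
      else if (accept_binary && (p[1] == 'b' || p[1] == 'B')
               && (p[2] == '0' || p[2] == '1'))
        {
          base = 2;
          p += 2;
        }
      else
        base = 8;
    }

  unsigned long value = 0;
  for (;; p++)
    {
      int digit;
      if (isdigit ((unsigned char) *p))
        digit = *p - '0';
      else if (isxdigit ((unsigned char) *p))
        digit = tolower ((unsigned char) *p) - 'a' + 10;
      else
        break;
      if (digit >= base)
        break;
      value = value * base + digit;
    }

  if (end != NULL)
    *end = p;
  return value;
}

/* The same input gives different results under the two rule sets.  */
void
parse_base0_examples (void)
{
  const char *end;

  /* Pre-C2x rules: only the leading "0" is consumed.  */
  assert (parse_base0 ("0b101", 0, &end) == 0 && *end == 'b');

  /* C2x rules: the whole string is a binary constant.  */
  assert (parse_base0 ("0b101", 1, &end) == 5 && *end == '\0');
}
```

Existing code that relies on parsing stopping at the 'b' is what would change behavior, which is the argument for separate symbol versions.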
> Keeping full compatibility with pre-C2x code would indicate having
> separate versions of all affected symbols - presumably including those
> that are extensions, not just those that are actually in the C2x standard,
> as it would seem very confusing for e.g. strtol and strtol_l to behave
> differently in this regard. That is, there would be __isoc23_* versions
> (C2x is expected to be published as C23) of the following 32 functions:
Agreed.
> strtol strtoll strtoul strtoull strtol_l strtoll_l strtoul_l strtoull_l
> strtoimax strtoumax fscanf scanf sscanf vscanf vsscanf vfscanf
> wcstol wcstoll wcstoul wcstoull wcstol_l wcstoll_l wcstoul_l wcstoull_l
> wcstoimax wcstoumax fwscanf wscanf swscanf vfwscanf vwscanf vswscanf
>
> (Platforms with two long double variants would have 44 new functions, and
> powerpc64le would have 56 new functions, because the scanf functions also
> need replicating for each long double variant. The number of function
> names could be reduced by 4, at the cost of more header complexity if e.g.
I would not do this reduction.
I don't think the header complexity is worth the reduction in the number
of functions.
It is easier for developers to know that new functions exist and that we
model them in a logical, straightforward way for interposition.
The names will need interposition by the sanitizers, and it is easier if
we expose logical symbol names in that case IMO.
> strtoimax gets mapped to __isoc23_strtoll rather than needing
> __isoc23_strtoimax; likewise, 8 more variants could be avoided on systems
> where long and long long are both 64-bit, by using the same __isoc23 names
> there. But the long / long long case would only work with the correct
> types given a real __REDIRECT implementation; the fallback #define in the
> absence of __REDIRECT would give one function the wrong type. Given that
> the header support for missing __REDIRECT support is probably broken
> anyway, that may not matter.)
>
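For readers unfamiliar with the redirect mechanism mentioned above, here is a rough sketch of the idea, assuming GCC or Clang on an ELF target such as Linux, where asm labels name the symbol directly. The demo_strtol and demo_isoc23_strtol names are hypothetical stand-ins. The real headers would go through the __REDIRECT macro rather than a bare asm label, and the stand-in implementation below skips the whitespace and sign handling a real strtol needs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a new C2x-aware implementation symbol.
   It accepts an optional 0b/0B prefix for base 0 or base 2 and defers
   everything else to the C library's strtol.  */
long
demo_isoc23_strtol (const char *s, char **end, int base)
{
  if ((base == 0 || base == 2)
      && s[0] == '0' && (s[1] == 'b' || s[1] == 'B')
      && (s[2] == '0' || s[2] == '1'))
    return strtol (s + 2, end, 2);
  return strtol (s, end, base);
}

/* Header-style declaration: callers write demo_strtol, but the asm label
   makes the compiler emit references to demo_isoc23_strtol instead.  */
extern long demo_strtol (const char *, char **, int)
  __asm__ ("demo_isoc23_strtol");

/* Calls through the public name reach the C2x-aware implementation.  */
void
demo_redirect_examples (void)
{
  assert (demo_strtol ("0b101", NULL, 0) == 5);
  assert (demo_strtol ("42", NULL, 10) == 42);
}
```

A fallback of the form "#define demo_strtol demo_isoc23_strtol" would reach the same symbol but substitutes the name textually, so sharing one target between the long and long long variants would leave one of them declared with the wrong return type, as noted in the quoted text.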
> There are also the following functions:
>
> __strtol_internal __strtoul_internal __strtoll_internal __strtoull_internal
> __wcstol_internal __wcstoul_internal __wcstoll_internal __wcstoull_internal
>
> The only public use of these (i.e. in installed headers) is for inline
> versions of functions such as strtoimax in inttypes.h. Those inlines were
> left behind when such inlines for other strto* etc. functions were removed
> in glibc 2.7. Although they were apparently left behind deliberately, I
> don't think it really makes sense to have inline versions of those few
> inttypes.h functions (much more rarely used than the functions that are
> not inlined); I think that rather than adding __isoc23_* versions of these
> *_internal functions, the inlines should be removed. (And we could
> consider independently (a) whether those *_internal functions should
> become compat symbols and (b) whether to make the *max functions into
> proper aliases of the strtol/strtoll etc. functions rather than thin
> wrappers round *_internal functions.)
Agreed. I would strongly consider them for compat symbols.
I don't see asan interceptors for these functions, only their out-of-line
variants and only for strtol/strtoll.
--
Cheers,
Carlos.