Handling C2x binary integer I/O

Joseph Myers joseph@codesourcery.com
Fri Dec 4 17:42:25 GMT 2020


C2x has support for binary integer constants starting with 0b (accepted at 
the October WG14 meeting, not yet in the main branch in the C standard git 
repository).  By itself that's a language feature, not a library one, 
except that strtol with base 0 accepts all unsuffixed integer constants, 
so binary constants imply that strtol needs to handle 0b, and so then does 
scanf %i.  At today's WG14 meeting there was support for further related 
features (strtol accepting an optional 0b prefix in base 2, printf/scanf 
%b for binary), though a further paper will be needed at the March meeting 
to decide on those features.
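
As a rough illustration of the base 0 case, the sketch below probes how 
the strtol a program is linked against treats the string "0b101".  The 
helper name strtol_accepts_0b is hypothetical and not a glibc interface; 
which branch is taken depends on the C library and the feature test 
macros in effect.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not a glibc interface): report how the strtol
   this program is linked against treats a "0b" prefix in base 0.
   Returns 1 if "0b101" was consumed entirely as binary 5 (C2x rule),
   0 if parsing stopped after the leading "0" (pre-C2x rule).  */
static int strtol_accepts_0b(void)
{
    const char *input = "0b101";
    char *end;
    long value = strtol(input, &end, 0);

    if (value == 5 && *end == '\0')
        return 1;   /* C2x: the whole string is a binary constant.  */

    /* Pre-C2x: only "0" is a valid subject sequence, so the result
       is 0 and end points at the 'b'.  */
    assert(value == 0 && end == input + 1);
    return 0;
}
```

Either result is conforming for some standard version, which is why the 
same input string cannot be handled identically before and after C2x.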

How do we wish to handle these features in glibc?  New printf/scanf 
formats pose no problems, but changes to strings accepted by strtol are in 
principle an incompatible change: strings starting 0b are required to be 
handled differently in standards before C2x than in C2x.  We don't have 
different symbols in glibc to support pre-C99 and C99 strtod (C99 
introduced support for hex input to strtod), but do have different symbols 
for scanf %a (C99 feature, previously used as a GNU extension for memory 
allocation for a string).

Keeping full compatibility with pre-C2x code would indicate having 
separate versions of all affected symbols - presumably including those 
that are extensions, not just those that are actually in the C2x standard, 
as it would seem very confusing for e.g. strtol and strtol_l to behave 
differently in this regard.  That is, there would be __isoc23_* versions 
(C2x is expected to be published as C23) of the following 32 functions:

  strtol strtoll strtoul strtoull strtol_l strtoll_l strtoul_l strtoull_l
   strtoimax strtoumax fscanf scanf sscanf vscanf vsscanf vfscanf
  wcstol wcstoll wcstoul wcstoull wcstol_l wcstoll_l wcstoul_l wcstoull_l
   wcstoimax wcstoumax fwscanf wscanf swscanf vfwscanf vwscanf vswscanf

(Platforms with two long double variants would have 44 new functions, and 
powerpc64le would have 56 new functions, because the 12 scanf functions 
also need replicating for each additional long double variant: 32 + 12 = 
44 and 32 + 2 * 12 = 56.  The number of function names could be reduced by 
4, at the cost of more header complexity, if e.g. strtoimax gets mapped to 
__isoc23_strtoll rather than needing __isoc23_strtoimax; likewise, 8 more 
variants could be avoided on systems where long and long long are both 
64-bit, by using the same __isoc23 names there.  But the long / long long 
case would only work with the correct types given a real __REDIRECT 
implementation; the fallback #define in the absence of __REDIRECT would 
give one function the wrong type.  Given that the header support for a 
missing __REDIRECT is probably broken anyway, that may not matter.)
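
The difference between a real redirect and the #define fallback can be 
sketched as follows.  This is only an illustration under assumptions: the 
demo_* names are hypothetical, the redirect uses the GCC/Clang asm label 
extension on an ELF target rather than glibc's actual __REDIRECT macro, 
and demo_isoc23_strtol merely forwards to the ordinary strtol.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a new versioned symbol; the name is hypothetical and
   this sketch only forwards to the ordinary strtol.  */
long demo_isoc23_strtol(const char *s, char **end, int base)
{
    return strtol(s, end, base);
}

/* Redirect in the style of __REDIRECT: demo_strtol keeps its own
   declaration and type, but references to it are emitted as
   references to the symbol demo_isoc23_strtol (asm label extension;
   assumes an ELF target with no symbol prefix).  */
extern long demo_strtol(const char *s, char **end, int base)
    __asm__("demo_isoc23_strtol");

/* Fallback in the style of a plain #define: demo_strtoll is just
   another spelling of demo_isoc23_strtol, so calls have type long
   rather than long long, even where both types are 64-bit.  */
#define demo_strtoll demo_isoc23_strtol

/* Small self-check of both paths.  */
static void demo_check(void)
{
    assert(demo_strtol("42", NULL, 10) == 42);
    /* _Generic does not evaluate its operand; it only inspects the
       type of the call expression.  */
    assert(_Generic(demo_strtoll("7", NULL, 10),
                    long: 1, long long: 0) == 1);
}
```

With the asm label, sharing one symbol between the long and long long 
functions leaves each declaration with its own type; with the #define, 
the shared name brings its type along, which is the mismatch described 
above.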

There are also the following functions:

  __strtol_internal __strtoul_internal __strtoll_internal __strtoull_internal
  __wcstol_internal __wcstoul_internal __wcstoll_internal __wcstoull_internal

The only public use of these (i.e. in installed headers) is for inline 
versions of functions such as strtoimax in inttypes.h.  Those inlines were 
left behind when such inlines for other strto* etc. functions were removed 
in glibc 2.7.  Although they were apparently left behind deliberately, I 
don't think it really makes sense to have inline versions of those few 
inttypes.h functions (much more rarely used than the functions that are 
not inlined); I think that rather than adding __isoc23_* versions of these 
*_internal functions, the inlines should be removed.  (And we could 
consider independently (a) whether those *_internal functions should 
become compat symbols and (b) whether to make the *max functions into 
proper aliases of the strtol/strtoll etc. functions rather than thin 
wrappers round *_internal functions.)

-- 
Joseph S. Myers
joseph@codesourcery.com

