possible snprintf() regression in 3.3.2

Tue Nov 23 09:48:21 GMT 2021

On Nov 23 17:34, Takashi Yano via Cygwin wrote:
> On Tue, 23 Nov 2021 10:23:02 +1100
> Tony Cook wrote:
> > On Mon, Nov 22, 2021 at 02:04:06PM +0100, Corinna Vinschen via Cygwin wrote:
> > > On Nov 22 11:34, Corinna Vinschen via Cygwin wrote:
> > > > On Nov 21 11:16, Tony Cook wrote:
> > > > > A simple option would be to use a small auto fixed buffer for most
> > > > > conversions, but use malloc() for %f formats for numbers greater in
> > > > > magnitude than some limit, though it would also need to be adjusted
> > > > > for the precision (ndigits here), since they take extra space.
> > > > > 
> > > > > This would avoid using the optional-to-implement VLA feature too.
> > > > 
> > > > Good idea.  I guess I'll create a simple fix doing just that.
> > > 
> > > I created a patch:
> > > https://sourceware.org/git/?p=newlib-cygwin.git;a=commitdiff;h=68faeef4be71
> > I don't think this solves the fundamental problem.
> > 
> > Simply looking at ndigits isn't enough for %f.
> > 
> > For %f with a large number (like 9e99), the buffer size required is
> > ndigits plus (roughly) log10(n), which we can further estimate
> > with log2(n)*146/485 (log2(10) is 3.32 ~== 485/146)
> > 
> > I think something more like:
> > 
> >   size_t outsize;
> >   if (mode == 3) {        /* %f */
> >     int expon = (e[NI-1] & 0x7fff) - (EXONE - 1); /* exponent part of float */
> >     /* log2(10) approximately 485/146 */
> >     outsize = expon * 146 / 485 + ndigits + 10;
> >   }
> >   else { /* %g/%e */
> >     outsize = ndigits + MAX_EXP_DIGITS + 10;
> >   }
> >   if (outsize > NDEC_SML) {
> >     outbuf = (char *)_malloc_r(ptr, outsize);
> >   }
> > 
> > You'll probably need to pass outsize into etoasc() rather than
> > calculating it.
> > 
> > See https://github.com/Perl/perl5/blob/blead/sv.c#L13295 for code in
> > perl that calculates the buffer size needed for %f (precis aka ndigits
> > is added at line 13385).
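
For illustration only, here is a minimal standalone sketch of the size
estimate described above.  It is not newlib code: estimate_f_bufsize is a
hypothetical helper, and it assumes a finite IEEE 754 binary64 double and
a non-negative ndigits rather than ldtoa's internal e[] representation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of newlib: estimate the buffer size
   (including the trailing NUL) needed to print x with "%.<ndigits>f".
   Assumes x is finite, double is IEEE 754 binary64, ndigits >= 0.  */
size_t
estimate_f_bufsize (double x, int ndigits)
{
  uint64_t bits;
  int expon;

  memcpy (&bits, &x, sizeof bits);
  /* Unbiased binary exponent, roughly floor(log2(|x|)).  */
  expon = (int) ((bits >> 52) & 0x7ff) - 1023;
  if (expon < 0)
    expon = 0;		/* |x| < 1 only needs a single leading "0" */
  /* log10(2) is approximately 146/485, so expon * 146 / 485 approximates
     the number of integer digits.  The extra 10 leaves room for sign,
     decimal point, NUL and the rounding error of the estimate.  */
  return (size_t) expon * 146 / 485 + (size_t) ndigits + 10;
}
```

For 9e99 with ndigits 3 this gives 99 + 3 + 10 = 112 bytes, while the
formatted string itself needs 104 characters plus the NUL.
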
> 
> I guess Corinna thinks that 'ndigits' keeps the total number
> of digits to be printed.

No, I don't.  It's the requested decimal precision.
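
To make the difference between the precision and the total length
concrete, here is a small sketch using only standard C.  formatted_length
is a hypothetical helper name; it relies on snprintf with a NULL buffer
and size 0 returning the number of characters that would be written.

```c
#include <stdio.h>

/* Hypothetical helper: number of characters "%.<precision>f" produces
   for x, not counting the trailing NUL.  */
int
formatted_length (double x, int precision)
{
  return snprintf (NULL, 0, "%.*f", precision, x);
}
```

With a conforming C library, formatted_length (1234567890123456.789, 3)
is 20: 16 integer digits, the decimal point, and the 3 requested decimals.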

However, the fun fact is that ldtoa in newlib is more than 20 years old,
with only minor changes in 2003.  My patches don't change the basic
mechanism of ldtoa.  I just don't have enough knowledge of floating
point arithmetic to do that.  My patches only try to raise the number of
*possible* digits by raising the matching macro and raising the size of
the single, local digit buffer accordingly.

If the above crashed, it was probably because the buffer was too small.
That should be fixed now, because the second patch fixes the buffer size
and the computation based on the buffer size.  If that's not the
problem, then, in theory, the same would have occurred with the old code.

If my patches are inadequate, we can revert the patches and then the
precision will be restricted to 42 digits again, as before, see the
thread https://sourceware.org/pipermail/newlib/2021/018626.html

For everything else, we either need somebody who knows how to change the
current ldtoa to "do the right thing", whatever that is, or somebody who
takes a stab at replacing ldtoa with another, better alternative.

> However, in reality, for example in the case:
> snprintf(buf, sizeof(buf), "%.3f", 1234567890123456.789);
> 'ndigits' is only 3 even though total digits will be 20.
> 
> So, Tony thinks the current code is not correct.
> 
> However, I think something is wrong with the interpretation
> of 'ndigits' in ldtoa.c.
> 
> printf("%.3f\n", sqrt(2)*1e70);
> printf("%.50f\n", sqrt(2)*1e70);
> 
> outputs
> 
> 14142135623730951759073108307330633613786387000000000000000000000000000.000
> 14142135623730951759073108307330633613786386978891021459448717416650727.13402790000888758223149296720949629080194006476078
> 
> Is this as intended?

On Linux I see

14142135623730951759073108307330633613786387161811679011529922516615168.000
14142135623730951759073108307330633613786387161811679011529922516615168.00000000000000000000000000000000000000000000000000

The newlib output for .3f probably suffers from the fact that ldtoa
chooses the small buffer, which restricts the number of digits to 43 or
44.  But keep in mind that ldtoa *always* restricted the output to 42,
so you never got a more precise output anyway.  Every digit beyond digit
42 is only printed due to the bigger buffer sizes.
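
As a side note on how many of those digits matter for a plain double,
here is a hedged sketch.  It assumes an IEEE 754 binary64 double and a C
library whose printf and strtod conversions are correctly rounded;
roundtrips_with_17_digits is a hypothetical helper name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format x with 17 significant digits, parse the
   string back, and report whether the result is identical to x.  */
int
roundtrips_with_17_digits (double x)
{
  char buf[64];

  snprintf (buf, sizeof buf, "%.17g", x);
  return strtod (buf, NULL) == x;
}
```

Under those assumptions 17 significant digits identify a double
uniquely, so any further digits in a %f output add no information
about the double that was passed in.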

So, what newlib and, by extension, Cygwin really need at this point are
patches to the existing ldtoa or a change to gdtoa or equivalent.

https://cygwin.com/acronyms/#SHTDI
https://cygwin.com/acronyms/#PTC

Corinna