This is the mail archive of the
newlib@sourceware.org
mailing list for the newlib project.
newlib's version of strtod, strtof, strtold, and atof
- From: Gregory Pietsch <gpietsch at comcast dot net>
- To: newlib at sources dot redhat dot com
- Date: Sat, 07 Jan 2006 18:43:27 -0500
- Subject: newlib's version of strtod, strtof, strtold, and atof
Dear Newlib mavens,
I am donating to Newlib a replacement for the atof, strtod, strtof, and
strtold functions. Please take a look at it and hack it.
The version of strtod in the Newlib stdlib section was hacked to
generate NaNs, but it doesn't do infinites or hexadecimal floating point
numbers. The one I wrote is a lot closer to the C99 spec. The only thing
I didn't try to be meticulous about was the n-char-sequence in NaNs, but
on the whole I thought it was better than the one that was there. (It
also introduces strtold!)
Please take a look at it and tell me what you think.
Gregory Pietsch
/* strtold.c - code for the atof, strtod, strtof, and strtold functions
AUTHOR: Gregory Pietsch
DESCRIPTION:
The strtod, strtof, and strtold functions convert the initial portion of the
string pointed to by nptr to double, float, and long double representations,
respectively. First, they decompose the input string into three parts: an
initial, possibly empty, sequence of white-space characters (as specified by
the isspace function), a subject sequence resembling a floating-point
constant or representing infinity or NaN, and a final sequence of one or
more unrecognized characters, including the terminating null character of
the input string. Then, they attempt to convert the subject string to a
floating-point number, and return the result.
The expected form of the subject sequence is an optional plus or minus sign,
then one of the following:
- a nonempty sequence of decimal digits optionally containing a decimal-
point character, then an optional exponent part as defined in 6.4.4.2;
- a 0x or 0X, then a nonempty sequence of hexadecimal digits optionally
containing a decimal-point character, then an optional binary exponent
part as defined in 6.4.4.2;
- one of INF or INFINITY, ignoring case
- one of NAN or NAN(n-char-sequence_opt), ignoring case in the NAN part,
where:
n-char-sequence:
digit
nondigit
n-char-sequence digit
n-char-sequence non-digit
The subject sequence is defined as the longest initial subsequence of the
input string, starting with the first non-white-space character, that is of
the expected form. The subject sequence contains no characters if the input
string is not of the expected form.
If the subject sequence has the expected form for a floating-point number,
the sequence of characters starting with the first digit or the decimal-
point character (whichever comes first) is interpreted as a floating
constant according to the rules of 6.4.4.2, except that the decimal-point
character is used in place of a period, and that if neither an exponent part
nor a decimal-point character appears in a decimal floating-point number,
or if a binary exponent part does not appear in a hexadecimal floating-point
number, an exponent part of the appropriate type with value zero is assumed
to follow the last digit in the string. If the subject sequence begins with
a minus sign, the sequence is interpreted as negated. (It is unspecified
whether a minus-signed sequence is converted to a negative number directly
or by negating the value resulting from converting the corresponding
unsigned sequence [see F.5]; the two methods may yield different results if
rounding is toward positive or negative infinity. In either case, the
functions honor the sign of zero if floating-point arithmetic supports
signed zeros.) A character sequence INF or INFINITY is interpreted as an
infinity, if representable in the return type, else like a floating constant
that is too large for the range of the return type. A character sequence
NAN or NAN(n-char-sequence_opt), is interpreted as a quiet NaN, if supported
in the return type, else like a subject sequence part that does not have the
expected form; the meaning of the n-char-sequences is implementation-defined
(an implementation may use the n-char-sequence to determine extra
information to be represented in the NaN's significand). A pointer to the
final string is stored in the object pointed to by endptr, provided that
endptr is not a null pointer.
If the subject string has the hexadecimal form and FLT_RADIX is a power of
2, the value resulting from the conversion is correctly rounded.
In other than the "C" locale, additional locale-specific subject sequences
may be accepted.
If the subject sequence is empty or does not have the expected form, no
conversion is performed; the value of nptr is stored in the object pointed
to by endptr, provided that endptr is not a null pointer.
RECOMMENDED PRACTICE:
If the subject sequence has the hexadecimal form, FLT_RADIX is not a power
of 2, and the result is not exactly representable, the result should be one
of the two numbers in the appropriate internal format that are adjacent to
the hexadecimal floating source value, with the stipulation that the error
should have the correct sign for the current rounding direction.
If the subject sequence has the decimal form and at most DECIMAL_DIG
(defined in <float.h>) significant digits, the result should be correctly
rounded. If the subject sequence D has the decimal form and more than
DECIMAL_DIG significant digits, consider the two bounding, adjacent
decimal strings L and U, both having DECIMAL_DIG significant digits, such
that the values of L, D, and U satisfy L <= D <= U. The result should be
one of the (equal or adjacent) values that would be obtained by correctly
rounding L and U according to the current rounding direction, with the extra
stipulation that the error with respect to D should have a correct sign for
the current rounding direction. (DECIMAL_DIG, defined in <float.h>, should
be significantly large that L and U will usually round to the same internal
floating value, but if not will round to adjacent values.)
RETURNS:
The functions return the converted value, if any. If no conversion could be
performed, zero is returned. If the correct value is outside the range of
representable values, plus or minus HUGE_VAL, HUGE_VALF, or HUGE_VALL is
returned (according to the return type and sign of the value), and the value
of the macro ERANGE is stored in errno. If the result underflows (7.12.1),
the functions return a value whose magnitude is no greater than the smallest
normalized positive number in the return type; whether errno acquires the
value ERANGE is implementation-defined.
COPYRIGHT NOTICE:
This file is placed into the public domain by the author, Gregory Pietsch.
Do with it what you wish; just don't pretend that you wrote it.
*/
/* includes needed by this file */
#include <ctype.h>
#include <errno.h>
#include <float.h>
#include <limits.h>
#include <locale.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
/* defines */
/* SIG_MAX: the maximum number of significant digits we have. The maximum
number of significant digits in a long double must be at least 35,
which is the number of significant digits in an IEEE quad precision
floating-point number (113 * log10(2.0), rounded up). */
#define SIG_MAX 40
/* _Memcasecmp: return a case-insensitive comparison of two areas of
memory. */
static int
_Memcasecmp (const void *s1, const void *s2, size_t n)
{
const unsigned char *su1, *su2;
for (su1 = s1, su2 = s2; 0 < n; ++su1, ++su2, --n)
{
if (toupper (*su1) != toupper (*su2))
return (*su1 < *su2 ? -1 : +1);
}
return 0;
}
/* _Ldmul: multiply y by *px with checking */
static int
_Ldmul(long double *px, long double y)
{
int lexp;
long double ld;
ld = frexpl(*px, &lexp) * y;
errno = 0;
*px = ldexpl(ld, lexp); /* ldexpl can overflow */
return errno != 0 ? -1 : 0;
}
/* _Ldtento: compute x * 10^n, paranoid style */
static long double
_Ldtento(long double x, int n)
{
long double factor, fac10;
int errx;
unsigned nu;
if (n == 0 || x == 0.0L)
return x;
factor = 1.0L;
if (n < 0)
{
/* scale down */
nu = -(unsigned)n;
for (fac10 = 10.0L; 0 < nu; nu >>= 1, fac10 *= fac10)
{
if (nu & 1)
factor *= fac10;
}
errx = _Ldmul(&x, 1.0L/factor);
}
else
{
/* scale up */
for (fac10 = 10.0L; 0 < n; n >>= 1, fac10 *= fac10)
{
if (n & 1)
factor *= fac10;
}
errx = _Ldmul(&x, factor);
}
if (0 < errx)
errno = ERANGE;
return x;
}
/* _Stold: return the classification of the floating-point number (FP_ZERO
if there's no number; if there's a zero, return FP_NORMAL).
The pld parameter is a possibly-returned long double;
the pnan parameter is the n-char-sequence returned if we're
returning FP_NAN. */
static int
_Stold (const char *s, char **endptr, long double *pld, char **pnan)
{
static const char hexits[] = {
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e',
'f'
};
const char point = localeconv ()->decimal_point[0];
const char *sc;
char buf[SIG_MAX];
char sign;
long double x, fac, facpow;
int ndigit, nsig, nzero, olead, opoint, base;
size_t m;
char *p;
const char *pc;
int n;
unsigned long lo[SIG_MAX / 8 + 1], *pl;
short sexp;
long lexp;
const char *scsav, esign;
/* Get past the spaces. */
for (sc = s; isspace (*sc); ++sc)
;
/* Get past the initial sign. */
sign = (*sc == '-' || *sc == '+') ? *sc++ : '+';
/* Do we have INFINITY or INF? */
if (_Memcasecmp (*sc, "INFINITY", m = 8) == 0
|| _Memcasecmp (*sc, "INF", m = 3) == 0)
{
if (endptr)
*endptr = sc + m;
*pld = (sign == '-') ? -1.0L : +1.0L;
return FP_INFINITE;
}
/* Do we have NAN or NAN(nan-char-sequence_opt)? */
if (_Memcasecmp (*sc, "NAN(", m = 4) == 0
|| _Memcasecmp (*sc, "NAN", m = 3) == 0)
{
sc += m;
if (m == 4)
{
/* pick off the optional n-char-sequence */
p = strchr (sc, ')');
if (p == 0)
sc--; /* pretend the '(' doesn't belong */
else if (p == sc)
{
/* no n-char-sequence */
*pnan = 0;
sc++;
}
else
{
/* This could be a little more meticulous... */
*pnan = malloc ((p - sc) + 1);
if (*pnan == 0)
sc--; /* out of memory */
else
{
memcpy (*pnan, sc, (p - sc) - 1);
(*pnan)[p - sc] = '\0';
}
}
}
else
*nan = 0;
if (endptr)
*endptr = sc;
*pld = (sign == '-') ? -1.0L : +1.0L;
return FP_NAN;
}
/* Do we have a hexadecimal floating-point constant? */
if (_Memcasecmp (*sc, "0X", 2) == 0)
{
sc += 2;
base = 16;
}
else
base = 10;
/* get the digits/hexits before the exponent */
for (ndigit = nsig = nzero = 0, olead = opoint = -1;; ++sc)
{
if (*sc == point)
{
/* found a decimal point */
if (0 <= opoint)
break; /* already seen point */
else
opoint = ndigit;
}
else if (*sc == '0')
{
/* saw a zero */
++nzero;
++ndigit;
}
else if ((base == 16 && !isxdigit (*sc))
|| (base == 10 && !isdigit (*sc)))
break;
else
{
/* found a nonzero digit */
if (olead < 0)
olead = nzero;
else
{
/* deliver zeros */
for (; 0 < nzero && nsig < SIG_MAX; --nzero)
buf[nsig++] = 0;
}
++ndigit;
if (nsig < SIG_MAX)
{
/* deliver digit */
if (base == 10)
buf[nsig++] = *sc - '0';
else
{
p = strchr (hexits, tolower (*sc));
buf[nsig++] = p - hexits;
}
}
}
}
if (ndigit == 0)
{
/* no digits? return not-a-long double */
if (endptr)
*endptr = (char *) s;
return FP_ZERO;
}
/* skip trailing digits */
for (; 0 < nsig && buf[nsig - 1] == 0; --nsig)
;
/* compute significand */
*pc = buf;
pl = lo + (nsig >> 3);
for (*pl = 0, n = nsig; 0 < n; --n)
{
if ((n & 7) == 0)
*--pl = *pc++;
else
*pl = *pl * base + *pc++;
}
fac = facpow = (base == 10) ? 1e8L : +4294967296.0L;
for (x = (long double) *lo, n = 0; ++n <= (nsig >> 3);)
{
if (lo[n] != 0UL)
x += fac * lo[n];
fac *= facpow;
}
/* fold in any explicit exponent */
lexp = 0;
if ((base == 10 && (*sc == 'e' || *sc == 'E'))
|| (base == 16 && (*sc == 'p' || *sc == 'P')))
{
*scsav = sc;
esign = *++sc == '+' || *sc == '-' ? *sc++ : '+';
if (!isdigit (*sc))
*sc = *scsav; /* ill-formed exponent */
else
{
/* exponent looks valid */
for (; isdigit (*sc); ++sc)
if (lexp < 10000)
lexp = lexp * 10 + (*sc - '0');
/* else overflow */
if (esign == '-')
lexp = -lexp;
}
}
if (endptr)
*endptr = (char *) sc;
if (opoint < 0)
lexp += ndigit - nsig;
else
lexp += opoint - olead - nsig;
sexp =
(lexp <
SHRT_MIN) ? SHRT_MIN : ((lexp < SHRT_MAX) ? (short) lexp : SHRT_MAX);
if (base == 10)
*pld = _Ldtento (x, sexp);
else
*pld = ldexpl (x, (int) (sexp) << 2);
if (sign == '-')
*pld = -*ld;
return FP_NORMAL;
}
/* exported functions */
/* atof */
double (atof) (const char *nptr)
{
return strtod (nptr, 0);
}
/* strtod */
double (strtod) (const char *restrict nptr, char **restrict endptr)
{
long double ld;
char *nan_arg = 0;
double d;
switch (_Stold (nptr, endptr, &ld, &nan_arg))
{
case FP_NORMAL:
default:
if ((ld < 0.0L ? -ld : ld) > DBL_MAX)
{
errno = ERANGE;
return (ld < 0.0L) ? -HUGE_VAL : HUGE_VAL;
}
else
return (double) ld;
case FP_ZERO:
return 0.0;
case FP_NAN:
d = copysign (nan (nan_arg), (double) ld);
if (nan_arg != 0)
free (nan_arg);
return d;
case FP_INFINITE:
return ld < 0.0L ? -HUGE_VAL : HUGE_VAL;
}
}
/* strtof */
float (strtof) (const char *restrict nptr, char **restrict endptr)
{
long double ld;
char *nan_arg = 0;
float f;
switch (_Stold (nptr, endptr, &ld, &nan_arg))
{
case FP_NORMAL:
default:
if ((ld < 0.0L ? -ld : ld) > FLT_MAX)
{
errno = ERANGE;
return (ld < 0.0L) ? -HUGE_VALF : HUGE_VALF;
}
else
return (float) ld;
case FP_ZERO:
return 0.0F;
case FP_NAN:
f = copysignf (nanf (nan_arg), (float) ld);
if (nan_arg != 0)
free (nan_arg);
return f;
case FP_INFINITE:
return ld < 0.0L ? -HUGE_VALF : HUGE_VALF;
}
}
/* strtold */
long double (strtold) (const char *restrict nptr, char **restrict endptr)
{
long double ld;
char *nan_arg = 0;
switch (_Stold (nptr, endptr, &ld, &nan_arg))
{
case FP_NORMAL:
default:
return ld;
case FP_ZERO:
return 0.0;
case FP_NAN:
ld = copysignl (nanl (nan_arg), ld);
if (nan_arg != 0)
free (nan_arg);
return ld;
case FP_INFINITE:
return ld < 0.0L ? -HUGE_VALL : HUGE_VALL;
}
}
/* END OF FILE */