strptime() could hangs for hours

Howland Craig D (Craig)
Fri Apr 1 16:14:00 GMT 2011

On Apr  1 04:59, Corinna Vinschen wrote:
>> 2011-03-30  Craig Howland <...>
>>      * libc/time/strptime.c:  ... limit
>>      year to 1753 or later.
>Thanks for the patch.  I don't think it's right to limit the date
>to >= 1753, though.  Think about it:
>  strptime ("Apr 01, 1489", "%b %d, %Y", tm_ptr);
>There's no good reason to treat this as wrong.  If anything, you could
>skip the %[UVW] handling for dates < 1753.
     Based on your comment, I just looked a little bit more at the
function overall and found another flaw:  it does not calculate the day
of the week from other information, but only when there is one of the
formats directly related to it.  So even if "6 Dec 2001 12:33:45"
were given--which correctly and fully specifies sufficient information--the
output would be incomplete because tm_wday would not have been set.
(And in this particular case, the default 0 would be wrong.)  This date
is from the POSIX example, which goes on to show tm_wday being printed--
showing that they do expect it to be set correctly by inference.  It
is actually more general than this:  it does not infer any of the fields
from any other.  For example, if %j is given to specify the day of the
year, only tm_yday is set--month and day of month are not calculated
from it.  So a sufficient call (for the second that I typed this) of
        strptime("091 2011 10:54:28", "%j %Y %H:%M:%S", &tm)
fails to set tm_mon, tm_mday, and tm_wday (April, 1, Friday).  Adding
to my original list, this would be problem #7.
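     To illustrate the kind of inference that is missing, here is a
minimal sketch of deriving month and day of month from the year and the
day of the year.  The helper names is_leap() and mon_mday_from_yday()
are hypothetical and are not taken from strptime.c; the sketch also
assumes Gregorian leap-year rules and a 0-based yday as stored in
tm_yday (so a %j input of 091 corresponds to 90).

```c
#include <assert.h>

/* Hypothetical helper: nonzero if year is a leap year under Gregorian
   rules.  year is the full year, e.g. 2011, not tm_year.  */
static int
is_leap (int year)
{
  return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Hypothetical helper: derive month (0-11, like tm_mon) and day of month
   (1-31, like tm_mday) from a full year and a 0-based day of the year
   (like tm_yday).  */
static void
mon_mday_from_yday (int year, int yday, int *mon, int *mday)
{
  static const int days[2][12] = {
    { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 },
    { 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }
  };
  const int *d = days[is_leap (year)];
  int m = 0;

  assert (yday >= 0 && yday < 365 + is_leap (year));

  /* Walk through the months, subtracting each full month that fits.  */
  while (yday >= d[m])
    {
      yday -= d[m];
      m++;
    }
  *mon = m;
  *mday = yday + 1;
}
```

For the call above, mon_mday_from_yday (2011, 90, &mon, &mday) would
give mon == 3 (April) and mday == 1.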
     So in this respect you are correct, that skipping UVW handling
would avoid an incorrect calculation.  But avoiding making the incorrect
calculation does not mean that a correct answer is given--rather, a
different wrong answer would be given.
     Now realizing the #7 problem, I feel slightly less strongly about
gating for 1753 in the present state of the function, since most calls
will have tm_wday returned incorrectly.  So short of fixing that, not
gating could provide consistently silently-bad results.  But not gating
would only be a temporary solution, linking itself to the larger bug.  If
future corrections were made to calculate tm_wday when it needs to be
inferred from other data, the gating issue would return for inputs such
as "Apr 01, 1489 14:40:00".
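     To make the concern concrete, here is a sketch of how tm_wday might
be inferred from year, month, and day of month.  It is only an
illustration: wday_from_ymd() is a hypothetical name, not something in
the patch, and it uses Sakamoto's method, which assumes the (proleptic)
Gregorian calendar throughout.  For a date such as 1489, before the
changeover, the value it returns would generally not match the Julian
calendar that was actually in use, which is exactly the case the gate is
meant to catch.

```c
#include <assert.h>

/* Hypothetical helper: day of the week for a proleptic Gregorian date.
   year is the full year (e.g. 2001, not tm_year), mon is 0-11 like
   tm_mon, and mday is 1-31 like tm_mday.  Returns 0 (Sunday) through
   6 (Saturday), like tm_wday.  */
static int
wday_from_ymd (int year, int mon, int mday)
{
  /* Per-month offsets modulo 7 used by Sakamoto's method.  */
  static const int offset[12] = { 0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4 };

  assert (year >= 1);
  assert (mon >= 0 && mon <= 11);

  /* Treat January and February as belonging to the previous year so
     that a leap day is counted only for dates after it.  */
  if (mon < 2)
    year -= 1;

  return (year + year / 4 - year / 100 + year / 400
          + offset[mon] + mday) % 7;
}
```

With this, wday_from_ymd (2001, 11, 6) gives 4 (Thursday) for the POSIX
example date, and wday_from_ymd (2011, 3, 1) gives 5 (Friday).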
     But there is another complication in the year versus day of week
even if one were to surmount the exact changeover to the Gregorian
calendar.  And that is that the Julian calendar started in 45 B.C.  This
could move the gate from circa 1753, but it would still be needed.
     All told, I still think gating is best.  I suppose that if you
hated it I could be persuaded to not gate temporarily while tm_wday is
generally broken, but gating would need to return when tm_wday were fixed
(even if it were 45BC rather than AD1753).
     My main intent in this patch was to clean up the problem that
Aleksandr identified.  In so doing, I am also trying to rectify some
of the other flaws that I noticed, but stopped short of fixing them all
because there is only so much time in the day.  I think that failing
explicitly is better than silently failing, and that this behavior can
be explained to the user on the man page--when one is added.  Have I
persuaded you?  Do you have a suggestion for alternate behavior?
     Not in the patch, but for the record for later, what were your
thoughts on the non-POSIX k, l, V, and Z formats?  And %u, which I just
noticed as also not being POSIX.
     Perhaps we ought to glance at FreeBSD to see if theirs is not so
