Time Fixes Diff

Brian Inglis Brian.Inglis@SystematicSw.ab.ca
Tue Sep 12 05:34:00 GMT 2017


On 2017-09-11 09:17, Craig Howland wrote:
> On 09/10/2017 12:27 AM, Brian Inglis wrote:
>> On 2017-09-08 10:34, Brian Inglis wrote:
>>> On 2017-09-08 09:00, Corinna Vinschen wrote:
>>>> On Sep  8 08:46, Brian Inglis wrote:
>>>>> On 2017-09-08 02:18, Corinna Vinschen wrote:
>>>>>> On Sep  8 07:40, Freddie Chopin wrote:
>>>>>>> On Thu, 2017-09-07 at 15:58 -0600, Brian Inglis wrote:
>>>>>>>> The question is, going forward, should we use long for bigger than
>>>>>>>> int values as usual, or use time_t instead, to consistently avoid
>>>>>>>> long, in these time functions.
>>>>>>> I think we should go with time_t, which will be 32-bit or 64-bit long,
>>>>>>> depending on what was chosen during newlib compilation.
>>>>>> While that's nice and all, two points to keep in mind:
>>>>>>
>>>>>> * 64 bit computations are costly on 32 bit (or even 16 bit) platforms and
>>>>>>    should be avoided if overkill isn't your goal.
>>>>>>
>>>>>> * days, months, years, etc, are *all* of type int in struct tm, per
>>>>>>    POSIX. Adding 64 bit ops just to fold stuff back into ints might only
>>>>>>    make marginal sense.
>>>>> Max year == INT_MAX + 1900 == 2^31-1 + 1900 == 2147485547 on 32 bit platforms,
>>>> You wrote this before but it sounds like a pretty academic problem.
>>> Check out the strftime.c regression test vectors before you conclude that.
>>>
>>>>> if you want to avoid arbitrary limitations, and be as (Linux) compatible as
>>>>> possible, which the existing library tries to do, but there are various issues
>>>>> to be fixed. There are places where int, unsigned, long, and unsigned long are
>>>>> used which may or may not work depending on time_t, int, and long sizes.
>>>> Glibc compat is nice, and as far as Cygwin is concerned I'm not
>>>> concerned for size.  However, newlib strives for small footprints
>>>> given that many embedded targets are using it.  This should be kept
>>>> in mind.
>>>>
>>>>> Is long bigger than int on all newlib platforms? What is bigger than int?
>>>>>
>>>>> Using time_t as required makes temps and variables big enough to
>>>>> avoid overflow for seconds, days, and years counts, without
>>>>> penalizing 32 bit time_t platforms.
>>>> Rather than int, long, long long you can use explicit sized ints
>>>> like int16_t, int32_t, uint64_t.
>>> We need space for things that are bigger than int but smaller than time_t
>>> by only a couple of orders of magnitude, not even half its width, so
>>> picking types smaller than time_t does not seem to be worth the code?
>>>
>>>>> For I/O conversion of 64 bit time_t max year on 32 bit platforms long long is
>>>>> required, it may also be used with 32 bit time_t to avoid conditionals, and it
>>>>> imposes no extra overhead on 64 bit platforms.
>>>>>
>>>>> Everything depends on how many conditionals we want to use to
>>>>> minimize the data and code space and time spent at runtime and for
>>>>> development.
>>> I'm inclined to use common code and types that will work correctly on all
>>> platforms *in places where they are needed* to get things working correctly,
>>> then we can worry about space and time reductions if required?
>>>
>>> I can post diffs for discussion before making any local commits or patches.
>> Attached for discussion.
>>
>> Used local.h definitions, removed some local duplications, changed
>> tests/compares of > ... - 1) to >= ..., changed declarations to time_t and added
>> casts for large secs, days, years counts to avoid overflow with extreme values,
>> cast to long long for I/O mainly snprintf.
>>
>> Not yet tested due to unrelated build issues on other files in latest git pull.
>>
> If you're going to bother to spend the time (I agree with Corinna that it is a
> purely academic exercise, of no practical value), it probably should be changed
> to use int_least types to be friendliest to the smallest-int targets, and it
> must have additional work to not be wrong on small-int targets.  For example,
> era as it is presently sized strictly does not work in the doomsday view, as it
> is only an int but can have a maximum theoretical value of about 730.7e6 (needs
> 30 bits)--it won't work properly for a 16-bit target.

There are a couple of considerations here: compatibility with 64 bit time_t Unix
platforms, which correctly handle values up to tm_year == INT_MAX, i.e. year
INT_MAX + 1900; and ensuring 32 bit time_t works correctly until it overflows at
2038-01-19 03:14:07+0000.

> But in addition to the theoretical size problems in the variables, there is
> another little-known and very real consideration that really renders those
> problems moot, and that is leap seconds. gmtime() is defined as using UTC, to
> which leap seconds are applied, so this is a real consideration, not just
> theoretical.  (If it were GPS time things would be different, but even with UT1
> there would be issues.)  Long-time timekeeping is actually very complicated,
> especially if you're trying to use seconds as the basis for the measurement. 
> (Using days and the proleptic Gregorian calendar is an improvement of sorts over
> seconds, but still has problems.  If you've never heard of the proleptic
> Gregorian calendar, you're in the majority, but even the fact that the concept
> is defined underscores the problems with time reporting over longer timeframes.)

Our primary compliance target seems to be the POSIX time scale, which ignores
leap seconds and which gmtime() implements, so any leap second handling must
happen in the tz code, perhaps using the zoneinfo "right" (leap-second-aware)
tables to handle TAI offsets in the past.

Until someone comes up with a decent model of earth rotation or a better
approach to timekeeping, solutions must be application-specific to be practical.

I've heard of, used, and interconverted dates from many calendars and time
scales, and have bought or collected and used most of the calendar and time
scale conversion code out there, but for personal use I go back to basic rules
and work from there.

> Have you done a calculation to show where the present code (without this
> proposed patch applied) first breaks down?

They break down when they store results derived from 64 bit time_t in years,
days, and seconds variables: those counts can exceed int range, since they are
smaller than the time_t value by factors of only 86400 (seconds per day) or
31536000 (seconds per nominal year).
Unfortunately every function uses different ad hoc code to deal with the same
entities and do its job, so each has different issues in different places, not
all of them overflows. I'd like to be sure the calculations are done properly
for 64 bit time_t, to validate the functions and fix the other issues.

> The test case checking for year 2e9 is clearly insane.  With the unpredictable
> nature of leap seconds, there is no way to go that far ahead and get it right. 
> At the recent rate--27 leap seconds since 1972--there might be as many as 1.3e9
> by the year 0x7FFFFFFF. That's potentially about 41 years worth of leap
> seconds.  Even if the year were limited to 10000 (7983 years from now), it is
> potentially 4790 leap seconds.  (That's 1:20:50, which makes seconds, minutes,
> hours, and possibly even the day of the week, wrong.)  This is even ignoring the
> rate at which earth's rotation is slowing, which serves to increase the
> leap-second insertion rate as the years go on.

Unfortunately the short term variations caused by natural events and disasters,
including weather, make leap second prediction an environmental forecasting
problem involving the earth's interior, volcanoes, glaciers and ice shelves,
hurricanes, and droughts, all of which change the earth's mass distribution and
rotation.

> As I've quantified this somewhat in forming this reply, I'm thinking that
> perhaps we should just leave the code the way it is and simply add a note to the
> man page which points out roughly the year at which the code might break but
> that going even that far is introducing a leap-second-based error which is
> impossible to account for (either forwards a long time, or backwards past 1972
> when leap seconds start for UTC).  Making changes to fix the places where the
> calculations can overflow is a false help, as while it addresses one issue, it
> does not address the larger issue that trying to use the function over such a
> larger time is an invalid use of it.  (A main reason for suggesting that it be
> left alone is to avoid penalizing small-int targets with performance penalties
> for a non-real help. If changes can turn out to not penalize it, cleaning up a
> little then would not hurt.)

Perhaps the proposal is premature; I should take the work offline, and maybe
first fix Cygwin to at least handle time consistently within normal ranges.
Then other platform users can see whether they want to test and enable the
Cygwin patches.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
