Bug 30200 - time sometimes appears to go backwards
Summary: time sometimes appears to go backwards
Status: UNCONFIRMED
Alias: None
Product: glibc
Classification: Unclassified
Component: time (show other bugs)
Version: 2.35
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-03-05 17:21 UTC by Bruno Haible
Modified: 2023-03-21 20:41 UTC (History)
6 users (show)

See Also:
Host: x86_64-linux-gnu
Target: x86_64-linux-gnu
Build: x86_64-linux-gnu
Last reconfirmed:


Attachments
test program (251 bytes, text/x-csrc)
2023-03-05 17:21 UTC, Bruno Haible
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Bruno Haible 2023-03-05 17:21:15 UTC
Created attachment 14731 [details]
test program

The attached test program invokes first timespec_get, and then time(NULL).
With a small probability (ca. 0.1% to 0.3%), the second call returns a
seconds value that is smaller than the one from the first call.

Since ISO C23 § 7.29.2.6 says
  "If base is TIME_UTC , the tv_sec member is set to the number of seconds
   since an implementation-defined epoch, truncated to a whole value and
   the tv_nsec member is set to the integral number of nanoseconds,
   rounded to the resolution of the system clock."

and ISO C23 § 7.29.2.5 says
  "The time function returns the implementation’s best approximation to
   the current calendar time."
and POSIX <https://pubs.opengroup.org/onlinepubs/9699919799/functions/time.html>
says
  "The time() function shall return the value of time in seconds since
   the Epoch."

my expectation is that
  - both ts1.tv_sec and tt2 denote seconds since the epoch,
  - tt2 is not smaller than ts1.tv_sec
    (because if it were, tt2+1 would be a better approximation to the
    current time than tt2).

The test program, however, reports ts1.tv_sec > tt2 around 12 to 40 times
in 10000 runs.

How to reproduce:
$ gcc -Wall -g -O2 foo.c
$ for i in `seq 10000`; do ./a.out ; done

Example output:
ts1 = { tv_sec = 1678035679, tv_nsec = 343581 }
tt2            = 1678035678
Aborted
ts1 = { tv_sec = 1678035679, tv_nsec = 1099393 }
tt2            = 1678035678
Aborted
ts1 = { tv_sec = 1678035679, tv_nsec = 2502625 }
tt2            = 1678035678
Aborted
ts1 = { tv_sec = 1678035680, tv_nsec = 601354 }
tt2            = 1678035679
Aborted
ts1 = { tv_sec = 1678035680, tv_nsec = 1543640 }
tt2            = 1678035679
Aborted
ts1 = { tv_sec = 1678035680, tv_nsec = 2320893 }
tt2            = 1678035679
Aborted
ts1 = { tv_sec = 1678035681, tv_nsec = 630264 }
tt2            = 1678035680
Aborted
ts1 = { tv_sec = 1678035681, tv_nsec = 1556696 }
tt2            = 1678035680
Aborted
ts1 = { tv_sec = 1678035681, tv_nsec = 2380812 }
tt2            = 1678035680
Aborted
ts1 = { tv_sec = 1678035682, tv_nsec = 806262 }
tt2            = 1678035681
Aborted
ts1 = { tv_sec = 1678035682, tv_nsec = 1696936 }
tt2            = 1678035681
Aborted
ts1 = { tv_sec = 1678035682, tv_nsec = 2484177 }
tt2            = 1678035681
Aborted
ts1 = { tv_sec = 1678035683, tv_nsec = 242097 }
tt2            = 1678035682
Aborted
ts1 = { tv_sec = 1678035683, tv_nsec = 972766 }
tt2            = 1678035682
Aborted
ts1 = { tv_sec = 1678035683, tv_nsec = 1801143 }
tt2            = 1678035682
Aborted
ts1 = { tv_sec = 1678035684, tv_nsec = 221852 }
tt2            = 1678035683
Aborted
ts1 = { tv_sec = 1678035684, tv_nsec = 1041359 }
tt2            = 1678035683
Aborted
ts1 = { tv_sec = 1678035684, tv_nsec = 1822872 }
tt2            = 1678035683
Aborted
Comment 1 Bruno Haible 2023-03-05 17:23:46 UTC
Glibc version: 2.35
Kernel version: Linux 5.15.0-53-generic x86_64
(Ubuntu 22.04)
Comment 2 girish 2023-03-05 17:55:47 UTC
Getting the same output:

$ for i in `seq 10000`; do ./a.out ; done
ts1 = { tv_sec = 1678038427, tv_nsec = 701070 }
tt2            = 1678038426
Aborted (core dumped)
ts1 = { tv_sec = 1678038428, tv_nsec = 362023 }
tt2            = 1678038427
Aborted (core dumped)
ts1 = { tv_sec = 1678038429, tv_nsec = 823507 }
tt2            = 1678038428
Aborted (core dumped)
ts1 = { tv_sec = 1678038430, tv_nsec = 629951 }
tt2            = 1678038429
Aborted (core dumped)
ts1 = { tv_sec = 1678038431, tv_nsec = 543252 }
tt2            = 1678038430
Aborted (core dumped)
ts1 = { tv_sec = 1678038432, tv_nsec = 665985 }
tt2            = 1678038431
Aborted (core dumped)
ts1 = { tv_sec = 1678038433, tv_nsec = 263170 }
tt2            = 1678038432
Aborted (core dumped)
ts1 = { tv_sec = 1678038434, tv_nsec = 1015063 }
tt2            = 1678038433
Aborted (core dumped)
ts1 = { tv_sec = 1678038435, tv_nsec = 616891 }
tt2            = 1678038434
Aborted (core dumped)
ts1 = { tv_sec = 1678038436, tv_nsec = 421832 }
tt2            = 1678038435
Aborted (core dumped)
ts1 = { tv_sec = 1678038437, tv_nsec = 552004 }
tt2            = 1678038436
Aborted (core dumped)
ts1 = { tv_sec = 1678038438, tv_nsec = 742045 }
tt2            = 1678038437
Aborted (core dumped)
ts1 = { tv_sec = 1678038439, tv_nsec = 82622 }
tt2            = 1678038438
Aborted (core dumped)

System:
(GNU libc) 2.36
Linux fedora 6.1.13-200.fc37.x86_64
Comment 3 Andreas Schwab 2023-03-05 18:28:30 UTC
That's the difference the kernel returns for CLOCK_REALTIME_COARSE (for time) and CLOCK_REALTIME (for timespec_get).
Comment 4 Andreas Schwab 2023-03-05 18:58:13 UTC
According to the kernel docs, CLOCK_REALTIME_COARSE can be up to 10ms in the past (depending on CONFIG_HZ).  This is the same the old time syscall would return.
Comment 5 Bruno Haible 2023-03-05 19:04:48 UTC
For comparison, the test program succeeds (even 100000 consecutive runs) on
- musl libc (Alpine Linux 3.17)
- FreeBSD 13.1
- NetBSD 9.0
- OpenBSD 7.2
- macOS 12.5
- Solaris 11.4
- GNU/Hurd
Comment 6 Adhemerval Zanella 2023-03-06 12:15:33 UTC
(In reply to Bruno Haible from comment #5)
> For comparison, the test program succeeds (even 100000 consecutive runs) on
> - musl libc (Alpine Linux 3.17)
> - FreeBSD 13.1
> - NetBSD 9.0
> - OpenBSD 7.2
> - macOS 12.5
> - Solaris 11.4
> - GNU/Hurd

If I recall correctly, we used CLOCK_REALTIME_COARSE for time to mimic Linux time syscall kernel behavior (which is implemented in a couple of old ABI and used historically).  Recently on 64 time_t time, we changed time to use clock_gettime and kept the same clock for compatibility.

I agree that it makes sense to use the same clock for time and timespec_get, however, using CLOCK_REALTIME would make the vDSO optimization for x86_64 and powerpc64 unusable (not a big deal IMHO). And another question is whether we will need to add a compact symbol to time (and I also do not think it would make sense either).
Comment 7 Florian Weimer 2023-03-06 12:25:22 UTC
(In reply to Adhemerval Zanella from comment #6)
> I agree that it makes sense to use the same clock for time and timespec_get,
> however, using CLOCK_REALTIME would make the vDSO optimization for x86_64
> and powerpc64 unusable (not a big deal IMHO). And another question is
> whether we will need to add a compact symbol to time (and I also do not
> think it would make sense either).

CLOCK_REALTIME still has vDSO acceleration in many cases on x86-64, right? It's just a bit slower, and the difference can matter to some logging frameworks if I recall correctly.

Bruno, the standard sources you cite suggest that we should round the seconds since epoch, and not (mostly) truncate it. So the standards seem to permit such divergence already (although the rounding it as suggested would move the less precise timestamp forward, not backwards).
Comment 8 Bruno Haible 2023-03-06 12:45:24 UTC
(In reply to Florian Weimer from comment #7)
> the standards seem to permit such divergence already

The standards even permit the use of two different epochs, e.g. 1970-01-01 for time() and 2001-09-17 for timespec_get(). But it would be practical nonsense to do so.

There are three ways to get the clock with 1-second resolution using simple ISO C / POSIX APIs:
- time(NULL)
- gettimeofday().tv_sec
- timespec_get(TIME_UTC).tv_sec
If using one of these then another one of these produces backwards progression, how are the applications supposed to behave in a sane way?

In other words, this bug report is not so much about standards compliance, but about doing something that is practically sensible.
Comment 9 Adhemerval Zanella 2023-03-06 13:08:58 UTC
(In reply to Florian Weimer from comment #7)
> (In reply to Adhemerval Zanella from comment #6)
> > I agree that it makes sense to use the same clock for time and timespec_get,
> > however, using CLOCK_REALTIME would make the vDSO optimization for x86_64
> > and powerpc64 unusable (not a big deal IMHO). And another question is
> > whether we will need to add a compact symbol to time (and I also do not
> > think it would make sense either).
> 
> CLOCK_REALTIME still has vDSO acceleration in many cases on x86-64, right?
> It's just a bit slower, and the difference can matter to some logging
> frameworks if I recall correctly.

Yes, it will follow the usual route of calling __clock_gettime, which in turn will call the vDSO.


> There are three ways to get the clock with 1-second resolution using simple ISO C / POSIX APIs:
> - time(NULL)
> - gettimeofday().tv_sec
> - timespec_get(TIME_UTC).tv_sec
> If using one of these then another one of these produces backwards progression, how are the applications supposed to behave in a sane way?
> 
> In other words, this bug report is not so much about standards compliance, but > about doing something that is practically sensible.

I tend to agree, I see no reason but performance implications to have different clocks for similar interfaces.
Comment 10 Florian Weimer 2023-03-06 13:14:53 UTC
(In reply to Bruno Haible from comment #8)
> There are three ways to get the clock with 1-second resolution using simple
> ISO C / POSIX APIs:
> - time(NULL)
> - gettimeofday().tv_sec
> - timespec_get(TIME_UTC).tv_sec
> If using one of these then another one of these produces backwards
> progression, how are the applications supposed to behave in a sane way?

Is this actually causing application problems, or is this just a synthetic test case that doesn't take different precision and rounding issue into account?
Comment 11 eggert 2023-03-19 23:43:45 UTC
(In reply to Florian Weimer from comment #10)
> Is this actually causing application problems...?

In theory any sequence of timestamps is something that a (perverse) sysadmin could inflict on the user by constantly futzing with the system clock, and hence applications that generate out-of-order timestamps are allowed by POSIX.

In practice out-of-order timestamps are problematic. Users will likely be confused, for example, if a log of a thread's actions contain out-of-order timestamps even when the system clock was not reset.

I looked at a few applications.

* Git can issue out-of-order timestamps. It's complicated enough that I judged it simpler to fix the problem than worry about how serious it is, and filed a proposed patch to do that <https://lore.kernel.org/git/20230319064353.686226-1-eggert@cs.ucla.edu/T/#u>.

* GNU Make can issue out-of-order timestamps in a debugging log - not a serious problem but it could be annoying. I filed a proposed patch <https://lists.gnu.org/r/bug-make/2023-03/msg00081.html>.

* Gnulib can generate out-of-order timestamps when parsing old-fashioned time strings, causing glitches in Gnulib-using commands like 'touch 01010000 file'. This is not serious (hardly anybody uses those old POSIX-specified strings that do not specify year numbers) and I don't know whether it's a POSIX violation. I fixed that problem <https://lists.gnu.org/r/bug-gnulib/2023-03/msg00028.html> and the fix should propagate into future releases of Coreutils etc.

* GNU Emacs proper does not have the problem with Glibc (it may have it with MS-Windows but that’s out of scope here). One Emacs auxiliary program ‘movemail’ may have the issue since it does call ‘time’. I haven’t investigated further.

The result of this brief investigation:

* There are problems.

* Effects observed so far are small (e.g., logs out of order).

* GNU Make (my first worry, due to its extensive use of timestamps) seems to be mostly OK.

* Any big effects are surely either rare, or hard to trace back to the problem source (or both); if big effects were common and easily traced back, we’d know already.

As an application developer I’d rather see ‘time’ fixed to match ‘gettimeofday’ / CLOCK_REALTIME / etc., as I worry that there might be larger, rare effects that this little survey didn’t catch.
Comment 12 Adhemerval Zanella 2023-03-21 15:06:58 UTC
Thanks for the detailed (In reply to eggert from comment #11)
> (In reply to Florian Weimer from comment #10)
> > Is this actually causing application problems...?
> 
> In theory any sequence of timestamps is something that a (perverse) sysadmin
> could inflict on the user by constantly futzing with the system clock, and
> hence applications that generate out-of-order timestamps are allowed by
> POSIX.
> 
> In practice out-of-order timestamps are problematic. Users will likely be
> confused, for example, if a log of a thread's actions contain out-of-order
> timestamps even when the system clock was not reset.
> 
> I looked at a few applications.
> 
> * Git can issue out-of-order timestamps. It's complicated enough that I
> judged it simpler to fix the problem than worry about how serious it is, and
> filed a proposed patch to do that
> <https://lore.kernel.org/git/20230319064353.686226-1-eggert@cs.ucla.edu/T/
> #u>.
> 
> * GNU Make can issue out-of-order timestamps in a debugging log - not a
> serious problem but it could be annoying. I filed a proposed patch
> <https://lists.gnu.org/r/bug-make/2023-03/msg00081.html>.
> 
> * Gnulib can generate out-of-order timestamps when parsing old-fashioned
> time strings, causing glitches in Gnulib-using commands like 'touch 01010000
> file'. This is not serious (hardly anybody uses those old POSIX-specified
> strings that do not specify year numbers) and I don't know whether it's a
> POSIX violation. I fixed that problem
> <https://lists.gnu.org/r/bug-gnulib/2023-03/msg00028.html> and the fix
> should propagate into future releases of Coreutils etc.
> 
> * GNU Emacs proper does not have the problem with Glibc (it may have it with
> MS-Windows but that’s out of scope here). One Emacs auxiliary program
> ‘movemail’ may have the issue since it does call ‘time’. I haven’t
> investigated further.
> 
> The result of this brief investigation:
> 
> * There are problems.
> 
> * Effects observed so far are small (e.g., logs out of order).
> 
> * GNU Make (my first worry, due to its extensive use of timestamps) seems to
> be mostly OK.
> 
> * Any big effects are surely either rare, or hard to trace back to the
> problem source (or both); if big effects were common and easily traced back,
> we’d know already.
> 
> As an application developer I’d rather see ‘time’ fixed to match
> ‘gettimeofday’ / CLOCK_REALTIME / etc., as I worry that there might be
> larger, rare effects that this little survey didn’t catch.

Thanks for the detailed analysis Paul, and I completely agree with you that
this is problematic and I really can't see why Florian sees the performance
the penalty is worth all this trouble (he even hinted that if we ever get 
this upstream he would need to revert to RHEL and related distros).  Specially
because if the user really wants to, it can simply do:

#define time(__ptr) \
  {( \
    struct timespec __ts; \
    clock_gettime (CLOCK_REALTIME_COARSE, &__ts); \
    __ts.tv_sec; \
  )}

On most architectures that provide vDSO, this should only be marginally slower
than time.

Also, the problem is not really specific for glibc 2.31+.  Rather, on glibc 
2.30 and older, the time syscall is used if present (which always uses the
coarse timer), otherwise gettimeofday is used. The time syscall is supported
on:

$ python sysdeps/unix/sysv/linux/glibcsyscalls.py query-syscall time
time:
  defined: hppa i386 m68k microblaze mips/mips32 powerpc/powerpc32 powerpc/powerpc64 s390/s390-32 sh sparc/sparc32 x86_64/64 x86_64/x32
  undefined: aarch64 alpha arc arm csky ia64 loongarch mips/mips64/n32 mips/mips64/n64 nios2 or1k riscv/rv32 riscv/rv64 s390/s390-64 sparc/sparc64

So on 2.30 we have a different problem where different architectures used
different timers for 'time' and it was fixed on 2.31 to always use 
CLOCK_REALTIME_COARSE.  So this problem *always* existed (which also makes
ponder if this bug is really urgent as you and Bruno are trying to pose by
pushing multiple fixes on different projects).
Comment 13 Bruno Haible 2023-03-21 15:29:51 UTC
(In reply to Adhemerval Zanella from comment #12)
> So this problem *always* existed

From "This problem always existed" you cannot imply "This problem never caused trouble, otherwise it would have been reported".

In https://sourceware.org/pipermail/libc-alpha/2023-March/146219.html I explain:

On a typical system, there are
  - hundreds of programs that call gettimeofday() or timespec_get(),
  - hundreds of programs that call time().
Each time a program calls gettimeofday() or timespec_get(), and then,
within 1 to 3 milliseconds, a program (the same or a different one)
calls time(), there is the potential for trouble because the time
appears to go backwards. This trouble can occur with probability
between 0.1% and 0.3%.

This is the kind of trouble that does occur, but that people don't report because
- it occurs only 1 in 300 to 1000 times,
- it is nearly impossible to track down to glibc, since it will appear as an interoperation bug between two different programs in the first place.
Comment 14 Adhemerval Zanella 2023-03-21 16:02:36 UTC
(In reply to Bruno Haible from comment #13)
> (In reply to Adhemerval Zanella from comment #12)
> > So this problem *always* existed
> 
> From "This problem always existed" you cannot imply "This problem never
> caused trouble, otherwise it would have been reported".

I will quote myself "I completely agree with you [Paul] that this is problematic"
to make it clear I agree this is a problem.  I even sent a fix to make it use
the same clock for all time interface that does not specify the clock id [1],
although I ended up retiring it because Florian explicit say he would revert it
on RH distros because of performance implications. 

[1] https://sourceware.org/pipermail/libc-alpha/2023-March/146088.html

> 
> In https://sourceware.org/pipermail/libc-alpha/2023-March/146219.html I
> explain:
> 
> On a typical system, there are
>   - hundreds of programs that call gettimeofday() or timespec_get(),
>   - hundreds of programs that call time().
> Each time a program calls gettimeofday() or timespec_get(), and then,
> within 1 to 3 milliseconds, a program (the same or a different one)
> calls time(), there is the potential for trouble because the time
> appears to go backwards. This trouble can occur with probability
> between 0.1% and 0.3%.
> 
> This is the kind of trouble that does occur, but that people don't report
> because
> - it occurs only 1 in 300 to 1000 times,
> - it is nearly impossible to track down to glibc, since it will appear as an
> interoperation bug between two different programs in the first place.
Comment 15 Bruno Haible 2023-03-21 20:00:35 UTC
(In reply to Adhemerval Zanella from comment #14)
> Florian explicit say he would revert it
> on RH distros because of performance implications. 

If the question cannot be decided due to different distro priorities (for some distros, logging speed is super important, while for other distros, correctness and interoperability is the focus), then how about changing the patch so that it adds a glibc configure option? Such as --enable/disable-fast-time or --enable/disable-correct-time?
Comment 16 Adhemerval Zanella 2023-03-21 20:03:08 UTC
(In reply to Bruno Haible from comment #15)
> (In reply to Adhemerval Zanella from comment #14)
> > Florian explicit say he would revert it
> > on RH distros because of performance implications. 
> 
> If the question cannot be decided due to different distro priorities (for
> some distros, logging speed is super important, while for other distros,
> correctness and interoperability is the focus), then how about changing the
> patch so that it adds a glibc configure option? Such as
> --enable/disable-fast-time or --enable/disable-correct-time?

A configure switch for this specific issue is the last thing I would like to do, it adds another build permutation and will only move the issue to distribution maintainers.
Comment 17 Florian Weimer 2023-03-21 20:25:35 UTC
(In reply to Bruno Haible from comment #15)
> (In reply to Adhemerval Zanella from comment #14)
> > Florian explicit say he would revert it
> > on RH distros because of performance implications. 

I think I said, or should have said, that we would have to revert it based on customer/partner feedback if performance problems are reported (which would only happen a couple of years down the road).

> If the question cannot be decided due to different distro priorities (for
> some distros, logging speed is super important, while for other distros,
> correctness and interoperability is the focus), then how about changing the
> patch so that it adds a glibc configure option? Such as
> --enable/disable-fast-time or --enable/disable-correct-time?

This won't help you with the gnulib testsuite.

In any case, if you don't like the time-related services the kernel provides, I think you really should take this up with the kernel developers. We are just handing down what the kernel provides to us.
Comment 18 Adhemerval Zanella 2023-03-21 20:41:06 UTC
(In reply to Florian Weimer from comment #17)
> (In reply to Bruno Haible from comment #15)
> > (In reply to Adhemerval Zanella from comment #14)
> > > Florian explicit say he would revert it
> > > on RH distros because of performance implications. 
> 
> I think I said, or should have said, that we would have to revert it based
> on customer/partner feedback if performance problems are reported (which
> would only happen a couple of years down the road).

And it basically would require all the gnulib hackery to fix this issue again if this is ever done.  It also means that if gnulib is used users will need to add even extra hackery to get this time performance difference.

> 
> > If the question cannot be decided due to different distro priorities (for
> > some distros, logging speed is super important, while for other distros,
> > correctness and interoperability is the focus), then how about changing the
> > patch so that it adds a glibc configure option? Such as
> > --enable/disable-fast-time or --enable/disable-correct-time?
> 
> This won't help you with the gnulib testsuite.
> 
> In any case, if you don't like the time-related services the kernel
> provides, I think you really should take this up with the kernel developers.
> We are just handing down what the kernel provides to us.

I don't think we should follow kernel blindly here, especially if this does generate potential issues as Paul has summarized.  And users always have the option to use clock_gettime (CLOCK_REALTIME_COARSE).