Created attachment 14731 [details] test program The attached test program invokes first timespec_get, and then time(NULL). With a small probability (ca. 0.1% to 0.3%), the second call returns a seconds value that is smaller than the one from the first call. Since ISO C23 § 7.29.2.6 says "If base is TIME_UTC , the tv_sec member is set to the number of seconds since an implementation-defined epoch, truncated to a whole value and the tv_nsec member is set to the integral number of nanoseconds, rounded to the resolution of the system clock." and ISO C23 § 7.29.2.5 says "The time function returns the implementation’s best approximation to the current calendar time." and POSIX <https://pubs.opengroup.org/onlinepubs/9699919799/functions/time.html> says "The time() function shall return the value of time in seconds since the Epoch." my expectation is that - both ts1.tv_sec and tt2 denote seconds since the epoch, - tt2 is not smaller than ts1.tv_sec (because if it were, tt2+1 would be a better approximation to the current time than tt2). The test program, however, reports ts1.tv_sec > tt2 around 12 to 40 times in 10000 runs. How to reproduce: $ gcc -Wall -g -O2 foo.c $ for i in `seq 10000`; do ./a.out ; done Example output: ts1 = { tv_sec = 1678035679, tv_nsec = 343581 } tt2 = 1678035678 Aborted ts1 = { tv_sec = 1678035679, tv_nsec = 1099393 } tt2 = 1678035678 Aborted ts1 = { tv_sec = 1678035679, tv_nsec = 2502625 } tt2 = 1678035678 Aborted ts1 = { tv_sec = 1678035680, tv_nsec = 601354 } tt2 = 1678035679 Aborted ts1 = { tv_sec = 1678035680, tv_nsec = 1543640 } tt2 = 1678035679 Aborted ts1 = { tv_sec = 1678035680, tv_nsec = 2320893 } tt2 = 1678035679 Aborted ts1 = { tv_sec = 1678035681, tv_nsec = 630264 } tt2 = 1678035680 Aborted ts1 = { tv_sec = 1678035681, tv_nsec = 1556696 } tt2 = 1678035680 Aborted ts1 = { tv_sec = 1678035681, tv_nsec = 2380812 } tt2 = 1678035680 Aborted ts1 = { tv_sec = 1678035682, tv_nsec = 806262 } tt2 = 1678035681 Aborted ts1 = { tv_sec = 1678035682, tv_nsec = 1696936 } tt2 = 1678035681 Aborted ts1 = { tv_sec = 1678035682, tv_nsec = 2484177 } tt2 = 1678035681 Aborted ts1 = { tv_sec = 1678035683, tv_nsec = 242097 } tt2 = 1678035682 Aborted ts1 = { tv_sec = 1678035683, tv_nsec = 972766 } tt2 = 1678035682 Aborted ts1 = { tv_sec = 1678035683, tv_nsec = 1801143 } tt2 = 1678035682 Aborted ts1 = { tv_sec = 1678035684, tv_nsec = 221852 } tt2 = 1678035683 Aborted ts1 = { tv_sec = 1678035684, tv_nsec = 1041359 } tt2 = 1678035683 Aborted ts1 = { tv_sec = 1678035684, tv_nsec = 1822872 } tt2 = 1678035683 Aborted
Glibc version: 2.35 Kernel version: Linux 5.15.0-53-generic x86_64 (Ubuntu 22.04)
Getting the same output: $ for i in `seq 10000`; do ./a.out ; done ts1 = { tv_sec = 1678038427, tv_nsec = 701070 } tt2 = 1678038426 Aborted (core dumped) ts1 = { tv_sec = 1678038428, tv_nsec = 362023 } tt2 = 1678038427 Aborted (core dumped) ts1 = { tv_sec = 1678038429, tv_nsec = 823507 } tt2 = 1678038428 Aborted (core dumped) ts1 = { tv_sec = 1678038430, tv_nsec = 629951 } tt2 = 1678038429 Aborted (core dumped) ts1 = { tv_sec = 1678038431, tv_nsec = 543252 } tt2 = 1678038430 Aborted (core dumped) ts1 = { tv_sec = 1678038432, tv_nsec = 665985 } tt2 = 1678038431 Aborted (core dumped) ts1 = { tv_sec = 1678038433, tv_nsec = 263170 } tt2 = 1678038432 Aborted (core dumped) ts1 = { tv_sec = 1678038434, tv_nsec = 1015063 } tt2 = 1678038433 Aborted (core dumped) ts1 = { tv_sec = 1678038435, tv_nsec = 616891 } tt2 = 1678038434 Aborted (core dumped) ts1 = { tv_sec = 1678038436, tv_nsec = 421832 } tt2 = 1678038435 Aborted (core dumped) ts1 = { tv_sec = 1678038437, tv_nsec = 552004 } tt2 = 1678038436 Aborted (core dumped) ts1 = { tv_sec = 1678038438, tv_nsec = 742045 } tt2 = 1678038437 Aborted (core dumped) ts1 = { tv_sec = 1678038439, tv_nsec = 82622 } tt2 = 1678038438 Aborted (core dumped) System: (GNU libc) 2.36 Linux fedora 6.1.13-200.fc37.x86_64
That's the difference the kernel returns for CLOCK_REALTIME_COARSE (for time) and CLOCK_REALTIME (for timespec_get).
According to the kernel docs, CLOCK_REALTIME_COARSE can be up to 10ms in the past (depending on CONFIG_HZ). This is the same the old time syscall would return.
For comparison, the test program succeeds (even 100000 consecutive runs) on - musl libc (Alpine Linux 3.17) - FreeBSD 13.1 - NetBSD 9.0 - OpenBSD 7.2 - macOS 12.5 - Solaris 11.4 - GNU/Hurd
(In reply to Bruno Haible from comment #5) > For comparison, the test program succeeds (even 100000 consecutive runs) on > - musl libc (Alpine Linux 3.17) > - FreeBSD 13.1 > - NetBSD 9.0 > - OpenBSD 7.2 > - macOS 12.5 > - Solaris 11.4 > - GNU/Hurd If I recall correctly, we used CLOCK_REALTIME_COARSE for time to mimic Linux time syscall kernel behavior (which is implemented in a couple of old ABI and used historically). Recently on 64 time_t time, we changed time to use clock_gettime and kept the same clock for compatibility. I agree that it makes sense to use the same clock for time and timespec_get, however, using CLOCK_REALTIME would make the vDSO optimization for x86_64 and powerpc64 unusable (not a big deal IMHO). And another question is whether we will need to add a compact symbol to time (and I also do not think it would make sense either).
(In reply to Adhemerval Zanella from comment #6) > I agree that it makes sense to use the same clock for time and timespec_get, > however, using CLOCK_REALTIME would make the vDSO optimization for x86_64 > and powerpc64 unusable (not a big deal IMHO). And another question is > whether we will need to add a compact symbol to time (and I also do not > think it would make sense either). CLOCK_REALTIME still has vDSO acceleration in many cases on x86-64, right? It's just a bit slower, and the difference can matter to some logging frameworks if I recall correctly. Bruno, the standard sources you cite suggest that we should round the seconds since epoch, and not (mostly) truncate it. So the standards seem to permit such divergence already (although the rounding it as suggested would move the less precise timestamp forward, not backwards).
(In reply to Florian Weimer from comment #7) > the standards seem to permit such divergence already The standards even permit the use of two different epochs, e.g. 1970-01-01 for time() and 2001-09-17 for timespec_get(). But it would be practical nonsense to do so. There are three ways to get the clock with 1-second resolution using simple ISO C / POSIX APIs: - time(NULL) - gettimeofday().tv_sec - timespec_get(TIME_UTC).tv_sec If using one of these then another one of these produces backwards progression, how are the applications supposed to behave in a sane way? In other words, this bug report is not so much about standards compliance, but about doing something that is practically sensible.
(In reply to Florian Weimer from comment #7) > (In reply to Adhemerval Zanella from comment #6) > > I agree that it makes sense to use the same clock for time and timespec_get, > > however, using CLOCK_REALTIME would make the vDSO optimization for x86_64 > > and powerpc64 unusable (not a big deal IMHO). And another question is > > whether we will need to add a compact symbol to time (and I also do not > > think it would make sense either). > > CLOCK_REALTIME still has vDSO acceleration in many cases on x86-64, right? > It's just a bit slower, and the difference can matter to some logging > frameworks if I recall correctly. Yes, it will follow the usual route of calling __clock_gettime, which in turn will call the vDSO. > There are three ways to get the clock with 1-second resolution using simple ISO C / POSIX APIs: > - time(NULL) > - gettimeofday().tv_sec > - timespec_get(TIME_UTC).tv_sec > If using one of these then another one of these produces backwards progression, how are the applications supposed to behave in a sane way? > > In other words, this bug report is not so much about standards compliance, but > about doing something that is practically sensible. I tend to agree, I see no reason but performance implications to have different clocks for similar interfaces.
(In reply to Bruno Haible from comment #8) > There are three ways to get the clock with 1-second resolution using simple > ISO C / POSIX APIs: > - time(NULL) > - gettimeofday().tv_sec > - timespec_get(TIME_UTC).tv_sec > If using one of these then another one of these produces backwards > progression, how are the applications supposed to behave in a sane way? Is this actually causing application problems, or is this just a synthetic test case that doesn't take different precision and rounding issue into account?
(In reply to Florian Weimer from comment #10) > Is this actually causing application problems...? In theory any sequence of timestamps is something that a (perverse) sysadmin could inflict on the user by constantly futzing with the system clock, and hence applications that generate out-of-order timestamps are allowed by POSIX. In practice out-of-order timestamps are problematic. Users will likely be confused, for example, if a log of a thread's actions contain out-of-order timestamps even when the system clock was not reset. I looked at a few applications. * Git can issue out-of-order timestamps. It's complicated enough that I judged it simpler to fix the problem than worry about how serious it is, and filed a proposed patch to do that <https://lore.kernel.org/git/20230319064353.686226-1-eggert@cs.ucla.edu/T/#u>. * GNU Make can issue out-of-order timestamps in a debugging log - not a serious problem but it could be annoying. I filed a proposed patch <https://lists.gnu.org/r/bug-make/2023-03/msg00081.html>. * Gnulib can generate out-of-order timestamps when parsing old-fashioned time strings, causing glitches in Gnulib-using commands like 'touch 01010000 file'. This is not serious (hardly anybody uses those old POSIX-specified strings that do not specify year numbers) and I don't know whether it's a POSIX violation. I fixed that problem <https://lists.gnu.org/r/bug-gnulib/2023-03/msg00028.html> and the fix should propagate into future releases of Coreutils etc. * GNU Emacs proper does not have the problem with Glibc (it may have it with MS-Windows but that’s out of scope here). One Emacs auxiliary program ‘movemail’ may have the issue since it does call ‘time’. I haven’t investigated further. The result of this brief investigation: * There are problems. * Effects observed so far are small (e.g., logs out of order). * GNU Make (my first worry, due to its extensive use of timestamps) seems to be mostly OK. * Any big effects are surely either rare, or hard to trace back to the problem source (or both); if big effects were common and easily traced back, we’d know already. As an application developer I’d rather see ‘time’ fixed to match ‘gettimeofday’ / CLOCK_REALTIME / etc., as I worry that there might be larger, rare effects that this little survey didn’t catch.
Thanks for the detailed (In reply to eggert from comment #11) > (In reply to Florian Weimer from comment #10) > > Is this actually causing application problems...? > > In theory any sequence of timestamps is something that a (perverse) sysadmin > could inflict on the user by constantly futzing with the system clock, and > hence applications that generate out-of-order timestamps are allowed by > POSIX. > > In practice out-of-order timestamps are problematic. Users will likely be > confused, for example, if a log of a thread's actions contain out-of-order > timestamps even when the system clock was not reset. > > I looked at a few applications. > > * Git can issue out-of-order timestamps. It's complicated enough that I > judged it simpler to fix the problem than worry about how serious it is, and > filed a proposed patch to do that > <https://lore.kernel.org/git/20230319064353.686226-1-eggert@cs.ucla.edu/T/ > #u>. > > * GNU Make can issue out-of-order timestamps in a debugging log - not a > serious problem but it could be annoying. I filed a proposed patch > <https://lists.gnu.org/r/bug-make/2023-03/msg00081.html>. > > * Gnulib can generate out-of-order timestamps when parsing old-fashioned > time strings, causing glitches in Gnulib-using commands like 'touch 01010000 > file'. This is not serious (hardly anybody uses those old POSIX-specified > strings that do not specify year numbers) and I don't know whether it's a > POSIX violation. I fixed that problem > <https://lists.gnu.org/r/bug-gnulib/2023-03/msg00028.html> and the fix > should propagate into future releases of Coreutils etc. > > * GNU Emacs proper does not have the problem with Glibc (it may have it with > MS-Windows but that’s out of scope here). One Emacs auxiliary program > ‘movemail’ may have the issue since it does call ‘time’. I haven’t > investigated further. > > The result of this brief investigation: > > * There are problems. > > * Effects observed so far are small (e.g., logs out of order). > > * GNU Make (my first worry, due to its extensive use of timestamps) seems to > be mostly OK. > > * Any big effects are surely either rare, or hard to trace back to the > problem source (or both); if big effects were common and easily traced back, > we’d know already. > > As an application developer I’d rather see ‘time’ fixed to match > ‘gettimeofday’ / CLOCK_REALTIME / etc., as I worry that there might be > larger, rare effects that this little survey didn’t catch. Thanks for the detailed analysis Paul, and I completely agree with you that this is problematic and I really can't see why Florian sees the performance the penalty is worth all this trouble (he even hinted that if we ever get this upstream he would need to revert to RHEL and related distros). Specially because if the user really wants to, it can simply do: #define time(__ptr) \ {( \ struct timespec __ts; \ clock_gettime (CLOCK_REALTIME_COARSE, &__ts); \ __ts.tv_sec; \ )} On most architectures that provide vDSO, this should only be marginally slower than time. Also, the problem is not really specific for glibc 2.31+. Rather, on glibc 2.30 and older, the time syscall is used if present (which always uses the coarse timer), otherwise gettimeofday is used. The time syscall is supported on: $ python sysdeps/unix/sysv/linux/glibcsyscalls.py query-syscall time time: defined: hppa i386 m68k microblaze mips/mips32 powerpc/powerpc32 powerpc/powerpc64 s390/s390-32 sh sparc/sparc32 x86_64/64 x86_64/x32 undefined: aarch64 alpha arc arm csky ia64 loongarch mips/mips64/n32 mips/mips64/n64 nios2 or1k riscv/rv32 riscv/rv64 s390/s390-64 sparc/sparc64 So on 2.30 we have a different problem where different architectures used different timers for 'time' and it was fixed on 2.31 to always use CLOCK_REALTIME_COARSE. So this problem *always* existed (which also makes ponder if this bug is really urgent as you and Bruno are trying to pose by pushing multiple fixes on different projects).
(In reply to Adhemerval Zanella from comment #12) > So this problem *always* existed From "This problem always existed" you cannot imply "This problem never caused trouble, otherwise it would have been reported". In https://sourceware.org/pipermail/libc-alpha/2023-March/146219.html I explain: On a typical system, there are - hundreds of programs that call gettimeofday() or timespec_get(), - hundreds of programs that call time(). Each time a program calls gettimeofday() or timespec_get(), and then, within 1 to 3 milliseconds, a program (the same or a different one) calls time(), there is the potential for trouble because the time appears to go backwards. This trouble can occur with probability between 0.1% and 0.3%. This is the kind of trouble that does occur, but that people don't report because - it occurs only 1 in 300 to 1000 times, - it is nearly impossible to track down to glibc, since it will appear as an interoperation bug between two different programs in the first place.
(In reply to Bruno Haible from comment #13) > (In reply to Adhemerval Zanella from comment #12) > > So this problem *always* existed > > From "This problem always existed" you cannot imply "This problem never > caused trouble, otherwise it would have been reported". I will quote myself "I completely agree with you [Paul] that this is problematic" to make it clear I agree this is a problem. I even sent a fix to make it use the same clock for all time interface that does not specify the clock id [1], although I ended up retiring it because Florian explicit say he would revert it on RH distros because of performance implications. [1] https://sourceware.org/pipermail/libc-alpha/2023-March/146088.html > > In https://sourceware.org/pipermail/libc-alpha/2023-March/146219.html I > explain: > > On a typical system, there are > - hundreds of programs that call gettimeofday() or timespec_get(), > - hundreds of programs that call time(). > Each time a program calls gettimeofday() or timespec_get(), and then, > within 1 to 3 milliseconds, a program (the same or a different one) > calls time(), there is the potential for trouble because the time > appears to go backwards. This trouble can occur with probability > between 0.1% and 0.3%. > > This is the kind of trouble that does occur, but that people don't report > because > - it occurs only 1 in 300 to 1000 times, > - it is nearly impossible to track down to glibc, since it will appear as an > interoperation bug between two different programs in the first place.
(In reply to Adhemerval Zanella from comment #14) > Florian explicit say he would revert it > on RH distros because of performance implications. If the question cannot be decided due to different distro priorities (for some distros, logging speed is super important, while for other distros, correctness and interoperability is the focus), then how about changing the patch so that it adds a glibc configure option? Such as --enable/disable-fast-time or --enable/disable-correct-time?
(In reply to Bruno Haible from comment #15) > (In reply to Adhemerval Zanella from comment #14) > > Florian explicit say he would revert it > > on RH distros because of performance implications. > > If the question cannot be decided due to different distro priorities (for > some distros, logging speed is super important, while for other distros, > correctness and interoperability is the focus), then how about changing the > patch so that it adds a glibc configure option? Such as > --enable/disable-fast-time or --enable/disable-correct-time? A configure switch for this specific issue is the last thing I would like to do, it adds another build permutation and will only move the issue to distribution maintainers.
(In reply to Bruno Haible from comment #15) > (In reply to Adhemerval Zanella from comment #14) > > Florian explicit say he would revert it > > on RH distros because of performance implications. I think I said, or should have said, that we would have to revert it based on customer/partner feedback if performance problems are reported (which would only happen a couple of years down the road). > If the question cannot be decided due to different distro priorities (for > some distros, logging speed is super important, while for other distros, > correctness and interoperability is the focus), then how about changing the > patch so that it adds a glibc configure option? Such as > --enable/disable-fast-time or --enable/disable-correct-time? This won't help you with the gnulib testsuite. In any case, if you don't like the time-related services the kernel provides, I think you really should take this up with the kernel developers. We are just handing down what the kernel provides to us.
(In reply to Florian Weimer from comment #17) > (In reply to Bruno Haible from comment #15) > > (In reply to Adhemerval Zanella from comment #14) > > > Florian explicit say he would revert it > > > on RH distros because of performance implications. > > I think I said, or should have said, that we would have to revert it based > on customer/partner feedback if performance problems are reported (which > would only happen a couple of years down the road). And it basically would require all the gnulib hackery to fix this issue again if this is ever done. It also means that if gnulib is used users will need to add even extra hackery to get this time performance difference. > > > If the question cannot be decided due to different distro priorities (for > > some distros, logging speed is super important, while for other distros, > > correctness and interoperability is the focus), then how about changing the > > patch so that it adds a glibc configure option? Such as > > --enable/disable-fast-time or --enable/disable-correct-time? > > This won't help you with the gnulib testsuite. > > In any case, if you don't like the time-related services the kernel > provides, I think you really should take this up with the kernel developers. > We are just handing down what the kernel provides to us. I don't think we should follow kernel blindly here, especially if this does generate potential issues as Paul has summarized. And users always have the option to use clock_gettime (CLOCK_REALTIME_COARSE).