[Y2038][time namespaces] Question regarding CLOCK_REALTIME support plans in Linux time namespaces

Petr Špaček petr.spacek@nic.cz
Wed Nov 25 17:06:47 GMT 2020


On 20. 11. 20 1:14, Thomas Gleixner wrote:
> On Thu, Nov 19 2020 at 13:37, Carlos O'Donell wrote:
>> On 11/6/20 7:47 PM, Thomas Gleixner wrote:
>>> Would CONFIG_DEBUG_DISTORTED_CLOCK_REALTIME be a way to go? IOW,
>>> something which is clearly in the debug section of the kernel which won't
>>> get turned on by distros (*cough*) and comes with a description that any
>>> bug reports against it vs. time correctness are going to be ignored.
>>
>> Yes. I would be requiring CONFIG_DEBUG_DISTORTED_CLOCK_REALTIME.
>>
>> Let me be clear though, the distros have *+debug kernels for which this
>> CONFIG_DEBUG_* could get turned on? In Fedora *+debug kernels we enable all
>> sorts of things like CONFIG_DEBUG_OBJECTS_* and CONFIG_DEBUG_SPINLOCK etc.
>> etc. etc.
> 
> That's why I wrote '(*cough*)'. It's entirely clear to me that this
> would be enabled for whatever raisins.
> 
>> I would push Fedora/RHEL to ship this in the *+debug kernels. That way I can have
>> this on for local test/build cycle. Would you be OK with that?
> 
> Distros ship a lot of weird things. Though that config would be probably
> saner than some of the horrors shipped in enterprise production kernels.
> 
>> We could have it disabled by default but enabled via proc like
>> unprivileged_userns_clone was at one point?
> 
> Yes, that'd be mandatory. But see below.
> 
>> I want to avoid accidental use in Fedora *+debug kernels unless the
>> developer is actively going to run tests that require time
>> manipulation e.g. thousands of DNSSEC tests with timeouts [1].
> 
> ...
> 
>> In the case of DNSSEC, protocol conversations carry real time values
>> which cause "expiration", so packet captures are useful only if the real
>> time clock reflects the values from the original conversation. In our case
>> packet captures come from the real Internet, i.e. we do not have the private
>> keys used to sign the packets, so we cannot change the time values.
>>
>> This use-case also implies support for settime(): During the course of a
>> test we shorten time windows where "nothing happens" and server and
>> client are waiting for an event, e.g. for cache expiration on
>> client. This window can be hours long so it really _does_ make a
>> difference. Oh yes, and for these time jumps we need to move monotonic
>> time as well.
> 
> I hope you are aware that the time namespace offsets have to be set
> _before_ the process starts and can't be changed afterwards,
> i.e. settime() is not an option.
> 
> That might limit the usability for your use case and this can't be
> changed at all because there might be armed timers and other time
> related things which would start to go into full confusion mode.
> 
> The supported use case is container live migration and that _is_ very
> careful about restoring time and armed timers and if their user space
> tools screw it up then they can keep the bits and pieces.
> 
> So in order to utilize that you'd have to checkpoint the container,
> manipulate the offsets and restore it.
> 
> The point is that on changing the time offset after the fact the kernel
> would have to chase _all_ armed timers which belong to that namespace
> and are related to the affected clock and readjust them to the new
> distortion of namespace time. Otherwise they might expire way too late
> (which is kinda ok from a correctness POV, but not what you expect) or
> too early, which is clearly a NONO. Finding them is not trivial because
> some of them are part of a syscall and on stack.
> 
> What's worse is that if the host's CLOCK_REALTIME is set, then it'd have
> to go through _all_ time namespaces, adjust the offsets, find all timers
> of all tasks in each namespace.
> 
> Contrary to that the real clock_settime(CLOCK_REALTIME) is not a big
> problem, simply because all it takes is to change the time and then kick
> all CPUs to reevaluate their first expiring timer. If the clock jumped
> backward then they rearm their hardware and are done, if it jumped
> forward they expire the ones which are affected and all is good.
> 
> The original posix timer implementation did not have separate time bases
> and on clock_settime() _all_ armed CLOCK_REALTIME timers in the system
> had to be chased down, reevaluated and readjusted. Guess how well that
> worked and what kind of limitation that implied.
> 
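
A toy model of the contrast drawn above, not kernel code: timers sit in
a heap keyed by absolute expiry on a single clock base, so setting the
clock only has to look at the earliest entries. The class and method
names are made up for illustration, and the model ignores per-CPU bases
and hardware rearming.

```python
import heapq


class ToyRealtimeBase:
    """Toy model: timers keyed by absolute expiry on one clock base."""

    def __init__(self, now):
        self.now = now
        self._heap = []  # entries are (absolute expiry, name)

    def arm(self, expiry, name):
        heapq.heappush(self._heap, (expiry, name))

    def settime(self, new_now):
        """Jump the clock and return the names of timers that fire."""
        self.now = new_now
        fired = []
        # Forward jump: expire everything that is now due.
        # Backward jump: nothing becomes due, so the loop does not run.
        while self._heap and self._heap[0][0] <= self.now:
            fired.append(heapq.heappop(self._heap)[1])
        return fired

    def next_expiry(self):
        """Earliest remaining expiry (what would be rearmed), or None."""
        return self._heap[0][0] if self._heap else None
```

Changing a per-namespace offset after the fact has no such shortcut:
every timer armed against the old offset would have to be found and
rewritten, including the ones that live on a syscall stack.
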
> Aside of this, there are other things, e.g. file times, packet
> timestamps etc. which are based on CLOCK_REALTIME. What to do about
> them? Translate these to/from name space time or not? There is a long
> list of other horrors which are related to that.
> 
> So _you_ might say, that you don't care about file times, RTC, timers
> expiring at the wrong time, packet timestamps and whatever.
> 
> But then the next test dude comes around and wants to test exactly
> these interfaces and we have to slap the time namespace conversions for
> REALTIME and TAI all over the place because we already support the
> minimal thing.
> 
> Can you see why this is a slippery slope and why I'm extremely reluctant
> to even provide the minimal 'distort realtime when the namespace starts'
> support?
> 
>> Hopefully this illustrates that a real time name space is not a "request for
>> pony" :-)
> 
> I can understand your pain and why you want to distort time, but please
> understand that timekeeping is complex. The primary focus must be
> correctness, scalability and maintainability which is already hard
> enough to achieve. Just for the perspective: It took us only 8 years to
> get the kernel halfway 2038 ready (filesystems still outstanding).
> 
> So from my point of view asking for distorted time still _is_ a request
> for ponies.
> 
> The fixed offsets for clock MONOTONIC/BOOTTIME are straightforward,
> absolutely make sense and they have a limited scope of exposure. clock
> REALTIME/TAI are very different beasts which entail a slew of horrors.
> Adding settime() to the mix makes it exponentially harder.

Point taken, I can see it is complex as hell. Maybe settime() would not
be necessary if a checkpoint+restore operation is cheap enough, assuming
time jumps can be achieved by manipulating the checkpoint images. I will
eventually look into CRIU (criu.org) to find out.
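
For reference, a minimal sketch of how the offsets get fixed when the
namespace starts, going by my reading of time_namespaces(7): each offset
is written to /proc/PID/timens_offsets as a
"<clock-id> <offset-secs> <offset-nanosecs>" line before the first
process enters the namespace, and only monotonic and boottime are
accepted. Both helper names are hypothetical. run_with_offsets() is an
untested assumption that needs Linux 5.6+, sufficient privileges and
Python 3.12+ (for os.unshare).

```python
import os

NSEC_PER_SEC = 1_000_000_000
# Only these clocks can be offset in a time namespace (time_namespaces(7)).
ALLOWED_CLOCKS = ("monotonic", "boottime")


def format_timens_offset(clock, offset_ns):
    """Build one "<clock-id> <offset-secs> <offset-nanosecs>" line."""
    if clock not in ALLOWED_CLOCKS:
        raise ValueError(f"no namespace offset for clock {clock!r}")
    # divmod floors, so nsecs always lands in 0..999_999_999 even for
    # negative offsets (the nanoseconds field is unsigned).
    secs, nsecs = divmod(offset_ns, NSEC_PER_SEC)
    return f"{clock} {secs} {nsecs}\n"


def run_with_offsets(argv, offsets_ns):
    """Hypothetical helper, untested: run argv in a new time namespace.

    offsets_ns maps a clock name to an offset in nanoseconds.
    """
    # The caller stays in its old namespace; only children move over.
    os.unshare(os.CLONE_NEWTIME)
    # Offsets must be written before the first child is created.
    with open("/proc/self/timens_offsets", "w") as f:
        for clock, offset_ns in offsets_ns.items():
            f.write(format_timens_offset(clock, offset_ns))
    pid = os.fork()
    if pid == 0:
        os.execvp(argv[0], argv)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exit_code(status)
```

A time jump between two test phases would then mean checkpoint, restore
with different offsets, and continue, rather than calling settime().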

Thank you for your time!

-- 
Petr Špaček  @  CZ.NIC

