Message-ID: <a8f068ff-9512-5530-2f10-d2741495c4a9@nic.cz>
Date: Wed, 25 Nov 2020 18:06:47 +0100
From: Petr Špaček <petr.spacek@....cz>
To: Thomas Gleixner <tglx@...utronix.de>,
Carlos O'Donell <carlos@...hat.com>,
Zack Weinberg <zackw@...ix.com>, Cyril Hrubis <chrubis@...e.cz>
Cc: Dmitry Safonov <dima@...sta.com>, Andrei Vagin <avagin@...il.com>,
GNU C Library <libc-alpha@...rceware.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [Y2038][time namespaces] Question regarding CLOCK_REALTIME
support plans in Linux time namespaces

On 20. 11. 20 1:14, Thomas Gleixner wrote:
> On Thu, Nov 19 2020 at 13:37, Carlos O'Donell wrote:
>> On 11/6/20 7:47 PM, Thomas Gleixner wrote:
>>> Would CONFIG_DEBUG_DISTORTED_CLOCK_REALTIME be a way to go? IOW,
>>> something which is clearly in the debug section of the kernel which won't
>>> get turned on by distros (*cough*) and comes with a description that any
>>> bug reports against it vs. time correctness are going to be ignored.
>>
>> Yes. I would require CONFIG_DEBUG_DISTORTED_CLOCK_REALTIME.
>>
>> Let me be clear, though: the distros have *+debug kernels in which this
>> CONFIG_DEBUG_* could get turned on, right? In Fedora *+debug kernels we enable all
>> sorts of things like CONFIG_DEBUG_OBJECTS_* and CONFIG_DEBUG_SPINLOCK etc.
>> etc. etc.
>
> That's why I wrote '(*cough*)'. It's entirely clear to me that this
> would be enabled for whatever raisins.
>
>> I would push Fedora/RHEL to ship this in the *+debug kernels. That way I can have
>> this on for the local test/build cycle. Would you be OK with that?
>
> Distros ship a lot of weird things. Though that config would probably be
> saner than some of the horrors shipped in enterprise production kernels.
>
>> We could have it disabled by default but enabled via proc like
>> unprivileged_userns_clone was at one point?
>
> Yes, that'd be mandatory. But see below.
>
>> I want to avoid accidental use in Fedora *+debug kernels unless the
>> developer is actively going to run tests that require time
>> manipulation e.g. thousands of DNSSEC tests with timeouts [1].
>
> ...
>
>> In the case of the DNSSEC protocol, conversations carry real-time values
>> which cause "expiration", so packet captures are useful only if the
>> real-time clock reflects the time of the original conversation. In our case
>> the packet captures come from the real Internet, i.e. we do not have the
>> private keys used to sign the packets, so we cannot change the time values.
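To make the expiration problem concrete: RRSIG records carry absolute inception and expiration times as 32-bit seconds since the epoch, compared using RFC 1982 serial-number arithmetic (RFC 4034, section 3.1.5), and a validator accepts a signature only if the current time lies between the two. The minimal Python sketch below shows just that window check; it is an illustration, not a DNSSEC validator, and the function names are hypothetical. Feeding it the inception/expiration of a captured RRSIG together with today's time shows why replaying old captures requires CLOCK_REALTIME to be moved back.

```python
SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS           # RRSIG timestamps wrap modulo 2**32
HALF = 1 << (SERIAL_BITS - 1)


def serial_lt(a: int, b: int) -> bool:
    """RFC 1982 'a < b' for 32-bit serial numbers (undefined case treated as False)."""
    return a != b and (b - a) % MOD < HALF


def rrsig_time_ok(inception: int, expiration: int, now: float) -> bool:
    """True if `now` (seconds since the epoch) lies within [inception, expiration]."""
    t = int(now) % MOD
    # Valid unless now is before inception or expiration is before now.
    return not serial_lt(t, inception) and not serial_lt(expiration, t)
```

For example, rrsig_time_ok(1_600_000_000, 1_602_592_000, 1_601_000_000) is True, while the same signature checked at 1_606_000_000 is False, which is exactly what happens when a capture is replayed later with the host's real clock.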
>>
>> This use-case also implies support for settime(): During the course of a
>> test we shorten time windows where "nothing happens" and server and
>> client are waiting for an event, e.g. for cache expiration on
>> client. This window can be hours long so it really _does_ make a
>> difference. Oh yes, and for these time jumps we need to move monotonic
>> time as well.
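A toy illustration of why such a jump has to move the monotonic clock too, assuming (as is common for timeouts and TTLs) that the client ages its cache against a monotonic clock, precisely because that clock does not follow settime(). FakeClocks and TtlCache are hypothetical stand-ins for CLOCK_REALTIME, CLOCK_MONOTONIC and a resolver cache; they are not taken from any real resolver.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FakeClocks:
    """Hypothetical test clocks standing in for CLOCK_REALTIME and CLOCK_MONOTONIC."""
    realtime: float
    monotonic: float

    def jump(self, seconds: float, *, move_monotonic: bool = True) -> None:
        # Skip a "nothing happens" window during a test.
        self.realtime += seconds
        if move_monotonic:
            self.monotonic += seconds


@dataclass
class TtlCache:
    """Toy cache that ages entries by the monotonic clock (an assumption)."""
    clocks: FakeClocks
    entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def put(self, key: str, value: str, ttl: float) -> None:
        self.entries[key] = (value, self.clocks.monotonic + ttl)

    def get(self, key: str) -> Optional[str]:
        value, deadline = self.entries.get(key, (None, 0.0))
        return value if self.clocks.monotonic < deadline else None
```

With a one-hour entry, jumping only the wall clock forward by two hours leaves the entry alive; jumping both clocks expires it.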
>
> I hope you are aware that the time namespace offsets have to be set
> _before_ the process starts and can't be changed afterwards,
> i.e. settime() is not an option.
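For reference, a minimal Python sketch of that constraint, based on the time_namespaces(7) man page: unshare(CLONE_NEWTIME) creates a namespace that only future children enter, offsets for CLOCK_MONOTONIC and CLOCK_BOOTTIME (never CLOCK_REALTIME) are written to /proc/self/timens_offsets, and once the first child is inside the namespace the offsets are frozen. It assumes Linux 5.6 or later, Python 3.12 or later (for os.unshare and os.CLONE_NEWTIME) and CAP_SYS_ADMIN plus CAP_SYS_TIME; the helper names are hypothetical.

```python
import os

# Clock names accepted in /proc/<pid>/timens_offsets, per time_namespaces(7).
# "realtime" is deliberately absent: time namespaces only offset these two clocks.
TIMENS_CLOCKS = ("monotonic", "boottime")
NSEC_PER_SEC = 1_000_000_000


def timens_offset_line(clock: str, offset_ns: int) -> str:
    """Format one '<clock-id> <offset-secs> <offset-nanosecs>' line."""
    if clock not in TIMENS_CLOCKS:
        raise ValueError(f"time namespaces cannot offset {clock!r}")
    # divmod floors, so a negative offset keeps nanoseconds in [0, NSEC_PER_SEC).
    secs, nsecs = divmod(offset_ns, NSEC_PER_SEC)
    return f"{clock} {secs} {nsecs}\n"


def run_in_time_namespace(argv: list, offsets: dict) -> int:
    """Run argv in a new time namespace; offsets maps clock name -> nanoseconds.

    Hypothetical helper; needs Linux >= 5.6, Python >= 3.12, CAP_SYS_ADMIN and CAP_SYS_TIME.
    """
    # The caller does not enter the new namespace; only its future children do.
    os.unshare(os.CLONE_NEWTIME)
    # Offsets can be written only while no process lives in the namespace yet.
    with open("/proc/self/timens_offsets", "w") as f:
        f.write("".join(timens_offset_line(c, ns) for c, ns in offsets.items()))
    pid = os.fork()  # the child enters the namespace; later writes fail (EACCES)
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Something like run_in_time_namespace(["uptime"], {"boottime": 7 * 86_400 * NSEC_PER_SEC}) would report an uptime one week longer than the host's; there is no step at which the offset could be changed for the already running child.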
>
> That might limit the usability for your use case and this can't be
> changed at all because there might be armed timers and other time
> related things which would start to go into full confusion mode.
>
> The supported use case is container live migration and that _is_ very
> careful about restoring time and armed timers and if their user space
> tools screw it up then they can keep the bits and pieces.
>
> So in order to utilize that you'd have to checkpoint the container,
> manipulate the offsets and restore it.
>
> The point is that on changing the time offset after the fact the kernel
> would have to chase _all_ armed timers which belong to that namespace
> and are related to the affected clock and readjust them to the new
> distortion of namespace time. Otherwise they might expire way too late
> (which is kinda ok from a correctness POV, but not what you expect) or
> too early, which is clearly a NONO. Finding them is not trivial because
> some of them are part of a syscall and on stack.
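A toy model of the problem, assuming, as the kernel does for time namespaces, that an absolute timer is converted to host time at the moment it is armed. The names are hypothetical and this is not kernel code; it only shows what happens to an already armed timer if the offset changes underneath it.

```python
from dataclasses import dataclass


@dataclass
class ArmedTimer:
    """Toy absolute timer in a time namespace (hypothetical, not kernel code)."""
    ns_expiry: int    # expiry requested by the task, in namespace time
    host_expiry: int  # expiry actually programmed, in host time


def arm(ns_expiry: int, offset: int) -> ArmedTimer:
    # namespace_time = host_time + offset, so the host deadline is ns_expiry - offset.
    return ArmedTimer(ns_expiry, ns_expiry - offset)


def fires_at_ns_time(timer: ArmedTimer, offset: int) -> int:
    # Namespace time the task observes when the host clock reaches host_expiry.
    return timer.host_expiry + offset


def readjust(timer: ArmedTimer, new_offset: int) -> None:
    # What an offset change would demand for *every* armed timer in the namespace.
    timer.host_expiry = timer.ns_expiry - new_offset
```

Armed for namespace time 1000 with offset 100, the timer is programmed for host time 900. If the offset later became 300 it would fire at namespace time 1200 (late); if it became -200 it would fire at 700 (too early) unless readjust() were run on it, and on every other armed timer, including those living on some syscall's stack.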
>
> What's worse is that if the host's CLOCK_REALTIME is set, then it'd have
> to go through _all_ time namespaces, adjust the offsets and find all timers
> of all tasks in each namespace.
>
> Contrary to that, the real clock_settime(CLOCK_REALTIME) is not a big
> problem, simply because all it takes is to change the time and then kick
> all CPUs to reevaluate their first expiring timer. If the clock jumped
> backward, they rearm their hardware and are done; if it jumped
> forward, they expire the ones which are affected and all is good.
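A toy model of why that is cheap, assuming REALTIME timers live in their own base, ordered by absolute realtime expiry, with realtime = monotonic + offset. Setting the clock only changes the offset; the subsequent kick looks at the head of the queue, expiring whatever is now due (forward jump) or merely reprogramming the next deadline (backward jump). RealtimeBase and its methods are hypothetical; this is not the kernel's hrtimer code.

```python
import heapq
from typing import List, Optional, Tuple


class RealtimeBase:
    """Toy per-clock timer base: absolute realtime expiries, realtime = monotonic + offset."""

    def __init__(self, offset: int) -> None:
        self.offset = offset
        self.expiries: List[int] = []  # min-heap of absolute realtime expiries

    def arm(self, realtime_expiry: int) -> None:
        heapq.heappush(self.expiries, realtime_expiry)

    def settime(self, now_mono: int, new_realtime: int) -> None:
        # O(1): only the base offset moves; no armed timer is visited.
        self.offset = new_realtime - now_mono

    def kick(self, now_mono: int) -> Tuple[List[int], Optional[int]]:
        """Expire due timers; return them and the next hardware deadline (monotonic)."""
        now_real = now_mono + self.offset
        expired = []
        while self.expiries and self.expiries[0] <= now_real:
            expired.append(heapq.heappop(self.expiries))
        next_hw = self.expiries[0] - self.offset if self.expiries else None
        return expired, next_hw
```

With timers at realtime 1500, 2000 and 3000 and an offset of 1000, a kick at monotonic 100 expires nothing and programs 500. Setting realtime to 2100 and kicking expires 1500 and 2000 and programs 1000; setting it back to 1100 expires nothing and programs 2000. No armed timer is ever touched by settime() itself.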
>
> The original POSIX timer implementation did not have separate time bases
> and on clock_settime() _all_ armed CLOCK_REALTIME timers in the system
> had to be chased down, reevaluated and readjusted. Guess how well that
> worked and what kind of limitation that implied.
>
> Aside from this, there are other things, e.g. file times, packet
> timestamps etc., which are based on CLOCK_REALTIME. What to do about
> them? Translate these to/from namespace time or not? There is a long
> list of other horrors which are related to that.
>
> So _you_ might say that you don't care about file times, RTC, timers
> expiring at the wrong time, packet timestamps and whatever.
>
> But then the next test dude comes around and wants to test exactly
> these interfaces and we have to slap the time namespace conversions for
> REALTIME and TAI all over the place because we already support the
> minimal thing.
>
> Can you see why this is a slippery slope and why I'm extremely reluctant
> to even provide the minimal 'distort realtime when the namespace starts'
> support?
>
>> Hopefully this illustrates that a real-time namespace is not a "request
>> for a pony" :-)
>
> I can understand your pain and why you want to distort time, but please
> understand that timekeeping is complex. The primary focus must be
> correctness, scalability and maintainability which is already hard
> enough to achieve. Just for perspective: it took us only 8 years to
> get the kernel halfway 2038-ready (filesystems still outstanding).
>
> So from my point of view asking for distorted time still _is_ a request
> for ponies.
>
> The fixed offsets for clock MONOTONIC/BOOTTIME are straightforward,
> absolutely make sense and they have a limited scope of exposure. clock
> REALTIME/TAI are very different beasts which entail a slew of horrors.
> Adding settime() to the mix makes it exponentially harder.

Point taken; I can see it is complex as hell. Maybe settime() would not be necessary if a checkpoint+restore operation is cheap enough, assuming time jumps can be achieved by manipulating the checkpoint images. I will eventually explore criu.org to find out.
Thank you for your time!
--
Petr Špaček @ CZ.NIC