netdev - Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2de9275f-b344-4a76-897b-52d5f4bdca59@opensynergy.com>
Date: Fri, 28 Jun 2024 18:38:15 +0200
From: Peter Hilber <peter.hilber@...nsynergy.com>
To: David Woodhouse <dwmw2@...radead.org>, linux-kernel@...r.kernel.org,
 virtualization@...ts.linux.dev, linux-arm-kernel@...ts.infradead.org,
 linux-rtc@...r.kernel.org, "Ridoux, Julien" <ridouxj@...zon.com>,
 virtio-dev@...ts.linux.dev, "Luu, Ryan" <rluu@...zon.com>
Cc: "Christopher S. Hall" <christopher.s.hall@...el.com>,
 Jason Wang <jasowang@...hat.com>, John Stultz <jstultz@...gle.com>,
 "Michael S. Tsirkin" <mst@...hat.com>, netdev@...r.kernel.org,
 Richard Cochran <richardcochran@...il.com>, Stephen Boyd <sboyd@...nel.org>,
 Thomas Gleixner <tglx@...utronix.de>, Xuan Zhuo
 <xuanzhuo@...ux.alibaba.com>, Marc Zyngier <maz@...nel.org>,
 Mark Rutland <mark.rutland@....com>,
 Daniel Lezcano <daniel.lezcano@...aro.org>,
 Alessandro Zummo <a.zummo@...ertech.it>,
 Alexandre Belloni <alexandre.belloni@...tlin.com>
Subject: Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

On 28.06.24 14:15, David Woodhouse wrote:
> On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote:
>> On 27.06.24 16:52, David Woodhouse wrote:
>>> I already added a flags field, so this might look something like:
>>>
>>>         /*
>>>          * Smearing flags. The UTC clock exposed through this structure
>>>          * is only ever true UTC, but a guest operating system may
>>>          * choose to offer a monotonic smeared clock to its users. This
>>>          * merely offers a hint about what kind of smearing to perform,
>>>          * for consistency with systems in the nearby environment.
>>>          */
>>> #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt */
>>>
>>> (UTC-SLS is probably a bad example but are there formal definitions for
>>> anything else?)
>>
>> I think it could also be more generic, like flags for linear smearing,
>> cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative
>> to the leap second start). That could also represent UTC-SLS, and
>> noon-to-noon, and it would be well-defined.
>>
>> This should reduce the likelihood that the guest doesn't know the smearing
>> variant.
> 
> I'm wary of making it too generic. That would seem to encourage a
> *proliferation* of false "UTC-like" clocks.
> 
> It's bad enough that we do smearing at all, let alone that we don't
> have a single definition of how to do it.
> 
> I made the smearing hint a full uint8_t instead of using bits in flags,
> in the end. That gives us a full 255 ways of lying to users about what
> the time is, so we're unlikely to run out. And it's easy enough to add
> a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods
> that get invented.
> 
> 

My concern is that the registry update may come after a driver has already
been implemented, so that it may be hard to ensure that the smearing which
has been chosen is actually implemented.

>>>>> +       /*
>>>>> +        * This field changes to another non-repeating value when the CPU
>>>>> +        * counter is disrupted, for example on live migration.
>>>>> +        */
>>>>> +       uint64_t disruption_marker;
>>>>
>>>> The field could also change when the clock is stepped (leap seconds
>>>> excepted), or when the clock frequency is slewed.
>>>
>>> I'm not sure. The concept of the disruption marker is that it tells the
>>> guest to throw away any calibration of the counter that the guest has
>>> done for *itself* (with NTP, other PTP devices, etc.).
>>>
>>> One mode for this device would be not to populate the clock fields at
>>> all, but *only* to signal disruption when it occurs. So the guest can
>>> abort transactions until it's resynced its clocks (to avoid incurring
>>> fines if breaking databases, etc.).
>>>
>>> Exposing the host timekeeping through the structure means that the
>>> migrated guest can keep working because it can trust the timekeeping
>>> performed by the (new) host and exposed to it.
>>>
>>> If the counter is actually varying in frequency over time, and the host
>>> is slewing the clock frequency that it reports, that *isn't* a step
>>> change and doesn't mean that the guest should throw away any
>>> calibration that it's been doing for itself. One hopes that the guest
>>> would have detected the *same* frequency change, and be adapting for
>>> itself. So I don't think that should indicate a disruption.
>>>
>>> I think the same is even true if the clock is stepped by the host. The
>>> actual *counter* hasn't changed, so the guest is better off ignoring
>>> the vacillating host and continuing to derive its idea of time from the
>>> hardware counter itself, as calibrated against some external NTP/PTP
>>> sources. Surely we actively *don't* to tell the guest to throw its own
>>> calibrations away, in this case?
>>
>> In case the guest is also considering other time sources, it might indeed
>> not be a good idea to mix host clock changes into the hardware counter
>> disruption marker.
>>
>> But if the vmclock is the authoritative source of time, it can still be
>> helpful to know about such changes, maybe through another marker.
> 
> Could that be the existing seq_count field?
> 
> Skewing the counter_period_frac_sec as the underlying oscillator speeds
> up and slows down is perfectly normal and expected, and we already
> expect the seq_count to change when that happens.
> 
> Maybe step changes are different, but arguably if the time advertised
> by the host steps *outside* the error bounds previously advertised,
> that's just broken?

But the error bounds could be large or missing. I am trying to address use
cases where the host steps or slews the clock as well.

> 
> Depending on how the clock information is fed, a change in seq_count
> may even result in non-monotonicity. If the underlying oscillator has
> sped up and the structure is updated accordingly, the time calculated
> the moment *before* that update may appear later than the time
> calculated immediately after it.
> 
> It's up to the guest operating system to feed that information into its
> own timekeeping system and skew towards correctness instead of stepping
> the time it reports to its users.
> 

The guest can anyway infer from the other information that the clock
changed, so my proposal might not be that useful. Maybe it can be added in
a future version if there is any need.