Message-ID: <01434319-3828-0d50-4274-8f95b18dafc1@linux.intel.com>
Date: Wed, 25 Jan 2023 11:44:05 -0500
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: John Stultz <jstultz@...gle.com>
Cc: Stephane Eranian <eranian@...gle.com>, peterz@...radead.org,
mingo@...hat.com, tglx@...utronix.de, sboyd@...nel.org,
linux-kernel@...r.kernel.org, namhyung@...nel.org,
ak@...ux.intel.com
Subject: Re: [PATCH 1/3] timekeeping: NMI safe converter from a given time to
monotonic
On 2023-01-24 5:40 p.m., John Stultz wrote:
> On Tue, Jan 24, 2023 at 2:08 PM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>> On 2023-01-24 3:33 p.m., John Stultz wrote:
>>> On Tue, Jan 24, 2023 at 12:13 PM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>>>> On 2023-01-24 1:43 p.m., John Stultz wrote:
>>>> If I read the code correctly, it should happen only when adjusting NTP
>>>> or suspending/resuming. If so, I think the drawback should not impact
>>>> normal analysis work. I will call out the drawbacks in the comments
>>>> where the function is used.
>>> So the adjustments are done at tick time depending on the current NTP
>>> "error" (basically the delta the kernel tracks between its own sense
>>> of time and what NTP has told us).
>>>
>>> Not just at the time when ntp makes an adjustment.
>>>
>>> So the window for it to happen is every timekeeping update (which is ~HZ).
>>
>> You mean the tkf->base is updated in each tick?
>
> Right, on every call to timekeeping_update(), which happens for every
> major state change in time as well as for the regular updates via
> timekeeping_advance(), which is called from update_wall_time(), which
> in turn is called from the various tick handlers.
>
>> If so, that should be a problem.
>>
>> Does the NTP "error" consistently happen on all the platforms? Or just
>> on some platforms which we can check and limit the usage?
>
> Basically any platform where adjtimex was ever called. Both offset
> and freq adjustments will trigger this. The trouble is that the
> adjustments requested are finer grained than what the actual
> clocksource's mult value can be adjusted for. So the kernel tracks the
> error internally and will oscillate the clocksource multiplier value
> to keep the long-term accuracy close to what was requested.
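>
> A deliberately simplified userspace sketch of that dithering (made-up
> numbers and a crude update rule, not the kernel's actual adjustment
> code):
>
> 	/*
> 	 * A clocksource turns cycles into ns as (cycles * mult) >> shift.
> 	 * adjtimex() requests are usually finer than one step of 'mult',
> 	 * so the timekeeper carries the leftover error across ticks and
> 	 * dithers mult between two adjacent values so the long-run
> 	 * average frequency matches the request.
> 	 */
> 	#include <stdint.h>
> 	#include <stdio.h>
>
> 	int main(void)
> 	{
> 		/* pretend NTP effectively asked for mult = 16777216.4 */
> 		const int64_t ideal_tenths = 167772164;	/* tracked in tenths */
> 		uint32_t mult = 16777216;		/* value actually used */
> 		int64_t error_tenths = 0;
>
> 		for (int tick = 0; tick < 10; tick++) {
> 			error_tenths += ideal_tenths - (int64_t)mult * 10;
> 			mult = (error_tenths > 5) ? 16777217 : 16777216;
> 			printf("tick %2d: mult=%u error=%lld tenths\n",
> 			       tick, (unsigned)mult, (long long)error_tenths);
> 		}
> 		return 0;
> 	}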
>
>> There are two configurations for PEBS: single PEBS and large PEBS.
>> For the single PEBS mode, HW triggers an NMI for each record. The TSC
>> is the time at which the record is generated, and we convert it in the
>> same NMI. I don't think the NTP "error" can impact it.
>
> If the clocksource is being adjusted in between reads (imagine the
> freq is dropped), when using the fast accessors, time may seem to go
> backwards between those reads.
>
> In fact, if the same TSC value is used for two calls in a row, you
> could see time go backwards or forwards between them if the adjustment
> happened in between.
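>
> As a concrete (made-up) illustration of that: converting the very same
> TSC value with the pre- and post-adjustment parameters gives two
> different monotonic values, so the second conversion can land before
> the first:
>
> 	/*
> 	 * Simplified model of the fast accessor's conversion,
> 	 *     mono = base + ((tsc - cycle_last) * mult) >> shift
> 	 * with invented numbers.  If mult/base/cycle_last are updated
> 	 * between two conversions, the same captured TSC maps to
> 	 * different ns values.
> 	 */
> 	#include <stdint.h>
> 	#include <stdio.h>
>
> 	static uint64_t to_mono_ns(uint64_t tsc, uint64_t cycle_last,
> 				   uint64_t base_ns, uint32_t mult,
> 				   uint32_t shift)
> 	{
> 		return base_ns + (((tsc - cycle_last) * mult) >> shift);
> 	}
>
> 	int main(void)
> 	{
> 		uint64_t tsc = 1000000;	/* one captured timestamp */
>
> 		/* snapshot before a tick-time adjustment ... */
> 		uint64_t a = to_mono_ns(tsc, 0, 0, 4194304, 22);
> 		/* ... and after: mult nudged down, base/cycle_last advanced */
> 		uint64_t b = to_mono_ns(tsc, 900000, 900000, 4194303, 22);
>
> 		printf("before: %llu ns, after: %llu ns\n",
> 		       (unsigned long long)a, (unsigned long long)b);
> 		return 0;
> 	}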
>
> The regular (non-fast) clocksource accessors use the seqcount to
> prevent reads taken while the adjustment is being made from being
> used (you'll note we always re-read the clocksource within the
> seqcount loop), but this makes it not NMI safe.
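>
> Roughly the shape of that retry loop, as a self-contained sketch with
> C11 atomics standing in for the kernel's seqcount API:
>
> 	/*
> 	 * The reader re-reads the clocksource and the conversion
> 	 * parameters until it sees a consistent snapshot (sequence
> 	 * unchanged and even).  An NMI that interrupts the writer
> 	 * between its two increments would spin here forever, which
> 	 * is why this scheme is not NMI safe.
> 	 */
> 	#include <stdatomic.h>
> 	#include <stdint.h>
> 	#include <stdio.h>
>
> 	static _Atomic unsigned int seq;	/* odd while an update is in flight */
> 	static uint64_t base_ns, cycle_last;
> 	static uint32_t mult = 4194304, shift = 22;
>
> 	static uint64_t read_cycles(void) { return 123456; }	/* stand-in clocksource */
>
> 	static uint64_t get_mono_ns(void)
> 	{
> 		unsigned int s;
> 		uint64_t ns;
>
> 		do {
> 			/* wait out any in-flight update, then snapshot */
> 			while ((s = atomic_load(&seq)) & 1)
> 				;
> 			ns = base_ns + (((read_cycles() - cycle_last) * mult) >> shift);
> 			/* retry if the writer ran while we were reading */
> 		} while (atomic_load(&seq) != s);
>
> 		return ns;
> 	}
>
> 	static void writer_update(uint32_t new_mult)
> 	{
> 		atomic_fetch_add(&seq, 1);	/* mark update in progress */
> 		mult = new_mult;		/* ...adjust mult/base/cycle_last... */
> 		atomic_fetch_add(&seq, 1);	/* publish the new state */
> 	}
>
> 	int main(void)
> 	{
> 		printf("mono: %llu ns\n", (unsigned long long)get_mono_ns());
> 		writer_update(4194303);
> 		printf("mono: %llu ns\n", (unsigned long long)get_mono_ns());
> 		return 0;
> 	}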
>
>
>> But for the large PEBS mode, HW triggers an NMI only after a fixed
>> number of records has been generated. It's very likely that the base
>> has been updated between the records. That could cause trouble, since
>> we can only fall back to the NMI handler time for the last record. I'm
>> afraid we have to disable the large PEBS mode.
>>
>
> Right, if I recall, this is why the interpolation is done in the
> existing get_device_system_crosststamp(). It's obviously never going
> to be perfect as we don't keep the timekeeper state for every tick,
> but it might be better than the bailout you have if you didn't luck
> into the current interval.
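>
> For reference, the interpolation idea looks roughly like this (an
> invented snapshot layout, not the actual get_device_system_crosststamp()
> code): keep the (tsc, mono) pairs from surrounding timekeeper updates
> and scale linearly between them instead of bailing out:
>
> 	#include <stdint.h>
> 	#include <stdio.h>
>
> 	struct snap {
> 		uint64_t tsc;
> 		uint64_t mono_ns;
> 	};
>
> 	/* linear interpolation between two saved snapshots */
> 	static uint64_t interpolate_mono(const struct snap *older,
> 					 const struct snap *newer, uint64_t tsc)
> 	{
> 		uint64_t dcyc = newer->tsc - older->tsc;
> 		uint64_t dns  = newer->mono_ns - older->mono_ns;
>
> 		return older->mono_ns + (tsc - older->tsc) * dns / dcyc;
> 	}
>
> 	int main(void)
> 	{
> 		struct snap older = { .tsc = 1000000, .mono_ns = 500000000 };
> 		struct snap newer = { .tsc = 4000000, .mono_ns = 501000000 };
>
> 		/* a PEBS record captured between the two updates */
> 		printf("%llu ns\n", (unsigned long long)
> 		       interpolate_mono(&older, &newer, 2500000));
> 		return 0;
> 	}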
>
> My apologies, as my intention isn't to dissuade you here, but just to
> make sure the complexities (and unfortunately there are a lot) are
> well worked out and well tested.
>
Thank you very much for your patience and all the details; I really
appreciate it. I think I understand the problem and the concerns now. I
will reevaluate the current design and try to find a proper solution.

Thanks,
Kan