linux-kernel - Re: [PATCH 1/3] timekeeping: NMI safe converter from a given time to monotonic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1fb59dfa-1ab9-51ad-98c6-89431aa56918@linux.intel.com>
Date:   Tue, 24 Jan 2023 15:12:50 -0500
From:   "Liang, Kan" <kan.liang@...ux.intel.com>
To:     John Stultz <jstultz@...gle.com>
Cc:     peterz@...radead.org, mingo@...hat.com, tglx@...utronix.de,
        sboyd@...nel.org, linux-kernel@...r.kernel.org, eranian@...gle.com,
        namhyung@...nel.org, ak@...ux.intel.com
Subject: Re: [PATCH 1/3] timekeeping: NMI safe converter from a given time to
 monotonic



On 2023-01-24 1:43 p.m., John Stultz wrote:
> On Tue, Jan 24, 2023 at 7:09 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>> On 2023-01-24 2:01 a.m., John Stultz wrote:
>>> On Mon, Jan 23, 2023 at 10:27 AM <kan.liang@...ux.intel.com> wrote:
>>>> +               /*
>>>> +                * Check whether the given timestamp is on the current
>>>> +                * timekeeping interval.
>>>> +                */
>>>> +               now = tk_clock_read(tkr);
>>>> +               interval_start = tkr->cycle_last;
>>>> +               if (!cycle_between(interval_start, cycles, now))
>>>> +                       return -EOPNOTSUPP;
>>>
>>> So. I've not fully thought this out, but it seems like it would be
>>> quite likely that you'd run into the case where the cycle_last value
>>> is updated and your earlier TSC timestamp isn't valid for the current
>>> interval. The get_device_system_crosststamp() logic has a big chunk of
>>> complex code to try to handle this case by interpolating the cycle
>>> value back in time. How well does just failing in this case work out?
>>>
>>
>> For the case, perf fallback to the time captured in the NMI handler, via
>> ktime_get_mono_fast_ns().
> 
> This feels like *very* subtle behavior. Maybe I'm misunderstanding,
> but the goal seems to be to have more accurate timestamps on the hw
> events, and using the captured tsc timestamp avoids the measuring
> latency reading the time again. But if every timekeeping update
> interval (~tick) you transparently get a delayed value due to the
> fallback, it makes it hard to understand which timestamps are better
> or worse. The latency between two reads may be real or it may be just
> bad luck. This doesn't intuitively seem like a great benefit over more
> consistent latency of just using the ktime_get_mono_fast()
> timestamping.

Your understand is correct. We want a more accurate timestamp for the
analysis work.

As my understanding, the timekeeping update should not be very often. If
I read the code correctly, it should happen only when adjusting NTP or
suspending/resuming. If so, I think the drawback should not impact the
normal analysis work. I will call out the drwabacks in the comments
where the function is used.

> 
>> The TSC in PEBS is captured by HW when the sample was generated. There
>> should be a small delta compared with the time captured in the NMI
>> handler. But I think the delta should be acceptable as a backup solution
>> for the most analysis cases. Also, I don't think the case (the
>> cycle_last value is updated during the monitoring) should occur very
>> often either. So I drop the history support to simplify the function.
> 
> So the reads and this function are *always* used in NMI context?   Has
> this been stressed with things like SMIs to see how it does if
> interrupted in those cases?

Yes, it's *always* and only used in NMI context.

> 
> My worry is that (as I bored everyone earlier), the
> ktime_get_*_fast_ns() interfaces already have some sharp edges and
> need a fair amount of thought as to when they should be used. This is
> sort of compounding that adding an interface that has further special
> cases where it can fail, making it difficult to fully understand and
> easier to accidentally misuse.
> 
> My other concern is that interfaces always get stretched and used
> beyond anything they were initially planned for (see the
> ktime_get_*fast_ns() interfaces here as an example! :), and in this
> case the logic seems to have lots of implicit dependencies on the
> facts of your specific use case, so it seems a bit fragile should
> folks on other architectures with other constraints try to use it.
> 
> So I just want to push a bit to think how you might be able to
> extend/generalize the existing get_system_crosststamp for your
> purposes, or alternatively find a way to simplify the logic's behavior
> so its less tied to specific constraints ("this works most of the time
> from NMI, but otherwise no promises").  Or at least some better
> documentation around the code, its uses and its constraints? ( "NMI
> safe" is not the same as "Only safe to use from NMI" :)

Since our usage is fixed (only in NMI), I prefer the latter. I think
extending/generalizing the existing function only makes the function
extremely complex and low efficient. The new function should have the
same constraints as the existing ktime_get_mono_fast_ns(). Since perf
can live with the ktime_get_mono_fast_ns(), there should be no problem
with the new function for the constraints. I will add more comments to
clarify the usage and constraints to avoid the abuse of the new function.

Thanks,
Kan