linux-kernel - Re: [PATCH 1/3] timekeeping: NMI safe converter from a given time to monotonic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANDhNCqxOurpjKQaEEgxdFYV8kQQHVUNZUy1B5AewSi8bys8iQ@mail.gmail.com>
Date:   Tue, 24 Jan 2023 14:40:37 -0800
From:   John Stultz <jstultz@...gle.com>
To:     "Liang, Kan" <kan.liang@...ux.intel.com>
Cc:     Stephane Eranian <eranian@...gle.com>, peterz@...radead.org,
        mingo@...hat.com, tglx@...utronix.de, sboyd@...nel.org,
        linux-kernel@...r.kernel.org, namhyung@...nel.org,
        ak@...ux.intel.com
Subject: Re: [PATCH 1/3] timekeeping: NMI safe converter from a given time to monotonic

On Tue, Jan 24, 2023 at 2:08 PM Liang, Kan <kan.liang@...ux.intel.com> wrote:
> On 2023-01-24 3:33 p.m., John Stultz wrote:
> > On Tue, Jan 24, 2023 at 12:13 PM Liang, Kan <kan.liang@...ux.intel.com> wrote:
> >> On 2023-01-24 1:43 p.m., John Stultz wrote:
> >> I read the code correctly, it should happen only when adjusting NTP or
> >> suspending/resuming. If so, I think the drawback should not impact the
> >> normal analysis work. I will call out the drwabacks in the comments
> >> where the function is used.
> > So the adjustments are done at tick time depending on the current NTP
> > "error" (basically what the kernel tracks as the delta from its sense
> > of what NTP has told us).
> >
> > Not just at the time when ntp makes an adjustment.
> >
> > So the window for it to happen is every timekeeping update (which is ~HZ).
>
> You mean the tkf->base is updated in each tick?

Right, every call to timekeeping_update(), which is called for every
major state change in time along with regular updates via
timekeeping_advance() which is called from update_wall_time() which
gets called from the various tick handlers.

> If so, that should be a problem.
>
> Does the NTP "error" consistently happen on all the platforms? Or just
> on some platforms which we can check and limit the usage?

Basically any platform where adjtimex was ever called.  Both offset
and freq adjustments will trigger this. The trouble is that the
adjustments requested are finer grained than what the actual
clocksources mult values can be adjusted for. So the kernel tracks the
error internally and will oscillate the clocksource multiplier value
to keep the long-term accuracy close to what was requested.

> There are two configurations for PEBS, single PEBS, and large PEBS.
> For the single PEBS mode, HW triggers a NMI for each record. The TSC is
> the time which the record is generated, and we convert it in the same
> NMI. I don't think the NTP "error" can impact it.

If the clocksource is being adjusted in between reads (imagine the
freq is dropped), when using the fast accessors, time may seem to go
backwards between those reads.

In fact, if the same TSC value is used for two calls in a row, you
could see time go backwards or forwards between them if the adjustment
happened inbetween.

The regular (non-fast) clocksource accessors use the seqcount to
prevent reads from occuring while the adjustment is made from being
used (you'll note we always re-read the clocksource within the
seqcount loop) but this makes it not NMI safe.

> But for the large PEBS mode, HW triggers a NMI only when a fixed number
> of records are generated. It's very likely that the base has been
> updated between the records. That could bring troubles, since we can
> only fall back to the NMI handler time of the last record. I'm afraid we
> have to disable the large PEBS mode.
>

Right, if I recall, this is why the interpolation is done in the
existing get_device_system_crosststamp(). It's obviously never going
to be perfect as we dont' keep the timekeeper state for every tick,
but it might be better than the bailout you have if you didn't luck
into the current interval.

My apologies, as my intention isn't to dissuade you here, but just to
make sure the complexities (and unfortunately there are a lot) are
well worked out and well tested.

thanks
-john