[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL715WJiUYpxRRqs3FNiLiS8b6=4Pm5K0u==S6t5NYi0p=vutw@mail.gmail.com>
Date: Wed, 23 Aug 2023 11:28:17 -0700
From: Mingwei Zhang <mizhang@...gle.com>
To: Like Xu <like.xu.linux@...il.com>
Cc: Sean Christopherson <seanjc@...gle.com>,
Roman Kagan <rkagan@...zon.de>,
Jim Mattson <jmattson@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Eric Hankland <ehankland@...gle.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: x86: vPMU: truncate counter value to allowed width
On Tue, Aug 22, 2023 at 2:30 AM Like Xu <like.xu.linux@...il.com> wrote:
>
> On 1/7/2023 5:26 am, Sean Christopherson wrote:
> > Ugh, yeah, de0f619564f4 created a bit of a mess. The underlying issue that it
> > was solving is that perf_event_read_value() and friends might sleep (yay mutex),
> > and so can't be called from KVM's fastpath (IRQs disabled).
> Updating pmu counters for emulated instructions cause troubles.
>
> >
> > However, detecting overflow requires reading perf_event_read_value() to gather
> > the accumulated count from the hardware event in order to add it to the emulated
> > count from software. E.g. if pmc->counter is X and the perf event counter is Y,
> > KVM needs to factor in Y because X+Y+1 might overflow even if X+1 does not.
> >
> > Trying to snapshot the previous counter value is a bit of a mess. It could probably
> > made to work, but it's hard to reason about what the snapshot actually contains
> > and when it should be cleared, especially when factoring in the wrapping logic.
> >
> > Rather than snapshot the previous counter, I think it makes sense to:
> >
> > 1) Track the number of emulated counter events
>
> If events are counted separately, the challenge here is to correctly time
> the emulation of counter overflows, which can occur on both sides of the
> counter values out of sync.
>
> > 2) Accumulate and reset the counts from perf_event and emulated_counter into
> > pmc->counter when pausing the PMC
> > 3) Pause and reprogram the PMC on writes (instead of the current approach of
> > blindly updating the sample period)
>
> Updating the sample period is the only interface for KVM to configure hw
> behaviour on hw-ctr. I note that perf_event_set_count() will be proposed,
> and I'm pessimistic about this change.
>
> > 4) Pause the counter when stopping the perf_event to ensure pmc->counter is
> > fresh (instead of manually updating pmc->counter)
> >
> > IMO, that yields more intuitive logic, and makes it easier to reason about
> > correctness since the behavior is easily define: pmc->counter holds the counts
> > that have been gathered and processed, perf_event and emulated_counter hold
> > outstanding counts on top. E.g. on a WRMSR to the counter, both the emulated
> > counter and the hardware counter are reset, because whatever counts existed
> > previously are irrelevant.
>
> If we take the hardware view, a counter, emulated or not, just increments
> and overflows at the threshold. The missing logic here is when the counter
> is truncated when writing high bit-width values, and how to deal with the
> value of pmc->prev_counter was before pmc->counter was truncated.
>
> >
> > Pausing the counter_might_ make WRMSR slower, but we need to get this all
> > functionally correct before worrying too much about performance.
>
> Performance, security and correctness should all be considered at the beginning.
>
+1 on the performance part.
I did several rounds of performance testing, pausing the counter is
fast, but restarting the counter is *super* slow. The extra overhead
might just make vPMU useless especially when the guest workload takes
full CPU/memory resources in a VM (like SPEC2017 does).
> >
> > Diff below for what I'm thinking (needs to be split into multiple patches). It's
> > *very* lightly tested.
>
> It saddens me that no one has come up with an actual low-level counter-test for
> this issue.
>
> >
> > I'm about to disappear for a week, I'll pick this back up when I get return. In
> > the meantime, any testing and/or input would be much appreciated!
>
> How about accepting Roman's original fix and then exercising the rewriting genius ?
+1
I think the best option would be to just apply the fix in a short term
and put the refactoring of the emulated counter in the next series.
Powered by blists - more mailing lists