linux-kernel - Re: [RFC PATCH 23/41] KVM: x86/pmu: Implement the save/restore of PMU state for Intel CPU

Message-ID: <CAL715WKm0X9NJxq8SNGD5EJomzY4DDSiwLb1wMMgcgHqeZ64BA@mail.gmail.com>
Date: Sat, 27 Apr 2024 23:01:41 -0700
From: Mingwei Zhang <mizhang@...gle.com>
To: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
Cc: Sean Christopherson <seanjc@...gle.com>, Kan Liang <kan.liang@...ux.intel.com>, 
	maobibo <maobibo@...ngson.cn>, Xiong Zhang <xiong.y.zhang@...ux.intel.com>, pbonzini@...hat.com, 
	peterz@...radead.org, kan.liang@...el.com, zhenyuw@...ux.intel.com, 
	jmattson@...gle.com, kvm@...r.kernel.org, linux-perf-users@...r.kernel.org, 
	linux-kernel@...r.kernel.org, zhiyuan.lv@...el.com, eranian@...gle.com, 
	irogers@...gle.com, samantha.alt@...el.com, like.xu.linux@...il.com, 
	chao.gao@...el.com
Subject: Re: [RFC PATCH 23/41] KVM: x86/pmu: Implement the save/restore of PMU
 state for Intel CPU

On Sat, Apr 27, 2024 at 5:59 PM Mi, Dapeng <dapeng1.mi@...ux.intel.com> wrote:
>
>
> On 4/27/2024 11:04 AM, Mingwei Zhang wrote:
> > On Fri, Apr 26, 2024 at 12:46 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >> On Fri, Apr 26, 2024, Kan Liang wrote:
> >>>> Optimization 4
> >>>> allows the host side to immediately profile this part instead of
> >>>> waiting for the vcpu to reach the PMU context switch locations.
> >>>> Doing so will generate more accurate results.
> >>> If so, I think 4 is a must-have. Otherwise, it wouldn't honor the
> >>> definition of exclude_guest. Without 4, it brings some random blind
> >>> spots, right?
> >> +1, I view it as a hard requirement.  It's not an optimization, it's about
> >> accuracy and functional correctness.
> > Well, does it have to be a _hard_ requirement? No? The irq handler
> > triggered by "perf record -a" could just inject a "state". Instead of
> > immediately preempting the guest PMU context, the perf subsystem could
> > allow KVM to defer the context switch until it reaches the next PMU
> > context switch location.
> >
> > This is the same as the kernel preemption logic. Do you want me to
> > stop the work immediately? Yes (if you enable preemption), or no, let
> > me finish my job and get to the scheduling point.
> >
> > Implementing this might be more difficult to debug. That's my real
> > concern. If we do not enable preemption, the PMU context switch will
> > only happen at the 2 pairs of locations. If we enable preemption, it
> > could happen at any time.
>
> IMO I'd prefer not to add a switch to enable/disable the preemption. I
> think the current implementation is already complicated enough, and it
> is unnecessary to introduce a new parameter that confuses users.
> Furthermore, the switch could introduce uncertainty and may mislead perf
> users into reading the perf stats incorrectly.  As for debugging, it
> won't make any difference as long as no host event is created.
>
That's ok. This is about opinions and brainstorming. The idea of adding
a parameter to disable preemption comes from the cloud usage
perspective. The disagreement is about which one you prioritize: the
guest PMU or the host PMU. From the guest vPMU usage perspective, do you
want anyone on the host to fire off a profiling command and generate
turbulence? No. From the host PMU perspective, where you want to
profile the VMM/KVM, you definitely want accuracy and no delay at all.
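
To make the two behaviors concrete, here is a rough sketch of the
difference between preempting the guest PMU context immediately and
deferring the switch to the next PMU context switch location. Every
name below (vcpu_pmu, allow_preempt, host_request_pmu, pmu_switch_point)
is made up for illustration and is not an existing KVM or perf symbol;
the real save/restore of PMU state is reduced to flipping an owner
field.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only: none of these names exist in KVM or perf. */
enum pmu_owner { PMU_OWNER_HOST, PMU_OWNER_GUEST };

struct vcpu_pmu {
	enum pmu_owner owner;	/* whose state is loaded on the hardware PMU */
	bool switch_pending;	/* host asked for the PMU, switch was deferred */
	bool allow_preempt;	/* the knob under discussion (assumed name) */
};

/* Models the host-side IRQ path when a host event wants the PMU. */
static void host_request_pmu(struct vcpu_pmu *p)
{
	if (p->owner != PMU_OWNER_GUEST)
		return;

	if (p->allow_preempt)
		p->owner = PMU_OWNER_HOST;	/* switch right away */
	else
		p->switch_pending = true;	/* only inject a "state" */
}

/* Models one of the fixed PMU context switch locations. */
static void pmu_switch_point(struct vcpu_pmu *p)
{
	if (!p->switch_pending)
		return;

	/* A deferred request can only exist while the guest owns the PMU. */
	assert(p->owner == PMU_OWNER_GUEST);
	p->owner = PMU_OWNER_HOST;
	p->switch_pending = false;
}
```

With allow_preempt set, the host gets the PMU inside
host_request_pmu(), at an arbitrary point in time. With it cleared, the
guest keeps the PMU until pmu_switch_point() runs, and that window is
the blind spot for host exclude_guest events being debated here.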

Thanks.
-Mingwei
>
> >
> >> What _is_ an optimization is keeping guest state loaded while KVM is in its
> >> run loop, i.e. initial mediated/passthrough PMU support could land upstream with
> >> unconditional switches at entry/exit.  The performance of KVM would likely be
> >> unacceptable for any production use cases, but that would give us motivation to
> >> finish the job, and it doesn't result in random, hard to diagnose issues for
> >> userspace.
> > That's true. I agree with that.
> >
> >>>> Do we want to preempt that? I think it depends. For regular cloud
> >>>> usage, we don't. But for any other usages where we want to prioritize
> >>>> KVM/VMM profiling over guest vPMU, it is useful.
> >>>>
> >>>> My current opinion is that optimization 4 is something nice to have.
> >>>> But we should allow people to turn it off, just like we can choose
> >>>> to disable kernel preemption.
> >>> The exclude_guest means everything but the guest. I don't see a reason
> >>> why people want to turn it off and get some random blind spots.