Date:   Mon, 2 Oct 2023 12:02:47 -0700
From:   Mingwei Zhang <mizhang@...gle.com>
To:     David Dunn <daviddunn@...gle.com>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Sean Christopherson <seanjc@...gle.com>,
        Dapeng Mi <dapeng1.mi@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Like Xu <likexu@...cent.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Ian Rogers <irogers@...gle.com>,
        Adrian Hunter <adrian.hunter@...el.com>, kvm@...r.kernel.org,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        Zhenyu Wang <zhenyuw@...ux.intel.com>,
        Zhang Xiong <xiong.y.zhang@...el.com>,
        Lv Zhiyuan <zhiyuan.lv@...el.com>,
        Yang Weijiang <weijiang.yang@...el.com>,
        Dapeng Mi <dapeng1.mi@...el.com>,
        Jim Mattson <jmattson@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event

On Mon, Oct 2, 2023 at 8:23 AM David Dunn <daviddunn@...gle.com> wrote:
>
> On Mon, Oct 2, 2023 at 6:30 AM Ingo Molnar <mingo@...nel.org> wrote:
> >
> >
> > The host OS shouldn't offer facilities that severely limit its own capabilities,
> > when there's a better solution. We don't give the FPU to apps exclusively either,
> > it would be insanely stupid for a platform to do that.
> >
>
> If you think of the guest VM as a usermode application (which it
> effectively is), the analogous situation is that there is no way to
> tell the usermode application which portions of the FPU state might be
> used by the kernel without context switching.  Although the kernel can
> and does use FPU state, it doesn't zero out a portion of that state
> whenever the kernel needs to use the FPU.
>
> Today there is no way for a guest to dynamically adjust which PMU
> state is valid or invalid.  And this changes based on usage by other
> commands run on the host.  As observed by the perf subsystem running
> in the guest kernel, this looks like counters that simply zero out and
> stop counting at random.
>
> I think the request here is that there be a way for KVM to be able to
> tell the guest kernel (running the perf subsystem) that it has a
> functional HW PMU.  And for that to be true.  This doesn't mean taking
> away the use of the PMU any more than exposing the FPU to usermode
> applications means taking away the FPU from the kernel.  But it does
> mean that when entering the KVM run loop, the host perf system needs
> to context switch away the host PMU state and allow KVM to load the
> guest PMU state.  And much like the FPU situation, the portion of the
> host kernel that runs between the context switch to the KVM thread and
> VMENTER to the guest cannot use the PMU.
>
> This obviously should be a policy set by the host owner.  They are
> deliberately giving up the ability to profile that small portion of
> the host (the KVM VCPU thread cannot be profiled) in return for
> providing a full set of perf functionality to the guest kernel.
>

+1

I was pretty confused until I read this one. A pass-through vPMU for the
guest VM does not conflict with the host PMU software. All we need is
to accept that host PMU software (the perf subsystem in Linux) can
coexist with a pass-through vPMU in KVM. Both could work directly on
the hardware PMU, programming its registers and so on.

To achieve this, I think what we really need from the perf subsystem
in Linux are two things:
 - A full context switch of the hardware PMU. Currently, the perf
subsystem is the exclusive owner of this piece of hardware, so this
code is missing.
 - NMI sharing or NMI control transfer. Either KVM installs its own
NMI handler and gets control transferred to it, or Linux promotes the
existing NMI handler to serve both entities in the kernel.
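To make the two requests concrete, here is a rough userspace C model of the idea: a full save/restore of PMU state around guest entry and exit, plus NMI routing to whichever side currently owns the PMU. Everything in it (struct pmu_state, pmu_switch_to_guest(), pmu_switch_to_host(), pmu_handle_nmi(), and the counter count) is hypothetical and is not an existing kernel or KVM API. Real code would program the PMU MSRs and hook into the kernel's NMI handling, neither of which is modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical number of general-purpose counters; real CPUs vary. */
#define NUM_GP_COUNTERS 8

/* Hypothetical snapshot of the programmable PMU state being switched. */
struct pmu_state {
    uint64_t eventsel[NUM_GP_COUNTERS]; /* event selector per counter */
    uint64_t counter[NUM_GP_COUNTERS];  /* current count per counter */
    uint64_t global_ctrl;               /* global enable mask */
};

enum pmu_owner { PMU_OWNER_HOST, PMU_OWNER_GUEST };

/* Stand-ins for the physical PMU registers and their current owner. */
static struct pmu_state hw_pmu;
static enum pmu_owner hw_owner = PMU_OWNER_HOST;
static struct pmu_state host_saved;

/* Per-owner NMI tallies, used only to show where each NMI was routed. */
static unsigned long perf_nmis, kvm_nmis;

/*
 * Before VM-entry: save the full host PMU context, then load the guest's.
 * Between this call and pmu_switch_to_host(), the host cannot use the PMU.
 */
static void pmu_switch_to_guest(const struct pmu_state *guest)
{
    assert(hw_owner == PMU_OWNER_HOST);
    host_saved = hw_pmu;
    hw_pmu = *guest;
    hw_owner = PMU_OWNER_GUEST;
}

/* After VM-exit: save the guest's PMU context, then restore the host's. */
static void pmu_switch_to_host(struct pmu_state *guest)
{
    assert(hw_owner == PMU_OWNER_GUEST);
    *guest = hw_pmu;
    hw_pmu = host_saved;
    hw_owner = PMU_OWNER_HOST;
}

/* A counter-overflow NMI is dispatched to whichever entity owns the PMU. */
static void pmu_handle_nmi(void)
{
    if (hw_owner == PMU_OWNER_GUEST)
        kvm_nmis++;  /* in this model, KVM would forward it to the guest */
    else
        perf_nmis++; /* in this model, host perf handles it as it does today */
}
```

The point of the model is that the guest's counters survive a round trip untouched while the host's state is parked, instead of the guest seeing counters zero out whenever the host needs them.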

Once the above is in place, I believe KVM and the perf subsystem in
Linux could share the hardware PMU cleanly, instead of forcing the
former to be a client of the latter.

To step back a little: we are not asking whether this is feasible,
since KVM and the perf subsystem already share the hardware PMU in
practice because of TDX/SEV-SNP. So I think all of this is just a
draft proposal to make that sharing clean and efficient.

Thanks.
-Mingwei

> Dave Dunn
