linux-kernel - Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c69a1eb1-e07a-8270-ca63-54949ded433d@gmail.com>
Date:   Sun, 8 Oct 2023 18:04:28 +0800
From:   Like Xu <like.xu.linux@...il.com>
To:     Sean Christopherson <seanjc@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Dapeng Mi <dapeng1.mi@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Like Xu <likexu@...cent.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Ian Rogers <irogers@...gle.com>,
        Adrian Hunter <adrian.hunter@...el.com>, kvm@...r.kernel.org,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        Zhenyu Wang <zhenyuw@...ux.intel.com>,
        Zhang Xiong <xiong.y.zhang@...el.com>,
        Lv Zhiyuan <zhiyuan.lv@...el.com>,
        Yang Weijiang <weijiang.yang@...el.com>,
        Dapeng Mi <dapeng1.mi@...el.com>,
        David Dunn <daviddunn@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Mingwei Zhang <mizhang@...gle.com>,
        Jim Mattson <jmattson@...gle.com>
Subject: Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics
 event

Hi all,

On 5/10/2023 6:05 am, Sean Christopherson wrote:
> So I'll add a self-NAK to the idea of completely disabling the host PMU, I think
> that would burn us quite badly at some point.

I seem to have missed a party, so allow me to add a few more comments
to better facilitate future discussions in this direction:

(1) PMU counters on TEE

The SGX/SEV is already part of the upstream, but what kind of performance
data will be obtained by sampling enclaves or sev-guest with hardware pmu
counters on host (will the perf-report show these data missing holes or
pure encrypted data), we don't have a clear idea nor have we established
the right expectations. But on AMD profiling a SEV-SNP guest is supported:

"Fingerprinting attack protection is also not supported in the current
generation of these technologies. Fingerprinting attacks attempt to
determine what code the VM is running by monitoring its access patterns,
performance counter information, etc." (AMD SEV-SNP White Paper, 2020)

(2) PMU Guest/Host Co-existence Development

The introduction of pt_mode in the KVM was misleading, leading subsequent
developers to believe that static slicing of pmu facility usage was allowed.

On user scenarios, the host/perf should treat pmu resource requests from
vCPUs with regularity (which can be unequal under the host's authority IMO)
while allowing the host to be able to profile any software entity (including
hypervisor and guest-code, including TEE code in debug mode). Functionality
takes precedence over performance.

The semantics of exclude_guest/host should be tied to the hw-event isolation
settings on the hardware interfaces, not to the human-defined sw-context.
The perf subsystem is the arbiter of pmu resource allocation on the host,
and any attempt to change the status quo (or maintenance scope) will not
succeed. Therefore, vPMU developers are required to be familiar with the
implementation details of both perf and kvm, and try not to add perf APIs
dedicated to serving KVM blindly.

Getting host and guests to share limited PMU resources harmoniously is not
particularly difficult compared to real rocket science in the kernel, so
please don't be intimidated.

(3) Performance Concern in Co-existence

I wonder if it would be possible to add a knob to turn off the perf counter
multiplexing mechanism on the host, so that in coexistence scenarios, the
number of VM exits on the vCPU would not be increased by counter rotations
due to timer expiration.

For normal counters shared between guest and host, the number of counter
msr switches requiring a vm-entry level will be relatively small.
(The number of counters is growing; for LBR, it is possible to share LBR
select values to avoid frequent switching, but of course this requires the
implementation of a software filtering mechanism when the host/guest read
the LBR records, and some additional PMI; for DS-based PEBS, host and guest
PEBS buffers are automatically segregated based on linear address).

There is a lot of room for optimisation here, and in real scenarios where
triggering a large number of register switches in the host/guest PMU is
to be expected and observed easily (accompanied by a large number of pmi
appearances).

If we are really worried about the virtualisation overhead of vPMU, then
virtio-pmu might be an option. In this technology direction, the back-end
pmu can add more performance events of interest to the VM (including host
un-core and off-core events, host-side software events, etc.) In terms of
implementation, the semantics of the MSRLIST instruction can be re-used,
along with compatibility with the different PMU hardware interfaces on ARM
and Risc-v, which is also very friendly to production environments based on
its virtio nature.

(4) New vPMU Feature Development

We should not put KVM's current vPMU support into maintenance-only mode.
Users want more PMU features in the guest, like AMD vIBS, Intel pmu higher
versions, Intel topdown and Arch lbr, more on the way. The maturity of
different features' patch sets aren't the same, but we can't ignore these
real needs because of available time for key maintainers, apathy towards
contributors, mindset avoidance and laziness, and preference for certain
technology stacks. These technical challenges will attract an influx of
open source heroes to push the technology forward, which is good in the
long run.

(5) More to think about

Similar to the guest PMU feature, the debugging feature may face the same
state. For example, what happens when you debug code inside the host and
guest at the same time (host debugs hypevisor/guest code and guest debugs
guest code only) ?

Forgive my ignorance and offence, but we don't want to see a KVM subsystem
controlled and driven by Google's demands.

Please feel free to share comments to move forward.

Thanks,
Like Xu