linux-kernel - Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event

Message-ID: <03b7da03-78a1-95b1-3969-634b5c9a5a56@amd.com>
Date:   Mon, 9 Oct 2023 22:33:41 +0530
From:   Manali Shukla <manali.shukla@....com>
To:     Sean Christopherson <seanjc@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Dapeng Mi <dapeng1.mi@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Like Xu <likexu@...cent.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Ian Rogers <irogers@...gle.com>,
        Adrian Hunter <adrian.hunter@...el.com>, kvm@...r.kernel.org,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        Zhenyu Wang <zhenyuw@...ux.intel.com>,
        Zhang Xiong <xiong.y.zhang@...el.com>,
        Lv Zhiyuan <zhiyuan.lv@...el.com>,
        Yang Weijiang <weijiang.yang@...el.com>,
        Dapeng Mi <dapeng1.mi@...el.com>,
        David Dunn <daviddunn@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Mingwei Zhang <mizhang@...gle.com>,
        Jim Mattson <jmattson@...gle.com>,
        Like Xu <like.xu.linux@...il.com>
Subject: Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event

On 10/8/2023 3:34 PM, Like Xu wrote:
> Hi all,
> 
> On 5/10/2023 6:05 am, Sean Christopherson wrote:
>> So I'll add a self-NAK to the idea of completely disabling the host PMU, I think
>> that would burn us quite badly at some point.
> 
> I seem to have missed a party, so allow me to add a few more comments
> to better facilitate future discussions in this direction:
> 
> (1) PMU counters on TEE
> 
> SGX and SEV are already upstream, but we have no clear idea what kind of
> performance data will be obtained by sampling enclaves or SEV guests with
> hardware PMU counters on the host (will perf report show holes in the data,
> or purely encrypted data?), nor have we established the right expectations.
> On AMD, however, profiling an SEV-SNP guest is supported:
> 
> "Fingerprinting attack protection is also not supported in the current
> generation of these technologies. Fingerprinting attacks attempt to
> determine what code the VM is running by monitoring its access patterns,
> performance counter information, etc." (AMD SEV-SNP White Paper, 2020)
> 
> (2) PMU Guest/Host Co-existence Development
> 
> The introduction of pt_mode in KVM was misleading, leading subsequent
> developers to believe that static slicing of PMU facility usage was allowed.
> 
> In user scenarios, host perf should treat PMU resource requests from vCPUs
> as ordinary requests (which, IMO, may be served unequally under the host's
> authority), while still allowing the host to profile any software entity
> (including the hypervisor and guest code, and TEE code in debug mode).
> Functionality takes precedence over performance.
> 
> The semantics of exclude_guest/exclude_host should be tied to the hw-event
> isolation settings of the hardware interfaces, not to a human-defined
> sw-context. The perf subsystem is the arbiter of PMU resource allocation on
> the host, and any attempt to change that status quo (or its maintenance
> scope) will not succeed. Therefore, vPMU developers need to be familiar with
> the implementation details of both perf and KVM, and should avoid blindly
> adding perf APIs dedicated to serving KVM.
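
To make the exclude_guest/exclude_host point concrete: on AMD, PERF_CTL already carries hardware isolation bits (GuestOnly at bit 40, HostOnly at bit 41), so the perf flags can map directly onto them. The sketch below is loosely modeled on how the x86 perf driver's AMD path handles this, not copied from it; the helper name `apply_guest_host_filter` and the `EVTSEL_*` macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in the AMD PerfEvtSel (PERF_CTL) register. */
#define EVTSEL_USR       (1ULL << 16) /* count at CPL > 0 */
#define EVTSEL_OS        (1ULL << 17) /* count at CPL 0 */
#define EVTSEL_GUESTONLY (1ULL << 40) /* count only in guest mode */
#define EVTSEL_HOSTONLY  (1ULL << 41) /* count only in host mode */

/* Hypothetical helper: fold perf's exclude_guest/exclude_host flags into
 * the hardware isolation bits of an event selector value. */
static uint64_t apply_guest_host_filter(uint64_t config,
                                        bool exclude_guest, bool exclude_host)
{
    if (exclude_guest && exclude_host) {
        /* HostOnly == GuestOnly == 1 counts in both modes, just like
         * both bits clear, so emulate "count nowhere" by clearing
         * the USR and OS enables instead. */
        config &= ~(EVTSEL_USR | EVTSEL_OS);
    } else if (exclude_host) {
        config |= EVTSEL_GUESTONLY;
    } else if (exclude_guest) {
        config |= EVTSEL_HOSTONLY;
    }
    return config;
}
```

The isolation is then enforced by the counter hardware itself, independent of any software notion of "currently in guest context".
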
> 
> Getting host and guests to share limited PMU resources harmoniously is not
> particularly difficult compared to real rocket science in the kernel, so
> please don't be intimidated.
> 
> (3) Performance Concern in Co-existence
> 
> I wonder if it would be possible to add a knob to turn off the perf counter
> multiplexing mechanism on the host, so that in coexistence scenarios, the
> number of VM exits on the vCPU would not be increased by counter rotations
> due to timer expiration.
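
A toy model of timer-driven round-robin multiplexing shows why rotation has a cost: with more events than counters, every tick schedules one event out and another in, and each such swap means rewriting one counter's control and count MSRs. With rotation disabled, the schedule never changes after the first programming. This is a simplified sketch under stated assumptions, not perf's actual rotation logic (which rotates flexible groups and never rotates pinned events); `count_reprograms` is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_EVENTS 16

/* Hypothetical model: count how many times a counter must be reprogrammed
 * (one event scheduled out, another scheduled in) by a simple round-robin
 * multiplexer over `ticks` timer expirations. Returns -1 on bad input. */
static int count_reprograms(int num_events, int num_counters, int ticks,
                            bool rotation_enabled)
{
    int order[MAX_EVENTS];
    bool scheduled[MAX_EVENTS] = { false };
    int reprograms = 0;

    if (num_events < 0 || num_events > MAX_EVENTS || num_counters < 0)
        return -1;

    for (int i = 0; i < num_events; i++)
        order[i] = i;
    /* Initial programming happens once and is not counted. */
    for (int i = 0; i < num_counters && i < num_events; i++)
        scheduled[order[i]] = true;

    for (int t = 0; t < ticks; t++) {
        /* Rotation disabled, or everything fits: the schedule is stable. */
        if (!rotation_enabled || num_events <= num_counters)
            continue;

        /* Timer expired: move the head of the event list to the tail. */
        int head = order[0];
        memmove(&order[0], &order[1],
                (size_t)(num_events - 1) * sizeof(order[0]));
        order[num_events - 1] = head;

        /* The first num_counters events in the list get a counter. */
        bool next[MAX_EVENTS] = { false };
        for (int i = 0; i < num_counters; i++)
            next[order[i]] = true;

        /* Each newly scheduled event means rewriting one counter's MSRs. */
        for (int e = 0; e < num_events; e++)
            if (next[e] && !scheduled[e])
                reprograms++;
        memcpy(scheduled, next, sizeof(scheduled));
    }
    return reprograms;
}
```

For example, count_reprograms(6, 4, 10, true) returns 10 (one swap per tick), while the same call with rotation_enabled set to false returns 0.
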
> 
> For normal counters shared between guest and host, the number of counter
> MSR switches required at the VM-entry level will be relatively small. (The
> number of counters is growing. For LBR, it is possible to share LBR select
> values to avoid frequent switching, but this of course requires a software
> filtering mechanism when the host/guest read the LBR records, plus some
> additional PMIs. For DS-based PEBS, host and guest PEBS buffers are
> automatically segregated by linear address.)
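
The shared LBR select idea can be sketched as follows: the hardware filter is programmed with the union of the branch types the host and the guest asked for, and each side drops the records it did not request when it reads the stack. The branch-type bits, struct, and function names here are hypothetical and deliberately simplified; the legacy MSR_LBR_SELECT bits suppress branch types rather than enable them, so a real implementation would combine the two suppression masks by intersection instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "capture this branch type" bits (not a real MSR encoding). */
enum {
    BR_CALL = 1u << 0,
    BR_RET  = 1u << 1,
    BR_JCC  = 1u << 2,
    BR_JMP  = 1u << 3,
};

struct lbr_record {
    uint64_t from, to;
    uint32_t type;    /* exactly one BR_* bit */
};

/* The shared hardware filter must capture every branch type that either
 * the host or the guest requested. */
static uint32_t shared_lbr_filter(uint32_t host_mask, uint32_t guest_mask)
{
    return host_mask | guest_mask;
}

/* Software filter applied when one side reads the LBR stack: copy only the
 * records whose branch type that side requested; return how many were kept. */
static size_t filter_lbr_records(const struct lbr_record *in, size_t n,
                                 uint32_t wanted, struct lbr_record *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
        if (in[i].type & wanted)
            out[kept++] = in[i];
    return kept;
}
```

One consequence of this design: because the LBR stack depth is shared, the side with the narrower filter gets fewer useful entries per read.
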
> 
> There is a lot of room for optimisation here: in real scenarios, a large
> number of register switches in the host/guest PMU is to be expected and is
> easily observed (accompanied by a large number of PMIs).
> 
> If we are really worried about the virtualisation overhead of vPMU, then
> virtio-pmu might be an option. In this technology direction, the back-end
> PMU can expose more performance events of interest to the VM (including host
> uncore and off-core events, host-side software events, etc.). In terms of
> implementation, the semantics of the MSRLIST instruction can be reused,
> along with compatibility with the different PMU hardware interfaces on ARM
> and RISC-V; being virtio-based, it is also very friendly to production
> environments.
> 
> (4) New vPMU Feature Development
> 
> We should not put KVM's current vPMU support into maintenance-only mode.
> Users want more PMU features in the guest, such as AMD vIBS, higher Intel
> PMU versions, Intel topdown and Arch LBR, with more on the way. The patch
> sets for these features differ in maturity, but we can't ignore these real
> needs because of limited time for key maintainers, apathy towards
> contributors, mindset avoidance and laziness, or a preference for certain
> technology stacks. These technical challenges will attract an influx of
> open source heroes to push the technology forward, which is good in the
> long run.
> 
> (5) More to think about
> 
> Similar to the guest PMU feature, the debugging feature may face the same
> situation. For example, what happens when you debug code inside the host
> and the guest at the same time (the host debugs hypervisor/guest code while
> the guest debugs guest code only)?
> 
> Forgive my ignorance and offence, but we don't want to see a KVM subsystem
> controlled and driven by Google's demands.
> 
> Please feel free to share comments to move forward.
> 
> Thanks,
> Like Xu

Hi all,

I would like to add the following to the discussion, for everyone's awareness.

Fully virtualized PMC support is coming in an upcoming AMD SoC, and we are
working on prototyping it.

As part of the virtualized PMC design, the PERF_CTL registers are defined as
Swap type C: guest PMC state is loaded automatically at VMRUN, but host PMC
state is not saved by hardware. If the hypervisor is using the performance
counters, it is the hypervisor's responsibility to save the PERF_CTL registers
to the host save area prior to VMRUN and restore them after VMEXIT. To handle
PMC overflow interrupts within the guest itself, NMI virtualization or AVIC can
be used, so that an interrupt on PMC overflow in the guest does not leak to the
host.
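
To illustrate the Swap type C sequence, here is a toy model in which PERF_CTL values are plain arrays. It assumes six core counters and uses hypothetical struct and function names; the real host save area layout is defined by AMD, and what hardware does with the guest values at VMEXIT is not described above, so it is not modeled here.

```c
#include <stdint.h>
#include <string.h>

#define NUM_PERF_CTL 6   /* assumed number of core counters */

/* Toy model of one CPU's live PERF_CTL registers. */
struct cpu_pmu { uint64_t perf_ctl[NUM_PERF_CTL]; };

/* Hypothetical host save area and guest PMC state. */
struct host_save_area { uint64_t perf_ctl[NUM_PERF_CTL]; };
struct guest_state    { uint64_t perf_ctl[NUM_PERF_CTL]; };

/* Swap type C at VMRUN: hardware loads the guest values but does NOT
 * save the host values it overwrites. */
static void hw_vmrun_load(struct cpu_pmu *cpu, const struct guest_state *g)
{
    memcpy(cpu->perf_ctl, g->perf_ctl, sizeof(cpu->perf_ctl));
}

/* Hypervisor-side wrapper around VMRUN when the host is using the PMCs. */
static void run_guest(struct cpu_pmu *cpu, struct host_save_area *hsa,
                      const struct guest_state *g)
{
    /* Before VMRUN: software saves the host PERF_CTL values. */
    memcpy(hsa->perf_ctl, cpu->perf_ctl, sizeof(hsa->perf_ctl));

    hw_vmrun_load(cpu, g);
    /* ... guest runs; PMC overflows go to the guest via virtual NMI or
     * AVIC instead of interrupting the host ... */

    /* After VMEXIT: software restores the host PERF_CTL values. */
    memcpy(cpu->perf_ctl, hsa->perf_ctl, sizeof(cpu->perf_ctl));
}
```

Skipping the save step would leave the host's counters programmed with guest event selects after VMEXIT, which is exactly what the save/restore responsibility described above prevents.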

- Manali