[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e34d49b8-4aa2-455a-a623-53a630d484ef@gmail.com>
Date: Thu, 7 Mar 2024 19:07:46 +0800
From: Like Xu <like.xu.linux@...il.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Mingwei Zhang <mizhang@...gle.com>, Zhenyu Wang <zhenyuw@...ux.intel.com>,
Zhang Xiong <xiong.y.zhang@...el.com>, Lv Zhiyuan <zhiyuan.lv@...el.com>,
Dapeng Mi <dapeng1.mi@...el.com>, Jim Mattson <jmattson@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH] KVM: x86/pmu: Disable support for adaptive PEBS
On 7/3/2024 8:58 am, Sean Christopherson wrote:
> Drop support for virtualizing adaptive PEBS, as KVM's implementation is
> architecturally broken without an obvious/easy path forward, and because
> exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak
> host kernel addresses to the guest.
>
> Bug #1 is that KVM doesn't doesn't account for the upper 32 bits of
> IA32_FIXED_CTR_CTRL when (re)programming fixed counters, e.g
> fixed_ctrl_field() drops the upper bits, reprogram_fixed_counters()
> stores local variables as u8s and truncates the upper bits too, etc.
>
> Bug #2 is that, because KVM _always_ sets precise_ip to a non-zero value
> for PEBS events, perf will _always_ generate an adaptive record, even if
> the guest requested a basic record. Note, KVM will also enable adaptive
> PEBS in individual *counter*, even if adaptive PEBS isn't exposed to the
> guest, but this is benign as MSR_PEBS_DATA_CFG is guaranteed to be zero,
> i.e. the guest will only ever see Basic records.
>
> Bug #3 is in perf. intel_pmu_disable_fixed() doesn't clear the upper
> bits either, i.e. leaves ICL_FIXED_0_ADAPTIVE set, and
> intel_pmu_enable_fixed() effectively doesn't clear ICL_FIXED_0_ADAPTIVE
> either. I.e. perf _always_ enables ADAPTIVE counters, regardless of what
> KVM requests.
The three issues above all point to a fix in one direction: to pass the value
of vcpu's EVT_SELx.Adaptive_Record[34] or FCx_Adaptive_Record to the
perf/core in a way and let the PEBS assist take effect as expected by vPEBS.
One place to address this is in the intel_guest_get_msrs() again:
- update vPMC[x].pPMC.fcctl_or_evtsel.ADAPTIVE = vPMC[x].use_adaptive
, since guest PEBS is disabled if host PEBS is enabled.
>
> Bug #4 is that adaptive PEBS *might* effectively bypass event filters set
> by the host, as "Updated Memory Access Info Group" records information
> that might be disallowed by userspace via KVM_SET_PMU_EVENT_FILTER.
This could be seen as a missing feature, that is, whether PMU_EVENT_FILTER
can control PEBS events even if they share the same event encoding.
Furthermore, if LBR_FMT is cleared only by VMM, could the guest use
adaptive pebs to obtain valid guest_lbr records. It's open in the virt context
since real hardware doesn't have this issue.
>
> Bug #5 is that KVM doesn't ensure LBR MSRs hold guest values (or at least
> zeros) when entering a vCPU with adaptive PEBS, which allows the guest
> to read host LBRs, i.e. host RIPs/addresses, by enabling "LBR Entries"
> records.
>
> Disable adaptive PEBS support as an immediate fix due to the severity of
> the LBR leak in particular, and because fixing all of the bugs will be
> non-trivial, e.g. not suitable for backporting to stable kernels.
>
> Note! This will break live migration, but trying to make KVM play nice
> with live migration would be quite complicated, wouldn't be guaranteed to
> work (i.e. KVM might still kill/confuse the guest), and it's not clear
> that there are any publicly available VMMs that support adaptive PEBS,
> let alone live migrate VMs that support adaptive PEBS, e.g. QEMU doesn't
> support PEBS in any capacity.
>
> Link: https://lore.kernel.org/all/20240306230153.786365-1-seanjc@google.com
> Link: https://lore.kernel.org/all/ZeepGjHCeSfadANM@google.com
> Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
> Cc: stable@...r.kernel.org
> Cc: Like Xu <like.xu.linux@...il.com>
> Cc: Mingwei Zhang <mizhang@...gle.com>
> Cc: Zhenyu Wang <zhenyuw@...ux.intel.com>
> Cc: Zhang Xiong <xiong.y.zhang@...el.com>
> Cc: Lv Zhiyuan <zhiyuan.lv@...el.com>
> Cc: Dapeng Mi <dapeng1.mi@...el.com>
> Cc: Jim Mattson <jmattson@...gle.com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
Acked-by: Like Xu <likexu@...cent.com>
> ---
> arch/x86/kvm/vmx/vmx.c | 24 ++++++++++++++++++++++--
> 1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 7a74388f9ecf..641a7d5bf584 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7864,8 +7864,28 @@ static u64 vmx_get_perf_capabilities(void)
>
> if (vmx_pebs_supported()) {
> perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK;
> - if ((perf_cap & PERF_CAP_PEBS_FORMAT) < 4)
> - perf_cap &= ~PERF_CAP_PEBS_BASELINE;
> +
> + /*
> + * Disallow adaptive PEBS as it is functionally broken, can be
> + * used by the guest to read *host* LBRs, and can be used to
> + * bypass userspace event filters. To correctly and safely
> + * support adaptive PEBS, KVM needs to:
> + *
> + * 1. Account for the ADAPTIVE flag when (re)programming fixed
> + * counters.
> + *
> + * 2. Gain support from perf (or take direct control of counter
> + * programming) to support events without adaptive PEBS
> + * enabled for the hardware counter.
> + *
> + * 3. Ensure LBR MSRs cannot hold host data on VM-Entry with
> + * adaptive PEBS enabled and MSR_PEBS_DATA_CFG.LBRS=1.
> + *
> + * 4. Document which PMU events are effectively exposed to the
> + * guest via adaptive PEBS, and make adaptive PEBS mutually
> + * exclusive with KVM_SET_PMU_EVENT_FILTER if necessary.
> + */
> + perf_cap &= ~PERF_CAP_PEBS_BASELINE;
> }
>
> return perf_cap;
>
> base-commit: 0c64952fec3ea01cb5b09f00134200f3e7ab40d5
Powered by blists - more mailing lists