Message-ID: <1c76cb00-1fe1-4fd0-b7b9-86ddca6115ba@citrix.com>
Date: Fri, 21 Nov 2025 02:40:50 +0000
From: Andrew Cooper <andrew.cooper3@...rix.com>
To: Sean Christopherson <seanjc@...gle.com>, Amit Shah <amit@...nel.org>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org, x86@...nel.org,
linux-doc@...r.kernel.org, amit.shah@....com, thomas.lendacky@....com,
bp@...en8.de, tglx@...utronix.de, peterz@...radead.org, jpoimboe@...nel.org,
pawan.kumar.gupta@...ux.intel.com, corbet@....net, mingo@...hat.com,
dave.hansen@...ux.intel.com, hpa@...or.com, pbonzini@...hat.com,
daniel.sneddon@...ux.intel.com, kai.huang@...el.com, sandipan.das@....com,
boris.ostrovsky@...cle.com, Babu.Moger@....com, david.kaplan@....com,
dwmw@...zon.co.uk
Subject: Re: [PATCH v6 1/1] x86: kvm: svm: set up ERAPS support for guests
On 20/11/2025 8:11 pm, Sean Christopherson wrote:
> KVM: SVM:
>
> On Fri, Nov 07, 2025, Amit Shah wrote:
>> From: Amit Shah <amit.shah@....com>
>>
>> AMD CPUs with the Enhanced Return Address Predictor (ERAPS) feature
> Enhanced Return Address Predictor Security. The 'S' matters.
>
>> (Zen5+) obviate the need for FILL_RETURN_BUFFER sequences right after
>> VMEXITs. The feature adds guest/host tags to entries in the RSB (a.k.a.
>> RAP). This helps with speculation protection across the VM boundary,
>> and it also preserves host and guest entries in the RSB that can improve
>> software performance (which would otherwise be flushed due to the
>> FILL_RETURN_BUFFER sequences). This feature also extends the size of
>> the RSB from the older standard (of 32 entries) to a new default
>> enumerated in CPUID leaf 0x80000021:EBX bits 23:16 -- which is 64
>> entries in Zen5 CPUs.
>>
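The size field described above can be pulled out of the register value with a shift and a mask. A minimal sketch, assuming EBX of CPUID leaf 0x80000021 has already been read by the caller; the helper name `eraps_rap_size_field` is hypothetical, and it returns the raw 8-bit field without assuming how the APM scales that value into a number of entries.

```c
#include <stdint.h>

/* Hypothetical helper: extract bits 23:16 of CPUID 0x80000021:EBX.
 * Returns the raw field; any scaling to an entry count is left to the caller. */
static inline uint32_t eraps_rap_size_field(uint32_t ebx)
{
	return (ebx >> 16) & 0xff;
}
```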
>> The hardware feature is always-on, and the host context uses the full
>> default RSB size without any software changes necessary. The presence
>> of this feature allows software (both in host and guest contexts) to
>> drop all RSB filling routines in favour of the hardware doing it.
>>
>> There are two guest/host configurations that need to be addressed before
>> allowing a guest to use this feature: nested guests, and hosts using
>> shadow paging (or when NPT is disabled):
>>
>> 1. Nested guests: the ERAPS feature adds host/guest tagging to entries
>> in the RSB, but does not distinguish between the guest ASIDs. To
>> prevent the case of an L2 guest poisoning the RSB to attack the L1
>> guest, the CPU exposes a new VMCB bit (CLEAR_RAP). The next
>> VMRUN with a VMCB that has this bit set causes the CPU to flush the
>> RSB before entering the guest context. Set the bit in VMCB01 after a
>> nested #VMEXIT to ensure the next time the L1 guest runs, its RSB
>> contents aren't polluted by the L2's contents. Similarly, before
>> entry into a nested guest, set the bit for VMCB02, so that the L1
>> guest's RSB contents are not leaked/used in the L2 context.
>>
>> 2. Hosts that disable NPT: the ERAPS feature flushes the RSB entries on
>> several conditions, including CR3 updates. Emulating hardware
>> behaviour on RSB flushes is not worth the effort for the NPT=off case,
>> nor is it worthwhile to enumerate and emulate every trigger the
>> hardware uses to flush RSB entries. Instead of identifying and
>> replicating RSB flushes that hardware would have performed had NPT
>> been ON, do not let NPT=off VMs use the ERAPS feature.
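The two configurations in the list above can be sketched as follows. This is an illustrative model rather than KVM code: `struct vmcb_model`, its `clear_rap` field, and the function names are all hypothetical stand-ins for the real VMCB control bit and for KVM's nested SVM paths.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the VMCB control area; only the bit of interest. */
struct vmcb_model {
	bool clear_rap;	/* models CLEAR_RAP: flush the RSB on the next VMRUN */
};

/* Case 1: after a nested #VMEXIT, L1 must not run on L2's RSB entries. */
static void nested_vmexit_request_flush(struct vmcb_model *vmcb01)
{
	vmcb01->clear_rap = true;
}

/* Case 1: before entering L2, L1's RSB entries must not be usable by L2. */
static void nested_vmrun_request_flush(struct vmcb_model *vmcb02)
{
	vmcb02->clear_rap = true;
}

/* Case 2: only expose ERAPS to a guest when the host has it and NPT is on. */
static bool guest_may_use_eraps(bool host_has_eraps, bool npt_enabled)
{
	return host_has_eraps && npt_enabled;
}
```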
> The emulation requirements are not limited to shadow paging. From the APM:
>
> The ERAPS feature eliminates the need to execute CALL instructions to clear
> the return address predictor in most cases. On processors that support ERAPS,
> return addresses from CALL instructions executed in host mode are not used in
> guest mode, and vice versa. Additionally, the return address predictor is
> cleared in all cases when the TLB is implicitly invalidated (see Section 5.5.3 “TLB
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Management,” on page 159) and in the following cases:
>
> • MOV CR3 instruction
> • INVPCID other than single address invalidation (operation type 0)
I already asked AMD for clarification here. AIUI, INVLPGB should be
included in this list, which raises the question of what else is missing
from the documentation.
>
> Yes, KVM only intercepts MOV CR3 and INVPCID when NPT is disabled (or INVPCID is
> unsupported per guest CPUID), but that is an implementation detail: the instructions
> are still reachable via the emulator, and KVM needs to emulate implicit TLB flush
> behavior.
The implicit flushes cover CR0.PG, CR4.{PSE,PGE,PCIDE,PKE}, SMI, RSM,
writes to MTRR MSRs, #INIT, A20M, and "other model specific MSRs, see NDA
docs".
The final part is very unhelpful in practice, and necessitates a RAS
flush on any emulated WRMSR, unless AMD are going to start handing out
the multi-coloured documents... The really fastpath MSRs are
unintercepted and won't suffer this overhead.
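To make the consequence concrete, here is a rough sketch of the decision an emulator would face. It is not KVM code: the enum, the function name and the "flush when one of these bits changes" rule are assumptions for illustration, and only the software-reachable triggers (MOV CR0/CR3/CR4, WRMSR, RSM) are modelled. The CR bit positions are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PG		(1ULL << 31)
#define CR4_PSE		(1ULL << 4)
#define CR4_PGE		(1ULL << 7)
#define CR4_PCIDE	(1ULL << 17)
#define CR4_PKE		(1ULL << 22)

/* Hypothetical classification of emulated operations. */
enum emul_op {
	EMUL_MOV_CR0,
	EMUL_MOV_CR3,
	EMUL_MOV_CR4,
	EMUL_WRMSR,
	EMUL_RSM,
	EMUL_OTHER,
};

/* Returns true if emulating the operation should also flush the RAP/RSB. */
static bool emul_needs_rap_flush(enum emul_op op, uint64_t old_val,
				 uint64_t new_val)
{
	uint64_t changed = old_val ^ new_val;

	switch (op) {
	case EMUL_MOV_CR3:
	case EMUL_RSM:
		return true;
	case EMUL_MOV_CR0:
		/* Assumption: the flush is tied to CR0.PG changing. */
		return (changed & CR0_PG) != 0;
	case EMUL_MOV_CR4:
		/* Assumption: the flush is tied to any of these bits changing. */
		return (changed & (CR4_PSE | CR4_PGE | CR4_PCIDE | CR4_PKE)) != 0;
	case EMUL_WRMSR:
		/* The set of affected MSRs is not public, so flush on all of them. */
		return true;
	default:
		return false;
	}
}
```

The EMUL_WRMSR case is the conservative choice described above: without the full MSR list there is nothing narrower to key on.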
~Andrew