Message-ID: <1c76cb00-1fe1-4fd0-b7b9-86ddca6115ba@citrix.com>
Date: Fri, 21 Nov 2025 02:40:50 +0000
From: Andrew Cooper <andrew.cooper3@...rix.com>
To: Sean Christopherson <seanjc@...gle.com>, Amit Shah <amit@...nel.org>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org, x86@...nel.org,
 linux-doc@...r.kernel.org, amit.shah@....com, thomas.lendacky@....com,
 bp@...en8.de, tglx@...utronix.de, peterz@...radead.org, jpoimboe@...nel.org,
 pawan.kumar.gupta@...ux.intel.com, corbet@....net, mingo@...hat.com,
 dave.hansen@...ux.intel.com, hpa@...or.com, pbonzini@...hat.com,
 daniel.sneddon@...ux.intel.com, kai.huang@...el.com, sandipan.das@....com,
 boris.ostrovsky@...cle.com, Babu.Moger@....com, david.kaplan@....com,
 dwmw@...zon.co.uk
Subject: Re: [PATCH v6 1/1] x86: kvm: svm: set up ERAPS support for guests

On 20/11/2025 8:11 pm, Sean Christopherson wrote:
> KVM: SVM:
>
> On Fri, Nov 07, 2025, Amit Shah wrote:
>> From: Amit Shah <amit.shah@....com>
>>
>> AMD CPUs with the Enhanced Return Address Predictor (ERAPS) feature
> Enhanced Return Address Predictor Security.  The 'S' matters.
>
>> Zen5+) obviate the need for FILL_RETURN_BUFFER sequences right after
>> VMEXITs.  The feature adds guest/host tags to entries in the RSB (a.k.a.
>> RAP).  This helps with speculation protection across the VM boundary,
>> and it also preserves host and guest entries in the RSB that can improve
>> software performance (which would otherwise be flushed due to the
>> FILL_RETURN_BUFFER sequences).  This feature also extends the size of
>> the RSB from the older standard (of 32 entries) to a new default
>> enumerated in CPUID leaf 0x80000021:EBX bits 23:16 -- which is 64
>> entries in Zen5 CPUs.
>>
>> The hardware feature is always-on, and the host context uses the full
>> default RSB size without any software changes necessary.  The presence
>> of this feature allows software (both in host and guest contexts) to
>> drop all RSB filling routines in favour of the hardware doing it.
>>
>> There are two guest/host configurations that need to be addressed before
>> allowing a guest to use this feature: nested guests, and hosts using
>> shadow paging (or when NPT is disabled):
>>
>> 1. Nested guests: the ERAPS feature adds host/guest tagging to entries
>>    in the RSB, but does not distinguish between the guest ASIDs.  To
>>    prevent the case of an L2 guest poisoning the RSB to attack the L1
>>    guest, the CPU exposes a new VMCB bit (CLEAR_RAP).  The next
>>    VMRUN with a VMCB that has this bit set causes the CPU to flush the
>>    RSB before entering the guest context.  Set the bit in VMCB01 after a
>>    nested #VMEXIT to ensure the next time the L1 guest runs, its RSB
>>    contents aren't polluted by the L2's contents.  Similarly, before
>>    entry into a nested guest, set the bit for VMCB02, so that the L1
>>    guest's RSB contents are not leaked/used in the L2 context.
>>
>> 2. Hosts that disable NPT: the ERAPS feature flushes the RSB entries on
>>    several conditions, including CR3 updates.  Emulating hardware
>>    behaviour on RSB flushes is not worth the effort for NPT=off case,
>>    nor is it worthwhile to enumerate and emulate every trigger the
>>    hardware uses to flush RSB entries.  Instead of identifying and
>>    replicating RSB flushes that hardware would have performed had NPT
>>    been ON, do not let NPT=off VMs use the ERAPS features.
> The emulation requirements are not limited to shadow paging.  From the APM:
>
>   The ERAPS feature eliminates the need to execute CALL instructions to clear
>   the return address predictor in most cases. On processors that support ERAPS,
>   return addresses from CALL instructions executed in host mode are not used in
>   guest mode, and vice versa. Additionally, the return address predictor is
>   cleared in all cases when the TLB is implicitly invalidated (see Section 5.5.3 “TLB
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   Management,” on page 159) and in the following cases:
>
>   • MOV CR3 instruction
>   • INVPCID other than single address invalidation (operation type 0)

I have already asked AMD for clarification here.  AIUI, INVLPGB should be
included in this list, which raises the question of what else is missing
from the documentation.

>
> Yes, KVM only intercepts MOV CR3 and INVPCID when NPT is disabled (or INVPCID is
> unsupported per guest CPUID), but that is an implementation detail, the instructions
> are still reachable via emulator, and KVM needs to emulate implicit TLB flush
> behavior.

The implicit flushes cover CR0.PG, CR4.{PSE,PGE,PCIDE,PKE}, SMI, RSM,
writes to MTRR MSRs, #INIT, A20M, and "other model specific MSRs, see NDA
docs".
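
To make the scope concrete, here is a rough sketch of the control-register
and INVPCID part of that list as predicates an emulator could consult.  The
bit positions are the architectural ones; the function names are
hypothetical and this is not KVM or Xen code.  It also assumes the flush is
triggered when one of the listed bits changes value, which is my reading
and not something the text above confirms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural control-register bit positions (AMD64). */
#define CR0_PG     (UINT64_C(1) << 31)
#define CR4_PSE    (UINT64_C(1) << 4)
#define CR4_PGE    (UINT64_C(1) << 7)
#define CR4_PCIDE  (UINT64_C(1) << 17)
#define CR4_PKE    (UINT64_C(1) << 22)

/* CR4 bits named in the implicit-flush list above. */
#define CR4_RAP_FLUSH_BITS (CR4_PSE | CR4_PGE | CR4_PCIDE | CR4_PKE)

/* Hypothetical helper: does an emulated CR0 write toggle CR0.PG? */
bool cr0_write_needs_rap_flush(uint64_t old_cr0, uint64_t new_cr0)
{
    return ((old_cr0 ^ new_cr0) & CR0_PG) != 0;
}

/* Hypothetical helper: does an emulated CR4 write toggle a listed bit? */
bool cr4_write_needs_rap_flush(uint64_t old_cr4, uint64_t new_cr4)
{
    return ((old_cr4 ^ new_cr4) & CR4_RAP_FLUSH_BITS) != 0;
}

/*
 * Hypothetical helper: per the APM text quoted above, INVPCID clears the
 * predictor for every operation type except single-address (type 0).
 */
bool invpcid_needs_rap_flush(uint32_t type)
{
    return type != 0;
}
```

SMI, RSM, MTRR writes, #INIT and A20M would need their own hooks, and the
"other model specific MSRs" cannot be enumerated this way at all.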

The final part is very unhelpful in practice, and necessitates a RAS
flush on any emulated WRMSR, unless AMD are going to start handing out
the multi-coloured documents...  The really fastpath MSRs are
unintercepted and won't suffer this overhead.

~Andrew
