linux-kernel - Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped


Message-ID: <CALMp9eSVt22PW+WyfNvnGcOciDQ8MkX9vDmDZ+-Q2QJUH_EvHw@mail.gmail.com>
Date: Mon, 27 Oct 2025 16:58:10 -0700
From: Jim Mattson <jmattson@...gle.com>
To: Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
Cc: Sean Christopherson <seanjc@...gle.com>, Brendan Jackman <jackmanb@...gle.com>, 
	Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache
 flush is skipped

On Mon, Oct 27, 2025 at 4:17 PM Pawan Gupta
<pawan.kumar.gupta@...ux.intel.com> wrote:
>
> On Mon, Oct 27, 2025 at 03:03:23PM -0700, Jim Mattson wrote:
> > On Tue, Oct 21, 2025 at 6:20 PM Pawan Gupta
> > <pawan.kumar.gupta@...ux.intel.com> wrote:
> > >
> > > ...
> > > Thinking more on this, the software sequence is only invoked when the
> > > system doesn't have the L1D flushing feature added by a microcode update.
> > > In such a case system is not expected to have a flushing VERW either, which
> > > was introduced after L1TF. Also, the admin needs to have a very good reason
> > > for not updating the microcode for 5+ years :-)
> >
> > KVM started reporting MD_CLEAR to userspace in Linux v5.2, but it
> > didn't report L1D_FLUSH to userspace until Linux v6.4, so there are
> > plenty of virtual CPUs with a flushing VERW that don't have the L1D
> > flushing feature.
>
> Shouldn't only the L0 hypervisor be doing the L1D_FLUSH?
>
> kvm_get_arch_capabilities()
> {
> ...
>         /*
>          * If we're doing cache flushes (either "always" or "cond")
>          * we will do one whenever the guest does a vmlaunch/vmresume.
>          * If an outer hypervisor is doing the cache flush for us
>          * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
>          * capability to the guest too, and if EPT is disabled we're not
>          * vulnerable.  Overall, only VMENTER_L1D_FLUSH_NEVER will
>          * require a nested hypervisor to do a flush of its own.
>          */
>         if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
>                 data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
>

Unless L0 has chosen L1D_FLUSH_NEVER. :)
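
To spell that case out, here is a minimal standalone sketch (not kernel
code) of the decision the quoted snippet implies. The enum and the bit
position mirror the kernel's definitions; the two helper functions are
hypothetical names introduced only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the kernel's enum vmx_l1d_flush_state. */
enum vmx_l1d_flush_state {
	VMENTER_L1D_FLUSH_AUTO,
	VMENTER_L1D_FLUSH_NEVER,
	VMENTER_L1D_FLUSH_COND,
	VMENTER_L1D_FLUSH_ALWAYS,
	VMENTER_L1D_FLUSH_EPT_DISABLED,
	VMENTER_L1D_FLUSH_NOT_REQUIRED,
};

/* IA32_ARCH_CAPABILITIES bit 3: no L1D flush needed on VM-entry. */
#define ARCH_CAP_SKIP_VMENTRY_L1DFLUSH (1ULL << 3)

/*
 * Hypothetical helper: the part of kvm_get_arch_capabilities() quoted
 * above, reduced to the single bit under discussion.
 */
static uint64_t l0_arch_caps_for_guest(enum vmx_l1d_flush_state l0_mitigation)
{
	uint64_t data = 0;

	if (l0_mitigation != VMENTER_L1D_FLUSH_NEVER)
		data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;

	return data;
}

/*
 * Hypothetical helper: an L1 hypervisor has to flush on its own
 * VM-entries only when L0 did not advertise the skip bit.
 */
static bool l1_must_flush_l1d(uint64_t arch_caps)
{
	return !(arch_caps & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH);
}
```

With L0 at VMENTER_L1D_FLUSH_NEVER the bit is not advertised, so
l1_must_flush_l1d() returns true; for every other L0 setting it returns
false.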

On GCE's L1TF-vulnerable hosts, we actually do an L1D flush at ASI
entry rather than VM-entry. ASI entries are two orders of magnitude
less frequent than VM-entries, so we get comparable protection to
L1D_FLUSH_ALWAYS at a fraction of the cost.

At the moment, we still do an L1D flush on emulated VM-entry, but
that's just because we have historically advertised
IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY to L1.