Message-ID: <20251028004928.muga5w3ix5dryyt2@desk>
Date: Mon, 27 Oct 2025 17:49:28 -0700
From: Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
To: Jim Mattson <jmattson@...gle.com>
Cc: Sean Christopherson <seanjc@...gle.com>,
Brendan Jackman <jackmanb@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D
cache flush is skipped
On Mon, Oct 27, 2025 at 05:19:57PM -0700, Pawan Gupta wrote:
> On Mon, Oct 27, 2025 at 04:58:10PM -0700, Jim Mattson wrote:
> > On Mon, Oct 27, 2025 at 4:17 PM Pawan Gupta
> > <pawan.kumar.gupta@...ux.intel.com> wrote:
> > >
> > > On Mon, Oct 27, 2025 at 03:03:23PM -0700, Jim Mattson wrote:
> > > > On Tue, Oct 21, 2025 at 6:20 PM Pawan Gupta
> > > > <pawan.kumar.gupta@...ux.intel.com> wrote:
> > > > >
> > > > > ...
> > > > > Thinking more on this, the software sequence is only invoked when the
> > > > > system doesn't have the L1D flushing feature added by a microcode update.
> > > > > In such a case, the system is not expected to have a flushing VERW either,
> > > > > which was introduced after L1TF. Also, the admin needs to have a very good
> > > > > was introduced after L1TF. Also, the admin needs to have a very good reason
> > > > > for not updating the microcode for 5+ years :-)
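
For context, the choice between the two methods is made in
vmx_l1d_flush() in arch/x86/kvm/vmx/vmx.c. A rough sketch of the
dispatch (the software fill sequence is abbreviated, and helper names
vary across kernel versions):

        if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
                /* Microcode-provided flush: a single MSR write. */
                native_wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
                return;
        }

        /*
         * No FLUSH_L1D in microcode: fall back to the software
         * sequence, which reads a buffer large enough to displace
         * the entire L1D (vmx_l1d_flush_pages).
         */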
> > > >
> > > > KVM started reporting MD_CLEAR to userspace in Linux v5.2, but it
> > > > didn't report L1D_FLUSH to userspace until Linux v6.4, so there are
> > > > plenty of virtual CPUs with a flushing VERW that don't have the L1D
> > > > flushing feature.
> > >
> > > Shouldn't only the L0 hypervisor be doing the L1D_FLUSH?
> > >
> > > kvm_get_arch_capabilities()
> > > {
> > > ...
> > > /*
> > > * If we're doing cache flushes (either "always" or "cond")
> > > * we will do one whenever the guest does a vmlaunch/vmresume.
> > > * If an outer hypervisor is doing the cache flush for us
> > > * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
> > > * capability to the guest too, and if EPT is disabled we're not
> > > * vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will
> > > * require a nested hypervisor to do a flush of its own.
> > > */
> > > if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
> > > data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
> > >
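
On the consuming side, a nested KVM checks that bit when it picks its
own mitigation; roughly (from vmx_setup_l1d_flush(), exact form varies
by kernel version):

        if (host_arch_capabilities & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
                l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
                return 0;
        }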
> >
> > Unless L0 has chosen L1D_FLUSH_NEVER. :)
> >
> > On GCE's L1TF-vulnerable hosts, we actually do an L1D flush at ASI
> > entry rather than VM-entry. ASI entries are two orders of magnitude
> > less frequent than VM-entries, so we get comparable protection to
> > L1D_FLUSH_ALWAYS at a fraction of the cost.
> >
> > At the moment, we still do an L1D flush on emulated VM-entry, but
> > that's just because we have historically advertised
> > IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY to L1.
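
Purely as an illustrative sketch of that trade-off (asi_enter() and the
helpers below are hypothetical names, not the actual GCE
implementation):

        static void asi_enter(struct asi *asi)
        {
                /*
                 * Flush once per ASI crossing instead of on every
                 * VM-entry; crossings are ~100x rarer, giving
                 * protection comparable to L1D_FLUSH_ALWAYS at a
                 * fraction of the cost.
                 */
                switch_to_restricted_page_tables(asi);  /* hypothetical */
                l1d_flush();    /* e.g. an MSR_IA32_FLUSH_CMD write */
        }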
>
> Thanks for the background.
>
> I still don't see the problem: CPUs that are vulnerable to L1TF are also
> vulnerable to MDS, so they don't set mmio_stale_data_clear; instead they
Sorry, I meant cpu_buf_vm_clear instead of mmio_stale_data_clear (I was
looking at a slightly older kernel).
> set X86_FEATURE_CLEAR_CPU_BUF and execute VERW in __vmx_vcpu_run()
> regardless of whether L1D_FLUSH was done.
>
> But I agree it is best to decouple L1D flush and MMIO Stale Data to
> avoid any confusion.
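
For reference, the flushing VERW itself comes from a helper like the
one below (named mds_clear_cpu_buffers() or x86_clear_cpu_buffers()
depending on kernel version); the VM-entry asm path in __vmx_vcpu_run()
emits the same instruction via an ALTERNATIVE gated on
X86_FEATURE_CLEAR_CPU_BUF:

        static __always_inline void mds_clear_cpu_buffers(void)
        {
                static const u16 ds = __KERNEL_DS;

                /*
                 * VERW with a memory operand referencing a valid
                 * writable segment selector clears the CPU buffers
                 * on parts that enumerate MD_CLEAR.
                 */
                asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc");
        }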