linux-kernel - Re: [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Zj5ETYPTUo9T4Nuf@google.com>
Date: Fri, 10 May 2024 08:59:09 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Michael Roth <michael.roth@....com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-coco@...ts.linux.dev, 
	linux-mm@...ck.org, linux-crypto@...r.kernel.org, x86@...nel.org, 
	linux-kernel@...r.kernel.org, tglx@...utronix.de, mingo@...hat.com, 
	jroedel@...e.de, thomas.lendacky@....com, hpa@...or.com, ardb@...nel.org, 
	vkuznets@...hat.com, jmattson@...gle.com, luto@...nel.org, 
	dave.hansen@...ux.intel.com, slp@...hat.com, pgonda@...gle.com, 
	peterz@...radead.org, srinivas.pandruvada@...ux.intel.com, 
	rientjes@...gle.com, dovmurik@...ux.ibm.com, tobin@....com, bp@...en8.de, 
	vbabka@...e.cz, kirill@...temov.name, ak@...ux.intel.com, tony.luck@...el.com, 
	sathyanarayanan.kuppuswamy@...ux.intel.com, alpergun@...gle.com, 
	jarkko@...nel.org, ashish.kalra@....com, nikunj.dadhania@....com, 
	pankaj.gupta@....com, liam.merwick@...cle.com, papaluri@....com, 
	Isaku Yamahata <isaku.yamahata@...el.com>
Subject: Re: [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots

On Fri, May 10, 2024, Michael Roth wrote:
> On Fri, May 10, 2024 at 03:50:26PM +0200, Paolo Bonzini wrote:
> > On Fri, May 10, 2024 at 3:47 PM Sean Christopherson <seanjc@...gle.com> wrote:
> > >
> > > > +      * Since software-protected VMs don't have a notion of a shared vs.
> > > > +      * private that's separate from what KVM is tracking, the above
> > > > +      * KVM_EXIT_MEMORY_FAULT condition wouldn't occur, so avoid the
> > > > +      * special handling for that case for now.
> > >
> > > Very technically, it can occur if userspace _just_ modified the attributes.  And
> > > as I've said multiple times, at least for now, I want to avoid special casing
> > > SW-protected VMs unless it is *absolutely* necessary, because their sole purpose
> > > is to allow testing flows that are impossible to excercise without SNP/TDX hardware.
> > 
> > Yep, it is not like they have to be optimized.
> 
> Ok, I thought there were maybe some future plans to use sw-protected VMs
> to get some added protections from userspace. But even then there'd
> probably still be extra considerations for how to handle access tracking
> so white-listing them probably isn't right anyway.
> 
> I was also partly tempted to take this route because it would cover this
> TDX patch as well:
> 
>   https://lore.kernel.org/lkml/91c797997b57056224571e22362321a23947172f.1705965635.git.isaku.yamahata@intel.com/

Hmm, I'm pretty sure that patch is trying to fix the exact same issue you are
fixing, just in a less precise way.  S-EPT entries only support RWX=0 and RWX=111b,
i.e. it should be impossible to have a write-fault to a present S-EPT entry.

And if TDX is running afoul of this code:

	if (!fault->present)
		return !kvm_ad_enabled();

then KVM should do the sane thing and require A/D support be enabled for TDX.

And if it's something else entirely, that changelog has some explaining to do.

> and avoid any weirdness about checking kvm_mem_is_private() without
> checking mmu_invalidate_seq, but I think those cases all end up
> resolving themselves eventually and added some comments around that.

Yep, checking state that is protected by mmu_invalidate_seq outside of mmu_lock
is definitely allowed, e.g. the entire fast page fault path operates outside of
mmu_lock and thus outside of mmu_invalidate_seq's purview.

It's a-ok because the SPTE are done with an atomic CMPXCHG, and so KVM only needs
to ensure its page tables aren't outright _freed_.  If the zap triggered by the
attributes change "wins", then the fast #PF path will fail the CMPXCHG and be an
expensive NOP.  If the fast #PF wins, the zap will pave over the fast #PF fix,
and the IPI+flush that is needed for all zaps, to ensure vCPUs don't have stale
references, does the rest.

And if there's an attributes race that causes the fast #PF to bail early, the vCPU
will see the correct state on the next page fault.