linux-kernel - Re: [PATCH v15 13/20] KVM: SEV: Implement gmem hook for initializing private pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZkvrFw1QMtImegQD@google.com>
Date: Mon, 20 May 2024 17:30:15 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Kai Huang <kai.huang@...el.com>
Cc: "isaku.yamahata@...il.com" <isaku.yamahata@...il.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>, 
	Rick P Edgecombe <rick.p.edgecombe@...el.com>, "michael.roth@....com" <michael.roth@....com>, 
	"pankaj.gupta@....com" <pankaj.gupta@....com>, "tglx@...utronix.de" <tglx@...utronix.de>, 
	"tobin@....com" <tobin@....com>, "liam.merwick@...cle.com" <liam.merwick@...cle.com>, 
	"alpergun@...gle.com" <alpergun@...gle.com>, Tony Luck <tony.luck@...el.com>, 
	"jmattson@...gle.com" <jmattson@...gle.com>, "luto@...nel.org" <luto@...nel.org>, 
	"ak@...ux.intel.com" <ak@...ux.intel.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>, 
	"pgonda@...gle.com" <pgonda@...gle.com>, 
	"srinivas.pandruvada@...ux.intel.com" <srinivas.pandruvada@...ux.intel.com>, "slp@...hat.com" <slp@...hat.com>, 
	"rientjes@...gle.com" <rientjes@...gle.com>, "peterz@...radead.org" <peterz@...radead.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, 
	"dovmurik@...ux.ibm.com" <dovmurik@...ux.ibm.com>, 
	"thomas.lendacky@....com" <thomas.lendacky@....com>, "x86@...nel.org" <x86@...nel.org>, "bp@...en8.de" <bp@...en8.de>, 
	"vkuznets@...hat.com" <vkuznets@...hat.com>, "vbabka@...e.cz" <vbabka@...e.cz>, 
	"ashish.kalra@....com" <ashish.kalra@....com>, 
	"linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>, 
	"nikunj.dadhania@....com" <nikunj.dadhania@....com>, Jorg Rodel <jroedel@...e.de>, 
	"mingo@...hat.com" <mingo@...hat.com>, 
	"sathyanarayanan.kuppuswamy@...ux.intel.com" <sathyanarayanan.kuppuswamy@...ux.intel.com>, 
	"hpa@...or.com" <hpa@...or.com>, "kirill@...temov.name" <kirill@...temov.name>, 
	"jarkko@...nel.org" <jarkko@...nel.org>, "ardb@...nel.org" <ardb@...nel.org>, 
	"linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>, 
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH v15 13/20] KVM: SEV: Implement gmem hook for initializing
 private pages

On Tue, May 21, 2024, Kai Huang wrote:
> On 21/05/2024 11:15 am, Sean Christopherson wrote:
> > On Tue, May 21, 2024, Kai Huang wrote:
> > > On 21/05/2024 5:35 am, Sean Christopherson wrote:
> > > > On Mon, May 20, 2024, Kai Huang wrote:
> > > > > I am wondering whether this can be done in the KVM page fault handler?
> > > > 
> > > > No, because the state of a pfn in the RMP is tied to the guest_memfd inode,
> > > > not to the file descriptor, i.e. not to an individual VM.
> > > 
> > > It's strange that as state of a PFN of SNP doesn't bind to individual VM, at
> > > least for the private pages.  The command rpm_make_private() indeed reflects
> > > the mapping between PFN <-> <GFN, SSID>.
> > 
> > s/SSID/ASID
> > 
> > KVM allows a single ASID to be bound to multiple "struct kvm" instances, e.g.
> > for intra-host migration.  If/when trusted I/O is a thing, presumably KVM will
> > also need to share the ASID with other entities, e.g. IOMMUFD.
> 
> But is this the case for SNP?  I thought due to the nature of private pages,
> they cannot be shared between VMs?  So to me this RMP entry mapping for PFN
> <-> GFN for private page should just be per-VM.

Sorry to redirect, but please read this mail (and probably surrounding mails).
It hopefully explains most of the question you have.

https://lore.kernel.org/all/ZLGiEfJZTyl7M8mS@google.com

> > Regardless of whether or not guest_memfd supports page migration, KVM needs to
> > track the state of the physical page in guest_memfd, e.g. if it's been assigned
> > to the ASID versus if it's still in a shared state.
> 
> I am not certain this can impact whether we want to do RMP commands via
> guest_memfd() hooks or TDP MMU hooks?
>
> > > If we truly want to zap private mappings for SNP, IIUC it can be done by
> > > distinguishing whether a VM needs to use a separate private table, which is
> > > TDX-only.
> > 
> > I wouldn't say we "want" to zap private mappings for SNP, rather that it's a lot
> > less work to keep KVM's existing behavior (literally do nothing) than it is to
> > rework the MMU and whatnot to not zap SPTEs.
> 
> My thinking too.
> 
> > And there's no big motivation to avoid zapping because SNP VMs are unlikely
> > to delete memslots.
> 
> I think we should also consider MMU notifier?

No, private mappings have no host userspace mappings, i.e. are completely exempt
from MMU notifier events.  guest_memfd() can still invalidate mappings, but that
only occurs if userspace punches a hole, which is destructive.

> > If it turns out that it's easy to preserve SNP mappings after TDX lands, then we
> > can certainly go that route, but AFAIK there's no reason to force the issue.
> 
> No I am certainly not saying we should do SNP after TDX.  Sorry I didn't
> closely monitor the status of this SNP patchset.
> 
> My intention is just wanting to make the TDP MMU common code change more
> useful (since we need that for TDX anyway), i.e., not effectively just for
> TDX if possible:
> 
> Currently the TDP MMU hooks are called depending whether the page table type
> is private (or mirrored whatever), but I think conceptually, we should
> decide whether to call TDP MMU hooks based on whether faulting GPA is
> private, _AND_ when the hook is available.
> 
> https://lore.kernel.org/lkml/5e8119c0-31f5-4aa9-a496-4ae10bd745a3@intel.com/
> 
> If invoking SNP RMP commands is feasible in TDP MMU hooks,

Feasible.  Yes.  Desirable?  No.  Either KVM tracks the state of the physical page
using the guest_memfd inode, or KVM _guarantees_ the NPT mappings _never_ get
dropped, including during intra-host migration.  E.g. to support intra-host
migration of TDX VMs, KVM is pretty much forced to transer the S-EPT tables as-is,
which is ugly and painful (though performant).  We could do the same for NPT, but
there would need to be massive performance benefits to justify the complexity.

> then I think there's value of letting SNP code to use them too.  And we can
> simply split one patch out to only add the TDP MMU hooks for SNP to land
> first.