linux-kernel - Re: [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240517081440.GM168153@ls.amr.corp.intel.com>
Date: Fri, 17 May 2024 01:14:40 -0700
From: Isaku Yamahata <isaku.yamahata@...el.com>
To: "Huang, Kai" <kai.huang@...el.com>
Cc: Isaku Yamahata <isaku.yamahata@...el.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>,
	"seanjc@...gle.com" <seanjc@...gle.com>,
	"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>,
	"sagis@...gle.com" <sagis@...gle.com>,
	"Aktas, Erdem" <erdemaktas@...gle.com>,
	"dmatlack@...gle.com" <dmatlack@...gle.com>,
	"Zhao, Yan Y" <yan.y.zhao@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	isaku.yamahata@...ux.intel.com
Subject: Re: [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for
 TDP MMU

On Fri, May 17, 2024 at 02:36:43PM +1200,
"Huang, Kai" <kai.huang@...el.com> wrote:

> On 17/05/2024 7:42 am, Isaku Yamahata wrote:
> > On Thu, May 16, 2024 at 04:36:48PM +0000,
> > "Edgecombe, Rick P" <rick.p.edgecombe@...el.com> wrote:
> > 
> > > On Thu, 2024-05-16 at 13:04 +0000, Huang, Kai wrote:
> > > > On Thu, 2024-05-16 at 02:57 +0000, Edgecombe, Rick P wrote:
> > > > > On Thu, 2024-05-16 at 14:07 +1200, Huang, Kai wrote:
> > > > > > 
> > > > > > I meant it seems we should just strip shared bit away from the GPA in
> > > > > > handle_ept_violation() and pass it as 'cr2_or_gpa' here, so fault->addr
> > > > > > won't have the shared bit.
> > > > > > 
> > > > > > Do you see any problem of doing so?
> > > > > 
> > > > > We would need to add it back in "raw_gfn" in kvm_tdp_mmu_map().
> > > > 
> > > > I don't see any big difference?
> > > > 
> > > > Now in this patch the raw_gfn is directly from fault->addr:
> > > > 
> > > >          raw_gfn = gpa_to_gfn(fault->addr);
> > > > 
> > > >          tdp_mmu_for_each_pte(iter, mmu, is_private, raw_gfn, raw_gfn+1) {
> > > >                  ...
> > > >          }
> > > > 
> > > > But there's nothing wrong to get the raw_gfn from the fault->gfn.  In
> > > > fact, the zapping code just does this:
> > > > 
> > > >          /*
> > > >           * start and end doesn't have GFN shared bit.  This function zaps
> > > >           * a region including alias.  Adjust shared bit of [start, end) if
> > > >           * the root is shared.
> > > >           */
> > > >          start = kvm_gfn_for_root(kvm, root, start);
> > > >          end = kvm_gfn_for_root(kvm, root, end);
> > > > 
> > > > So there's nothing wrong to just do the same thing in both functions.
> > > > 
> > > > The point is fault->gfn has shared bit stripped away at the beginning, and
> > > > AFAICT there's no useful reason to keep shared bit in fault->addr.  The
> > > > entire @fault is a temporary structure on the stack during fault handling
> > > > anyway.
> > > 
> > > I would like to avoid code churn at this point if there is not a real clear
> > > benefit. >>
> > > One small benefit of keeping the shared bit in the fault->addr is that it is
> > > sort of consistent with how that field is used in other scenarios in KVM. In
> > > shadow paging it's not even the GPA. So it is simply the "fault address" and has
> > > to be interpreted in different ways in the fault handler. For TDX the fault
> > > address *does* include the shared bit. And the EPT needs to be faulted in at
> > > that address.
> 
> It's about how we want to define the semantic of fault->addr (forget about
> shadow MMU because for it fault->addr has different meaning from TDP):
> 
> 1) It represents the faulting address which points to the actual guest
> memory (has no shared bit).
> 
> 2) It represents the faulting address which is truly used as the hardware
> page table walk.
> 
> The fault->gfn always represents the location of actual guest memory (w/o
> shared bit) in both cases.
> 
> I originally thought 2) isn't consistent for both SNP and TDX, but thinking
> more, I now think actually both the two semantics are consistent for SNP and
> TDX, which is important in order to avoid confusion.
> 
> Anyway it's a trivial because fault->addr is only used for fault handling
> path.
> 
> So yes fine to me we choose to use 2) here.  But IMHO we should explicitly
> add a comment to 'struct kvm_page_fault' that the @addr represents 2).

Ok. I'm fine with 2).


> And I think more important thing is how we handle this "gfn" and "raw_gfn"
> in tdp_iter and 'struct kvm_mmu_page'.  See below.
> 
> > > 
> > > If we strip the shared bit when setting fault->addr we have to reconstruct it
> > > when we do the actual shared mapping. There is no way around that. Which helper
> > > does it, isn't important I think. Doing the reconstruction inside
> > > tdp_mmu_for_each_pte() could be neat, except that it doesn't know about the
> > > shared bit position.
> > > 
> > > The zapping code's use of kvm_gfn_for_root() is different because the gfn comes
> > > without the shared bit. It's not stripped and then added back. Those are
> > > operations that target GFNs really.
> > > 
> > > I think the real problem is that we are gleaning whether the fault is to private
> > > or shared memory from different things. Sometimes from fault->is_private,
> > > sometimes the presence of the shared bits, and sometimes the role bit. I think
> > > this is confusing, doubly so because we are using some of these things to infer
> > > unrelated things (mirrored vs private).
> > 
> > It's confusing we don't check it in uniform way.
> > 
> > 
> > > My guess is that you have noticed this and somehow zeroed in on the shared_mask.
> > > I think we should straighten out the mirrored/private semantics and see what the
> > > results look like. How does that sound to you?
> > 
> > I had closer look of the related code.  I think we can (mostly) uniformly use
> > gpa/gfn without shared mask.  Here is the proposal.  We need a real patch to see
> > how the outcome looks like anyway.  I think this is like what Kai is thinking
> > about.
> > 
> > 
> > - rename role.is_private => role.is_mirrored_pt
> > 
> > - sp->gfn: gfn without shared bit.
> 
> I think we can treat 'tdp_iter' and 'struct kvm_mmu_page' in the same way,
> because conceptually they both reflects the page table.

Agreed that iter->gfn and sp->gfn should be in same way.


> So I think both of them can have "gfn" or "raw_gfn", and "shared_gfn_mask".
> 
> Or have both "raw_gfn" or "gfn" but w/o "shared_gfn_mask". This may be more
> straightforward because we can just use them when needed w/o needing to play
> with gfn_shared_mask.
> 
> > 
> > - fault->address: without gfn_shared_mask
> >    Actually it doesn't matter much.  We can use gpa with gfn_shared_mask.
> 
> See above.  I think it makes sense to include the shared bit here.  It's
> trivial anyway though.

Ok, let's make fault->addr include shared mask.


> > - Update struct tdp_iter
> >    struct tdp_iter
> >      gfn: gfn without shared bit
> 
> Or "raw_gfn"?
> 
> Which may be more straightforward because it can be just from:
> 
> 	raw_gfn = gpa_to_gfn(fault->addr);
> 	gfn = fault->gfn;
> 
> 	tdp_mmu_for_each_pte(..., raw_gfn, raw_gfn + 1, gfn)
> 
> Which is the reason to make fault->addr include the shared bit AFAICT.

If we can eliminate raw_gfn and kvm_gfn_for_root(), it's better.


> > 
> >      /* Add new members */
> > 
> >      /* Indicates which PT to walk. */
> >      bool mirrored_pt;
> 
> I don't think you need this?  It's only used to select the root for page
> table walk.  Once it's done, we already have the @sptep to operate on.
> 
> And I think you can just get @mirrored_pt from the sptep:
> 
> 	mirrored_pt = sptep_to_sp(sptep)->role.mirrored_pt;
> 
> Instead, I think we should keep the @is_private to indicate whether the GFN
> is private or not, which should be distinguished with 'mirrored_pt', which
> the root page table (and the @sptep) already reflects.
> 
> Of course if the @root/@...ep is mirrored_pt, the is_private should be
> always true, like:
> 
> 	WARN_ON_ONCE(sptep_to_sp(sptep)->role.is_mirrored_pt
> 			&& !is_private);
> 
> Am I missing anything?

You said it not correct to use role. So I tried to find a way to pass down
is_mirrored and avoid to use role.

Did you change your mind? or you're fine with new name is_mirrored?

https://lore.kernel.org/kvm/4ba18e4e-5971-4683-82eb-63c985e98e6b@intel.com/
  > I don't think using kvm_mmu_page.role is correct.



> > 
> >      // This is used tdp_iter_refresh_sptep()
> >      // shared gfn_mask if mirrored_pt
> >      // 0 if !mirrored_pt
> >      gfn_shared_mask >
> > - Pass mirrored_pt and gfn_shared_mask to
> >    tdp_iter_start(..., mirrored_pt, gfn_shared_mask)
> 
> As mentioned above, I am not sure whether we need @mirrored_pt, because it
> already have the @root.  Instead we should pass is_private, which indicates
> the GFN is private.

If we can use role, we don't need iter.mirrored_pt isn't needed.


> > - trace point: update to include mirroredd_pt. Or Leave it as is for now.
> > 
> > - pr_err() that log gfn in handle_changed_spte()
> >    Update to include mirrored_pt. Or Leave it as is for now.
> > 
> > - Update spte handler (handle_changed_spte(), handle_removed_pt()...),
> >    use iter->mirror_pt or pass down mirror_pt.
> > 
> 
> IIUC only sp->role.is_mirrored_pt is needed, tdp_iter->is_mirrored_pt isn't
> necessary.  But when the @sp is created, we need to initialize whether it is
> mirrored_pt.
> 
> Am I missing anything?

Because you didn't like to use role, I tried to find other way.
-- 
Isaku Yamahata <isaku.yamahata@...el.com>