linux-kernel - Re: [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ffd24fa5-b573-4334-95c6-42429fd9ee38@intel.com>
Date: Fri, 17 May 2024 14:36:43 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: Isaku Yamahata <isaku.yamahata@...el.com>, "Edgecombe, Rick P"
	<rick.p.edgecombe@...el.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>, "sagis@...gle.com"
	<sagis@...gle.com>, "Aktas, Erdem" <erdemaktas@...gle.com>,
	"dmatlack@...gle.com" <dmatlack@...gle.com>, "Zhao, Yan Y"
	<yan.y.zhao@...el.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, <isaku.yamahata@...ux.intel.com>
Subject: Re: [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for
 TDP MMU



On 17/05/2024 7:42 am, Isaku Yamahata wrote:
> On Thu, May 16, 2024 at 04:36:48PM +0000,
> "Edgecombe, Rick P" <rick.p.edgecombe@...el.com> wrote:
> 
>> On Thu, 2024-05-16 at 13:04 +0000, Huang, Kai wrote:
>>> On Thu, 2024-05-16 at 02:57 +0000, Edgecombe, Rick P wrote:
>>>> On Thu, 2024-05-16 at 14:07 +1200, Huang, Kai wrote:
>>>>>
>>>>> I meant it seems we should just strip shared bit away from the GPA in
>>>>> handle_ept_violation() and pass it as 'cr2_or_gpa' here, so fault->addr
>>>>> won't have the shared bit.
>>>>>
>>>>> Do you see any problem of doing so?
>>>>
>>>> We would need to add it back in "raw_gfn" in kvm_tdp_mmu_map().
>>>
>>> I don't see any big difference?
>>>
>>> Now in this patch the raw_gfn is directly from fault->addr:
>>>
>>>          raw_gfn = gpa_to_gfn(fault->addr);
>>>
>>>          tdp_mmu_for_each_pte(iter, mmu, is_private, raw_gfn, raw_gfn+1) {
>>>                  ...
>>>          }
>>>
>>> But there's nothing wrong to get the raw_gfn from the fault->gfn.  In
>>> fact, the zapping code just does this:
>>>
>>>          /*
>>>           * start and end doesn't have GFN shared bit.  This function zaps
>>>           * a region including alias.  Adjust shared bit of [start, end) if
>>>           * the root is shared.
>>>           */
>>>          start = kvm_gfn_for_root(kvm, root, start);
>>>          end = kvm_gfn_for_root(kvm, root, end);
>>>
>>> So there's nothing wrong to just do the same thing in both functions.
>>>
>>> The point is fault->gfn has shared bit stripped away at the beginning, and
>>> AFAICT there's no useful reason to keep shared bit in fault->addr.  The
>>> entire @fault is a temporary structure on the stack during fault handling
>>> anyway.
>>
>> I would like to avoid code churn at this point if there is not a real clear
>> benefit. >>
>> One small benefit of keeping the shared bit in the fault->addr is that it is
>> sort of consistent with how that field is used in other scenarios in KVM. In
>> shadow paging it's not even the GPA. So it is simply the "fault address" and has
>> to be interpreted in different ways in the fault handler. For TDX the fault
>> address *does* include the shared bit. And the EPT needs to be faulted in at
>> that address.

It's about how we want to define the semantic of fault->addr (forget 
about shadow MMU because for it fault->addr has different meaning from TDP):

1) It represents the faulting address which points to the actual guest 
memory (has no shared bit).

2) It represents the faulting address which is truly used as the 
hardware page table walk.

The fault->gfn always represents the location of actual guest memory 
(w/o shared bit) in both cases.

I originally thought 2) isn't consistent for both SNP and TDX, but 
thinking more, I now think actually both the two semantics are 
consistent for SNP and TDX, which is important in order to avoid confusion.

Anyway it's a trivial because fault->addr is only used for fault 
handling path.

So yes fine to me we choose to use 2) here.  But IMHO we should 
explicitly add a comment to 'struct kvm_page_fault' that the @addr 
represents 2).

And I think more important thing is how we handle this "gfn" and 
"raw_gfn" in tdp_iter and 'struct kvm_mmu_page'.  See below.

>>
>> If we strip the shared bit when setting fault->addr we have to reconstruct it
>> when we do the actual shared mapping. There is no way around that. Which helper
>> does it, isn't important I think. Doing the reconstruction inside
>> tdp_mmu_for_each_pte() could be neat, except that it doesn't know about the
>> shared bit position.
>>
>> The zapping code's use of kvm_gfn_for_root() is different because the gfn comes
>> without the shared bit. It's not stripped and then added back. Those are
>> operations that target GFNs really.
>>
>> I think the real problem is that we are gleaning whether the fault is to private
>> or shared memory from different things. Sometimes from fault->is_private,
>> sometimes the presence of the shared bits, and sometimes the role bit. I think
>> this is confusing, doubly so because we are using some of these things to infer
>> unrelated things (mirrored vs private).
> 
> It's confusing we don't check it in uniform way.
> 
> 
>> My guess is that you have noticed this and somehow zeroed in on the shared_mask.
>> I think we should straighten out the mirrored/private semantics and see what the
>> results look like. How does that sound to you?
> 
> I had closer look of the related code.  I think we can (mostly) uniformly use
> gpa/gfn without shared mask.  Here is the proposal.  We need a real patch to see
> how the outcome looks like anyway.  I think this is like what Kai is thinking
> about.
> 
> 
> - rename role.is_private => role.is_mirrored_pt
> 
> - sp->gfn: gfn without shared bit.

I think we can treat 'tdp_iter' and 'struct kvm_mmu_page' in the same 
way, because conceptually they both reflects the page table.

So I think both of them can have "gfn" or "raw_gfn", and "shared_gfn_mask".

Or have both "raw_gfn" or "gfn" but w/o "shared_gfn_mask". This may be 
more straightforward because we can just use them when needed w/o 
needing to play with gfn_shared_mask.

> 
> - fault->address: without gfn_shared_mask
>    Actually it doesn't matter much.  We can use gpa with gfn_shared_mask.

See above.  I think it makes sense to include the shared bit here.  It's 
trivial anyway though.

> 
> - Update struct tdp_iter
>    struct tdp_iter
>      gfn: gfn without shared bit

Or "raw_gfn"?

Which may be more straightforward because it can be just from:

	raw_gfn = gpa_to_gfn(fault->addr);
	gfn = fault->gfn;

	tdp_mmu_for_each_pte(..., raw_gfn, raw_gfn + 1, gfn)

Which is the reason to make fault->addr include the shared bit AFAICT.

> 
>      /* Add new members */
> 
>      /* Indicates which PT to walk. */
>      bool mirrored_pt;

I don't think you need this?  It's only used to select the root for page 
table walk.  Once it's done, we already have the @sptep to operate on.

And I think you can just get @mirrored_pt from the sptep:

	mirrored_pt = sptep_to_sp(sptep)->role.mirrored_pt;

Instead, I think we should keep the @is_private to indicate whether the 
GFN is private or not, which should be distinguished with 'mirrored_pt', 
which the root page table (and the @sptep) already reflects.

Of course if the @root/@...ep is mirrored_pt, the is_private should be 
always true, like:

	WARN_ON_ONCE(sptep_to_sp(sptep)->role.is_mirrored_pt
			&& !is_private);

Am I missing anything?

> 
>      // This is used tdp_iter_refresh_sptep()
>      // shared gfn_mask if mirrored_pt
>      // 0 if !mirrored_pt
>      gfn_shared_mask >
> - Pass mirrored_pt and gfn_shared_mask to
>    tdp_iter_start(..., mirrored_pt, gfn_shared_mask)

As mentioned above, I am not sure whether we need @mirrored_pt, because 
it already have the @root.  Instead we should pass is_private, which 
indicates the GFN is private.

> 
>    and update tdp_iter_refresh_sptep()
>    static void tdp_iter_refresh_sptep(struct tdp_iter *iter)
>          ...
>          iter->sptep = iter->pt_path[iter->level - 1] +
>                  SPTE_INDEX((iter->gfn << PAGE_SHIFT) | iter->gfn_shared_mask, iter->level);
> 
>    Change for_each_tdp_mte_min_level() accordingly.
>    Also the iteretor to call this.
>     
>    #define for_each_tdp_pte_min_level(kvm, iter, root, min_level, start, end)      \
>            for (tdp_iter_start(&iter, root, min_level, start,                      \
>                 mirrored_root, mirrored_root ? kvm_gfn_shared_mask(kvm) : 0);      \
>                 iter.valid && iter.gfn < kvm_gfn_for_root(kvm, root, end);         \
>                 tdp_iter_next(&iter))

See above.

> 
> - trace point: update to include mirroredd_pt. Or Leave it as is for now.
> 
> - pr_err() that log gfn in handle_changed_spte()
>    Update to include mirrored_pt. Or Leave it as is for now.
> 
> - Update spte handler (handle_changed_spte(), handle_removed_pt()...),
>    use iter->mirror_pt or pass down mirror_pt.
> 

IIUC only sp->role.is_mirrored_pt is needed, tdp_iter->is_mirrored_pt 
isn't necessary.  But when the @sp is created, we need to initialize 
whether it is mirrored_pt.

Am I missing anything?