Message-ID: <4ba18e4e-5971-4683-82eb-63c985e98e6b@intel.com>
Date: Thu, 16 May 2024 13:21:40 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: Isaku Yamahata <isaku.yamahata@...el.com>
CC: Sean Christopherson <seanjc@...gle.com>, Rick Edgecombe
<rick.p.edgecombe@...el.com>, <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <isaku.yamahata@...il.com>,
<erdemaktas@...gle.com>, <sagis@...gle.com>, <yan.y.zhao@...el.com>,
<dmatlack@...gle.com>, <isaku.yamahata@...ux.intel.com>
Subject: Re: [PATCH 08/16] KVM: x86/mmu: Bug the VM if kvm_zap_gfn_range() is
called for TDX
On 16/05/2024 12:15 pm, Isaku Yamahata wrote:
> On Thu, May 16, 2024 at 10:17:50AM +1200,
> "Huang, Kai" <kai.huang@...el.com> wrote:
>
>> On 16/05/2024 4:22 am, Isaku Yamahata wrote:
>>> On Wed, May 15, 2024 at 08:34:37AM -0700,
>>> Sean Christopherson <seanjc@...gle.com> wrote:
>>>
>>>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>>>>> index d5cf5b15a10e..808805b3478d 100644
>>>>> --- a/arch/x86/kvm/mmu/mmu.c
>>>>> +++ b/arch/x86/kvm/mmu/mmu.c
>>>>> @@ -6528,8 +6528,17 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>>>>> flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
>>>>> - if (tdp_mmu_enabled)
>>>>> + if (tdp_mmu_enabled) {
>>>>> + /*
>>>>> + * kvm_zap_gfn_range() is used when MTRR or PAT memory
>>>>> + * type was changed. TDX can't handle zapping the private
>>>>> + * mapping, but it's ok because KVM doesn't support either of
>>>>> + * those features for TDX. In case a new caller appears, BUG
>>>>> + * the VM if it's called for solutions with private aliases.
>>>>> + */
>>>>> + KVM_BUG_ON(kvm_gfn_shared_mask(kvm), kvm);
>>>>
>>>> Please stop using kvm_gfn_shared_mask() as a proxy for "is this TDX". Using a
>>>> generic name quite obviously doesn't prevent TDX details from bleeding into common
>>>> code, and dancing around things just makes it all unnecessarily confusing.
>>>>
>>>> If we can't avoid bleeding TDX details into common code, my vote is to bite the
>>>> bullet and simply check vm_type.
>>>
>>> TDX has several aspects related to the TDP MMU.
>>> 1) Based on the faulting GPA, determine which KVM page table to walk.
>>> (private-vs-shared)
>>> 2) Need to call TDX SEAMCALL to operate on Secure-EPT instead of direct memory
>>> load/store. TDP MMU needs hooks for it.
>>> 3) The tables must be zapped from the leaf, not from the root or the middle.
>>>
>>> For 1) and 2), what about something like this? TDX backend code will set
>>> kvm->arch.has_mirrored_pt = true; I think we will use kvm_gfn_shared_mask() only
>>> for address conversion (shared<->private).
>>>
>>> For 1), maybe we can add struct kvm_page_fault.walk_mirrored_pt
>>> (or whatever preferable name)?
>>>
>>> For 3), a memslot flag handles it.
>>>
>>> ---
>>> arch/x86/include/asm/kvm_host.h | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index aabf1648a56a..218b575d24bd 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -1289,6 +1289,7 @@ struct kvm_arch {
>>> u8 vm_type;
>>> bool has_private_mem;
>>> bool has_protected_state;
>>> + bool has_mirrored_pt;
>>> struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
>>> struct list_head active_mmu_pages;
>>> struct list_head zapped_obsolete_pages;
>>> @@ -2171,8 +2172,10 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>>> #ifdef CONFIG_KVM_PRIVATE_MEM
>>> #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
>>> +#define kvm_arch_has_mirrored_pt(kvm) ((kvm)->arch.has_mirrored_pt)
>>> #else
>>> #define kvm_arch_has_private_mem(kvm) false
>>> +#define kvm_arch_has_mirrored_pt(kvm) false
>>> #endif
>>> static inline u16 kvm_read_ldt(void)
>>
>> I think this 'has_mirrored_pt' (or a better name) is better, because it
>> clearly conveys that it describes the page table itself, not the actual page
>> that a page table entry maps to.
>>
>> AFAICT we need to split the concept of "private page table itself" and the
>> "memory type of the actual GFN".
>>
>> E.g., both SEV-SNP and TDX have the concept of "private memory" (obviously), but
>> I was told only TDX uses a dedicated private page table which isn't directly
>> accessible to KVM. SEV-SNP on the other hand just uses a normal page table plus
>> an additional HW-managed table to ensure security.
>
> kvm_mmu_page_role.is_private is not a good name now. Probably is_mirrored_pt or
> need_callback or whatever makes sense.
>
>
>> In other words, I think we should decide whether to invoke TDP MMU callback
>> for private mapping (the page table itself may just be normal one) depending
>> on the fault->is_private, but not whether the page table is private:
>>
>> if (fault->is_private && kvm_x86_ops->set_private_spte)
>> kvm_x86_set_private_spte(...);
>> else
>> tdp_mmu_set_spte_atomic(...);
>
> This doesn't work for two reasons.
>
> - We need to pass down struct kvm_page_fault fault deep only for this.
> We could change the code in such a way.
>
> - We don't have struct kvm_page_fault fault for zapping case.
> We could create a dummy one and pass it around.
For both of the above, we don't necessarily need the whole 'kvm_page_fault'; we
just need:
1) The GFN
2) Whether it is private (points to private memory, to be precise)
3) Whether to use a separate private page table
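
To make the three items above concrete, here is a rough, standalone sketch.
Every name in it (spte_target, make_spte_target, wants_private_callback) is
made up for illustration and is not an existing KVM symbol. It assumes the
"use a separate private page table" bit is derived from a per-VM flag plus
the privateness of the GFN, and that the callback decision is keyed on the
mapping being private rather than on which table is walked:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical: the minimum a set/zap helper needs, instead of kvm_page_fault. */
struct spte_target {
	gfn_t gfn;            /* 1) the GFN */
	bool is_private;      /* 2) the GFN points to private memory */
	bool use_mirrored_pt; /* 3) walk the separate private page table */
};

/*
 * Build the context from plain values, so a zapping path (which has no
 * fault) can construct one just as easily as the fault path.
 */
static struct spte_target make_spte_target(gfn_t gfn, bool is_private,
					   bool vm_has_mirrored_pt)
{
	struct spte_target t = {
		.gfn = gfn,
		.is_private = is_private,
		.use_mirrored_pt = vm_has_mirrored_pt && is_private,
	};

	/* A separate private table is only ever selected for a private GFN. */
	assert(!t.use_mirrored_pt || t.is_private);
	return t;
}

/* Assumption: the vendor callback depends on the mapping, not on the table. */
static bool wants_private_callback(const struct spte_target *t,
				   bool vendor_has_hook)
{
	return t->is_private && vendor_has_hook;
}
```

With this shape, a VM without a separate private page table still gets
is_private == true for a private GFN, so the callback decision does not
depend on how the page table itself is implemented.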
>
> Essentially the issue is how to pass down is_private or stash the info
> somewhere or determine it somehow. Options I think of are
>
> - Pass around fault:
> Con: fault isn't passed down
> Con: Create fake fault for zapping case
>
> - Stash it in struct tdp_iter and pass around iter:
> Pro: work for zapping case
> Con: we need to change the code to pass down tdp_iter
>
> - Pass around is_private (or mirrored_pt or whatever):
> Pro: Don't need to add member to some structure
> Con: We need to pass it around still.
>
> - Stash it in kvm_mmu_page:
> The patch series uses kvm_mmu_page.role.
> Pro: We don't need to pass around because we know struct kvm_mmu_page
> Con: Need to twist root page allocation
I don't think using kvm_mmu_page.role is correct.
If kvm_mmu_page.role is private, we can definitely assume the faulting
address is private; but otherwise the address can be either private or shared.
>
> - Use gfn. kvm_is_private_gfn(kvm, gfn):
> Con: The use of gfn is confusing. It's too TDX specific.
>
>
>> And the 'has_mirrored_pt' should be only used to select the root of the page
>> table that we want to operate on.
>
> We can add one more bool to struct kvm_page_fault.follow_mirrored_pt or
> something to represent it. We can initialize it in __kvm_mmu_do_page_fault().
>
> .follow_mirrored_pt = kvm->arch.has_mirrored_pt && kvm_is_private_gpa(gpa);
>
>
>> This also leaves room so that if anything special needs to be done
>> for pages allocated for the "non-leaf" middle page tables for SEV-SNP, it can
>> just fit.
>
> Can you please elaborate on this?
I meant SEV-SNP may have its own version of link_private_spt().
I haven't looked into it, and it may not be needed from the hardware's
perspective, but providing such a chance certainly doesn't hurt and is
more flexible IMHO.
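
As a rough illustration of what "providing such a chance" could look like,
here is a standalone sketch of an optional per-vendor hook. Only the name
link_private_spt comes from this thread; the ops struct, its signature, and
the helpers are assumptions for illustration and do not reflect the actual
kvm_x86_ops layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vendor hook table; the signature is an assumption. */
struct vendor_mmu_ops {
	/* Optional: invoked when a page table page is linked for a private mapping. */
	int (*link_private_spt)(unsigned long gfn, int level);
};

/* Counts hook invocations so the demo can observe which path ran. */
static int linked_calls;

static int demo_link_private_spt(unsigned long gfn, int level)
{
	(void)gfn;
	(void)level;
	linked_calls++;
	return 0;
}

/*
 * Common-code helper: call the vendor hook only for private mappings and
 * only if the vendor provides one; otherwise nothing extra is needed.
 */
static int link_spt(const struct vendor_mmu_ops *ops, unsigned long gfn,
		    int level, bool is_private)
{
	assert(ops != NULL);
	if (is_private && ops->link_private_spt)
		return ops->link_private_spt(gfn, level);
	return 0;
}

/* One vendor that implements the hook, one that leaves it unset. */
static const struct vendor_mmu_ops ops_with_hook = {
	.link_private_spt = demo_link_private_spt,
};
static const struct vendor_mmu_ops ops_without_hook = {
	.link_private_spt = NULL,
};
```

A vendor that needs nothing special simply leaves the pointer NULL, and the
common code does no extra work for it.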