Date: Thu, 16 May 2024 14:07:36 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>, "kvm@...r.kernel.org"
	<kvm@...r.kernel.org>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
	"seanjc@...gle.com" <seanjc@...gle.com>
CC: "sagis@...gle.com" <sagis@...gle.com>, "dmatlack@...gle.com"
	<dmatlack@...gle.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "isaku.yamahata@...il.com"
	<isaku.yamahata@...il.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>, "Aktas,
 Erdem" <erdemaktas@...gle.com>
Subject: Re: [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for
 TDP MMU


>>> @@ -470,6 +470,7 @@ struct kvm_mmu {
>>>          int (*sync_spte)(struct kvm_vcpu *vcpu,
>>>                           struct kvm_mmu_page *sp, int i);
>>>          struct kvm_mmu_root_info root;
>>> +       hpa_t private_root_hpa;
>>
>> Should we have
>>
>>          struct kvm_mmu_root_info private_root;
>>
>> instead?
> 
> This corresponds to:
> mmu->root.hpa
> 
> We don't need the other fields, so I think it's better not to take the
> space.  It does look asymmetric though...

Being symmetric is why I asked.  Anyway no strong opinion.

[...]

>>>    
>>> @@ -4685,7 +4687,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct
>>> kvm_page_fault *fault)
>>>          if (kvm_mmu_honors_guest_mtrrs(vcpu->kvm)) {
>>>                  for ( ; fault->max_level > PG_LEVEL_4K; --fault->max_level)
>>> {
>>>                          int page_num = KVM_PAGES_PER_HPAGE(fault-
>>>> max_level);
>>> -                       gfn_t base = gfn_round_for_level(fault->gfn,
>>> +                       gfn_t base = gfn_round_for_level(gpa_to_gfn(fault-
>>>> addr),
>>>                                                           fault->max_level);
>>
>> I thought that by the time we reach here, the shared bit has already been
>> stripped away by the caller?
> 
> We don't support MTRRs, so this code won't be executed for TDX, but it's not
> clear what you are asking.
> fault->addr has the shared bit (if present)
> fault->gfn has it stripped.

When I was looking at the code, I thought fault->gfn still had the shared 
bit and that gpa_to_gfn() internally strips away the shared bit, but 
sorry, that is not true.

My question is: why do we even need this change?  Shouldn't we pass the 
actual GFN (which doesn't have the shared bit) to 
kvm_mtrr_check_gfn_range_consistency()?

If so, it looks like we should use fault->gfn to get the base?

> 
>>
>> It doesn't make a lot of sense to still have it here, given we have a
>> universal KVM-defined PFERR_PRIVATE_ACCESS flag:
>>
>> https://lore.kernel.org/kvm/20240507155817.3951344-2-pbonzini@redhat.com/T/#mb30987f31b431771b42dfa64dcaa2efbc10ada5e
>>
>> IMHO we should just strip the shared bit in the TDX variant of
>> handle_ept_violation(), and pass PFERR_PRIVATE_ACCESS (when the GPA
>> doesn't have the shared bit) to the common fault handler so it can
>> correctly set fault->is_private to true.
> 
> I'm not sure what you are seeing here, could you elaborate?

See reply below.

[...]

>>
>> Anyway, from common code's perspective, we need some clarification of
>> why it is designed to be done here.
>>
>>>          free_mmu_pages(&vcpu->arch.root_mmu);
>>>          free_mmu_pages(&vcpu->arch.guest_mmu);
>>>          mmu_free_memory_caches(vcpu);
>>> diff --git a/arch/x86/kvm/mmu/mmu_internal.h
>>> b/arch/x86/kvm/mmu/mmu_internal.h
>>> index 0f1a9d733d9e..3a7fe9261e23 100644
>>> --- a/arch/x86/kvm/mmu/mmu_internal.h
>>> +++ b/arch/x86/kvm/mmu/mmu_internal.h
>>> @@ -6,6 +6,8 @@
>>>    #include <linux/kvm_host.h>
>>>    #include <asm/kvm_host.h>
>>>    
>>> +#include "mmu.h"
>>> +
>>>    #ifdef CONFIG_KVM_PROVE_MMU
>>>    #define KVM_MMU_WARN_ON(x) WARN_ON_ONCE(x)
>>>    #else
>>> @@ -178,6 +180,16 @@ static inline void kvm_mmu_alloc_private_spt(struct
>>> kvm_vcpu *vcpu, struct kvm_m
>>>          sp->private_spt = kvm_mmu_memory_cache_alloc(&vcpu-
>>>> arch.mmu_private_spt_cache);
>>>    }
>>>    
>>> +static inline gfn_t kvm_gfn_for_root(struct kvm *kvm, struct kvm_mmu_page
>>> *root,
>>> +                                    gfn_t gfn)
>>> +{
>>> +       gfn_t gfn_for_root = kvm_gfn_to_private(kvm, gfn);
>>> +
>>> +       /* Set shared bit if not private */
>>> +       gfn_for_root |= -(gfn_t)!is_private_sp(root) &
>>> kvm_gfn_shared_mask(kvm);
>>> +       return gfn_for_root;
>>> +}
>>> +
>>>    static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page
>>> *sp)
>>>    {
>>>          /*
>>> @@ -348,7 +360,12 @@ static inline int __kvm_mmu_do_page_fault(struct
>>> kvm_vcpu *vcpu, gpa_t cr2_or_gp
>>>          int r;
>>>    
>>>          if (vcpu->arch.mmu->root_role.direct) {
>>> -               fault.gfn = fault.addr >> PAGE_SHIFT;
>>> +               /*
>>> +                * Things like memslots don't understand the concept of a
>>> shared
>>> +                * bit. Strip it so that the GFN can be used like normal,
>>> and the
>>> +                * fault.addr can be used when the shared bit is needed.
>>> +                */
>>> +               fault.gfn = gpa_to_gfn(fault.addr) &
>>> ~kvm_gfn_shared_mask(vcpu->kvm);
>>>                  fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
>>
>> Again, I don't think it's necessary for fault.gfn to still have the shared
>> bit here?
> 
> It's getting stripped as it's set for the first time... What do you mean still
> have it?

Sorry, I meant fault->addr.

> 
>>
>> This kinda usage is pretty much the reason I want to get rid of
>> kvm_gfn_shared_mask().
> 
> I think you want to move it to an x86_op, right?  Not get rid of the concept
> of a shared bit?  I think KVM will have a hard time doing TDX without knowing
> the shared bit's location.
> 
> Or maybe you are saying you think it should be stripped earlier and live as a PF
> error code?

I meant it seems we should just strip the shared bit away from the GPA in 
handle_ept_violation() and pass it as 'cr2_or_gpa' here, so fault->addr 
won't have the shared bit.

Do you see any problem with doing so?

> 
>>
>>>          }
>>>    
>>> diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
>>> index fae559559a80..8a64bcef9deb 100644
>>> --- a/arch/x86/kvm/mmu/tdp_iter.h
>>> +++ b/arch/x86/kvm/mmu/tdp_iter.h
>>> @@ -91,7 +91,7 @@ struct tdp_iter {
>>>          tdp_ptep_t pt_path[PT64_ROOT_MAX_LEVEL];
>>>          /* A pointer to the current SPTE */
>>>          tdp_ptep_t sptep;
>>> -       /* The lowest GFN mapped by the current SPTE */
>>> +       /* The lowest GFN (shared bits included) mapped by the current SPTE
>>> */
>>>          gfn_t gfn;
>>
>> IMHO we need more clarification of this design.
> 
> Have you seen the documentation patch? Where do you think it should be? You mean
> in the tdp_iter struct?

My thinking:

The changelog should clarify why the shared bit is included in 'gfn' in 
tdp_iter.

And here, around 'gfn', we can have a simple sentence to explain why the 
shared bit is included.


