Message-ID: <30be6d64-31bd-bfc8-72f7-fb57999e4566@linux.intel.com>
Date: Tue, 8 Nov 2022 21:41:44 +0800
From: Binbin Wu <binbin.wu@...ux.intel.com>
To: isaku.yamahata@...el.com, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Cc: isaku.yamahata@...il.com, Paolo Bonzini <pbonzini@...hat.com>,
erdemaktas@...gle.com, Sean Christopherson <seanjc@...gle.com>,
Sagi Shahar <sagis@...gle.com>,
David Matlack <dmatlack@...gle.com>,
Kai Huang <kai.huang@...el.com>
Subject: Re: [PATCH v10 049/108] KVM: x86/tdp_mmu: Support TDX private mapping
for TDP MMU
On 2022/10/30 14:22, isaku.yamahata@...el.com wrote:
> From: Isaku Yamahata<isaku.yamahata@...el.com>
>
> Allocate protected page table for private page table, and add hooks to
> operate on protected page table. This patch adds allocation/free of
> protected page tables and hooks. When calling hooks to update SPTE entry,
> freeze the entry, call hooks and unfree
s/unfree/unfreeze/
> the entry to allow concurrent
> updates on page tables. Which is the advantage of TDP MMU. As
> kvm_gfn_shared_mask() returns false always, those hooks aren't called yet
> with this patch.
>
> When the faulting GPA is private, the KVM fault is called private. When
> resolving private KVM,
private KVM fault?
> allocate protected page table and call hooks to
> operate on protected page table. On the change of the private PTE entry,
> invoke kvm_x86_ops hook in __handle_changed_spte() to propagate the change
> to protected page table. The following depicts the relationship.
>
> private KVM page fault |
> | |
> V |
> private GPA | CPU protected EPTP
> | | |
> V | V
> private PT root | protected PT root
> | | |
> V | V
> private PT --hook to propagate-->protected PT
> | | |
> \--------------------+------\ |
> | | |
> | V V
> | private guest page
> |
> |
> non-encrypted memory | encrypted memory
> |
> PT: page table
>
> The existing KVM TDP MMU code uses atomic update of SPTE. On populating
> the EPT entry, atomically set the entry. However, it requires TLB
> shootdown to zap SPTE. To address it, the entry is frozen with the special
> SPTE value that clears the present bit. After the TLB shootdown, the entry
> is set to the eventual value (unfreeze).
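(Side note for readers following along: the freeze/unfreeze flow
described above is roughly the below, in simplified pseudo-C. The
helper name is illustrative; REMOVED_SPTE is the special non-present
value the existing TDP MMU uses for frozen entries.)

    /* Sketch of the zap flow: freeze, flush, then unfreeze. */
    static int zap_spte_sketch(struct kvm *kvm, gfn_t gfn,
                               u64 *sptep, u64 old_spte)
    {
            /*
             * Freeze: atomically replace the live SPTE with the
             * special non-present value so concurrent updaters
             * back off.
             */
            if (!try_cmpxchg64(sptep, &old_spte, REMOVED_SPTE))
                    return -EBUSY;  /* lost a race, caller retries */

            /* The TLB shootdown happens while the entry is frozen. */
            kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);

            /* Unfreeze: install the eventual (here: zapped) value. */
            WRITE_ONCE(*sptep, 0);
            return 0;
    }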
>
> For protected page table, hooks are called to update protected page table
> in addition to direct access to the private SPTE. For the zapping case, it
> works to freeze the SPTE. It can call hooks in addition to TLB shootdown.
> For populating the private SPTE entry, there can be a race condition
> without further protection
>
> vcpu 1: populating 2M private SPTE
> vcpu 2: populating 4K private SPTE
> vcpu 2: TDX SEAMCALL to update 4K protected SPTE => error
> vcpu 1: TDX SEAMCALL to update 2M protected SPTE
>
> To avoid the race, the frozen SPTE is utilized. Instead of atomic update
> of the private entry, freeze the entry, call the hook that update protected
> SPTE, set the entry to the final value.
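(Again just to spell it out: the populate flow then becomes roughly
the below. The hook invocation is illustrative; the authoritative
signature is in this patch's kvm_host.h changes, which are snipped
here.)

    /* Sketch: freeze -> propagate to protected EPT -> unfreeze. */
    static int set_private_spte_sketch(struct kvm *kvm, gfn_t gfn,
                                       enum pg_level level, u64 *sptep,
                                       u64 old_spte, u64 new_spte,
                                       kvm_pfn_t pfn)
    {
            /*
             * Freeze first: the losing vcpu in the 2M vs 4K race
             * above fails this cmpxchg and retries the fault instead
             * of racing the SEAMCALL.
             */
            if (!try_cmpxchg64(sptep, &old_spte, REMOVED_SPTE))
                    return -EBUSY;

            /* Propagate the change to the protected page table. */
            static_call(kvm_x86_set_private_spte)(kvm, gfn, level, pfn);

            /* Unfreeze: publish the final private SPTE value. */
            WRITE_ONCE(*sptep, new_spte);
            return 0;
    }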
>
> Support 4K page only at this stage. 2M page support can be done in future
> patches.
>
> Co-developed-by: Kai Huang<kai.huang@...el.com>
> Signed-off-by: Kai Huang<kai.huang@...el.com>
> Signed-off-by: Isaku Yamahata<isaku.yamahata@...el.com>
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 5 +
> arch/x86/include/asm/kvm_host.h | 11 ++
> arch/x86/kvm/mmu/mmu.c | 15 +-
> arch/x86/kvm/mmu/mmu_internal.h | 32 ++++
> arch/x86/kvm/mmu/tdp_iter.h | 2 +-
> arch/x86/kvm/mmu/tdp_mmu.c | 244 +++++++++++++++++++++++++----
> arch/x86/kvm/mmu/tdp_mmu.h | 2 +-
> virt/kvm/kvm_main.c | 1 +
> 8 files changed, 280 insertions(+), 32 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index f28c9fd72ac4..1b01dc2098b0 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -94,6 +94,11 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
> KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
> KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
> KVM_X86_OP(load_mmu_pgd)
> +KVM_X86_OP_OPTIONAL(link_private_spt)
> +KVM_X86_OP_OPTIONAL(free_private_spt)
> +KVM_X86_OP_OPTIONAL(set_private_spte)
> +KVM_X86_OP_OPTIONAL(remove_private_spte)
> +KVM_X86_OP_OPTIONAL(zap_private_spte)
> KVM_X86_OP(has_wbinvd_exit)
> KVM_X86_OP(get_l2_tsc_offset)
> KVM_X86_OP(get_l2_tsc_multiplier)
>
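(For context, since the kvm_host.h hunk is snipped here: the
kvm_x86_ops members behind these five KVM_X86_OP_OPTIONAL() entries
look roughly like the below. Quoting from memory, so see the actual
hunk for the authoritative signatures.)

    int (*link_private_spt)(struct kvm *kvm, gfn_t gfn,
                            enum pg_level level, void *private_spt);
    int (*free_private_spt)(struct kvm *kvm, gfn_t gfn,
                            enum pg_level level, void *private_spt);
    int (*set_private_spte)(struct kvm *kvm, gfn_t gfn,
                            enum pg_level level, kvm_pfn_t pfn);
    int (*remove_private_spte)(struct kvm *kvm, gfn_t gfn,
                               enum pg_level level, kvm_pfn_t pfn);
    int (*zap_private_spte)(struct kvm *kvm, gfn_t gfn,
                            enum pg_level level);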
> @@ -509,9 +524,81 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> WARN_ON_ONCE(ret);
> }
>
> + if (is_private_sp(sp) &&
> + WARN_ON(static_call(kvm_x86_free_private_spt)(kvm, sp->gfn, sp->role.level,
> + kvm_mmu_private_spt(sp)))) {
> + /*
> + * Failed to unlink Secure EPT page and there is nothing to do
> + * further. Intentionally leak the page to prevent the kernel
> + * from accessing the encrypted page.
> + */
> + kvm_mmu_init_private_spt(sp, NULL);
Do you think it would be better to add some statistics for the
intentional leakage?
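Something as simple as the below would do (purely illustrative, the
stat name and placement are made up):

    /*
     * Hypothetical counter: pages deliberately leaked because the
     * Secure EPT page could not be unlinked/freed.
     */
    kvm->stat.private_spt_leaked++;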
> + }
> +
> call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
> }
>