Message-ID: <3e6018ad-06b7-f465-7b74-aeb20bca9489@redhat.com>
Date: Wed, 20 Apr 2022 19:15:04 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Lai Jiangshan <jiangshanlai@...il.com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
Sean Christopherson <seanjc@...gle.com>
Cc: Lai Jiangshan <jiangshan.ljs@...group.com>,
Jonathan Corbet <corbet@....net>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>, linux-doc@...r.kernel.org
Subject: Re: [PATCH 2/2] KVM: X86/MMU: Introduce role.passthrough for
shadowing 5-level NPT for 4-level NPT L1 guest
On 4/20/22 15:12, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@...group.com>
>
> When shadowing 5-level NPT for 4-level NPT L1 guest, the root_sp is
> allocated with role.level = 5 and the guest pagetable's root gfn.
>
> And root_sp->spt[0] is also allocated with the same gfn and the same
> role except that role.level = 4. Luckily, they are different shadow
> pages, but only root_sp->spt[0] is the real translation of the guest
> pagetable.
>
> Here comes a problem:
>
> If the guest switches from gCR4_LA57=0 to gCR4_LA57=1 (or vice versa)
> and uses the same gfn as the root page for nested NPT before and after
> the switch, the host (hCR4_LA57=1) might keep using the same root_sp
> for the guest even though the guest has toggled gCR4_LA57. The guest
> will then see unexpected pages mapped, and L2 may exploit the bug to
> hurt L1. Luckily, the problem can't hurt L0.
>
> Three special cases need to be handled:
>
> The root_sp should sometimes behave as if role.direct=1: its contents
> are not backed by gptes, and root_sp->gfns is meaningless. (For a
> normal high-level sp in shadow paging, sp->gfns is often unused and
> kept zero, but it could be relevant and meaningful if it were used,
> because those entries are backed by concrete gptes.)
>
> For such a root_sp, root_sp is just a portal to contribute
> root_sp->spt[0]; root_sp->gfns should not be used, and root_sp->spt[0]
> should not be dropped if gpte[0] of the guest root pagetable is
> changed.
>
> Such a root_sp should not be accounted either.
>
> So add role.passthrough to distinguish the shadow pages in the hash
> when gCR4_LA57 is toggled, and fix the above special cases by using it
> in kvm_mmu_page_{get|set}_gfn() and sp_has_gptes().
>
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@...group.com>
Looks good, and is easy to backport; thanks.
However, once we have also your series "KVM: X86/MMU: Use one-off
special shadow page for special roots", perhaps the passthrough sp can
be converted to a special root (but now with a shadow_page) as well?
Paolo
> ---
> Documentation/virt/kvm/mmu.rst | 3 +++
> arch/x86/include/asm/kvm_host.h | 5 +++--
> arch/x86/kvm/mmu/mmu.c | 16 ++++++++++++++++
> arch/x86/kvm/mmu/paging_tmpl.h | 1 +
> 4 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/mmu.rst b/Documentation/virt/kvm/mmu.rst
> index 5b1ebad24c77..4018b9d7a0d3 100644
> --- a/Documentation/virt/kvm/mmu.rst
> +++ b/Documentation/virt/kvm/mmu.rst
> @@ -202,6 +202,9 @@ Shadow pages contain the following information:
> Is 1 if the MMU instance cannot use A/D bits. EPT did not have A/D
> bits before Haswell; shadow EPT page tables also cannot use A/D bits
> if the L1 hypervisor does not enable them.
> + role.passthrough:
> + Is 1 if role.level = 5 when shadowing 5-level shadow NPT for
> + 4-level NPT L1.
> gfn:
> Either the guest page table containing the translations shadowed by this
> page, or the base page frame for linear translations. See role.direct.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9694dd5e6ccc..d4f8f4784d87 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -285,7 +285,7 @@ struct kvm_kernel_irq_routing_entry;
> * minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
> * 2 bytes per gfn instead of 4 bytes per gfn.
> *
> - * Indirect upper-level shadow pages are tracked for write-protection via
> + * Upper-level shadow pages having gptes are tracked for write-protection via
> * gfn_track. As above, gfn_track is a 16 bit counter, so KVM must not create
> * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
> * gfn_track will overflow and explosions will ensure.
> @@ -331,7 +331,8 @@ union kvm_mmu_page_role {
> unsigned smap_andnot_wp:1;
> unsigned ad_disabled:1;
> unsigned guest_mode:1;
> - unsigned :6;
> + unsigned passthrough:1;
> + unsigned :5;
>
> /*
> * This is left at the top of the word so that
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 1bdff55218ef..d14cb6f99cb1 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -737,6 +737,9 @@ static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
>
> static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
> {
> + if (sp->role.passthrough)
> + return sp->gfn;
> +
> if (!sp->role.direct)
> return sp->gfns[index];
>
> @@ -745,6 +748,11 @@ static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
>
> static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn)
> {
> + if (sp->role.passthrough) {
> + WARN_ON_ONCE(gfn != sp->gfn);
> + return;
> + }
> +
> if (!sp->role.direct) {
> sp->gfns[index] = gfn;
> return;
> @@ -1861,6 +1869,9 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp)
> if (sp->role.direct)
> return false;
>
> + if (sp->role.passthrough)
> + return false;
> +
> return true;
> }
>
> @@ -2059,6 +2070,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
> role.quadrant = quadrant;
> }
> + if (level <= vcpu->arch.mmu->root_level)
> + role.passthrough = 0;
>
> sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
> for_each_valid_sp(vcpu->kvm, sp, sp_list) {
> @@ -4890,6 +4903,9 @@ kvm_calc_shadow_npt_root_page_role(struct kvm_vcpu *vcpu,
>
> role.base.direct = false;
> role.base.level = kvm_mmu_get_tdp_level(vcpu);
> + if (role.base.level == PT64_ROOT_5LEVEL &&
> + role_regs_to_root_level(regs) == PT64_ROOT_4LEVEL)
> + role.base.passthrough = 1;
>
> return role;
> }
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 8621188b46df..c1b975fb85a2 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -1042,6 +1042,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> .level = 0xf,
> .access = 0x7,
> .quadrant = 0x3,
> + .passthrough = 0x1,
> };
>
> /*