Message-ID: <CABgObfaoVMzEhu6O5HPe=GXH-bCkpTwSy8Ji0a1=je6f3eSqRQ@mail.gmail.com>
Date: Wed, 17 Apr 2024 23:31:49 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
isaku.yamahata@...el.com, xiaoyao.li@...el.com, binbin.wu@...ux.intel.com,
rick.p.edgecombe@...el.com
Subject: Re: [PATCH 5/7] KVM: x86/mmu: Introduce kvm_tdp_map_page() to
populate guest memory
On Wed, Apr 17, 2024 at 11:24 PM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Wed, Apr 17, 2024, Paolo Bonzini wrote:
> > From: Isaku Yamahata <isaku.yamahata@...el.com>
> >
> > Introduce a helper function to call the KVM fault handler. It allows a new
> > ioctl to invoke the KVM fault handler and populate guest memory without
> > exposing RET_PF_* enums or other definitions that are internal to the x86
> > KVM MMU. The implementation is restricted to two-dimensional paging for
> > simplicity; shadow paging faults on GVAs instead of L1 GPAs, which would
> > make the API difficult to use.
> >
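(Purely as an illustrative aside, not part of the patch: a new ioctl could
then drive the helper along these lines. prepopulate_one_page() is a
made-up name and the write error code is just an example.)

	/* Hypothetical caller: fault in the page containing @gpa. */
	static long prepopulate_one_page(struct kvm_vcpu *vcpu, gpa_t gpa)
	{
		u8 level;
		int r;

		/* The RET_PF_* handling stays internal to the helper. */
		r = kvm_tdp_map_page(vcpu, gpa, PFERR_WRITE_MASK, &level);
		if (r)
			return r;

		/* Bytes from @gpa to the end of the page that got mapped. */
		return KVM_HPAGE_SIZE(level) - (gpa & ~KVM_HPAGE_MASK(level));
	}
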
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@...el.com>
> > Message-ID: <9b866a0ae7147f96571c439e75429a03dcb659b6.1712785629.git.isaku.yamahata@...el.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
> > ---
> > arch/x86/kvm/mmu.h | 3 +++
> > arch/x86/kvm/mmu/mmu.c | 32 ++++++++++++++++++++++++++++++++
> > 2 files changed, 35 insertions(+)
> >
> > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > index e8b620a85627..51ff4f67e115 100644
> > --- a/arch/x86/kvm/mmu.h
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -183,6 +183,9 @@ static inline void kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
> > __kvm_mmu_refresh_passthrough_bits(vcpu, mmu);
> > }
> >
> > +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> > + u8 *level);
> > +
> > /*
> > * Check if a given access (described through the I/D, W/R and U/S bits of a
> > * page fault error code pfec) causes a permission fault with the given PTE
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 7fbcfc97edcc..fb2149d16f8d 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4646,6 +4646,38 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> > return direct_page_fault(vcpu, fault);
> > }
> >
> > +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> > + u8 *level)
>
> If the return is an overloaded "long", then there's no need for @level, ie. do
> the level=>size conversion in this helper.
>
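(For reference, an untested sketch of what that overloaded-long variant
could look like, assuming the caller passes in the number of bytes it
still wants mapped as @size:)

	long kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
			      u64 size)
	{
		u64 end;
		u8 level;
		int r;

		/* Restrict to TDP page fault. */
		if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
			return -EOPNOTSUPP;

		r = __kvm_mmu_do_page_fault(vcpu, gpa, error_code, true, NULL, &level);
		if (r < 0)
			return r;

		switch (r) {
		case RET_PF_RETRY:
			return -EAGAIN;
		case RET_PF_FIXED:
		case RET_PF_SPURIOUS:
			break;
		case RET_PF_EMULATE:
			return -EINVAL;
		default:
			WARN_ON_ONCE(r);
			return -EIO;
		}

		/* level => size: report how many bytes ended up mapped at @gpa. */
		end = (gpa & KVM_HPAGE_MASK(level)) + KVM_HPAGE_SIZE(level);
		return min(size, end - gpa);
	}
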
> > +{
> > + int r;
> > +
> > + /* Restrict to TDP page fault. */
>
> Do we want to restrict this to the TDP MMU? Not for any particular reason, mostly
> just to keep moving towards officially deprecating/removing TDP support from the
> shadow MMU.
Heh, yet another thing I briefly thought about while reviewing Isaku's
work. In the end I decided that, with the implementation being just a
regular prefault, there's not much to gain from keeping the shadow MMU
away from this.
The really ugly part is that if the memslots are zapped, the
pre-population effect basically goes away (damn
kvm_arch_flush_shadow_memslot). That is why I initially thought of
KVM_CHECK_EXTENSION on the VM file descriptor, to only allow this for
TDX VMs.
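(Roughly, with KVM_CAP_MAP_MEMORY as a stand-in name for whatever
capability the ioctl would advertise, the idea was that userspace checks
the VM file descriptor rather than /dev/kvm, so the answer can depend on
the VM type:)

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* KVM_CAP_MAP_MEMORY is only a placeholder name for this sketch. */
	static int vm_supports_prepopulation(int vm_fd)
	{
		/* VM-scoped check, so e.g. a TDX VM can answer differently. */
		return ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAP_MEMORY) > 0;
	}
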
The real solution for this is not to "graduate" this ioctl to kvm/next
too soon. Let's keep it in kvm-coco-queue until TDX is ready and then
make a final decision.
Paolo
> > + if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
> > + return -EOPNOTSUPP;
> > +
> > + r = __kvm_mmu_do_page_fault(vcpu, gpa, error_code, true, NULL, level);
> > + if (r < 0)
> > + return r;
> > +
> > + switch (r) {
> > + case RET_PF_RETRY:
> > + return -EAGAIN;
> > +
> > + case RET_PF_FIXED:
> > + case RET_PF_SPURIOUS:
> > + return 0;
>
> Going with the "long" idea, this becomes:
>
> end = (gpa & KVM_HPAGE_MASK(level)) + KVM_HPAGE_SIZE(level);
> return min(size, end - gpa);
>
> though I would vote for a:
>
> break;
>
> so that the happy path is nicely isolated at the end of the function.
>
> > +
> > + case RET_PF_EMULATE:
> > + return -EINVAL;
> > +
> > + case RET_PF_CONTINUE:
> > + case RET_PF_INVALID:
> > + default:
> > + WARN_ON_ONCE(r);
> > + return -EIO;
> > + }
> > +}
> > +
> > static void nonpaging_init_context(struct kvm_mmu *context)
> > {
> > context->page_fault = nonpaging_page_fault;
> > --
> > 2.43.0
> >
> >
>