linux-kernel - Re: [PATCH 03/22] KVM: x86/mmu: Trigger unprotect logic only on write-protection page faults

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Zr0GtPKepCbBgjWW@google.com>
Date: Wed, 14 Aug 2024 12:34:12 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Peter Gonda <pgonda@...gle.com>, Michael Roth <michael.roth@....com>, 
	Vishal Annapurve <vannapurve@...gle.com>, Ackerly Tng <ackerleytng@...gle.com>
Subject: Re: [PATCH 03/22] KVM: x86/mmu: Trigger unprotect logic only on
 write-protection page faults

On Wed, Aug 14, 2024, Paolo Bonzini wrote:
> On 8/9/24 21:03, Sean Christopherson wrote:
> > Trigger KVM's various "unprotect gfn" paths if and only if the page fault
> > was a write to a write-protected gfn.  To do so, add a new page fault
> > return code, RET_PF_WRITE_PROTECTED, to explicitly and precisely track
> > such page faults.
> > 
> > If a page fault requires emulation for any MMIO (or any reason besides
> > write-protection), trying to unprotect the gfn is pointless and risks
> > putting the vCPU into an infinite loop.  E.g. KVM will put the vCPU into
> > an infinite loop if the vCPU manages to trigger MMIO on a page table walk.
> > 
> > Fixes: 147277540bbc ("kvm: svm: Add support for additional SVM NPF error codes")
> > Cc: stable@...r.kernel.org
> 
> Do we really want Cc: stable@ for all these patches?  Most of them are of
> the "if it hurts, don't do it" kind;

True.  I was thinking that the VMX PFERR_GUEST_{FINAL,PAGE}_MASK bug in particular
was stable-worthy, but until TDX comes along, it's only relevant if guests puts
PDPTRs in an MMIO region.  And in that case, the guest is likely hosed anyway,
the only difference is if it gets stuck or killed.

I'll drop the stable@ tags unless someone objects.

> as long as there are no infinite loops in a non-killable region, I prefer not
> to complicate our lives with cherry picks of unknown quality.

Yeah, the RET_PF_WRITE_PROTECTED one in particular has high potential for a bad
cherry-pick.

> That said, this patch could be interesting for 6.11 because of the effect on
> prefaulting (see below).
> 
> > Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> > ---
> >   arch/x86/kvm/mmu/mmu.c          | 78 +++++++++++++++++++--------------
> >   arch/x86/kvm/mmu/mmu_internal.h |  3 ++
> >   arch/x86/kvm/mmu/mmutrace.h     |  1 +
> >   arch/x86/kvm/mmu/paging_tmpl.h  |  2 +-
> >   arch/x86/kvm/mmu/tdp_mmu.c      |  6 +--
> >   5 files changed, 53 insertions(+), 37 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 901be9e420a4..e3aa04c498ea 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2914,10 +2914,8 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
> >   		trace_kvm_mmu_set_spte(level, gfn, sptep);
> >   	}
> > -	if (wrprot) {
> > -		if (write_fault)
> > -			ret = RET_PF_EMULATE;
> > -	}
> > +	if (wrprot && write_fault)
> > +		ret = RET_PF_WRITE_PROTECTED;
> >   	if (flush)
> >   		kvm_flush_remote_tlbs_gfn(vcpu->kvm, gfn, level);
> > @@ -4549,7 +4547,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
> >   		return RET_PF_RETRY;
> >   	if (page_fault_handle_page_track(vcpu, fault))
> > -		return RET_PF_EMULATE;
> > +		return RET_PF_WRITE_PROTECTED;
> >   	r = fast_page_fault(vcpu, fault);
> >   	if (r != RET_PF_INVALID)
> > @@ -4642,7 +4640,7 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
> >   	int r;
> >   	if (page_fault_handle_page_track(vcpu, fault))
> > -		return RET_PF_EMULATE;
> > +		return RET_PF_WRITE_PROTECTED;
> >   	r = fast_page_fault(vcpu, fault);
> >   	if (r != RET_PF_INVALID)
> > @@ -4726,6 +4724,9 @@ static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> >   	case RET_PF_EMULATE:
> >   		return -ENOENT;
> > +	case RET_PF_WRITE_PROTECTED:
> > +		return -EPERM;
> 
> Shouldn't this be a "return 0"?  Even if kvm_mmu_do_page_fault() cannot
> fully unprotect the page, it was nevertheless prefaulted as much as
> possible.

Hmm, I hadn't thought about it from that perspective.  Ah, right, and the early
check in page_fault_handle_page_track() only handles PRESENT faults, so KVM will
at least install a read-only mapping.

So yeah, agreed this should return 0.