linux-kernel - Re: [PATCH 08/16] KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZeXt6A_w4etYCYP7@google.com>
Date: Mon, 4 Mar 2024 07:51:04 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Kai Huang <kai.huang@...el.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Yan Zhao <yan.y.zhao@...el.com>, Isaku Yamahata <isaku.yamahata@...el.com>, 
	Michael Roth <michael.roth@....com>, Yu Zhang <yu.c.zhang@...ux.intel.com>, 
	Chao Peng <chao.p.peng@...ux.intel.com>, Fuad Tabba <tabba@...gle.com>, 
	David Matlack <dmatlack@...gle.com>
Subject: Re: [PATCH 08/16] KVM: x86/mmu: WARN and skip MMIO cache on private,
 reserved page faults

On Fri, Mar 01, 2024, Kai Huang wrote:
> On 1/03/2024 12:06 pm, Sean Christopherson wrote:
> > E.g. in this case, KVM will just skip various fast paths because of the RSVD flag,
> > and treat the fault like a PRIVATE fault.  Hmm, but page_fault_handle_page_track()
> > would skip write tracking, which could theoretically cause data corruption, so I
> > guess arguably it would be safer to bail?
> > 
> > Anyone else have an opinion?  This type of bug should never escape development,
> > so I'm a-ok effectively killing the VM.  Unless someone has a good argument for
> > continuing on, I'll go with Kai's suggestion and squash this:
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index cedacb1b89c5..d796a162b2da 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5892,8 +5892,10 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
> >                  error_code |= PFERR_PRIVATE_ACCESS;
> >          r = RET_PF_INVALID;
> > -       if (unlikely((error_code & PFERR_RSVD_MASK) &&
> > -                    !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) {
> > +       if (unlikely(error_code & PFERR_RSVD_MASK)) {
> > +               if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
> > +                       return -EFAULT;
> 
> -EFAULT is part of guest_memfd() memory fault ABI.  I didn't think over this
> thoroughly but do you want to return -EFAULT here?

Yes, I/we do.  There are many existing paths that can return -EFAULT from KVM_RUN
without setting run->exit_reason to KVM_EXIT_MEMORY_FAULT.  Userspace is responsible
for checking run->exit_reason on -EFAULT (and -EHWPOISON), i.e. must be prepared
to handle a "bare" -EFAULT, where for all intents and purposes "handle" means
"terminate the guest".

That's actually one of the reasons why KVM_EXIT_MEMORY_FAULT exists, it'd require
an absurd amount of work and churn in KVM to *safely* return useful information
on *all* -EFAULTs.  FWIW, I had hopes and dreams of actually doing exactly this,
but have long since abandoned those dreams.

In other words, KVM_EXIT_MEMORY_FAULT essentially communicates to userspace that
(a) userspace can likely fix whatever badness triggered the -EFAULT, and (b) that
KVM is in a state where fixing the underlying problem and resuming the guest is
safe, e.g. won't corrupt the guest (because KVM is in a half-baked state).