linux-kernel - Re: [PATCH 1/2] KVM: Deliver VM fault signals to userspace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <YKxJLcg/WomPE422@google.com>
Date:   Tue, 25 May 2021 00:47:41 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     James Houghton <jthoughton@...gle.com>
Cc:     Mike Kravetz <mike.kravetz@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Shuah Khan <shuah@...nel.org>, Ben Gardon <bgardon@...gle.com>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, linux-kselftest@...r.kernel.org,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Jue Wang <juew@...gle.com>, Peter Xu <peterx@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [PATCH 1/2] KVM: Deliver VM fault signals to userspace

On Wed, May 19, 2021, James Houghton wrote:
> This patch has been written to support page-ins using userfaultfd's
> SIGBUS feature.  When a userfaultfd is created with UFFD_FEATURE_SIGBUS,
> `handle_userfault` will return VM_FAULT_SIGBUS instead of putting the
> calling thread to sleep. Normal (non-guest) threads that access memory
> that has been registered with a UFFD_FEATURE_SIGBUS userfaultfd receive
> a SIGBUS.
> 
> When a vCPU gets an EPT page fault in a userfaultfd-registered region,
> KVM calls into `handle_userfault` to resolve the page fault. With
> UFFD_FEATURE_SIGBUS, VM_FAULT_SIGBUS is returned, but a SIGBUS is never
> delivered to the userspace thread. This patch propagates the
> VM_FAULT_SIGBUS error up to KVM, where we then send the signal.
> 
> Upon receiving a VM_FAULT_SIGBUS, the KVM_RUN ioctl will exit to
> userspace. This functionality already exists.

I would strongly prefer to fix this in KVM by returning a KVM specific exit
reason (instead of -EFAULT), with additional information provided in vcpu->run,
e.g. address, etc...

VirtioFS has (had?) a similar problem with a file being truncated in the host
and the guest being killed as a result due to KVM returning -EFAULT without any
useful information[*].  That idea never got picked up, but I'm 99% certain the
solution would provide exactly the functionality you want.

[*] https://lkml.kernel.org/r/20200617230052.GB27751@linux.intel.com


Handling this purely in KVM would have several advantages:

  - No need to plumb @fault_error around mm/.  KVM might be able to fudge this
    anyways by looking for -EFAULT, but then it would mess up SIGBUS vs SIGSEGV.

  - KVM can provide more relevant information then the signal path, e.g. guest
    RIP and GPA.  This probably isn't useful for your use case, but for debug
    and other use cases it can be very helpful.

  - The error and its info are synchronous and delivered together (on exit to
    userspace), instead of being split across KVM and the signal handling.

  - This behavior needs to be opt-in to avoid breaking KVM's (awful) ABI, but we
    might be able to get away with squeezing the extra info into vcpu->run even
    if userspace doesn't opt-in (though that doesn't mean userspace will do
    anything with it).

  - I hate signal handling (ok, not a legitimate reason).

The big downside is that implementing the synchronous reporting would need to
either be done for every KVM architecture, or would need to touch every arch if
done generically.  I haven't looked at other architectures for this specific
issue, so I don't know which of those routes would be least awful.

A very incomplete patch for x86 would look something like:

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0144c40d09c7..2d4d32425c49 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2875,8 +2875,11 @@ static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *
        send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, PAGE_SHIFT, tsk);
 }
 
-static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
+static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, hva_t hva,
+                              kvm_pfn_t pfn)
 {
+       struct kvm_mem_fault_exit *fault = &vcpu->run->mem_fault;
+
        /*
         * Do not cache the mmio info caused by writing the readonly gfn
         * into the spte otherwise read access on readonly gfn also can
@@ -2886,25 +2889,32 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
                return RET_PF_EMULATE;
 
        if (pfn == KVM_PFN_ERR_HWPOISON) {
-               kvm_send_hwpoison_signal(kvm_vcpu_gfn_to_hva(vcpu, gfn), current);
+               kvm_send_hwpoison_signal(hva, current);
                return RET_PF_RETRY;
        }
 
+       fault->userspace_address = hva;
+       fault->guest_physical_address = gpa;
+       fault->guest_rip = kvm_rip_read(vcpu);
+
+       if (vcpu->kvm->arch.mem_fault_reporting_enabled)
+               return KVM_EXIT_MEM_FAULT;
+
        return -EFAULT;
 }
 
-static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
-                               kvm_pfn_t pfn, unsigned int access,
+static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gpa_t gpa,
+                               hva_t hva, kvm_pfn_t pfn, unsigned int access,
                                int *ret_val)
 {
        /* The pfn is invalid, report the error! */
        if (unlikely(is_error_pfn(pfn))) {
-               *ret_val = kvm_handle_bad_page(vcpu, gfn, pfn);
+               *ret_val = kvm_handle_bad_page(vcpu, gpa, hva, pfn);
                return true;
        }
 
        if (unlikely(is_noslot_pfn(pfn))) {
-               vcpu_cache_mmio_info(vcpu, gva, gfn,
+               vcpu_cache_mmio_info(vcpu, gva, gpa >> PAGE_SHIFT,
                                     access & shadow_mmio_access_mask);
                /*
                 * If MMIO caching is disabled, emulate immediately without
@@ -3746,7 +3756,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
                         write, &map_writable))
                return RET_PF_RETRY;
 
-       if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
+       if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gpa, hva, pfn, ACC_ALL, &r))
                return r;
 
        r = RET_PF_RETRY;