linux-kernel - Re: [PATCH v2 0/3] kvm/mm: Allow GUP to respond to non fatal signals

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YvQJO7qK2e5WU1Eg@xz-m1.local>
Date:   Wed, 10 Aug 2022 15:38:35 -0400
From:   Peter Xu <peterx@...hat.com>
To:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc:     David Hildenbrand <david@...hat.com>,
        "Dr . David Alan Gilbert" <dgilbert@...hat.com>,
        John Hubbard <jhubbard@...dia.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Linux MM Mailing List <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [PATCH v2 0/3] kvm/mm: Allow GUP to respond to non fatal signals

Any further comments?  Thanks,

On Wed, Jul 20, 2022 at 08:03:15PM -0400, Peter Xu wrote:
> v2:
> - Added r-b
> - Rewrite the comment in faultin_page() for FOLL_INTERRUPTIBLE [John]
> - Dropped the controversial patch to introduce a flag for
>   __gfn_to_pfn_memslot(), instead used a boolean for now [Sean]
> - Rename s/is_sigpending_pfn/KVM_PFN_ERR_SIGPENDING/ [Sean]
> - Change comment in kvm_faultin_pfn() mentioning fatal signals [Sean]
> 
> rfc: https://lore.kernel.org/kvm/20220617014147.7299-1-peterx@redhat.com
> v1:  https://lore.kernel.org/kvm/20220622213656.81546-1-peterx@redhat.com
> 
> One issue was reported that libvirt won't be able to stop the virtual
> machine using QMP command "stop" during a paused postcopy migration [1].
> 
> It won't work because "stop the VM" operation requires the hypervisor to
> kick all the vcpu threads out using SIG_IPI in QEMU (which is translated to
> a SIGUSR1).  However since during a paused postcopy, the vcpu threads are
> hang death at handle_userfault() so there're simply not responding to the
> kicks.  Further, the "stop" command will further hang the QMP channel.
> 
> The mm has facility to process generic signal (FAULT_FLAG_INTERRUPTIBLE),
> however it's only used in the PF handlers only, not in GUP. Unluckily, KVM
> is a heavy GUP user on guest page faults.  It means we won't be able to
> interrupt a long page fault for KVM fetching guest pages with what we have
> right now.
> 
> I think it's reasonable for GUP to only listen to fatal signals, as most of
> the GUP users are not really ready to handle such case.  But actually KVM
> is not such an user, and KVM actually has rich infrastructure to handle
> even generic signals, and properly deliver the signal to the userspace.
> Then the page fault can be retried in the next KVM_RUN.
> 
> This patchset added FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE,
> and let KVM be the first one to use it.  KVM and mm/gup can always be able
> to respond to fatal signals, but not non-fatal ones until this patchset.
> 
> One thing to mention is that this is not allowing all KVM paths to be able
> to respond to non fatal signals, but only on x86 slow page faults.  In the
> future when more code is ready for handling signal interruptions, we can
> explore possibility to have more gup callers using FOLL_INTERRUPTIBLE.
> 
> Tests
> =====
> 
> I created a postcopy environment, pause the migration by shutting down the
> network to emulate a network failure (so the handle_userfault() will stuck
> for a long time), then I tried three things:
> 
>   (1) Sending QMP command "stop" to QEMU monitor,
>   (2) Hitting Ctrl-C from QEMU cmdline,
>   (3) GDB attach to the dest QEMU process.
> 
> Before this patchset, all three use case hang.  After the patchset, all
> work just like when there's not network failure at all.
> 
> Please have a look, thanks.
> 
> [1] https://gitlab.com/qemu-project/qemu/-/issues/1052

-- 
Peter Xu