linux-kernel - Re: [PATCH] KVM: x86/mmu: Update number of zapped pages even if page list is stable

Message-ID: <YZKzr4mn1jJ3vdqK@google.com>
Date:   Mon, 15 Nov 2021 19:23:27 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     David Matlack <dmatlack@...gle.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, Ben Gardon <bgardon@...gle.com>
Subject: Re: [PATCH] KVM: x86/mmu: Update number of zapped pages even if page
 list is stable

On Mon, Nov 15, 2021, David Matlack wrote:
> On Thu, Nov 11, 2021 at 2:14 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > When zapping obsolete pages, update the running count of zapped pages
> > regardless of whether or not the list has become unstable due to zapping
> > a shadow page with its own child shadow pages.  If the VM is backed by
> > mostly 4kb pages, KVM can zap an absurd number of SPTEs without bumping
> > the batch count and thus without yielding.  In the worst case scenario,
> > this can cause an RCU stall.
> >
> >   rcu: INFO: rcu_sched self-detected stall on CPU
> >   rcu:     52-....: (20999 ticks this GP) idle=7be/1/0x4000000000000000
> >                                           softirq=15759/15759 fqs=5058
> >    (t=21016 jiffies g=66453 q=238577)
> >   NMI backtrace for cpu 52
> >   Call Trace:
> >    ...
> >    mark_page_accessed+0x266/0x2f0
> >    kvm_set_pfn_accessed+0x31/0x40
> >    handle_removed_tdp_mmu_page+0x259/0x2e0
> >    __handle_changed_spte+0x223/0x2c0
> >    handle_removed_tdp_mmu_page+0x1c1/0x2e0
> >    __handle_changed_spte+0x223/0x2c0
> >    handle_removed_tdp_mmu_page+0x1c1/0x2e0
> >    __handle_changed_spte+0x223/0x2c0
> >    zap_gfn_range+0x141/0x3b0
> >    kvm_tdp_mmu_zap_invalidated_roots+0xc8/0x130
> 
> This is a useful patch but I don't see the connection with this stall.
> The stall is detected in kvm_tdp_mmu_zap_invalidated_roots, which runs
> after kvm_zap_obsolete_pages. How would rescheduling during
> kvm_zap_obsolete_pages help?

Ah shoot, I copy+pasted the wrong splat.  The correct, relevant backtrace is:

   mark_page_accessed+0x266/0x2e0
   kvm_set_pfn_accessed+0x31/0x40
   mmu_spte_clear_track_bits+0x136/0x1c0
   drop_spte+0x1a/0xc0
   mmu_page_zap_pte+0xef/0x120
   __kvm_mmu_prepare_zap_page+0x205/0x5e0
   kvm_mmu_zap_all_fast+0xd7/0x190
   kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10
   kvm_page_track_flush_slot+0x5c/0x80
   kvm_arch_flush_shadow_memslot+0xe/0x10
   kvm_set_memslot+0x1a8/0x5d0
   __kvm_set_memory_region+0x337/0x590
   kvm_vm_ioctl+0xb08/0x1040
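The accounting problem described in the quoted patch can be illustrated with a small user-space simulation. This is only a hedged sketch, not KVM code. Every name in it (`struct sketch_sp`, `sketch_zap_page()`, `sketch_zap_all()`, `SKETCH_BATCH_ZAP_PAGES`) is hypothetical, and the threshold of 10 is arbitrary. It assumes that the walk yields once the running batch count reaches a threshold, and that zapping a page with child shadow pages is what makes the list "unstable". For simplicity, the sketch counts yields instead of rescheduling and omits the restart of the list walk. With mostly leaf pages (no children), bumping the count only on "unstable" zaps means the batch never grows and the loop never yields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch threshold; the real kernel constant may differ. */
#define SKETCH_BATCH_ZAP_PAGES 10

/* Hypothetical stand-in for a shadow page: only records how many child
 * shadow pages it has, since zapping children is what makes the list
 * "unstable" in this model. */
struct sketch_sp {
	int nr_child_sps;
};

/* Pretend to zap one shadow page plus its children. Reports the number of
 * pages zapped via *nr_zapped and returns true if the list became unstable. */
static bool sketch_zap_page(const struct sketch_sp *sp, int *nr_zapped)
{
	*nr_zapped = 1 + sp->nr_child_sps;
	return sp->nr_child_sps > 0;
}

/* Walk all pages and return how many times the loop would have yielded.
 * count_always == false: batch is bumped only when the list became unstable.
 * count_always == true:  batch is bumped after every zap, as the patch
 *                        proposes. Restarting the walk is not modeled. */
static int sketch_zap_all(const struct sketch_sp *sps, size_t n, bool count_always)
{
	int batch = 0, yields = 0;

	assert(n == 0 || sps != NULL);

	for (size_t i = 0; i < n; i++) {
		int nr_zapped;
		bool unstable;

		if (batch >= SKETCH_BATCH_ZAP_PAGES) {
			yields++;	/* stands in for a cond_resched-style yield */
			batch = 0;
		}

		unstable = sketch_zap_page(&sps[i], &nr_zapped);
		if (count_always || unstable)
			batch += nr_zapped;
	}
	return yields;
}
```

For 100 leaf pages, the "count only when unstable" variant returns 0 yields, while the "always count" variant returns 9 (one yield before every tenth page after the first ten). This mirrors why a VM backed mostly by 4KiB pages could run the zap loop for a long time without rescheduling.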