lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 14 Dec 2021 23:34:59 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Ben Gardon <bgardon@...gle.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Hou Wenlong <houwenlong93@...ux.alibaba.com>
Subject: Re: [PATCH 15/28] KVM: x86/mmu: Take TDP MMU roots off list when
 invalidating all roots

On Mon, Nov 22, 2021, Ben Gardon wrote:
> On Mon, Nov 22, 2021 at 3:08 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > On Mon, Nov 22, 2021, Ben Gardon wrote:
> > > On Fri, Nov 19, 2021 at 8:51 PM Sean Christopherson <seanjc@...gle.com> wrote:
> > > >
> > > > Take TDP MMU roots off the list of roots when they're invalidated instead
> > > > of walking later on to find the roots that were just invalidated.  In
> > > > addition to making the flow more straightforward, this allows warning
> > > > if something attempts to elevate the refcount of an invalid root, which
> > > > should be unreachable (no longer on the list so can't be reached by MMU
> > > > notifier, and vCPUs must reload a new root before installing new SPTE).
> > > >
> > > > Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> > >
> > > There are a bunch of awesome little cleanups and unrelated fixes
> > > included in this commit that could be factored out.
> > >
> > > I'm skeptical of immediately moving the invalidated roots into another
> > > list as that seems like it has a lot of potential for introducing
> > > weird races.
> >
> > I disagree, the entire premise of fast invalidate is that there can't be races,
> > i.e. mmu_lock must be held for write.  IMO, it's actually the opposite, as the only
> > reason leaving roots on the per-VM list doesn't have weird races is because slots_lock
> > is held.  If slots_lock weren't required to do a fast zap, which is feasible for the
> > TDP MMU since it doesn't rely on the memslots generation, then it would be possible
> > for multiple calls to kvm_tdp_mmu_zap_invalidated_roots() to run in parallel.  And in
> > that case, leaving roots on the per-VM list would lead to a single instance of a
> > "fast zap" zapping roots it didn't invalidate.  That's wouldn't be problematic per se,
> > but I don't like not having a clear "owner" of the invalidated root.
> 
> That's a good point, the potential interleaving of zap_alls would be gross.
> 
> My mental model for the invariant here was "roots that are still in
> use are on the roots list," but I can see how "the roots list contains
> all valid, in-use roots" could be a more useful invariant.

Sadly, my idea of taking invalid roots off the list ain't gonna happen.

The immediate issue is that the TDP MMU doesn't zap invalid roots in mmu_notifier
callbacks.  This leads to use-after-free and other issues if the mmu_notifier runs
to completion while an invalid root zapper yields as KVM fails to honor the
requirement that there must be _no_ references to the page after the mmu_notifier
returns.

This is most noticeable with set_nx_huge_pages() + kvm_mmu_notifier_release(),
but the bug exists between kvm_mmu_notifier_invalidate_range_start() and memslot
updates as well.  The pages aren't accessible by the guest, and KVM won't read or
write the data itself, but KVM will trigger e.g. kvm_set_pfn_dirty() when zapping
SPTEs, _after_ the mmu_notifier returns.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ