linux-kernel - Re: [PATCH 1/2] KVM: MMU: Do not treat ZONE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Mon, 11 Nov 2019 10:27:51 -0800
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     Dan Williams <dan.j.williams@...el.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, KVM list <kvm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Adam Borowski <kilobyte@...band.pl>,
        David Hildenbrand <david@...hat.com>
Subject: Re: [PATCH 1/2] KVM: MMU: Do not treat ZONE_DEVICE pages as being
 reserved

On Fri, Nov 08, 2019 at 06:00:46PM -0800, Dan Williams wrote:
> On Fri, Nov 8, 2019 at 5:43 PM Sean Christopherson
> <sean.j.christopherson@...el.com> wrote:
> > On Thu, Nov 07, 2019 at 07:58:46AM -0800, Sean Christopherson wrote:
> > > Insertion into KVM's secondary MMU is mutually exclusive with an invalidate
> > > from the mmu_notifier.  KVM holds a reference to the to-be-inserted page
> > > until the page has been inserted, which ensures that the page is pinned and
> > > thus won't be invalidated until after the page is inserted.  This prevents
> > > an invalidate from racing with insertion.  Dropping the reference
> > > immediately after gup() would allow the invalidate to run prior to the page
> > > being inserted, and so KVM would map the stale PFN into the guest's page
> > > tables after it was invalidated in the host.
> >
> > My previous analysis is wrong, although I did sort of come to the right
> > conclusion.
> >
> > The part that's wrong is that KVM does not rely on pinning a page/pfn when
> > installing the pfn into its secondary MMU (guest page tables).  Instead,
> > KVM keeps track of mmu_notifier invalidate requests and cancels insertion
> > if an invalidate occured at any point between the start of hva_to_pfn(),
> > i.e. the get_user_pages() call, and acquiring KVM's mmu lock (which must
> > also be grabbed by mmu_notifier invalidate).  So for any pfn, regardless
> > of whether it's backed by a struct page, KVM inserts a pfn if and only if
> > it is guaranteed to get an mmu_notifier invalidate for the pfn (and isn't
> > already invalidated).
> >
> > In the page fault flow, KVM doesn't care whether or not the pfn remains
> > valid in the associated vma.  In other words, Dan's idea of immediately
> > doing put_page() on ZONE_DEVICE pages would work for *page faults*...
> >
> > ...but not for all the other flows where KVM uses gfn_to_pfn(), and thus
> > get_user_pages().  When accessing entire pages of guest memory, e.g. for
> > nested virtualization, KVM gets the page associated with a gfn, maps it
> > with kmap() to get a kernel address and keeps the mapping/page until it's
> > done reading/writing the page.  Immediately putting ZONE_DEVICE pages
> > would result in use-after-free scenarios for these flows.
> 
> Thanks for this clarification. I do want to put out though that
> ZONE_DEVICE pages go idle, they don't get freed. As long as KVM drops
> its usage on invalidate it's perfectly fine for KVM to operate on idle
> ZONE_DEVICE pages. The common case is that ZONE_DEVICE pages are
> accessed and mapped while idle. Only direct-I/O temporarily marks them
> busy to synchronize with invalidate. KVM obviates that need by
> coordinating with mmu-notifiers instead.

Only the KVM MMU, e.g. page fault handler, coordinates via mmu_notifier,
the kvm_vcpu_map() case would continue using pages across an invalidate.

Or did I misunderstand?