Message-ID: <ZzVVYCNFkH3cpGY-@google.com>
Date: Wed, 13 Nov 2024 17:41:52 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org, michael.roth@....com
Subject: Re: [PATCH 3/3] KVM: gmem: track preparedness a page at a time
On Wed, Nov 13, 2024, Sean Christopherson wrote:
> On Fri, Nov 08, 2024, Paolo Bonzini wrote:
> > static void kvm_gmem_mark_prepared(struct file *file, pgoff_t index, struct folio *folio)
> > {
> > - folio_mark_uptodate(folio);
> > + struct kvm_gmem_inode *i_gmem = (struct kvm_gmem_inode *)file->f_inode->i_private;
> > + unsigned long *p = i_gmem->prepared + BIT_WORD(index);
> > + unsigned long npages = folio_nr_pages(folio);
> > +
> > + /* Folios must be naturally aligned */
> > + WARN_ON_ONCE(index & (npages - 1));
> > + index &= ~(npages - 1);
> > +
> > + /* Clear page before updating bitmap. */
> > + smp_wmb();
> > +
> > + if (npages < BITS_PER_LONG) {
> > + bitmap_set_atomic_word(p, index, npages);
> > + } else {
> > + BUILD_BUG_ON(BITS_PER_LONG != 64);
> > + memset64((u64 *)p, ~0, BITS_TO_LONGS(npages));
>
> More code that could be deduplicated (unprepared path below).
>
> But more importantly, I'm pretty sure the entire lockless approach is unsafe.
>
> Callers of kvm_gmem_get_pfn() do not take any locks that protect kvm_gmem_mark_prepared()
> from racing with kvm_gmem_mark_range_unprepared(). kvm_mmu_invalidate_begin()
> prevents KVM from installing a stage-2 mapping, i.e. from consuming the PFN, but
> it doesn't prevent kvm_gmem_mark_prepared() from being called. And
> sev_handle_rmp_fault() never checks mmu_notifiers, which is probably fine? But
> sketchy.
>
> Oof. Side topic. sev_handle_rmp_fault() is suspect for other reasons. It does
> its own lookup of the PFN, and so could see an entirely different PFN than was
> resolved by kvm_mmu_page_fault(). And thanks to KVM's wonderful 0/1/-errno
> behavior, sev_handle_rmp_fault() is invoked even when KVM wants to retry the
> fault, e.g. due to an active MMU invalidation.
>
> Anyways, KVM wouldn't _immediately_ consume bad data, as the invalidation
> would block the current page fault. But clobbering i_gmem->prepared would result
> in a missed "prepare" phase if the hole-punch region were restored.
>
> One idea would be to use a rwlock to protect updates to the bitmap (setters can
> generally stomp on each other). And to avoid contention whenever possible in
> page fault paths, only take the lock if the page is not up-to-date, because again
> kvm_mmu_invalidate_{begin,end}() protects against UAF, it's purely updates to the
> bitmap that need extra protection.
Actually, there's no point in having a rwlock, because readers are serialized on
the folio's lock. And KVM absolutely relies on that already, because otherwise
multiple vCPUs could see an unprepared folio, and clear_highpage() could end up
racing with writes from the vCPU.
> Note, using mmu_invalidate_retry_gfn() is unsafe, because it must be called
> under mmu_lock to ensure it doesn't get false negatives.