Date:   Wed, 24 May 2023 11:29:45 -0700
From:   Sean Christopherson <seanjc@...gle.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     David Stevens <stevensd@...omium.org>,
        Marc Zyngier <maz@...nel.org>,
        Oliver Upton <oliver.upton@...ux.dev>,
        Paolo Bonzini <pbonzini@...hat.com>,
        linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.cs.columbia.edu,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH v6 1/4] KVM: mmu: introduce new gfn_to_pfn_noref functions

On Wed, May 24, 2023, Peter Xu wrote:
> On Wed, May 24, 2023 at 09:46:13AM -0700, Sean Christopherson wrote:
> > If we hack kvm_pfn_to_refcounted_page(), then all of those protections are lost
> > because KVM would drop its assertions and also skip dirtying pages, i.e. would
> > effectively suppress the latent detection by check_new_page_bad().
> 
> So it's probably that I totally have no idea what are the attributes for
> those special pages so I don't understand enough on why we need to handle
> those pages differently from e.g. PFNMAP pages, and also the benefits.
> 
> I think what I can tell is that they're pages that doesn't have
> PageCompound bits set on either head or tails, however it's still a
> multi-2-order large page.  Is there an example on how these pages are used
> and allocated?  Why would we need those pages, and whether these pages need
> to be set dirty/accessed after all?

The use case David is interested in is where an AMD GPU driver kmallocs() a
chunk of memory, lets it be mmap()'d by userspace, and userspace then maps it
into the guest for a virtual (passthrough?) GPU.  For all intents and purposes,
it's normal memory, just not refcounted.

> >  static bool kvm_is_ad_tracked_page(struct page *page)
> >  {
> > +       /*
> > +        * Assert that KVM isn't attempting to mark a freed page as Accessed or
> > +        * Dirty, i.e. that KVM's MMU doesn't have a use-after-free bug.  KVM
> > +        * (typically) doesn't pin pages that are mapped in KVM's MMU, and
> > +        * instead relies on mmu_notifiers to know when a mapping needs to be
> > +        * zapped/invalidated.  Unmapping from KVM's MMU must happen _before_
> > +        * KVM returns from its mmu_notifier, i.e. the page should have an
> > +        * elevated refcount at this point even though KVM doesn't hold a
> > +        * reference of its own.
> > +        */
> > +       if (WARN_ON_ONCE(!page_count(page)))
> > +               return false;
> > +
> >         /*
> >          * Per page-flags.h, pages tagged PG_reserved "should in general not be
> >          * touched (e.g. set dirty) except by its owner".
> > 
> 
> This looks like a good thing to have, indeed.  But again it doesn't seem
> like anything special to the pages we're discussing here, say, !Compound &&
> refcount==0 ones.

The problem is that if KVM ignores refcount==0 pages, then KVM can't distinguish
between the legitimate[*] refcount==0 AMD GPU case and a buggy refcount==0
use-after-free scenario.  I don't want to make that sacrifice as the legitimate
!refcounted use case is a very specific use case, whereas consuming refcounted
memory is ubiquitous (outside of maybe AWS).
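
To make the tradeoff concrete, here is a rough userspace model of the two
behaviors, not kernel code.  struct model_page, uaf_warnings, and both helper
names are made up for illustration; only the shape of the strict check follows
the diff quoted above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel's definition. */
struct model_page {
	int refcount;
	bool reserved;
};

/* Counts how many times the model "warned" about a refcount==0 page. */
static int uaf_warnings;

/*
 * Strict variant, shaped like the quoted check: a refcount==0 page is
 * treated as a probable use-after-free, so it is reported and skipped.
 */
static bool model_is_ad_tracked_strict(const struct model_page *page)
{
	if (page->refcount == 0) {
		uaf_warnings++;
		return false;
	}
	return !page->reserved;
}

/*
 * Lenient variant: refcount==0 pages are silently ignored.  The caller
 * gets the same answer, but the use-after-free signal is gone.
 */
static bool model_is_ad_tracked_lenient(const struct model_page *page)
{
	if (page->refcount == 0)
		return false;
	return !page->reserved;
}
```

Both variants return false for a freed page, so nothing changes for the caller;
the only difference is whether the bug gets reported.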

[*] Consuming !refcounted pages is safe only for flows that are tied into the
    mmu_notifiers.  The current proposal/plan is to add an off-by-default module
    param that lets userspace opt in to kmap() use of !refcounted memory, e.g.
    this case and PFNMAP memory.
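
    As a sketch of how I read that plan (a userspace model only; the param
    name allow_nonrefcounted_kmap, struct model_pfn, and model_may_use() are
    all hypothetical and not an existing KVM interface):

```c
#include <stdbool.h>

/* Hypothetical description of a pfn and of the flow that wants to use it. */
struct model_pfn {
	bool refcounted;              /* backed by a refcounted struct page? */
	bool flow_uses_mmu_notifier;  /* is the consumer tied into mmu_notifiers? */
};

/* Hypothetical module param; off by default, per the plan above. */
static bool allow_nonrefcounted_kmap = false;

/*
 * Refcounted memory is always usable.  Non-refcounted memory is usable by
 * flows that are invalidated via mmu_notifiers; other flows (the kmap()
 * style users) need the explicit opt-in.
 */
static bool model_may_use(const struct model_pfn *pfn)
{
	if (pfn->refcounted)
		return true;
	if (pfn->flow_uses_mmu_notifier)
		return true;
	return allow_nonrefcounted_kmap;
}
```

    With the param left at its default, only the last case is rejected, i.e.
    a non-refcounted pfn consumed by a flow that isn't covered by
    mmu_notifiers.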
