lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 9 Sep 2022 11:31:49 -0300
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        John Hubbard <jhubbard@...dia.com>,
        David Hildenbrand <david@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL] VFIO fix for v6.0-rc5

On Fri, Sep 09, 2022 at 06:08:32AM -0600, Alex Williamson wrote:
> On Fri, 9 Sep 2022 07:53:17 -0400
> Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> > On Fri, Sep 9, 2022 at 6:52 AM Alex Williamson
> > <alex.williamson@...hat.com> wrote:
> > >
> > > VFIO fix for v6.0-rc5
> > >
> > >  - Fix zero page refcount leak (Alex Williamson)  
> > 
> > Ugh. This is disgusting.
> > 
> > Don't get me wrong - I've pulled this, but I think there's some deeper
> > problem that made this patch required.

The basic issue is that VFIO is using the reserved page as an
indication that pin_user_pages_remote() didn't process the page and
instead VFIO's bad follow_pfn() path was used.

It really shouldn't be working like this, which path was used to
obtain the PFN should be recorded independently. VFIO just doesn't
have a datastructure to accommodate it.

Aside from that, most longterm users don't ever even see the zero page
because that typically creates that GUP incoherence we've talked about
alot. The ugly FOLL_FORCE prevents it from happening. VFIO doesn't do
this either so it suffers the incoherence bug along with exposure to
the zero pages :(

David will fix the FOLL_FORCE issue and this will almost go away.

We had a long thread about this and discussed fixing it by simply
setting FOLL_FORCE as other pin_user_page users do. This reached an
interesting point that I'm curious on your view about ABI stability.

It is related to mlock accounting. When vfio pins pages it accounts
for them in mlock. Alex says there are users that set a mlock
limit. VFIO has this historical bug where it doesn't account for every
page it is asked to pin in mlock, eg it ignores the zero page. So, if
we change how GUP or VFIO does page pinning in a way that increases
its mlock charge is this an ABI break?

On one hand users with an very strict limit may find their environment
requires a limit increase after a kernel upgrade. So it fails the
'everything keeps working after kernel upgrade' test.

On the other hand, treating these sorts of limits as ABI means that a
lot of internal stuff is suddenly ABI - eg with a memory cgroup is
increasing the charge on GFP_KERNEL_ACCOUNT also ABI breaking? That
seems unworkable.

Thanks,
Jason

Powered by blists - more mailing lists