Message-ID: <YxiTOyGqXHFkR/DY@ziepe.ca>
Date: Wed, 7 Sep 2022 09:48:59 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: David Hildenbrand <david@...hat.com>
Cc: "Tian, Kevin" <kevin.tian@...el.com>,
Alex Williamson <alex.williamson@...hat.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"lpivarc@...hat.com" <lpivarc@...hat.com>,
"Liu, Jingqi" <jingqi.liu@...el.com>,
"Lu, Baolu" <baolu.lu@...el.com>
Subject: Re: [PATCH] vfio/type1: Unpin zero pages
On Wed, Sep 07, 2022 at 11:00:21AM +0200, David Hildenbrand wrote:
> > > I do wonder if that's a real issue, though. One approach would be to
> > > warn the VFIO users and allow for slightly exceeding the MEMLOCK limit
> > > for a while. Of course, that only works if we assume that such pinned
> > > zeropages are only extremely rarely longterm-pinned for a single VM
> > > instance by VFIO.
> >
> > I'm confused, doesn't vfio increment the memlock for every page of VA
> > it pins? Why would it matter if the page was COW'd or not? It is
> > already accounted for today as though it was a unique page.
> >
> > IOW if we add FOLL_FORCE it won't change the value of the memlock.
>
> I only briefly skimmed over the code; Alex might be able to provide more
> details and correct me if I'm wrong:
>
> vfio_pin_pages_remote() contains a comment:
>
> "Reserved pages aren't counted against the user, externally pinned pages are
> already counted against the user."
>
> is_invalid_reserved_pfn() should return "true" for the shared zeropage and
> prevent us from accounting it via vfio_lock_acct(). Otherwise,
> vfio_find_vpfn() seems to be in place to avoid double-accounting pages.

is_invalid_reserved_pfn() is supposed to return 'true' for PFNs that
cannot be returned from pin_user_pages():

/*
 * Some mappings aren't backed by a struct page, for example an mmap'd
 * MMIO range for our own or another device. These use a different
 * pfn conversion and shouldn't be tracked as locked pages.
 * For compound pages, any driver that sets the reserved bit in head
 * page needs to set the reserved bit in all subpages to be safe.
 */
static bool is_invalid_reserved_pfn(unsigned long pfn)
{
	if (pfn_valid(pfn))
		return PageReserved(pfn_to_page(pfn));

	return true;
}

By 'different pfn conversion' the comment means the follow_fault_pfn()
path, not the PUP (pin_user_pages()) path.

So it is a way for VFIO to keep track of whether a pfn was returned
by PUP or by follow_fault_pfn(), because it treats the two paths quite
differently: only pages returned by PUP are charged against the locked
memory limit and later unpinned.
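
A toy userspace model of that asymmetry, assuming (from the quoted comment and the discussion above) that PUP takes a pin on every page it returns, while the release path only unpins pfns for which is_invalid_reserved_pfn() is false. `struct toy_pfn`, `toy_pup()` and `toy_put()` are hypothetical; toy_put() only mimics the shape of the real unpin path. If the zero page is both returned by PUP and treated as reserved, its pin is never dropped:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a pfn VFIO has mapped; not a kernel type. */
struct toy_pfn {
	bool has_page;  /* false for follow_fault_pfn()-style pfns such as MMIO */
	bool reserved;  /* models PageReserved(); assumed true for the zero page */
	int pincount;   /* pins currently held on the backing page */
};

/* Same shape as is_invalid_reserved_pfn(): no struct page, or a reserved one. */
static bool toy_invalid_reserved(const struct toy_pfn *p)
{
	return !p->has_page || p->reserved;
}

/* PUP path: every page returned is pinned, reserved or not. */
static void toy_pup(struct toy_pfn *p)
{
	assert(p->has_page);
	p->pincount++;
}

/* Release path: only pfns the predicate treats as valid get unpinned. */
static void toy_put(struct toy_pfn *p)
{
	if (!toy_invalid_reserved(p)) {
		assert(p->pincount > 0);
		p->pincount--;
	}
}
```

After one pin/put cycle the anonymous page is back to zero pins while the zero page is left holding one, presumably the kind of leak the "Unpin zero pages" subject is about.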

I lost track of the original cause of this bug; however, AFAIK
pin_user_pages() used to succeed when the zero page was mapped.

No other PUP user calls this hacky follow_fault_pfn() path, and we
expect things like O_DIRECT to work properly even when reading from a
VA that has the zero page mapped.

So if we go back far enough in the git history, we should find a point
where PUP returned something for the zero page that made
is_invalid_reserved_pfn() return false, since VFIO did work at some
point.

IMHO we should simply go back to the historical behavior: make
is_invalid_reserved_pfn() check for the zero_pfn and return false,
meaning that PUP returned it.
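
A sketch of that proposal in a toy userspace model, assuming a hypothetical `is_zero` flag that stands in for the kernel's is_zero_pfn() check; it is not a patch against vfio_iommu_type1.c. Since, per the quoted code, the same predicate gates both accounting and unpinning, exempting the zero pfn balances its pin and also charges it to locked memory like any other PUP-returned page:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a pfn VFIO has mapped; not a kernel type. */
struct toy_pfn {
	bool has_page;  /* false for follow_fault_pfn()-style pfns such as MMIO */
	bool reserved;  /* models PageReserved(); assumed true for the zero page */
	bool is_zero;   /* models is_zero_pfn(pfn) */
	int pincount;   /* pins currently held on the backing page */
};

/* Proposed predicate: the zero pfn is reported as a normal, PUP-returned page. */
static bool toy_invalid_reserved(const struct toy_pfn *p)
{
	if (p->is_zero)
		return false;
	return !p->has_page || p->reserved;
}

/* PUP path: pin the page; charge locked memory only if the predicate says valid. */
static void toy_pin(struct toy_pfn *p, long *locked_vm)
{
	assert(p->has_page);
	p->pincount++;
	if (!toy_invalid_reserved(p))
		(*locked_vm)++;
}

/* Release path: unpin and uncharge only pfns the predicate treats as valid. */
static void toy_put(struct toy_pfn *p, long *locked_vm)
{
	if (!toy_invalid_reserved(p)) {
		assert(p->pincount > 0);
		p->pincount--;
		(*locked_vm)--;
	}
}
```

An MMIO-style pfn with no struct page still reports invalid, so the follow_fault_pfn() side is unaffected in this sketch.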
Jason