[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220128141247.GS8034@ziepe.ca>
Date: Fri, 28 Jan 2022 10:12:47 -0400
From: Jason Gunthorpe <jgg@...pe.ca>
To: Peter Xu <peterx@...hat.com>
Cc: John Hubbard <jhubbard@...dia.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Jan Kara <jack@...e.cz>,
Jérôme Glisse <jglisse@...hat.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Alex Williamson <alex.williamson@...hat.com>
Subject: Re: [PATCH] mm: Fix invalid page pointer returned with FOLL_PIN gups
On Fri, Jan 28, 2022 at 11:26:16AM +0800, Peter Xu wrote:
> diff --git a/mm/gup.c b/mm/gup.c
> index 8ebc04058e97..b3a109114968 100644
> +++ b/mm/gup.c
> @@ -439,10 +439,6 @@ static struct page *no_page_table(struct vm_area_struct *vma,
> static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
> pte_t *pte, unsigned int flags)
> {
> - /* No page to get reference */
> - if (flags & (FOLL_GET | FOLL_PIN))
> - return -EFAULT;
> -
> if (flags & FOLL_TOUCH) {
> pte_t entry = *pte;
>
> Then the convertion will make sure it behaves like before. There is a very
> slight side effect on FOLL_TOUCH above but it shouldn't be a major concern, at
> least not in the vfio use case
You mean that it touches the page even though it will not return the
page? This is what devmap is doing already :\
> IMHO the other site follow_devmap_pud() is tricker, because we can't simply
> remove the two lines here:
Sure, but I didn't ask to fix everything, just trend toward something
sane - if you drop the 4 line above then your patch will do what it
needs to do, we don't need to jump into devmap to fix this bug.
> - /*
> - * device mapped pages can only be returned if the
> - * caller will manage the page reference count.
> - *
> - * At least one of FOLL_GET | FOLL_PIN must be set, so assert that here:
> - */
> - if (!(flags & (FOLL_GET | FOLL_PIN)))
> - return ERR_PTR(-EEXIST);
> -
> pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT;
> *pgmap = get_dev_pagemap(pfn, *pgmap);
> if (!*pgmap)
>
> Because with the old code we'll 100% return -EEXIST as long as we don't have
> GET|PIN (e.g. mlock), but then later when the two lines removed we'll start to
> try call get_dev_pagemap(), if it returned something we'll start to fetch the
> page.
This is part of the "return codes don't make any sense" I was
complaining about..
devmap has a populated PTE and a struct page, it shouldn't be
returning -EFAULT. Failure to get the struct page is either EEXIST or
NULL (take a fault to resolve the race).
[as we've discussed before the get_dev_pagemap shouldn't even be here
at all, I suppose this is why it is so weird looking]
> That part looks fine, __get_user_pages() will still continue but ignore the
> page* returned.
IMHO it would be clearer if the __get_user_pages() checked the struct
page zone and rejected the cases it doesn't want than to sprinkle
these checks all over the place based on PMD flags.
We want to go in this direction anyhow so we can remove the pmd_dax()
flag..
> But what if get_dev_pagemap() returned NULL? Then we'll start to return
> -EFAULT where we used to constantly return -EEXIST.
IHMO most of the EFAULTs in the walker call chain are pretty
questionable..
they should ideally be
EEXIST or NULL. eg why does get_gate_page() return EFAULT for a
non-present page? That should be NULL and faultin_page() should
generate the EFAULT
> > For instance, should we return -EEXIST in other cases like devmap and
> > more that have PTEs present but are not returnable? It is already
> > really confusing (and probably wrong) that we sometimes return NULL
> > for populated PTEs. NULL causes faultin_page() to be called on
> > something we already know is populated, seems like nonsense.
>
> Could you elaborate what's the "return NULL" confusion you mentioned?
Returning null for a popoulated PTE as a permanent failure is
nonsense.
NULL means 'call faultin_page()', it should be used on populated pages
that ought to have a struct page but for some racy reason is
unavailable. The faultin should resolve that race and make the pte
either fully non-present or restore the struct page.
> > It is not about useful, it is about understandable and
> > consistency.. There should be clear rules about when and what errno to
> > return in every case, we should trend in that direction.
>
> I see that both you and John has a strong preference on at least the
> WARN_ON_ONCE() in the patch.
Well, I wanted to see the WARN_ON_ONCE avoided by making it
unnecessary by construction, not just deleted :\
Jason
Powered by blists - more mailing lists