Message-ID: <13b14aa6-302e-63cc-2a99-f5c22b9931fc@redhat.com>
Date: Fri, 28 Jul 2023 12:12:57 +0200
From: David Hildenbrand <david@...hat.com>
To: John Hubbard <jhubbard@...dia.com>, linux-kernel@...r.kernel.org
Cc: linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
liubo <liubo254@...wei.com>, Peter Xu <peterx@...hat.com>,
Matthew Wilcox <willy@...radead.org>,
Hugh Dickins <hughd@...gle.com>,
Jason Gunthorpe <jgg@...pe.ca>, stable@...r.kernel.org
Subject: Re: [PATCH v1 2/4] mm/gup: Make follow_page() succeed again on
PROT_NONE PTEs/PMDs
On 28.07.23 11:08, David Hildenbrand wrote:
> On 28.07.23 04:30, John Hubbard wrote:
>> On 7/27/23 14:28, David Hildenbrand wrote:
>>> We accidentally enforced PROT_NONE PTE/PMD permission checks for
>>> follow_page() like we do for get_user_pages() and friends. That was
>>> undesired, because follow_page() is usually only used to look up a currently
>>> mapped page, not to actually access it. Further, follow_page() does not
>>> actually trigger fault handling, but instead simply fails.
>>
>> I see that follow_page() is also completely undocumented. And that
>> reduces us to deducing how it should be used...these things that
>> change follow_page()'s behavior maybe should have a go at documenting
>> it too, perhaps.
>
> I can certainly be motivated to do that. :)
>
>>
>>>
>>> Let's restore that behavior by conditionally setting FOLL_FORCE if
>>> FOLL_WRITE is not set. This way, for example KSM and migration code will
>>> no longer fail on PROT_NONE-mapped PTEs/PMDs.
>>>
>>> Handling this internally doesn't require us to add any new FOLL_FORCE
>>> usage outside of GUP code.
>>>
>>> While at it, refuse to accept FOLL_FORCE: we don't even perform VMA
>>> permission checks like in check_vma_flags(), so especially
>>> FOLL_FORCE|FOLL_WRITE would be dodgy.
>>>
>>> This issue was identified by code inspection. We'll add some
>>> documentation regarding FOLL_FORCE next.
>>>
>>> Reported-by: Peter Xu <peterx@...hat.com>
>>> Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
>>> Cc: <stable@...r.kernel.org>
>>> Signed-off-by: David Hildenbrand <david@...hat.com>
>>> ---
>>> mm/gup.c | 10 +++++++++-
>>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 2493ffa10f4b..da9a5cc096ac 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -841,9 +841,17 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>>>  	if (vma_is_secretmem(vma))
>>>  		return NULL;
>>>
>>> -	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
>>> +	if (WARN_ON_ONCE(foll_flags & (FOLL_PIN | FOLL_FORCE)))
>>>  		return NULL;
>>
>> This is not a super happy situation: callers of follow_page() are now
>> prohibited (see above: we should document that interface) from passing
>> in FOLL_FORCE...
>
> I guess you saw my patch #4.
>
> If you take a look at the existing callers (that are fortunately very
> limited), you'll see that nobody cares.
>
> Most of the FOLL flags don't make any sense for follow_page(), and
> limiting further (ab)use is at least to me very appealing.
>
>>
>>>
>>> +	/*
>>> +	 * Traditionally, follow_page() succeeded on PROT_NONE-mapped pages
>>> +	 * but failed follow_page(FOLL_WRITE) on R/O-mapped pages. Let's
>>> +	 * keep these semantics by setting FOLL_FORCE if FOLL_WRITE is not set.
>>> +	 */
>>> +	if (!(foll_flags & FOLL_WRITE))
>>> +		foll_flags |= FOLL_FORCE;
>>> +
>>
>> ...but then we set it anyway, for special cases. It's awkward because
>> FOLL_FORCE is not an "internal to gup" flag (yet?).
>>
>> I don't yet have suggestions, other than:
>>
>> 1) Yes, the FOLL_NUMA removal made things bad.
>>
>> 2) And they are still very confusing, especially the new use of
>> FOLL_FORCE.
>>
>> ...I'll try to let this soak in and maybe recommend something
>> in a more productive way. :)
>
> What I can offer that might be very appealing is the following:
>
> Get rid of the flags parameter for follow_page() *completely*. Yes, then
> we can even rename FOLL_ to something reasonable in the context where it
> is nowadays used ;)
>
>
> Internally, we'll then set
>
> FOLL_GET | FOLL_DUMP | FOLL_FORCE
>
> and document exactly what this function does. Any user that needs
> something different should just look into using get_user_pages() instead.
>
> I can prototype that on top of this work easily.
The end result looks something like:
/**
 * follow_page - look up and reference a page descriptor from a user-virtual
 *		 address
 * @vma: vm_area_struct mapping @address
 * @address: virtual address to look up
 *
 * follow_page() will look up the page mapped at the given address and
 * take a reference on the page. The returned page has to be released using
 * put_page().
 *
 * follow_page() will not return special (like zero) pages and does not check
 * PTE protection: the returned page might be mapped PROT_NONE, R/O or R/W.
 * Consequently, follow_page() will not trigger NUMA hinting faults.
 *
 * follow_page() does not trigger page faults. If no page is mapped, or
 * a special (like zero) page is mapped, it returns %NULL or an error pointer.
 *
 * Note: new users with different requirements are probably better off using
 * one of the get_user_pages() variants or one of the walk_page_range()
 * variants.
 *
 * Return: the mapped (struct page *), %NULL if no mapping exists, or
 * an error pointer if there is a mapping to something not represented
 * by a page descriptor (see also vm_normal_page()) or the zero page.
 */
struct page *follow_page(struct vm_area_struct *vma, unsigned long address)
{
	struct follow_page_context ctx = { NULL };
	unsigned long gup_flags;
	struct page *page;

	if (vma_is_secretmem(vma))
		return NULL;

	/*
	 * FOLL_GET: We always want a reference on the returned page.
	 * FOLL_DUMP: Ignore special (like zero) pages.
	 * FOLL_FORCE: Succeed on PROT_NONE-mapped pages.
	 */
	gup_flags = FOLL_GET | FOLL_DUMP | FOLL_FORCE;
	page = follow_page_mask(vma, address, gup_flags, &ctx);
	if (ctx.pgmap)
		put_dev_pagemap(ctx.pgmap);
	return page;
}
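
For illustration only (not part of the patch): a caller that already holds the
mmap_lock, like the existing KSM and migration users do, would then boil down
to something like the sketch below. The helper name and the surrounding logic
are made up:

/*
 * Hypothetical helper, just to show the intended calling convention of the
 * flag-less follow_page(). The caller is assumed to hold the mmap_lock.
 */
static bool addr_maps_normal_page(struct vm_area_struct *vma,
				  unsigned long addr)
{
	struct page *page;

	page = follow_page(vma, addr);
	if (IS_ERR_OR_NULL(page))
		return false;	/* nothing mapped, or no page descriptor */

	/* ... inspect the page, e.g. for merging/migration decisions ... */
	put_page(page);
	return true;
}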
--
Cheers,
David / dhildenb