Message-ID: <f49c2a51-4dd8-784b-57fa-34fb397db2b7@redhat.com>
Date: Thu, 27 Jul 2023 15:28:49 +0200
From: David Hildenbrand <david@...hat.com>
To: Peter Xu <peterx@...hat.com>
Cc: liubo <liubo254@...wei.com>, akpm@...ux-foundation.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
hughd@...gle.com, willy@...radead.org
Subject: Re: [PATCH] smaps: Fix the abnormal memory statistics obtained
through /proc/pid/smaps
>>> Therefore, when obtaining pages through the follow_trans_huge_pmd
>>> interface, add the FOLL_FORCE flag to count the pages corresponding to
>>> PROTNONE to solve the above problem.
>>>
>>
>> We really want to avoid the usage of FOLL_FORCE, and ideally limit it
>> to ptrace only.
>
> Fundamentally, when removing FOLL_NUMA we already assumed that !FORCE
> means FOLL_NUMA. To me, that means that after the removal a GUP walker
> can no longer say "it's not FORCEd, but I don't want to trigger NUMA, I
> just want to get the page".
>
> Is that what we want? Should we document that at FOLL_FORCE, if we
> intend to enforce NUMA balancing whenever !FORCE?
That was the idea, yes. I could have sworn we had that at least in some
patch description.
Back then, I played with special-casing gup_can_follow_protnone() on
FOLL_GET | FOLL_PIN. But it's all just best guesses.
It can always be added if deemed necessary and worth it.
Here, it's simply an abuse of that GUP function that I wasn't aware of
-- otherwise I'd have removed it beforehand.
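To make the rule concrete, here is a small userspace model in C (not kernel code) of what gup_can_follow_protnone() enforced after 474098edac26: a PROT_NONE (NUMA-hinting) entry is only followed when FOLL_FORCE is set, so !FOLL_FORCE effectively means "honor NUMA hinting". The MODEL_* flag values and all model_* functions are hypothetical. model_can_follow_protnone_alt() is only one possible reading of the FOLL_GET | FOLL_PIN special-casing mentioned above, not code from any posted patch.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model only; the kernel's values differ. */
#define MODEL_FOLL_FORCE (1u << 0)
#define MODEL_FOLL_DUMP  (1u << 1)
#define MODEL_FOLL_GET   (1u << 2)
#define MODEL_FOLL_PIN   (1u << 3)

/* Rule after FOLL_NUMA was removed: only FOLL_FORCE may follow protnone. */
static bool model_can_follow_protnone(unsigned int flags)
{
	return (flags & MODEL_FOLL_FORCE) != 0;
}

/*
 * Hypothetical alternative: honor NUMA hinting only when the caller
 * actually grabs the page (FOLL_GET or FOLL_PIN); pure lookups may follow.
 */
static bool model_can_follow_protnone_alt(unsigned int flags)
{
	if (flags & MODEL_FOLL_FORCE)
		return true;
	return (flags & (MODEL_FOLL_GET | MODEL_FOLL_PIN)) == 0;
}

/* A walker returns the page unless the entry is protnone and it must not follow. */
static bool model_walker_returns_page(bool protnone, unsigned int flags)
{
	if (protnone && !model_can_follow_protnone(flags))
		return false;
	return true;
}
```

Under the first rule, a lookup-only walker that passes neither FOLL_FORCE nor FOLL_GET/FOLL_PIN, such as the smaps code, gets nothing back for a NUMA-hinting entry; under the alternative it would.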
>
>>
>>> Signed-off-by: liubo <liubo254@...wei.com>
>>> Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
>>> ---
>>> fs/proc/task_mmu.c | 6 ++++--
>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>>> index c1e6531cb02a..ed08f9b869e2 100644
>>> --- a/fs/proc/task_mmu.c
>>> +++ b/fs/proc/task_mmu.c
>>> @@ -571,8 +571,10 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>>>  	bool migration = false;
>>>
>>>  	if (pmd_present(*pmd)) {
>>> -		/* FOLL_DUMP will return -EFAULT on huge zero page */
>>> -		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>>> +		/* FOLL_DUMP will return -EFAULT on huge zero page
>>> +		 * FOLL_FORCE follow a PROT_NONE mapped page
>>> +		 */
>>> +		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP | FOLL_FORCE);
>>>  	} else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
>>>  		swp_entry_t entry = pmd_to_swp_entry(*pmd);
>>
>> Might do as an easy fix. But we really should get rid of that
>> absolutely disgusting usage of follow_trans_huge_pmd().
>>
>> We don't need 99% of what follow_trans_huge_pmd() does here.
>>
>> Would the following also fix your issue?
>>
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 507cd4e59d07..fc744964816e 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -587,8 +587,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>>  	bool migration = false;
>>
>>  	if (pmd_present(*pmd)) {
>> -		/* FOLL_DUMP will return -EFAULT on huge zero page */
>> -		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> +		page = vm_normal_page_pmd(vma, addr, *pmd);
>>  	} else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
>>  		swp_entry_t entry = pmd_to_swp_entry(*pmd);
>>
>> It also skips the shared zeropage and pmd_devmap().
>>
>> Otherwise, a simple pmd_page(*pmd) + is_huge_zero_pmd(*pmd) check will do, but I
>> suspect vm_normal_page_pmd() might be what we actually want to have here.
>>
>> That's also what smaps_pte_entry() does: it properly uses vm_normal_page().
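The difference for smaps_pmd_entry() can be sketched as a userspace model in C. struct model_pmd, the MODEL_* flags and both lookup functions are hypothetical simplifications: model_follow() keeps only the FOLL_DUMP and protnone checks of follow_trans_huge_pmd(), and model_normal_page() keeps only the huge zeropage and pmd_devmap() checks of vm_normal_page_pmd() (its VM_PFNMAP/VM_MIXEDMAP handling is left out).

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model only. */
#define MODEL_FOLL_FORCE (1u << 0)
#define MODEL_FOLL_DUMP  (1u << 1)

/* A simplified present PMD. */
struct model_pmd {
	bool protnone;  /* NUMA-hinting PROT_NONE entry */
	bool huge_zero; /* maps the shared huge zeropage */
	bool devmap;    /* pmd_devmap() mapping */
};

enum model_result { MODEL_PAGE, MODEL_SKIP, MODEL_EFAULT };

/* GUP-style lookup, in the spirit of follow_trans_huge_pmd(). */
static enum model_result model_follow(const struct model_pmd *pmd,
				      unsigned int flags)
{
	if ((flags & MODEL_FOLL_DUMP) && pmd->huge_zero)
		return MODEL_EFAULT;	/* FOLL_DUMP: -EFAULT on huge zero page */
	if (pmd->protnone && !(flags & MODEL_FOLL_FORCE))
		return MODEL_SKIP;	/* honor NUMA hinting: NULL */
	return MODEL_PAGE;
}

/* Plain lookup with no GUP semantics, in the spirit of vm_normal_page_pmd(). */
static enum model_result model_normal_page(const struct model_pmd *pmd)
{
	if (pmd->devmap || pmd->huge_zero)
		return MODEL_SKIP;
	return MODEL_PAGE;		/* protnone does not matter here */
}
```

For a NUMA-hinting PMD, model_follow() with FOLL_DUMP alone returns MODEL_SKIP, so the THP goes uncounted (smaps_pmd_entry() returns early on an error or NULL page). Adding FOLL_FORCE, or switching to the vm_normal_page_pmd()-style lookup, counts it; the latter also skips the huge zeropage without relying on FOLL_DUMP.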
>
> There are indeed some very trivial details in vm_normal_page_pmd() that
> are different, but maybe not so relevant. E.g.,
>
> 	if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
> 		return -ENOMEM;
Note that we're not even passing FOLL_GET | FOLL_PIN, because we're not
actually doing GUP. So the refcount is not that relevant.
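A minimal model of why the refcount does not matter for a lookup-only caller, assuming the quoted checks come from try_grab_page(), which follow_trans_huge_pmd() calls: the sanity check runs unconditionally, but a reference is only taken for FOLL_GET or FOLL_PIN. The names and flag values are hypothetical, the p2pdma check is omitted, and a pin is modeled as a single extra reference, which is not how the kernel actually accounts pins.

```c
/* Hypothetical flag bits and error value for this model only. */
#define MODEL_FOLL_GET (1u << 0)
#define MODEL_FOLL_PIN (1u << 1)
#define MODEL_ENOMEM   12

struct model_folio {
	int refcount;
};

/* Returns 0 on success, -MODEL_ENOMEM if the folio looks already freed. */
static int model_try_grab(struct model_folio *folio, unsigned int flags)
{
	/* Sanity check, done regardless of flags. */
	if (folio->refcount <= 0)
		return -MODEL_ENOMEM;
	/* A reference is taken only when the caller asked for one. */
	if (flags & (MODEL_FOLL_GET | MODEL_FOLL_PIN))
		folio->refcount++;
	return 0;
}
```

With flags == 0 (the smaps case), the call leaves the refcount untouched; only the sanity check has any effect.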
>
> 	if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
> 		return -EREMOTEIO;
>
> I'm not sure whether the p2pdma page would matter in any form here. E.g.,
> whether it can be mapped privately.
Good point, but I don't think that people messing with GUP even imagined
that we would call that function from a !GUP place.
This was wrong from the very start. If we're not in GUP, we shouldn't
call GUP functions.
--
Cheers,
David / dhildenb