lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 27 Jul 2023 15:28:49 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     liubo <liubo254@...wei.com>, akpm@...ux-foundation.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        hughd@...gle.com, willy@...radead.org
Subject: Re: [PATCH] smaps: Fix the abnormal memory statistics obtained
 through /proc/pid/smaps

>>> Therefore, when obtaining pages through the follow_trans_huge_pmd
>>> interface, add the FOLL_FORCE flag to count the pages corresponding to
>>> PROTNONE to solve the above problem.
>>>
>>
>> We really want to avoid the usage of FOLL_FORCE, and ideally limit it
>> to ptrace only.
> 
> Fundamentally when removing FOLL_NUMA we did already assumed !FORCE is
> FOLL_NUMA.  It means to me after the removal it's not possible to say in a
> gup walker that "it's not FORCEd, but I don't want to trigger NUMA but just
> get the page".
> 
> Is that what we want?  Shall we document that in FOLL_FORCE if we intended
> to enforce numa balancing as long as !FORCE?

That was the idea, yes. I could have sworn we had that at least in some 
patch description.

Back then, I played with special-casing on gup_can_follow_protnone() on 
FOLL_GET | FOLL_PIN. But it's all just best guesses.

Can always be added if deemed necessary and worth it.

Here, it's simply an abuse of that GUP function that I wasn't aware of 
-- otherwise I'd have removed that before hand.

> 
>>
>>> Signed-off-by: liubo <liubo254@...wei.com>
>>> Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
>>> ---
>>>   fs/proc/task_mmu.c | 6 ++++--
>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>>> index c1e6531cb02a..ed08f9b869e2 100644
>>> --- a/fs/proc/task_mmu.c
>>> +++ b/fs/proc/task_mmu.c
>>> @@ -571,8 +571,10 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>>>          bool migration = false;
>>>
>>>          if (pmd_present(*pmd)) {
>>> -               /* FOLL_DUMP will return -EFAULT on huge zero page */
>>> -               page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>>> +               /* FOLL_DUMP will return -EFAULT on huge zero page
>>> +                * FOLL_FORCE follow a PROT_NONE mapped page
>>> +                */
>>> +               page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP | FOLL_FORCE);
>>>          } else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
>>>                  swp_entry_t entry = pmd_to_swp_entry(*pmd);
>>
>> Might do as an easy fix. But we really should get rid of that
>> absolutely disgusting usage of follow_trans_huge_pmd().
>>
>> We don't need 99% of what follow_trans_huge_pmd() does here.
>>
>> Would the following also fix your issue?
>>
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 507cd4e59d07..fc744964816e 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -587,8 +587,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>>          bool migration = false;
>>
>>          if (pmd_present(*pmd)) {
>> -               /* FOLL_DUMP will return -EFAULT on huge zero page */
>> -               page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> +               page = vm_normal_page_pmd(vma, addr, *pmd);
>>          } else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
>>                  swp_entry_t entry = pmd_to_swp_entry(*pmd);
>>
>> It also skips the shared zeropage and pmd_devmap(),
>>
>> Otherwise, a simple pmd_page(*pmd) + is_huge_zero_pmd(*pmd) check will do, but I
>> suspect vm_normal_page_pmd() might be what we actually want to have here.
>>
>> Because smaps_pte_entry() properly checks for vm_normal_page().
> 
> There're indeed some very trivial detail in vm_normal_page_pmd() that's
> different, but maybe not so relevant.  E.g.,
> 
> 	if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
> 		return -ENOMEM;

Note that we're not even passing FOLL_GET | FOLL_PIN. Because we're not 
actually doing GUP. So the refcount is not that relevant.

> 
> 	if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
> 		return -EREMOTEIO;
> 
> I'm not sure whether the p2pdma page would matter in any form here.  E.g.,
> whether it can be mapped privately.

Good point, but I don't think that people messing with GUP even imagined 
that we would call that function from a !GUP place.

This was wrong from the very start. If we're not in GUP, we shouldn't 
call GUP functions.

-- 
Cheers,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ