Message-ID: <3e2ebfd2-7a63-449b-901a-f559049473e4@redhat.com>
Date: Wed, 3 Jul 2024 18:19:03 +0200
From: David Hildenbrand <david@...hat.com>
To: Yang Shi <shy828301@...il.com>, Ryan Roberts <ryan.roberts@....com>
Cc: Baolin Wang <baolin.wang@...ux.alibaba.com>,
 Bang Li <libang.li@...group.com>, hughd@...gle.com,
 akpm@...ux-foundation.org, wangkefeng.wang@...wei.com, ziy@...dia.com,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] support "THPeligible" semantics for mTHP with anonymous
 shmem

On 03.07.24 18:08, Yang Shi wrote:
> On Tue, Jul 2, 2024 at 1:24 AM Ryan Roberts <ryan.roberts@....com> wrote:
>>
>> On 01/07/2024 19:20, Yang Shi wrote:
>>> On Mon, Jul 1, 2024 at 3:23 AM David Hildenbrand <david@...hat.com> wrote:
>>>>
>>>> On 01.07.24 12:16, Ryan Roberts wrote:
>>>>> On 01/07/2024 10:17, David Hildenbrand wrote:
>>>>>> On 01.07.24 11:14, Ryan Roberts wrote:
>>>>>>> On 01/07/2024 09:57, David Hildenbrand wrote:
>>>>>>>> On 01.07.24 10:50, Ryan Roberts wrote:
>>>>>>>>> On 01/07/2024 09:48, David Hildenbrand wrote:
>>>>>>>>>> On 01.07.24 10:40, Ryan Roberts wrote:
>>>>>>>>>>> On 01/07/2024 09:33, Baolin Wang wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2024/7/1 15:55, Ryan Roberts wrote:
>>>>>>>>>>>>> On 28/06/2024 11:49, Bang Li wrote:
>>>>>>>>>>>>>> After commit 7fb1b252afb5 ("mm: shmem: add mTHP support for
>>>>>>>>>>>>>> anonymous shmem"), we can configure different policies through
>>>>>>>>>>>>>> the multi-size THP sysfs interface for anonymous shmem. But for
>>>>>>>>>>>>>> anonymous shmem, "THPeligible" currently only indicates whether
>>>>>>>>>>>>>> the mapping is eligible for allocating THP pages and whether the
>>>>>>>>>>>>>> THP is PMD-mappable. We need to support semantics for mTHP with
>>>>>>>>>>>>>> anonymous shmem similar to those for mTHP with anonymous memory.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Bang Li <libang.li@...group.com>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>        fs/proc/task_mmu.c      | 10 +++++++---
>>>>>>>>>>>>>>        include/linux/huge_mm.h | 11 +++++++++++
>>>>>>>>>>>>>>        mm/shmem.c              |  9 +--------
>>>>>>>>>>>>>>        3 files changed, 19 insertions(+), 11 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>>>>>>>>>>>>>> index 93fb2c61b154..09b5db356886 100644
>>>>>>>>>>>>>> --- a/fs/proc/task_mmu.c
>>>>>>>>>>>>>> +++ b/fs/proc/task_mmu.c
>>>>>>>>>>>>>> @@ -870,6 +870,7 @@ static int show_smap(struct seq_file *m, void *v)
>>>>>>>>>>>>>>        {
>>>>>>>>>>>>>>            struct vm_area_struct *vma = v;
>>>>>>>>>>>>>>            struct mem_size_stats mss = {};
>>>>>>>>>>>>>> +    bool thp_eligible;
>>>>>>>>>>>>>>              smap_gather_stats(vma, &mss, 0);
>>>>>>>>>>>>>>        @@ -882,9 +883,12 @@ static int show_smap(struct seq_file *m, void *v)
>>>>>>>>>>>>>>              __show_smap(m, &mss, false);
>>>>>>>>>>>>>>        -    seq_printf(m, "THPeligible:    %8u\n",
>>>>>>>>>>>>>> -           !!thp_vma_allowable_orders(vma, vma->vm_flags,
>>>>>>>>>>>>>> -               TVA_SMAPS | TVA_ENFORCE_SYSFS, THP_ORDERS_ALL));
>>>>>>>>>>>>>> +    thp_eligible = !!thp_vma_allowable_orders(vma, vma->vm_flags,
>>>>>>>>>>>>>> +                        TVA_SMAPS | TVA_ENFORCE_SYSFS, THP_ORDERS_ALL);
>>>>>>>>>>>>>> +    if (vma_is_anon_shmem(vma))
>>>>>>>>>>>>>> +        thp_eligible = !!shmem_allowable_huge_orders(file_inode(vma->vm_file),
>>>>>>>>>>>>>> +                            vma, vma->vm_pgoff, thp_eligible);
>>>>>>>>>>>>>
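The THPeligible field that show_smap() prints can be read back from user space. The following is a minimal sketch, assuming Linux with /proc mounted; the helper names (parse_thp_eligible, thp_eligible_for, map_anon_shmem) are hypothetical and not kernel or libc API. A MAP_SHARED | MAP_ANONYMOUS mapping is backed by anonymous shmem, which is the case this patch changes. The value reported for it depends on the kernel version and the THP sysfs settings, so the sketch makes no claim about what that value will be.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict C modes */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* If line is the "THPeligible:" field of an smaps entry, store its value in
 * *eligible and return 1; otherwise return 0. */
static int parse_thp_eligible(const char *line, int *eligible)
{
    static const char key[] = "THPeligible:";

    assert(line != NULL && eligible != NULL);
    if (strncmp(line, key, sizeof(key) - 1) != 0)
        return 0;
    /* strtol skips the padding printed by "%8u". */
    *eligible = (int)strtol(line + sizeof(key) - 1, NULL, 10);
    return 1;
}

/* Return the THPeligible value that /proc/self/smaps reports for the VMA
 * containing addr, or -1 if that VMA or the field is not found. */
static int thp_eligible_for(const void *addr)
{
    char line[4096];
    int in_vma = 0, result = -1;
    FILE *f = fopen("/proc/self/smaps", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        unsigned long start, end;
        int v;

        /* VMA header lines begin with "start-end perms offset ...";
         * field lines such as "Size:" never match "%lx-%lx". */
        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            in_vma = (uintptr_t)addr >= start && (uintptr_t)addr < end;
            continue;
        }
        if (in_vma && parse_thp_eligible(line, &v)) {
            result = v;
            break;
        }
    }
    fclose(f);
    return result;
}

/* Create an anonymous shared mapping (backed by anonymous shmem).
 * Returns NULL on failure. */
static void *map_anon_shmem(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

For example, pass the result of map_anon_shmem(2UL << 20) to thp_eligible_for(). A return of -1 means the VMA or the field was not found, for instance on kernels that do not print THPeligible.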
>>>>>>>>>>>>> Afraid I haven't been following the shmem mTHP support work as much as I
>>>>>>>>>>>>> would have liked, but is there a reason why we need a separate function
>>>>>>>>>>>>> for shmem?
>>>>>>>>>>>>
>>>>>>>>>>>> Since shmem_allowable_huge_orders() only uses shmem-specific logic to
>>>>>>>>>>>> determine if huge orders are allowable, there is no need to complicate
>>>>>>>>>>>> the thp_vma_allowable_orders() function by adding more shmem-related
>>>>>>>>>>>> logic, making it more bloated. In my view, providing a dedicated helper
>>>>>>>>>>>> shmem_allowable_huge_orders(), specifically for shmem, simplifies the logic.
>>>>>>>>>>>
>>>>>>>>>>> My point was really that a single interface (thp_vma_allowable_orders)
>>>>>>>>>>> should be used to get this information. I have no strong opinion on how
>>>>>>>>>>> the implementation of that interface looks. What you suggest below seems
>>>>>>>>>>> perfectly reasonable to me.
>>>>>>>>>>
>>>>>>>>>> Right. thp_vma_allowable_orders() might require some care as discussed in
>>>>>>>>>> other context (cleanly separate dax and shmem handling/orders). But that
>>>>>>>>>> would be follow-up cleanups.
>>>>>>>>>
>>>>>>>>> Are you planning to do that, or do you want me to send a patch?
>>>>>>>>
>>>>>>>> I'm planning on looking into some details, especially the interaction with large
>>>>>>>> folios in the pagecache. I'll let you know once I have a better idea what
>>>>>>>> actually should be done :)
>>>>>>>
>>>>>>> OK great - I'll scrub it from my todo list... really getting things done today :)
>>>>>>
>>>>>> Resolved the khugepaged thingy already? :P
>>>>>>
>>>>>> [khugepaged not active when only enabling the sub-size via the 2M folder IIRC]
>>>>>
>>>>> Hmm... baby brain?
>>>>
>>>> :)
>>>>
>>>> I think I only mentioned it in a private mail at some point.
>>>>
>>>>>
>>>>> Sorry about that. I've been a bit useless lately. For some reason it wasn't on
>>>>> my list, but it's there now. Will prioritise it, because I agree it's not good.
>>>>
>>>>
>>>> IIRC, if you do
>>>>
>>>> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>>>> echo always > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>>>>
>>>> khugepaged will not get activated.
>>>
>>> khugepaged is controlled by the top level knob.
>>
>> What do you mean by "top level knob"? I assume
>> /sys/kernel/mm/transparent_hugepage/enabled ?
> 
> Yes.
> 
>>
>> If so, that's not really a thing in its own right; it's just the legacy PMD-size
>> THP control, and we only take any notice of it if a per-size control is set to
>> "inherit". So if we have:
>>
>> # echo always > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>>
>> Then by design, /sys/kernel/mm/transparent_hugepage/enabled should be ignored.
>>
>>> But the above setting
>>> sounds confusing: can we disable the top level knob, but enable it on
>>> a per-order basis? TBH, it sounds weird and doesn't make too much
>>> sense to me.
>>
>> Well, that's the design and that's how it's documented. It's done this way for
>> back-compat. All controls are now per-size. But at boot, we default all per-size
>> controls to "never" except for the PMD-sized control, which is defaulted to
>> "inherit". That way, an unenlightened user-space can still control PMD-sized THP
>> via the legacy (top-level) control. But enlightened apps can directly control
>> per-size.
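The resolution rule described above (a per-size knob set to "inherit" defers to the top-level knob; any other per-size value takes precedence and the top-level knob is ignored) can be expressed as a small pure function. This is a user-space sketch, not kernel code; the function names are hypothetical, and it assumes the sysfs files mark the selected value with square brackets, e.g. "always [madvise] never".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the bracketed (currently selected) word of a THP sysfs knob into out,
 * e.g. "always [madvise] never" -> "madvise". Returns 0 on success, -1 if no
 * bracketed word is present or it does not fit in out. */
static int thp_selected_value(const char *knob, char *out, size_t outlen)
{
    const char *lbr, *rbr;
    size_t len;

    assert(knob != NULL && out != NULL && outlen > 0);
    lbr = strchr(knob, '[');
    rbr = lbr ? strchr(lbr, ']') : NULL;
    if (!rbr)
        return -1;
    len = (size_t)(rbr - lbr - 1);
    if (len >= outlen)
        return -1;
    memcpy(out, lbr + 1, len);
    out[len] = '\0';
    return 0;
}

/* Policy that actually applies to one folio size: "inherit" defers to the
 * top-level (legacy PMD-size) knob; any other per-size value wins. */
static const char *thp_effective_policy(const char *top_level,
                                        const char *per_size)
{
    return strcmp(per_size, "inherit") == 0 ? top_level : per_size;
}
```

With the settings from the example above (top-level "never", hugepages-2048kB "always"), thp_effective_policy("never", "always") returns "always": 2M THP is enabled even though the legacy knob says "never". The boot-time default of "inherit" for the PMD size is what keeps the legacy knob meaningful for unenlightened user space.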
> 
> OK, good to know.
> 
>>
>> I'm not sure how your way would work, because you would have 2 controls
>> competing to do the same thing?
> 
> I don't see how they compete if they are 2-level knobs. And I fail
> to see how it achieves back-compat. For example, memcached reads
> /sys/kernel/mm/transparent_hugepage/enabled to determine whether it
> should manage memory in huge page (2M) granularity. If the knobs are
> set as follows:
> 
> # echo never > /sys/kernel/mm/transparent_hugepage/enabled
> # echo always > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
> 
> memcached will manage memory in 4K granularity, but 2M THP is actually
> enabled unless memcached checks the per-order knobs.
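An application in the position of the memcached example can avoid being misled by checking the per-size knob before falling back to the legacy one. A minimal sketch follows. It assumes a 4K base page (so PMD-size THP appears as hugepages-2048kB), treats both "always" and "madvise" as enabled, and falls back to the top-level knob when the per-size file does not exist (kernels without mTHP controls). read_thp_knob() and thp_2m_enabled() are hypothetical helpers, not an existing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define THP_SYSFS "/sys/kernel/mm/transparent_hugepage"

/* Read a THP sysfs knob such as "always [madvise] never" and copy the
 * bracketed (selected) word into out. Returns 0 on success, -1 if the file
 * cannot be read or has no bracketed word that fits in out. */
static int read_thp_knob(const char *path, char *out, size_t outlen)
{
    char buf[128];
    char *lbr, *rbr;
    size_t len;
    FILE *f;

    assert(path != NULL && out != NULL && outlen > 0);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    lbr = strchr(buf, '[');
    rbr = lbr ? strchr(lbr, ']') : NULL;
    if (!rbr)
        return -1;
    len = (size_t)(rbr - lbr - 1);
    if (len >= outlen)
        return -1;
    memcpy(out, lbr + 1, len);
    out[len] = '\0';
    return 0;
}

/* Is PMD-size (2M) THP effectively enabled? Returns 1 for "always" or
 * "madvise", 0 for "never", -1 if no usable knob could be read. */
static int thp_2m_enabled(void)
{
    char top[32], per_size[32];
    const char *policy = per_size;

    /* The per-size knob wins unless it says "inherit" or is missing. */
    if (read_thp_knob(THP_SYSFS "/hugepages-2048kB/enabled",
                      per_size, sizeof(per_size)) != 0 ||
        strcmp(per_size, "inherit") == 0) {
        if (read_thp_knob(THP_SYSFS "/enabled", top, sizeof(top)) != 0)
            return -1;
        policy = top;
    }
    return strcmp(policy, "never") != 0;
}
```

Under the settings quoted above, thp_2m_enabled() would report 2M THP as enabled, whereas reading only the top-level file reports "never".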

And you can still do it the old way and keep it all working with 
existing software (compat mode as default).

It's just another option and some software might need updates to benefit 
from it (just like if you would enable other folio sizes).

You can happily do

echo always > /sys/kernel/mm/transparent_hugepage/enabled
echo inherit > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled

It's an admin choice.

-- 
Cheers,

David / dhildenb

