linux-kernel - Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed

Message-ID: <b53fa748-9e19-48f9-8ccf-8d2fe408d35d@linux.alibaba.com>
Date: Sat, 13 Jul 2024 20:57:59 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: David Hildenbrand <david@...hat.com>, Gavin Shan <gshan@...hat.com>,
 Matthew Wilcox <willy@...radead.org>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 akpm@...ux-foundation.org, william.kucharski@...cle.com,
 ryan.roberts@....com, shan.gavin@...il.com
Subject: Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed



On 2024/7/13 12:17, David Hildenbrand wrote:
> On 13.07.24 06:01, Baolin Wang wrote:
>>
>>
>> On 2024/7/13 09:03, David Hildenbrand wrote:
>>> On 12.07.24 07:39, Gavin Shan wrote:
>>>> On 7/12/24 7:03 AM, David Hildenbrand wrote:
>>>>> On 11.07.24 22:46, Matthew Wilcox wrote:
>>>>>> On Thu, Jul 11, 2024 at 08:48:40PM +1000, Gavin Shan wrote:
>>>>>>> +++ b/mm/huge_memory.c
>>>>>>> @@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>>>>>             while (orders) {
>>>>>>>                 addr = vma->vm_end - (PAGE_SIZE << order);
>>>>>>> -            if (thp_vma_suitable_order(vma, addr, order))
>>>>>>> +            if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) &&
>>>>>>> +                thp_vma_suitable_order(vma, addr, order))
>>>>>>>                     break;
>>>>>>
>>>>>> Why does 'orders' even contain potential orders that are larger than
>>>>>> MAX_PAGECACHE_ORDER?
>>>>>>
>>>>>> We do this at the top:
>>>>>>
>>>>>>            orders &= vma_is_anonymous(vma) ?
>>>>>>                            THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
>>>>>>
>>>>>> include/linux/huge_mm.h:#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER))
>>>>>>
>>>>>> ... and that seems very wrong.  We support all kinds of orders for
>>>>>> files, not just PMD order.  We don't support PUD order at all.
>>>>>>
>>>>>> What the hell is going on here?
>>>>>
>>>>> yes, that's just absolutely confusing. I mentioned it to Ryan lately
>>>>> that we should clean that up (I wanted to look into that, but am
>>>>> happy if someone else can help).
>>>>>
>>>>> There should likely be different defines for
>>>>>
>>>>> DAX (PMD|PUD)
>>>>>
>>>>> SHMEM (PMD) -- but soon more. Not sure if we want separate ANON_SHMEM
>>>>> for the time being. Hm. But shmem is already handled separately, so
>>>>> maybe we can just ignore shmem here.
>>>>>
>>>>> PAGECACHE (1 .. MAX_PAGECACHE_ORDER)
>>>>>
>>>>> ? But it's still unclear to me.
>>>>>
>>>>> At least DAX must stay special I think, and PAGECACHE should be
>>>>> capped at MAX_PAGECACHE_ORDER.
>>>>>
>>>>
>>>> David, I can help to clean it up. Could you please help to confirm the
>>>> following
>>>
>>> Thanks!
>>>
>>>> changes are exactly what you're suggesting? Hopefully, there is
>>>> nothing I've missed.
>>>> The original issue can be fixed by the changes. With the changes
>>>> applied, madvise(MADV_COLLAPSE)
>>>> returns with errno -22 in the test program.
>>>>
>>>> The Fixes tag needs to be adjusted as well.
>>>>
>>>> Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
>>>>
>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>> index 2aa986a5cd1b..45909efb0ef0 100644
>>>> --- a/include/linux/huge_mm.h
>>>> +++ b/include/linux/huge_mm.h
>>>> @@ -74,7 +74,12 @@ extern struct kobj_attribute shmem_enabled_attr;
>>>>     /*
>>>>      * Mask of all large folio orders supported for file THP.
>>>>      */
>>>> -#define THP_ORDERS_ALL_FILE    (BIT(PMD_ORDER) | BIT(PUD_ORDER))
>>>
>>> DAX doesn't have any MAX_PAGECACHE_ORDER restrictions (like hugetlb). So
>>> this should be
>>>
>>> /*
>>>    * FSDAX never splits folios, so the MAX_PAGECACHE_ORDER limit does not
>>>    * apply here.
>>>    */
>>> #define THP_ORDERS_ALL_FILE_DAX (BIT(PMD_ORDER) | BIT(PUD_ORDER))
>>>
>>> Something like that
>>>
>>>> +#define THP_ORDERS_ALL_FILE_DAX                \
>>>> +       ((BIT(PMD_ORDER) | BIT(PUD_ORDER)) & (BIT(MAX_PAGECACHE_ORDER + 1) - 1))
>>>> +#define THP_ORDERS_ALL_FILE_DEFAULT    \
>>>> +       ((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))
>>>> +#define THP_ORDERS_ALL_FILE            \
>>>> +       (THP_ORDERS_ALL_FILE_DAX | THP_ORDERS_ALL_FILE_DEFAULT)
>>>
>>> Maybe we can get rid of THP_ORDERS_ALL_FILE (to prevent abuse) and fixup
>>> THP_ORDERS_ALL instead.
>>>
>>>>     /*
>>>>      * Mask of all large folio orders supported for THP.
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 2120f7478e55..4690f33afaa6 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -88,9 +88,17 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>>            bool smaps = tva_flags & TVA_SMAPS;
>>>>            bool in_pf = tva_flags & TVA_IN_PF;
>>>>            bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
>>>> +       unsigned long supported_orders;
>>>> +
>>>>            /* Check the intersection of requested and supported orders. */
>>>> -       orders &= vma_is_anonymous(vma) ?
>>>> -                       THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
>>>> +       if (vma_is_anonymous(vma))
>>>> +               supported_orders = THP_ORDERS_ALL_ANON;
>>>> +       else if (vma_is_dax(vma))
>>>> +               supported_orders = THP_ORDERS_ALL_FILE_DAX;
>>>> +       else
>>>> +               supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
>>>
>>> This is what I had in mind.
>>>
>>> But, do we have to special-case shmem as well or will that be handled
>>> correctly?
>>
>> For anonymous shmem, it is now the same as anonymous THP, which can utilize
>> THP_ORDERS_ALL_ANON.
>> For tmpfs, we currently only support PMD-sized THP
>> (support for more orders will come in the future). Therefore, I think we
>> can reuse THP_ORDERS_ALL_ANON for shmem now:
>>
>> if (vma_is_anonymous(vma) || shmem_file(vma->vm_file))
>>     supported_orders = THP_ORDERS_ALL_ANON;
>> ......
>>
> 
> 
> It should be THP_ORDERS_ALL_FILE_DEFAULT (MAX_PAGECACHE_ORDER limitation
> applies).

Yes, indeed, I missed the MAX_PAGECACHE_ORDER limitation.
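
For reference, the order-mask selection discussed in this thread can be
sketched as a small user-space C fragment. This is only an illustration,
not the kernel code: the numeric values of PMD_ORDER, PUD_ORDER and
MAX_PAGECACHE_ORDER are assumptions picked so that the page cache limit
sits below PMD_ORDER, the THP_ORDERS_ALL_ANON definition is assumed rather
than taken from this thread, and enum vma_kind, supported_orders() and
filter_orders() are hypothetical stand-ins for vma_is_anonymous(),
vma_is_dax() and the masking done in __thp_vma_allowable_orders().

```c
#include <assert.h>

#define BIT(n)                  (1UL << (n))

/* Assumed, illustrative values; the real ones depend on arch and config. */
#define PMD_ORDER               13
#define PUD_ORDER               26
#define MAX_PAGECACHE_ORDER     11

/* The case the thread is about: the page cache cannot hold a PMD-size folio. */
static_assert(MAX_PAGECACHE_ORDER < PMD_ORDER, "sketch assumes limit < PMD");

/* Assumed anonymous mask: orders 2..PMD_ORDER. */
#define THP_ORDERS_ALL_ANON \
	((BIT(PMD_ORDER + 1) - 1) & ~(BIT(0) | BIT(1)))
/* DAX is not capped by MAX_PAGECACHE_ORDER, as suggested in the thread. */
#define THP_ORDERS_ALL_FILE_DAX \
	(BIT(PMD_ORDER) | BIT(PUD_ORDER))
/* Regular page cache: orders 1..MAX_PAGECACHE_ORDER. */
#define THP_ORDERS_ALL_FILE_DEFAULT \
	((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))

/* Hypothetical stand-in for the vma_is_anonymous()/vma_is_dax() checks. */
enum vma_kind { VMA_ANON, VMA_DAX, VMA_FILE };

/* Pick the supported-order mask for a mapping kind. */
static unsigned long supported_orders(enum vma_kind kind)
{
	if (kind == VMA_ANON)
		return THP_ORDERS_ALL_ANON;
	if (kind == VMA_DAX)
		return THP_ORDERS_ALL_FILE_DAX;
	return THP_ORDERS_ALL_FILE_DEFAULT;
}

/* Intersect the requested orders with what the mapping kind supports. */
static unsigned long filter_orders(unsigned long requested, enum vma_kind kind)
{
	return requested & supported_orders(kind);
}
```

With these assumed values, a request for BIT(PMD_ORDER) on a regular file
mapping is filtered down to an empty mask, while the same request on an
anonymous or DAX mapping passes through unchanged. Per the correction
above, shmem would take the THP_ORDERS_ALL_FILE_DEFAULT branch.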