Message-ID: <b11d6006-1efb-4329-baa0-75799935e019@linux.alibaba.com>
Date: Sat, 13 Jul 2024 12:01:17 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: David Hildenbrand <david@...hat.com>, Gavin Shan <gshan@...hat.com>,
Matthew Wilcox <willy@...radead.org>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
akpm@...ux-foundation.org, william.kucharski@...cle.com,
ryan.roberts@....com, shan.gavin@...il.com
Subject: Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed
On 2024/7/13 09:03, David Hildenbrand wrote:
> On 12.07.24 07:39, Gavin Shan wrote:
>> On 7/12/24 7:03 AM, David Hildenbrand wrote:
>>> On 11.07.24 22:46, Matthew Wilcox wrote:
>>>> On Thu, Jul 11, 2024 at 08:48:40PM +1000, Gavin Shan wrote:
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>>> while (orders) {
>>>>> addr = vma->vm_end - (PAGE_SIZE << order);
>>>>> - if (thp_vma_suitable_order(vma, addr, order))
>>>>> + if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) &&
>>>>> + thp_vma_suitable_order(vma, addr, order))
>>>>> break;
>>>>
>>>> Why does 'orders' even contain potential orders that are larger than
>>>> MAX_PAGECACHE_ORDER?
>>>>
>>>> We do this at the top:
>>>>
>>>> orders &= vma_is_anonymous(vma) ?
>>>> THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
>>>>
>>>> include/linux/huge_mm.h:#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER))
>>>>
>>>> ... and that seems very wrong. We support all kinds of orders for
>>>> files, not just PMD order. We don't support PUD order at all.
>>>>
>>>> What the hell is going on here?
>>>
>>> yes, that's just absolutely confusing. I mentioned to Ryan recently
>>> that we should clean that up (I wanted to look into that, but am
>>> happy if someone else can help).
>>>
>>> There should likely be different defines for
>>>
>>> DAX (PMD|PUD)
>>>
>>> SHMEM (PMD) -- but soon more. Not sure if we want a separate ANON_SHMEM
>>> for the time being. Hm. But shmem is already handled separately, so
>>> maybe we can just ignore shmem here.
>>>
>>> PAGECACHE (1 .. MAX_PAGECACHE_ORDER)
>>>
>>> ? But it's still unclear to me.
>>>
>>> At least DAX must stay special I think, and PAGECACHE should be
>>> capped at MAX_PAGECACHE_ORDER.
>>>
>>
>> David, I can help to clean it up. Could you please confirm that the
>> following
>
> Thanks!
>
>> changes are exactly what you're suggesting? Hopefully there is nothing
>> I've missed. The original issue is fixed by these changes: with them
>> applied, madvise(MADV_COLLAPSE) returns with errno -22 in the test program.
>>
>> The Fixes tag needs to be adjusted as well.
>>
>> Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2aa986a5cd1b..45909efb0ef0 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -74,7 +74,12 @@ extern struct kobj_attribute shmem_enabled_attr;
>> /*
>> * Mask of all large folio orders supported for file THP.
>> */
>> -#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER))
>
> DAX doesn't have any MAX_PAGECACHE_ORDER restrictions (like hugetlb). So
> this should be
>
> /*
> * FSDAX never splits folios, so the MAX_PAGECACHE_ORDER limit does not
> * apply here.
> */
> #define THP_ORDERS_ALL_FILE_DAX	(BIT(PMD_ORDER) | BIT(PUD_ORDER))
>
> Something like that
>
>> +#define THP_ORDERS_ALL_FILE_DAX \
>> + ((BIT(PMD_ORDER) | BIT(PUD_ORDER)) & (BIT(MAX_PAGECACHE_ORDER + 1) - 1))
>> +#define THP_ORDERS_ALL_FILE_DEFAULT \
>> + ((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))
>> +#define THP_ORDERS_ALL_FILE \
>> + (THP_ORDERS_ALL_FILE_DAX | THP_ORDERS_ALL_FILE_DEFAULT)
>
> Maybe we can get rid of THP_ORDERS_ALL_FILE (to prevent abuse) and fix up
> THP_ORDERS_ALL instead.
>
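For instance (just a sketch built from the masks proposed in the diff above;
the exact form is up to the final patch), THP_ORDERS_ALL could then be
composed directly from the new defines so that THP_ORDERS_ALL_FILE can go
away:

	/*
	 * Sketch only: combine the anon mask with the two new file masks,
	 * removing the need for THP_ORDERS_ALL_FILE.
	 */
	#define THP_ORDERS_ALL	\
		(THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_FILE_DAX | THP_ORDERS_ALL_FILE_DEFAULT)
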
>> /*
>> * Mask of all large folio orders supported for THP.
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 2120f7478e55..4690f33afaa6 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -88,9 +88,17 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>> bool smaps = tva_flags & TVA_SMAPS;
>> bool in_pf = tva_flags & TVA_IN_PF;
>> bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
>> + unsigned long supported_orders;
>> +
>> /* Check the intersection of requested and supported orders. */
>> - orders &= vma_is_anonymous(vma) ?
>> - THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
>> + if (vma_is_anonymous(vma))
>> + supported_orders = THP_ORDERS_ALL_ANON;
>> + else if (vma_is_dax(vma))
>> + supported_orders = THP_ORDERS_ALL_FILE_DAX;
>> + else
>> + supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
>
> This is what I had in mind.
>
> But, do we have to special-case shmem as well or will that be handled
> correctly?
For anonymous shmem, it is now the same as anonymous THP, which can utilize
THP_ORDERS_ALL_ANON. For tmpfs, we currently only support PMD-sized THP
(larger orders will be supported in the future). Therefore, I think we can
reuse THP_ORDERS_ALL_ANON for shmem now:

	if (vma_is_anonymous(vma) || shmem_file(vma->vm_file))
		supported_orders = THP_ORDERS_ALL_ANON;
	......
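
Putting the pieces from this thread together, the check in
__thp_vma_allowable_orders() could then end up looking roughly like the
sketch below. This is only meant to illustrate the idea; in particular,
whether shmem really reuses THP_ORDERS_ALL_ANON here is still open:

	unsigned long supported_orders;

	/* Check the intersection of requested and supported orders. */
	if (vma_is_anonymous(vma) || shmem_file(vma->vm_file))
		supported_orders = THP_ORDERS_ALL_ANON;
	else if (vma_is_dax(vma))
		supported_orders = THP_ORDERS_ALL_FILE_DAX;
	else
		supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;

	orders &= supported_orders;
	if (!orders)
		return 0;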