Message-ID: <df83a218-e2e5-496e-999a-e446a7d0b383@redhat.com>
Date: Sat, 13 Jul 2024 03:03:10 +0200
From: David Hildenbrand <david@...hat.com>
To: Gavin Shan <gshan@...hat.com>, Matthew Wilcox <willy@...radead.org>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 akpm@...ux-foundation.org, william.kucharski@...cle.com,
 ryan.roberts@....com, shan.gavin@...il.com
Subject: Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed

On 12.07.24 07:39, Gavin Shan wrote:
> On 7/12/24 7:03 AM, David Hildenbrand wrote:
>> On 11.07.24 22:46, Matthew Wilcox wrote:
>>> On Thu, Jul 11, 2024 at 08:48:40PM +1000, Gavin Shan wrote:
>>>> +++ b/mm/huge_memory.c
>>>> @@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>>            while (orders) {
>>>>                addr = vma->vm_end - (PAGE_SIZE << order);
>>>> -            if (thp_vma_suitable_order(vma, addr, order))
>>>> +            if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) &&
>>>> +                thp_vma_suitable_order(vma, addr, order))
>>>>                    break;
>>>
>>> Why does 'orders' even contain potential orders that are larger than
>>> MAX_PAGECACHE_ORDER?
>>>
>>> We do this at the top:
>>>
>>>           orders &= vma_is_anonymous(vma) ?
>>>                           THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
>>>
>>> include/linux/huge_mm.h:#define THP_ORDERS_ALL_FILE     (BIT(PMD_ORDER) | BIT(PUD_ORDER))
>>>
>>> ... and that seems very wrong.  We support all kinds of orders for
>>> files, not just PMD order.  We don't support PUD order at all.
>>>
>>> What the hell is going on here?
>>
>> Yes, that's just absolutely confusing. I recently mentioned to Ryan that we should clean that up (I wanted to look into it, but am happy if someone else can help).
>>
>> There should likely be different defines for
>>
>> DAX (PMD|PUD)
>>
>> SHMEM (PMD) -- but soon more. Not sure if we want separate ANON_SHMEM for the time being. Hm. But shmem is already handled separately, so maybe we can just ignore shmem here.
>>
>> PAGECACHE (1 .. MAX_PAGECACHE_ORDER)
>>
>> ? But it's still unclear to me.
>>
>> At least DAX must stay special I think, and PAGECACHE should be capped at MAX_PAGECACHE_ORDER.
>>
> 
> David, I can help to clean it up. Could you please help to confirm the following

Thanks!

> changes are exactly what you're suggesting? Hopefully, there is nothing I've missed.
> The original issue can be fixed by the changes. With the changes applied, madvise(MADV_COLLAPSE)
> fails with errno 22 (EINVAL) in the test program.
> 
> The Fixes tag needs to be adjusted as well.
> 
> Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2aa986a5cd1b..45909efb0ef0 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -74,7 +74,12 @@ extern struct kobj_attribute shmem_enabled_attr;
>    /*
>     * Mask of all large folio orders supported for file THP.
>     */
> -#define THP_ORDERS_ALL_FILE    (BIT(PMD_ORDER) | BIT(PUD_ORDER))

DAX, like hugetlb, doesn't have any MAX_PAGECACHE_ORDER restrictions, so
this should be

/*
  * FSDAX never splits folios, so the MAX_PAGECACHE_ORDER limit does not
  * apply here.
  */
#define THP_ORDERS_ALL_FILE_DAX	(BIT(PMD_ORDER) | BIT(PUD_ORDER))

Something like that

> +#define THP_ORDERS_ALL_FILE_DAX                \
> +       ((BIT(PMD_ORDER) | BIT(PUD_ORDER)) & (BIT(MAX_PAGECACHE_ORDER + 1) - 1))
> +#define THP_ORDERS_ALL_FILE_DEFAULT    \
> +       ((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))
> +#define THP_ORDERS_ALL_FILE            \
> +       (THP_ORDERS_ALL_FILE_DAX | THP_ORDERS_ALL_FILE_DEFAULT)

Maybe we can get rid of THP_ORDERS_ALL_FILE (to prevent abuse) and fixup
THP_ORDERS_ALL instead.
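
To make the difference between the two DAX definitions concrete, here is a
small userspace sketch of the mask arithmetic. The values of PMD_ORDER,
PUD_ORDER and MAX_PAGECACHE_ORDER below are assumptions picked for
illustration (a configuration where PMD_ORDER exceeds MAX_PAGECACHE_ORDER),
and mask_has_order() and print_masks() are hypothetical helpers, not kernel
functions.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed, illustrative values: PMD_ORDER is above MAX_PAGECACHE_ORDER. */
#define PMD_ORDER            13
#define PUD_ORDER            26
#define MAX_PAGECACHE_ORDER  11

#define BIT(n) (1UL << (n))

/* DAX mask as written in the quoted patch: capped by MAX_PAGECACHE_ORDER. */
#define DAX_MASK_CAPPED \
	((BIT(PMD_ORDER) | BIT(PUD_ORDER)) & (BIT(MAX_PAGECACHE_ORDER + 1) - 1))

/* DAX mask as suggested in the reply: no MAX_PAGECACHE_ORDER cap. */
#define DAX_MASK_UNCAPPED (BIT(PMD_ORDER) | BIT(PUD_ORDER))

/* Regular page cache: orders 1 .. MAX_PAGECACHE_ORDER. */
#define FILE_DEFAULT_MASK ((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))

/* Hypothetical helper: is 'order' set in 'mask'? */
static int mask_has_order(unsigned long mask, unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long));
	return (mask & BIT(order)) != 0;
}

/* Hypothetical helper: dump the three masks for inspection. */
static void print_masks(void)
{
	printf("DAX capped:   0x%lx\n", DAX_MASK_CAPPED);
	printf("DAX uncapped: 0x%lx\n", DAX_MASK_UNCAPPED);
	printf("file default: 0x%lx\n", FILE_DEFAULT_MASK);
}
```

With these assumed values the capped DAX mask comes out empty, so DAX would
lose both PMD and PUD orders, while the default file mask covers orders 1
to 11 and excludes PMD order as intended.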

>    
>    /*
>     * Mask of all large folio orders supported for THP.
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 2120f7478e55..4690f33afaa6 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -88,9 +88,17 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>           bool smaps = tva_flags & TVA_SMAPS;
>           bool in_pf = tva_flags & TVA_IN_PF;
>           bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
> +       unsigned long supported_orders;
> +
>           /* Check the intersection of requested and supported orders. */
> -       orders &= vma_is_anonymous(vma) ?
> -                       THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
> +       if (vma_is_anonymous(vma))
> +               supported_orders = THP_ORDERS_ALL_ANON;
> +       else if (vma_is_dax(vma))
> +               supported_orders = THP_ORDERS_ALL_FILE_DAX;
> +       else
> +               supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;

This is what I had in mind.

But, do we have to special-case shmem as well or will that be handled 
correctly?
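
As a rough userspace model of the three-way selection in the hunk above
(only a sketch, not the kernel code): enum vma_kind and allowable_orders()
are hypothetical stand-ins for vma_is_anonymous()/vma_is_dax() and the top
of __thp_vma_allowable_orders(), the order values are assumed, and the anon
mask is a placeholder covering orders 2 .. PMD_ORDER. shmem is deliberately
not modelled, since how it should be handled is the open question.

```c
#include <assert.h>

#define BIT(n) (1UL << (n))

/* Assumed, illustrative values: PMD_ORDER is above MAX_PAGECACHE_ORDER. */
#define PMD_ORDER            13
#define PUD_ORDER            26
#define MAX_PAGECACHE_ORDER  11

/* Placeholder anon mask (assumption): orders 2 .. PMD_ORDER. */
#define ORDERS_ANON          ((BIT(PMD_ORDER + 1) - 1) & ~(BIT(0) | BIT(1)))
/* DAX: PMD and PUD orders, no MAX_PAGECACHE_ORDER cap. */
#define ORDERS_FILE_DAX      (BIT(PMD_ORDER) | BIT(PUD_ORDER))
/* Regular page cache: orders 1 .. MAX_PAGECACHE_ORDER. */
#define ORDERS_FILE_DEFAULT  ((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))

/* Hypothetical stand-in for the vma_is_anonymous()/vma_is_dax() checks. */
enum vma_kind { VMA_ANON, VMA_DAX, VMA_FILE };

/* Intersect the requested orders with what this kind of VMA supports. */
static unsigned long allowable_orders(enum vma_kind kind,
				      unsigned long requested)
{
	unsigned long supported_orders;

	assert(kind == VMA_ANON || kind == VMA_DAX || kind == VMA_FILE);

	if (kind == VMA_ANON)
		supported_orders = ORDERS_ANON;
	else if (kind == VMA_DAX)
		supported_orders = ORDERS_FILE_DAX;
	else
		supported_orders = ORDERS_FILE_DEFAULT;

	return requested & supported_orders;
}
```

Under these assumptions, allowable_orders(VMA_FILE, BIT(PMD_ORDER)) yields
an empty mask, whereas the same request for VMA_DAX or VMA_ANON keeps the
PMD order bit set.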

-- 
Cheers,

David / dhildenb

