Message-ID: <5001e231-795f-4d8c-bd9d-16096e428aef@arm.com>
Date: Tue, 31 Oct 2023 13:13:10 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: David Hildenbrand <david@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Matthew Wilcox <willy@...radead.org>,
Yin Fengwei <fengwei.yin@...el.com>,
Yu Zhao <yuzhao@...gle.com>,
Catalin Marinas <catalin.marinas@....com>,
Anshuman Khandual <anshuman.khandual@....com>,
Yang Shi <shy828301@...il.com>,
"Huang, Ying" <ying.huang@...el.com>, Zi Yan <ziy@...dia.com>,
Luis Chamberlain <mcgrof@...nel.org>,
Itaru Kitayama <itaru.kitayama@...il.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
John Hubbard <jhubbard@...dia.com>,
David Rientjes <rientjes@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Hugh Dickins <hughd@...gle.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory
On 31/10/2023 12:03, David Hildenbrand wrote:
> On 31.10.23 12:55, Ryan Roberts wrote:
>> On 31/10/2023 11:50, Ryan Roberts wrote:
>>> On 06/10/2023 21:06, David Hildenbrand wrote:
>>> [...]
>>>>
>>>> Change 2: sysfs interface.
>>>>
>>>> If we call it THP, it shall go under "/sys/kernel/mm/transparent_hugepage/", I
>>>> agree.
>>>>
>>>> What we expose there and how, is TBD. Again, not a friend of "orders" and
>>>> bitmaps at all. We can do better if we want to go down that path.
>>>>
>>>> Maybe we should take a look at hugetlb, and how they added support for multiple
>>>> sizes. What *might* make sense could be (depending on which values we actually
>>>> support!)
>>>>
>>>>
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-64kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-128kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/
>>>>
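For reference, each proposed directory corresponds to a folio order, i.e. log2 of the number of base pages in the folio. The C sketch below assumes 4 KiB base pages (as on x86-64, or arm64 with a 4K granule), where 64kB is order 4 and 2048kB is the PMD size, order 9. The macro and helper names are hypothetical illustrations, not existing kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumption for illustration: 4 KiB base pages. With a 16K or 64K
 * base page the size-to-order mapping differs.
 */
#define SKETCH_PAGE_KB 4UL

static bool is_pow2(unsigned long x)
{
	return x && !(x & (x - 1));
}

/*
 * Map the <size> in a proposed hugepages-<size>kB directory name to a
 * folio order (log2 of the number of base pages). Returns -1 for sizes
 * that are not a power-of-two multiple of the base page size.
 */
static int size_kb_to_order(unsigned long size_kb)
{
	int order = 0;

	if (size_kb % SKETCH_PAGE_KB || !is_pow2(size_kb / SKETCH_PAGE_KB))
		return -1;
	for (unsigned long pages = size_kb / SKETCH_PAGE_KB; pages > 1; pages >>= 1)
		order++;
	return order;
}

/* Inverse mapping: folio order back to the directory's size in kB. */
static unsigned long order_to_size_kb(int order)
{
	assert(order >= 0 && order < 32);	/* keep the shift well-defined */
	return SKETCH_PAGE_KB << order;
}
```

Under that assumption the six directories listed above cover orders 4 through 9.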
>>>> Each one would contain an "enabled" and a "defrag" file. If we want
>>>> something minimal first, start with the "enabled" option.
>>>>
>>>>
>>>> enabled: always [global] madvise never
>>>>
>>>> Initially, we would set it to "global" for PMD-sized THP and to "never" for
>>>> everything else.
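The proposed "enabled" file uses the same convention as the existing global /sys/kernel/mm/transparent_hugepage/enabled file, where the bracketed word marks the current selection. A minimal C sketch of reading that format, plus the proposed initial defaults; the enum and function names are hypothetical, and "global" is only the value proposed in this thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical values for the proposed per-size "enabled" file. */
enum thp_policy { THP_ALWAYS, THP_GLOBAL, THP_MADVISE, THP_NEVER, THP_INVALID };

static const char *const thp_policy_names[] = {
	[THP_ALWAYS]  = "always",
	[THP_GLOBAL]  = "global",
	[THP_MADVISE] = "madvise",
	[THP_NEVER]   = "never",
};

/*
 * Parse text such as "always [global] madvise never": the bracketed word
 * is the current selection. Returns THP_INVALID if no known word is
 * bracketed.
 */
static enum thp_policy parse_enabled(const char *s)
{
	const char *open, *close;
	size_t len;

	assert(s != NULL);
	open = strchr(s, '[');
	if (!open)
		return THP_INVALID;
	close = strchr(open + 1, ']');
	if (!close)
		return THP_INVALID;
	len = (size_t)(close - open - 1);
	for (int p = THP_ALWAYS; p < THP_INVALID; p++) {
		if (strlen(thp_policy_names[p]) == len &&
		    strncmp(open + 1, thp_policy_names[p], len) == 0)
			return (enum thp_policy)p;
	}
	return THP_INVALID;
}

/*
 * Proposed initial defaults: the PMD size follows the global control,
 * every other size starts disabled.
 */
static enum thp_policy default_policy(int order, int pmd_order)
{
	return order == pmd_order ? THP_GLOBAL : THP_NEVER;
}
```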
>>>
>>> Hi David,
>>>
>>> I've just started coding this, and it occurs to me that I might need a small
>>> clarification here; the existing global "enabled" control is used to drive
>>> decisions for both anonymous memory and (non-shmem) file-backed memory. But the
>>> proposed new per-size "enabled" is implicitly only controlling anon memory (for
>>> now).
>>>
>>> 1) Is this potentially confusing for the user? Should we rename the per-size
>>> controls to "anon_enabled"? Or is it preferable to just keep it vague for now so
>>> we can reuse the same control for file-backed memory in future?
>>>
>>> 2) The global control will continue to drive the file-backed memory decision
>>> (for now), even when hugepages-2048kB/enabled != "global"; agreed?
>>>
>>> Thanks,
>>> Ryan
>>>
>>
>> Also, an implementation question:
>>
>> hugepage_vma_check() doesn't currently care whether enabled="never" for DAX VMAs
>> (although it does honour MADV_NOHUGEPAGE and the prctl); it will return true
>> regardless. Is that by design? I couldn't fathom any reasoning from the
>> commit log:
>
> The whole DAX "hugepage" and THP mixup is just plain confusing. We're simply
> using PUD/PMD mappings of DAX memory, and remapping to PMD/PTE when required
> (on VMA split, I assume, or COW).
>
> It doesn't result in any memory waste, so who really cares how it's mapped?
> Apparently we want individual processes to be able to disable PMD/PUD mappings
> of DAX using the prctl and madvise. Maybe there are good reasons.
>
> Looks like a design decision, probably some legacy leftovers.
OK, I'll ensure I keep this behaviour.
Thanks!
>