Message-ID: <d727ceb1-8396-4303-b8c1-dbaf75f760fc@arm.com>
Date: Wed, 17 Jul 2024 11:48:43 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: David Hildenbrand <david@...hat.com>, Lance Yang <ioworker0@...il.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Hugh Dickins
<hughd@...gle.com>, Jonathan Corbet <corbet@....net>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Barry Song <baohua@...nel.org>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [PATCH v1 2/2] mm: mTHP stats for pagecache folio allocations
On 17/07/2024 11:25, David Hildenbrand wrote:
> On 17.07.24 12:18, Ryan Roberts wrote:
>> On 17/07/2024 11:03, David Hildenbrand wrote:
>>>>>>>>
>>>>>>>> But today, controls and stats are exposed for:
>>>>>>>>
>>>>>>>> anon:
>>>>>>>>   min order: 2
>>>>>>>>   max order: PMD_ORDER
>>>>>>>> anon-shmem:
>>>>>>>>   min order: 2
>>>>>>>>   max order: PMD_ORDER
>>>>>>>> tmpfs-shmem:
>>>>>>>>   min order: PMD_ORDER
>>>>>>>>   max order: PMD_ORDER
>>>>>>>> file:
>>>>>>>>   min order: Nothing yet (this patch proposes 1)
>>>>>>>>   max order: Nothing yet (this patch proposes MAX_PAGECACHE_ORDER)
>>>>>>>>
>>>>>>>> So I think there is definitely a bug for shmem where the minimum order control
>>>>>>>> should be order-1 but it's currently order-2.
>>>>>>>
>>>>>>> Maybe, did not play with that yet. Likely order-1 will work. (although probably
>>>>>>> of questionable use :) )
>>>>>>
>>>>>> You might have to expand on why it's of "questionable use". I'd assume it has the
>>>>>> same amount of value as using order-1 for regular page cache pages? i.e. half
>>>>>> the number of objects to manage for the same amount of memory.
>>>>>
>>>>> order-1 was recently added for the pagecache to get some device setups running
>>>>> (IIRC, where we cannot use order-0, because device blocksize > PAGE_SIZE).
>>>>>
>>>>> You might be right about "half the number of objects", but likely just going for
>>>>> order-2, order-3, order-4 ... for shmem might be even better. And simply falling
>>>>> back to order-0 when you cannot get the larger orders.
>>>>
>>>> Sure, but then you're into the territory of baking in policy. Remember that
>>>> originally I was only interested in 64K but the consensus was to expose all the
>>>> sizes. Same argument applies to 8K; expose it and let others decide policy.
>>>
>>> I don't disagree. The point I'm trying to make is that so far there was no
>>> strong evidence that it is really required. Support for the pagecache had
>>> a different motivation for these special devices.
>>
>> Sure, but there was no clear need for anon mTHP orders other than order-2 and
>> order-4 (for arm64's HPA and contpte, respectively), but we still chose to
>> expose all the others.
>
> order-2 and order-3 are valuable for AMD EPYC (16 vs. 32 KiB coalescing,
> depending on the generation).
>
> But in general, at least for me, it's easier to argue why larger orders make
> more sense than very tiny ones.
>
> For example, order-5 can be mapped using cont-pte as well and you get roughly
> half the memory allocation+page fault overhead compared to order-4.
>
> order-1 ? No TLB optimization at least on any current HW I know.
I believe there are some variants of HPA that coalesce "up to" 4 pages, meaning
2 pages (or 3 or 4) could be coalesced into a single TLB entry. But I'm not 100%
sure on that.
>
> But I believe we're in violent agreement here :)
>
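For readers following the order arithmetic in this thread, here is a minimal sketch of how a folio order maps to a size and to the number of objects needed to cover a given amount of memory. It assumes 4 KiB base pages (the 64K and 8K figures quoted above only hold under that assumption). The helper names are hypothetical and not kernel APIs. The `hugepages-<size>kB` directory name follows the naming used by the existing per-size anon mTHP controls; whether file-backed folios would use the same layout is exactly what is under discussion here.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of an order-N folio: page_size * 2**order."""
    return page_size << order


def folios_needed(total_bytes, order, page_size=PAGE_SIZE):
    """Number of order-N folios needed to cover total_bytes (rounded up)."""
    size = order_to_bytes(order, page_size)
    return -(-total_bytes // size)  # ceiling division


def sysfs_dir(order, page_size=PAGE_SIZE):
    """Hypothetical helper: per-size directory name, e.g. hugepages-64kB."""
    return f"hugepages-{order_to_bytes(order, page_size) // 1024}kB"


if __name__ == "__main__":
    one_mib = 1 << 20
    for order in range(0, 6):
        print(order, order_to_bytes(order), folios_needed(one_mib, order),
              sysfs_dir(order))
```

Each step up in order halves the number of folios for the same amount of memory, which is the "half the number of objects" argument made for order-1 and the "roughly half the overhead" argument made for order-5 versus order-4.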