Message-ID: <8247162b-a9dc-496e-83ee-504f74378e8e@redhat.com>
Date: Thu, 31 Oct 2024 11:46:20 +0100
From: David Hildenbrand <david@...hat.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>, Daniel Gomez <d@...ces.com>,
Daniel Gomez <da.gomez@...sung.com>,
"Kirill A. Shutemov" <kirill@...temov.name>
Cc: Matthew Wilcox <willy@...radead.org>, akpm@...ux-foundation.org,
hughd@...gle.com, wangkefeng.wang@...wei.com, 21cnbao@...il.com,
ryan.roberts@....com, ioworker0@...il.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>
Subject: Re: [RFC PATCH v3 0/4] Support large folios for tmpfs
>>> I am still worried about adding a new kconfig option, which might
>>> complicate the tmpfs controls further.
>>
>> Why exactly?
>
> There will be more options to control huge pages allocation for tmpfs,
> which may confuse users and make life harder? Yes, we can add some
> documentation, but I'm still a bit cautious about this.
If it's just changing the default from "huge=never" to "huge=X", I don't
see a big problem here. Again, we already do that for anon THPs.

If we make more behavior depend on that (which I don't think we should
be doing), I agree that it would be more controversial.
[..]
>>>
>>>> That should probably do as a first shot; I assume people will want more
>>>> control over which size to use, especially during page faults, but that
>>>> can likely be added later.
>>
>> I know, it puts you in a bad position because there are different
>> opinions floating around. But let's try to find something that is
>> reasonable and still acceptable. And let's hope that Hugh will voice an
>> opinion :D
>
> Yes, I am also waiting to see if Hugh has any inputs :)
We keep saying that ... I have to find a way to summon him :)
>
>>> After some discussions, I think the first step is to achieve two goals:
>>> 1) Try to make tmpfs use large folios like other file systems, that
>>> means we should avoid adding more complex control options (per Matthew).
>>> 2) Still need to maintain compatibility with the 'huge=' mount option (per
>>> Kirill), as I also remembered we have customers who use
>>> 'huge=within_size' to allocate THPs for better performance.
>>
>>>
>>> Based on these considerations, my first step is to neither add a new
>>> 'huge=' option parameter nor introduce the mTHP interfaces control for
>>> tmpfs, but rather to change the default huge allocation behavior for
>>> tmpfs. That is to say, when 'huge=' option is not configured, we will
>>> allow the huge folios allocation based on the write size. As a result,
>>> the behavior of huge pages for tmpfs will change as follows:
>>> no 'huge=' set: can allocate any size huge folios based on write size
>>> huge=never: no huge folios of any size
>>> huge=always: only PMD sized THP allocation as before
>>> huge=fadvise: like "always" but only with fadvise/madvise
>>> huge=within_size: like "fadvise" but respect i_size
>>
>> I don't like that:
>>
>> (a) there is no way to explicitly enable/name that new behavior.
>
> But this is similar to other file systems that enable large folios
> (setting mapping_set_large_folios()), and I haven't seen any other file
> systems supporting large folios requiring a new Kconfig. Maybe tmpfs is
> a bit special?
I'm afraid I don't have the energy to explain once more why I think
tmpfs is not just like any other file system in some cases.
And distributions are rather careful when it comes to something like
this ...
>
> If we all agree that tmpfs is a bit special when using huge pages, then
> fine, a Kconfig option might be needed.
>
>> (b) "always" etc. are only concerned about PMDs.
>
> Yes, this currently maintains the same semantics as before, in case users
> still expect THPs.
Again, I don't think that is a reasonable approach to make PMD-sized
ones special here. It will all get seriously confusing and inconsistent.

THPs are opportunistic after all, and page fault behavior will remain
unchanged (PMD-sized) for now. And even if we support other sizes during
page faults, we'd likely start with the largest size (PMD-size) first, and
it might just all work better than before.

Happy to learn where this really makes a difference.

Of course, if you change the default behavior (which you are planning),
it's ... a changed default.

If there are reasons to have more tunables regarding the sizes to use,
then it should not be limited to PMD-size.
>
>> So again, I suggest:
>>
>> huge=never: No THPs of any size
>> huge=always: THPs of any size
>> huge=fadvise: like "always" but only with fadvise/madvise
>> huge=within_size: like "fadvise" but respect i_size
>>
>> "huge=" default depends on a Kconfig option.
>>
>> With that we:
>>
>> (1) Maximize the cases where we will use large folios of any sizes
>> (which Willy cares about).
>> (2) Have a way to disable them completely (which I care about).
>> (3) Allow distros to keep the default unchanged.
>>
>> Likely, for now we will only try allocating PMD-sized THPs during page
>> faults, and allocate different sizes only during write(). So the effect
>> for many use cases (VMs, DBs) that primarily mmap() tmpfs files will be
>> completely unchanged even with "huge=always".
>>
>> It will get more tricky once we change that behavior as well, but that's
>> something to likely figure out if it is a real problem on a different
>> day :)
>>
>>
>> I really preferred using the sysfs toggles (as discussed with Hugh in
>> the meeting back then), but I can also understand why we at least want
>> to try making tmpfs behave more like other file systems. But I'm a bit
>> more careful to not ignore the cases where it really isn't like any
>> other file system.
>
> That's also my previous thought, but Matthew is strongly against that.
> Let's go step by step.
Yes, I understand his view as well.
But I won't blindly agree to the "tmpfs is just like any other file
system" opinion :)
>
>> If we start making PMD-sized THPs special in any non-configurable way,
>> then we are effectively *worse* off than allowing to configure them
>> properly. So if someone voices "but we want only PMD-sized" ones, the
>> next one will say "but we only want cont-pte sized-ones" and then we
>> should provide an option to control the actual sizes to use differently,
>> in some way. But let's see if that is even required.
>
> Yes, I agree. So what I am thinking is, the 'huge=' option should be
> gradually deprecated in the future and eventually tmpfs can allocate any
> size large folios as default.
Let's be realistic, it won't get removed any time soon. ;)
So changing "huge=always" etc. semantics to reflect our new size
options, and then try changing the default (with the option for
people/distros to have the old default) is a reasonable approach, at
least to me.
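
To make the suggested semantics concrete, here is a rough userspace sketch,
not kernel code. Every name in it is hypothetical, it assumes 4 KiB base
pages and an order-9 PMD, and the "within_size" branch follows the wording
of the list above literally (hint required, folio must end within i_size),
which may not match what the tree does today. It only illustrates three
points: "huge=" decides whether any large folio is allowed, write() picks
an order from the write size, and page faults stay PMD-sized for now.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical and exist only for this sketch. */
enum huge_opt { HUGE_NEVER, HUGE_ALWAYS, HUGE_ADVISE, HUGE_WITHIN_SIZE };
enum alloc_path { PATH_WRITE, PATH_FAULT };

#define PAGE_SHIFT_SKETCH 12	/* assumption: 4 KiB base pages */
#define PMD_ORDER_SKETCH   9	/* assumption: 2 MiB PMD-sized THP */

/* Stand-in for the suggested Kconfig option selecting the "huge=" default. */
#ifndef TMPFS_HUGE_DEFAULT_SKETCH
#define TMPFS_HUGE_DEFAULT_SKETCH HUGE_NEVER
#endif

/* An explicit mount option wins; otherwise use the build-time default. */
static enum huge_opt effective_opt(bool mount_opt_given, enum huge_opt mount_opt)
{
	return mount_opt_given ? mount_opt : TMPFS_HUGE_DEFAULT_SKETCH;
}

/* Largest order (capped at PMD order) whose folio is no bigger than len. */
static unsigned int order_for_len(size_t len)
{
	unsigned int order = 0;

	while (order < PMD_ORDER_SKETCH &&
	       ((size_t)1 << (PAGE_SHIFT_SKETCH + order + 1)) <= len)
		order++;
	return order;
}

/*
 * Highest folio order to try for an allocation at byte offset pos.
 * A result of 0 means "no THP of any size".
 */
static unsigned int max_order(enum huge_opt opt, enum alloc_path path,
			      bool advised, size_t pos, size_t len,
			      size_t i_size)
{
	unsigned int order;

	assert(len > 0);

	if (opt == HUGE_NEVER)
		return 0;
	/* "fadvise" and, per the list's wording, "within_size" need a hint. */
	if (opt != HUGE_ALWAYS && !advised)
		return 0;

	/* Faults: PMD-sized only for now. write(): choose by write size. */
	order = (path == PATH_FAULT) ? PMD_ORDER_SKETCH : order_for_len(len);

	if (opt == HUGE_WITHIN_SIZE) {
		size_t size = (size_t)1 << (PAGE_SHIFT_SKETCH + order);
		size_t start = pos & ~(size - 1);	/* natural alignment */

		/* Simplification: give up rather than retry a smaller order. */
		if (start + size > i_size)
			return 0;
	}
	return order;
}
```

Under these assumptions, a 64 KiB write() with "huge=always" yields order 4
(a 64 KiB folio), while a page fault under the same option still asks for
order 9, so mmap()-heavy users would see no change.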
I'm trying to stay open-minded here, but the proposal I heard so far is
not particularly appealing.
--
Cheers,
David / dhildenb