linux-kernel - Re: [RFC PATCH v3 0/4] Support large folios for tmpfs

Message-ID: <8247162b-a9dc-496e-83ee-504f74378e8e@redhat.com>
Date: Thu, 31 Oct 2024 11:46:20 +0100
From: David Hildenbrand <david@...hat.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>, Daniel Gomez <d@...ces.com>,
 Daniel Gomez <da.gomez@...sung.com>,
 "Kirill A. Shutemov" <kirill@...temov.name>
Cc: Matthew Wilcox <willy@...radead.org>, akpm@...ux-foundation.org,
 hughd@...gle.com, wangkefeng.wang@...wei.com, 21cnbao@...il.com,
 ryan.roberts@....com, ioworker0@...il.com, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org,
 "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>
Subject: Re: [RFC PATCH v3 0/4] Support large folios for tmpfs

>>> I am still worried about adding a new kconfig option, which might
>>> complicate the tmpfs controls further.
>>
>> Why exactly?
> 
> There will be more options to control huge pages allocation for tmpfs,
> which may confuse users and make life harder? Yes, we can add some
> documentation, but I'm still a bit cautious about this.

If it's just changing the default from "huge=never" to "huge=X", I don't 
see a big problem here. Again, we already do that for anon THPs.

If we make more behavior depend on that (which I don't think we should 
be doing), I agree that it would be more controversial.

[..]

>>>
>>>> That should probably do as a first shot; I assume people will want more
>>>> control over which size to use, especially during page faults, but that
>>>> can likely be added later.
>>
>> I know, it puts you in a bad position because there are different
>> opinions floating around. But let's try to find something that is
>> reasonable and still acceptable. And let's hope that Hugh will voice an
>> opinion :D
> 
> Yes, I am also waiting to see if Hugh has any inputs :)

We keep saying that ... I have to find a way to summon him :)

> 
>>> After some discussions, I think the first step is to achieve two goals:
>>> 1) Try to make tmpfs use large folios like other file systems, that
>>> means we should avoid adding more complex control options (per Matthew).
>>> 2) Still need to maintain compatibility with the 'huge=' mount option (per
>>> Kirill), as I also remembered we have customers who use
>>> 'huge=within_size' to allocate THPs for better performance.
>>
>>>
>>> Based on these considerations, my first step is to neither add a new
>>> 'huge=' option parameter nor introduce the mTHP interfaces control for
>>> tmpfs, but rather to change the default huge allocation behavior for
>>> tmpfs. That is to say, when 'huge=' option is not configured, we will
>>> allow the huge folios allocation based on the write size. As a result,
>>> the behavior of huge pages for tmpfs will change as follows:
>>> no 'huge=' set: can allocate any size huge folios based on write size
>>> huge=never: no any size huge folios
>>> huge=always: only PMD sized THP allocation as before
>>> huge=fadvise: like "always" but only with fadvise/madvise
>>> huge=within_size: like "fadvise" but respect i_size
>>
>> I don't like that:
>>
>> (a) there is no way to explicitly enable/name that new behavior.
> 
> But this is similar to other file systems that enable large folios
> (setting mapping_set_large_folios()), and I haven't seen any other file
> systems supporting large folios requiring a new Kconfig. Maybe tmpfs is
> a bit special?

I'm afraid I don't have the energy to explain once more why I think 
tmpfs is not just like any other file system in some cases.

And distributions are rather careful when it comes to something like 
this ...

> 
> If we all agree that tmpfs is a bit special when using huge pages, then
> fine, a Kconfig option might be needed.
> 
>> (b) "always" etc. are only concerned about PMDs.
> 
> Yes, currently maintain the same semantics as before, in case users
> still expect THPs.

Again, I don't think that is a reasonable approach to make PMD-sized 
ones special here. It will all get seriously confusing and inconsistent.

THPs are opportunistic after all, and page fault behavior will remain 
unchanged (PMD-sized) for now. And even if we support other sizes during 
page faults, we'd likely start with the largest size (PMD-size) first, and 
it might well just all work better than before.
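
To illustrate the "largest size first" idea, here is a minimal user-space sketch, not kernel code. The names pick_fault_order, allowed_orders, largest_free_order and MODEL_PMD_ORDER are made up for this example, and a PMD order of 9 assumes 4 KiB base pages with 2 MiB PMDs.

```c
#include <assert.h>

/* Assumption: PMD order for 4 KiB base pages and 2 MiB PMDs (e.g. x86-64). */
#define MODEL_PMD_ORDER 9

/*
 * Hypothetical helper, not a kernel function.
 *
 * allowed_orders:     bitmask, bit N set means order-N folios are permitted.
 * largest_free_order: largest order the (modelled) allocator can satisfy.
 *
 * Walks the permitted orders from PMD order downwards and returns the first
 * one that can be satisfied, or 0 (a single base page) if none can.
 */
int pick_fault_order(unsigned long allowed_orders, int largest_free_order)
{
    assert(largest_free_order >= 0);

    for (int order = MODEL_PMD_ORDER; order > 0; order--) {
        if (!(allowed_orders & (1UL << order)))
            continue;           /* this size is not enabled */
        if (order <= largest_free_order)
            return order;       /* largest enabled size that fits */
    }
    return 0;                   /* fall back to a base page */
}
```

With only bit 9 set in allowed_orders, the result is either the PMD order or the base-page fallback, which matches the unchanged page fault behavior described above.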

Happy to learn where this really makes a difference.

Of course, if you change the default behavior (which you are planning), 
it's ... a changed default.

If there are reasons to have more tunables regarding the sizes to use, 
then it should not be limited to PMD-size.

> 
>> So again, I suggest:
>>
>> huge=never: No THPs of any size
>> huge=always: THPs of any size
>> huge=fadvise: like "always" but only with fadvise/madvise
>> huge=within_size: like "fadvise" but respect i_size
>>
>> "huge=" default depends on a Kconfig option.
>>
>> With that we:
>>
>> (1) Maximize the cases where we will use large folios of any sizes
>>       (which Willy cares about).
>> (2) Have a way to disable them completely (which I care about).
>> (3) Allow distros to keep the default unchanged.
>>
>> Likely, for now we will only try allocating PMD-sized THPs during page
>> faults, and allocate different sizes only during write(). So the effect
>> for many use cases (VMs, DBs) that primarily mmap() tmpfs files will be
>> completely unchanged even with "huge=always".
>>
>> It will get more tricky once we change that behavior as well, but that's
>> something to likely figure out if it is a real problem on a different
>> day :)
>>
>>
>> I really preferred using the sysfs toggles (as discussed with Hugh in
>> the meeting back then), but I can also understand why we at least want
>> to try making tmpfs behave more like other file systems. But I'm a bit
>> more careful to not ignore the cases where it really isn't like any
>> other file system.
> 
> That's also my previous thought, but Matthew is strongly against that.
> Let's step by step.

Yes, I understand his view as well.

But I won't blindly agree to the "tmpfs is just like any other file 
system" opinion :)

> 
>> If we start making PMD-sized THPs special in any non-configurable way,
>> then we are effectively *worse* off than allowing to configure them
>> properly. So if someone voices "but we want only PMD-sized" ones, the
>> next one will say "but we only want cont-pte sized-ones" and then we
>> should provide an option to control the actual sizes to use differently,
>> in some way. But let's see if that is even required.
> 
> Yes, I agree. So what I am thinking is, the 'huge=' option should be
> gradually deprecated in the future and eventually tmpfs can allocate any
> size large folios as default.

Let's be realistic, it won't get removed any time soon. ;)

So changing "huge=always" etc. semantics to reflect our new size 
options, and then try changing the default (with the option for 
people/distros to have the old default) is a reasonable approach, at 
least to me.
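
As a rough illustration of that approach, here is a minimal user-space sketch, not kernel code. Every identifier in it (huge_mode, DEFAULT_HUGE_MODE, resolve_huge_mode, max_folio_order, MODEL_PMD_ORDER) is hypothetical; DEFAULT_HUGE_MODE merely stands in for whatever the Kconfig option would select, and a PMD order of 9 assumes 4 KiB base pages with 2 MiB PMDs. Only "never" and "always" are modelled; the hint-based and i_size-based values are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model; these names are illustrative, not kernel identifiers. */
enum huge_mode { HUGE_INVALID = -1, HUGE_NEVER, HUGE_ALWAYS };

/*
 * Stand-in for the proposed build-time default. A distro that wants the
 * old behavior would leave this at HUGE_NEVER.
 */
#ifndef DEFAULT_HUGE_MODE
#define DEFAULT_HUGE_MODE HUGE_NEVER
#endif

/* Assumption: PMD order for 4 KiB base pages and 2 MiB PMDs (e.g. x86-64). */
#define MODEL_PMD_ORDER 9

/*
 * An explicit huge= value wins; without one (opt == NULL), the build-time
 * default applies. Unmodelled or unknown values yield HUGE_INVALID.
 */
enum huge_mode resolve_huge_mode(const char *opt)
{
    if (opt == NULL)
        return DEFAULT_HUGE_MODE;
    if (strcmp(opt, "never") == 0)
        return HUGE_NEVER;
    if (strcmp(opt, "always") == 0)
        return HUGE_ALWAYS;
    return HUGE_INVALID;
}

/*
 * Largest folio order the mode permits: base pages only for "never",
 * any size up to PMD order for "always" (no special case for PMD size).
 */
unsigned int max_folio_order(enum huge_mode mode)
{
    assert(mode == HUGE_NEVER || mode == HUGE_ALWAYS);
    return mode == HUGE_ALWAYS ? MODEL_PMD_ORDER : 0;
}
```

Building with -DDEFAULT_HUGE_MODE=HUGE_ALWAYS models the changed default, while an explicit "never" still disables large folios of every size.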

I'm trying to stay open-minded here, but the proposal I heard so far is 
not particularly appealing.

-- 
Cheers,

David / dhildenb