[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <28f311ec-9b46-4f28-991c-ac74177acf32@redhat.com>
Date: Wed, 8 May 2024 14:10:21 +0200
From: David Hildenbrand <david@...hat.com>
To: Ryan Roberts <ryan.roberts@....com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, akpm@...ux-foundation.org,
hughd@...gle.com
Cc: willy@...radead.org, ioworker0@...il.com, wangkefeng.wang@...wei.com,
ying.huang@...el.com, 21cnbao@...il.com, shy828301@...il.com,
ziy@...dia.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/8] mm: shmem: add multi-size THP sysfs interface for
anonymous shmem
On 08.05.24 14:02, David Hildenbrand wrote:
> On 08.05.24 11:02, Ryan Roberts wrote:
>> On 08/05/2024 08:12, David Hildenbrand wrote:
>>> On 08.05.24 09:08, David Hildenbrand wrote:
>>>> On 08.05.24 06:45, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 2024/5/7 18:52, Ryan Roberts wrote:
>>>>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>>>>> To support the use of mTHP with anonymous shmem, add a new sysfs interface
>>>>>>> 'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>>>>>> directory for each mTHP to control whether shmem is enabled for that mTHP,
>>>>>>> with a value similar to the top level 'shmem_enabled', which can be set to:
>>>>>>> "always", "inherit (to inherit the top level setting)", "within_size",
>>>>>>> "advise",
>>>>>>> "never", "deny", "force". These values follow the same semantics as the top
>>>>>>> level, except the 'deny' is equivalent to 'never', and 'force' is equivalent
>>>>>>> to 'always' to keep compatibility.
>>>>>>
>>>>>> We decided at [1] to not allow 'force' for non-PMD-sizes.
>>>>>>
>>>>>> [1]
>>>>>> https://lore.kernel.org/linux-mm/533f37e9-81bf-4fa2-9b72-12cdcb1edb3f@redhat.com/
>>>>>>
>>>>>> However, thinking about this a bit more, I wonder if the decision we made to
>>>>>> allow all hugepages-xxkB/enabled controls to take "inherit" was the wrong one.
>>>>>> Perhaps we should have only allowed the PMD-sized enable=inherit (this is just
>>>>>> for legacy back compat after all, I don't think there is any use case where
>>>>>> changing multiple mTHP size controls atomically is actually useful). Applying
>>>>>
>>>>> Agree. This is also our usage of 'inherit'.
>>>
>>> Missed that one: there might be use cases in the future once we would start
>>> defaulting to "inherit" for all knobs (a distro might default to that) and
>>> default-enable THP in the global knob. Then, it would be easy to disable any THP
>>> by disabling the global knob. (I think that's the future we're heading to, where
>>> we'd have an "auto" mode that can be set on the global toggle).
>>>
>>> But I am just making up use cases ;) I think it will be valuable and just doing
>>> it consistently now might be cleaner.
>>
>> I agree that consistency between enabled and shmem_enabled is top priority. And
>> yes, I had forgotten about the glorious "auto" future. So probably continuing
>> all sizes to select "inherit" is best.
>>
>> But for shmem_enabled, that means we need the following error checking:
>>
>> - It is an error to set "force" for any size except PMD-size
>>
>> - It is an error to set "force" for the global control if any size except PMD-
>> size is set to "inherit"
>>
>> - It is an error to set "inherit" for any size except PMD-size if the global
>> control is set to "force".
>>
>> Certainly not too difficult to code and prove to be correct, but not the nicest
>> UX from the user's point of view when they start seeing errors.
>>
>> I think we previously said this would likely be temporary, and if/when tmpfs
>> gets mTHP support, we could simplify and allow all sizes to be set to "force".
>> But I wonder if tmpfs would ever need explicit mTHP control? Maybe it would be
>> more suited to the approach the page cache takes to transparently ramp up the
>> folio size as it faults more in. (Just saying there is a chance that this error
>> checking becomes permanent).
>
> Note that with shmem you're inherently facing the same memory waste
> issues etc as you would with anonymous memory. (sometimes even worse, if
> you're running shmem that's configured to be unswappable!).
Also noting that memory waste is not really a problem when a write to a
shmem file allocates a large folio that stays within boundaries of that
write; issues only pop up if you end up over-allocating, especially,
during page faults where you have not that much clue about what to do
(single address, no real range provided).
There is the other issue that wasting large chunks of contiguous memory
on stuff that barely benefits from it. With memory that maybe never gets
evicted, there is no automatic "handing back" of that memory to the
system to be used by something else. With ordinary files, that's a bit
different. But I did not look closer into that issue yet, it's one of
the reasons MADV_HUGEPAGE was added IIRC.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists