Message-ID: <5ff81ef9-e755-4a75-bcce-92c4a4d1da6e@redhat.com>
Date: Tue, 24 Jun 2025 11:38:59 +0200
From: David Hildenbrand <david@...hat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Pedro Falcato <pfalcato@...e.de>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Suren Baghdasaryan <surenb@...gle.com>, Matthew Wilcox
<willy@...radead.org>, Rik van Riel <riel@...riel.com>,
Harry Yoo <harry.yoo@...cle.com>, Zi Yan <ziy@...dia.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, Nico Pache <npache@...hat.com>,
Ryan Roberts <ryan.roberts@....com>, Dev Jain <dev.jain@....com>,
Jakub Matena <matenajakub@...il.com>, Wei Yang <richard.weiyang@...il.com>,
Barry Song <baohua@...nel.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 00/11] mm/mremap: introduce more mergeable mremap via
MREMAP_RELOCATE_ANON
On 20.06.25 21:28, Lorenzo Stoakes wrote:
> On Fri, Jun 20, 2025 at 07:59:17PM +0100, Pedro Falcato wrote:
>> On Tue, Jun 17, 2025 at 11:57:11AM +0100, Lorenzo Stoakes wrote:
>>> On Tue, Jun 17, 2025 at 10:45:53AM +0200, David Hildenbrand wrote:
>>>> mremap() is already an expensive operation ... so I think we need a pretty
>>>> convincing case to make this configurable by the user at all for each
>>>> individual mremap() invocation.
>>>
>>> My measurements suggest that, unless you hit a very unfortunate case (a
>>> huge faulted-in range mapped entirely at PTE level), the work involved is
>>> not substantially greater, in terms of order of magnitude, than a normal
>>> mremap() operation.
>>>
>>
>> Could you share your measurements and/or post them in the cover letter for
>> the next version?
>
> Yeah, I am going to experiment and gather some data for the next respin and see
> what might be possible.
>
> I will present this kind of data then.
>
>>
>> If indeed it makes no practical difference, maybe we could try to enable it by
>> default and see what happens...
>
> Well, it makes a difference, but the question is how much it matters (we have
> to traverse every single PTE for faulted-in memory, whereas if we move page
> tables we can potentially move at PMD granularity, saving 512 traversals per
> PMD; but if the folios are large then we're not really slower...).
>
> I have some ideas... :)
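Just to put rough numbers on the PTE-vs-PMD point above (assuming x86-64
with 4 KiB base pages; other configurations differ):

	PMD_SIZE / PAGE_SIZE = 2 MiB / 4 KiB = 512 PTEs per PMD

So a plain page-table move touches a single PMD entry where the
relocate-anon path has to walk up to 512 PTEs and rewrite each mapped
folio; with large folios the number of folios to rewrite shrinks
accordingly, which is why the gap narrows there.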
As a first step, we could have some global way to enable/disable the
optimization system-wide. We could then learn whether any real workload
notices the change, while still having an easy way to revert to the old
behavior on affected systems.
Just a thought; I still hope we can avoid all that. Again, mremap() is
not really known for being a very efficient operation.
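To make the "global way" concrete: a minimal sketch of what such a knob
could look like, assuming a sysctl under vm/ (the name, default and
placement are made up for illustration, not taken from the series):

#include <linux/init.h>
#include <linux/sysctl.h>

/* Hypothetical knob: /proc/sys/vm/mremap_relocate_anon (0 = off, 1 = on). */
static unsigned int sysctl_mremap_relocate_anon __read_mostly = 1;

static struct ctl_table mremap_sysctls[] = {
	{
		.procname	= "mremap_relocate_anon",
		.data		= &sysctl_mremap_relocate_anon,
		.maxlen		= sizeof(unsigned int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= SYSCTL_ZERO,
		.extra2		= SYSCTL_ONE,
	},
};

static int __init mremap_sysctl_init(void)
{
	register_sysctl_init("vm", mremap_sysctls);
	return 0;
}
subsys_initcall(mremap_sysctl_init);

The mremap code would then simply consult sysctl_mremap_relocate_anon
before attempting the relocation.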
>
>>
>> Or, a separate (and maybe awful) idea: if the problem is the number of VMAs,
>> maybe we could try harder based on the map count? I.e., if
>> map_count > (max_map_count / 2), try to relocate anon.
>
> Interesting, though that'd make some things merge somewhat randomly and
> others not, and you really have to do this consistently to make things
> mergeable.
Yes, I'd prefer if we can make it more predictable.
(Of course, the VMA region size could also be used as an input to a
policy, e.g., small move -> much fragmentation -> merge; large move ->
less fragmentation -> don't care. Knowing which use cases mremap() anon
memory and how they might be affected would be very valuable. Maybe it's
mostly moves of a handful of pages where we care most about this
optimization?).
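(Again purely illustrative, with the thresholds and the helper name made
up here, but an automatic policy combining Pedro's map_count idea with
the move size could look roughly like this:

#include <linux/mm.h>

/*
 * Hypothetical policy sketch: relocate anon folios automatically when the
 * move is small (where the resulting VMA fragmentation hurts the most) or
 * when the address space is already crowded with VMAs. Thresholds are
 * invented for illustration only.
 */
static bool mremap_should_relocate_anon(struct mm_struct *mm,
					unsigned long len)
{
	if (len <= 16 * PMD_SIZE)
		return true;
	return mm->map_count > sysctl_max_map_count / 2;
}

Callers in the mremap path, which already hold the mmap lock, could then
use this instead of a per-call flag.)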
--
Cheers,
David / dhildenb