Message-ID: <68d8c7ad-aea0-4556-be63-9b67d70e4386@redhat.com>
Date: Tue, 17 Jun 2025 10:45:53 +0200
From: David Hildenbrand <david@...hat.com>
To: Pedro Falcato <pfalcato@...e.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 Andrew Morton <akpm@...ux-foundation.org>, Vlastimil Babka <vbabka@...e.cz>,
 Jann Horn <jannh@...gle.com>, "Liam R . Howlett" <Liam.Howlett@...cle.com>,
 Suren Baghdasaryan <surenb@...gle.com>, Matthew Wilcox
 <willy@...radead.org>, Rik van Riel <riel@...riel.com>,
 Harry Yoo <harry.yoo@...cle.com>, Zi Yan <ziy@...dia.com>,
 Baolin Wang <baolin.wang@...ux.alibaba.com>, Nico Pache <npache@...hat.com>,
 Ryan Roberts <ryan.roberts@....com>, Dev Jain <dev.jain@....com>,
 Jakub Matena <matenajakub@...il.com>, Wei Yang <richard.weiyang@...il.com>,
 Barry Song <baohua@...nel.org>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 00/11] mm/mremap: introduce more mergeable mremap via
 MREMAP_RELOCATE_ANON

On 17.06.25 10:34, Pedro Falcato wrote:
> On Mon, Jun 16, 2025 at 10:41:20PM +0200, David Hildenbrand wrote:
>> On 16.06.25 22:24, David Hildenbrand wrote:
>>> Hi Lorenzo,
>>>
>>> as discussed offline, there is a lot going on and this is rather ... a
>>> lot of code+complexity for something that is more of a corner case. :)
>>>
>>> Corner-case as in: only select user space will benefit from this, which
>>> is really a shame.
>>>
>>> After your presentation at LSF/MM, I thought about this further, and I
>>> was wondering whether:
>>>
>>> (a) We cannot make this semi-automatic, avoiding flags.
>>>
>>> (b) We cannot simplify further by limiting it to the common+easy cases
>>> first.
>>>
>>> I think you already did (b) to some degree as part of this non-RFC, which
>>> is great.
>>>
>>>
>>> So before digging into the details, let's discuss the high level problem
>>> briefly.
>>>
>>> I think there are three parts to it:
>>>
>>> (1) Detecting whether it is safe to adjust the folio->index (small
>>>        folios)
>>>
>>> (2) Performance implications of doing so
>>>
>>> (3) Detecting whether it is safe to adjust the folio->index (large
>>>        PTE-mapped folios)
>>>
>>>
>>> Regarding (1), if we simply track whether a folio was ever used for
>>> COW-sharing, it would be very easy: and not only for present folios, but
>>> for any anon folios that are referenced by swap/migration entries.
>>> Skimming over patch #1, I think you apply a similar logic, which is good.
>>>
>>> Regarding (2), it would apply when we mremap() anon VMAs and they happen
>>> to reside next to other anon VMAs. Which workloads are we concerned
>>> about harming by implementing this optimization? I recall that the most
>>> common use case for mremap() is actually for file mappings, but I might
> 
> realloc() for mmapped allocations commonly calls mremap(), FYI (at least for
> glibc, and musl; can't bother to look at the rest).

Good point. Only for larger areas, I assume, where glibc would already
fall back to expensive mmap()+munmap() instead of using the optimized
sparse area.
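To make the realloc() point concrete: an allocator that serves a large chunk with its own mmap() can grow it with mremap(MREMAP_MAYMOVE) rather than mmap()+copy+munmap(), letting the kernel move the already-populated anon pages to a new address. The C sketch below is only an illustration under stated assumptions: it assumes Linux with glibc (for `_GNU_SOURCE` and `mremap()`), and `grow_anon()` and `demo_grow()` are hypothetical helpers, not glibc's actual realloc() code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: grow an anonymous private mapping the way an
 * mmap-backed realloc() might. MREMAP_MAYMOVE lets the kernel relocate
 * the range if it cannot be extended in place.
 * Returns the new address, or NULL on failure (old mapping left intact). */
static void *grow_anon(void *old, size_t old_len, size_t new_len)
{
    void *p = mremap(old, old_len, new_len, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical demo: map a few pages, touch them so anon folios are
 * actually populated, grow the mapping 16x and check that the data
 * came along. Returns 0 on success, -1 if mmap() or mremap() fails. */
static int demo_grow(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t old_len = 4 * page, new_len = 64 * page;

    char *buf = mmap(NULL, old_len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memset(buf, 'x', old_len);            /* fault the pages in */

    char *grown = grow_anon(buf, old_len, new_len);
    if (grown == NULL) {
        munmap(buf, old_len);             /* mremap() failed: old mapping still valid */
        return -1;
    }

    assert(grown[0] == 'x' && grown[old_len - 1] == 'x'); /* old contents kept */
    assert(grown[new_len - 1] == 0);      /* newly added tail is zero-filled */
    munmap(grown, new_len);
    return 0;
}
```

Whether the range is extended in place or moved depends on what happens to be mapped after it; either way the caller only sees the returned address.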

> 
>>> be wrong. In any case, we could just have a different way to enable this
>>> optimization than for each and every mremap() invocation in a process.
> 
> /me thinks of prctl

I didn't want to spell that out :P I don't think this would have to be 
configurable per process ...

> 
> :P
> 
> 
> FWIW, with regard to the whole feature: while I do understand its purpose (
> relocating anon might be too much for most workloads, but great for some), I'm
> uncomfortable with the amount of internals we're exposing here. Who's to say
> this is how mm rmap looks in 20 years? And we're stuck maintaining the userspace
> ABI until then.

Yes.

> 
> Personally, I would prefer if we just had a flag 'MREMAP_HARDER' that would
> vaguely be documented as "mremap but harder, even if we have to do a little more
> work". Then we could move things around without promising RELOCATE_ANON makes
> conceptual sense, and userspace wouldn't have to think through the implications
> of such a flag by reading Lorenzo's great book.

Even such a flag is just weird.

Next time we do MREMAP_EVEN_HARDER

mremap() is already an expensive operation ... so I think we need a 
pretty convincing case to make this configurable by the user at all for 
each individual mremap() invocation.

-- 
Cheers,

David / dhildenb
