Message-ID: <7ae2e19c-10f6-4121-bc15-dd07c11b197a@lucifer.local>
Date: Tue, 24 Jun 2025 11:19:58 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Pedro Falcato <pfalcato@...e.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Matthew Wilcox <willy@...radead.org>, Rik van Riel <riel@...riel.com>,
        Harry Yoo <harry.yoo@...cle.com>, Zi Yan <ziy@...dia.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
        Dev Jain <dev.jain@....com>, Jakub Matena <matenajakub@...il.com>,
        Wei Yang <richard.weiyang@...il.com>, Barry Song <baohua@...nel.org>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 00/11] mm/mremap: introduce more mergeable mremap via
 MREMAP_RELOCATE_ANON

On Tue, Jun 24, 2025 at 11:38:59AM +0200, David Hildenbrand wrote:
> On 20.06.25 21:28, Lorenzo Stoakes wrote:
> > I have some ideas... :)

Note that I've been working hard on a respin, figuring out ways to
basically make it so we can't fail to set up folios (afaict) so we get
predictable undo.

Of course we make life very very hard for ourselves in mm :)

>
> As a first step, we could have some global way to enable/disable the
> optimization system-wide. We could then learn if there is really any
> workload that notices the change, while still having a way to revert to the
> old behavior on affected systems easily.

Yeah I was wondering if we could do something like this... I mean we could
hide it in /sys/kernel/mm worst case.

>
> Just a thought, I still hope we can avoid all that. Again, mremap() is not
> really known for being a very efficient operation.

Agreed, and I don't think we should microbenchmark it so much. I think as long
as the time taken stays within roughly the same order of magnitude, it should
be fine?

>
> >
> > >
> > > Or: separate but maybe awful idea, but if the problem is the number of VMAs
> > > maybe we could try harder based on the map count? i.e if
> > > map_count > (max_map_count / 2), try to relocate anon.
> >
> > Interesting, though that'd make some things randomly merge and other stuff not,
> > and you really have to consistently do this stuff to make things mergeable.
>
> Yes, I'd prefer if we can make it more predictable.
>
> (Of course, the VMA region size could also be used as an input to a policy.
> e.g., small move -> much fragmentation -> merge, large move -> less
> fragmentation -> don't care. Knowing about the use cases that use mremap()
> of anon memory and how they might be affected could be very valuable. Maybe
> it's mostly moving a handful of pages where we most care about this
> optimization?).
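
For illustration only, the two policy inputs floated above (map count and
move size) could be sketched roughly as below. Nothing here exists in the
kernel or in this series: the function names and the 16-page threshold are
made up, and a 4 KiB page size is assumed. 65530 is the usual default for
vm.max_map_count.

```python
# Illustrative only: neither policy exists in the kernel or in this series.
DEFAULT_MAX_MAP_COUNT = 65530  # usual Linux default for vm.max_map_count
PAGE_SIZE = 4096               # assumed page size


def relocate_by_map_count(map_count, max_map_count=DEFAULT_MAX_MAP_COUNT):
    """Map-count idea: try to relocate anon once more than half of the
    VMA budget is in use (integer division, as in the quoted pseudocode)."""
    return map_count > max_map_count // 2


def relocate_by_move_size(move_bytes, small_move_pages=16):
    """Size idea: small moves are assumed to fragment the most, so only
    those attempt a merge. small_move_pages is a hypothetical threshold."""
    return move_bytes <= small_move_pages * PAGE_SIZE
```

The map-count variant shows the predictability concern directly: the same
mremap() call gives a different answer depending on how many VMAs the
process already has, whereas the size variant depends only on the call
itself.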

I think fundamentally there are two problems:

1. Unexpected VMA fragmentation leading to later mremap() failure.
2. Unnecessary VMA proliferation.

So we could fix 1 with an 'allow multiple VMAs to be moved if no resize'
patch. And of course the relocate anon stuff is about 2.

In theory we could combine them, but things could become complicated as then
it's multiple VMA/anon_vma merges.
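
To make problem 2 concrete, here is a toy model of why a moved anonymous
VMA tends not to merge with its new neighbour. It is a deliberate
simplification, not the kernel's merge logic: it assumes faulted-in VMAs,
4 KiB pages, and looks only at adjacency and page offset, ignoring
anon_vma compatibility, flags and everything else. The Vma class and the
helper names are hypothetical.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumed 4 KiB pages


@dataclass
class Vma:
    start: int  # first byte of the mapping
    end: int    # one past the last byte
    pgoff: int  # page offset used to index the mapping's anon folios


def new_anon_vma(start, end):
    # A fresh anonymous mapping indexes its pages by virtual address.
    return Vma(start, end, start >> PAGE_SHIFT)


def can_merge(prev, nxt):
    # Mergeable here means: adjacent in the address space and contiguous
    # in page offset. Every other kernel condition is ignored.
    pages = (prev.end - prev.start) >> PAGE_SHIFT
    return prev.end == nxt.start and prev.pgoff + pages == nxt.pgoff


def move(vma, new_start, relocate_anon):
    size = vma.end - vma.start
    # A plain move keeps the old page offset; relocating rewrites it to
    # match the new address.
    pgoff = (new_start >> PAGE_SHIFT) if relocate_anon else vma.pgoff
    return Vma(new_start, new_start + size, pgoff)


neighbour = new_anon_vma(0x10000, 0x20000)
source = new_anon_vma(0x80000, 0x90000)
plain = move(source, 0x20000, relocate_anon=False)
relocated = move(source, 0x20000, relocate_anon=True)
print(can_merge(neighbour, plain), can_merge(neighbour, relocated))
```

In this model the plain move leaves two VMAs side by side that cannot be
merged, while the relocated one lines up with its neighbour.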

>
>
> --
> Cheers,
>
> David / dhildenb
>

Anyway, let me polish up the respin and we can see how that goes :)
stress-ng is helping...
