linux-kernel - Re: [PATCH] mm/gup: Drain batched mlock folio processing before attempting migration

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3194a67b-194c-151d-a961-08c0d0f24d9b@google.com>
Date: Thu, 28 Aug 2025 09:12:30 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: David Hildenbrand <david@...hat.com>
cc: Hugh Dickins <hughd@...gle.com>, Will Deacon <will@...nel.org>, 
    linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
    Keir Fraser <keirf@...gle.com>, Jason Gunthorpe <jgg@...pe.ca>, 
    John Hubbard <jhubbard@...dia.com>, Frederick Mayle <fmayle@...gle.com>, 
    Andrew Morton <akpm@...ux-foundation.org>, Peter Xu <peterx@...hat.com>, 
    Rik van Riel <riel@...riel.com>, Vlastimil Babka <vbabka@...e.cz>, 
    Ge Yang <yangge1116@....com>
Subject: Re: [PATCH] mm/gup: Drain batched mlock folio processing before
 attempting migration

On Thu, 28 Aug 2025, David Hildenbrand wrote:
> On 28.08.25 10:47, Hugh Dickins wrote:
...
> > It took several days in search of the least bad compromise, but
> > in the end I concluded the opposite of what we'd intended above.
> > 
> > There is a fundamental incompatibility between my 5.18 2fbb0c10d1e8
> > ("mm/munlock: mlock_page() munlock_page() batch by pagevec")
> > and Ge Yang's 6.11 33dfe9204f29
> > ("mm/gup: clear the LRU flag of a page before adding to LRU batch").
> > 
> > It turns out that the mm/swap.c folio batches (apart from lru_add)
> > are all for best-effort, doesn't matter if it's missed, operations;
> > whereas mlock and munlock are more serious.  Probably mlock could
> > be (not very satisfactorily) converted, but then munlock?  Because
> > of failed folio_test_clear_lru()s, it would be far too likely to
> > err on either side, munlocking too soon or too late.
> > 
> > I've concluded that one or the other has to go.  If we're having
> > a beauty contest, there's no doubt that 33dfe9204f29 is much nicer
> > than 2fbb0c10d1e8 (which is itself far from perfect).  But functionally,
> > I'm afraid that removing the mlock/munlock batching will show up as a
> > perceptible regression in realistic workloadsg; and on consideration,
> > I've found no real justification for the LRU flag clearing change.
> 
> Just to understand what you are saying: are you saying that we will go back to
> having a folio being part of multiple LRU caches?

Yes.  Well, if you count the mlock/munlock batches in as "LRU caches",
then that has been so all along.

> :/ If so, I really rally
> hope that we can find another way and not go back to that old handling.

For what reason?  It sounded like a nice "invariant" to keep in mind,
but it's a limitation, and  what purpose was it actually serving?

If it's the "spare room in struct page to keep the address of that one
batch entry ... efficiently extract ..." that I was dreaming of: that
has to be a future thing, when perhaps memdescs will allow an extensible
structure to be attached, and extending it for an mlocked folio (to hold
the mlock_count instead of squeezing it into lru.prev) would not need
mlock/munlock batching at all (I guess: far from uppermost in my mind!),
and including a field for "efficiently extract" from LRU batch would be
nice.

But, memdescs or not, there will always be pressure to keep the common
struct as small as possible, so I don't know if we would actually go
that way.

But I suspect that was not your reason at all: please illuminate.

Hugh