linux-kernel - Re: [PATCH 1/2] mm/khugepaged: do synchronous writeback for MADV

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <312bfcbd-d31a-40d3-8b9c-edc7b6166113@lucifer.local>
Date: Mon, 10 Nov 2025 13:36:10 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Dev Jain <dev.jain@....com>
Cc: Shivank Garg <shivankg@....com>, Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>, Zi Yan <ziy@...dia.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
        Barry Song <baohua@...nel.org>, Lance Yang <lance.yang@...ux.dev>,
        Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Zach O'Keefe <zokeefe@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
        Branden Moore <Branden.Moore@....com>
Subject: Re: [PATCH 1/2] mm/khugepaged: do synchronous writeback for
 MADV_COLLAPSE

On Mon, Nov 10, 2025 at 06:54:57PM +0530, Dev Jain wrote:
>
> On 10/11/25 5:31 pm, Lorenzo Stoakes wrote:
> >
> > ofc you are addressing this with the !cc->is_khugepaged, but feels like we'd be
> > better off just doing it in madvise_collapse().
>
> I suppose the common case is that writeback is not needed, and I don't know what
> is the overhead of filemap_write_and_wait_range() in that case, but your

Low.

> suggestion will force that unconditional overhead and skip all the early exits
> possible in hpage_collapse_scan_file()?

Which are?

PMD-mapped folio at start of range, scan abort, non-LRU, spurious ref count

I don't think this matters.

And we're trading for putting _yet more_ logic that belongs elsewhere in the
wrong place.

I mean I'd actually be pretty good with us putting it literally in madvise.c,
but since we defer the collapse to khugepaged.c then madvise_collapse().

>
> > I wonder also if doing I/O without getting the mmap lock again and revalidating
> > is wise, as the state of things might have changed significantly.
> >
> > So maybe need a hugepage_vma_revalidate() as well?
>
> The file collapse path (apart from collapse_pte_mapped_thp) is solely concerned
> about doing the collapse in the page cache, for which information about the mm or
> the vma is not needed, that is why no locking is required. The get_file() in

The user has asked specifically to collapse pages in a VMA's range. Yes you can
go check the mapping of a pinned file which you're keeping pinned during this
operation (wise? Not sure it is).

This would be the first time in this operation we are doing a _synchronous_ I/O
operation where we sleep holding a pin.

So no, I think we really do need to revalidate.

'Collapse some random file we no longer map at this address' is probably not
great semantics, also of course, we are revalidating at each PMD anyway.

Maybe even do a addr -= HPAGE_PMD_SIZE; continue + take it from the top?

Maybe David has thoughts...

> khugepaged_scan_mm_slot() seems to be serving the same function as maybe_unlock_mmap_for_io().
> So I don't think this is needed?

We're talking about the MADV_COLLAPSE case so don't understand this? I may be
missing something here (happens a lot ;)!


>
> >
> >
> > > +
> > >   	result = alloc_charge_folio(&new_folio, mm, cc);
> > >   	if (result != SCAN_SUCCEED)
> > >   		goto out;
> > >
> > > base-commit: e9a6fb0bcdd7609be6969112f3fbfcce3b1d4a7c
> > > --
> > > 2.43.0
> > >
> > Thanks, Lorenzo

Cheers, Lorenzo