linux-kernel - Re: [PATCH] mm/mremap: Honour writable bit in mremap pte batching


Message-ID: <jmxnalmkkc5ztfhokqtzqihsdji2gprnv5z4tzruxi6iqgfkni@aerronulpyem>
Date: Tue, 28 Oct 2025 11:48:51 +0000
From: Pedro Falcato <pfalcato@...e.de>
To: Dev Jain <dev.jain@....com>
Cc: linux-kernel@...r.kernel.org, stable@...r.kernel.org, 
	David Hildenbrand <david@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	"Liam R. Howlett" <Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, 
	Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>, Barry Song <baohua@...nel.org>, 
	"open list:MEMORY MAPPING" <linux-mm@...ck.org>
Subject: Re: [PATCH] mm/mremap: Honour writable bit in mremap pte batching

On Tue, Oct 28, 2025 at 12:09:52PM +0530, Dev Jain wrote:
> Currently mremap folio pte batch ignores the writable bit during figuring
> out a set of similar ptes mapping the same folio. Suppose that the first
> pte of the batch is writable while the others are not - set_ptes will
> end up setting the writable bit on the other ptes, which is a violation
> of mremap semantics. Therefore, use FPB_RESPECT_WRITE to check the writable
> bit while determining the pte batch.
>
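
A minimal userspace sketch of the failure mode described above. This is not kernel code: `struct toy_pte`, `toy_pte_batch()`, `toy_set_ptes()` and `TOY_RESPECT_WRITE` are hypothetical stand-ins that model only a page frame number and a writable bit, on the assumption that a bulk install gives every entry the permissions of the first one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a PTE: only a page frame number and a writable bit. */
struct toy_pte {
	unsigned long pfn;
	bool writable;
};

/* Hypothetical analogue of FPB_RESPECT_WRITE for this model. */
#define TOY_RESPECT_WRITE 0x1u

/*
 * Count how many leading entries form one batch: pfns must be
 * consecutive and, only if TOY_RESPECT_WRITE is set, the writable
 * bit must match the first entry.
 */
static size_t toy_pte_batch(const struct toy_pte *ptes, size_t max_nr,
			    unsigned int flags)
{
	size_t nr;

	if (max_nr == 0)
		return 0;

	for (nr = 1; nr < max_nr; nr++) {
		if (ptes[nr].pfn != ptes[0].pfn + nr)
			break;
		if ((flags & TOY_RESPECT_WRITE) &&
		    ptes[nr].writable != ptes[0].writable)
			break;
	}
	return nr;
}

/*
 * Model of a bulk install: every entry in the batch inherits the
 * permission bit of the first entry.
 */
static void toy_set_ptes(struct toy_pte *dst, struct toy_pte first, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		dst[i].pfn = first.pfn + i;
		dst[i].writable = first.writable;
	}
}
```

With a writable first entry followed by read-only ones, the batch without the flag spans all entries and the bulk install marks them all writable; with the flag, the batch stops after the first entry.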

Hmm, it seems to me like we're doing the wrong thing by default here?
I must admit I haven't followed the contpte work as much as I would've
liked, but it doesn't make much sense to me why FPB_RESPECT_WRITE would
be an option you have to explicitly pass, and where folio_pte_batch (the
"simple" interface) doesn't Just Do The Right Thing for naive callers.

Auditing all callers:
 - khugepaged clears a variable number of ptes
 - memory.c clears a variable number of ptes
 - mempolicy.c grabs folios for migrations
 - mlock.c steps over nr_ptes - 1 ptes, speeding up traversal
 - mremap is borked since we're remapping nr_ptes ptes
 - rmap.c TTU unmaps nr_ptes ptes for a given folio

So while the vast majority of callers don't seem to care, it would make
sense for folio_pte_batch() to work conservatively by default, with
folio_pte_batch_flags() allowing for further batching (or maybe
we would add a separate folio_pte_batch_clear() or
folio_pte_batch_greedy() or whatnot).
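
To make the suggestion concrete, here is a rough userspace sketch of what a conservative-by-default interface could look like. All names (`toy_batch()`, `toy_batch_flags()`, `toy_batch_greedy()`, `TOY_IGNORE_WRITE`) are hypothetical and the model tracks only a pfn and a writable bit; it illustrates the API shape, not the actual kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a PTE: only a page frame number and a writable bit. */
struct toy_pte {
	unsigned long pfn;
	bool writable;
};

/* Hypothetical opt-out: callers must ask explicitly to ignore the bit. */
#define TOY_IGNORE_WRITE 0x1u

/*
 * Flag-taking variant: respects the writable bit unless the caller
 * passes TOY_IGNORE_WRITE.
 */
static size_t toy_batch_flags(const struct toy_pte *ptes, size_t max_nr,
			      unsigned int flags)
{
	size_t nr;

	if (max_nr == 0)
		return 0;

	for (nr = 1; nr < max_nr; nr++) {
		if (ptes[nr].pfn != ptes[0].pfn + nr)
			break;
		if (!(flags & TOY_IGNORE_WRITE) &&
		    ptes[nr].writable != ptes[0].writable)
			break;
	}
	return nr;
}

/* Simple interface: safe for naive callers that re-install the batch. */
static size_t toy_batch(const struct toy_pte *ptes, size_t max_nr)
{
	return toy_batch_flags(ptes, max_nr, 0);
}

/* Greedy variant for callers that only clear or step over entries. */
static size_t toy_batch_greedy(const struct toy_pte *ptes, size_t max_nr)
{
	return toy_batch_flags(ptes, max_nr, TOY_IGNORE_WRITE);
}
```

In this shape a caller like mremap would get the safe behaviour from the simple interface, while callers that merely clear or skip entries would opt in to the larger batch.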

> Cc: stable@...r.kernel.org #6.17
> Fixes: f822a9a81a31 ("mm: optimize mremap() by PTE batching")
> Reported-by: David Hildenbrand <david@...hat.com>
> Debugged-by: David Hildenbrand <david@...hat.com>
> Signed-off-by: Dev Jain <dev.jain@....com>

But the solution itself looks okay to me. So, FWIW:

Acked-by: Pedro Falcato <pfalcato@...e.de>

-- 
Pedro