lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-Id: <20251220-netfs_mmap_corruption-v1-1-da4a2ce87e18@codewreck.org>
Date: Sat, 20 Dec 2025 08:33:21 +0900
From: Dominique Martinet via B4 Relay <devnull+asmadeus.codewreck.org@...nel.org>
To: David Howells <dhowells@...hat.com>, Paulo Alcantara <pc@...guebit.org>, 
 Christian Schoenebeck <linux_oss@...debyte.com>, 
 Christian Brauner <brauner@...nel.org>
Cc: netfs@...ts.linux.dev, linux-fsdevel@...r.kernel.org, 
 linux-kernel@...r.kernel.org, v9fs@...ts.linux.dev, 
 Dominique Martinet <asmadeus@...ewreck.org>
Subject: [PATCH] netfs: fix folio unlock collection end point

From: Dominique Martinet <asmadeus@...ewreck.org>

9p suffered from a corruption in mmap-heavy workloads such as compiling
with clang (e.g. kernel build with `make LLVM=1`, failing with random
errors on mmaped headers and clang segfaults)

David Howells identified a problem in the netfs trace of such a crash,
below:
> We unlocked page ix=00005 before doing the ZERO subreq that clears
> the page tail.  That shouldn't have happened since the collection
> point hasn't reached the end of the folio yet.

netfs_collect_folio: R=00001b55 ix=00003 r=3000-4000 t=3000/5fb2
netfs_folio: i=157f3 ix=00003-00003 read-done
netfs_folio: i=157f3 ix=00003-00003 read-unlock
netfs_collect_folio: R=00001b55 ix=00004 r=4000-5000 t=4000/5fb2
netfs_folio: i=157f3 ix=00004-00004 read-done
netfs_folio: i=157f3 ix=00004-00004 read-unlock
netfs_collect_folio: R=00001b55 ix=00005 r=5000-5fb2 t=5000/5fb2
netfs_folio: i=157f3 ix=00005-00005 read-done
netfs_folio: i=157f3 ix=00005-00005 read-unlock
...
netfs_collect_stream: R=00001b55[0:] cto=5fb2 frn=ffffffff
netfs_collect_state: R=00001b55 col=5fb2 cln=6000 n=c
netfs_collect_stream: R=00001b55[0:] cto=5fb2 frn=ffffffff
netfs_collect_state: R=00001b55 col=5fb2 cln=6000 n=8
...
netfs_sreq: R=00001b55[2] ZERO SUBMT f=000 s=5fb2 0/4e s=0 e=0
netfs_sreq: R=00001b55[2] ZERO TERM  f=102 s=5fb2 4e/4e s=5 e=0

This patch changes the end condition for collecting folios that have
been read, from "is the IO over?" to "have we read up till the end of
the folio?" (collected_to >= fend), which delays the read-unlock to the
"ZERO CONSUM" step after the page tail has been zeroed

Reported-by: Christian Schoenebeck <linux_oss@...debyte.com>
Link: https://lkml.kernel.org/r/2332946.iZASKD2KPV@weasel
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Signed-off-by: Dominique Martinet <asmadeus@...ewreck.org>
---
A few notes out of order:
- I don't really understand why this actually matter -- sure,
  there'll be junk after the mmap region ends, but why does
  this cause a bug? Since the folio is unlocked the zero fill operation
  ends up zeroing the end of another page? Well, either way there's no
  doubt that the early unlock was a problem...
- I have no idea how to test that this doesn't break something else by
  not unlocking a page that should be unlocked e.g. something that
  wouldn't need zero-filling (partial refresh?) -- I don't *think* we
  have anything like that, but I don't actually know, so pleae don't
  trust me too much here... Nothing I normally test on 9p broke so it's
  probably not too far off though.

This has been broken for about a year so I guess there's no great hurry
at this point, but any extra pair of eyes would be appreciated...

(Christian S, I've tested as much as I could think of, but any extra
validation would be great too if you have time!)
---
 fs/netfs/read_collect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index a95e7aadafd072edbb0f1743e0020eaf65b7d1ed..7a0ffa675fb17aa7431c0fd6cc546deb0be0be22 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -137,7 +137,7 @@ static void netfs_read_unlock_folios(struct netfs_io_request *rreq,
 		rreq->front_folio_order = order;
 		fsize = PAGE_SIZE << order;
 		fpos = folio_pos(folio);
-		fend = umin(fpos + fsize, rreq->i_size);
+		fend = fpos + fsize;
 
 		trace_netfs_collect_folio(rreq, folio, fend, collected_to);
 

---
base-commit: b10e94717f7e295d7ff6ce08249f7e31b630ef1b
change-id: 20251220-netfs_mmap_corruption-62e187ffc28c

Best regards,
-- 
Dominique Martinet <asmadeus@...ewreck.org>



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ