Message-ID: <2mfkm3sotjz5tfw6wvtfrwnerae5pqspelyxw6xg6e5glsyaq6@jl73gcvmtve5>
Date: Tue, 16 Dec 2025 23:33:20 -0800
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Matthew Wilcox <willy@...radead.org>, 
	Linus Torvalds <torvalds@...ux-foundation.org>, Christoph Hellwig <hch@...radead.org>, 
	"Darrick J. Wong" <djwong@...nel.org>, SHAURYA RANE <ssrane_b23@...vjti.ac.in>, 
	akpm@...ux-foundation.org, eddyz87@...il.com, andrii@...nel.org, ast@...nel.org, 
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	linux-kernel-mentees@...ts.linux.dev, skhan@...uxfoundation.org, david.hunter.linux@...il.com, 
	khalid@...nel.org, syzbot+09b7d050e4806540153d@...kaller.appspotmail.com, 
	bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH] mm/filemap: fix NULL pointer dereference in
 do_read_cache_folio()

On Tue, Nov 18, 2025 at 11:27:47AM -0800, Andrii Nakryiko wrote:
> On Tue, Nov 18, 2025 at 7:37 AM Matthew Wilcox <willy@...radead.org> wrote:
> >
> > On Tue, Nov 18, 2025 at 05:03:24AM -0800, Christoph Hellwig wrote:
> > > On Mon, Nov 17, 2025 at 10:45:31AM -0800, Andrii Nakryiko wrote:
> > > > As I replied on another email, ideally we'd have some low-level file
> > > > reading interface where we wouldn't have to know about secretmem, or
> > > > XFS+DAX, or whatever other unusual combination of conditions where
> > > > exposed internal APIs like filemap_get_folio() + read_cache_folio()
> > > > can crash.
> > >
> > > The problem is that you did something totally insane and it kinda works
> > > most of the time.
> >
> > ... on 64-bit systems.  The HIGHMEM handling is screwed up too.
> >
> > > But bpf or any other file system consumer has
> > > absolutely no business poking into the page cache to start with.
> >
> > Agreed.
> 
> Then please help make it better, give us interfaces you think are
> appropriate. People do use this functionality in production, it's
> important and we are not going to drop it. In non-sleepable mode it's
> best-effort, if the requested part of the file is paged in, we'll
> successfully read data (such as ELF's build ID), and if not, we'll
> report that to the BPF program as -EFAULT. In sleepable mode, we'll
> wait for that part of the file to be paged in before proceeding.
> PROCMAP_QUERY ioctl() is always in sleepable mode, so it will wait for
> file data to be read.
> 
> If you don't like the implementation, please help improve it, don't
> just request dropping it "because BPF folks" or anything like that.
> 

So, I took a stab at this, based in particular on Willy's suggestions on
IOCB_NOWAIT. This is untested; I am sharing it only to show what it
looks like and to hear any concerns. In addition, I think I will look
into the fstest part as well.

BTW, by simple code inspection I already see that IOCB_NOWAIT is not well
respected. For example, filemap_read() calls cond_resched() without any
checks. The readahead path, i.e. page_cache_sync_ra(), can potentially
take sleeping locks. Btrfs takes locks in btrfs_file_read_iter(). So it
seems this would need extensive testing, hopefully on all major
FSes.

Here is the draft patch:


From 9652cc97a817fe35e53a7e98a5fbb49c7788c744 Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeel.butt@...ux.dev>
Date: Tue, 16 Dec 2025 16:53:57 -0800
Subject: [PATCH] lib/buildid: convert freader to use __kernel_read()

Convert the freader file reading implementation from direct page cache
access via filemap_get_folio()/read_cache_folio() to using kernel_read
interfaces.

Add a new __kernel_read_nowait() function that uses IOCB_NOWAIT flag
for non-blocking I/O. This is used when may_fault is false to avoid
blocking on I/O - if data is not immediately available, it returns
-EAGAIN.

For the may_fault case, use the standard __kernel_read() which can
block waiting for I/O.

This simplifies the code by removing the need to manage folios,
kmap/kunmap operations, and page cache locking. It also makes the
code work with filesystems that don't use the page cache directly.

Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
---
 fs/read_write.c         | 18 ++++++++-
 include/linux/buildid.h |  3 --
 include/linux/fs.h      |  1 +
 lib/buildid.c           | 85 +++++++++--------------------------------
 4 files changed, 37 insertions(+), 70 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 833bae068770..7a042cfeefec 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -503,7 +503,8 @@ static int warn_unsupported(struct file *file, const char *op)
 	return -EINVAL;
 }
 
-ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
+static ssize_t __kernel_read_internal(struct file *file, void *buf,
+			size_t count, loff_t *pos, int flags)
 {
 	struct kvec iov = {
 		.iov_base	= buf,
@@ -526,6 +527,7 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 
 	init_sync_kiocb(&kiocb, file);
 	kiocb.ki_pos = pos ? *pos : 0;
+	kiocb.ki_flags |= flags;
 	iov_iter_kvec(&iter, ITER_DEST, &iov, 1, iov.iov_len);
 	ret = file->f_op->read_iter(&kiocb, &iter);
 	if (ret > 0) {
@@ -538,6 +540,20 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 	return ret;
 }
 
+ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
+{
+	return __kernel_read_internal(file, buf, count, pos, 0);
+}
+
+/*
+ * Non-blocking variant of __kernel_read() using IOCB_NOWAIT.
+ * Returns -EAGAIN if the read would block waiting for I/O.
+ */
+ssize_t __kernel_read_nowait(struct file *file, void *buf, size_t count, loff_t *pos)
+{
+	return __kernel_read_internal(file, buf, count, pos, IOCB_NOWAIT);
+}
+
 ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 {
 	ssize_t ret;
diff --git a/include/linux/buildid.h b/include/linux/buildid.h
index 831c1b4b626c..f1fa220353a2 100644
--- a/include/linux/buildid.h
+++ b/include/linux/buildid.h
@@ -25,9 +25,6 @@ struct freader {
 	union {
 		struct {
 			struct file *file;
-			struct folio *folio;
-			void *addr;
-			loff_t folio_off;
 			bool may_fault;
 		};
 		struct {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f5c9cf28c4dc..498c804fc0b9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2832,6 +2832,7 @@ extern int do_pipe_flags(int *, int);
 
 extern ssize_t kernel_read(struct file *, void *, size_t, loff_t *);
 ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos);
+ssize_t __kernel_read_nowait(struct file *file, void *buf, size_t count, loff_t *pos);
 extern ssize_t kernel_write(struct file *, const void *, size_t, loff_t *);
 extern ssize_t __kernel_write(struct file *, const void *, size_t, loff_t *);
 extern struct file * open_exec(const char *);
diff --git a/lib/buildid.c b/lib/buildid.c
index aaf61dfc0919..c9d4491557fe 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
 #include <linux/elf.h>
 #include <linux/kernel.h>
 #include <linux/pagemap.h>
+#include <linux/fs.h>
 #include <linux/secretmem.h>
 
 #define BUILD_ID 3
@@ -28,55 +29,35 @@ void freader_init_from_mem(struct freader *r, const char *data, u64 data_sz)
 	r->data_sz = data_sz;
 }
 
-static void freader_put_folio(struct freader *r)
-{
-	if (!r->folio)
-		return;
-	kunmap_local(r->addr);
-	folio_put(r->folio);
-	r->folio = NULL;
-}
-
-static int freader_get_folio(struct freader *r, loff_t file_off)
+/*
+ * Read data from file at specified offset into the freader buffer.
+ * Uses non-blocking I/O when may_fault is false.
+ * Returns 0 on success, negative error code on failure.
+ */
+static int freader_read(struct freader *r, loff_t file_off, size_t sz)
 {
-	/* check if we can just reuse current folio */
-	if (r->folio && file_off >= r->folio_off &&
-	    file_off < r->folio_off + folio_size(r->folio))
-		return 0;
-
-	freader_put_folio(r);
+	ssize_t ret;
+	loff_t pos = file_off;
 
 	/* reject secretmem folios created with memfd_secret() */
 	if (secretmem_mapping(r->file->f_mapping))
 		return -EFAULT;
 
-	r->folio = filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT);
-
-	/* if sleeping is allowed, wait for the page, if necessary */
-	if (r->may_fault && (IS_ERR(r->folio) || !folio_test_uptodate(r->folio))) {
-		filemap_invalidate_lock_shared(r->file->f_mapping);
-		r->folio = read_cache_folio(r->file->f_mapping, file_off >> PAGE_SHIFT,
-					    NULL, r->file);
-		filemap_invalidate_unlock_shared(r->file->f_mapping);
-	}
+	if (r->may_fault)
+		ret = __kernel_read(r->file, r->buf, sz, &pos);
+	else
+		ret = __kernel_read_nowait(r->file, r->buf, sz, &pos);
 
-	if (IS_ERR(r->folio) || !folio_test_uptodate(r->folio)) {
-		if (!IS_ERR(r->folio))
-			folio_put(r->folio);
-		r->folio = NULL;
+	if (ret < 0)
+		return ret;
+	if (ret != sz)
 		return -EFAULT;
-	}
-
-	r->folio_off = folio_pos(r->folio);
-	r->addr = kmap_local_folio(r->folio, 0);
 
 	return 0;
 }
 
 const void *freader_fetch(struct freader *r, loff_t file_off, size_t sz)
 {
-	size_t folio_sz;
-
 	/* provided internal temporary buffer should be sized correctly */
 	if (WARN_ON(r->buf && sz > r->buf_sz)) {
 		r->err = -E2BIG;
@@ -97,46 +78,18 @@ const void *freader_fetch(struct freader *r, loff_t file_off, size_t sz)
 		return r->data + file_off;
 	}
 
-	/* fetch or reuse folio for given file offset */
-	r->err = freader_get_folio(r, file_off);
+	/* read data from file into buffer */
+	r->err = freader_read(r, file_off, sz);
 	if (r->err)
 		return NULL;
 
-	/* if requested data is crossing folio boundaries, we have to copy
-	 * everything into our local buffer to keep a simple linear memory
-	 * access interface
-	 */
-	folio_sz = folio_size(r->folio);
-	if (file_off + sz > r->folio_off + folio_sz) {
-		u64 part_sz = r->folio_off + folio_sz - file_off, off;
-
-		memcpy(r->buf, r->addr + file_off - r->folio_off, part_sz);
-		off = part_sz;
-
-		while (off < sz) {
-			/* fetch next folio */
-			r->err = freader_get_folio(r, r->folio_off + folio_sz);
-			if (r->err)
-				return NULL;
-			folio_sz = folio_size(r->folio);
-			part_sz = min_t(u64, sz - off, folio_sz);
-			memcpy(r->buf + off, r->addr, part_sz);
-			off += part_sz;
-		}
-
-		return r->buf;
-	}
-
-	/* if data fits in a single folio, just return direct pointer */
-	return r->addr + (file_off - r->folio_off);
+	return r->buf;
 }
 
 void freader_cleanup(struct freader *r)
 {
 	if (!r->buf)
 		return; /* non-file-backed mode */
-
-	freader_put_folio(r);
 }
 
 /*
-- 
2.47.3
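
As a rough way to exercise the same "try without waiting, block only if
allowed" split from userspace, here is a minimal sketch built on
preadv2() with RWF_NOWAIT, the userspace flag that requests nowait
behaviour from the read path. It is not part of the patch. The helper
name fetch_exact() and its may_block parameter are hypothetical and only
loosely mirror freader_read() and may_fault. The fallback policy (retry
with a blocking pread() after any incomplete nowait attempt) is an
assumption made for the sketch, and a filesystem that does not support
nowait reads may fail the first attempt with EOPNOTSUPP rather than
EAGAIN.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical userspace analogue of freader_read() from the draft.
 * Returns 0 if exactly sz bytes at offset off landed in buf, otherwise
 * a negative errno (-EFAULT for a short read, as in the draft).
 */
static int fetch_exact(int fd, void *buf, size_t sz, off_t off, bool may_block)
{
	struct iovec iov = { .iov_base = buf, .iov_len = sz };
	ssize_t ret;

	/* First attempt asks the kernel not to wait for I/O. */
	ret = preadv2(fd, &iov, 1, off, RWF_NOWAIT);
	if (ret < 0)
		ret = -errno;

	/*
	 * Sleepable mode (assumed policy): if the nowait attempt failed
	 * or came back short, redo the whole read and allow it to block.
	 */
	if (may_block && (ret < 0 || (size_t)ret < sz)) {
		ret = pread(fd, buf, sz, off);
		if (ret < 0)
			ret = -errno;
	}

	if (ret < 0)
		return (int)ret;

	/* A short read is reported as -EFAULT, mirroring the draft. */
	return (size_t)ret == sz ? 0 : -EFAULT;
}
```

Calling fetch_exact(fd, buf, 4, 0, false) on a file whose first page is
not cached is expected to fail (typically with -EAGAIN) instead of
sleeping, while the same call with may_block set to true waits for the
data. Running the non-blocking variant against files on different
filesystems is one cheap way to see how each read_iter implementation
reacts to a nowait request, although it cannot show whether the kernel
slept internally.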

