Open Source and information security mailing list archives
 
Message-ID: <69365ec8.a70a0220.38f243.0086.GAE@google.com>
Date: Sun, 07 Dec 2025 21:14:48 -0800
From: syzbot <syzbot+e008db2ac01e282550ee@...kaller.appspotmail.com>
To: linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Forwarded: [PATCH] mm/workingset: fix crash from corrupted shadow
 entries in lru_gen

For archival purposes, forwarding an incoming command email to
linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com.

***

Subject: [PATCH] mm/workingset: fix crash from corrupted shadow entries in lru_gen
Author: kartikey406@...il.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Syzbot reported crashes in lru_gen_test_recent() and subsequent NULL
pointer dereferences in the page cache code:

  Oops: general protection fault in lru_gen_test_recent+0xfc/0x370
  KASAN: probably user-memory-access in range [0x0000000000004e00-0x0000000000004e07]

And later:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor instruction fetch in kernel mode
  RIP: 0010:0x0
  Call Trace:
   filemap_read_folio+0xc8/0x2a0

The root cause is that unpack_shadow() can extract an invalid node ID
from a corrupted shadow entry, so NODE_DATA(nid) returns NULL for
pgdat. When this NULL pgdat is passed to mem_cgroup_lruvec(), the kernel
crashes as soon as pgdat is dereferenced, e.g. when pgdat->node_id is
read to index memcg->nodeinfo[].
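To make the failure mode concrete, here is a minimal userspace sketch of a
packed shadow-style entry. It is loosely modeled on the idea behind
pack_shadow()/unpack_shadow(), but the field widths, the pack_entry() and
unpack_entry() helpers, struct node_data and node_table[] are all
hypothetical stand-ins, not the kernel's definitions. What it shows:
masking keeps the decoded node ID inside the table, yet a single flipped
bit can still select a slot with no node behind it, so the lookup yields
NULL, much as NODE_DATA(nid) does for the invalid node ID described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical field widths for this model; the kernel's real widths differ. */
#define WS_BITS    1
#define NODE_BITS  3                 /* room for 8 node IDs */
#define MEMCG_BITS 16
#define MAX_NODES  (1 << NODE_BITS)

/* Hypothetical stand-in for struct pglist_data. */
struct node_data { int node_id; };

/* Only nodes 0 and 1 exist; the other slots stay NULL, like missing nodes. */
static struct node_data node0 = { 0 }, node1 = { 1 };
static struct node_data *node_table[MAX_NODES] = { &node0, &node1 };

/* Pack eviction | memcg ID | node ID | workingset bit into one word. */
static unsigned long pack_entry(unsigned long eviction, int memcg_id, int nid,
				bool workingset)
{
	unsigned long e = eviction;

	e = (e << MEMCG_BITS) | (unsigned long)memcg_id;
	e = (e << NODE_BITS) | (unsigned long)nid;
	e = (e << WS_BITS) | (workingset ? 1UL : 0UL);
	return e;
}

/* Reverse of pack_entry(); *nd is NULL when the decoded node slot is empty. */
static void unpack_entry(unsigned long e, int *memcg_id, struct node_data **nd,
			 unsigned long *eviction, bool *workingset)
{
	int nid;

	*workingset = e & ((1UL << WS_BITS) - 1);
	e >>= WS_BITS;
	nid = (int)(e & ((1UL << NODE_BITS) - 1));
	e >>= NODE_BITS;
	*memcg_id = (int)(e & ((1UL << MEMCG_BITS) - 1));
	e >>= MEMCG_BITS;
	*eviction = e;
	/* The mask keeps nid inside the table, but the slot it names may be empty. */
	*nd = node_table[nid];
}
```

For example, packing eviction 42, memcg ID 7 and node 1, then flipping
bit WS_BITS + 2 (the top node-ID bit), decodes to node 5. unpack_entry()
still returns the memcg ID and eviction counter unchanged, but hands back
a NULL node pointer; any code that then dereferences it, as
mem_cgroup_lruvec() does with pgdat, faults.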

Even if we detect and return early from lru_gen_test_recent(), the
corrupted state propagates through the call chain, eventually causing
crashes in the page cache code when trying to use the corrupted folio.

Fix this by:
1. Checking if pgdat is NULL in lru_gen_test_recent() and setting
   *lruvec to NULL to signal the corruption to the caller.
2. Adding a NULL check for lruvec in lru_gen_refault() to catch
   corrupted shadow entries and skip processing before the corruption
   can propagate further into the page cache code.
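The two-step fix can be modeled in a few lines of userspace C. This is a
sketch under simplifying assumptions: struct node, struct lru,
struct entry, test_recent() and refault() are hypothetical stand-ins for
pglist_data, lruvec, the decoded shadow entry, lru_gen_test_recent() and
lru_gen_refault(), and the generation check that decides whether an entry
is recent is reduced to a stored flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins: a node, its LRU, and a decoded shadow entry. */
struct node  { int id; };                        /* id must be 0 or 1 */
struct lru   { long refaults; long activations; };
struct entry { struct node *nd; bool recent; };  /* nd == NULL: corrupted */

static struct lru node_lrus[2];

/* Model of lru_gen_test_recent(): resolve the LRU or flag corruption. */
static bool test_recent(const struct entry *e, struct lru **lru)
{
	if (!e->nd) {
		*lru = NULL;	/* step 1: tell the caller the entry is bad */
		return false;
	}
	*lru = &node_lrus[e->nd->id];
	return e->recent;
}

/* Model of lru_gen_refault(): returns false if nothing was accounted. */
static bool refault(const struct entry *e, struct lru *folio_lru)
{
	struct lru *lru;
	bool recent = test_recent(e, &lru);

	/* step 2: check for NULL before lru is ever dereferenced */
	if (!lru || lru != folio_lru) {
		if (!lru)
			fprintf(stderr, "refault: skipping corrupted entry\n");
		return false;
	}
	lru->refaults++;
	if (recent)
		lru->activations++;
	return true;
}
```

The callee always writes its out-parameter, so the caller never sees an
uninitialized pointer, and the caller tests for NULL before using it. The
!lru test is evaluated first so the corrupted case can be logged
separately from an ordinary LRU mismatch.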

Reported-by: syzbot+e008db2ac01e282550ee@...kaller.appspot.com
Link: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Signed-off-by: Deepanshu Kartikey <kartikey406@...il.com>
---
 mm/workingset.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..364434168b4c 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -270,7 +270,15 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	struct pglist_data *pgdat;
 
 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
-
+	/*
+	 * If pgdat is NULL, the shadow entry contains an invalid node ID.
+	 * Set lruvec to NULL so caller can detect and skip processing.
+	 */
+	if (unlikely(!pgdat)) {
+		*lruvec = NULL;
+		pr_warn("lru_gen_test_recent: Detected corrupted shadow (NULL pgdat), setting lruvec=NULL\n");
+		return false;
+	}
 	memcg = mem_cgroup_from_id(memcg_id);
 	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
@@ -294,9 +302,11 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	rcu_read_lock();
 
 	recent = lru_gen_test_recent(shadow, &lruvec, &token, &workingset);
-	if (lruvec != folio_lruvec(folio))
+	if (!lruvec || lruvec != folio_lruvec(folio)) {
+		if (!lruvec)
+			pr_warn("lru_gen_refault: Skipping corrupted entry (lruvec=NULL)\n");
 		goto unlock;
-
+	}
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta);
 
 	if (!recent)
-- 
2.43.0

