linux-kernel - [PATCH] mm/workingset: fix crash from corrupted shadow entries in lru

Message-ID: <20251208060046.2933866-1-kartikey406@gmail.com>
Date: Mon,  8 Dec 2025 11:30:45 +0530
From: Deepanshu Kartikey <kartikey406@...il.com>
To: akpm@...ux-foundation.org,
	axelrasmussen@...gle.com,
	yuanchu@...gle.com,
	weixugc@...gle.com,
	hannes@...xchg.org,
	david@...nel.org,
	mhocko@...nel.org,
	zhengqi.arch@...edance.com,
	shakeel.butt@...ux.dev,
	lorenzo.stoakes@...cle.com
Cc: linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Deepanshu Kartikey <kartikey406@...il.com>,
	syzbot+e008db2ac01e282550ee@...kaller.appspot.com,
	Yu Zhao <yuzhao@...gle.com>
Subject: [PATCH] mm/workingset: fix crash from corrupted shadow entries in lru_gen

Syzbot reported crashes in lru_gen_test_recent() and subsequent NULL
pointer dereferences in the page cache code:

  Oops: general protection fault in lru_gen_test_recent+0xfc/0x370
  KASAN: probably user-memory-access in range [0x0000000000004e00-0x0000000000004e07]

And later:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor instruction fetch in kernel mode
  RIP: 0010:0x0
  Call Trace:
   filemap_read_folio+0xc8/0x2a0

Investigation revealed that unpack_shadow() can extract an invalid node ID
from a shadow entry, causing NODE_DATA(nid) to return NULL for pgdat. In
the reported case, the shadow value was 0x0000000000000041, which is
suspiciously small and suggests corruption.
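
To illustrate how such a small value can decode to an unusable node ID,
here is a minimal userspace sketch of the unpacking arithmetic. The field
widths (WORKINGSET_SHIFT = 1, NODES_SHIFT = 6, MEM_CGROUP_ID_SHIFT = 16),
the single xarray value tag bit, and the names struct shadow_fields and
decode_shadow() are assumptions made for illustration; the real widths
depend on the kernel configuration.

```c
#include <stdbool.h>

/* Assumed field widths; the real values depend on the kernel config. */
#define WORKINGSET_SHIFT	1
#define NODES_SHIFT		6
#define MEM_CGROUP_ID_SHIFT	16

/* Hypothetical container for the decoded fields (not a kernel type). */
struct shadow_fields {
	bool workingset;
	int nid;
	int memcg_id;
	unsigned long token;
};

/* Decode a raw shadow value the way unpack_shadow() is assumed to. */
static struct shadow_fields decode_shadow(unsigned long shadow)
{
	struct shadow_fields f;
	/* Drop the low tag bit that marks an xarray value entry (assumed). */
	unsigned long entry = shadow >> 1;

	f.workingset = entry & ((1UL << WORKINGSET_SHIFT) - 1);
	entry >>= WORKINGSET_SHIFT;
	f.nid = entry & ((1UL << NODES_SHIFT) - 1);
	entry >>= NODES_SHIFT;
	f.memcg_id = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
	entry >>= MEM_CGROUP_ID_SHIFT;
	f.token = entry;

	return f;
}
```

Under these assumptions, decode_shadow(0x41) gives node ID 16 with a memcg
ID and token of 0. On a system with fewer than 17 nodes, node 16 has no
pgdat, which is consistent with NODE_DATA(nid) returning NULL.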

When this NULL pgdat is passed to mem_cgroup_lruvec(), it leads to crashes
when dereferencing memcg->nodeinfo. The corrupted state also propagates
through the call chain, causing subsequent crashes in the page cache code.

The root cause of shadow entry corruption is unclear and may indicate a
deeper issue in xarray management, page cache eviction/refault race
conditions, or memory corruption. However, regardless of the source, the
code should handle corrupted entries defensively.

Fix this by:
1. Checking if pgdat is NULL in lru_gen_test_recent() after unpacking the
   shadow entry, and setting *lruvec to NULL to signal corruption.
2. Adding a NULL check for lruvec in lru_gen_refault() to catch and skip
   processing of corrupted entries before the corruption propagates further.
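
The two steps above follow a common out-parameter pattern: the callee
sets the output pointer to NULL on invalid input, and the caller checks it
before use. The following is a minimal userspace model of that pattern,
not the kernel code. All names here (struct node, struct lruvec_model,
node_data(), test_recent(), refault()) and the one-node table are
hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 64

/* Hypothetical stand-ins for pgdat and lruvec. */
struct node { int id; };
struct lruvec_model { struct node *node; };

/* Assumed topology: only node 0 exists, every other slot is NULL. */
static struct node node0 = { .id = 0 };
static struct node *node_table[MAX_NODES] = { &node0 };
static struct lruvec_model lruvec0 = { .node = &node0 };

/* Models NODE_DATA(): returns NULL for a node that is not present. */
static struct node *node_data(int nid)
{
	if (nid < 0 || nid >= MAX_NODES)
		return NULL;
	return node_table[nid];
}

/* Step 1: signal an invalid node ID by setting *lruvec to NULL. */
static bool test_recent(int nid, struct lruvec_model **lruvec)
{
	struct node *n = node_data(nid);

	if (!n) {
		*lruvec = NULL;
		return false;
	}
	*lruvec = &lruvec0;
	return true;
}

/* Step 2: skip processing when the callee reported a NULL lruvec. */
static bool refault(int nid, const struct lruvec_model *expected)
{
	struct lruvec_model *lruvec;

	test_recent(nid, &lruvec);
	if (!lruvec || lruvec != expected)
		return false;	/* entry skipped */
	return true;		/* entry would be processed */
}
```

In this model, refault(0, &lruvec0) processes the entry, while
refault(16, &lruvec0) skips it without dereferencing anything.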

This prevents the immediate crash; the root cause of the shadow entry
corruption can be investigated separately.

Reported-by: syzbot+e008db2ac01e282550ee@...kaller.appspot.com
Closes: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Fixes: b1a71694fb00c ("mm/mglru: rework refault detection")
Cc: Yu Zhao <yuzhao@...gle.com>
Signed-off-by: Deepanshu Kartikey <kartikey406@...il.com>
---
 mm/workingset.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..0ec205a1ae92 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -270,7 +270,14 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	struct pglist_data *pgdat;

 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
-
+	/*
+	 * If pgdat is NULL, the shadow entry contains an invalid node ID.
+	 * Set lruvec to NULL so caller can detect and skip processing.
+	 */
+	if (unlikely(!pgdat)) {
+		*lruvec = NULL;
+		return false;
+	}
 	memcg = mem_cgroup_from_id(memcg_id);
 	*lruvec = mem_cgroup_lruvec(memcg, pgdat);

@@ -294,9 +301,8 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	rcu_read_lock();

 	recent = lru_gen_test_recent(shadow, &lruvec, &token, &workingset);
-	if (lruvec != folio_lruvec(folio))
+	if (!lruvec || lruvec != folio_lruvec(folio))
 		goto unlock;
-
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta);

 	if (!recent)
-- 
2.43.0