linux-kernel - Re: [syzbot] [rcu?] [bcachefs?] BUG: unable to handle kernel NULL pointer dereference in rcu

Message-ID: <aEgi4cHN_Mg31F-y@pc636>
Date: Tue, 10 Jun 2025 14:19:45 +0200
From: Uladzislau Rezki <urezki@...il.com>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: paulmck@...nel.org, Uladzislau Rezki <urezki@...il.com>,
	Kent Overstreet <kent.overstreet@...ux.dev>,
	syzbot <syzbot+80e5d6f453f14a53383a@...kaller.appspotmail.com>,
	akpm@...ux-foundation.org, josh@...htriplett.org,
	linux-bcachefs@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, rcu@...r.kernel.org,
	syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [rcu?] [bcachefs?] BUG: unable to handle kernel NULL
 pointer dereference in rcu_core (3)

On Mon, Jun 09, 2025 at 10:20:58AM -0400, Joel Fernandes wrote:
> 
> 
> On 6/9/2025 5:47 AM, Paul E. McKenney wrote:
> > On Mon, Jun 09, 2025 at 10:35:34AM +0200, Uladzislau Rezki wrote:
> >> On Sun, Jun 08, 2025 at 05:25:05PM -0700, Paul E. McKenney wrote:
> >>> On Sun, Jun 08, 2025 at 08:23:36PM +0200, Uladzislau Rezki wrote:
> >>>> On Sun, Jun 08, 2025 at 11:26:28AM -0400, Kent Overstreet wrote:
> >>>>> On Wed, Feb 05, 2025 at 06:56:19AM -0800, Paul E. McKenney wrote:
> >>>>>> On Tue, Feb 04, 2025 at 04:34:18PM -0800, syzbot wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> syzbot found the following issue on:
> >>>>>>>
> >>>>>>> HEAD commit:    0de63bb7d919 Merge tag 'pull-fix' of git://git.kernel.org/..
> >>>>>>> git tree:       upstream
> >>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=10faf5f8580000
> >>>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=1909f2f0d8e641ce
> >>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=80e5d6f453f14a53383a
> >>>>>>> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >>>>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16b69d18580000
> >>>>>>>
> >>>>>>> Downloadable assets:
> >>>>>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-0de63bb7.raw.xz
> >>>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/1142009a30a7/vmlinux-0de63bb7.xz
> >>>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/5d9e46a8998d/bzImage-0de63bb7.xz
> >>>>>>> mounted in repro: https://storage.googleapis.com/syzbot-assets/526692501242/mount_0.gz
> >>>>>>>
> >>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >>>>>>> Reported-by: syzbot+80e5d6f453f14a53383a@...kaller.appspotmail.com
> >>>>>>>
> >>>>>>>  slab radix_tree_node start ffff88803bf382c0 pointer offset 24 size 576
> >>>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000000
> >>>>>>> #PF: supervisor instruction fetch in kernel mode
> >>>>>>> #PF: error_code(0x0010) - not-present page
> >>>>>>> PGD 0 P4D 0 
> >>>>>>> Oops: Oops: 0010 [#1] PREEMPT SMP KASAN NOPTI
> >>>>>>> CPU: 0 UID: 0 PID: 5705 Comm: syz-executor Not tainted 6.14.0-rc1-syzkaller-00020-g0de63bb7d919 #0
> >>>>>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> >>>>>>> RIP: 0010:0x0
> >>>>>>> Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> >>>>>>> RSP: 0018:ffffc90000007bd8 EFLAGS: 00010246
> >>>>>>> RAX: dffffc0000000000 RBX: 1ffff110077e705c RCX: 23438dd059a4b100
> >>>>>>> RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffff88803bf382d8
> >>>>>>> RBP: ffffc90000007e10 R08: ffffffff819f146c R09: 1ffff11003f8519a
> >>>>>>> R10: dffffc0000000000 R11: 0000000000000000 R12: ffffffff81a6d507
> >>>>>>> R13: ffff88803bf382e0 R14: 0000000000000000 R15: ffff88803bf382d8
> >>>>>>> FS:  0000555567992500(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
> >>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>>> CR2: ffffffffffffffd6 CR3: 000000004da38000 CR4: 0000000000352ef0
> >>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>>>> Call Trace:
> >>>>>>>  <IRQ>
> >>>>>>>  rcu_do_batch kernel/rcu/tree.c:2546 [inline]
> >>>>>>
> >>>>>> The usual way that this happens is that someone clobbers the rcu_head
> >>>>>> structure of something that has been passed to call_rcu().  The most
> >>>>>> popular way of clobbering this structure is to pass the same something to
> >>>>>> call_rcu() twice in a row, but other creative arrangements are possible.
> >>>>>>
> >>>>>> Building your kernel with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y can usually
> >>>>>> spot invoking call_rcu() twice in a row.
> >>>>>
> >>>>> I don't think it's that - syzbot's .config already has that enabled.
> >>>>> KASAN, too.
> >>>>>
> >>>>> And the only place we do call_rcu() is from rcu_pending.c, where we've
> >>>>> got a rearming rcu callback - but we track whether it's outstanding, and
> >>>>> we do all relevant operations with a lock held.
> >>>>>
> >>>>> And we only use rcu_pending.c with SRCU, not regular RCU.
> >>>>>
> >>>>> We do use kfree_rcu() in a few places (all boring, I expect), but that
> >>>>> doesn't (generally?) use the rcu callback list.
> >>>>>
> >>>> Right, kvfree_rcu() does not intersect with regular callbacks, it has
> >>>> its own path. 
> >>>>
> >>>> It looks like the problem is here:
> >>>>
> >>>> <snip>
> >>>>   f = rhp->func;
> >>>>   debug_rcu_head_callback(rhp);
> >>>>   WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> >>>>   f(rhp);
> >>>> <snip>
> >>>>
> >>>> we do not check whether the callback, "f", is NULL. If it is, the
> >>>> kernel oopses right away. For example:
> >>>>
> >>>> call_rcu(&rh, NULL);
> >>>>
> >>>> @Paul, do you think it makes sense to narrow down the callers that
> >>>> apparently pass NULL as a callback? To me that looks like the cause
> >>>> of this bug, but we do not know the source.
> >>>>
> >>>> A check would at least give a stack trace of the caller that passes NULL.
> >>>
> >>> Adding a check for NULL func passed to __call_rcu_common(), you mean?
> >>>
> >> Yes. Currently there is no check at all, so passing NULL just triggers
> >> a kernel panic.
> >>
> >>>
> >>> That wouldn't hurt, and would either (as you say) catch the culprit
> >>> or show that the problem is elsewhere.
> >>>
> >> I can add it then and send out the patch if there are no objections.
> > 
> > No objections from me!
> 
> Me neither! And I can push that into an -rc release as well once I have it
> (since it is related to a potential bug).
> 
I will prepare it and send it out today.
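
For illustration only, here is a minimal userspace sketch of the idea discussed above. It is not the actual patch and not kernel code: struct rcu_head is a simplified stand-in, and sketch_call_rcu(), sketch_do_batch(), sketch_cb() and the single callback list are hypothetical names made up for this example. The point is only that rejecting a NULL callback at enqueue time reports the problem in the caller's context, instead of at invocation time when the caller is long gone. In the kernel the check would presumably warn (for example via WARN_ON_ONCE()) rather than print to stderr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified userspace stand-ins; these are NOT the kernel definitions. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

typedef void (*rcu_callback_t)(struct rcu_head *head);

/* Hypothetical single pending list; the kernel uses per-CPU segmented lists. */
static struct rcu_head *cb_list_head;
static struct rcu_head **cb_list_tail = &cb_list_head;

/*
 * Queue a callback. Returns 0 on success, -1 if func is NULL.
 * The NULL check runs in the caller's context, so a report produced
 * here points at the code that passed the bad callback.
 */
static int sketch_call_rcu(struct rcu_head *head, rcu_callback_t func)
{
	if (func == NULL) {
		fprintf(stderr, "sketch_call_rcu: NULL callback for head %p\n",
			(void *)head);
		return -1;
	}

	head->func = func;
	head->next = NULL;
	*cb_list_tail = head;
	cb_list_tail = &head->next;
	return 0;
}

/*
 * Drain the list and invoke every queued callback, following the same
 * sequence as the quoted snippet: read func, clear it, then call it.
 * Returns the number of callbacks invoked.
 */
static int sketch_do_batch(void)
{
	struct rcu_head *rhp = cb_list_head;
	int count = 0;

	cb_list_head = NULL;
	cb_list_tail = &cb_list_head;

	while (rhp) {
		/* Read next first: the callback may free or reuse rhp. */
		struct rcu_head *next = rhp->next;
		rcu_callback_t f = rhp->func;

		rhp->func = NULL;
		f(rhp);	/* Without the enqueue check, f could be NULL here. */
		count++;
		rhp = next;
	}
	return count;
}

/* Trivial callback used to observe that invocation happened. */
static int sketch_invoked;

static void sketch_cb(struct rcu_head *head)
{
	(void)head;
	sketch_invoked++;
}
```

With this arrangement, sketch_call_rcu(&rh, NULL) fails immediately and nothing is queued, so sketch_do_batch() never reaches a NULL function pointer.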

--
Uladzislau Rezki