Message-ID: <ZxZ68KmHDQYU0yfD@pc636>
Date: Mon, 21 Oct 2024 18:01:52 +0200
From: Uladzislau Rezki <urezki@...il.com>
To: paulmck@...nel.org
Cc: paulmck@...nel.org, Dmitry Vyukov <dvyukov@...gle.com>,
syzbot <syzbot+061d370693bdd99f9d34@...kaller.appspotmail.com>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Triplett <josh@...htriplett.org>,
Boqun Feng <boqun.feng@...il.com>, RCU <rcu@...r.kernel.org>,
Marco Elver <elver@...gle.com>, andrii@...nel.org, ast@...nel.org,
bpf@...r.kernel.org, daniel@...earbox.net, eddyz87@...il.com,
haoluo@...gle.com, john.fastabend@...il.com, jolsa@...nel.org,
kpsingh@...nel.org, linux-kernel@...r.kernel.org,
martin.lau@...ux.dev, sdf@...ichev.me, song@...nel.org,
syzkaller-bugs@...glegroups.com, yonghong.song@...ux.dev
Subject: Re: [syzbot] [bpf?] KCSAN: data-race in __mod_timer / kvfree_call_rcu
> On Mon, Oct 14, 2024 at 7:00 PM Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > On Mon, Oct 14, 2024 at 10:27:05AM +0200, Dmitry Vyukov wrote:
> > > On Mon, 14 Oct 2024 at 08:07, syzbot
> > > <syzbot+061d370693bdd99f9d34@...kaller.appspotmail.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit: 5b7c893ed5ed Merge tag 'ntfs3_for_6.12' of https://github...
> > > > git tree: upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=148ae327980000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=a2f7ae2f221e9eae
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=061d370693bdd99f9d34
> > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > >
> > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > >
> > > > Downloadable assets:
> > > > disk image: https://storage.googleapis.com/syzbot-assets/79bb9e82835a/disk-5b7c893e.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/5931997fd31c/vmlinux-5b7c893e.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/fc8cc3d97b18/bzImage-5b7c893e.xz
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+061d370693bdd99f9d34@...kaller.appspotmail.com
> > > >
> > > > ==================================================================
> > > > BUG: KCSAN: data-race in __mod_timer / kvfree_call_rcu
> > > >
> > > > read to 0xffff888237d1cce8 of 8 bytes by task 10149 on cpu 1:
> > > > schedule_delayed_monitor_work kernel/rcu/tree.c:3520 [inline]
> >
> > This is the access to krcp->monitor_work.timer.expires in the function
> > schedule_delayed_monitor_work().
> >
> > Uladzislau, could you please take a look at this one?
> >
> > Thanx, Paul
> >
> > > +rcu maintainers, this looks more like rcu issue
> > >
> > > #syz set subsystems: rcu
> > >
> > > > kvfree_call_rcu+0x3b8/0x510 kernel/rcu/tree.c:3839
> > > > trie_update_elem+0x47c/0x620 kernel/bpf/lpm_trie.c:441
> > > > bpf_map_update_value+0x324/0x350 kernel/bpf/syscall.c:203
> > > > generic_map_update_batch+0x401/0x520 kernel/bpf/syscall.c:1849
> > > > bpf_map_do_batch+0x28c/0x3f0 kernel/bpf/syscall.c:5143
> > > > __sys_bpf+0x2e5/0x7a0
> > > > __do_sys_bpf kernel/bpf/syscall.c:5741 [inline]
> > > > __se_sys_bpf kernel/bpf/syscall.c:5739 [inline]
> > > > __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5739
> > > > x64_sys_call+0x2625/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:322
> > > > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > > > do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
> > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > >
> > > > write to 0xffff888237d1cce8 of 8 bytes by task 56 on cpu 0:
> > > > __mod_timer+0x578/0x7f0 kernel/time/timer.c:1173
> > > > add_timer_global+0x51/0x70 kernel/time/timer.c:1330
> > > > __queue_delayed_work+0x127/0x1a0 kernel/workqueue.c:2523
> > > > queue_delayed_work_on+0xdf/0x190 kernel/workqueue.c:2552
> > > > queue_delayed_work include/linux/workqueue.h:677 [inline]
> > > > schedule_delayed_monitor_work kernel/rcu/tree.c:3525 [inline]
> > > > kfree_rcu_monitor+0x5e8/0x660 kernel/rcu/tree.c:3643
> > > > process_one_work kernel/workqueue.c:3229 [inline]
> > > > process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3310
> > > > worker_thread+0x51d/0x6f0 kernel/workqueue.c:3391
> > > > kthread+0x1d1/0x210 kernel/kthread.c:389
> > > > ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
> > > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> > > >
> > > > Reported by Kernel Concurrency Sanitizer on:
> > > > CPU: 0 UID: 0 PID: 56 Comm: kworker/u8:4 Not tainted 6.12.0-rc2-syzkaller-00050-g5b7c893ed5ed #0
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> > > > Workqueue: events_unbound kfree_rcu_monitor
> > > > ==================================================================
> > > > bridge0: port 2(bridge_slave_1) entered blocking state
> > > > bridge0: port 2(bridge_slave_1) entered forwarding state
> > > >
>
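I tried to reproduce it but was not able to. On the other hand, it is
obvious that reading "krcp->monitor_work.timer.expires" while it is being
written by another CPU is possible: per the report, kvfree_call_rcu() reads
it while kfree_rcu_monitor() re-arms the timer via queue_delayed_work().

So we can address it, i.e. prevent such parallel access, with the following patch: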
<snip>
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e641cc681901..d711870fde84 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3531,7 +3531,7 @@ static int krc_count(struct kfree_rcu_cpu *krcp)
 }

 static void
-schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
+__schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
 {
 	long delay, delay_left;

@@ -3545,6 +3545,16 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
 	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
 }

+static void
+schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&krcp->lock, flags);
+	__schedule_delayed_monitor_work(krcp);
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+}
+
 static void
 kvfree_rcu_drain_ready(struct kfree_rcu_cpu *krcp)
 {
@@ -3841,7 +3851,7 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)

 	// Set timer to drain after KFREE_DRAIN_JIFFIES.
 	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
-		schedule_delayed_monitor_work(krcp);
+		__schedule_delayed_monitor_work(krcp);

 unlock_return:
 	krc_this_cpu_unlock(krcp, flags);
<snip>
I will send out the patch after some testing!
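
For illustration only, here is a user-space sketch of the same
locked-wrapper/unlocked-helper split, with a pthread mutex standing in for
krcp->lock. The names demo_krc, demo_schedule_nolock(), demo_schedule() and
demo_enqueue() are made up for this example and are not kernel code, and the
scheduling logic is a simplified assumption of what the real function does:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the per-CPU kfree_rcu state. */
struct demo_krc {
	pthread_mutex_t lock;	/* stands in for krcp->lock */
	bool pending;		/* stands in for "delayed work is queued" */
	long expires;		/* stands in for monitor_work.timer.expires */
};

/* Unlocked helper: the caller must already hold krc->lock. */
static void demo_schedule_nolock(struct demo_krc *krc, long now, long delay)
{
	assert(delay >= 0);

	if (krc->pending) {
		/* Reading "expires" is safe: every writer holds the lock too. */
		long delay_left = krc->expires - now;

		if (delay < delay_left)
			krc->expires = now + delay;	/* pull the deadline in */
		return;
	}

	krc->pending = true;
	krc->expires = now + delay;
}

/* Locked wrapper for callers that do not hold krc->lock. */
static void demo_schedule(struct demo_krc *krc, long now, long delay)
{
	pthread_mutex_lock(&krc->lock);
	demo_schedule_nolock(krc, now, delay);
	pthread_mutex_unlock(&krc->lock);
}

/* Models a caller that already holds the lock, like kvfree_call_rcu(). */
static void demo_enqueue(struct demo_krc *krc, long now, long delay)
{
	pthread_mutex_lock(&krc->lock);
	/* ... queue the object here ... */
	demo_schedule_nolock(krc, now, delay);
	pthread_mutex_unlock(&krc->lock);
}
```

The point is only that both the reader and the writer of "expires" end up
under the same lock, and that the path which already holds the lock must
call the unlocked helper to avoid taking the lock twice.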
--
Uladzislau Rezki