[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220613152337.GD1790663@paulmck-ThinkPad-P17-Gen-1>
Date: Mon, 13 Jun 2022 08:23:37 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: "Zhang, Qiang1" <qiang1.zhang@...el.com>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
syzbot <syzbot+9bb26e7c5e8e4fa7e641@...kaller.appspotmail.com>,
"brauner@...nel.org" <brauner@...nel.org>,
"keescook@...omium.org" <keescook@...omium.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"syzkaller-bugs@...glegroups.com" <syzkaller-bugs@...glegroups.com>
Subject: Re: [syzbot] WARNING in exit_tasks_rcu_finish
On Mon, Jun 13, 2022 at 01:55:31PM +0000, Zhang, Qiang1 wrote:
> > syzbot <syzbot+9bb26e7c5e8e4fa7e641@...kaller.appspotmail.com> writes:
> >
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: 6d0c80680317 Add linux-next specific files for 20220610
> > > git tree: linux-next
> > > console output:
> > > https://syzkaller.appspot.com/x/log.txt?x=13b52c2ff00000
> > > kernel config:
> > > https://syzkaller.appspot.com/x/.config?x=a30d6e3e814e5931
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=9bb26e7c5e8e4fa7e641
> > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > I don't understand what is going on in linux-next kernel/rcu/tasks.h
> > looks different than in Linus's tree. Paul does that mean you have
> > some staged rcu changes?
>
> >Less than 100 RCU-related patches in -rcu, so not all that bad. ;-)
> >
> >But yes, this could possibly be an issue in one of those patches.
>
> > Eric
> >
> >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+9bb26e7c5e8e4fa7e641@...kaller.appspotmail.com
> > >
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 1 PID: 28639 at kernel/rcu/tasks.h:1664
> > > exit_tasks_rcu_finish_trace kernel/rcu/tasks.h:1664 [inline]
> > > WARNING: CPU: 1 PID: 28639 at kernel/rcu/tasks.h:1664
> > > exit_tasks_rcu_finish+0x122/0x1b0 kernel/rcu/tasks.h:1006
>
> >The usual way for this warning to trigger is for these a task to exit while in an RCU Tasks Trace read-side critical section:
> >
> > rcu_read_lock_trace();
> > do_something_that_causes_task_exit();
> >
>
> Hi Paul, wether the following scenarios be considered
>
> rcu_read_unlock_trace_special
> ->if (trs.b.blocked)
> ->raw_spin_lock_irqsave_rcu_node
> ->list_del_init(&t->trc_blkd_node)
> ->WRITE_ONCE(t->trc_reader_special.b.blocked, false)
> ->raw_spin_unlock_irqrestore_rcu_node
> ->Inerrrupt
> ->schedule
> ->rcu_note_context_switch
> ->rcu_tasks_trace_qs
> If (___rttq_nesting && !READ_ONCE((t)->trc_reader_special.b.blocked)
> /*___rttq_nesting ==1 && (t)->trc_reader_special.b.blocked =false*/
> rcu_tasks_trace_qs_blkd(t)
> ->WRITE_ONCE(t->trc_reader_nesting, 0)
> .......
> -> exit_tasks_rcu_finish
>
> Whether the following patch can fix it, or what am I missing?
> Any thoughts?
>
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index f1209ce621c5..c607e4c914d3 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -1247,6 +1247,7 @@ void rcu_read_unlock_trace_special(struct task_struct *t)
> struct rcu_tasks_percpu *rtpcp;
> union rcu_special trs;
>
> + WRITE_ONCE(t->trc_reader_nesting, 0);
> // Open-coded full-word version of rcu_ld_need_qs().
> smp_mb(); // Enforce full grace-period ordering.
> trs = smp_load_acquire(&t->trc_reader_special);
> @@ -1267,7 +1268,6 @@ void rcu_read_unlock_trace_special(struct task_struct *t)
> WRITE_ONCE(t->trc_reader_special.b.blocked, false);
> raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
> }
> - WRITE_ONCE(t->trc_reader_nesting, 0);
> }
> EXPORT_SYMBOL_GPL(rcu_read_unlock_trace_special);
Thank you for looking into this!
You do have what I believe to be the correct failure scenario, but the
above fix would break nested RCU Tasks Trace read-side critical sections.
But would you be willing to try out the patch shown below?
Thanx, Paul
------------------------------------------------------------------------
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 08059d8d4f5a7..937a58b3266bf 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -184,7 +184,7 @@ void rcu_tasks_trace_qs_blkd(struct task_struct *t);
if (likely(!READ_ONCE((t)->trc_reader_special.b.need_qs)) && \
likely(!___rttq_nesting)) { \
rcu_trc_cmpxchg_need_qs((t), 0, TRC_NEED_QS_CHECKED); \
- } else if (___rttq_nesting && \
+ } else if (___rttq_nesting && ___rttq_nesting != INT_MIN && \
!READ_ONCE((t)->trc_reader_special.b.blocked)) { \
rcu_tasks_trace_qs_blkd(t); \
} \
diff --git a/include/linux/rcupdate_trace.h b/include/linux/rcupdate_trace.h
index 6f9c358173989..9bc8cbb33340b 100644
--- a/include/linux/rcupdate_trace.h
+++ b/include/linux/rcupdate_trace.h
@@ -75,7 +75,7 @@ static inline void rcu_read_unlock_trace(void)
nesting = READ_ONCE(t->trc_reader_nesting) - 1;
barrier(); // Critical section before disabling.
// Disable IPI-based setting of .need_qs.
- WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
+ WRITE_ONCE(t->trc_reader_nesting, INT_MIN + nesting);
if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
WRITE_ONCE(t->trc_reader_nesting, nesting);
return; // We assume shallow reader nesting.
Powered by blists - more mailing lists