Message-ID: <87lfdxsro7.fsf@nanos.tec.linutronix.de>
Date: Wed, 16 Dec 2020 16:21:12 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Naresh Kamboju <naresh.kamboju@...aro.org>,
Jakub Kicinski <kuba@...nel.org>
Cc: "Paul E. McKenney" <paulmck@...nel.org>,
open list <linux-kernel@...r.kernel.org>,
linux-stable <stable@...r.kernel.org>, rcu@...r.kernel.org,
Linux ARM <linux-arm-kernel@...ts.infradead.org>,
lkft-triage@...ts.linaro.org, Netdev <netdev@...r.kernel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Sasha Levin <sashal@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Matthew Wilcox <willy@...radead.org>
Subject: Re: [stabe-rc 5.9 ] sched: core.c:7270 Illegal context switch in RCU-bh read-side critical section!
On Wed, Dec 16 2020 at 15:55, Naresh Kamboju wrote:
> On Tue, 15 Dec 2020 at 23:52, Jakub Kicinski <kuba@...nel.org> wrote:
>> > Or you could place checks for being in a BH-disable further up in
>> > the code. Or build with CONFIG_DEBUG_INFO=y to allow more precise
>> > interpretation of this stack trace.
>
> I will try to reproduce this warning with DEBUG_INFO=y enabled kernel and
> get back to you with a better crash log.
>
>>
>> My money would be on the option that whatever ran on this workqueue
>> before forgot to re-enable BH, but we already have a check for that...
>> Naresh, do you have the full log? Is there nothing like "BUG: workqueue
>> leaked lock" above the splat?
No, because it's in the middle of the work. The workqueue bug triggers
when the work has finished.
So cleanup_net() does

    ....
    synchronize_rcu();   <- might sleep. So up to here it should be fine.
    list_for_each_entry_continue_reverse(ops, &pernet_list, list)
            ops_exit_list(ops, &net_exit_list);

ops_exit_list() is called for each ops which then either invokes
ops->exit() or ops->exit_batch().
So one of those callbacks fails to re-enable BH. Adding a check for
!local_bh_disabled() after each invocation of ops->exit() and
ops->exit_batch() should identify the buggy callback.
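
To illustrate the idea, here is a rough userspace sketch, not kernel code.
Everything in it is a hypothetical stand-in: bh_disable_depth mimics the
BH-disable count, mock_pernet_ops is a trimmed-down pernet_operations with
argument-less callbacks, and find_bh_leak() plays the role of the reverse
walk over pernet_list plus ops_exit_list(). It only shows the pattern of
checking the BH state right after each callback returns, so that the
offender can be named.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the BH-disable nesting count (assumption). */
static int bh_disable_depth;

static void mock_local_bh_disable(void) { bh_disable_depth++; }
static void mock_local_bh_enable(void)  { bh_disable_depth--; }

/* Hypothetical, trimmed-down version of struct pernet_operations. */
struct mock_pernet_ops {
	const char *name;
	void (*exit)(void);
	void (*exit_batch)(void);
};

/* A well-behaved callback: disable and enable are balanced. */
static void good_exit(void)
{
	mock_local_bh_disable();
	mock_local_bh_enable();
}

/* A buggy callback: returns with BH still disabled. */
static void leaky_exit_batch(void)
{
	mock_local_bh_disable();
	/* missing mock_local_bh_enable() */
}

/* Returns 1 if the callback that just ran left BH disabled. */
static int check_bh_after(const char *ops_name, const char *cb_name)
{
	if (bh_disable_depth == 0)
		return 0;

	fprintf(stderr, "BH left disabled by %s->%s()\n", ops_name, cb_name);
	/* Reset so that later callbacks are judged on their own. */
	bh_disable_depth = 0;
	return 1;
}

/*
 * Walk the ops array in reverse, invoke exit() and exit_batch() where
 * present, and check the BH state after each invocation. Returns the
 * name of the first ops entry that leaked, or NULL if none did.
 */
static const char *find_bh_leak(const struct mock_pernet_ops *ops, size_t n)
{
	const char *culprit = NULL;

	for (size_t i = n; i-- > 0; ) {
		if (ops[i].exit) {
			ops[i].exit();
			if (check_bh_after(ops[i].name, "exit") && !culprit)
				culprit = ops[i].name;
		}
		if (ops[i].exit_batch) {
			ops[i].exit_batch();
			if (check_bh_after(ops[i].name, "exit_batch") && !culprit)
				culprit = ops[i].name;
		}
	}
	return culprit;
}
```

In the real kernel the equivalent would presumably be a WARN-style check
placed directly after the ops->exit() and ops->exit_batch() calls in
ops_exit_list(), printing the callback address so it can be resolved to a
symbol.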
Thanks,
tglx