linux-kernel - Re: [stabe-rc 5.9 ] sched: core.c:7270 Illegal context switch in RCU-bh read-side critical section!


Message-ID: <87lfdxsro7.fsf@nanos.tec.linutronix.de>
Date:   Wed, 16 Dec 2020 16:21:12 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Naresh Kamboju <naresh.kamboju@...aro.org>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     "Paul E. McKenney" <paulmck@...nel.org>,
        open list <linux-kernel@...r.kernel.org>,
        linux-stable <stable@...r.kernel.org>, rcu@...r.kernel.org,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        lkft-triage@...ts.linaro.org, Netdev <netdev@...r.kernel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Sasha Levin <sashal@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Matthew Wilcox <willy@...radead.org>
Subject: Re: [stabe-rc 5.9 ] sched: core.c:7270 Illegal context switch in RCU-bh read-side critical section!

On Wed, Dec 16 2020 at 15:55, Naresh Kamboju wrote:
> On Tue, 15 Dec 2020 at 23:52, Jakub Kicinski <kuba@...nel.org> wrote:
>> > Or you could place checks for being in a BH-disable further up in
>> > the code.  Or build with CONFIG_DEBUG_INFO=y to allow more precise
>> > interpretation of this stack trace.
>
> I will try to reproduce this warning with DEBUG_INFO=y enabled kernel and
> get back to you with a better crash log.
>
>>
>> My money would be on the option that whatever ran on this workqueue
>> before forgot to re-enable BH, but we already have a check for that...
>> Naresh, do you have the full log? Is there nothing like "BUG: workqueue
>> leaked lock" above the splat?

No, because the splat happens in the middle of the work. The workqueue
"leaked lock" check only triggers once the work has finished.

So cleanup_net() does

   ....
   synchronize_rcu();   <- might sleep. So up to here it should be fine.

   list_for_each_entry_continue_reverse(ops, &pernet_list, list)
   	ops_exit_list(ops, &net_exit_list);

ops_exit_list() is called for each ops which then either invokes
ops->exit() or ops->exit_batch().

So one of those callbacks fails to re-enable BH. Adding a check for
!local_bh_disabled() after each invocation of ops->exit() and
ops->exit_batch() should identify the buggy callback.
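
As a rough illustration of that per-callback check, here is a userspace
model, not kernel code. Everything prefixed model_ is hypothetical: the
BH-disable depth is a plain counter, the pernet list is an array walked in
reverse, and the callbacks are made-up examples. An in-kernel version
would presumably test something like softirq_count() and WARN instead.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the BH-disable depth (hypothetical). */
static int model_bh_depth;

static void model_bh_disable(void) { model_bh_depth++; }
static void model_bh_enable(void)  { model_bh_depth--; }

/* Simplified stand-in for struct pernet_operations (hypothetical). */
struct model_pernet_ops {
	const char *name;
	void (*exit)(void);        /* may be NULL */
	void (*exit_batch)(void);  /* may be NULL */
};

/* Run one callback; return 1 if it changed the BH-disable depth. */
static int model_call_and_check(const char *name, const char *which,
				void (*cb)(void))
{
	int before = model_bh_depth;

	if (!cb)
		return 0;
	cb();
	if (model_bh_depth == before)
		return 0;
	fprintf(stderr, "%s->%s() changed BH depth: %d -> %d\n",
		name, which, before, model_bh_depth);
	return 1;
}

/*
 * Walk the ops in reverse order, checking after every callback.
 * Returns the index of the first unbalanced ops, or -1 if all are fine.
 */
int model_find_leaky_ops(const struct model_pernet_ops *ops, size_t n)
{
	for (size_t i = n; i-- > 0; ) {
		if (model_call_and_check(ops[i].name, "exit", ops[i].exit) ||
		    model_call_and_check(ops[i].name, "exit_batch",
					 ops[i].exit_batch))
			return (int)i;
	}
	return -1;
}

/* Made-up callbacks: one balanced, one that forgets to re-enable BH. */
static void balanced_exit(void)    { model_bh_disable(); model_bh_enable(); }
static void leaky_exit_batch(void) { model_bh_disable(); /* no enable */ }

const struct model_pernet_ops model_example_ops[] = {
	{ "ops_a", balanced_exit, NULL },
	{ "ops_b", NULL, leaky_exit_batch },
	{ "ops_c", balanced_exit, NULL },
};
```

Calling model_find_leaky_ops(model_example_ops, 3) walks ops_c, then
ops_b, and stops there because ops_b's exit_batch leaves the depth
changed. Comparing the depth before and after each call, rather than
against zero, keeps the model usable even if the caller already runs
with BH disabled.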

Thanks,

        tglx