linux-kernel - Re: INFO: rcu detected stall in sys

Message-ID: <CACT4Y+b_U6YKujEk9X=NHX45KkL93dLsyu5gS44PpEDi94qS0w@mail.gmail.com>
Date:   Wed, 4 Mar 2020 09:59:38 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Kris Karas <linux-1993@...nlit-rail.com>
Cc:     syzbot <syzbot+0c5c2dbf76930df91489@...kaller.appspotmail.com>,
        David Miller <davem@...emloft.net>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        "open list:HARDWARE RANDOM NUMBER GENERATOR CORE" 
        <linux-crypto@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Kate Stewart <kstewart@...uxfoundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Eric Biggers <ebiggers@...nel.org>, allison@...utok.net
Subject: Re: INFO: rcu detected stall in sys_keyctl

On Wed, Mar 4, 2020 at 9:41 AM Kris Karas <linux-1993@...nlit-rail.com> wrote:
>
> Resending this to all the original CCs at Dmitry's suggestion.
> I'm not a member of linux-crypto, so I don't know whether it will bounce;
> in any case, the OOPS I saw does not appear to be crypto-related.
>
> Dmitry Vyukov wrote:
> > syzbot wrote:
> >> Call Trace:
> >>   <IRQ>
> >>   __dump_stack lib/dump_stack.c:77 [inline]
> >>   dump_stack+0x197/0x210 lib/dump_stack.c:118
> >>   nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101
> >>   nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
> >>   arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
> >>   trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
> >>   rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree_stall.h:254
> >>   print_cpu_stall kernel/rcu/tree_stall.h:475 [inline]
> >>   check_cpu_stall kernel/rcu/tree_stall.h:549 [inline]
> >>   rcu_pending kernel/rcu/tree.c:3030 [inline]
> >>   rcu_sched_clock_irq.cold+0x51a/0xc37 kernel/rcu/tree.c:2276
> >>   update_process_times+0x2d/0x70 kernel/time/timer.c:1726
> >>   tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:171
> >>   tick_sched_timer+0x53/0x140 kernel/time/tick-sched.c:1314
> >>   __run_hrtimer kernel/time/hrtimer.c:1517 [inline]
> >>   __hrtimer_run_queues+0x364/0xe40 kernel/time/hrtimer.c:1579
> >>   hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1641
> >>   local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1119 [inline]
> >>   smp_apic_timer_interrupt+0x160/0x610 arch/x86/kernel/apic/apic.c:1144
> >>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
> >>   </IRQ>
> >>
> > +lib/mpi maintainers
> >
> > I wonder if this can also be triggered by remote actors (TLS, Wi-Fi,
> > USB, etc.).
> >
>
> This looks somewhat similar to an OOPS + RCU stall I reported earlier in
> reply to Greg KH's announcement of 5.5.7:
>
>      rcu: INFO: rcu_sched self-detected stall on CPU
>      rcu:    14-....: (20999 ticks this GP) idle=216/1/0x4000000000000002 softirq=454/454 fqs=5250
>              (t=21004 jiffies g=-755 q=1327)
>      NMI backtrace for cpu 14
>      CPU: 14 PID: 520 Comm: pidof Tainted: G      D           5.5.7 #1
>      Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470 Taichi, BIOS P3.50 07/18/2019
>      Call Trace:
>       <IRQ>
>       dump_stack+0x50/0x70
>       nmi_cpu_backtrace.cold+0x14/0x53
>       ? lapic_can_unplug_cpu.cold+0x44/0x44
>       nmi_trigger_cpumask_backtrace+0x7b/0x88
>       rcu_dump_cpu_stacks+0x7b/0xa9
>       rcu_sched_clock_irq.cold+0x152/0x39b
>       update_process_times+0x1f/0x50
>       tick_sched_timer+0x40/0x90
>       ? tick_sched_do_timer+0x50/0x50
>       __hrtimer_run_queues+0xdd/0x180
>       hrtimer_interrupt+0x108/0x230
>       smp_apic_timer_interrupt+0x53/0xa0
>       apic_timer_interrupt+0xf/0x20
>       </IRQ>
>
> I don't have a reproducer for it, either.  It showed up in 5.5.7 (but
> might be from earlier as it reproduces so infrequently).

Hi Kris,

What follows this stack? That's the most interesting part. The part you
showed (the interrupt-context frames between <IRQ> and </IRQ>) is common
to all stalls and says little beyond the fact that there is a stall.
These could well be very different stalls in different parts of the
kernel.
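The point above can be made mechanical: in a stall report, the frames between <IRQ> and </IRQ> are the common reporting path, and the diagnostic part is whatever comes after the closing marker. The following is a minimal C sketch, assuming the report has already been split into lines; the function name stall_tail_start is hypothetical and is not part of any kernel or syzkaller tooling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel or syzbot API): given the lines of
 * an RCU stall report, return the index of the first line after the
 * last "</IRQ>" marker, i.e. where the stalled task's own frames would
 * begin. Returns n if nothing follows the last marker, or if there is
 * no marker at all (in which case this heuristic cannot find a split).
 */
static size_t stall_tail_start(const char *const lines[], size_t n)
{
    size_t start = n; /* default: no task-specific frames located */

    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Remember the position just past each closing IRQ marker. */
        if (strstr(lines[i], "</IRQ>") != NULL)
            start = i + 1;
    }
    return start;
}
```

Applied to the 5.5.7 trace quoted above, this returns the line count itself, because the quoted text stops at </IRQ>: the part that would show where the CPU was stuck is missing. Splitting on the last marker is a simplifying choice; a report that dumps several CPUs can contain several IRQ sections, and this sketch only considers the final one.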