linux-kernel - Re: WARNING: suspicious RCU usage in idtentry

Message-ID: <20200528161143.GF2869@paulmck-ThinkPad-P72>
Date:   Thu, 28 May 2020 09:11:43 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     syzbot <syzbot+3ae5eaae0809ee311e75@...kaller.appspotmail.com>,
        Paolo Bonzini <pbonzini@...hat.com>, bp@...en8.de,
        hpa@...or.com, linux-kernel@...r.kernel.org, luto@...nel.org,
        mingo@...nel.org, syzkaller-bugs@...glegroups.com, x86@...nel.org
Subject: Re: WARNING: suspicious RCU usage in idtentry_exit

On Thu, May 28, 2020 at 03:33:44PM +0200, Thomas Gleixner wrote:
> syzbot <syzbot+3ae5eaae0809ee311e75@...kaller.appspotmail.com> writes:
> 
> + Paolo, Paul
> 
> > syzbot found the following crash on:
> >
> > HEAD commit:    7b4cb0a4 Add linux-next specific files for 20200525
> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13356016100000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=47b0740d89299c10
> > dashboard link: https://syzkaller.appspot.com/bug?extid=3ae5eaae0809ee311e75
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+3ae5eaae0809ee311e75@...kaller.appspotmail.com
> >
> > =============================
> > WARNING: suspicious RCU usage
> > 5.7.0-rc7-next-20200525-syzkaller #0 Not tainted
> > -----------------------------
> > kernel/rcu/tree.c:715 RCU dynticks_nesting counter underflow/zero!!

So the nesting counter underflowed or got clobbered to either zero
or some negative number.  The usual cause of this is a misnesting of
rcu_nmi_enter() and rcu_nmi_exit().

If this were reproducible, I would suggest tracking this down by enabling
the rcu_dyntick trace event.  :-/
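
For anyone who does get a reproducer, here is a minimal userspace sketch
of turning that event on from C.  The tracefs path is an assumption
(tracefs mounted at /sys/kernel/tracing; older setups use
/sys/kernel/debug/tracing instead), the helper names are made up for
illustration, and it needs root plus a kernel built with RCU tracing:

```c
#include <assert.h>
#include <stdio.h>

/* Assumed location of the event's enable file; adjust to your tracefs mount. */
#define RCU_DYNTICK_ENABLE "/sys/kernel/tracing/events/rcu/rcu_dyntick/enable"

/* Hypothetical helper: write value to a control file, 0 on success, -1 on failure. */
int write_control_file(const char *path, const char *value)
{
	FILE *f;
	int ok;

	assert(path && value);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)	/* buffered write errors show up at close time */
		ok = 0;
	return ok ? 0 : -1;
}

/* Hypothetical wrapper: enable the rcu_dyntick trace event. */
int enable_rcu_dyntick_event(void)
{
	return write_control_file(RCU_DYNTICK_ENABLE, "1");
}
```

Calling enable_rcu_dyntick_event() before running the reproducer and then
reading the trace buffer should show the per-CPU dynticks transitions
leading up to the splat.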

> > other info that might help us debug this:
> >
> >
> > RCU used illegally from idle CPU!

This might indicate that the aforementioned misnesting was a call to
rcu_nmi_exit() from an exception that never invoked rcu_nmi_enter().
In this case, the lack of the rcu_nmi_enter() would leave the CPU
looking idle to RCU, and then the call to rcu_nmi_exit() would result in
a negative counter.  But I would have expected a pair of earlier splats
from rcu_nmi_exit() in that case:

	WARN_ON_ONCE(rdp->dynticks_nesting <= 0);
	WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs());

So another hypothesis is that neither rcu_nmi_enter() nor rcu_nmi_exit()
was invoked, leaving the ->dynticks_nesting counter at the value zero,
in turn causing rcu_irq_exit_preempt() to complain.
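
To make the difference between the two hypotheses concrete, here is a toy
userspace model.  It is not the kernel code: the struct, the toy_*()
names, and the single-counter simplification are all made up for
illustration, and the "<= 0" check is inferred from the "underflow/zero"
wording of the splat.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for per-CPU RCU state; not the kernel's struct rcu_data. */
struct toy_rcu_cpu {
	long nesting;		/* models ->dynticks_nesting */
	bool watching;		/* false models "CPU is in an EQS" */
	int exit_splats;	/* modeled WARN_ON_ONCE() hits in the exit path */
};

/* Models a matched entry: first entry from idle makes RCU watch. */
void toy_enter(struct toy_rcu_cpu *c)
{
	assert(c);
	if (c->nesting == 0)
		c->watching = true;
	c->nesting++;
}

/* Models the exit path, including the two checks quoted above. */
void toy_exit(struct toy_rcu_cpu *c)
{
	assert(c);
	if (c->nesting <= 0)
		c->exit_splats++;
	if (!c->watching)
		c->exit_splats++;
	c->nesting--;
	if (c->nesting == 0)
		c->watching = false;
}

/* Models the final complaint: counter at zero or below. */
bool toy_exit_preempt_complains(const struct toy_rcu_cpu *c)
{
	assert(c);
	return c->nesting <= 0;
}
```

Starting from an idle CPU (nesting zero, not watching), a lone toy_exit()
produces both exit-path splats and a negative counter, whereas calling
neither function produces no exit-path splats yet still trips the final
check.  The syzbot report shows only the final complaint, which is why
the second hypothesis looks like the better fit.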

> > rcu_scheduler_active = 2, debug_locks = 1
> > RCU used illegally from extended quiescent state!

Huh.  This is a bit repetitive, isn't it?  I just queued a patch to say this
only once.  </distraction>

> > no locks held by syz-executor.5/24641.
> >
> > stack backtrace:
> > CPU: 1 PID: 24641 Comm: syz-executor.5 Not tainted 5.7.0-rc7-next-20200525-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:77 [inline]
> >  dump_stack+0x18f/0x20d lib/dump_stack.c:118
> >  rcu_irq_exit_preempt+0x1fa/0x250 kernel/rcu/tree.c:715
> >  idtentry_exit+0x9e/0xc0 arch/x86/entry/common.c:583
> >  exc_general_protection+0x23d/0x520 arch/x86/kernel/traps.c:506
> >  asm_exc_general_protection+0x1e/0x30 arch/x86/include/asm/idtentry.h:353
> > RIP: 0010:kvm_fastop_exception+0xb68/0xfe8
> > Code: f2 ff ff ff 48 31 db e9 fb c9 2a f9 b8 f2 ff ff ff 48 31 f6 e9 ff c9 2a f9 31 c0 e9 ec 2c 2b f9 b8 fb ff ff ff e9 13 a9 31 f9 <b9> fb ff ff ff 31 c0 31 d2 e9 33 a9 31 f9 31 db e9 2a 0b 42 f9 31
> > RSP: 0018:ffffc90004a87a30 EFLAGS: 00010212
> > RAX: 0000000000040000 RBX: ffff88809cca4080 RCX: 0000000000000122
> > RDX: 00000000000063ff RSI: ffffc90004a87a98 RDI: 0000000000000122
> > RBP: 0000000000000122 R08: ffff888058486480 R09: fffffbfff131f481
> > R10: ffffffff898fa403 R11: fffffbfff131f480 R12: 0000000000000122
> > R13: 0000000000000078 R14: 0000000000000006 R15: ffffffff88244b5c
> >  paravirt_read_msr_safe arch/x86/include/asm/paravirt.h:178 [inline]
> >  vmx_create_vcpu+0x184/0x2b40 arch/x86/kvm/vmx/vmx.c:6827
> >  kvm_arch_vcpu_create+0x6a8/0xb30 arch/x86/kvm/x86.c:9427
> >  kvm_vm_ioctl_create_vcpu arch/x86/kvm/../../../virt/kvm/kvm_main.c:3043 [inline]
> >  kvm_vm_ioctl+0x15b7/0x2460 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3603
> >  vfs_ioctl fs/ioctl.c:48 [inline]
> >  ksys_ioctl+0x11a/0x180 fs/ioctl.c:753
> >  __do_sys_ioctl fs/ioctl.c:762 [inline]
> >  __se_sys_ioctl fs/ioctl.c:760 [inline]
> >  __x64_sys_ioctl+0x6f/0xb0 fs/ioctl.c:760
> >  do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:353
> >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > RIP: 0033:0x45ca29
> > Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> > RSP: 002b:00007f2c93b11c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > RAX: ffffffffffffffda RBX: 00000000004e73c0 RCX: 000000000045ca29
> > RDX: 0000000000000000 RSI: 000000000000ae41 RDI: 0000000000000004
> > RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> > R13: 0000000000000396 R14: 00000000004c62c6 R15: 00007f2c93b126d4
> 
> Weird. I have no idea how that thing is an EQS here.

No argument on the "Weird" part!  ;-)

Is this a NO_HZ_FULL=y kernel?  If so, one possibility is that the call
to rcu_user_exit() went missing somehow.  If not, then RCU should have
been watching userspace execution.
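
A toy sketch of that reasoning, again a userspace model with made-up
names rather than the kernel's context-tracking code, and assuming the
CPU in question is one that actually runs tickless in userspace:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state; the field and function names are hypothetical. */
struct toy_cpu {
	bool nohz_full;		/* models a CPU running with NO_HZ_FULL=y */
	bool rcu_watching;	/* false models "RCU sees this CPU as in an EQS" */
};

/* Returning to userspace: only a nohz_full CPU tells RCU to stop watching. */
void toy_return_to_user(struct toy_cpu *c)
{
	assert(c);
	if (c->nohz_full)
		c->rcu_watching = false;
}

/*
 * Entering the kernel: user_exit_called models whether the
 * rcu_user_exit() step actually ran on the way in.
 */
void toy_enter_kernel(struct toy_cpu *c, bool user_exit_called)
{
	assert(c);
	if (c->nohz_full && user_exit_called)
		c->rcu_watching = true;
}

/* True if kernel code would now be running with RCU not watching. */
bool toy_kernel_sees_eqs(const struct toy_cpu *c)
{
	assert(c);
	return !c->rcu_watching;
}
```

In this model only the nohz_full CPU with the exit step skipped ends up
executing kernel code in an EQS; without nohz_full, RCU never stops
watching in the first place.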

Again, the only thing I can think of (should this prove to be
reproducible) is the rcu_dyntick trace event.

							Thanx, Paul