linux-kernel - Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160330141552.GA25236@fixme-laptop.cn.ibm.com>
Date:	Wed, 30 Mar 2016 22:15:52 +0800
From:	Boqun Feng <boqun.feng@...il.com>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Jacob Pan <jacob.jun.pan@...ux.intel.com>,
	Josh Triplett <josh@...htriplett.org>,
	Ross Green <rgkernel@...il.com>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	John Stultz <john.stultz@...aro.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	lkml <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Lai Jiangshan <jiangshanlai@...il.com>, dipankar@...ibm.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	rostedt <rostedt@...dmis.org>,
	David Howells <dhowells@...hat.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Darren Hart <dvhart@...ux.intel.com>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	pranith kumar <bobby.prani@...il.com>,
	"Chatre, Reinette" <reinette.chatre@...el.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17

On Wed, Mar 30, 2016 at 06:30:38AM -0700, Paul E. McKenney wrote:
[...]
> > > 
> > 
> > I ran several tests similar to this setup in a guest, the guest was
> > running in a 12-core Xeon X5690 host with kernel 4.4.5. The guest was
> > given 16 VCPUs and 2G memory.
> > 
> > Rather than RCU stall warnings, I was able to see a bug like this nearly

To be accurate, I did see RCU stall warnings, after the NULL pointer
dereference, but for every run, the NULL pointer dereference always
happened first. 

> > *everytime*:
> > 
> > [  308.291635] BUG: unable to handle kernel NULL pointer dereference at           (null)
> > [  308.292063] IP: [<ffffffff810b9452>] migrate_timer_list+0x32/0xc0
> 
> New one on me!
> 
> > [  308.292063] PGD 0 
> > [  308.292063] Oops: 0002 [#1] PREEMPT SMP 
> > [  308.292063] Modules linked in:
> > [  308.292063] CPU: 11 PID: 798 Comm: torture_onoff Not tainted 4.5.0-rc1+ #14
> > [  308.292063] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-20160216_104851-anatol 04/01/2014
> > [  308.292063] task: ffff88007cafcb00 ti: ffff88007c6f4000 task.ti: ffff88007c6f4000
> > [  308.292063] RIP: 0010:[<ffffffff810b9452>]  [<ffffffff810b9452>] migrate_timer_list+0x32/0xc0
> > [  308.292063] RSP: 0018:ffff88007c6f7c98  EFLAGS: 00010086
> > [  308.292063] RAX: dead000000000200 RBX: ffff88007fb6cc00 RCX: ffff88007fa2cc00
> > [  308.292063] RDX: 0000000000000000 RSI: ffff88007fa2d6d0 RDI: ffff88007fb6cc00
> > [  308.292063] RBP: ffff88007c6f7cc0 R08: 0000000000000000 R09: ffffffff81e4d360
> > [  308.292063] R10: ffffea0001f2daa0 R11: 0000000000000000 R12: ffff88007fa2d6d0
> > [  308.292063] R13: ffff88007fb6cc00 R14: 000000000000000b R15: ffff88007fa2c420
> > [  308.292063] FS:  0000000000000000(0000) GS:ffff88007fb60000(0000) knlGS:0000000000000000
> > [  308.292063] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [  308.292063] CR2: 0000000000000000 CR3: 0000000001e0f000 CR4: 00000000000006e0
> > [  308.292063] Stack:
> > [  308.292063]  ffff88007fb6cc00 ffff88007fa2d6d8 ffff88007fa2dad0 ffff88007fa2d8d0
> > [  308.292063]  ffff88007fa2d4d8 ffff88007c6f7d08 ffffffff810b95c4 ffff88007fa2cc00
> > [  308.292063]  ffff88007fa2d648 00000000fffffff0 ffffffff81e4e0c0 0000000000000007
> > [  308.292063] Call Trace:
> > [  308.292063]  [<ffffffff810b95c4>] timer_cpu_notify+0xe4/0x160
> > [  308.292063]  [<ffffffff81073ea5>] notifier_call_chain+0x45/0x70
> > [  308.292063]  [<ffffffff81073fc9>] __raw_notifier_call_chain+0x9/0x10
> > [  308.292063]  [<ffffffff8105759e>] cpu_notify_nofail+0x1e/0x30
> > [  308.292063]  [<ffffffff81057852>] _cpu_down+0x212/0x350
> > [  308.292063]  [<ffffffff810b3350>] ? __call_rcu.constprop.74+0x380/0x380
> > [  308.292063]  [<ffffffff810b33b0>] ? call_rcu_bh+0x20/0x20
> > [  308.292063]  [<ffffffff810a8950>] ? __rcu_read_unlock+0x50/0x50
> > [  308.292063]  [<ffffffff810a8950>] ? __rcu_read_unlock+0x50/0x50
> > [  308.292063]  [<ffffffff810579bf>] cpu_down+0x2f/0x50
> > [  308.292063]  [<ffffffff811266b5>] torture_onoff+0x115/0x380
> > [  308.292063]  [<ffffffff811265a0>] ? torture_kthread_stopping+0x70/0x70
> > [  308.292063]  [<ffffffff810732f6>] kthread+0xd6/0xf0
> > [  308.292063]  [<ffffffff81073220>] ? kthread_worker_fn+0x180/0x180
> > [  308.292063]  [<ffffffff818b499f>] ret_from_fork+0x3f/0x70
> > [  308.292063]  [<ffffffff81073220>] ? kthread_worker_fn+0x180/0x180
> > [  308.292063] Code: 56 41 55 41 54 53 48 8b 06 44 8b 77 40 48 85 c0 74 53 49 89 fd 49 89 f4 4d 8b 3c 24 66 66 66 66 90 49 8b 07 49 8b 57 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 b8 00 02 00 00 00 00 ad de 4c 89 
> > [  308.292063] RIP  [<ffffffff810b9452>] migrate_timer_list+0x32/0xc0
> > [  308.292063]  RSP <ffff88007c6f7c98>
> > [  308.292063] CR2: 0000000000000000
> > [  308.292063] ---[ end trace 29cb5d70ad875872 ]---
> > 
> > 
> > I'm not sure whether this is relevant, maybe this was caused by my
> > setup? But I attach two kernel logs and my config in case this could
> > help us solve the problem.
> > 
> > Another interesting I observed was that the time stamps of printk went
> > back in a while in some run, please note the log named "dmesg.1" in the
> > attachments.
> 
> That can cause all sorts of problems, including RCU CPU stall warnings.
> 
> So I see the following in dmesg.1, courtesy of:
> 
> sed -e 's/].*$//' < /tmp/dmesg.1 | awk '{ if (oldtime != "" && oldtime > $2) print; oldtime = $2 }'
> 
> 1.128644 -> 1.128642:  Only two microseconds backwards, so not enough
> 	to cause a stall warning (need about 21 seconds for that).
> 1.143582 -> 1.143575:  Seven microseconds backwards.
> 316.386056 -> 421.191446: 104.8 seconds backwards, plenty to cause
> 	all sorts of problems, including stall warnings.
> 311.122004 -> 316.338053: 5.2 seconds backwards, can cause all sorts
> 	of problems, but not likely stall warnings.
> 
> > A quick analysis of this bug is the null dereference happened at:
> > 
> > 	migrate_timer_list()
> > 	  detach_timer();
> > 	    __list_del();
> > 	      WRITE_ONCE(*pprev, next); <- pprev is NULL
> > 
> > This seems highly unlikely, but as I said, I hit this almost every time
> > after a few minutes of starting rcutoture.
> 
> Looks like a malformed list, which does look quite ugly.  No idea
> whether or not this could be related to backwards-going time, but
> fixing the backwards-going time is a good first step.
> 
> > Any idea about whether this could be related to the RCU stall problem,
> > or this is another bug, or this actually happend by design for some
> > reasons?
> 
> One of your backwards-time incidents could cause both the RCU CPU stall
> warnings and also the soft lockup I see there.
> 
> If this is an older system, it might have an unreliable TSC.  If the
> kernel is nevertheless using TSC, this sort of thing could happen.
> The usual advice is to boot using HPET, but perhaps others on the
> list have more specific advice.
> 

I tried another run, the kernel was compiled with HPET=y and boot
parameters "hpet=force notsc" and I still hit the same problem, any
suggestion for me to go further to get rid of TSC's impact?

Kernel log attached.

Regards,
Boqun

> 							Thanx, Paul
> 

View attachment "dmesg.3" of type "text/plain" (158943 bytes)

Download attachment "signature.asc" of type "application/pgp-signature" (474 bytes)