lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20150806041504.GA14220@codemonkey.org.uk>
Date:	Thu, 6 Aug 2015 00:15:08 -0400
From:	Dave Jones <davej@...emonkey.org.uk>
To:	Frederic Weisbecker <fweisbec@...il.com>
Cc:	Sasha Levin <sasha.levin@...cle.com>, paulmck@...ux.vnet.ibm.com,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Josh Triplett <josh@...htriplett.org>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: 4.2-rc5 rcu stalls.

On Wed, Aug 05, 2015 at 02:37:59PM +0200, Frederic Weisbecker wrote:
 > On Tue, Aug 04, 2015 at 08:12:50PM -0400, Dave Jones wrote:
 > > On Tue, Aug 04, 2015 at 12:54:35AM -0400, Sasha Levin wrote:
 > >  > On 08/03/2015 06:03 PM, Paul E. McKenney wrote:
 > >  > >> > Ugh, that doesn't revert cleanly.  Got something handy ?
 > >  > > I do not, but perhaps either Sasha or Frederic do.
 > >  > 
 > >  > I've attached a revert courtesy of Peter.
 > > 
 > > Thanks.  At first I thought this was doing the trick, but then I hit this again.
 > > 
 > > 
 > > [23643.545873] INFO: rcu_preempt detected stalls on CPUs/tasks:
 > > [23643.546031] 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P31722
 > > [23643.546173] 	(detected by 3, t=65002 jiffies, g=2256887, c=2256886, q=0)
 > > [23643.546326] trinity-watchdo R  running task    14336 31722  31721 0x00080000
 > > [23643.546488]  ffff8804fcfe7cc8 000000000000ded0 0000000000000002 ffff8804f58bb680
 > > [23643.546661]  ffff8800ce4951c0 ffff8804fcfe7cb8 ffff8804fcfe8000 ffff8804f6552608
 > > [23643.546830]  0000000000000009 ffff8804fcfe7e88 0000000000000009 ffff8804fcfe7ce8
 > > [23643.547001] Call Trace:
 > > [23643.547058]  [<ffffffff887fa2b2>] preempt_schedule_common+0x22/0x40
 > > [23643.547201]  [<ffffffff887fa2ef>] preempt_schedule+0x1f/0x30
 > > [23643.547329]  [<ffffffff88001058>] ___preempt_schedule+0x12/0x14
 > > [23643.547465]  [<ffffffff8808b76d>] ? do_send_sig_info+0x5d/0x80
 > > [23643.547599]  [<ffffffff887fff32>] ? _raw_spin_unlock_irqrestore+0x42/0x70
 > > [23643.547753]  [<ffffffff887fff50>] ? _raw_spin_unlock_irqrestore+0x60/0x70
 > > [23643.547910]  [<ffffffff8808b76d>] do_send_sig_info+0x5d/0x80
 > > [23643.548039]  [<ffffffff8808be62>] group_send_sig_info+0xb2/0x120
 > > [23643.548175]  [<ffffffff8808bdb5>] ? group_send_sig_info+0x5/0x120
 > > [23643.548314]  [<ffffffff880ea62f>] ? rcu_read_lock_held+0x4f/0x60
 > > [23643.548451]  [<ffffffff8808c05f>] kill_pid_info+0x7f/0x150
 > > [23643.548576]  [<ffffffff8808c000>] ? kill_pid_info+0x20/0x150
 > > [23643.548705]  [<ffffffff8808c244>] SYSC_kill+0xf4/0x2b0
 > > [23643.548821]  [<ffffffff8808c1ed>] ? SYSC_kill+0x9d/0x2b0
 > > [23643.548942]  [<ffffffff880d35cb>] ? trace_hardirqs_on_caller+0x14b/0x1e0
 > > [23643.549097]  [<ffffffff880d366d>] ? trace_hardirqs_on+0xd/0x10
 > > [23643.549231]  [<ffffffff88192f63>] ? context_tracking_user_exit+0x13/0x20
 > > [23643.549387]  [<ffffffff88012c47>] ? syscall_trace_enter_phase1+0xf7/0x150
 > > [23643.549540]  [<ffffffff88001017>] ? trace_hardirqs_on_thunk+0x17/0x19
 > > [23643.549687]  [<ffffffff8808e64e>] SyS_kill+0xe/0x10
 > > [23643.549799]  [<ffffffff88800997>] entry_SYSCALL_64_fastpath+0x12/0x6f
 > 
 > If it still happens after Sasha's revert, which basically revert all the offending
 > patches related to preempt lately, then the reason might be elsewhere.
 > 
 > How hard was it to reproduce? I see 23000 secs in your dmesg logs which is around 6 hours.

Interestingly, it happened again, but only after 7 hours.
I've yet to trigger it in a shorter timeframe. Frustrating.

[28190.798758] INFO: rcu_preempt detected stalls on CPUs/tasks:
[28190.798914] 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P32189
[28190.799054] 	(detected by 1, t=65002 jiffies, g=2137396, c=2137395, q=0)
[28190.799203] trinity-c224    R  running task    13856 32189  31964 0x00080000
[28190.799362]  ffff8804f2323da8 ffffffffa67fa4d1 ffff8804fe170000 ffff8804b66db680
[28190.799531]  ffff8804fe170000 ffff8804f2323d98 0000000000000000 ffff8804f2324000
[28190.799699]  0000000000000002 0000000000000000 0000000000000000 ffff8804f2323dc8
[28190.799866] Call Trace:
[28190.799921]  [<ffffffffa67fa4d1>] ? preempt_schedule_irq+0x41/0xa0
[28190.800058]  [<ffffffffa67fa4d7>] preempt_schedule_irq+0x47/0xa0
[28190.800191]  [<ffffffffa6801529>] retint_kernel+0x1b/0x2d
[28190.800312]  [<ffffffffa60d6319>] ? lock_acquire+0xd9/0x260
[28190.800438]  [<ffffffffa609d295>] ? __task_pid_nr_ns+0x5/0x190
[28190.800568]  [<ffffffffa680153b>] ? retint_kernel+0x2d/0x2d
[28190.800691]  [<ffffffffa609d2d2>] __task_pid_nr_ns+0x42/0x190
[28190.800820]  [<ffffffffa609d295>] ? __task_pid_nr_ns+0x5/0x190
[28190.800950]  [<ffffffffa6091f0b>] sys_gettid+0x1b/0x20
[28190.801064]  [<ffffffffa6800997>] entry_SYSCALL_64_fastpath+0x12/0x6f
[28190.801208] trinity-c224    R  running task    13856 32189  31964 0x00080000
[28190.801365]  ffff8804f2323da8 ffffffffa67fa4d1 ffff8804fe170000 ffff8804b66db680
[28190.801533]  ffff8804fe170000 ffff8804f2323d98 0000000000000000 ffff8804f2324000
[28190.801702]  0000000000000002 0000000000000000 0000000000000000 ffff8804f2323dc8
[28190.801870] Call Trace:
[28190.801923]  [<ffffffffa67fa4d1>] ? preempt_schedule_irq+0x41/0xa0
[28190.802060]  [<ffffffffa67fa4d7>] preempt_schedule_irq+0x47/0xa0
[28190.802193]  [<ffffffffa6801529>] retint_kernel+0x1b/0x2d
[28190.802313]  [<ffffffffa60d6319>] ? lock_acquire+0xd9/0x260
[28190.802436]  [<ffffffffa609d295>] ? __task_pid_nr_ns+0x5/0x190
[28190.802565]  [<ffffffffa680153b>] ? retint_kernel+0x2d/0x2d
[28190.802688]  [<ffffffffa609d2d2>] __task_pid_nr_ns+0x42/0x190
[28190.802815]  [<ffffffffa609d295>] ? __task_pid_nr_ns+0x5/0x190
[28190.802945]  [<ffffffffa6091f0b>] sys_gettid+0x1b/0x20
[28190.803058]  [<ffffffffa6800997>] entry_SYSCALL_64_fastpath+0x12/0x6f
[29929.492752] INFO: rcu_preempt detected stalls on CPUs/tasks:
[29929.492906] 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P289
[29929.493039] 	(detected by 0, t=65002 jiffies, g=2141006, c=2141005, q=0)
[29929.493188] systemd-journal R  running task    12464   289      1 0x00080000
[29929.493347]  ffff8804ff2bbae8 ffffffffa67fa4d1 ffff880501f81b40 ffff880503d43680
[29929.493515]  ffff880501f81b40 ffff8804ff2bbad8 0000000000000000 ffff8804ff2bc000
[29929.493683]  ffff8800d3e9f118 ffff8800d3e9eb40 0000000000000056 ffff8804ff2bbb08
[29929.493853] Call Trace:
[29929.493909]  [<ffffffffa67fa4d1>] ? preempt_schedule_irq+0x41/0xa0
[29929.494046]  [<ffffffffa67fa4d7>] preempt_schedule_irq+0x47/0xa0
[29929.494181]  [<ffffffffa6801529>] retint_kernel+0x1b/0x2d
[29929.494304]  [<ffffffffa67f9929>] ? __schedule+0x439/0xb20
[29929.494430]  [<ffffffffa6001058>] ? ___preempt_schedule+0x12/0x14
[29929.494568]  [<ffffffffa6001058>] ? ___preempt_schedule+0x12/0x14
[29929.494709]  [<ffffffffa66b8b11>] ? sock_def_readable+0x161/0x190
[29929.501118]  [<ffffffffa60ed468>] ? rcu_is_watching+0x38/0x60
[29929.507566]  [<ffffffffa60ed481>] ? rcu_is_watching+0x51/0x60
[29929.513987]  [<ffffffffa66b8b11>] sock_def_readable+0x161/0x190
[29929.520344]  [<ffffffffa66b89b5>] ? sock_def_readable+0x5/0x190
[29929.526678]  [<ffffffffa67ffe85>] ? _raw_spin_unlock+0x35/0x60
[29929.532988]  [<ffffffffa67986c9>] unix_dgram_sendmsg+0x4f9/0x570
[29929.539184]  [<ffffffffa66b509b>] ___sys_sendmsg+0x30b/0x320
[29929.545270]  [<ffffffffa60cfe7e>] ? put_lock_stats.isra.29+0xe/0x30
[29929.551331]  [<ffffffffa638a137>] ? debug_smp_processor_id+0x17/0x20
[29929.557285]  [<ffffffffa60cfe7e>] ? put_lock_stats.isra.29+0xe/0x30
[29929.563204]  [<ffffffffa60ad681>] ? get_parent_ip+0x11/0x50
[29929.569047]  [<ffffffffa60ad813>] ? preempt_count_sub+0xa3/0xf0
[29929.574796]  [<ffffffffa621a626>] ? __fget_light+0x66/0x90
[29929.580555]  [<ffffffffa6192d53>] ? context_tracking_exit+0x43/0x240
[29929.586253]  [<ffffffffa66b5712>] __sys_sendmsg+0x42/0x80
[29929.591843]  [<ffffffffa66b5762>] SyS_sendmsg+0x12/0x20
[29929.597385]  [<ffffffffa6800997>] entry_SYSCALL_64_fastpath+0x12/0x6f


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ