Message-ID: <20120507185017.GA21152@linux.vnet.ibm.com>
Date:	Mon, 7 May 2012 11:50:17 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Hugh Dickins <hughd@...gle.com>
Cc:	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	"Paul E. McKenney" <paul.mckenney@...aro.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: linux-next ppc64: RCU mods cause __might_sleep BUGs

On Mon, May 07, 2012 at 09:21:54AM -0700, Hugh Dickins wrote:
> On Wed, 2 May 2012, Hugh Dickins wrote:
> > On Wed, 2 May 2012, Paul E. McKenney wrote:
> > > 
> > > In any case, I must confess that I feel quite silly about my series
> > > of patches.  I have reverted them aside from a couple that did useful
> > > optimizations, and they should show up in -next shortly.
> > 
> > A wee bit sad, but thank you - it was an experiment worth trying,
> > and perhaps there will be reason to come back to it in future.
> 
> The revert indeed showed up in next-20120504: thanks, no problem now.
> 
> But although it's just history, and not worth anyone's time to
> investigate, I shouldn't let this thread die without an epilogue.
> 
> Although the patch I posted (this_cpu_inc in __rcu_read_lock,
> preempt_disable and enable in __rcu_read_unlock) ran well until
> I killed the test after 70 hours, it did not _entirely_ eliminate
> the sleeping function BUG messages.
> 
> In 70 hours I got six isolated messages like the below (but from
> different __might_sleep callsites) - where before I'd have flurries
> of hundreds(?) and freeze within the hour.
> 
> And the "rcu_nesting" debug line I'd added to the message was different:
> where before it was showing ffffffff on some tasks and 1 on others i.e.
> increment or decrement had been applied to the wrong task, these messages
> now all showed 0s throughout i.e. by the time the message was printed,
> there was no longer any justification for the message.
> 
> As if a memory barrier were missing somewhere, perhaps.

These fields should be updated only by the corresponding CPU, so
if memory barriers are needed, it seems to me that the cross-CPU
access is the bug, not the lack of a memory barrier.

Ah...  Is preemption disabled across the access to RCU's nesting level
when printing out the message?  If not, a preemption at that point
could result in the value printed being inaccurate.
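
As a rough userspace illustration of the point above (not kernel code):
if the nesting level is read once for the check and read again for the
message, anything that runs in between can make the printed value
disagree with the value that triggered the report.  The names
rcu_nesting, simulate_preemption(), report_racy() and report_snapshot()
are hypothetical and exist only for this sketch.  In the kernel the
usual options would be to bracket the access with preempt_disable() and
preempt_enable(), or to print a value captured at check time.

```c
/* Hypothetical stand-in for the RCU read-side nesting counter. */
static int rcu_nesting;

/*
 * Hypothetical hook modelling "something else ran in between", for
 * example a preemption after which the counter reads differently.
 */
static void simulate_preemption(void)
{
	rcu_nesting = 0;
}

/* What the check saw versus what the message would print. */
struct report {
	int checked;
	int printed;
};

/* Racy: the counter is read once for the check, again for the print. */
static struct report report_racy(void)
{
	struct report r;

	r.checked = rcu_nesting;	/* value that triggers the report */
	simulate_preemption();		/* window between check and print */
	r.printed = rcu_nesting;	/* second read may no longer match */
	return r;
}

/* Snapshot: read the counter once and reuse that value for the print. */
static struct report report_snapshot(void)
{
	struct report r;
	int snap = rcu_nesting;		/* single read */

	r.checked = snap;
	simulate_preemption();		/* same window, but no second read */
	r.printed = snap;
	return r;
}
```

With rcu_nesting set to 1 beforehand, report_racy() returns checked=1
and printed=0, which resembles the all-zeros symptom described in the
quoted text, while report_snapshot() returns 1 for both fields.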

							Thanx, Paul

> BUG: sleeping function called from invalid context at arch/powerpc/mm/fault.c:305
> cpu=2 preempt_count=0 preempt_offset=0 rcu_nesting=0 nesting_save=0
> in_atomic(): 0, irqs_disabled(): 0, pid: 12266, name: cc1
> Call Trace:
> [c000000003affac0] [c00000000000f36c] .show_stack+0x6c/0x16c (unreliable)
> [c000000003affb70] [c000000000078788] .__might_sleep+0x150/0x170
> [c000000003affc00] [c0000000000255f4] .do_page_fault+0x288/0x664
> [c000000003affe30] [c000000000005868] handle_page_fault+0x10/0x30
> 
> Hugh
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
