linux-kernel - Re: linux-next ppc64: RCU mods cause __might

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LSU.2.00.1205070858450.1416@eggly.anvils>
Date:	Mon, 7 May 2012 09:21:54 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
cc:	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	"Paul E. McKenney" <paul.mckenney@...aro.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: linux-next ppc64: RCU mods cause __might_sleep BUGs

On Wed, 2 May 2012, Hugh Dickins wrote:
> On Wed, 2 May 2012, Paul E. McKenney wrote:
> > 
> > In any case, I must confess that I feel quite silly about my series
> > of patches.  I have reverted them aside from a couple that did useful
> > optimizations, and they should show up in -next shortly.
> 
> A wee bit sad, but thank you - it was an experiment worth trying,
> and perhaps there will be reason to come back to it future.

The revert indeed showed up in next-20120504: thanks, no problem now.

But although it's just history, and not worth anyone's time to
investigate, I shouldn't let this thread die without an epilogue.

Although the patch I posted (this_cpu_inc in __rcu_read_lock,
preempt_disable and enable in __rcu_read_unlock) ran well until
I killed the test after 70 hours, it did not _entirely_ eliminate
the sleeping function BUG messages.

In 70 hours I got six isolated messages like the below (but from
different __might_sleep callsites) - where before I'd have flurries
of hundreds(?) and freeze within the hour.

And the "rcu_nesting" debug line I'd added to the message was different:
where before it was showing ffffffff on some tasks and 1 on others i.e.
increment or decrement had been applied to the wrong task, these messages
now all showed 0s throughout i.e. by the time the message was printed,
there was no longer any justification for the message.

As if a memory barrier were missing somewhere, perhaps.

BUG: sleeping function called from invalid context at arch/powerpc/mm/fault.c:305
cpu=2 preempt_count=0 preempt_offset=0 rcu_nesting=0 nesting_save=0
in_atomic(): 0, irqs_disabled(): 0, pid: 12266, name: cc1
Call Trace:
[c000000003affac0] [c00000000000f36c] .show_stack+0x6c/0x16c (unreliable)
[c000000003affb70] [c000000000078788] .__might_sleep+0x150/0x170
[c000000003affc00] [c0000000000255f4] .do_page_fault+0x288/0x664
[c000000003affe30] [c000000000005868] handle_page_fault+0x10/0x30

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/