Message-Id: <20161130210152.GL3924@linux.vnet.ibm.com>
Date:   Wed, 30 Nov 2016 13:01:52 -0800
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Guenter Roeck <linux@...ck-us.net>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        sparclinux@...r.kernel.org, davem@...emloft.net
Subject: Re: next: Commit 'mm: Prevent __alloc_pages_nodemask() RCU CPU stall
 ...' causing hang on sparc32 qemu

On Wed, Nov 30, 2016 at 11:21:59AM -0800, Guenter Roeck wrote:
> On Wed, Nov 30, 2016 at 04:03:33AM -0800, Paul E. McKenney wrote:
> > On Wed, Nov 30, 2016 at 02:52:11AM -0800, Guenter Roeck wrote:
> > > On 11/29/2016 11:02 PM, Paul E. McKenney wrote:
> > > >On Tue, Nov 29, 2016 at 08:32:51PM -0800, Guenter Roeck wrote:
> > > >>On 11/29/2016 05:28 PM, Paul E. McKenney wrote:
> > > >>>On Tue, Nov 29, 2016 at 01:23:08PM -0800, Guenter Roeck wrote:
> > > >>>>Hi Paul,
> > > >>>>
> > > >>>>most of my qemu tests for sparc32 targets started to fail in next-20161129.
> > > >>>>The problem is only seen in SMP builds; non-SMP builds are fine.
> > > >>>>Bisect points to commit 2d66cccd73436 ("mm: Prevent __alloc_pages_nodemask()
> > > >>>>RCU CPU stall warnings"); reverting that commit fixes the problem.
> > 
> > And I have dropped this patch.  Michal Hocko showed me the error of
> > my ways with this patch.
> > 
> 
> :-)
> 
> On another note, I still get RCU tracebacks in the s390 tests.
> 
> BUG: sleeping function called from invalid context at mm/page_alloc.c:3775
> 
> That is caused by 'rcu: Maintain special bits at bottom of ->dynticks counter';
> if I recall correctly we had discussed that earlier.

Indeed, I had missed a dyntick counter update back on Nov 11, which meant
that some of the code was still looking at the low-order bit instead of
the next bit up.  This is now fixed.
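
To illustrate the kind of mismatch described above, here is a minimal
userspace sketch. It assumes that bit 0 of the counter is reserved as a
special flag and that idle transitions therefore advance the counter from
bit 1 upward. All names are hypothetical and chosen for illustration; this
is not the actual kernel code.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
#define DYNTICK_SPECIAL_MASK 0x1UL                      /* bit 0: reserved flag */
#define DYNTICK_CTR          (DYNTICK_SPECIAL_MASK + 1) /* bit 1: counter LSB */

/* Each idle entry or exit advances the counter without disturbing bit 0. */
static unsigned long dyntick_transition(unsigned long dynticks)
{
	return dynticks + DYNTICK_CTR;
}

/* Stale check: still tests the low-order bit, which is now the flag. */
static bool in_idle_stale(unsigned long dynticks)
{
	return !(dynticks & 0x1UL);
}

/* Updated check: tests the next bit up, the counter's new low-order bit. */
static bool in_idle(unsigned long dynticks)
{
	return !(dynticks & DYNTICK_CTR);
}
```

With a non-idle counter value of DYNTICK_CTR, in_idle() reports non-idle
while in_idle_stale() reports idle, which is the sort of disagreement a
missed update would produce.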

So to get to the error message you call out above, I would need to have
improperly left the system in bh state or left irqs disabled while the
system was running normally without an oops.  I am having a hard time
seeing how this patch can do that.
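
The conditions listed above can be modeled roughly as follows. This is a
simplified userspace sketch, not the kernel's actual check: the struct and
function names are hypothetical, and other conditions the real check may
apply are left out.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state the warning depends on. */
struct ctx_state {
	int  preempt_count;     /* nonzero while in bh or other atomic context */
	bool irqs_disabled;     /* true if interrupts are currently disabled */
	bool system_running;    /* false during early boot or shutdown */
	bool oops_in_progress;  /* true while an oops is being reported */
};

/* Returns true if a sleeping call made in this state would be flagged. */
static bool would_warn_on_sleep(const struct ctx_state *s)
{
	/* No warning unless the system is running normally without an oops. */
	if (!s->system_running || s->oops_in_progress)
		return false;

	/* Sleeping is invalid in atomic context or with irqs disabled. */
	return s->preempt_count != 0 || s->irqs_disabled;
}
```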

I would be more suspicious of f2a471ffc8a8 ("rcu: Allow boot-time use
of cond_resched_rcu_qs()").

So did you bisect or do a revert to work out which was the offending commit?

							Thanx, Paul
