lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 22 May 2014 16:49:00 -0700
From:	Kevin Hilman <khilman@...aro.org>
To:	Vlastimil Babka <vbabka@...e.cz>
Cc:	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Rientjes <rientjes@...gle.com>,
	Hugh Dickins <hughd@...gle.com>,
	Greg Thelen <gthelen@...gle.com>,
	LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
	Minchan Kim <minchan@...nel.org>, Mel Gorman <mgorman@...e.de>,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
	Michal Nazarewicz <mina86@...a86.com>,
	Christoph Lameter <cl@...ux.com>,
	Rik van Riel <riel@...hat.com>,
	Olof Johansson <olof@...om.net>,
	Stephen Warren <swarren@...dotorg.org>
Subject: Re: [PATCH v2] mm, compaction: properly signal and act upon lock and
 need_sched() contention

On Fri, May 16, 2014 at 2:47 AM, Vlastimil Babka <vbabka@...e.cz> wrote:
> Compaction uses compact_checklock_irqsave() function to periodically check for
> lock contention and need_resched() to either abort async compaction, or to
> free the lock, schedule and retake the lock. When aborting, cc->contended is
> set to signal the contended state to the caller. Two problems have been
> identified in this mechanism.

This patch (or later version) has hit next-20140522 (in the form
commit 645ceea9331bfd851bc21eea456dda27862a10f4) and according to my
bisect, appears to be the culprit of several boot failures on ARM
platforms.

Unfortunately, there isn't much useful in the logs of the boot
failures/hangs since they mostly silently hang.  However, on one
platform (Marvell Armada 370 Mirabox), it reports a failure to
allocate memory, and the RCU stall detection kicks in:

[    1.298234] xhci_hcd 0000:02:00.0: xHCI Host Controller
[    1.303485] xhci_hcd 0000:02:00.0: new USB bus registered, assigned
bus number 1
[    1.310966] xhci_hcd 0000:02:00.0: Couldn't initialize memory
[   22.245395] INFO: rcu_sched detected stalls on CPUs/tasks: {}
(detected by 0, t=2102 jiffies, g=-282, c=-283, q=16)
[   22.255886] INFO: Stall ended before state dump start
[   48.095396] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
[swapper/0:1]

Reverting this commit makes them all happy again.

Kevin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists