linux-kernel - Re: BUG: mm, numa: test segfaults, only when NUMA balancing is on

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20131112212902.GA4725@sgi.com>
Date:	Tue, 12 Nov 2013 15:29:02 -0600
From:	Alex Thorlton <athorlton@....com>
To:	Mel Gorman <mgorman@...e.de>
Cc:	Rik van Riel <riel@...hat.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: BUG: mm, numa: test segfaults, only when NUMA balancing is on

On Fri, Nov 08, 2013 at 04:13:29PM -0600, Alex Thorlton wrote:
> On Fri, Nov 08, 2013 at 11:20:54AM +0000, Mel Gorman wrote:
> > On Thu, Nov 07, 2013 at 03:48:38PM -0600, Alex Thorlton wrote:
> > > > Try the following patch on top of 3.12. It's a patch that is expected to
> > > > be merged for 3.13. On its own it'll hurt automatic NUMA balancing in
> > > > -stable but corruption trumps performance and the full series is not
> > > > going to be considered acceptable for -stable
> > > 
> > > I gave this patch a shot, and it didn't seem to solve the problem.
> > > Actually I'm running into what appear to be *worse* problems on the 3.12
> > > kernel.  Here're a couple stack traces of what I get when I run the test
> > > on 3.12, 512 cores:
> > > 
> > 
> > Ok, so there are two issues at least. Whatever is causing your
> > corruption (which I still cannot reproduce) and the fact that 3.12 is
> > worse. The largest machine I've tested with is 40 cores. I'm trying to
> > get time on a 60 core machine to see if has a better chance. I will not
> > be able to get access to anything resembling 512 cores.
> 
> At this point, the smallest machine I've been able to recreate this
> issue on has been a 128 core, but it's rare on a machine that small.
> I'll kick off a really long run on a 64 core over the weekend and see if
> I can hit it on there at all, but I haven't been able to previously.

Just a quick update, I ran this test 500 times on 64 cores, allocating
512m per core, and every single test completed successfully.  At this
point, it looks like you definitely need at least 128 cores to reproduce
the issue.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/