Message-ID: <20131112212902.GA4725@sgi.com>
Date:	Tue, 12 Nov 2013 15:29:02 -0600
From:	Alex Thorlton <athorlton@....com>
To:	Mel Gorman <mgorman@...e.de>
Cc:	Rik van Riel <riel@...hat.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: BUG: mm, numa: test segfaults, only when NUMA balancing is on

On Fri, Nov 08, 2013 at 04:13:29PM -0600, Alex Thorlton wrote:
> On Fri, Nov 08, 2013 at 11:20:54AM +0000, Mel Gorman wrote:
> > On Thu, Nov 07, 2013 at 03:48:38PM -0600, Alex Thorlton wrote:
> > > > Try the following patch on top of 3.12. It's a patch that is expected to
> > > > be merged for 3.13. On its own it'll hurt automatic NUMA balancing in
> > > > -stable, but corruption trumps performance, and the full series is not
> > > > going to be considered acceptable for -stable.
> > > 
> > > I gave this patch a shot, and it didn't seem to solve the problem.
> > > Actually, I'm running into what appear to be *worse* problems on the
> > > 3.12 kernel.  Here are a couple of stack traces of what I get when I
> > > run the test on 3.12 with 512 cores:
> > > 
> > 
> > Ok, so there are at least two issues: whatever is causing your
> > corruption (which I still cannot reproduce) and the fact that 3.12 is
> > worse. The largest machine I've tested with is 40 cores. I'm trying to
> > get time on a 60 core machine to see if it has a better chance. I will
> > not be able to get access to anything resembling 512 cores.
> 
> At this point, the smallest machine I've been able to recreate this
> issue on has been a 128-core system, but it's rare on a machine that
> small.  I'll kick off a really long run on a 64-core machine over the
> weekend and see if I can hit it on there at all, but I haven't been
> able to previously.

Just a quick update: I ran this test 500 times on 64 cores, allocating
512m per core, and every single run completed successfully.  At this
point, it looks like at least 128 cores are needed to reproduce the
issue.
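
For anyone wanting to try this on other hardware: the sketch below is
only a guess at the shape of the workload (the actual test isn't posted
in this thread), assuming one thread per online CPU, each writing and
re-reading a 512m anonymous mapping so that NUMA hinting faults and
page migrations get exercised while balancing is enabled.

/*
 * Hypothetical reproducer sketch -- NOT the actual test from this
 * thread.  One thread per online CPU mmaps 512m of anonymous memory,
 * then repeatedly writes and verifies every page; with
 * kernel.numa_balancing enabled this keeps NUMA hinting faults and
 * page migrations busy.  Build with: gcc -O2 -pthread repro.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define PER_CPU_BYTES	(512UL << 20)	/* 512m per core, as in the runs above */
#define PASSES		16		/* sweep the region repeatedly */

static void *worker(void *arg)
{
	long page = sysconf(_SC_PAGESIZE);
	char *buf = mmap(NULL, PER_CPU_BYTES, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	size_t off;
	int pass;

	(void)arg;
	if (buf == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	for (pass = 0; pass < PASSES; pass++) {
		/* Write every page, then read it back; a segfault or a
		 * mismatch here is the kind of failure being chased. */
		for (off = 0; off < PER_CPU_BYTES; off += page)
			buf[off] = (char)pass;
		for (off = 0; off < PER_CPU_BYTES; off += page)
			if (buf[off] != (char)pass) {
				fprintf(stderr, "mismatch at offset %zu\n", off);
				exit(1);
			}
	}
	munmap(buf, PER_CPU_BYTES);
	return NULL;
}

int main(void)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
	pthread_t *tids = calloc(ncpus, sizeof(*tids));
	long i;

	for (i = 0; i < ncpus; i++)
		pthread_create(&tids[i], NULL, worker, NULL);
	for (i = 0; i < ncpus; i++)
		pthread_join(tids[i], NULL);
	printf("all %ld workers completed\n", ncpus);
	return 0;
}

Toggling the balancer via /proc/sys/kernel/numa_balancing (0 = off,
1 = on, on kernels built with CONFIG_NUMA_BALANCING) should make it
easy to confirm that the failure only shows up with balancing enabled.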

- Alex
