Date:	Sat, 22 Oct 2011 14:21:23 +0800
From:	Nai Xia <nai.xia@...il.com>
To:	Paweł Sikora <pluto@...k.net>
Cc:	Hugh Dickins <hughd@...gle.com>, arekm@...-linux.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-mm@...ck.org, Mel Gorman <mgorman@...e.de>,
	jpiszcz@...idpixels.com, linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: kernel 3.0: BUG: soft lockup: find_get_pages+0x51/0x110

On Saturday 22 October 2011 05:36:46 Paweł Sikora wrote:
> On Friday 21 of October 2011 11:07:56 Nai Xia wrote:
> > On Fri, Oct 21, 2011 at 4:07 PM, Pawel Sikora <pluto@...k.net> wrote:
> > > On Friday 21 of October 2011 14:22:37 Nai Xia wrote:
> > >
> > >> And as a side note. Since I notice that Pawel's workload may include OOM,
> > >
> > > my last tests on patched (3.0.4 + migrate.c fix + vserver) kernel produce full cpu load
> > > on dual 8-cores opterons like on this htop screenshot -> http://pluto.agmk.net/kernel/screen1.png
> > > afaics all userspace applications usually don't use more than half of physical memory
> > > and so called "cache" on htop bar doesn't reach the 100%.
> > 
> > OK, did you log any OOM killing if there was some memory usage burst?
> > But, well, my OOM reasoning above is a direct shortcut to an imagined
> > root cause: the "adjacent VMAs which
> > should have been merged but in fact were not merged" case.
> > Maybe there are other cases that can lead to this or maybe it's
> > totally another bug....
> 
> i don't see any OOM killing with my conservative settings
> (vm.overcommit_memory=2, vm.overcommit_ratio=100).

OK, that does not matter now. Andrea showed us a simpler way to hit
this bug.

> 
> > But still I think if my reasoning is good, similar bad things will
> > happen again some time in the future,
> > even if it was not your case here...
> > 
> > >
> > > the patched kernel with disabled CONFIG_TRANSPARENT_HUGEPAGE (new thing in 2.6.38)
> > > died at night, so now i'm going to disable also CONFIG_COMPACTION/MIGRATION in next
> > > steps and stress this machine again...
> > 
> > OK, it's smart to narrow down the range first....
> 
> disabling hugepage/compacting didn't help but disabling hugepage/compacting/migration keeps
> opterons stable for ~9h so far. userspace uses ~40GB (from 64) ram, caches reach 100% on htop bar,
> average load ~16. i wonder if it survives the weekend...
> 

Maybe you should give Andrea's latest anon_vma_order_tail() patch another shot. :)
