Message-ID: <45E8A677.7000205@redhat.com>
Date:	Fri, 02 Mar 2007 17:34:31 -0500
From:	Rik van Riel <riel@...hat.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
CC:	Bill Irwin <bill.irwin@...cle.com>,
	Christoph Lameter <clameter@...r.sgi.com>,
	Mel Gorman <mel@...net.ie>, npiggin@...e.de, mingo@...e.hu,
	jschopp@...tin.ibm.com, arjan@...radead.org,
	torvalds@...ux-foundation.org, mbligh@...igh.org,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: The performance and behaviour of the anti-fragmentation related patches

Andrew Morton wrote:
> On Fri, 02 Mar 2007 17:03:10 -0500
> Rik van Riel <riel@...hat.com> wrote:
> 
>> Andrew Morton wrote:
>>> On Fri, 02 Mar 2007 16:19:19 -0500
>>> Rik van Riel <riel@...hat.com> wrote:
>>>> Bill Irwin wrote:
>>>>> On Fri, Mar 02, 2007 at 01:23:28PM -0500, Rik van Riel wrote:
>>>>>> With 32 CPUs diving into the page reclaim simultaneously,
>>>>>> each trying to scan a fraction of memory, this is disastrous
>>>>>> for performance.  A 256GB system should be even worse.
>>>>> Thundering herds of a sort pounding the LRU locks from direct reclaim
>>>>> have set off the NMI oopser for users here.
>>>> Ditto here.
>>> Opterons?
>> It's happened on IA64, too.  Probably on Intel x86-64 as well.
> 
> Opterons seem to be particularly prone to lock starvation, where a cacheline
> gets captured in a single package forever.
> 
>>>> The main reason they end up pounding the LRU locks is the
>>>> swappiness heuristic.  They scan too much before deciding
>>>> that it would be a good idea to actually swap something
>>>> out, and with 32 CPUs doing such scanning simultaneously...
>>> What kernel version?
>> Customers are on the 2.6.9-based RHEL4 kernel, but I believe
>> we have reproduced the problem on 2.6.18 too during stress
>> tests.
> 
> The prev_priority fixes were post-2.6.18

We tested them.  They alleviate the problem only slightly in
good cases; things still fall apart badly with less friendly
workloads.
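
[ For context, the heuristic under discussion looked roughly like the
  sketch below in 2.6-era mm/vmscan.c.  This is a simplified
  reconstruction from memory, not verbatim kernel code; exact
  constants and names may differ.  prev_priority feeds the "distress"
  term, so until some reclaimer has already driven the priority down
  hard, swap_tendency stays below 100 and everyone just keeps
  scanning. ]

	/*
	 * Sketch of the 2.6-era reclaim heuristic (simplified, not
	 * verbatim).  priority starts at 12 and drops as reclaim gets
	 * more desperate; mapped_ratio is the percentage of pages
	 * mapped into page tables.
	 */
	#include <stdio.h>

	static int should_reclaim_mapped(int prev_priority, int priority,
					 int mapped_ratio, int swappiness)
	{
		int p = prev_priority < priority ? prev_priority : priority;

		/* distress: 0 at the default priority (12), 100 at priority 0 */
		int distress = 100 >> p;

		/*
		 * Mapped pages are only reclaimed once the combined
		 * pressure crosses 100; below that, reclaimers scan
		 * past them.
		 */
		int swap_tendency = mapped_ratio / 2 + distress + swappiness;

		return swap_tendency >= 100;
	}

	int main(void)
	{
		/* 40% of pages mapped, default vm_swappiness of 60 */
		for (int priority = 12; priority >= 0; priority--)
			printf("priority %2d: reclaim_mapped=%d\n", priority,
			       should_reclaim_mapped(12, priority, 40, 60));
		return 0;
	}

[ With those inputs, reclaim_mapped only flips to 1 at priority 2,
  i.e. after many scan passes.  The prev_priority fixes let later
  reclaimers inherit the low priority (high distress) instead of
  restarting at 12, which is why they help at all; they do nothing
  about 32 CPUs making those passes simultaneously. ]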

>> I have no reason to believe we should stick our heads in the
>> sand and pretend it no longer exists on 2.6.21.
> 
> I have no reason to believe anything.  All I see is handwaviness,
> speculation and grand plans to rewrite vast amounts of stuff without even a
> testcase to demonstrate that said rewrite improved anything.

Your attitude is exactly why the VM keeps falling apart over
and over again.

Fixing "a testcase" in the VM tends to introduce problems for
other testcases, ad infinitum.  There's a reason we end up
fixing the same bugs over and over again.

I have been looking through a few hundred VM-related bugzillas
and found that the same bugs persist across many different
versions of Linux: sometimes they get fixed temporarily, but
they always seem to come back eventually...

> None of this is going anywhere, is it?

I will test my changes before I send them to you, but I cannot
promise that you'll have the computers or software needed to
reproduce the problems.  I doubt I'll have full-time access to
such systems myself, either.

32GB is pretty much the minimum memory size needed to reproduce
some of these problems; some workloads may need even larger
systems to trigger them easily.
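
[ Illustration: the thundering-herd pattern itself is visible without
  big iron.  The user-space model below is only an analogy (the
  thread count, the fake scan work, and the lru_lock name are
  illustrative): every thread must take one global spinlock to scan
  its share, the way CPUs in direct reclaim serialize on a zone's
  lru_lock. ]

	/*
	 * User-space model of direct-reclaim lock contention.  A fixed
	 * amount of scanning is split across NTHREADS "CPUs", but every
	 * batch still goes through one lock.  Build: cc -O2 -pthread
	 */
	#include <pthread.h>
	#include <stdio.h>

	#define NTHREADS      32	/* compare runs with 1, 4, 32 */
	#define TOTAL_BATCHES 3200000

	static pthread_spinlock_t lru_lock;
	static long batches_scanned;

	static void *reclaimer(void *arg)
	{
		(void)arg;
		for (int i = 0; i < TOTAL_BATCHES / NTHREADS; i++) {
			pthread_spin_lock(&lru_lock);
			/* simulate isolating a batch of pages off the LRU */
			for (volatile int j = 0; j < 100; j++)
				;
			batches_scanned++;	/* protected by lru_lock */
			pthread_spin_unlock(&lru_lock);
		}
		return NULL;
	}

	int main(void)
	{
		pthread_t tids[NTHREADS];

		pthread_spin_init(&lru_lock, PTHREAD_PROCESS_PRIVATE);
		for (int i = 0; i < NTHREADS; i++)
			pthread_create(&tids[i], NULL, reclaimer, NULL);
		for (int i = 0; i < NTHREADS; i++)
			pthread_join(tids[i], NULL);
		printf("scanned %ld batches\n", batches_scanned);
		return 0;
	}

[ Because the lock serializes everything, adding threads cannot make
  the scan finish sooner; it only adds cacheline ping-pong between
  packages, so wall-clock time under time(1) gets worse as NTHREADS
  grows.  With unfair lock handoff (the cacheline capture Andrew
  describes above), some threads can be starved outright. ]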

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.
