Message-ID: <20090520140045.GA29447@sgi.com>
Date: Wed, 20 May 2009 09:00:45 -0500
From: Robin Holt <holt@....com>
To: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc: Robin Holt <holt@....com>,
Christoph Lameter <cl@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...hat.com>
Subject: Re: [PATCH 4/4] zone_reclaim_mode is always 0 by default
On Tue, May 19, 2009 at 11:53:44AM +0900, KOSAKI Motohiro wrote:
> Hi
>
> > > The current Linux policy is that zone_reclaim_mode is enabled by default
> > > if the machine has a large remote node distance, because until recently
> > > we could assume that a large distance meant a large server.
> > >
> > > Unfortunately, recent x86 CPUs (e.g. Core i7, Opteron) have on-chip memory
> > > controllers with point-to-point transport (QPI/HyperTransport). IOW, they
> > > look like NUMA machines from the software's point of view.
> > >
> > > Some Core i7 machines have a large remote node distance, but zone_reclaim
> > > does not fit desktops or small file servers; it causes performance
> > > regressions.
> > >
> > > Thus, zone_reclaim == 0 is the better default if the machine is small.
> >
> > What if I had a node 0 with 32GB or even 128GB of memory? In the 128GB
> > case, we would have 3GB for DMA32 and 125GB for Normal, and then a node 1
> > with 128GB. I would suggest that zone reclaim would perform normally and
> > be beneficial there.
> >
> > You are unfairly classifying this as a machine-size problem when it is
> > really a problem with the underlying zone reclaim code being triggered by
> > imbalanced nodes/zones, partly because a single node has multiple zones
> > and those multiple zones set up the conditions for extremely aggressive
> > reclaim. In other words, you are putting a bandage in place to hide a
> > problem on your particular hardware.
> >
> > Can RECLAIM_DISTANCE be adjusted so your Core i7 boxes are no longer
> > caught? Aren't 4-node Core i7 boxes soon to be readily available? How are
> > your apps different from mine such that you are not impacted by node
> > locality? Are you being too insensitive to node locality? Conversely, am
> > I being too sensitive?
> >
> > All that said, I would not stop this from going in. I just think the
> > selection criterion is rather arbitrary. I think we know the condition we
> > are trying to avoid: a small Normal zone on one node and a larger Normal
> > zone on another causing zone reclaim to be overly aggressive. I don't
> > know how to quantify "small" versus "large", but I would suggest that a
> > node 0 with 16 or more GB should have zone reclaim on by default as well.
> > Can that be expressed in the selection criteria?
>
> I posted my opinion in another mail. Please see it.
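For reference, the distance-based default being debated comes from
build_zonelists() in mm/page_alloc.c. A paraphrased sketch of the logic
as of roughly 2.6.29 (from memory, not a verbatim copy of any tree):

    /*
     * While assembling a node's zonelist, any remote node further away
     * than RECLAIM_DISTANCE flips zone reclaim on globally.  Distance
     * is the entire heuristic; node and zone sizes never enter into it.
     */
    static void build_zonelists(pg_data_t *pgdat)
    {
            int node, local_node = pgdat->node_id;
            nodemask_t used_mask = NODE_MASK_NONE;

            while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
                    if (node_distance(local_node, node) > RECLAIM_DISTANCE)
                            zone_reclaim_mode = 1;
                    /* ... append the node's zones to the zonelist ... */
            }
    }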
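As for adjusting RECLAIM_DISTANCE: it is a compile-time constant with a
generic fallback that an architecture can override in its asm/topology.h
(ia64 already does), so there is no per-machine runtime knob for it today:

    /* include/linux/topology.h: generic fallback, overridable per arch */
    #ifndef RECLAIM_DISTANCE
    #define RECLAIM_DISTANCE 20
    #endif

The resulting mode, of course, can always be inspected and overridden at
runtime on an affected box:

    # cat /proc/sys/vm/zone_reclaim_mode
    1
    # echo 0 > /proc/sys/vm/zone_reclaim_mode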
I don't think you addressed my actual question. How much of this is
a result of having a node where 1/4 of the memory is in the 'Normal'
zone and 3/4 is in the 'DMA32' zone? How much is due to the imbalance
between node 0's 'Normal' and node 1's 'Normal'? Shouldn't that type of
sanity check be used for turning on zone reclaim, instead of some
arbitrary number of nodes? Even with 128 nodes and 256 CPUs, I _NEVER_
see the system swapping out before allocating off-node, so I certainly
cannot reproduce the situation you are seeing.
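To make the sanity-check idea concrete, here is a rough, purely
illustrative sketch of what I mean; zone_sizes_balanced() and the 4x
ratio are made up for discussion, not existing kernel code:

    /*
     * Hypothetical heuristic: only default zone reclaim on when the
     * online nodes hold reasonably balanced amounts of memory, rather
     * than keying on node distance or some number of nodes.
     */
    static int zone_sizes_balanced(void)
    {
            unsigned long min_pages = ULONG_MAX, max_pages = 0;
            int nid;

            for_each_online_node(nid) {
                    unsigned long pages = node_present_pages(nid);

                    min_pages = min(min_pages, pages);
                    max_pages = max(max_pages, pages);
            }
            /* Balanced if no node is more than 4x larger than another. */
            return max_pages <= 4 * min_pages;
    }

The same walk could compare per-zone spans (Normal vs. DMA32 within
node 0) instead of whole nodes; the point is that the trigger would look
at sizes, not distances.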
The imbalance I have seen was when I had two small-memory nodes and two
large-memory nodes and then oversubscribed memory. In that situation, I
noticed that the apps on the small-memory nodes were more frequently
impacted. That unfairness made sense to me and seemed perfectly
reasonable.
Thanks,
Robin