Date:	Thu, 21 May 2009 11:27:12 +0800
From:	"Zhang, Yanmin" <yanmin.zhang@...el.com>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Robin Holt <holt@....com>,
	"Wu, Fengguang" <fengguang.wu@...el.com>
Subject: RE: [PATCH v3] zone_reclaim is always 0 by default

>>-----Original Message-----
>>From: KOSAKI Motohiro [mailto:kosaki.motohiro@...fujitsu.com]
>>Sent: May 21, 2009 10:47
>>To: LKML; linux-mm; Andrew Morton; Rik van Riel; Christoph Lameter; Robin Holt;
>>Zhang, Yanmin; Wu, Fengguang
>>Cc: kosaki.motohiro@...fujitsu.com
>>Subject: [PATCH v3] zone_reclaim is always 0 by default
>>
>>
>>Subject: [PATCH v3] zone_reclaim is always 0 by default
>>
>>The current Linux policy enables zone_reclaim_mode by default if the machine
>>has a large remote node distance, because until recently we could assume that
>>a large distance meant a large server.
>>
>>Unfortunately, recent x86 CPUs (e.g. Core i7, Opteron) have a P2P transport
>>memory controller, i.e. software sees them as NUMA.
>>Some Core i7 machines have a large remote node distance.
>>
>>Yanmin reported that zone_reclaim_mode=1 causes a large apache regression.
>>
>>    One Nehalem machine has 12GB memory,
>>    but there is always 2GB free although applications access lots of files.
>>    Eventually we located the root cause as zone_reclaim_mode=1.
>>
>>Actually, zone_reclaim_mode=1 means "I dislike remote node allocation more
>>than disk access". It improves performance for HPC workloads,
>>but it degrades performance for desktops, file servers and web servers.
>>
>>In general, workload-dependent configuration shouldn't be put into the
>>default settings.
>>Plus, the desktop and file/web server ecosystem is much larger than HPC's.
>>
>>Thus, zone_reclaim == 0 is better by default.
[YM] Thanks. I started a series of tests on 2 Nehalem machines with
zone_reclaim_mode=0 (the default is 1 on both machines). I didn't find any
regression with non-disk-I/O (mostly CPU-bound) benchmarks. Disk I/O benchmarks
could benefit a little from zone_reclaim_mode=0. Because I now run the fio
benchmark with numactl --interleave=all, the fio improvement is not as large as before.

One thing I need to mention is that my non-disk-I/O tests might not be good
examples for this patch, because every node has far more memory than the tests need.
Only some disk I/O benchmarks place heavy demands on page cache memory, so they
could benefit from zone_reclaim_mode=0.
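
For anyone repeating these tests, it helps to confirm which mode a machine
actually booted with. The following is only a minimal user-space sketch, not
part of the patch. It assumes the setting is exposed as a single integer in
/proc/sys/vm/zone_reclaim_mode; the helper names parse_zone_reclaim_mode()
and read_zone_reclaim_mode() are hypothetical.

```c
#include <stdio.h>

/*
 * Parse the text of the sysctl file (e.g. "1\n").
 * Returns the mode value, or -1 if the text is not a non-negative integer.
 */
int parse_zone_reclaim_mode(const char *text)
{
	int mode;

	if (text == NULL || sscanf(text, "%d", &mode) != 1 || mode < 0)
		return -1;
	return mode;
}

/*
 * Read the current mode from the given path, assumed here to be
 * /proc/sys/vm/zone_reclaim_mode on a NUMA-enabled kernel.
 * Returns -1 if the file cannot be opened or parsed.
 */
int read_zone_reclaim_mode(const char *path)
{
	char buf[32];
	int mode = -1;
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) != NULL)
		mode = parse_zone_reclaim_mode(buf);
	fclose(f);
	return mode;
}
```

Calling read_zone_reclaim_mode("/proc/sys/vm/zone_reclaim_mode") before and
after a benchmark run is one way to make sure the results are attributed to
the intended setting.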


>>
>>
>>Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
>>Cc: Christoph Lameter <cl@...ux-foundation.org>
>>Cc: Rik van Riel <riel@...hat.com>
>>Cc: Robin Holt <holt@....com>
>>Tested-by: "Zhang, Yanmin" <yanmin.zhang@...el.com>
>>Acked-by: Wu Fengguang <fengguang.wu@...el.com>
>>---
>> arch/ia64/include/asm/topology.h |    5 -----
>> include/linux/topology.h         |    9 +--------
>> mm/page_alloc.c                  |    7 -------
>> 3 files changed, 1 insertion(+), 20 deletions(-)
>>
>>Index: b/mm/page_alloc.c
>>===================================================================
>>--- a/mm/page_alloc.c
>>+++ b/mm/page_alloc.c
>>@@ -2494,13 +2494,6 @@ static void build_zonelists(pg_data_t *p
>> 		int distance = node_distance(local_node, node);
>>
>> 		/*
>>-		 * If another node is sufficiently far away then it is better
>>-		 * to reclaim pages in a zone before going off node.
>>-		 */
>>-		if (distance > RECLAIM_DISTANCE)
>>-			zone_reclaim_mode = 1;
>>-
>>-		/*
>> 		 * We don't want to pressure a particular node.
>> 		 * So adding penalty to the first node in same
>> 		 * distance group to make it round-robin.
>>Index: b/arch/ia64/include/asm/topology.h
>>===================================================================
>>--- a/arch/ia64/include/asm/topology.h
>>+++ b/arch/ia64/include/asm/topology.h
>>@@ -21,11 +21,6 @@
>> #define PENALTY_FOR_NODE_WITH_CPUS 255
>>
>> /*
>>- * Distance above which we begin to use zone reclaim
>>- */
>>-#define RECLAIM_DISTANCE 15
>>-
>>-/*
>>  * Returns the number of the node containing CPU 'cpu'
>>  */
>> #define cpu_to_node(cpu) (int)(cpu_to_node_map[cpu])
>>Index: b/include/linux/topology.h
>>===================================================================
>>--- a/include/linux/topology.h
>>+++ b/include/linux/topology.h
>>@@ -53,14 +53,7 @@ int arch_update_cpu_topology(void);
>> #ifndef node_distance
>> #define node_distance(from,to)	((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
>> #endif
>>-#ifndef RECLAIM_DISTANCE
>>-/*
>>- * If the distance between nodes in a system is larger than RECLAIM_DISTANCE
>>- * (in whatever arch specific measurement units returned by node_distance())
>>- * then switch on zone reclaim on boot.
>>- */
>>-#define RECLAIM_DISTANCE 20
>>-#endif
>>+
>> #ifndef PENALTY_FOR_NODE_WITH_CPUS
>> #define PENALTY_FOR_NODE_WITH_CPUS	(1)
>> #endif
>>
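
To make the removed heuristic easier to reason about, here is a rough
user-space sketch of the boot-time decision that the hunks above delete. It is
an illustration under assumptions, not kernel code: the function name
old_default_zone_reclaim() and the flat distance matrix are hypothetical
stand-ins for what node_distance() reports, and RECLAIM_DISTANCE uses the
generic value of 20 (ia64 overrode it with 15).

```c
#include <stddef.h>

/* Generic threshold removed by the patch (ia64 used 15 instead). */
#define RECLAIM_DISTANCE 20

/*
 * distances: row-major n x n matrix, a hypothetical stand-in for the
 * values node_distance(from, to) would report.
 * Returns 1 if the old heuristic would have set zone_reclaim_mode = 1,
 * i.e. if any pair of nodes is farther apart than RECLAIM_DISTANCE.
 */
int old_default_zone_reclaim(const int *distances, size_t n)
{
	size_t from, to;

	for (from = 0; from < n; from++) {
		for (to = 0; to < n; to++) {
			if (distances[from * n + to] > RECLAIM_DISTANCE)
				return 1;
		}
	}
	return 0;
}
```

With this sketch, a two-node table whose remote distance is just above the
threshold turns zone reclaim on, which is the situation described for some
Core i7 machines. After the patch the default stays 0 regardless of the
distance table.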
