Date:	Thu, 14 May 2009 06:48:27 -0500
From:	Robin Holt <holt@....com>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc:	Rik van Riel <riel@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Robin Holt <holt@....com>
Subject: Re: [PATCH 4/4] zone_reclaim_mode is always 0 by default

> Unfortunately no.
> Zone reclaim has two weaknesses by design.
> 
> 1.
> Zone reclaim doesn't work well when the working-set size exceeds the
> local node size, which can easily happen on a small machine. When it
> does, zone reclaim drops the process's own memory.
> 
> Zone reclaim is also a poor fit for DB servers, whose processes have
> large working sets.

A large DB server is not your typical desktop application either.

> 2.
> Zone reclaim has an inter-zone balancing issue.
> 
> Example: an x86_64 2-node 8GB machine has the following zone assignment:
> 
>    node 0 (DMA32):  3GB
>    node 0 (Normal): 1GB
>    node 1 (Normal): 4GB
> 
> If a page is allocated from DMA32, you are lucky: DMA32 isn't reclaimed
> very often. But if it comes from node 0's Normal zone, you are unlucky:
> that zone is reclaimed very frequently because it is much smaller than
> the other zones.

I have seen that behavior on some of our mismatched large systems as well,
although never one this imbalanced, because ia64 only has a Normal zone.

> I know my patch changes the default for large servers, but I believe the
> Linux default kernel parameters should be tuned for desktop and entry-level
> machines.

If this imbalance is an x86_64-only problem, then we could do something
simple like the following untested patch, which leaves the default
unchanged for every architecture except x86_64.

Robin

------------------------------------------------------------------------

On x86_64, disable zone reclaim by default even when the node distance is
large.  This handles imbalanced zone sizes, where most of the memory on
node 0 is in DMA32 and the small remaining Normal zone would be reclaimed
aggressively.

Other architectures keep the existing default behavior.

Signed-off-by: Robin Holt <holt@....com>
Cc: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc: Christoph Lameter <cl@...ux-foundation.org>
Cc: Rik van Riel <riel@...hat.com>

---
 arch/x86/include/asm/topology.h |    2 ++
 include/linux/topology.h        |    5 +++++
 mm/page_alloc.c                 |    2 +-
 3 files changed, 8 insertions(+), 1 deletion(-)
Index: page_reclaim_mode/arch/x86/include/asm/topology.h
===================================================================
--- page_reclaim_mode.orig/arch/x86/include/asm/topology.h	2009-05-14 06:44:20.118925713 -0500
+++ page_reclaim_mode/arch/x86/include/asm/topology.h	2009-05-14 06:44:21.251067716 -0500
@@ -128,6 +128,8 @@ extern unsigned long node_remap_size[];
 
 #endif
 
+#define DEFAULT_ZONE_RECLAIM_MODE	0
+
 /* sched_domains SD_NODE_INIT for NUMA machines */
 #define SD_NODE_INIT (struct sched_domain) {		\
 	.min_interval		= 8,			\
Index: page_reclaim_mode/include/linux/topology.h
===================================================================
--- page_reclaim_mode.orig/include/linux/topology.h	2009-05-14 06:44:20.070919619 -0500
+++ page_reclaim_mode/include/linux/topology.h	2009-05-14 06:44:21.279071382 -0500
@@ -61,6 +61,11 @@ int arch_update_cpu_topology(void);
  */
 #define RECLAIM_DISTANCE 20
 #endif
+
+#ifndef DEFAULT_ZONE_RECLAIM_MODE
+#define DEFAULT_ZONE_RECLAIM_MODE	1
+#endif
+
 #ifndef PENALTY_FOR_NODE_WITH_CPUS
 #define PENALTY_FOR_NODE_WITH_CPUS	(1)
 #endif
Index: page_reclaim_mode/mm/page_alloc.c
===================================================================
--- page_reclaim_mode.orig/mm/page_alloc.c	2009-05-14 06:44:20.138928363 -0500
+++ page_reclaim_mode/mm/page_alloc.c	2009-05-14 06:44:21.311075244 -0500
@@ -2331,7 +2331,7 @@ static void build_zonelists(pg_data_t *p
 		 * to reclaim pages in a zone before going off node.
 		 */
 		if (distance > RECLAIM_DISTANCE)
-			zone_reclaim_mode = 1;
+			zone_reclaim_mode = DEFAULT_ZONE_RECLAIM_MODE;
 
 		/*
 		 * We don't want to pressure a particular node.