Open Source and information security mailing list archives (lists.openwall.net)
Date: Thu, 01 Sep 2011 07:41:57 +0200
From: Stefan Priebe - Profihost AG <s.priebe@...fihost.ag>
To: Wu Fengguang <fengguang.wu@...el.com>
CC: Zhu Yanhai <zhu.yanhai@...il.com>, Pekka Enberg <penberg@...nel.org>, LKML <linux-kernel@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>, Andrew Morton <akpm@...ux-foundation.org>, Mel Gorman <mel@....ul.ie>, Jens Axboe <jaxboe@...ionio.com>, Linux Netdev List <netdev@...r.kernel.org>, KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Subject: Re: slow performance on disk/network i/o full speed after drop_caches

Thanks!

On 01.09.2011 06:14, Wu Fengguang wrote:
> Hi Stefan,
>
> On Wed, Aug 31, 2011 at 03:11:02PM +0800, Stefan Priebe - Profihost AG wrote:
>> Hi Fengguang,
>> Hi Yanhai,
>>
>>> you're absolutely correct, zone_reclaim_mode is on - but why?
>>> There must be some Linux software which switches it on.
>>>
>>> ~# grep 'zone_reclaim_mode' /etc/sysctl.* -r -i
>>> ~#
>>>
>>> also
>>> ~# grep 'zone_reclaim_mode' /etc/sysctl.* -r -i
>>> ~#
>>>
>>> tells us nothing.
>>>
>>> I've then read this:
>>>
>>> "zone_reclaim_mode is set during bootup to 1 if it is determined that
>>> pages from remote zones will cause a measurable performance reduction.
>>> The page allocator will then reclaim easily reusable pages (those page
>>> cache pages that are currently not used) before allocating off node pages."
>>>
>>> Why does the kernel do that on these machines?
>>
>> Can nobody explain why the kernel set it to 1 in this case?
>
> It's determined by RECLAIM_DISTANCE.
>
> build_zonelists():
>
>         /*
>          * If another node is sufficiently far away then it is better
>          * to reclaim pages in a zone before going off node.
>          */
>         if (distance > RECLAIM_DISTANCE)
>                 zone_reclaim_mode = 1;
>
> Since Linux v3.0, RECLAIM_DISTANCE has been increased from 20 to 30 by this commit.
> It may well help your case, too.
>
> commit 32e45ff43eaf5c17f5a82c9ad358d515622c2562
> Author: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
> Date:   Wed Jun 15 15:08:20 2011 -0700
>
>     mm: increase RECLAIM_DISTANCE to 30
>
>     Recently, Robert Mueller reported (http://lkml.org/lkml/2010/9/12/236)
>     that zone_reclaim_mode doesn't work properly on his new NUMA server (Dual
>     Xeon E5520 + Intel S5520UR MB). He is using Cyrus IMAPd and it's built on
>     a very traditional single-process model.
>
>       * a master process which reads config files and manages the other
>         processes
>       * multiple imapd processes, one per connection
>       * multiple pop3d processes, one per connection
>       * multiple lmtpd processes, one per connection
>       * periodic "cleanup" processes.
>
>     There are thousands of independent processes. The problem is, recent
>     Intel motherboards turn on zone_reclaim_mode by default and traditional
>     prefork model software doesn't work well on it. Unfortunately, such models
>     are still typical even in the 21st century. We can't ignore them.
>
>     This patch raises the zone_reclaim_mode threshold to 30. 30 doesn't have
>     any specific meaning, but 20 means that one-hop QPI/HyperTransport and
>     such relatively cheap 2-4 socket machines are often used for traditional
>     servers as above. The intention is that these machines don't use
>     zone_reclaim_mode.
>
>     Note: ia64 and Power have arch specific RECLAIM_DISTANCE definitions.
>     This patch doesn't change such high-end NUMA machine behavior.
>
>     Dave Hansen said:
>
>     : I know specifically of pieces of x86 hardware that set the information
>     : in the BIOS to '21' *specifically* so they'll get the zone_reclaim_mode
>     : behavior which that implies.
>     :
>     : They've done performance testing and run very large and scary benchmarks
>     : to make sure that they _want_ this turned on. What this means for them
>     : is that they'll probably be de-optimized, at least on newer versions of
>     : the kernel.
>     :
>     : If you want to do this for particular systems, maybe _that_'s what we
>     : should do. Have a list of specific configurations that need the
>     : defaults overridden either because they're buggy, or they have an
>     : unusual hardware configuration not really reflected in the distance
>     : table.
>
>     And later said:
>
>     : The original change in the hardware tables was for the benefit of a
>     : benchmark. Said benchmark isn't going to get run on mainline until the
>     : next batch of enterprise distros drops, at which point the hardware where
>     : this was done will be irrelevant for the benchmark. I'm sure any new
>     : hardware will just set this distance to another yet arbitrary value to
>     : make the kernel do what it wants. :)
>     :
>     : Also, when the hardware got _set_ to this initially, I complained. So, I
>     : guess I'm getting my way now, with this patch. I'm cool with it.
>
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index b91a40e..fc839bf 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -60,7 +60,7 @@ int arch_update_cpu_topology(void);
>   * (in whatever arch specific measurement units returned by node_distance())
>   * then switch on zone reclaim on boot.
>   */
> -#define RECLAIM_DISTANCE 20
> +#define RECLAIM_DISTANCE 30
>  #endif
>  #ifndef PENALTY_FOR_NODE_WITH_CPUS
>  #define PENALTY_FOR_NODE_WITH_CPUS (1)
>
> Thanks,
> Fengguang
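The boot-time decision quoted from build_zonelists() comes down to one comparison, so it can be replayed in user space to see which side of the old (20) and new (30) threshold a machine falls on. The C sketch below is illustrative only, not kernel code.

- The functions would_enable_zone_reclaim(), parse_distance_row() and read_node_distances() are hypothetical names, not kernel or library APIs.
- The threshold is a parameter because RECLAIM_DISTANCE is 20 before this commit and 30 after it, and is arch-specific on ia64 and Power.
- The sketch assumes the kernel exposes each node's row of the distance table as whitespace-separated integers in /sys/devices/system/node/nodeN/distance.
- It checks a single node's row. The kernel repeats the check while building every node's zonelist, and any one hit sets the global zone_reclaim_mode.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper mirroring the check in build_zonelists():
 * zone reclaim is switched on at boot if any node is farther from the
 * local node than reclaim_distance (strictly greater than).
 * distances[] is one row of the node distance table.
 */
int would_enable_zone_reclaim(const int *distances, int nr_nodes,
                              int reclaim_distance)
{
    for (int node = 0; node < nr_nodes; node++) {
        if (distances[node] > reclaim_distance)
            return 1;
    }
    return 0;
}

/*
 * Hypothetical parser for a whitespace-separated row such as "10 21\n".
 * Stores at most max values in out[] and returns how many were parsed.
 */
int parse_distance_row(const char *line, int *out, int max)
{
    int n = 0;
    char *end;

    while (n < max) {
        long v = strtol(line, &end, 10);
        if (end == line)        /* no further number in the string */
            break;
        out[n++] = (int)v;
        line = end;
    }
    return n;
}

/*
 * Hypothetical reader for /sys/devices/system/node/node<nid>/distance.
 * Returns the number of entries read, or -1 if the file is missing or
 * unreadable (e.g. no such node, or sysfs not mounted).
 */
int read_node_distances(int nid, int *out, int max)
{
    char path[64];
    char buf[512];
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/node/node%d/distance", nid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_distance_row(buf, out, max);
}
```

Consider a row of "10 21", which matches the firmware value of 21 that Dave Hansen describes. For that row, would_enable_zone_reclaim() returns 1 with the old threshold of 20 and 0 with the new threshold of 30. A remote distance of exactly 20 never triggered zone reclaim, because the comparison is strictly greater-than.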