Date: Tue, 19 May 2009 11:38:26 +0800
From: "Zhang, Yanmin" <yanmin.zhang@...el.com>
To: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>, "Wu, Fengguang" <fengguang.wu@...el.com>
CC: LKML <linux-kernel@...r.kernel.org>, linux-mm <linux-mm@...ck.org>, Andrew Morton <akpm@...ux-foundation.org>, Rik van Riel <riel@...hat.com>, Christoph Lameter <cl@...ux-foundation.org>
Subject: RE: [PATCH 4/4] zone_reclaim_mode is always 0 by default

>>-----Original Message-----
>>From: KOSAKI Motohiro [mailto:kosaki.motohiro@...fujitsu.com]
>>Sent: May 19, 2009 10:54
>>To: Wu, Fengguang
>>Cc: kosaki.motohiro@...fujitsu.com; LKML; linux-mm; Andrew Morton; Rik van
>>Riel; Christoph Lameter; Zhang, Yanmin
>>Subject: Re: [PATCH 4/4] zone_reclaim_mode is always 0 by default
>>
>>> On Wed, May 13, 2009 at 12:08:12PM +0900, KOSAKI Motohiro wrote:
>>> > Subject: [PATCH] zone_reclaim_mode is always 0 by default
>>> >
>>> > Current linux policy is, if the machine has large remote node distance,
>>> > zone_reclaim_mode is enabled by default because we've be able to assume
>>
>>ok, I would explain zone reclaim design and performance tendency.
>>
>>Firstly, we can make classification of linux eco system, roughly.
>> - HPC
>> - high-end server
>> - volume server
>> - desktop
>> - embedded
>>
>>it is separated by typical workload mainly.
>>
>>Secondly, zone_reclaim mean "I strongly dislike remote node access than
>>disk access".
>>it is very fitting on HPC workload. it because
>> - HPC workload typically make the number of the same as cpus of processes
>>(or thread).
>>   IOW, the workload typically use memory equally each node.
>> - HPC workload is typically CPU bounded job. CPU migration is rare.
>> - HPC workload is typically long lived. (possible >1 year)
>>   IOW, remote node allocation makes _very_ _very_ much remote node access.
>>
>>but zone_reclaim don't fit typical server workload.
>> - server workload often make thread pool and some thread is sleeping until
>>   a request received.
>>   IOW, when thread waking-up, the thread might move another cpu.
>>   node distance tendency don't make sense on weak cpu locality workload.
>>
>>Plus, disk-cache is the file-server's identity. we shouldn't think it's not
>>important.
>>Plus, DB software can consume almost system memory and (In general) RDB data
>>makes
>>harder to split equally as hpc.
>>
>>desktop workload is special. desktop people can run various workload beyond
>>our assumption. So, we shouldn't have any workload assumption to desktop
>>people.
>>However, AFAIK almost desktop software use memory as UMA.
>>
>>we don't need to care embedded. it is typically UMA.
>>
>>
>>IOW, the benefit of zone reclaim depend on "strong cpu locality" and
>>"workload is cpu bounded" and "thread is long lived".
>>but many workload don't fill above requirement. IOW, zone reclaim is
>>workload depended feature (as Wu said).
>>
>>
>>In general, the feature of workload depended don't fit default option.
>>we can't know end-user run what workload anyway.
>>
>>Fortunately (or Unfortunately), typical workload and machine size had
>>significant mutuality.
>>Thus, the current default setting calculation had worked well in past days.
[YM] Your analysis is clear and deep.

>>
>>Now, it was broken. What should we do?
>>Yanmin, We know 99% linux people use intel cpu and you are one of
>>most hard repeated testing guy in lkml and you have much test.
[YM] It's very easy to reproduce them on my machines. :) Sometimes that is
because the issues only exist on machines with lots of CPUs, while other
community developers have no such environments.

>>May I ask your tested machine and benchmark?
[YM] Usually I start lots of benchmark testing against the latest kernel, but
this issue was first reported by a customer. The customer runs apache on
Nehalem machines to access lots of files, so the issue is a file server
example.

BTW, I found that many fio test cases have a big drop after I upgraded the BIOS
of one Nehalem machine.
By checking vmstat data, I found that almost half of the memory is always free.
It's also related to zone_reclaim_mode, because the new BIOS changes the node
distance to a large value. I use numactl --interleave=all to work around the
problem temporarily.

I have no HPC environment.

>>
>>if zone_reclaim=0 tendency workload is much than zone_reclaim=1 tendency
>>workload,
>> we can drop our afraid and we would prioritize your opinion, of course.
So it seems only file servers have the issue currently.

Yanmin
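
For anyone who wants to check whether a machine falls into the case described above (a large remote node distance turning zone reclaim on), here is a minimal Python sketch. It is not the kernel's logic. The helper names, the `threshold` parameter, and the "strictly greater than" comparison are all assumptions made for illustration; the thread does not state the actual cutoff. The only real paths used are `/sys/devices/system/node/nodeN/distance` and `/proc/sys/vm/zone_reclaim_mode`.

```python
from pathlib import Path


def parse_distances(text):
    """Parse one nodeN/distance line, e.g. "10 21", into a list of ints."""
    return [int(tok) for tok in text.split()]


def max_remote_distance(matrix):
    """Largest distance between two different nodes (0 if only one node)."""
    return max(
        (d for i, row in enumerate(matrix) for j, d in enumerate(row) if i != j),
        default=0,
    )


def reclaim_on_by_default(matrix, threshold):
    """Hypothetical policy check: True if any remote distance exceeds
    `threshold`. Both the threshold and the comparison are assumptions."""
    return max_remote_distance(matrix) > threshold


def read_system_matrix(root="/sys/devices/system/node"):
    """Read the node distance matrix from sysfs on a NUMA-aware Linux box."""
    nodes = sorted(Path(root).glob("node[0-9]*"), key=lambda p: int(p.name[4:]))
    return [parse_distances((n / "distance").read_text()) for n in nodes]


if __name__ == "__main__":
    mode = Path("/proc/sys/vm/zone_reclaim_mode")
    if mode.exists():
        print("zone_reclaim_mode:", mode.read_text().strip())
    matrix = read_system_matrix()
    print("node distances:", matrix)
    print("max remote distance:", max_remote_distance(matrix))
```

Comparing the printed maximum remote distance before and after a BIOS upgrade would show the kind of change described above, where the new BIOS reports a larger node distance.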