linux-kernel - Re: [PATCH 4/4] zone_reclaim

Date:	Tue, 19 May 2009 13:30:40 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	"Zhang, Yanmin" <yanmin.zhang@...el.com>
Cc:	kosaki.motohiro@...fujitsu.com,
	"Wu, Fengguang" <fengguang.wu@...el.com>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>,
	Christoph Lameter <cl@...ux-foundation.org>
Subject: Re: [PATCH 4/4] zone_reclaim_mode is always 0 by default

> >>-----Original Message-----
> >>From: KOSAKI Motohiro [mailto:kosaki.motohiro@...fujitsu.com]
> >>Sent: 2009-05-19 10:54
> >>To: Wu, Fengguang
> >>Cc: kosaki.motohiro@...fujitsu.com; LKML; linux-mm; Andrew Morton; Rik van
> >>Riel; Christoph Lameter; Zhang, Yanmin
> >>Subject: Re: [PATCH 4/4] zone_reclaim_mode is always 0 by default
> >>
> >>> On Wed, May 13, 2009 at 12:08:12PM +0900, KOSAKI Motohiro wrote:
> >>> > Subject: [PATCH] zone_reclaim_mode is always 0 by default
> >>> >
> >>> > Current linux policy is, if the machine has large remote node distance,
> >>> >  zone_reclaim_mode is enabled by default because we've be able to assume
> 
> >>
> >>OK, let me explain the zone reclaim design and its performance tendencies.
> >>
> >>First, we can roughly classify the Linux ecosystem:
> >> - HPC
> >> - high-end server
> >> - volume server
> >> - desktop
> >> - embedded
> >>
> >>The classes are separated mainly by typical workload.
> >>
> >>Second, zone_reclaim means "I dislike remote node access much more than
> >>disk access".
> >>That fits HPC workloads very well, because:
> >>  - An HPC workload typically runs as many processes (or threads) as there
> >>    are CPUs.
> >>    IOW, the workload typically uses memory equally on each node.
> >>  - An HPC workload is typically a CPU-bound job. CPU migration is rare.
> >>  - An HPC workload is typically long-lived (possibly >1 year).
> >>    IOW, a remote node allocation leads to _very_ _very_ many remote node
> >>    accesses.
> >>
> >>But zone_reclaim doesn't fit typical server workloads.
> >>  - A server workload often creates a thread pool, and some threads sleep
> >>    until a request is received.
> >>    IOW, when a thread wakes up, it might move to another CPU.
> >>    Node distance means little for a workload with weak CPU locality.
> >>
> >>Plus, the disk cache is the essence of a file server. We shouldn't treat
> >>it as unimportant.
> >>Plus, DB software can consume almost all system memory, and (in general)
> >>RDB data is harder to split equally across nodes than HPC data.
> >>
> >>The desktop workload is special. Desktop people can run various workloads
> >>beyond our assumptions, so we shouldn't assume any particular workload for
> >>desktop people.
> >>However, AFAIK almost all desktop software uses memory as UMA.
> >>
> >>We don't need to care about embedded. It is typically UMA.
> >>
> >>
> >>IOW, the benefit of zone reclaim depends on "strong CPU locality",
> >>"the workload is CPU-bound", and "threads are long-lived".
> >>But many workloads don't meet these requirements. IOW, zone reclaim is
> >>a workload-dependent feature (as Wu said).
> >>
> >>
> >>In general, a workload-dependent feature doesn't fit as a default option.
> >>We can't know what workload an end user runs anyway.
> >>
> >>Fortunately (or unfortunately), typical workload and machine size have been
> >>strongly correlated.
> >>Thus, the current default setting calculation worked well in the past.
> [YM] Your analysis is clear and deep.

Thanks!


> >>Now it is broken. What should we do?
> >>Yanmin, we know 99% of Linux people use Intel CPUs, and you are one of
> >>the most persistent testers on lkml and have run many tests.
> [YM] It's very easy to reproduce them on my machines. :) Sometimes that is
> because the issues only exist on machines with lots of CPUs, and other
> community developers have no such environments.
>
> >>May I ask which machines and benchmarks you tested?
> [YM] Usually I run lots of benchmarks against the latest kernel, but this
> issue was first reported by a customer. The customer runs Apache on Nehalem
> machines to access lots of files, so the issue is an example of a file
> server workload.

Hmmm.
I'm surprised by this report. I didn't know about this problem.

Actually, I don't think Apache is only a file server.
Apache is one of the killer applications on Linux, and it runs in a very
wide range of organizations.
Do you think large machines don't run Apache? I don't think so.



> BTW, I found that many fio test cases show a big drop after I upgraded the
> BIOS of one Nehalem machine. Checking the vmstat data, I found that almost
> half of the memory is always free. That is also related to zone_reclaim_mode,
> because the new BIOS changes the node distance to a large value. I use
> numactl --interleave=all to work around the problem temporarily.
> 
> I have no HPC environment.

Yeah, that's OK. Christoph and I have one. My worry is that workloads unknown
to me might regress.
So, may I assume you ran your benchmarks with zone reclaim both 0 and 1, and
that you haven't seen a regression with zone reclaim disabled?
If so, that encourages me very much.
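
For anyone repeating that comparison, here is a minimal sketch (Python, not
part of any patch in this thread) of reading the current setting and decoding
its bits. The bit values follow the vm.zone_reclaim_mode sysctl documentation;
the function names and the text labels are only illustrative assumptions.

```python
from pathlib import Path

# Bit values as described for the vm.zone_reclaim_mode sysctl.
# The strings are illustrative labels, not kernel identifiers.
ZONE_RECLAIM_BITS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def decode_zone_reclaim_mode(value):
    """Return the labels of all bits set in value (empty list for 0)."""
    return [name for bit, name in sorted(ZONE_RECLAIM_BITS.items()) if value & bit]


def read_zone_reclaim_mode(path="/proc/sys/vm/zone_reclaim_mode"):
    """Return the current mode as an int, or None if the file is absent."""
    sysctl = Path(path)
    if not sysctl.exists():
        return None  # e.g. a kernel built without NUMA support
    return int(sysctl.read_text().strip())


if __name__ == "__main__":
    mode = read_zone_reclaim_mode()
    if mode is None:
        print("zone_reclaim_mode is not available on this system")
    else:
        print(mode, decode_zone_reclaim_mode(mode))
```

Writing 0 or 1 to the same file as root (for example with
"sysctl -w vm.zone_reclaim_mode=0") switches between the two configurations
before each benchmark run.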

If disabling zone reclaim mode causes no regression, I'll push again to
remove the zone reclaim mode default completely.
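
To be clear about which default I mean: as the patch description says, zone
reclaim is enabled by default when the machine has a large remote node
distance. A rough sketch of that heuristic follows, in Python for illustration
only. The threshold of 20 and all function names are assumptions made for the
example, and each distance row is expected in the format of
/sys/devices/system/node/nodeN/distance (space-separated integers).

```python
from pathlib import Path


def parse_distance_row(text):
    """Parse one sysfs distance row such as '10 21' into a list of ints."""
    return [int(field) for field in text.split()]


def default_zone_reclaim_mode(distance_rows, reclaim_distance=20):
    """Return 1 if any node distance exceeds the threshold, else 0.

    reclaim_distance=20 is an assumed threshold for this sketch; pass the
    value that matches the kernel you are looking at.
    """
    for row in distance_rows:
        if any(distance > reclaim_distance for distance in row):
            return 1
    return 0


def read_distance_rows(root="/sys/devices/system/node"):
    """Collect the distance rows of all NUMA nodes found under root."""
    rows = []
    for path in sorted(Path(root).glob("node[0-9]*/distance")):
        rows.append(parse_distance_row(path.read_text()))
    return rows


if __name__ == "__main__":
    rows = read_distance_rows()
    print("distance rows:", rows)
    print("heuristic default:", default_zone_reclaim_mode(rows))
```

This also shows why a BIOS update that only changes the reported node
distances can flip the default without any change in the workload.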


> >>If workloads that favor zone_reclaim=0 outnumber workloads that favor
> >>zone_reclaim=1, we can drop our fears and prioritize your opinion, of
> >>course.
> So it seems only file servers have the issue currently.




