linux-kernel - Re: Default zone_reclaim_mode = 1 on NUMA kernel is bad for file/email/web servers

Message-ID: <52C8765522A740A4A5C027E8FDFFDFE3@jem>
Date:	Tue, 21 Sep 2010 09:41:21 +1000
From:	"Rob Mueller" <robm@...tmail.fm>
To:	"Mel Gorman" <mel@....ul.ie>,
	"KOSAKI Motohiro" <kosaki.motohiro@...fujitsu.com>
Cc:	<linux-kernel@...r.kernel.org>,
	"Bron Gondwana" <brong@...tmail.fm>,
	"linux-mm" <linux-mm@...ck.org>,
	"Christoph Lameter" <cl@...ux-foundation.org>
Subject: Re: Default zone_reclaim_mode = 1 on NUMA kernel is bad for file/email/web servers

> I don't think we will ever get the default value for this tunable right.
> I would also worry that avoiding the reclaim_mode for file-backed
> cache will hurt HPC applications that are dumping their data to disk
> and depending on the existing default for zone_reclaim_mode to not
> pollute other nodes.
>
> The ideal would be if distribution packages for mail, web servers
> and others that are heavily IO orientated would prompt for a change
> to the default value of zone_reclaim_mode in sysctl.

I would argue that there are a lot more mail/web/file servers out there than
HPC machines, and HPC machines tend to have a team of people to
monitor/tweak them. I think it would be much more sane to default this to 0,
which works best for most people, and get the HPC people to change it.
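
For anyone wanting to check what a given machine is currently set to, here
is a minimal sketch in Python. It assumes the tunable is exposed at
/proc/sys/vm/zone_reclaim_mode (it is absent on non-NUMA kernels) and uses
the bit meanings from the kernel's sysctl documentation. The function names
are made up for illustration.

```python
from pathlib import Path

# Assumed location of the tunable; only present on NUMA-enabled kernels.
ZONE_RECLAIM_PATH = Path("/proc/sys/vm/zone_reclaim_mode")

# Bit meanings as described in the kernel's vm sysctl documentation.
FLAGS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def decode_zone_reclaim_mode(value):
    """Return the descriptions of the bits set in value (empty list for 0)."""
    return [name for bit, name in FLAGS.items() if value & bit]


def read_zone_reclaim_mode(path=ZONE_RECLAIM_PATH):
    """Read the current integer value of the tunable."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    if ZONE_RECLAIM_PATH.exists():
        mode = read_zone_reclaim_mode()
        enabled = decode_zone_reclaim_mode(mode) or ["zone reclaim off"]
        print(f"zone_reclaim_mode = {mode}: " + ", ".join(enabled))
    else:
        print("zone_reclaim_mode not present (non-NUMA kernel?)")
```

Changing the value is done the usual sysctl way, e.g.
"sysctl -w vm.zone_reclaim_mode=0" as root, or a "vm.zone_reclaim_mode = 0"
line in /etc/sysctl.conf to make it persistent.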

However, there's still another question: why is this problem happening at
all for us? I know almost nothing about NUMA, but from other posts, it
sounds like the problem is that the memory allocations are all happening on
one node. But I don't understand why that would be happening. The machine
runs the cyrus IMAP server, which is a classic unix forking server with
thousands of processes. Each process will mmap lots of different files to
access them. Why would that all be happening on one node, not spread around?
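
One way to see whether allocations really are piling up on one node is to
compare the per-node counters the kernel exports. The following is a rough
sketch, assuming the per-node files live at
/sys/devices/system/node/nodeN/meminfo and use the usual
"Node 0 MemFree:  12345 kB" line layout. The function names are
hypothetical.

```python
import re
from pathlib import Path

# Matches lines such as "Node 0 MemFree:        10240 kB".
NODE_LINE = re.compile(r"^Node\s+(\d+)\s+(\S+):\s+(\d+)")


def parse_node_meminfo(text):
    """Parse per-node meminfo text into {node_id: {field: value}}."""
    nodes = {}
    for line in text.splitlines():
        m = NODE_LINE.match(line.strip())
        if m:
            node, field, value = int(m.group(1)), m.group(2), int(m.group(3))
            nodes.setdefault(node, {})[field] = value
    return nodes


def read_all_nodes(base=Path("/sys/devices/system/node")):
    """Read and merge the meminfo file of every node found under base."""
    merged = {}
    for path in sorted(base.glob("node[0-9]*/meminfo")):
        for node, fields in parse_node_meminfo(path.read_text()).items():
            merged.setdefault(node, {}).update(fields)
    return merged


if __name__ == "__main__":
    for node, fields in sorted(read_all_nodes().items()):
        print(
            f"node {node}: "
            f"MemTotal={fields.get('MemTotal')} kB "
            f"MemFree={fields.get('MemFree')} kB "
            f"FilePages={fields.get('FilePages')} kB"
        )
```

If one node shows almost no MemFree and a large FilePages count while the
other nodes are mostly free, that would be consistent with the page cache
being filled from a single node.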

One thing is that the machine is vastly more IO loaded than CPU loaded. In
fact, it uses very little CPU at all (a few % usually). Does the kernel
prefer to run processes on one particular node if it's available? So if a
machine has very little CPU load, will every process generally end up
running on the same node?
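
This can at least be observed from userspace. Below is a rough sketch that
counts processes by the node of the CPU they last ran on. It assumes the
CPU number is field 39 of /proc/PID/stat (as described in proc(5)) and that
each node's CPUs are listed in /sys/devices/system/node/nodeN/cpulist in
the "0-3,8-11" style. The helper names are made up for illustration, and
the result is only a snapshot of where things last ran, not of where their
memory was allocated.

```python
from collections import Counter
from pathlib import Path


def parse_cpulist(text):
    """Expand a cpulist string such as "0-3,8-11" into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a node with no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def last_cpu_from_stat(stat_line):
    """Return field 39 (CPU last executed on) from a /proc/PID/stat line."""
    # The command name (field 2) may contain spaces or parentheses, so
    # split after the last ')'. The remainder starts at field 3, which
    # puts field 39 at index 36.
    fields = stat_line.rsplit(")", 1)[1].split()
    return int(fields[36])


def node_of_cpu(cpu, node_cpulists):
    """Map a CPU id to its node, given {node_id: cpulist_text}."""
    for node, text in node_cpulists.items():
        if cpu in parse_cpulist(text):
            return node
    return None


if __name__ == "__main__":
    cpulists = {}
    for d in Path("/sys/devices/system/node").glob("node[0-9]*"):
        cpulists[int(d.name[4:])] = (d / "cpulist").read_text()

    counts = Counter()
    for stat in Path("/proc").glob("[0-9]*/stat"):
        try:
            cpu = last_cpu_from_stat(stat.read_text())
        except (OSError, IndexError, ValueError):
            continue  # process exited or unexpected format
        counts[node_of_cpu(cpu, cpulists)] += 1

    for node, n in counts.most_common():
        print(f"node {node}: {n} processes last ran here")
```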

Rob
