Message-Id: <1285677740.30176.1397281937@webmail.messagingengine.com>
Date: Tue, 28 Sep 2010 22:42:20 +1000
From: "Bron Gondwana" <brong@...tmail.fm>
To: "Christoph Lameter" <cl@...ux.com>,
"Robert Mueller" <robm@...tmail.fm>
Cc: "KOSAKI Motohiro" <kosaki.motohiro@...fujitsu.com>,
"Mel Gorman" <mel@....ul.ie>,
"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
"linux-mm" <linux-mm@...ck.org>
Subject: Re: Default zone_reclaim_mode = 1 on NUMA kernel is bad
 for file/email/web servers
On Tue, 28 Sep 2010 07:35 -0500, "Christoph Lameter" <cl@...ux.com> wrote:
> > The problem we saw was purely with file caching. The application wasn't
> > actually allocating much memory itself, but it was reading lots of files
> > from disk (via mmap'ed memory mostly), and as most people would, we
> > expected that data would be cached in memory to reduce future reads from
> > disk. That was not happening.
>
> Obviously, and you have stated that numerous times. The problem is that
> the use of remote memory will reduce the performance of reads, so the OS (with
> zone_reclaim=1) defaults to the use of local memory and favors reclaim of
> local memory over the allocation from the remote node. This is fine if
> you have multiple applications running on both nodes because then each
> application will get memory local to it and therefore run faster. That
> does not work with a single app that only allocates from one node.
Is this what's happening, or is IO actually coming from disk in preference
to the remote node? I can certainly see the logic behind preferring to
reclaim the local node if that's all that's happening - though the OS should
be allocating the different tasks more evenly across the nodes in that case.
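For anyone wanting to check what a given box is actually set to, here is a minimal sketch that reads the sysctl and decodes it. It assumes the usual /proc/sys/vm/zone_reclaim_mode path and the bit meanings given in the kernel's vm sysctl documentation (1 = zone reclaim on, 2 = write out dirty pages, 4 = swap pages); the function names are made up for illustration.

```python
from pathlib import Path

# Bit meanings as described in the kernel's vm sysctl documentation.
ZONE_RECLAIM_FLAGS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def describe_zone_reclaim_mode(value):
    """Return the names of the flags set in a zone_reclaim_mode value."""
    return [name for bit, name in ZONE_RECLAIM_FLAGS.items() if value & bit]


def read_zone_reclaim_mode(path="/proc/sys/vm/zone_reclaim_mode"):
    """Return the current setting as an int, or None if the file is absent.

    The file is only expected to exist on kernels built with NUMA support.
    """
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None
```

Calling describe_zone_reclaim_mode(read_zone_reclaim_mode() or 0) on an affected machine should list "zone reclaim on"; an empty list means zone reclaim is off.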
> Control over memory allocations over the various nodes under NUMA
> for a process can occur via the numactl tool or the libnuma C APIs.
>
> F.e.
>
> numactl --interleave ... command
>
> will address that issue for a specific command that needs to go
Gosh, what a pain. While it won't kill us to add that to our startup
scripts, it does still feel a lot like the tail is wagging the dog from
here. A task that doesn't ask for anything special should get sane
defaults, and the cost of data from the other node should be a lot
less than the cost of the same data from spinning rust.
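A rough sketch of what that startup change might look like, assuming numactl is on the PATH and that interleaving across all nodes (--interleave=all) is what we want. The node list was elided in the suggestion above, so "all" is purely an assumption here, and the helper names are hypothetical.

```python
import subprocess


def with_interleave(command, nodes="all", numactl="numactl"):
    """Return `command` prefixed with `numactl --interleave=<nodes>`.

    `command` is an argv-style list. The default nodes="all" is an
    assumption, not something taken from the suggestion quoted above.
    """
    if not command:
        raise ValueError("command must not be empty")
    return [numactl, f"--interleave={nodes}", *command]


def run_interleaved(command, nodes="all"):
    """Run `command` under numactl and return its exit status."""
    return subprocess.run(with_interleave(command, nodes), check=False).returncode
```

For example, with_interleave(["/usr/sbin/exampled", "-f"]) gives ["numactl", "--interleave=all", "/usr/sbin/exampled", "-f"], where exampled stands in for whatever daemon the startup script launches.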
Bron.
--
Bron Gondwana
brong@...tmail.fm