Message-ID: <53440BD6.5030008@agliodbs.com>
Date: Tue, 08 Apr 2014 10:46:46 -0400
From: Josh Berkus <josh@...iodbs.com>
To: Christoph Lameter <cl@...ux.com>, Vlastimil Babka <vbabka@...e.cz>
CC: Mel Gorman <mgorman@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Robert Haas <robertmhaas@...il.com>,
Andres Freund <andres@...quadrant.com>,
Linux-MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>, sivanich@....com
Subject: Re: [PATCH 0/2] Disable zone_reclaim_mode by default
On 04/08/2014 10:17 AM, Christoph Lameter wrote:
> Another solution here would be to increase the threshold so that
> 4 socket machines do not enable zone reclaim by default. The larger the
> NUMA system is the more memory is off node from the perspective of a
> processor and the larger the hit from remote memory.
8 and 16 socket machines aren't common for nonspecialist workloads
*now*, but by the time these changes make it into supported distribution
kernels, they may very well be. So having zone_reclaim_mode
automatically turn itself on if you have more than 8 sockets would still
be a booby-trap ("Boss, I dunno. I installed the additional processors
and memory performance went to hell!")
For zone_reclaim_mode=1 to be useful on standard servers, both of the
following need to be true:
1. the user has to have set CPU affinity for their applications;
2. the applications can't need more than one memory bank's worth of cache.
The thing is, there is *no way* for Linux to know if the above is true.
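As a rough userspace illustration of condition 1 (a hand check an administrator might run, not something the kernel can infer), the sketch below tests whether a process's CPU affinity mask is narrower than the full CPU set. The names `is_pinned` and `process_is_pinned` are hypothetical. It also assumes CPUs are numbered 0..N-1, and a narrowed mask does not by itself prove the process is confined to a single socket.

```python
import os

def is_pinned(affinity, all_cpus):
    """True if the affinity mask is a strict subset of all CPUs."""
    return set(affinity) < set(all_cpus)

def process_is_pinned(pid=0):
    """Check a live process (Linux only); pid 0 means the caller.

    Assumption: CPUs are numbered 0..cpu_count-1, which may not hold
    with offline CPUs or inside some containers.
    """
    all_cpus = range(os.cpu_count() or 1)
    return is_pinned(os.sched_getaffinity(pid), all_cpus)
```

Condition 2, the size of the working set relative to one node's memory, has no comparably simple check, which is much of the point above.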
Now, I can certainly imagine non-HPC workloads for which both of the
above would be true; for example, I've set up VMware ESX servers where
each VM has one socket and one memory bank. However, if the user knows
enough to set up socket affinity, they know enough to set
zone_reclaim_mode = 1. The default should cover the know-nothing case,
not the experienced specialist case.
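For reference, a minimal sketch of inspecting the current setting from userspace. It assumes the sysctl is exposed at /proc/sys/vm/zone_reclaim_mode (present only on NUMA-enabled kernels) and uses the bit meanings from the kernel's vm sysctl documentation. The function name `describe_zone_reclaim` is made up for this example.

```python
from pathlib import Path

# Bit meanings as documented for the vm.zone_reclaim_mode sysctl.
FLAGS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}

def describe_zone_reclaim(path="/proc/sys/vm/zone_reclaim_mode"):
    """Return (raw value, descriptions of the bits that are set)."""
    value = int(Path(path).read_text().strip())
    enabled = [name for bit, name in FLAGS.items() if value & bit]
    return value, enabled
```

Calling `describe_zone_reclaim()` with no arguments reads the live value; an empty list means zone reclaim is off. Changing the value requires root, for example by writing to the same file or through `sysctl vm.zone_reclaim_mode`.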
I'd also argue that there's a fundamental false assumption in the entire
algorithm of zone_reclaim_mode, because there is no memory bank which is
as distant as disk is, ever. However, if it's off by default, then I
don't care.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com