Message-ID: <20131217152954.GA24067@suse.de>
Date:	Tue, 17 Dec 2013 15:29:54 +0000
From:	Mel Gorman <mgorman@...e.de>
To:	Johannes Weiner <hannes@...xchg.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Dave Hansen <dave.hansen@...el.com>,
	Rik van Riel <riel@...hat.com>,
	Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 5/7] mm: page_alloc: Make zone distribution page aging
 policy configurable

On Mon, Dec 16, 2013 at 03:42:15PM -0500, Johannes Weiner wrote:
> On Fri, Dec 13, 2013 at 02:10:05PM +0000, Mel Gorman wrote:
> > Commit 81c0a2bb ("mm: page_alloc: fair zone allocator policy") solved a
> > bug whereby new pages could be reclaimed before old pages because of
> > how the page allocator and kswapd interacted on the per-zone LRU lists.
> > Unfortunately it was missed during review that a consequence is that
> > we also round-robin between NUMA nodes. This is bad for two reasons
> > 
> > 1. It alters the semantics of MPOL_LOCAL without telling anyone
> > 2. It incurs an immediate remote memory performance hit in exchange
> >    for a potential performance gain when memory needs to be reclaimed
> >    later
> > 
> > No cookies for the reviewers on this one.
> > 
> > This patch makes the behaviour of the fair zone allocator policy
> > configurable.  By default it will only distribute pages that are going
> > to exist on the LRU between zones local to the allocating process. This
> > preserves the historical semantics of MPOL_LOCAL.
> > 
> > By default, slab pages are not distributed between zones after this patch is
> > applied. It can be argued that they should get similar treatment but they
> > have different lifecycles to LRU pages, the shrinkers are not zone-aware
> > and the interaction between the page allocator and kswapd is different
> > for slabs. If it turns out to be an almost universal win, we can change
> > the default.
> > 
> > Signed-off-by: Mel Gorman <mgorman@...e.de>
> > ---
> >  Documentation/sysctl/vm.txt |  32 ++++++++++++++
> >  include/linux/mmzone.h      |   2 +
> >  include/linux/swap.h        |   2 +
> >  kernel/sysctl.c             |   8 ++++
> >  mm/page_alloc.c             | 102 ++++++++++++++++++++++++++++++++++++++------
> >  5 files changed, 134 insertions(+), 12 deletions(-)
> > 
> > diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> > index 1fbd4eb..8eaa562 100644
> > --- a/Documentation/sysctl/vm.txt
> > +++ b/Documentation/sysctl/vm.txt
> > @@ -56,6 +56,7 @@ Currently, these files are in /proc/sys/vm:
> >  - swappiness
> >  - user_reserve_kbytes
> >  - vfs_cache_pressure
> > +- zone_distribute_mode
> >  - zone_reclaim_mode
> >  
> >  ==============================================================
> > @@ -724,6 +725,37 @@ causes the kernel to prefer to reclaim dentries and inodes.
> >  
> >  ==============================================================
> >  
> > +zone_distribute_mode
> > +
> > +Page allocation and reclaim are managed on a per-zone basis. When the
> > +system needs to reclaim memory, candidate pages are selected from these
> > +per-zone lists.  Historically, a potential consequence was that recently
> > +allocated pages were considered reclaim candidates. From a zone-local
> > +perspective, page aging was preserved but from a system-wide perspective
> > +there was an age inversion problem.
> > +
> > +A similar problem occurs on a node level where young pages may be reclaimed
> > +from the local node instead of allocating remote memory. Unfortunately, the
> > +cost of accessing remote nodes is higher so the system must choose by default
> > +between favouring page aging or node locality. zone_distribute_mode controls
> > +how the system will distribute page ages between zones.
> > +
> > +0	= Never round-robin based on age
> 
> I think we should be very conservative with the userspace interface we
> export on a mechanism we are obviously just figuring out.
> 

And we have a proposal on how to limit this. I'll be layering another
patch on top that removes this interface again. That will allow us to
roll back one patch and still have a usable interface if necessary.

> > +Otherwise the values are ORed together
> > +
> > +1	= Distribute anon pages between zones local to the allocating node
> > +2	= Distribute file pages between zones local to the allocating node
> > +4	= Distribute slab pages between zones local to the allocating node
> 
> Zone fairness within a node does not affect mempolicy or remote
> reference costs.  Is there a reason to have this configurable?
> 

Symmetry

> > +The following three flags effectively alter MPOL_DEFAULT, be careful.
> > +
> > +8	= Distribute anon pages between zones remote to the allocating node
> > +16	= Distribute file pages between zones remote to the allocating node
> > +32	= Distribute slab pages between zones remote to the allocating node
> 
> Yes, it's conceivable that somebody might want to disable remote
> distribution because of the extra references.
> 
> But at this point, I'd much rather back out anon and slab distribution
> entirely, it was a mistake to include them.
> 
> That would leave us with a single knob to disable remote page cache
> placement.
> 

When looking at this more closely, I found that sysv is a weird exception.
It's file-backed as far as most of the VM is concerned but looks anonymous
to most applications that care. Sysv and MAP_SHARED anonymous pages should
not be treated like files, but we still want tmpfs to be treated as
files. Details will be in the changelog of the next series.
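
For reference, the bit values in the quoted vm.txt hunk can be sketched as
a small decoder. This is only an illustration: the ZD_* names and the
helper functions are hypothetical and are not taken from the patch; only
the numeric values (1, 2, 4, 8, 16, 32) come from the quoted documentation.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical flag names for illustration. Only the numeric values are
 * taken from the quoted vm.txt hunk.
 */
enum zd_flag {
	ZD_LOCAL_ANON  = 1,	/* anon pages, zones local to the allocating node */
	ZD_LOCAL_FILE  = 2,	/* file pages, zones local to the allocating node */
	ZD_LOCAL_SLAB  = 4,	/* slab pages, zones local to the allocating node */
	ZD_REMOTE_ANON = 8,	/* anon pages, zones remote to the allocating node */
	ZD_REMOTE_FILE = 16,	/* file pages, zones remote to the allocating node */
	ZD_REMOTE_SLAB = 32,	/* slab pages, zones remote to the allocating node */
};

#define ZD_LOCAL_MASK  (ZD_LOCAL_ANON | ZD_LOCAL_FILE | ZD_LOCAL_SLAB)
#define ZD_REMOTE_MASK (ZD_REMOTE_ANON | ZD_REMOTE_FILE | ZD_REMOTE_SLAB)
#define ZD_ALL_MASK    (ZD_LOCAL_MASK | ZD_REMOTE_MASK)

/* Nonzero if mode uses only the six documented bits. */
static int zd_mode_valid(unsigned int mode)
{
	return (mode & ~(unsigned int)ZD_ALL_MASK) == 0;
}

/* Nonzero if any remote bit is set, i.e. pages may be placed off-node. */
static int zd_touches_remote(unsigned int mode)
{
	return (mode & ZD_REMOTE_MASK) != 0;
}

/* Print one line per set flag and return the number of flags set. */
static int zd_describe(unsigned int mode, FILE *out)
{
	static const char *const names[] = {
		"local anon", "local file", "local slab",
		"remote anon", "remote file", "remote slab",
	};
	int i, count = 0;

	assert(out != NULL);
	if (mode == 0) {
		fprintf(out, "0: never round-robin based on age\n");
		return 0;
	}
	for (i = 0; i < 6; i++) {
		if (mode & (1u << i)) {
			fprintf(out, "%u: distribute %s pages\n", 1u << i, names[i]);
			count++;
		}
	}
	return count;
}
```

Under this numbering, the default described in the changelog (LRU pages
distributed between local zones only, slab left alone) would presumably be
1|2 = 3, and the single page cache knob suggested above would map to bit
16 alone. Both readings are inferences from the quoted text, not values
stated in the patch.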

-- 
Mel Gorman
SUSE Labs