linux-kernel - Re: [RFC PATCH 0/6] Configurable fair allocation zone policy v3

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20131219112051.GH11295@suse.de>
Date:	Thu, 19 Dec 2013 11:20:51 +0000
From:	Mel Gorman <mgorman@...e.de>
To:	Johannes Weiner <hannes@...xchg.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Dave Hansen <dave.hansen@...el.com>,
	Rik van Riel <riel@...hat.com>,
	Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 0/6] Configurable fair allocation zone policy v3

On Wed, Dec 18, 2013 at 02:48:13PM -0500, Johannes Weiner wrote:
> > <SNIP>
> > 
> > Sure about the name?
> > 
> > This is a boolean and "mode" implies it might be a bitmask. That said, I
> > recognise that my own naming also sucked because complaining about yours
> > I can see that mine also sucks.
> 
> Is it because of how we use zone_reclaim_mode? I don't see anything
> wrong with a "mode" toggle that switches between only two modes of
> operation instead of three or more.  But English being a second
> language and all...
> 

It's not just zone_reclaim_mode. Most references to mode in the VM (but
not all because who needs consistentcy) refer to either a mask or multiple
potential values. isolate_mode_t, gfp masks referred to as mode, memory
policies described as mode, migration modes etc.

Intuitively, I expect "mode" to not be a binary value.

> > > @@ -1816,7 +1833,7 @@ static void zlc_clear_zones_full(struct zonelist *zonelist)
> > >  
> > >  static bool zone_local(struct zone *local_zone, struct zone *zone)
> > >  {
> > > -	return node_distance(local_zone->node, zone->node) == LOCAL_DISTANCE;
> > > +	return local_zone->node == zone->node;
> > >  }
> > 
> > Does that not break on !CONFIG_NUMA?
> > 
> > It's why I used zone_to_nid
> 
> There is a separate definition for !CONFIG_NUMA, it fit nicely next to
> the zlc stuff.
> 

Ah, fair enough.

> > >  static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
> > > @@ -1908,22 +1925,25 @@ zonelist_scan:
> > >  		if (unlikely(alloc_flags & ALLOC_NO_WATERMARKS))
> > >  			goto try_this_zone;
> > >  		/*
> > > -		 * Distribute pages in proportion to the individual
> > > -		 * zone size to ensure fair page aging.  The zone a
> > > -		 * page was allocated in should have no effect on the
> > > -		 * time the page has in memory before being reclaimed.
> > > +		 * Distribute pagecache pages in proportion to the
> > > +		 * individual zone size to ensure fair page aging.
> > > +		 * The zone a page was allocated in should have no
> > > +		 * effect on the time the page has in memory before
> > > +		 * being reclaimed.
> > >  		 *
> > > -		 * When zone_reclaim_mode is enabled, try to stay in
> > > -		 * local zones in the fastpath.  If that fails, the
> > > +		 * When pagecache_mempolicy_mode or zone_reclaim_mode
> > > +		 * is enabled, try to allocate from zones within the
> > > +		 * preferred node in the fastpath.  If that fails, the
> > >  		 * slowpath is entered, which will do another pass
> > >  		 * starting with the local zones, but ultimately fall
> > >  		 * back to remote zones that do not partake in the
> > >  		 * fairness round-robin cycle of this zonelist.
> > >  		 */
> > > -		if (alloc_flags & ALLOC_WMARK_LOW) {
> > > +		if ((alloc_flags & ALLOC_WMARK_LOW) &&
> > > +		    (gfp_mask & __GFP_PAGECACHE)) {
> > >  			if (zone_page_state(zone, NR_ALLOC_BATCH) <= 0)
> > >  				continue;
> > 
> > NR_ALLOC_BATCH is updated regardless of zone_reclaim_mode or
> > pagecache_mempolicy_mode. We only reset batch in the prepare_slowpath in
> > some cases. Looks a bit fishy even though I can't quite put my finger on it.
> > 
> > I also got details wrong here in the v3 of the series. In an unreleased
> > v4 of the series I had corrected the treatment of slab pages in line
> > with your wishes and reused the broken out helper in prepare_slowpath to
> > keep the decision in sync.
> > 
> > It's still in development but even if it gets rejected it'll act as a
> > comparison point to yours.
> > 
> > > -			if (zone_reclaim_mode &&
> > > +			if ((zone_reclaim_mode || pagecache_mempolicy_mode) &&
> > >  			    !zone_local(preferred_zone, zone))
> > >  				continue;
> > >  		}
> > 
> > Documention says "enabling pagecache_mempolicy_mode, in which case page cache
> > allocations will be placed according to the configured memory policy". Should
> > that be !pagecache_mempolicy_mode? I'm getting confused with the double nots.
> 
> Yes, it's a bit weird.
> 
> We want to consider the round-robin batches for local zones but at the
> same time avoid exhausted batches from pushing the allocation off-node
> when either of those modes are enabled.  So in the fastpath we filter
> for both and in the slowpath, once kswapd has been woken at the same
> time that the batches have been reset to launch the new aging cycle,
> we try in order of zonelist preference.
> 
> However, to answer your question above, if the slowpath still has to
> fall back to a remote zone, we don't want to reset its batch because
> we didn't verify it was actually exhausted in the fastpath and we
> could risk cutting short the aging cycle for that particular zone.

Understood, thanks.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/