linux-kernel - Re: [patch v2] mm, compaction: avoid isolating pinned pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20140206000504.GA17465@lge.com>
Date:	Thu, 6 Feb 2014 09:05:04 +0900
From:	Joonsoo Kim <iamjoonsoo.kim@....com>
To:	Hugh Dickins <hughd@...gle.com>
Cc:	David Rientjes <rientjes@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	Greg Thelen <gthelen@...gle.com>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [patch v2] mm, compaction: avoid isolating pinned pages

On Wed, Feb 05, 2014 at 12:56:40PM -0800, Hugh Dickins wrote:
> On Tue, 4 Feb 2014, David Rientjes wrote:
> 
> > Page migration will fail for memory that is pinned in memory with, for
> > example, get_user_pages().  In this case, it is unnecessary to take
> > zone->lru_lock or isolating the page and passing it to page migration
> > which will ultimately fail.
> > 
> > This is a racy check, the page can still change from under us, but in
> > that case we'll just fail later when attempting to move the page.
> > 
> > This avoids very expensive memory compaction when faulting transparent
> > hugepages after pinning a lot of memory with a Mellanox driver.
> > 
> > On a 128GB machine and pinning ~120GB of memory, before this patch we
> > see the enormous disparity in the number of page migration failures
> > because of the pinning (from /proc/vmstat):
> > 
> > 	compact_pages_moved 8450
> > 	compact_pagemigrate_failed 15614415
> > 
> > 0.05% of pages isolated are successfully migrated and explicitly 
> > triggering memory compaction takes 102 seconds.  After the patch:
> > 
> > 	compact_pages_moved 9197
> > 	compact_pagemigrate_failed 7
> > 
> > 99.9% of pages isolated are now successfully migrated in this 
> > configuration and memory compaction takes less than one second.
> > 
> > Signed-off-by: David Rientjes <rientjes@...gle.com>
> > ---
> >  v2: address page count issue per Joonsoo
> > 
> >  mm/compaction.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -578,6 +578,15 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> >  			continue;
> >  		}
> >  
> > +		/*
> > +		 * Migration will fail if an anonymous page is pinned in memory,
> > +		 * so avoid taking lru_lock and isolating it unnecessarily in an
> > +		 * admittedly racy check.
> > +		 */
> > +		if (!page_mapping(page) &&
> > +		    page_count(page) > page_mapcount(page))
> > +			continue;
> > +
> >  		/* Check if it is ok to still hold the lock */
> >  		locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
> 
> Much better, maybe good enough as an internal patch to fix a particular
> problem you're seeing; but not yet good enough to go upstream.
> 
> Anonymous pages are not the only pages which might be pinned,
> and your test doesn't mention PageAnon, so does not match your comment.
> 
> I've remembered is_page_cache_freeable() in mm/vmscan.c, which gives
> more assurance that a page_count - page_has_private test is appropriate,
> whatever the filesystem and migrate method to be used.
> 
> So I think the test you're looking for is
> 
> 		pincount = page_count(page) - page_mapcount(page);
> 		if (page_mapping(page))
> 			pincount -= 1 + page_has_private(page);
> 		if (pincount > 0)
> 			continue;
> 
> but please cross-check and test that out, it's easy to be off-by-one etc.

Hello, Hugh.

I don't think that this is right.
One of migratepage function, aio_migratepage(), pass extra count 1 to
migrate_page_move_mapping(). So it can be migrated when pincount == 1 in
above test.

I think that we should not be aggressive here. This is just for prediction
so that it is better not to skip apropriate pages at most. Just for anon case
that we are sure easily is the right solution for me.

Thanks.

> 
> For a moment I thought a PageWriteback test would be useful too,
> but no, that should already appear in the pincount.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/