lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 25 Sep 2012 16:36:54 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	Mel Gorman <mgorman@...e.de>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Richard Davies <richard@...chsys.com>,
	Shaohua Li <shli@...nel.org>, Rik van Riel <riel@...hat.com>,
	Avi Kivity <avi@...hat.com>,
	QEMU-devel <qemu-devel@...gnu.org>, KVM <kvm@...r.kernel.org>,
	Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 6/9] mm: compaction: Acquire the zone->lock as late as
 possible

On Mon, Sep 24, 2012 at 09:52:38AM +0100, Mel Gorman wrote:
> On Fri, Sep 21, 2012 at 02:35:57PM -0700, Andrew Morton wrote:
> > On Fri, 21 Sep 2012 11:46:20 +0100
> > Mel Gorman <mgorman@...e.de> wrote:
> > 
> > > Compactions free scanner acquires the zone->lock when checking for PageBuddy
> > > pages and isolating them. It does this even if there are no PageBuddy pages
> > > in the range.
> > > 
> > > This patch defers acquiring the zone lock for as long as possible. In the
> > > event there are no free pages in the pageblock then the lock will not be
> > > acquired at all which reduces contention on zone->lock.
> > >
> > > ...
> > >
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -93,6 +93,28 @@ static inline bool compact_trylock_irqsave(spinlock_t *lock,
> > >  	return compact_checklock_irqsave(lock, flags, false, cc);
> > >  }
> > >  
> > > +/* Returns true if the page is within a block suitable for migration to */
> > > +static bool suitable_migration_target(struct page *page)
> > > +{
> > > +
> > 
> > stray newline
> > 
> 
> Fixed.
> 
> > > +	int migratetype = get_pageblock_migratetype(page);
> > > +
> > > +	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
> > > +	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
> > > +		return false;
> > > +
> > > +	/* If the page is a large free page, then allow migration */
> > > +	if (PageBuddy(page) && page_order(page) >= pageblock_order)
> > > +		return true;
> > > +
> > > +	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
> > > +	if (migrate_async_suitable(migratetype))
> > > +		return true;
> > > +
> > > +	/* Otherwise skip the block */
> > > +	return false;
> > > +}
> > > +
> > >
> > > ...
> > >
> > > @@ -168,23 +193,38 @@ static unsigned long isolate_freepages_block(unsigned long blockpfn,
> > >  		int isolated, i;
> > >  		struct page *page = cursor;
> > >  
> > > -		if (!pfn_valid_within(blockpfn)) {
> > > -			if (strict)
> > > -				return 0;
> > > -			continue;
> > > -		}
> > > +		if (!pfn_valid_within(blockpfn))
> > > +			goto strict_check;
> > >  		nr_scanned++;
> > >  
> > > -		if (!PageBuddy(page)) {
> > > -			if (strict)
> > > -				return 0;
> > > -			continue;
> > > -		}
> > > +		if (!PageBuddy(page))
> > > +			goto strict_check;
> > > +
> > > +		/*
> > > +		 * The zone lock must be held to isolate freepages. This
> > > +		 * unfortunately this is a very coarse lock and can be
> > 
> > this this
> > 
> 
> Fixed.
> 
> > > +		 * heavily contended if there are parallel allocations
> > > +		 * or parallel compactions. For async compaction do not
> > > +		 * spin on the lock and we acquire the lock as late as
> > > +		 * possible.
> > > +		 */
> > > +		locked = compact_checklock_irqsave(&cc->zone->lock, &flags,
> > > +								locked, cc);
> > > +		if (!locked)
> > > +			break;
> > > +
> > > +		/* Recheck this is a suitable migration target under lock */
> > > +		if (!strict && !suitable_migration_target(page))
> > > +			break;
> > > +
> > > +		/* Recheck this is a buddy page under lock */
> > > +		if (!PageBuddy(page))
> > > +			goto strict_check;
> > >  
> > >  		/* Found a free page, break it into order-0 pages */
> > >  		isolated = split_free_page(page);
> > >  		if (!isolated && strict)
> > > -			return 0;
> > > +			goto strict_check;
> > >  		total_isolated += isolated;
> > >  		for (i = 0; i < isolated; i++) {
> > >  			list_add(&page->lru, freelist);
> > > @@ -196,9 +236,23 @@ static unsigned long isolate_freepages_block(unsigned long blockpfn,
> > >  			blockpfn += isolated - 1;
> > >  			cursor += isolated - 1;
> > >  		}
> > > +
> > > +		continue;
> > > +
> > > +strict_check:
> > > +		/* Abort isolation if the caller requested strict isolation */
> > > +		if (strict) {
> > > +			total_isolated = 0;
> > > +			goto out;
> > > +		}
> > >  	}
> > >  
> > >  	trace_mm_compaction_isolate_freepages(nr_scanned, total_isolated);
> > > +
> > > +out:
> > > +	if (locked)
> > > +		spin_unlock_irqrestore(&cc->zone->lock, flags);
> > > +
> > >  	return total_isolated;
> > >  }
> > 
> > A a few things about this function.
> > 
> > Would it be cleaner if we did
> > 
> > 	if (!strict) {
> > 		if (!suitable_migration_target(page))
> > 			break;
> > 	} else {
> > 		if (!PageBuddy(page))
> > 			goto strict_check;
> > 	}
> > 
> > and then remove the test of `strict' from strict_check (which then
> > should be renamed)?
> > 
> 
> I was not able to implement what you suggested without breakage.
> However, I can do something very similar if "strict" mode does not bail
> out ASAP and instead double checks at the end that everything was
> isolated correctly. This does mean that CMAs failure case is slightly
> more expensive but that really is a corner case and I think it's
> acceptable if the code is easier to follow as a result.
> 
> > Which perhaps means that the whole `strict_check' block can go away:
> > 
> > 	if (!strict) {
> > 		if (!suitable_migration_target(page))
> > 			break;
> > 	} else {
> > 		if (!PageBuddy(page)) {
> > 			total_isolated = 0;
> > 			goto out;
> > 	}
> > 
> > Have a think about it?  The function is a little straggly.
> > 
> 
> Proposal below.
> 
> > Secondly, it is correct/desirable to skip the (now misnamed
> > `trace_mm_compaction_isolate_freepages') tracepoint generation if we
> > baled out of the loop?  The fact that we entered
> > isolate_freepages_block() but failed to isolate anything is data which
> > people might be interested in?
> > 
> 
> You're right, it may be interesting for someone debugging CMA to know that
> nr_scanned != nr_isolated at the time of allocation failure. 
> 
> > Thirdly, that (existing) comment "This assumes the block is valid" is
> > either too vague or wrong.  If it's valid, why wo we check
> > pfn_valid_within()?
> > 
> 
> Comment is stale and should be removed.
> 
> > > @@ -218,13 +272,18 @@ static unsigned long isolate_freepages_block(unsigned long blockpfn,
> > >  unsigned long
> > >  isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
> > >  {
> > > -	unsigned long isolated, pfn, block_end_pfn, flags;
> > > +	unsigned long isolated, pfn, block_end_pfn;
> > >  	struct zone *zone = NULL;
> > >  	LIST_HEAD(freelist);
> > > +	struct compact_control cc;
> > >  
> > >  	if (pfn_valid(start_pfn))
> > >  		zone = page_zone(pfn_to_page(start_pfn));
> > >  
> > > +	/* cc needed for isolate_freepages_block to acquire zone->lock */
> > > +	cc.zone = zone;
> > > +	cc.sync = true;
> > 
> > We initialise two of cc's fields, leave the other 12 fields containing
> > random garbage, then start using it.  I see no bug here, but...
> > 
> 
> I get your point. At the very least we should initialise it with zeros.
> 
> How about this?
> 
> ---8<---
> mm: compaction: Iron out isolate_freepages_block() and isolate_freepages_range()
> 
> Andrew pointed out that isolate_freepages_block() is "straggly" and
> isolate_freepages_range() is making assumptions on how compact_control is
> used which is delicate. This patch straightens isolate_freepages_block()
> and makes it fly straight and initialses compact_control to zeros in
> isolate_freepages_range(). The code should be easier to follow and
> is functionally equivalent. The CMA failure path is now a little more
> expensive but that is a marginal corner-case.
> 
> Signed-off-by: Mel Gorman <mgorman@...e.de>
Acked-by: Minchan Kim <minchan@...nel.org>

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ