lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120809001212.GB17835@bbox>
Date:	Thu, 9 Aug 2012 09:12:12 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	Mel Gorman <mgorman@...e.de>
Cc:	Linux-MM <linux-mm@...ck.org>, Rik van Riel <riel@...hat.com>,
	Jim Schutt <jaschut@...dia.gov>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 5/5] mm: have order > 0 compaction start near a pageblock
 with free pages

Hi Mel,

On Wed, Aug 08, 2012 at 08:08:44PM +0100, Mel Gorman wrote:
> commit [7db8889a: mm: have order > 0 compaction start off where it left]
> introduced a caching mechanism to reduce the amount work the free page
> scanner does in compaction. However, it has a problem. Consider two process
> simultaneously scanning free pages
> 
> 				    			C
> Process A		M     S     			F
> 		|---------------------------------------|
> Process B		M 	FS
> 
> C is zone->compact_cached_free_pfn
> S is cc->start_pfree_pfn
> M is cc->migrate_pfn
> F is cc->free_pfn
> 
> In this diagram, Process A has just reached its migrate scanner, wrapped
> around and updated compact_cached_free_pfn accordingly.
> 
> Simultaneously, Process B finishes isolating in a block and updates
> compact_cached_free_pfn again to the location of its free scanner.
> 
> Process A moves to "end_of_zone - one_pageblock" and runs this check
> 
>                 if (cc->order > 0 && (!cc->wrapped ||
>                                       zone->compact_cached_free_pfn >
>                                       cc->start_free_pfn))
>                         pfn = min(pfn, zone->compact_cached_free_pfn);
> 
> compact_cached_free_pfn is above where it started so the free scanner skips
> almost the entire space it should have scanned. When there are multiple
> processes compacting it can end in a situation where the entire zone is
> not being scanned at all.  Further, it is possible for two processes to
> ping-pong update to compact_cached_free_pfn which is just random.
> 
> Overall, the end result wrecks allocation success rates.
> 
> There is not an obvious way around this problem without introducing new
> locking and state so this patch takes a different approach.
> 
> First, it gets rid of the skip logic because it's not clear that it matters
> if two free scanners happen to be in the same block but with racing updates
> it's too easy for it to skip over blocks it should not.
> 
> Second, it updates compact_cached_free_pfn in a more limited set of
> circumstances.
> 
> If a scanner has wrapped, it updates compact_cached_free_pfn to the end
> 	of the zone. When a wrapped scanner isolates a page, it updates
> 	compact_cached_free_pfn to point to the highest pageblock it
> 	can isolate pages from.

Okay until here.

> 
> If a scanner has not wrapped when it has finished isolated pages it
> 	checks if compact_cached_free_pfn is pointing to the end of the
> 	zone. If so, the value is updated to point to the highest
> 	pageblock that pages were isolated from. This value will not
> 	be updated again until a free page scanner wraps and resets
> 	compact_cached_free_pfn.

I tried to understand your intention of this part but unfortunately failed.
By this part, the problem you mentioned could happen again?

 				    			C
 Process A		M     S     			F
 		|---------------------------------------|
 Process B		M 	FS
 
 C is zone->compact_cached_free_pfn
 S is cc->start_pfree_pfn
 M is cc->migrate_pfn
 F is cc->free_pfn

In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn to end of the zone accordingly.

Simultaneously, Process B finishes isolating in a block and peek 
compact_cached_free_pfn position and know it's end of the zone so
update compact_cached_free_pfn to highest pageblock that pages were
isolated from.

Process A updates compact_cached_free_pfn to the highest pageblock which
was set by process B because process A has wrapped. It ends up big jump
without any scanning in process A.

No?

> 
> This is not optimal and it can still race but the compact_cached_free_pfn
> will be pointing to or very near a pageblock with free pages.
> 
> Signed-off-by: Mel Gorman <mgorman@...e.de>
> Reviewed-by: Rik van Riel <riel@...hat.com>
> ---
>  mm/compaction.c |   54 ++++++++++++++++++++++++++++--------------------------
>  1 file changed, 28 insertions(+), 26 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index be310f1..df50f73 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -419,6 +419,20 @@ static bool suitable_migration_target(struct page *page)
>  }
>  
>  /*
> + * Returns the start pfn of the last page block in a zone.  This is the starting
> + * point for full compaction of a zone.  Compaction searches for free pages from
> + * the end of each zone, while isolate_freepages_block scans forward inside each
> + * page block.
> + */
> +static unsigned long start_free_pfn(struct zone *zone)
> +{
> +	unsigned long free_pfn;
> +	free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> +	free_pfn &= ~(pageblock_nr_pages-1);
> +	return free_pfn;
> +}
> +
> +/*
>   * Based on information in the current compact_control, find blocks
>   * suitable for isolating free pages from and then isolate them.
>   */
> @@ -457,17 +471,6 @@ static void isolate_freepages(struct zone *zone,
>  					pfn -= pageblock_nr_pages) {
>  		unsigned long isolated;
>  
> -		/*
> -		 * Skip ahead if another thread is compacting in the area
> -		 * simultaneously. If we wrapped around, we can only skip
> -		 * ahead if zone->compact_cached_free_pfn also wrapped to
> -		 * above our starting point.
> -		 */
> -		if (cc->order > 0 && (!cc->wrapped ||
> -				      zone->compact_cached_free_pfn >
> -				      cc->start_free_pfn))
> -			pfn = min(pfn, zone->compact_cached_free_pfn);
> -
>  		if (!pfn_valid(pfn))
>  			continue;
>  
> @@ -510,7 +513,15 @@ static void isolate_freepages(struct zone *zone,
>  		 */
>  		if (isolated) {
>  			high_pfn = max(high_pfn, pfn);
> -			if (cc->order > 0)
> +
> +			/*
> +			 * If the free scanner has wrapped, update
> +			 * compact_cached_free_pfn to point to the highest
> +			 * pageblock with free pages. This reduces excessive
> +			 * scanning of full pageblocks near the end of the
> +			 * zone
> +			 */
> +			if (cc->order > 0 && cc->wrapped)
>  				zone->compact_cached_free_pfn = high_pfn;
>  		}
>  	}
> @@ -520,6 +531,11 @@ static void isolate_freepages(struct zone *zone,
>  
>  	cc->free_pfn = high_pfn;
>  	cc->nr_freepages = nr_freepages;
> +
> +	/* If compact_cached_free_pfn is reset then set it now */
> +	if (cc->order > 0 && !cc->wrapped &&
> +			zone->compact_cached_free_pfn == start_free_pfn(zone))
> +		zone->compact_cached_free_pfn = high_pfn;
>  }
>  
>  /*
> @@ -607,20 +623,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  	return ISOLATE_SUCCESS;
>  }
>  
> -/*
> - * Returns the start pfn of the last page block in a zone.  This is the starting
> - * point for full compaction of a zone.  Compaction searches for free pages from
> - * the end of each zone, while isolate_freepages_block scans forward inside each
> - * page block.
> - */
> -static unsigned long start_free_pfn(struct zone *zone)
> -{
> -	unsigned long free_pfn;
> -	free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> -	free_pfn &= ~(pageblock_nr_pages-1);
> -	return free_pfn;
> -}
> -
>  static int compact_finished(struct zone *zone,
>  			    struct compact_control *cc)
>  {
> -- 
> 1.7.9.2
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@...ck.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a>

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ