linux-kernel - Re: [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 14 Oct 2015 14:28:31 +0200
From:	Vlastimil Babka <vbabka@...e.cz>
To:	Joonsoo Kim <js1304@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Cc:	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	David Rientjes <rientjes@...gle.com>,
	Minchan Kim <minchan@...nel.org>,
	Joonsoo Kim <iamjoonsoo.kim@....com>
Subject: Re: [PATCH v2 9/9] mm/compaction: new threshold for compaction
 depleted zone

On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> Now, compaction algorithm become powerful. Migration scanner traverses
> whole zone range. So, old threshold for depleted zone which is designed
> to imitate compaction deferring approach isn't appropriate for current
> compaction algorithm. If we adhere to current threshold, 1, we can't
> avoid excessive overhead caused by compaction, because one compaction
> for low order allocation would be easily successful in any situation.
> 
> This patch re-implements threshold calculation based on zone size and
> allocation requested order. We judge whther compaction possibility is
> depleted or not by number of successful compaction. Roughly, 1/100
> of future scanned area should be allocated for high order page during
> one comaction iteration in order to determine whether zone's compaction
> possiblity is depleted or not.

Finally finishing my review, sorry it took that long...

> Below is test result with following setup.
> 
> Memory is artificially fragmented to make order 3 allocation hard. And,
> most of pageblocks are changed to movable migratetype.
> 
>   System: 512 MB with 32 MB Zram
>   Memory: 25% memory is allocated to make fragmentation and 200 MB is
>   	occupied by memory hogger. Most pageblocks are movable
>   	migratetype.
>   Fragmentation: Successful order 3 allocation candidates may be around
>   	1500 roughly.
>   Allocation attempts: Roughly 3000 order 3 allocation attempts
>   	with GFP_NORETRY. This value is determined to saturate allocation
>   	success.
> 
> Test: hogger-frag-movable
> 
> Success(N)                    94              83
> compact_stall               3642            4048
> compact_success              144             212
> compact_fail                3498            3835
> pgmigrate_success       15897219          216387
> compact_isolated        31899553          487712
> compact_migrate_scanned 59146745         2513245
> compact_free_scanned    49566134         4124319

The decrease in scanned/isolated/migrated counts looks definitely nice, but why
did success regress when compact_success improved substantially?

> This change results in greatly decreasing compaction overhead when
> zone's compaction possibility is nearly depleted. But, I should admit
> that it's not perfect because compaction success rate is decreased.
> More precise tuning threshold would restore this regression, but,
> it highly depends on workload so I'm not doing it here.
> 
> Other test doesn't show big regression.
> 
>   System: 512 MB with 32 MB Zram
>   Memory: 25% memory is allocated to make fragmentation and kernel
>   	build is running on background. Most pageblocks are movable
>   	migratetype.
>   Fragmentation: Successful order 3 allocation candidates may be around
>   	1500 roughly.
>   Allocation attempts: Roughly 3000 order 3 allocation attempts
>   	with GFP_NORETRY. This value is determined to saturate allocation
>   	success.
> 
> Test: build-frag-movable
> 
> Success(N)                    89              87
> compact_stall               4053            3642
> compact_success              264             202
> compact_fail                3788            3440
> pgmigrate_success        6497642          153413
> compact_isolated        13292640          353445
> compact_migrate_scanned 69714502         2307433
> compact_free_scanned    20243121         2325295

Here compact_success decreased relatively a lot, while success just barely.
Less counterintuitive than the first result, but still a bit.

> This looks like reasonable trade-off.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@....com>
> ---
>  mm/compaction.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e61ee77..e1b44a5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -129,19 +129,24 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>  
>  /* Do not skip compaction more than 64 times */
>  #define COMPACT_MAX_FAILED 4
> -#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
> +#define COMPACT_MIN_DEPLETE_THRESHOLD 4UL
>  #define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages)
>  
>  static bool compaction_depleted(struct zone *zone)
>  {
> -	unsigned long threshold;
> +	unsigned long nr_possible;
>  	unsigned long success = zone->compact_success;
> +	unsigned long threshold;
>  
> -	/*
> -	 * Now, to imitate current compaction deferring approach,
> -	 * choose threshold to 1. It will be changed in the future.
> -	 */
> -	threshold = COMPACT_MIN_DEPLETE_THRESHOLD;
> +	nr_possible = zone->managed_pages >> zone->compact_order_failed;
> +
> +	/* Migration scanner normally scans less than 1/4 range of zone */
> +	nr_possible >>= 2;
> +
> +	/* We hope to succeed more than 1/100 roughly */
> +	threshold = nr_possible >> 7;
> +
> +	threshold = max(threshold, COMPACT_MIN_DEPLETE_THRESHOLD);
>  	if (success >= threshold)
>  		return false;

I wonder if compact_depletion_depth should play some "positive" role here. The
bigger the depth, the lower the migration_scan_limit, which means higher chance
of failing and so on. Ideally, the system should stabilize itself, so that
migration_scan_limit is set based how many pages on average have to be scanned
to succeed?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/