lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 11 Jun 2012 11:05:22 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	Andrew Morton <akpm@...ux-foundation.org>
CC:	Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Hugh Dickins <hughd@...gle.com>,
	KOSAKI Motohiro <kosaki.motohiro@...il.com>,
	Dave Jones <davej@...hat.com>, Cong Wang <amwang@...hat.com>,
	Markus Trippelsdorf <markus@...ppelsdorf.de>,
	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	Marek Szyprowski <m.szyprowski@...sung.com>,
	Kyungmin Park <kyungmin.park@...sung.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE
 type pageblocks

Hi Andrew,

On 06/09/2012 06:18 AM, Andrew Morton wrote:

> On Fri, 08 Jun 2012 10:46:32 +0200
> Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com> wrote:
> 
>>
>> Hi,
>>
>> This version is much simpler as it just uses __count_immobile_pages()
>> instead of using its own open coded version and it integrates changes
>> from Minchan Kim (without page_count change as it doesn't seem correct

>> and __count_immobile_pages() does the check in the standard way; if it

>> still is a problem I think that removing 1st phase check altogether
>> would be better instead of adding more locking complexity).
>>
>> The patch also adds compact_rescued_unmovable_blocks vmevent to vmstats
>> to make it possible to easily check if the code is working in practice.
>>
>> Best regards,
>> --
>> Bartlomiej Zolnierkiewicz
>> Samsung Poland R&D Center
>>
>>
>> From: Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>
>> Subject: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
>>
>> When MIGRATE_UNMOVABLE pages are freed from MIGRATE_UNMOVABLE
>> type pageblock (and some MIGRATE_MOVABLE pages are left in it)
>> waiting until an allocation takes ownership of the block may
>> take too long.  The type of the pageblock remains unchanged
>> so the pageblock cannot be used as a migration target during
>> compaction.
>>
>> Fix it by:
>>
>> * Adding enum compact_mode (COMPACT_ASYNC_[MOVABLE,UNMOVABLE],
>>   and COMPACT_SYNC) and then converting sync field in struct
>>   compact_control to use it.
>>
>> * Adding nr_pageblocks_skipped field to struct compact_control
>>   and tracking how many destination pageblocks were of
>>   MIGRATE_UNMOVABLE type.  If COMPACT_ASYNC_MOVABLE mode compaction
>>   ran fully in try_to_compact_pages() (COMPACT_COMPLETE) it implies
>>   that there is not a suitable page for allocation.  In this case
>>   then check how if there were enough MIGRATE_UNMOVABLE pageblocks
>>   to try a second pass in COMPACT_ASYNC_UNMOVABLE mode.
>>
>> * Scanning the MIGRATE_UNMOVABLE pageblocks (during COMPACT_SYNC
>>   and COMPACT_ASYNC_UNMOVABLE compaction modes) and building
>>   a count based on finding PageBuddy pages, page_count(page) == 0
>>   or PageLRU pages.  If all pages within the MIGRATE_UNMOVABLE
>>   pageblock are in one of those three sets change the whole
>>   pageblock type to MIGRATE_MOVABLE.
>>
>> My particular test case (on a ARM EXYNOS4 device with 512 MiB,
>> which means 131072 standard 4KiB pages in 'Normal' zone) is to:
>> - allocate 95000 pages for kernel's usage
>> - free every second page (47500 pages) of memory just allocated
>> - allocate and use 60000 pages from user space
>> - free remaining 60000 pages of kernel memory
>> (now we have fragmented memory occupied mostly by user space pages)
>> - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
>>
>> The results:
>> - with compaction disabled I get 10 successful allocations
>> - with compaction enabled - 11 successful allocations
>> - with this patch I'm able to get 25 successful allocations
>>
>> NOTE: If we can make kswapd aware of order-0 request during
>> compaction, we can enhance kswapd with changing mode to
>> COMPACT_ASYNC_FULL (COMPACT_ASYNC_MOVABLE + COMPACT_ASYNC_UNMOVABLE).
>> Please see the following thread:
>>
>> 	http://marc.info/?l=linux-mm&m=133552069417068&w=2
>>
>>
>> ...
>>
>> --- a/include/linux/compaction.h	2012-06-08 09:01:32.041681656 +0200
>> +++ b/include/linux/compaction.h	2012-06-08 09:01:35.697681651 +0200
>> @@ -1,6 +1,8 @@
>>  #ifndef _LINUX_COMPACTION_H
>>  #define _LINUX_COMPACTION_H
>>  
>> +#include <linux/node.h>
> 
> Why was this addition needed?  (I think I asked this before)
> 
>>  /* Return values for compact_zone() and try_to_compact_pages() */
>>  /* compaction didn't start as it was not possible or direct reclaim was more suitable */
>>  #define COMPACT_SKIPPED		0
>>
>> ...
>>
>> +static bool can_rescue_unmovable_pageblock(struct page *page)
>> +{
>> +	struct zone *zone;
>> +	unsigned long pfn, start_pfn, end_pfn;
>> +	struct page *start_page;
>> +
>> +	zone = page_zone(page);
>> +	pfn = page_to_pfn(page);
>> +	start_pfn = pfn & ~(pageblock_nr_pages - 1);
>> +	start_page = pfn_to_page(start_pfn);
>> +
>> +	/*
>> +	 * Race with page allocator/reclaimer can happen so that it can
>> +	 * deceive unmovable block to migratable type on this pageblock.
>> +	 * It could regress on anti-fragmentation but it's rare and not
>> +	 * critical.
>> +	 */
> 
> This is quite ungramattical and needs a rewrite, please.  Suggest the


My bad.

> use of well-understood terms MIGRATE_UNMOVABLE, MIGRATE_MOVABLE etc
> rather than "unmovable block", etc.


Bartlomiej, Please rewrite.
It seems I have no talent for English. :(

> 
> Please explain "could regress" and also explain why it is "not critical".


Regress: It is possible that MOVABLE_BLOCK has unmovable pages.
NOT critical: We are already in danger of that because allocator fallback
              mechanism can allocate unmovable pages in movable block.

> 
>> +	return __count_immobile_pages(zone, start_page, 0);
>> +}
>> +
>> +static void rescue_unmovable_pageblock(struct page *page)
>> +{
>> +	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
>> +	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
>> +
>> +	count_vm_event(COMPACT_RESCUED_UNMOVABLE_BLOCKS);
>> +}
>> +
>> +/*
>> + * MIGRATE_TARGET : good for migration target
>> + * RESCUE_UNMOVABLE_TARTET : good only if we can rescue the unmovable pageblock.
> 
> s/TARTET/TARGET/
> 
>> + * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
>> + * SKIP_TARGET : can't migrate by another reasons.
>> + */
>> +enum smt_result {
>> +	MIGRATE_TARGET,
>> +	RESCUE_UNMOVABLE_TARGET,
>> +	UNMOVABLE_TARGET,
>> +	SKIP_TARGET,
>> +};
>>  
>>
>> ...
>>
>> @@ -5476,8 +5476,7 @@ void set_pageblock_flags_group(struct pa
>>   * page allocater never alloc memory from ISOLATE block.
>>   */
>>  
>> -static int
>> -__count_immobile_pages(struct zone *zone, struct page *page, int count)
>> +int __count_immobile_pages(struct zone *zone, struct page *page, int count)
> 
> We may as well fix the return type of this while we're in there.  It
> should be bool.
> 

> Also, the comment over __count_immobile_pages() is utter rubbish.  Can

> we please cook up a new one?  Something human-readable which also
> describes the return value.



The function name is rather awkward, too.
I will try that clean up.

> 
>>  {
>>  	unsigned long pfn, iter, found;
>>  	int mt;
>> @@ -5500,6 +5499,11 @@ __count_immobile_pages(struct zone *zone
>>  			continue;
>>  
>>  		page = pfn_to_page(check);
>> +
>> +		/* Do not deal with pageblocks that overlap zones */
>> +		if (page_zone(page) != zone)
>> +			return false;
> 
> I don't really understand this bit.  Wasn't it wrong to walk across
> zones in the original code?  Did you do something which will newly
> cause this to walk between zones?  It doesn't seem to be changelogged,
> and the comment commits the common mistake of explaining "what" but not
> "why".


I saw similar function in isolate_freepages and remember Mel said. 

"
Node-0 Node-1 Node-0
DMA DMA DMA
0-1023 1024-2047 2048-4096

In that case, a PFN scanner can enter a new node and zone but the migrate
and free scanners have not necessarily met. This configuration is *extremely*
rare but it happens on messed-up LPAR configurations on POWER
"
http://lkml.indiana.edu/hypermail/linux/kernel/1002.2/01140.html

> 

> 
>>  		if (!page_count(page)) {
>>  			if (PageBuddy(page))
>>  				iter += (1 << page_order(page)) - 1;
>>
>> ...
>>
> 
> A few tweaks:
> 
> --- a/mm/compaction.c~mm-compaction-handle-incorrect-migrate_unmovable-type-pageblocks-fix
> +++ a/mm/compaction.c
> @@ -394,10 +394,10 @@ static void rescue_unmovable_pageblock(s
>  }
>  
>  /*
> - * MIGRATE_TARGET : good for migration target
> - * RESCUE_UNMOVABLE_TARTET : good only if we can rescue the unmovable pageblock.
> - * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
> - * SKIP_TARGET : can't migrate by another reasons.
> + * MIGRATE_TARGET: good for migration target
> + * RESCUE_UNMOVABLE_TARGET: good only if we can rescue the unmovable pageblock.
> + * UNMOVABLE_TARGET: can't migrate because it's a page in unmovable pageblock.
> + * SKIP_TARGET: can't migrate by another reasons.
>   */
>  enum smt_result {
>  	MIGRATE_TARGET,
> @@ -407,9 +407,9 @@ enum smt_result {
>  };
>  
>  /*
> - * Returns MIGRATE_TARGET if the page is within a block
> - * suitable for migration to, UNMOVABLE_TARGET if the page
> - * is within a MIGRATE_UNMOVABLE block, SKIP_TARGET otherwise.
> + * Returns MIGRATE_TARGET if the page is within a block suitable for migration
> + * to, UNMOVABLE_TARGET if the page is within a MIGRATE_UNMOVABLE block,
> + * SKIP_TARGET otherwise.
>   */
>  static enum smt_result suitable_migration_target(struct page *page,
>  			      struct compact_control *cc)
> --- a/mm/internal.h~mm-compaction-handle-incorrect-migrate_unmovable-type-pageblocks-fix
> +++ a/mm/internal.h
> @@ -97,7 +97,7 @@ extern void putback_lru_page(struct page
>  extern void set_pageblock_migratetype(struct page *page, int migratetype);
>  extern int move_freepages_block(struct zone *zone, struct page *page,
>  				int migratetype);
> -extern int __count_immobile_pages(struct zone *zone, struct page *page,
> +extern bool __count_immobile_pages(struct zone *zone, struct page *page,
>  				  int count);
>  extern void __free_pages_bootmem(struct page *page, unsigned int order);
>  extern void prep_compound_page(struct page *page, unsigned long order);
> --- a/mm/page_alloc.c~mm-compaction-handle-incorrect-migrate_unmovable-type-pageblocks-fix
> +++ a/mm/page_alloc.c
> @@ -221,7 +221,6 @@ int page_group_by_mobility_disabled __re
>  
>  void set_pageblock_migratetype(struct page *page, int migratetype)
>  {
> -
>  	if (unlikely(page_group_by_mobility_disabled))
>  		migratetype = MIGRATE_UNMOVABLE;
>  
> @@ -5472,7 +5471,7 @@ void set_pageblock_flags_group(struct pa
>   * page allocater never alloc memory from ISOLATE block.
>   */
>  
> -int __count_immobile_pages(struct zone *zone, struct page *page, int count)
> +bool __count_immobile_pages(struct zone *zone, struct page *page, int count)
>  {
>  	unsigned long pfn, iter, found;
>  	int mt;
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@...ck.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a>
> 



-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ