Date:	Tue, 11 Sep 2012 09:37:54 +0800
From:	Wen Congyang <wency@...fujitsu.com>
To:	Minchan Kim <minchan@...nel.org>
CC:	Kamezawa Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Michal Nazarewicz <mina86@...a86.com>,
	Mel Gorman <mel@....ul.ie>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
Subject: Re: [RFC v2] memory-hotplug: remove MIGRATE_ISOLATE from free_area->free_list

At 09/11/2012 08:52 AM, Minchan Kim Wrote:
> Hello Wen,
> 
> On Fri, Sep 07, 2012 at 03:28:22PM +0800, Wen Congyang wrote:
>> At 09/06/2012 10:53 AM, Minchan Kim Wrote:
>>> Normally, the MIGRATE_ISOLATE type is used for memory-hotplug.
>>> But it is an ironic type: the isolated pages still sit as free pages
>>> on free_area->free_list[MIGRATE_ISOLATE], so they look like
>>> allocatable pages although they are *never* allocatable.
>>> This skews the NR_FREE_PAGES vmstat, making it inaccurate, so
>>> code paths that depend on that vmstat can reach wrong decisions.
>>>
>>> There has already been a report about this. [1]
>>> [1] 702d1a6e, memory-hotplug: fix kswapd looping forever problem
>>>
>>> Later there was another report of a different problem. [2]
>>> [2] http://www.spinics.net/lists/linux-mm/msg41251.html
>>>
>>> I believe it can cause more problems in the future, too,
>>> so I would like to remove this ironic type with a different design.
>>>
>>> I hope this patch solves the issue, lets us revert [1], and makes [2] unnecessary.
>>>
>>> * Changelog v1
>>>  * Fixes from Michal's many suggestions
>>>
>>> Cc: Michal Nazarewicz <mina86@...a86.com>
>>> Cc: Mel Gorman <mel@....ul.ie>
>>> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
>>> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>
>>> Cc: Wen Congyang <wency@...fujitsu.com>
>>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
>>> Signed-off-by: Minchan Kim <minchan@...nel.org>
>>> ---
>>> This is a very early version that shows the concept, so I have still marked it as RFC.
>>> I just tested it with a simple test and it works.
>>> This patch needs in-depth review from the memory-hotplug people at Fujitsu,
>>> because I saw lots of patches they sent recently about
>>> memory-hotplug changes. Please take a look at this patch.
>>>
>>>  drivers/xen/balloon.c          |    2 +
>>>  include/linux/mmzone.h         |    4 +-
>>>  include/linux/page-isolation.h |   11 ++-
>>>  mm/internal.h                  |    3 +
>>>  mm/memory_hotplug.c            |   38 ++++++----
>>>  mm/page_alloc.c                |   33 ++++----
>>>  mm/page_isolation.c            |  162 +++++++++++++++++++++++++++++++++-------
>>>  mm/vmstat.c                    |    1 -
>>>  8 files changed, 193 insertions(+), 61 deletions(-)
>>>
>>> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
>>> index 31ab82f..df0f5f3 100644
>>> --- a/drivers/xen/balloon.c
>>> +++ b/drivers/xen/balloon.c
>>> @@ -50,6 +50,7 @@
>>>  #include <linux/notifier.h>
>>>  #include <linux/memory.h>
>>>  #include <linux/memory_hotplug.h>
>>> +#include <linux/page-isolation.h>
>>>  
>>>  #include <asm/page.h>
>>>  #include <asm/pgalloc.h>
>>> @@ -268,6 +269,7 @@ static void xen_online_page(struct page *page)
>>>  	else
>>>  		--balloon_stats.balloon_hotplug;
>>>  
>>> +	delete_from_isolated_list(page);
>>>  	mutex_unlock(&balloon_mutex);
>>>  }
>>>  
>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>> index 2daa54f..438bab8 100644
>>> --- a/include/linux/mmzone.h
>>> +++ b/include/linux/mmzone.h
>>> @@ -57,8 +57,8 @@ enum {
>>>  	 */
>>>  	MIGRATE_CMA,
>>>  #endif
>>> -	MIGRATE_ISOLATE,	/* can't allocate from here */
>>> -	MIGRATE_TYPES
>>> +	MIGRATE_TYPES,
>>> +	MIGRATE_ISOLATE
>>>  };
>>>  
>>>  #ifdef CONFIG_CMA
>>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>>> index 105077a..1ae2cd6 100644
>>> --- a/include/linux/page-isolation.h
>>> +++ b/include/linux/page-isolation.h
>>> @@ -1,11 +1,16 @@
>>>  #ifndef __LINUX_PAGEISOLATION_H
>>>  #define __LINUX_PAGEISOLATION_H
>>>  
>>> +extern struct list_head isolated_pages;
>>>  
>>>  bool has_unmovable_pages(struct zone *zone, struct page *page, int count);
>>>  void set_pageblock_migratetype(struct page *page, int migratetype);
>>>  int move_freepages_block(struct zone *zone, struct page *page,
>>>  				int migratetype);
>>> +
>>> +void isolate_free_page(struct page *page, unsigned int order);
>>> +void delete_from_isolated_list(struct page *page);
>>> +
>>>  /*
>>>   * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
>>>   * If specified range includes migrate types other than MOVABLE or CMA,
>>> @@ -20,9 +25,13 @@ start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>>  			 unsigned migratetype);
>>>  
>>>  /*
>>> - * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
>>> + * Changes MIGRATE_ISOLATE to @migratetype.
>>>   * target range is [start_pfn, end_pfn)
>>>   */
>>> +void
>>> +undo_isolate_pageblocks(unsigned long start_pfn, unsigned long end_pfn,
>>> +			unsigned migratetype);
>>> +
>>>  int
>>>  undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>>  			unsigned migratetype);
>>> diff --git a/mm/internal.h b/mm/internal.h
>>> index 3314f79..393197e 100644
>>> --- a/mm/internal.h
>>> +++ b/mm/internal.h
>>> @@ -144,6 +144,9 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>>>   * function for dealing with page's order in buddy system.
>>>   * zone->lock is already acquired when we use these.
>>>   * So, we don't need atomic page->flags operations here.
>>> + *
>>> + * The page order should be stored in page->private because
>>> + * memory-hotplug depends on it. See mm/page_isolation.c.
>>>   */
>>>  static inline unsigned long page_order(struct page *page)
>>>  {
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 3ad25f9..30c36d5 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -410,26 +410,29 @@ void __online_page_set_limits(struct page *page)
>>>  	unsigned long pfn = page_to_pfn(page);
>>>  
>>>  	if (pfn >= num_physpages)
>>> -		num_physpages = pfn + 1;
>>> +		num_physpages = pfn + (1 << page_order(page));
>>>  }
>>>  EXPORT_SYMBOL_GPL(__online_page_set_limits);
>>>  
>>>  void __online_page_increment_counters(struct page *page)
>>>  {
>>> -	totalram_pages++;
>>> +	totalram_pages += (1 << page_order(page));
>>>  
>>>  #ifdef CONFIG_HIGHMEM
>>>  	if (PageHighMem(page))
>>> -		totalhigh_pages++;
>>> +		totalhigh_pages += (1 << page_order(page));
>>>  #endif
>>>  }
>>>  EXPORT_SYMBOL_GPL(__online_page_increment_counters);
>>>  
>>>  void __online_page_free(struct page *page)
>>>  {
>>> -	ClearPageReserved(page);
>>> -	init_page_count(page);
>>> -	__free_page(page);
>>> +	int i;
>>> +	unsigned long order = page_order(page);
>>> +	for (i = 0; i < (1 << order); i++)
>>> +		ClearPageReserved(page + i);
>>> +	set_page_private(page, 0);
>>> +	__free_pages(page, order);
>>>  }
>>>  EXPORT_SYMBOL_GPL(__online_page_free);
>>>  
>>> @@ -437,26 +440,29 @@ static void generic_online_page(struct page *page)
>>>  {
>>>  	__online_page_set_limits(page);
>>>  	__online_page_increment_counters(page);
>>> +	delete_from_isolated_list(page);
>>>  	__online_page_free(page);
>>>  }
>>>  
>>>  static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>>>  			void *arg)
>>>  {
>>> -	unsigned long i;
>>> +	unsigned long pfn;
>>> +	unsigned long end_pfn = start_pfn + nr_pages;
>>>  	unsigned long onlined_pages = *(unsigned long *)arg;
>>> -	struct page *page;
>>> -	if (PageReserved(pfn_to_page(start_pfn)))
>>> -		for (i = 0; i < nr_pages; i++) {
>>> -			page = pfn_to_page(start_pfn + i);
>>> -			(*online_page_callback)(page);
>>> -			onlined_pages++;
>>> +	struct page *cursor, *tmp;
>>> +	list_for_each_entry_safe(cursor, tmp, &isolated_pages, lru) {
>>> +		pfn = page_to_pfn(cursor);
>>> +		if (pfn >= start_pfn && pfn < end_pfn) {
>>> +			(*online_page_callback)(cursor);
>>> +			onlined_pages += (1 << page_order(cursor));
>>>  		}
>>> +	}
>>> +
>>
>> If the memory is hotplugged, the pages are not in isolated_pages, and they
>> can't be onlined.
> 
> Hmm, I can't parse your point.
> Could you elaborate a bit?

The driver for hotpluggable memory devices is acpi-memhotplug. When a memory
device is hotplugged, add_memory() is called. The memory's pages are added
to the kernel in the function __add_pages():

__add_pages()
    __add_section()
        __add_zone()
            memmap_init_zone() // pages are initialized here

These pages are never added to isolated_pages, so we can't online
them, because you only online pages that are on that list.
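
For illustration only, here is a minimal userspace model (hypothetical
names, not actual kernel code) of that gap, assuming the patch's
online_pages_range() walks only isolated_pages: a hot-added range that
never went through isolation is silently skipped.

#include <stdbool.h>
#include <stdio.h>

#define NR_PAGES 16UL

/* Stands in for membership of the patch's isolated_pages list. */
static bool isolated[NR_PAGES];

/* Models the patch's online_pages_range(): only isolated pages are onlined. */
static unsigned long online_pages_range(unsigned long start_pfn,
					unsigned long end_pfn)
{
	unsigned long pfn, onlined_pages = 0;

	for (pfn = start_pfn; pfn < end_pfn; pfn++)
		if (isolated[pfn])	/* hot-added pfns were never marked */
			onlined_pages++;
	return onlined_pages;
}

int main(void)
{
	unsigned long pfn;

	/* pfns 0..7 were offlined (isolated); pfns 8..15 were hot-added. */
	for (pfn = 0; pfn < 8; pfn++)
		isolated[pfn] = true;

	printf("offlined range: onlined %lu of 8\n", online_pages_range(0, 8));
	printf("hot-added range: onlined %lu of 8\n", online_pages_range(8, 16));
	return 0;
}

The second printf reports 0, which is exactly the missed-online case
described above.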

Thanks
Wen Congyang

> 
>>
>>>  	*(unsigned long *)arg = onlined_pages;
>>>  	return 0;
>>>  }
>>>  
>>> -
>>>  int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>>>  {
>>>  	unsigned long onlined_pages = 0;
>>> @@ -954,11 +960,11 @@ repeat:
>>>  		goto failed_removal;
>>>  	}
>>>  	printk(KERN_INFO "Offlined Pages %ld\n", offlined_pages);
>>> -	/* Ok, all of our target is islaoted.
>>> +	/* Ok, all of our target is isolated.
>>>  	   We cannot do rollback at this point. */
>>>  	offline_isolated_pages(start_pfn, end_pfn);
>>>  	/* reset pagetype flags and makes migrate type to be MOVABLE */
>>> -	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
>>> +	undo_isolate_pageblocks(start_pfn, end_pfn, MIGRATE_MOVABLE);
>>>  	/* removal success */
>>>  	zone->present_pages -= offlined_pages;
>>>  	zone->zone_pgdat->node_present_pages -= offlined_pages;
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index ba3100a..3e516c5 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -721,6 +721,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
>>>  {
>>>  	unsigned long flags;
>>>  	int wasMlocked = __TestClearPageMlocked(page);
>>> +	int migratetype;
>>>  
>>>  	if (!free_pages_prepare(page, order))
>>>  		return;
>>> @@ -729,8 +730,14 @@ static void __free_pages_ok(struct page *page, unsigned int order)
>>>  	if (unlikely(wasMlocked))
>>>  		free_page_mlock(page);
>>>  	__count_vm_events(PGFREE, 1 << order);
>>> -	free_one_page(page_zone(page), page, order,
>>> -					get_pageblock_migratetype(page));
>>> +
>>> +	migratetype = get_pageblock_migratetype(page);
>>> +	if (likely(migratetype != MIGRATE_ISOLATE))
>>> +		free_one_page(page_zone(page), page, order,
>>> +				migratetype);
>>> +	else
>>> +		isolate_free_page(page, order);
>>> +
>>>  	local_irq_restore(flags);
>>>  }
>>>  
>>> @@ -906,7 +913,6 @@ static int fallbacks[MIGRATE_TYPES][4] = {
>>>  	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,   MIGRATE_RESERVE },
>>>  #endif
>>>  	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
>>> -	[MIGRATE_ISOLATE]     = { MIGRATE_RESERVE }, /* Never used */
>>>  };
>>>  
>>>  /*
>>> @@ -948,8 +954,13 @@ static int move_freepages(struct zone *zone,
>>>  		}
>>>  
>>>  		order = page_order(page);
>>> -		list_move(&page->lru,
>>> -			  &zone->free_area[order].free_list[migratetype]);
>>> +		if (migratetype != MIGRATE_ISOLATE) {
>>> +			list_move(&page->lru,
>>> +				&zone->free_area[order].free_list[migratetype]);
>>> +		} else {
>>> +			list_del(&page->lru);
>>> +			isolate_free_page(page, order);
>>> +		}
>>>  		page += 1 << order;
>>>  		pages_moved += 1 << order;
>>>  	}
>>> @@ -1316,7 +1327,7 @@ void free_hot_cold_page(struct page *page, int cold)
>>>  	 */
>>>  	if (migratetype >= MIGRATE_PCPTYPES) {
>>>  		if (unlikely(migratetype == MIGRATE_ISOLATE)) {
>>> -			free_one_page(zone, page, 0, migratetype);
>>> +			isolate_free_page(page, 0);
>>>  			goto out;
>>>  		}
>>>  		migratetype = MIGRATE_MOVABLE;
>>> @@ -5908,7 +5919,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>>>  	struct zone *zone;
>>>  	int order, i;
>>>  	unsigned long pfn;
>>> -	unsigned long flags;
>>>  	/* find the first valid pfn */
>>>  	for (pfn = start_pfn; pfn < end_pfn; pfn++)
>>>  		if (pfn_valid(pfn))
>>> @@ -5916,7 +5926,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>>>  	if (pfn == end_pfn)
>>>  		return;
>>>  	zone = page_zone(pfn_to_page(pfn));
>>> -	spin_lock_irqsave(&zone->lock, flags);
>>>  	pfn = start_pfn;
>>>  	while (pfn < end_pfn) {
>>>  		if (!pfn_valid(pfn)) {
>>> @@ -5924,23 +5933,15 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>>>  			continue;
>>>  		}
>>>  		page = pfn_to_page(pfn);
>>> -		BUG_ON(page_count(page));
>>> -		BUG_ON(!PageBuddy(page));
>>>  		order = page_order(page);
>>>  #ifdef CONFIG_DEBUG_VM
>>>  		printk(KERN_INFO "remove from free list %lx %d %lx\n",
>>>  		       pfn, 1 << order, end_pfn);
>>>  #endif
>>> -		list_del(&page->lru);
>>> -		rmv_page_order(page);
>>> -		zone->free_area[order].nr_free--;
>>> -		__mod_zone_page_state(zone, NR_FREE_PAGES,
>>> -				      - (1UL << order));
>>>  		for (i = 0; i < (1 << order); i++)
>>>  			SetPageReserved((page+i));
>>>  		pfn += (1 << order);
>>>  	}
>>> -	spin_unlock_irqrestore(&zone->lock, flags);
>>>  }
>>>  #endif
>>>  
>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>>> index 247d1f1..27cf59e 100644
>>> --- a/mm/page_isolation.c
>>> +++ b/mm/page_isolation.c
>>> @@ -8,6 +8,90 @@
>>>  #include <linux/memory.h>
>>>  #include "internal.h"
>>>  
>>> +LIST_HEAD(isolated_pages);
>>> +static DEFINE_SPINLOCK(lock);
>>> +
>>> +/*
>>> + * Add the page to isolated_pages, which is kept sorted in ascending pfn order.
>>> + */
>>> +static void __add_isolated_page(struct page *page)
>>> +{
>>> +	struct page *cursor;
>>> +	unsigned long pfn;
>>> +	unsigned long new_pfn = page_to_pfn(page);
>>> +
>>> +	list_for_each_entry_reverse(cursor, &isolated_pages, lru) {
>>> +		pfn = page_to_pfn(cursor);
>>> +		if (pfn < new_pfn)
>>> +			break;
>>> +	}
>>> +
>>> +	list_add(&page->lru, &cursor->lru);
>>> +}
>>> +
>>> +/*
>>> + * Isolate a free page. It is used by memory-hotplug to steal a
>>> + * free page from the free_area or from the allocator's freeing path.
>>> + */
>>> +void isolate_free_page(struct page *page, unsigned int order)
>>> +{
>>> +	unsigned long flags;
>>> +
>>> +	/*
>>> +	 * We increase the refcount for later freeing when online_pages
>>> +	 * happens, and record the order in @page->private so that
>>> +	 * online_pages knows what order to free the page with.
>>> +	 */
>>> +	set_page_refcounted(page);
>>> +	set_page_private(page, order);
>>> +
>>> +	/* move_freepages already holds zone->lock */
>>> +	if (PageBuddy(page))
>>> +		__ClearPageBuddy(page);
>>> +
>>> +	spin_lock_irqsave(&lock, flags);
>>> +	__add_isolated_page(page);
>>> +	spin_unlock_irqrestore(&lock, flags);
>>> +}
>>> +
>>> +void delete_from_isolated_list(struct page *page)
>>> +{
>>> +	unsigned long flags;
>>> +
>>> +	spin_lock_irqsave(&lock, flags);
>>> +	list_del(&page->lru);
>>> +	spin_unlock_irqrestore(&lock, flags);
>>> +}
>>> +
>>> +/* free the pages in the pageblock that includes @page */
>>> +static void free_isolated_pageblock(struct page *page)
>>> +{
>>> +	struct page *cursor, *tmp;
>>> +	unsigned long start_pfn, end_pfn, pfn;
>>> +	unsigned long flags;
>>> +	LIST_HEAD(pages);
>>> +
>>> +	start_pfn = page_to_pfn(page);
>>> +	start_pfn = start_pfn & ~(pageblock_nr_pages-1);
>>> +	end_pfn = start_pfn + pageblock_nr_pages;
>>> +
>>> +	spin_lock_irqsave(&lock, flags);
>>> +	list_for_each_entry_safe(cursor, tmp, &isolated_pages, lru) {
>>> +		pfn = page_to_pfn(cursor);
>>> +		if (pfn >= end_pfn)
>>> +			break;
>>> +		if (pfn >= start_pfn)
>>> +			list_move(&cursor->lru, &pages);
>>> +	}
>>> +	spin_unlock_irqrestore(&lock, flags);
>>> +
>>> +	list_for_each_entry_safe(cursor, tmp, &pages, lru) {
>>> +		int order = page_order(cursor);
>>> +		list_del(&cursor->lru);
>>> +		__free_pages(cursor, order);
>>> +	}
>>> +}
>>> +
>>>  /* called while holding zone->lock */
>>>  static void set_pageblock_isolate(struct page *page)
>>>  {
>>> @@ -91,13 +175,12 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>>>  	struct zone *zone;
>>>  	unsigned long flags;
>>>  	zone = page_zone(page);
>>> +
>>>  	spin_lock_irqsave(&zone->lock, flags);
>>> -	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
>>> -		goto out;
>>> -	move_freepages_block(zone, page, migratetype);
>>> -	restore_pageblock_isolate(page, migratetype);
>>> -out:
>>> +	if (get_pageblock_migratetype(page) == MIGRATE_ISOLATE)
>>> +		restore_pageblock_isolate(page, migratetype);
>>>  	spin_unlock_irqrestore(&zone->lock, flags);
>>> +	free_isolated_pageblock(page);
>>>  }
>>>  
>>>  static inline struct page *
>>> @@ -155,6 +238,30 @@ undo:
>>>  	return -EBUSY;
>>>  }
>>>  
>>> +void undo_isolate_pageblocks(unsigned long start_pfn, unsigned long end_pfn,
>>> +		unsigned migratetype)
>>> +{
>>> +	unsigned long pfn;
>>> +	struct page *page;
>>> +	struct zone *zone;
>>> +	unsigned long flags;
>>> +
>>> +	BUG_ON(start_pfn & (pageblock_nr_pages - 1));
>>> +	BUG_ON(end_pfn & (pageblock_nr_pages - 1));
>>> +
>>> +	for (pfn = start_pfn;
>>> +			pfn < end_pfn;
>>> +			pfn += pageblock_nr_pages) {
>>> +		page = __first_valid_page(pfn, pageblock_nr_pages);
>>> +		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
>>> +			continue;
>>> +		zone = page_zone(page);
>>> +		spin_lock_irqsave(&zone->lock, flags);
>>> +		restore_pageblock_isolate(page, migratetype);
>>> +		spin_unlock_irqrestore(&zone->lock, flags);
>>> +	}
>>> +}
>>> +
>>>  /*
>>>   * Make isolated pages available again.
>>>   */
>>> @@ -180,30 +287,35 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>>   * all pages in [start_pfn...end_pfn) must be in the same zone.
>>>   * zone->lock must be held before call this.
>>>   *
>>> - * Returns 1 if all pages in the range are isolated.
>>> + * Returns true if all pages in the range are isolated.
>>>   */
>>> -static int
>>> -__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
>>> +static bool
>>> +__test_page_isolated_in_pageblock(unsigned long start_pfn, unsigned long end_pfn)
>>
>> This function fails in my test, so the pages can't be offlined. I will
>> investigate it if I have time.
>>
>> Thanks
>> Wen Congyang
> 
> Thanks for the testing, Wen.
> I also want to take a look, but not right now due to another urgent task.
> I will revisit this issue shortly. Thanks!

