[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4FCDA1B4.9050301@kernel.org>
Date: Tue, 05 Jun 2012 15:05:40 +0900
From: Minchan Kim <minchan@...nel.org>
To: KOSAKI Motohiro <kosaki.motohiro@...il.com>
CC: undisclosed-recipients:;
Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Hugh Dickins <hughd@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Kyungmin Park <kyungmin.park@...sung.com>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
Dave Jones <davej@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Cong Wang <amwang@...hat.com>,
Markus Trippelsdorf <markus@...ppelsdorf.de>
Subject: Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE
type pageblocks
On 06/05/2012 01:35 PM, KOSAKI Motohiro wrote:
>>>> Minchan, are you interest this patch? If yes, can you please rewrite
>>>> it?
>>>
>>> Can do it but I want to give credit to Bartlomiej.
>>> Bartlomiej, if you like my patch, could you resend it as formal patch
>>> after you do broad testing?
>>>
>>> Frankly speaking, I don't want to merge it without any data which
>>> prove it's really good for real practice.
>>>
>>> When the patch firstly was submitted, it wasn't complicated so I was
>>> okay at that time but it has been complicated
>>> than my expectation. So if Andrew might pass the decision to me, I'm
>>> totally NACK if author doesn't provide
>>> any real data or VOC of some client.
>
> I agree. And you don't need to bother this patch if you are not interest
> this one. I'm sorry.
Never mind.
> Let's throw it away until the author send us data.
>
I guess it's hard to make such workload to prove it's useful normally.
But we can't make sure there isn't such workload in the world.
So I hope listen VOC. At least, Mel might require it.
If anyone doesn't support it, I hope let's add some vmstat like stuff for proving
this patch's effect. If we can't see the benefit through vmstat, we can deprecate
it later.
>>> 1) Any comment?
>>>
>>> Anyway, I fixed some bugs and clean up something I found during review.
>>>
>>> Minor thing.
>>> 1. change smt_result naming - I never like such long non-consistent
>>> naming. How about this?
>>> 2. fix can_rescue_unmovable_pageblock
>>> 2.1 pfn valid check for page_zone
>>>
>>> Major thing.
>>>
>>> 2.2 add lru_lock for stablizing PageLRU
>>> If we don't hold lru_lock, there is possibility that
>>> unmovable(non-LRU) page can put in movable pageblock.
>>> It can make compaction/CMA's regression. But there is a
>>> concern about deadlock between lru_lock and lock.
>>> As I look the code, I can't find allocation trial with
>>> holding lru_lock so it might be safe(but not sure,
>>> I didn't test it. It need more careful review/testing) but it
>>> makes new locking dependency(not sure, too.
>>> We already made such rule but I didn't know that until now
>>> ;-) ) Why I thought so is we can allocate
>>> GFP_ATOMIC with holding lru_lock, logically which might be
>>> crazy idea.
>>>
>>> 2.3 remove zone->lock in first phase.
>>> We do rescue unmovable pageblock by 2-phase. In first-phase,
>>> we just peek pages so we don't need locking.
>>> If we see non-stablizing value, it would be caught by 2-phase
>>> with needed lock or
>>> can_rescue_unmovable_pageblock can return out of loop by
>>> stale page_order(cursor_page).
>>> It couldn't make unmovable pageblock to movable but we can do
>>> it next time, again.
>>> It's not critical.
>>>
>>> 2) Any comment?
>>>
>>> Now I can't inline the code so sorry but attach patch.
>>> It's not a formal patch/never tested.
>>>
>>
>>
>> Attached patch has a BUG in can_rescue_unmovable_pageblock.
>> Resend. I hope it is fixed.
>>
>>
>>
>>
>>
>> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
>> index 51a90b7..e988037 100644
>> --- a/include/linux/compaction.h
>> +++ b/include/linux/compaction.h
>> @@ -1,6 +1,8 @@
>> #ifndef _LINUX_COMPACTION_H
>> #define _LINUX_COMPACTION_H
>>
>> +#include <linux/node.h>
>> +
>> /* Return values for compact_zone() and try_to_compact_pages() */
>> /* compaction didn't start as it was not possible or direct reclaim
>> was more suitable */
>> #define COMPACT_SKIPPED 0
>> @@ -11,6 +13,23 @@
>> /* The full zone was compacted */
>> #define COMPACT_COMPLETE 3
>>
>> +/*
>> + * compaction supports three modes
>> + *
>> + * COMPACT_ASYNC_MOVABLE uses asynchronous migration and only scans
>> + * MIGRATE_MOVABLE pageblocks as migration sources and targets.
>> + * COMPACT_ASYNC_UNMOVABLE uses asynchronous migration and only scans
>> + * MIGRATE_MOVABLE pageblocks as migration sources.
>> + * MIGRATE_UNMOVABLE pageblocks are scanned as potential migration
>> + * targets and convers them to MIGRATE_MOVABLE if possible
>> + * COMPACT_SYNC uses synchronous migration and scans all pageblocks
>> + */
>> +enum compact_mode {
>> + COMPACT_ASYNC_MOVABLE,
>> + COMPACT_ASYNC_UNMOVABLE,
>> + COMPACT_SYNC,
>> +};
>> +
>> #ifdef CONFIG_COMPACTION
>> extern int sysctl_compact_memory;
>> extern int sysctl_compaction_handler(struct ctl_table *table, int write,
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 7ea259d..dd02f25 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -236,7 +236,7 @@ isolate_migratepages_range(struct zone *zone,
>> struct compact_control *cc,
>> */
>> while (unlikely(too_many_isolated(zone))) {
>> /* async migration should just abort */
>> - if (!cc->sync)
>> + if (cc->mode != COMPACT_SYNC)
>> return 0;
>>
>> congestion_wait(BLK_RW_ASYNC, HZ/10);
>> @@ -304,7 +304,8 @@ isolate_migratepages_range(struct zone *zone,
>> struct compact_control *cc,
>> * satisfies the allocation
>> */
>> pageblock_nr = low_pfn >> pageblock_order;
>> - if (!cc->sync && last_pageblock_nr != pageblock_nr &&
>> + if (cc->mode != COMPACT_SYNC &&
>> + last_pageblock_nr != pageblock_nr &&
>> !migrate_async_suitable(get_pageblock_migratetype(page))) {
>> low_pfn += pageblock_nr_pages;
>> low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
>> @@ -325,7 +326,7 @@ isolate_migratepages_range(struct zone *zone,
>> struct compact_control *cc,
>> continue;
>> }
>>
>> - if (!cc->sync)
>> + if (cc->mode != COMPACT_SYNC)
>> mode |= ISOLATE_ASYNC_MIGRATE;
>>
>> lruvec = mem_cgroup_page_lruvec(page, zone);
>> @@ -360,27 +361,121 @@ isolate_migratepages_range(struct zone *zone,
>> struct compact_control *cc,
>>
>> #endif /* CONFIG_COMPACTION || CONFIG_CMA */
>> #ifdef CONFIG_COMPACTION
>> +/*
>> + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
>> + * converted to MIGRATE_MOVABLE type, false otherwise.
>> + */
>> +static bool can_rescue_unmovable_pageblock(struct page *page, bool
>> need_lrulock)
>> +{
>> + struct zone *zone;
>> + unsigned long pfn, start_pfn, end_pfn;
>> + struct page *start_page, *end_page, *cursor_page;
>> + bool lru_locked = false;
>> +
>> + zone = page_zone(page);
>> + pfn = page_to_pfn(page);
>> + start_pfn = pfn & ~(pageblock_nr_pages - 1);
>> + end_pfn = start_pfn + pageblock_nr_pages - 1;
>> +
>> + start_page = pfn_to_page(start_pfn);
>> + end_page = pfn_to_page(end_pfn);
>> +
>> + for (cursor_page = start_page, pfn = start_pfn; cursor_page <=
>> end_page;
>> + pfn++, cursor_page++) {
>>
>> -/* Returns true if the page is within a block suitable for migration
>> to */
>> -static bool suitable_migration_target(struct page *page)
>> + if (!pfn_valid_within(pfn))
>> + continue;
>> +
>> + /* Do not deal with pageblocks that overlap zones */
>> + if (page_zone(cursor_page) != zone)
>> + goto out;
>> +
>> + if (PageBuddy(cursor_page)) {
>> + unsigned long order = page_order(cursor_page);
>> +
>> + pfn += (1 << order) - 1;
>> + cursor_page += (1 << order) - 1;
>> + continue;
>> + } else if (page_count(cursor_page) == 0) {
>> + continue;
>
> Can we assume freed tail page always have page_count()==0? if yes, why
> do we
> need dangerous PageBuddy(cursor_page) check? ok, but this may be harmless.
>
page_count check is for pcp pages.
Am I missing your point?
> But if no, this code is seriously dangerous. think following scenario,
>
> 1) cursor page points free page
>
> +----------------+------------------+
> | free (order-1) | used (order-1) |
> +----------------+------------------+
> |
> cursor
>
> 2) moved cursor
>
> +----------------+------------------+
> | free (order-1) | used (order-1) |
> +----------------+------------------+
> |
> cursor
>
> 3) neighbor block was freed
>
>
> +----------------+------------------+
> | free (order-2) |
> +----------------+------------------+
> |
> cursor
>
> now, cursor points to middle of free block.
>
>
> Anyway, I recommend to avoid dangerous no zone->lock game and change
> can_rescue_unmovable_pageblock() is only called w/ zone->lock. I have
I can't understand your point.
If the page is middle of free block, what's the problem in can_rescue_unmovable_pageblock
at first trial of can_rescue_xxx?
I think we can stabilize it in second trial of can_rescue_unmovable_pageblock with zone->lock.
> no seen any worth to include this high complex for mere minor optimization.
>
>
>> + } else if (PageLRU(cursor_page)) {
>> + if (!need_lrulock)
>> + continue;
>> + else if (lru_locked)
>> + continue;
>> + else {
>> + spin_lock(&zone->lru_lock);
>
> Hmm...
> I don't like to take lru_lock. 1) Until now, we carefully avoid to take
> both zone->lock and zone->lru_lock. they are both performance critical
> lock. And I think pageblock migratetype don't need strictly correct. It
> is only optimization of anti fragmentation. Why do we need take it?
>
movable_block has unmovable page can make regression of anti-fragmentation.
So I did it. I agree it's a sort of optimization.
If others don't want it at the cost of regression anti-fragmentation, we can remove the lock.
>
>
>> + lru_locked = true;
>> + if (PageLRU(page))
>> + continue;
>> + }
>> + }
>> +
>> + goto out;
>> + }
>> +
>
> Why don't we need to release lru_lock when returning true.
Because my brain has gone. :(
--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists