linux-kernel - Re: [PATCH 2/6] mm: add get_pageblock_migratetype_nolock() for cases where locking is undesirable

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 03 Mar 2014 14:54:09 +0100
From:	Vlastimil Babka <vbabka@...e.cz>
To:	Joonsoo Kim <iamjoonsoo.kim@....com>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mgorman@...e.de>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Minchan Kim <minchan@...nel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Subject: Re: [PATCH 2/6] mm: add get_pageblock_migratetype_nolock() for cases
 where locking is undesirable

On 03/03/2014 09:22 AM, Joonsoo Kim wrote:
> On Fri, Feb 28, 2014 at 03:15:00PM +0100, Vlastimil Babka wrote:
>> In order to prevent race with set_pageblock_migratetype, most of calls to
>> get_pageblock_migratetype have been moved under zone->lock. For the remaining
>> call sites, the extra locking is undesirable, notably in free_hot_cold_page().
>>
>> This patch introduces a _nolock version to be used on these call sites, where
>> a wrong value does not affect correctness. The function makes sure that the
>> value does not exceed valid migratetype numbers. Such too-high values are
>> assumed to be a result of race and caller-supplied fallback value is returned
>> instead.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
>> ---
>>   include/linux/mmzone.h | 24 ++++++++++++++++++++++++
>>   mm/compaction.c        | 14 +++++++++++---
>>   mm/memory-failure.c    |  3 ++-
>>   mm/page_alloc.c        | 22 +++++++++++++++++-----
>>   mm/vmstat.c            |  2 +-
>>   5 files changed, 55 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index fac5509..7c3f678 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -75,6 +75,30 @@ enum {
>>
>>   extern int page_group_by_mobility_disabled;
>>
>> +/*
>> + * When called without zone->lock held, a race with set_pageblock_migratetype
>> + * may result in bogus values. Use this variant only when this does not affect
>> + * correctness, and taking zone->lock would be costly. Values >= MIGRATE_TYPES
>> + * are considered to be a result of this race and the value of race_fallback
>> + * argument is returned instead.
>> + */
>> +static inline int get_pageblock_migratetype_nolock(struct page *page,
>> +	int race_fallback)
>> +{
>> +	int ret = get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
>> +
>> +	if (unlikely(ret >= MIGRATE_TYPES))
>> +		ret = race_fallback;
>> +
>> +	return ret;
>> +}
>
> Hello, Vlastimil.
>
> First of all, thanks for nice work!
> I have another opinion about this implementation. It can be wrong, so if it
> is wrong, please let me know.

Thanks, all opinions/reviewing is welcome :)

> Although this implementation would close the race which triggers NULL dereference,
> I think that this isn't enough if you have a plan to add more
> {start,undo}_isolate_page_range().
>
> Consider that there are lots of {start,undo}_isolate_page_range() calls
> on the system without CMA.
>
> bit representation of migratetype is like as following.
>
> MIGRATE_MOVABLE = 010
> MIGRATE_ISOLATE = 100
>
> We could read following values as migratetype of the page on movable pageblock
> if race occurs.
>
> start_isolate_page_range() case: 010 -> 100
> 010, 000, 100
>
> undo_isolate_page_range() case: 100 -> 010
> 100, 110, 010
>
> Above implementation prevents us from getting 110, but, it can't prevent us from
> getting 000, that is, MIGRATE_UNMOVABLE. If this race occurs in free_hot_cold_page(),
> this page would go into unmovable pcp and then allocated for that migratetype.
> It results in more fragmented memory.

Yes, that can happen. But I would expect it to be negligible to other 
causes of fragmentation. But I'm not at this moment sure how often 
{start,undo}_isolate_page_range() would be called in the end. Certainly
not as often as in the development patch which is just to see if that 
can improve anything. Because it will have its own overhead (mostly for 
zone->lock) that might be too large. But good point, I will try to 
quantify this.

>
> Consider another case that system enables CONFIG_CMA,
>
> MIGRATE_MOVABLE = 010
> MIGRATE_ISOLATE = 101
>
> start_isolate_page_range() case: 010 -> 101
> 010, 011, 001, 101
>
> undo_isolate_page_range() case: 101 -> 010
> 101, 100, 110, 010
>
> This can results in totally different values and this also makes the problem
> mentioned above. And, although this doesn't cause any problem on CMA for now,
> if another migratetype is introduced or some migratetype is removed, it can cause
> CMA typed page to go into other migratetype and makes CMA permanently failed.

This should actually be no problem for free_hot_cold_page() as any 
migratetype >= MIGRATE_PCPTYPES will defer to free_one_page() which will 
reread migratetype under zone->lock. So as long as MIGRATE_PCPTYPES does 
not include a migratetype with such dangerous "permanently failed" 
properties, it should be good. And I doubt such migratetype would be 
added to pcptypes. But of course, anyone adding new migratetype would 
have to reconsider each get_pageblock_migratetype_nolock() call for such 
potential problems.

> To close this kind of races without dependency how many pageblock isolation occurs,
> I recommend that you use separate pageblock bits for MIGRATE_CMA, MIGRATE_ISOLATE
> and use accessor function whenver we need to check migratetype. IMHO, it may not
> impose much overhead.

That could work in case the fragmentation is confirmed to be a problem.

Thanks,
Vlastimil

> How about it?
>
> Thanks.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@...ck.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/