Message-ID: <53E37953.4000506@suse.cz>
Date: Thu, 07 Aug 2014 15:04:19 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Joonsoo Kim <js1304@...il.com>
CC: Joonsoo Kim <iamjoonsoo.kim@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
Johannes Weiner <hannes@...xchg.org>,
Minchan Kim <minchan@...nel.org>,
Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>,
Zhang Yanfei <zhangyanfei@...fujitsu.com>,
"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>,
Tang Chen <tangchen@...fujitsu.com>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
Wen Congyang <wency@...fujitsu.com>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Michal Nazarewicz <mina86@...a86.com>,
Laura Abbott <lauraa@...eaurora.org>,
Heesub Shin <heesub.shin@...sung.com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
Ritesh Harjani <ritesh.list@...il.com>,
t.stanislaws@...sung.com, Gioh Kim <gioh.kim@....com>,
Linux Memory Management List <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic
to fix freepage counting bugs
On 08/07/2014 02:26 PM, Joonsoo Kim wrote:
> 2014-08-07 17:53 GMT+09:00 Vlastimil Babka <vbabka@...e.cz>:
>> Ah, right. I thought that everything going to pcp lists would be through
>> freeing which would already observe the isolate migratetype and skip
>> pcplist. I forgot about the direct filling of pcplists from buddy list.
>> You're right that we don't want extra hooks there.
>>
>> Still, couldn't this be solved in a simpler way via another pcplist drain
>> after the pages are moved from normal to isolate buddy list? Should be even
>> faster because instead of disable - drain - enable (5 all-cpu kicks, since
>> each pageset_update does 2 kicks) you have drain - drain (2 kicks). While
>> it's true that pageset_update is single-zone operation, I guess we would
>> easily benefit from having a single-zone drain operation as well.
>
> I hope so too, but it's not possible. Consider the following situation.
>
> Page A: on the pcplist of CPU 2, and in an isolated pageblock.
>
> CPU 1                      CPU 2
> drain pcplist
> wait IPI finished          move A to normal buddy list
>                            finish IPI
>                            A is moved to pcplist by allocation request
>
> move doesn't catch A,
> because it is on pcplist.
>
> drain pcplist
> wait IPI finished          move A to normal buddy list
>                            finish IPI
>                            A is moved to pcplist by allocation request
>
> repeat!!
>
> It could repeat indefinitely, though the probability is low.
Hm, I see. Not a correctness issue, but still a failure to isolate.
Probably not impossible with enough CPUs, given that after the pcplists
are drained, the next allocation request will try to refill them. And
during the drain, the pages are added to the head of the free_list
AFAICS, so they will be in the first refill batch.
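For reference, the lines I mean, paraphrased from mm/page_alloc.c
(common case only, so take it with a grain of salt):

	/* free path, __free_one_page(): the drained page goes to
	 * the head of the buddy free_list */
	list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);

	/* alloc path, __rmqueue_smallest(): the pcplist refill also
	 * takes from the head, so a just-drained page comes back first */
	page = list_entry(area->free_list[migratetype].next,
				struct page, lru);
	list_del(&page->lru);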
OK, another attempt at an alternative solution proposal :) It's not
that I think disabling pcp would be so bad, I just want to be sure
there is no better alternative.
What if the drain operation had a flag telling it to recheck the
pageblock migratetype and not assume the page is on the correct
pcplist? Then the problem would go away, I think? Could this be done
without affecting the normal drain-pcplist-when-full path, so that the
cost is paid only during isolation, and is lower than the cost of
disabling pcplists?
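Something like this inside the free_pcppages_bulk() loop, purely to
illustrate the idea; check_pageblock would be a new parameter that
only the isolation drain sets:

	mt = get_freepage_migratetype(page);
	/* hypothetical flag: don't trust the cached migratetype,
	 * look at the pageblock itself (i.e. notice isolation) */
	if (check_pageblock)
		mt = get_pageblock_migratetype(page);
	__free_one_page(page, page_to_pfn(page), zone, 0, mt);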
Actually, I see that free_pcppages_bulk() doesn't consider the
migratetype of the pcplist it frees from, but uses
get_freepage_migratetype(page). So the pcplist drain could first scan
the pcplists and rewrite each page's freepage migratetype according to
its pageblock migratetype. Then free_pcppages_bulk() would be
unchanged for normal operation.
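Roughly like this, totally untested and with a made-up name, just to
show what I mean; the calling context is assumed to be the same as
drain_zone_pages() (this CPU's pcp, IRQs disabled):

static void drain_zone_pages_fix_migratetype(struct zone *zone,
					     struct per_cpu_pages *pcp)
{
	struct page *page;
	int mt, count = pcp->count;

	/* Refresh the cached freepage migratetype from the pageblock,
	 * so pages sitting on a pcplist while their pageblock was
	 * isolated will be freed onto the isolate free_list. */
	for (mt = 0; mt < MIGRATE_PCPTYPES; mt++)
		list_for_each_entry(page, &pcp->lists[mt], lru)
			set_freepage_migratetype(page,
					get_pageblock_migratetype(page));

	/* The bulk free itself is unchanged; it already frees each
	 * page according to get_freepage_migratetype(). */
	free_pcppages_bulk(zone, count, pcp);
	pcp->count = 0;
}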
Or is this too clumsy? We could also be smart and have an alternative
to free_pcppages_bulk() that omits the round-robin batching (not
needed for this kind of drain) and takes a pfn range to limit its
operation to the pages we are isolating.
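E.g. (again just a sketch, name made up; assumes the caller runs it on
the right CPU with IRQs off, like free_pcppages_bulk() does, and I'm
ignoring the freepage accounting here since fixing that is the whole
point of this series):

static void drain_pages_range(struct zone *zone, struct per_cpu_pages *pcp,
			      unsigned long start_pfn, unsigned long end_pfn)
{
	struct page *page, *next;
	int mt;

	spin_lock(&zone->lock);
	for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
		list_for_each_entry_safe(page, next, &pcp->lists[mt], lru) {
			unsigned long pfn = page_to_pfn(page);

			/* only touch the range being isolated */
			if (pfn < start_pfn || pfn >= end_pfn)
				continue;

			list_del(&page->lru);
			pcp->count--;
			/* free directly onto the isolate free_list */
			__free_one_page(page, pfn, zone, 0, MIGRATE_ISOLATE);
		}
	}
	spin_unlock(&zone->lock);
}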
Hm, I guess that with this approach some pages might still escape us
if they were moving between the normal buddy list and a pcplist
through rmqueue_bulk() and free_pcppages_bulk() (and not through our
drain) at the wrong moments, but I guess that would require a really
specific workload (alternating bursts of allocations and
deallocations) and consistently unlucky timing.
> Thanks.
>