Message-ID: <53E37953.4000506@suse.cz>
Date: Thu, 07 Aug 2014 15:04:19 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Joonsoo Kim <js1304@...il.com>
CC: Joonsoo Kim <iamjoonsoo.kim@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
Johannes Weiner <hannes@...xchg.org>,
Minchan Kim <minchan@...nel.org>,
Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>,
Zhang Yanfei <zhangyanfei@...fujitsu.com>,
"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>,
Tang Chen <tangchen@...fujitsu.com>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
Wen Congyang <wency@...fujitsu.com>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Michal Nazarewicz <mina86@...a86.com>,
Laura Abbott <lauraa@...eaurora.org>,
Heesub Shin <heesub.shin@...sung.com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
Ritesh Harjani <ritesh.list@...il.com>,
t.stanislaws@...sung.com, Gioh Kim <gioh.kim@....com>,
Linux Memory Management List <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic
to fix freepage counting bugs
On 08/07/2014 02:26 PM, Joonsoo Kim wrote:
> 2014-08-07 17:53 GMT+09:00 Vlastimil Babka <vbabka@...e.cz>:
>> Ah, right. I thought that everything going to pcp lists would be through
>> freeing which would already observe the isolate migratetype and skip
>> pcplist. I forgot about the direct filling of pcplists from buddy list.
>> You're right that we don't want extra hooks there.
>>
>> Still, couldn't this be solved in a simpler way via another pcplist drain
>> after the pages are moved from normal to isolate buddy list? Should be even
>> faster because instead of disable - drain - enable (5 all-cpu kicks, since
>> each pageset_update does 2 kicks) you have drain - drain (2 kicks). While
>> it's true that pageset_update is single-zone operation, I guess we would
>> easily benefit from having a single-zone drain operation as well.
>
> I hope so too, but it's not possible. Consider the following situation.
>
> Page A: on the pcplist of CPU 2, and in an isolated pageblock.
>
> CPU 1                      CPU 2
> drain pcplist
> wait IPI finished          move A to normal buddy list
>                            finish IPI
>                            A is moved to pcplist by allocation request
>
> move doesn't catch A,
> because it is on pcplist.
>
> drain pcplist
> wait IPI finished          move A to normal buddy list
>                            finish IPI
>                            A is moved to pcplist by allocation request
>
> repeat!!
>
> It could repeat indefinitely, though the probability is low.
Hm, I see. Not a correctness issue, but still a failure to isolate.
Probably not impossible with enough CPUs, given that after the pcplists
are drained, the next allocation request will try to refill them. And
during the drain, the pages are added to the head of the free_list
AFAICS, so they will be in the first refill batch.
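For reference, the lines I mean, paraphrased from mm/page_alloc.c
(common case only, so take it with a grain of salt):

	/* free path, __free_one_page(): the drained page goes to
	 * the head of the buddy free_list */
	list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);

	/* alloc path, __rmqueue_smallest(): the pcplist refill also
	 * takes from the head, so a just-drained page comes back first */
	page = list_entry(area->free_list[migratetype].next,
				struct page, lru);
	list_del(&page->lru);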
OK, another attempt at an alternative solution proposal :) It's not
that I think disabling pcp would be so bad, I just want to be sure
there is no better alternative.
What if the drain operation had a flag telling it to recheck the
pageblock migratetype and not assume the page is on the correct
pcplist? Then the problem would go away, I think? Could this be done
without affecting the normal drain-pcplist-when-full path, so that the
cost is paid only during isolation, and is lower than the cost of
disabling pcplists?
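Something like this inside the free_pcppages_bulk() loop, purely to
illustrate the idea; check_pageblock would be a new parameter that
only the isolation drain sets:

	mt = get_freepage_migratetype(page);
	/* hypothetical flag: don't trust the cached migratetype,
	 * look at the pageblock itself (i.e. notice isolation) */
	if (check_pageblock)
		mt = get_pageblock_migratetype(page);
	__free_one_page(page, page_to_pfn(page), zone, 0, mt);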
Actually, I see that free_pcppages_bulk() doesn't consider the
migratetype of the pcplist it frees from, but uses
get_freepage_migratetype(page). So the pcplist drain could first scan
the pcplists and rewrite each page's freepage migratetype according to
its pageblock migratetype. Then free_pcppages_bulk() would be
unchanged for normal operation.
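Roughly like this, totally untested and with a made-up name, just to
show what I mean; the calling context is assumed to be the same as
drain_zone_pages() (this CPU's pcp, IRQs disabled):

static void drain_zone_pages_fix_migratetype(struct zone *zone,
					     struct per_cpu_pages *pcp)
{
	struct page *page;
	int mt, count = pcp->count;

	/* Refresh the cached freepage migratetype from the pageblock,
	 * so pages sitting on a pcplist while their pageblock was
	 * isolated will be freed onto the isolate free_list. */
	for (mt = 0; mt < MIGRATE_PCPTYPES; mt++)
		list_for_each_entry(page, &pcp->lists[mt], lru)
			set_freepage_migratetype(page,
					get_pageblock_migratetype(page));

	/* The bulk free itself is unchanged; it already frees each
	 * page according to get_freepage_migratetype(). */
	free_pcppages_bulk(zone, count, pcp);
	pcp->count = 0;
}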
Or is this too clumsy? We could also be smart and have an alternative
to free_pcppages_bulk() that omits the round-robin batching (not
needed for this kind of drain) and takes a pfn range to limit its
operation to the pages we are isolating.
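E.g. (again just a sketch, name made up; assumes the caller runs it on
the right CPU with IRQs off, like free_pcppages_bulk() does, and I'm
ignoring the freepage accounting here since fixing that is the whole
point of this series):

static void drain_pages_range(struct zone *zone, struct per_cpu_pages *pcp,
			      unsigned long start_pfn, unsigned long end_pfn)
{
	struct page *page, *next;
	int mt;

	spin_lock(&zone->lock);
	for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
		list_for_each_entry_safe(page, next, &pcp->lists[mt], lru) {
			unsigned long pfn = page_to_pfn(page);

			/* only touch the range being isolated */
			if (pfn < start_pfn || pfn >= end_pfn)
				continue;

			list_del(&page->lru);
			pcp->count--;
			/* free directly onto the isolate free_list */
			__free_one_page(page, pfn, zone, 0, MIGRATE_ISOLATE);
		}
	}
	spin_unlock(&zone->lock);
}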
Hm, I guess that with this approach some pages might still escape us
if they were moving between the normal buddy list and a pcplist
through rmqueue_bulk() and free_pcppages_bulk() (and not through our
drain) at the wrong moments, but I guess that would require a really
specific workload (alternating bursts of allocations and
deallocations) and consistently unlucky timing.
> Thanks.
>