lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140716084333.GA20359@js1304-P5Q-DELUXE>
Date:	Wed, 16 Jul 2014 17:43:33 +0900
From:	Joonsoo Kim <iamjoonsoo.kim@....com>
To:	Vlastimil Babka <vbabka@...e.cz>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Rik van Riel <riel@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Mel Gorman <mgorman@...e.de>,
	Johannes Weiner <hannes@...xchg.org>,
	Minchan Kim <minchan@...nel.org>,
	Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>,
	Zhang Yanfei <zhangyanfei@...fujitsu.com>,
	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>,
	Tang Chen <tangchen@...fujitsu.com>,
	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
	Wen Congyang <wency@...fujitsu.com>,
	Marek Szyprowski <m.szyprowski@...sung.com>,
	Michal Nazarewicz <mina86@...a86.com>,
	Laura Abbott <lauraa@...eaurora.org>,
	Heesub Shin <heesub.shin@...sung.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
	Ritesh Harjani <ritesh.list@...il.com>,
	t.stanislaws@...sung.com, Gioh Kim <gioh.kim@....com>,
	linux-mm@...ck.org, Lisa Du <cldu@...vell.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 00/10] fix freepage count problems due to memory isolation

On Tue, Jul 15, 2014 at 10:36:35AM +0200, Vlastimil Babka wrote:
> >>A non-trivial fix that comes to mind (and I might have overlooked
> >>something) is something like:
> >>
> >>- distinguish MIGRATETYPE_ISOLATING and MIGRATETYPE_ISOLATED
> >>- CPU1 first sets MIGRATETYPE_ISOLATING before the drain
> >>- when CPU2 sees MIGRATETYPE_ISOLATING, it just puts the page on
> >>special unbounded pcplist and that's it
> >>- CPU1 does the drain as usual, potentially misplacing some pages
> >>that move_freepages_block() will then fix. But no wrong merging can
> >>occur.
> >>- after move_freepages_block(), CPU1 changes MIGRATETYPE_ISOLATING
> >>to MIGRATETYPE_ISOLATED
> >>- CPU2 can then start freeing directly on isolate buddy list. There
> >>might be some pages still on the special pcplist of CPU2/CPUx but
> >>that means they won't merge yet.
> >>- CPU1 executes on all CPU's a new operation that flushes the
> >>special pcplist on isolate buddy list and merge as needed.
> >>
> >
> >Really thanks for sharing idea.
> 
> Ah, you didn't find a hole yet, good sign :D
> 
> >It looks possible but I guess that it needs more branches related to
> >pageblock isolation. Now I have a quick thought to prevent merging,
> >but, I'm not sure that it is better than current patchset. After more
> >thinking, I will post rough idea here.
> 
> I was thinking about it more and maybe it wouldn't need a new
> migratetype after all. But it would always need to free isolate
> pages on the special pcplist. That means this pcplist would be used
> not only during the call to start_isolate_page_range, but all the
> way until undo_isolate_page_range(). I don't think it's a problem
> and it simplifies things. The only way to move to isolate freelist
> is through the new isolate pcplist flush operation initiated by a
> single CPU at well defined time.
> 
> The undo would look like:
> - (migratetype is still set to MIGRATETYPE_ISOLATE, CPU2 frees
> affected pages to the special freelist)
> - CPU1 does move_freepages_block() to put pages back from isolate
> freelist to e.g. MOVABLE or CMA. At this point, nobody will put new
> pages on isolate freelist.
> - CPU1 changes migratetype of the pageblock to e.g. MOVABLE. CPU2
> and others start freeing normally. Merging can occur only on the
> MOVABLE freelist, as isolate freelist is empty and nobody puts pages
> there.
> - CPU1 flushes the isolate pcplists of all CPU's on the MOVABLE
> freelist. Merging is again correct.
> 
> I think your plan of multiple parallel CMA allocations (and thus
> multiple parallel isolations) is also possible. The isolate pcplists
> can be shared by pages coming from multiple parallel isolations. But
> the flush operation needs a pfn start/end parameters to only flush
> pages belonging to the given isolation. That might mean a bit of
> inefficient list traversing, but I don't think it's a problem.

I think that special pcplist would cause a problem if we should check
pfn range. If there are too many pages on this pcplist, move pages from
this pcplist to isolate freelist takes too long time in irq context and
system could be broken. This operation cannot be easily stopped because
it is initiated by IPI on other cpu and starter of this IPI expect that
all pages on other cpus' pcplist are moved properly when returning
from on_each_cpu().

And, if there are so many pages, serious lock contention would happen
in this case.

Anyway, my idea's key point is using PageIsolated() to distinguish
isolated page, instead of using PageBuddy(). If page is PageIsolated(),
it isn't handled as freepage although it is in buddy allocator. During free,
page with MIGRATETYPE_ISOLATE will be marked as PageIsolated() and
won't be merged and counted for freepage.

When we move pages from normal buddy list to isolate buddy
list, we check PageBuddy() and subtract number of PageBuddy() pages
from number of freepage. And, change page from PageBuddy() to PageIsolated()
since it is handled as isolated page at this point. In this way, freepage
count will be correct.

Unisolation can be done by similar approach.

I made prototype of this approach and it isn't intrusive to core
allocator compared to my previous patchset.

Make sense?

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ