linux-kernel - Re: [PATCH 1/2] lumpy reclaim: clean up and write lumpy reclaim

Message-Id: <20090610153238.DDC0.A69D9226@jp.fujitsu.com>
Date:	Wed, 10 Jun 2009 15:32:58 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	kosaki.motohiro@...fujitsu.com,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	apw@...onical.com, riel@...hat.com, minchan.kim@...il.com,
	mel@....ul.ie
Subject: Re: [PATCH 1/2] lumpy reclaim: clean up and write lumpy reclaim

> On Wed, 10 Jun 2009 15:11:21 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com> wrote:
> 
> > > I think lumpy reclaim should be updated to match the current split-lru.
> > > This patch includes a bugfix and cleanup. What do you think?
> > > 
> > > ==
> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> > > 
> > > In lumpy reclaim, "cursor_page" is found just by pfn, so we don't know
> > > which list the "cursor" page came from. Putting it back on the "src" list
> > > is therefore a BUG.
> > > And as pointed out, current lumpy reclaim doesn't seem to
> > > work as originally designed and is a bit complicated. This patch adds a
> > > function try_lumpy_reclaim() and rewrites the logic.
> > > 
> > > The major changes from current lumpy reclaim are
> > >   - check migratetype before aggressive retry at failure.
> > >   - check PG_unevictable at failure.
> > >   - scan is done in buddy system order. This helps create
> > >     a lump around the targeted page. We'll create contiguous pages for the
> > >     buddy allocator as far as we can _around_ the reclaim target page.
> > > 
> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> > > ---
> > >  mm/vmscan.c |  120 +++++++++++++++++++++++++++++++++++-------------------------
> > >  1 file changed, 71 insertions(+), 49 deletions(-)
> > > 
> > > Index: mmotm-2.6.30-Jun10/mm/vmscan.c
> > > ===================================================================
> > > --- mmotm-2.6.30-Jun10.orig/mm/vmscan.c
> > > +++ mmotm-2.6.30-Jun10/mm/vmscan.c
> > > @@ -850,6 +850,69 @@ int __isolate_lru_page(struct page *page
> > >  	return ret;
> > >  }
> > >  
> > > +static int
> > > +try_lumpy_reclaim(struct page *page, struct list_head *dst, int request_order)
> > > +{
> > > +	unsigned long buddy_base, buddy_idx, buddy_start_pfn, buddy_end_pfn;
> > > +	unsigned long pfn, page_pfn, page_idx;
> > > +	int zone_id, order, type;
> > > +	int do_aggressive = 0;
> > > +	int nr = 0;
> > > +	/*
> > > +	 * Lumpy reclaim. Try to take nearby pages in the requested order to
> > > +	 * create free contiguous pages. This algorithm tries to start
> > > +	 * from order 0 and scan buddy pages up to request_order.
> > > +	 * If you are unsure about buddy position calculation, please see
> > > +	 * mm/page_alloc.c
> > > +	 */
> > > +	zone_id = page_zone_id(page);
> > > +	page_pfn = page_to_pfn(page);
> > > +	buddy_base = page_pfn & ~((1 << MAX_ORDER) - 1);
> > > +
> > > +	/* Can we expect successful reclaim? */
> > > +	type = get_pageblock_migratetype(page);
> > > +	if ((type == MIGRATE_MOVABLE) || (type == MIGRATE_RECLAIMABLE))
> > > +		do_aggressive = 1;
> > > +
> > > +	for (order = 0; order < request_order; ++order) {
> > > +		/* offset in this buddy region */
> > > +		page_idx = page_pfn & ~buddy_base;
> > > +		/* offset of buddy can be calculated by xor */
> > > +		buddy_idx = page_idx ^ (1 << order);
> > > +		buddy_start_pfn = buddy_base + buddy_idx;
> > > +		buddy_end_pfn = buddy_start_pfn + (1 << order);
> > > +
> > > +		/* scan range [buddy_start_pfn...buddy_end_pfn) */
> > > +		for (pfn = buddy_start_pfn; pfn < buddy_end_pfn; ++pfn) {
> > > +			/* Avoid holes within the zone. */
> > > +			if (unlikely(!pfn_valid_within(pfn)))
> > > +				break;
> > > +			page = pfn_to_page(pfn);
> > > +			/*
> > > +			 * Check that we have not crossed a zone boundary.
> > > +			 * Some arch have zones not aligned to MAX_ORDER.
> > > +			 */
> > > +			if (unlikely(page_zone_id(page) != zone_id))
> > > +				break;
> > > +
> > > +			/* we are always under ISOLATE_BOTH */
> > > +			if (__isolate_lru_page(page, ISOLATE_BOTH, 0) == 0) {
> > > +				list_move(&page->lru, dst);
> > > +				nr++;
> > > +			} else if (do_aggressive && !PageUnevictable(page))
> > 
> > Could you explain this branch intention more?
> > 
> __isolate_lru_page() can fail in the following cases:
>   - the page is not on the LRU.
>         This implies
> 		(a) the page is not for anon/file-cache
> 		(b) the page is taken off the LRU by shrink_list or pagevec.
> 		(c) the page is free.
>   - the page is temporarily busy.
> 
> So, aborting this loop here directly is not very good. But if the page is for
> kernel usage or unevictable, continuing this loop just wastes time.
> 
> Then, I used the migrate_type attribute of the target page.
> migrate_type is determined per pageblock_order (which is itself determined by
> the size of a hugepage etc.; see include/linux/pageblock-flags.h)
> 
> If the page is under MIGRATE_MOVABLE
> 	- at least 50% of nearby pages are used for GFP_MOVABLE(GFP_HIGHUSER_MOVABLE)
> If the page is under MIGRATE_RECLAIMABLE
> 	- at least 50% of nearby pages are used for GFP_TEMPORARY
> 
> Then, we can expect meaningful lumpy reclaim if do_aggressive == 1.
> If do_aggressive==0, nearby pages are used for some kernel usage and not suitable
> for _this_ lumpy reclaim.
> 
> How about a comment like this?
> /*
>  * __isolate_lru_page() can return a busy status for many reasons. If we are
>  * under migrate type MIGRATE_MOVABLE/MIGRATE_RECLAIMABLE, we can expect nearby
>  * pages are just temporarily busy and should be reclaimed later. (If the page
>  * is _now_ free or being freed, __isolate_lru_page() returns -EBUSY.)
>  * Then, continue this loop.
>  */

OK, looks good.
thanks.
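
For readers following the thread, the branch discussed above can be sketched
as a small user-space decision function. This is only an illustration, not the
patch itself: the quoted hunk is cut off right after the branch, so the
"skip and keep scanning" versus "abort the scan" outcomes are inferred from the
explanation, and the enum names (MT_*, SCAN_*) and both helper functions are
hypothetical stand-ins rather than kernel symbols.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's migrate types (illustrative only). */
enum migratetype { MT_UNMOVABLE, MT_RECLAIMABLE, MT_MOVABLE };

/* What the scan loop does with one candidate page (names are assumptions). */
enum scan_action { SCAN_TAKE, SCAN_SKIP, SCAN_ABORT };

/*
 * Mirrors the do_aggressive test in the quoted patch: reclaim is expected
 * to be worthwhile only in MOVABLE or RECLAIMABLE pageblocks.
 */
static bool expect_reclaim(enum migratetype type)
{
	return type == MT_MOVABLE || type == MT_RECLAIMABLE;
}

/*
 * isolated      - the page was successfully taken off the LRU
 * do_aggressive - result of expect_reclaim() for the target's pageblock
 * unevictable   - the page is marked unevictable
 */
static enum scan_action
lumpy_scan_action(bool isolated, bool do_aggressive, bool unevictable)
{
	if (isolated)
		return SCAN_TAKE;	/* move the page to the dst list */
	if (do_aggressive && !unevictable)
		return SCAN_SKIP;	/* probably just busy: keep scanning */
	return SCAN_ABORT;		/* kernel-owned or unevictable: stop */
}
```

A caller would compute do_aggressive once for the target page and then call
lumpy_scan_action() for every pfn in the scanned range.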

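The "offset of buddy can be calculated by xor" step in the quoted patch can
also be tried in isolation. The sketch below is a user-space approximation
under stated assumptions: SKETCH_MAX_ORDER is an assumed value, struct
pfn_range and buddy_range() are hypothetical names, and the block index is
aligned down to the current order before the XOR. That alignment follows the
usual description of the buddy allocator and is an addition here; the quoted
patch uses the unaligned page index.

```c
/* Assumed region size exponent; the real MAX_ORDER is configuration dependent. */
#define SKETCH_MAX_ORDER 11

/* Half-open pfn range [start, end). Hypothetical helper type. */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Return the pfn range of the buddy of the order-sized block that
 * contains page_pfn. Assumes order < SKETCH_MAX_ORDER.
 */
static struct pfn_range buddy_range(unsigned long page_pfn, unsigned int order)
{
	unsigned long region_mask = (1UL << SKETCH_MAX_ORDER) - 1;
	unsigned long base = page_pfn & ~region_mask;	/* start of the region */
	unsigned long idx = page_pfn & region_mask;	/* offset in the region */
	/* first page of the 2^order block containing the page */
	unsigned long block_idx = idx & ~((1UL << order) - 1);
	/* flipping bit 'order' selects the neighbouring block of equal size */
	unsigned long buddy_idx = block_idx ^ (1UL << order);
	struct pfn_range r;

	r.start = base + buddy_idx;
	r.end = r.start + (1UL << order);
	return r;
}
```

For pfn 0x1235, orders 0, 1 and 2 give the ranges [0x1234, 0x1235),
[0x1236, 0x1238) and [0x1230, 0x1234). Together with the page itself these
cover the aligned order-3 block 0x1230..0x1237.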
