Message-Id: <20090421095857.b989ce44.kamezawa.hiroyu@jp.fujitsu.com>
Date:	Tue, 21 Apr 2009 09:58:57 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Johannes Weiner <hannes@...xchg.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Rik van Riel <riel@...hat.com>,
	Hugh Dickins <hugh@...itas.com>
Subject: Re: [patch 3/3][rfc] vmscan: batched swap slot allocation

On Mon, 20 Apr 2009 22:24:45 +0200
Johannes Weiner <hannes@...xchg.org> wrote:

> Every swap slot allocation tries to be subsequent to the previous one
> to help keep the LRU order of anon pages intact when they are
> swapped out.
> 
> With an increasing number of concurrent reclaimers, the average
> distance between two subsequent slot allocations of one reclaimer
> increases as well.  The contiguous LRU list chunks each reclaimer
> swaps out get 'multiplexed' on the swap space as they allocate the
> slots concurrently.
> 
> 	2 processes isolating 15 pages each and allocating swap slots
> 	concurrently:
> 
> 	#0			#1
> 
> 	page 0 slot 0		page 15 slot 1
> 	page 1 slot 2		page 16 slot 3
> 	page 2 slot 4		page 17 slot 5
> 	...
> 
> 	-> average slot distance of 2
> 
> All reclaimers being equally fast, this becomes a problem when the
> total number of concurrent reclaimers gets so high that even equal
> distribution makes the average distance between the slots of one
> reclaimer too wide for optimistic swap-in to compensate.
> 
> But right now, one reclaimer can take much longer than another one
> because its pages are mapped into more page tables and it thus has
> more work to do, so the faster reclaimer will allocate multiple swap
> slots between two slot allocations of the slower one.
> 
> This patch makes shrink_page_list() allocate swap slots in batches,
> collecting all the anonymous pages on a list without rescheduling
> or actual reclaim in between.  Only after all anon pages are in the
> swap cache do unmap and write-out start for them.
> 
> While this does not fix the fundamental issue of slot distribution
> increasing with reclaimers, it mitigates the problem by balancing the
> resulting fragmentation equally between the allocators.
> 
> Signed-off-by: Johannes Weiner <hannes@...xchg.org>
> Cc: Rik van Riel <riel@...hat.com>
> Cc: Hugh Dickins <hugh@...itas.com>
> ---
>  mm/vmscan.c |   49 +++++++++++++++++++++++++++++++++++++++++--------
>  1 files changed, 41 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 70092fa..b3823fe 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -592,24 +592,42 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  					enum pageout_io sync_writeback)
>  {
>  	LIST_HEAD(ret_pages);
> +	LIST_HEAD(swap_pages);
>  	struct pagevec freed_pvec;
> -	int pgactivate = 0;
> +	int pgactivate = 0, restart = 0;
>  	unsigned long nr_reclaimed = 0;
>  
>  	cond_resched();
>  
>  	pagevec_init(&freed_pvec, 1);
> +restart:
>  	while (!list_empty(page_list)) {
>  		struct address_space *mapping;
>  		struct page *page;
>  		int may_enter_fs;
>  		int referenced;
>  
> -		cond_resched();
> +		if (list_empty(&swap_pages))
> +			cond_resched();
>  
Why this change?  Is it to avoid rescheduling while pages sit locked
on the swap_pages list?

>  		page = lru_to_page(page_list);
>  		list_del(&page->lru);
>  
> +		if (restart) {
> +			/*
> +			 * We are allowed to do IO when we restart for
> +			 * swap pages.
> +			 */
> +			may_enter_fs = 1;
> +			/*
> +			 * Referenced pages will be sorted out by
> +			 * try_to_unmap() and unmapped (anon!) pages
> +			 * are not to be referenced anymore.
> +			 */
> +			referenced = 0;
> +			goto reclaim;
> +		}
> +
>  		if (!trylock_page(page))
>  			goto keep;
>  
Doesn't this keep multiple pages locked while they stay on the private
swap_pages list?

BTW, wouldn't it be better to add an "allocate multiple swap slots at
once" function, like

	void get_swap_pages(int nr, swp_entry_t swp_entry_array[]);

? "nr" would not be bigger than SWAP_CLUSTER_MAX.

Regards,
-Kame

