linux-kernel - Re: [early RFC][PATCH 8/7] vmscan: Don't deactivate many touched page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4B1D4513.1020206@redhat.com>
Date:	Mon, 07 Dec 2009 13:10:27 -0500
From:	Rik van Riel <riel@...hat.com>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
CC:	Andrea Arcangeli <aarcange@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>,
	Larry Woodman <lwoodman@...hat.com>
Subject: Re: [early RFC][PATCH 8/7] vmscan: Don't deactivate many touched
 page

On 12/07/2009 06:36 AM, KOSAKI Motohiro wrote:
>
> Andrea, Can you please try following patch on your workload?
>
>
>  From a7758c66d36a136d5fbbcf0b042839445f0ca522 Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro<kosaki.motohiro@...fujitsu.com>
> Date: Mon, 7 Dec 2009 18:37:20 +0900
> Subject: [PATCH] [RFC] vmscan: Don't deactivate many touched page
>
> Changelog
>   o from andrea's original patch
>     - Rebase topon my patches.
>     - Use list_cut_position/list_splice_tail pair instead
>       list_del/list_add to make pte scan fairness.
>     - Only use max young threshold when soft_try is true.
>       It avoid wrong OOM sideeffect.
>     - Return SWAP_AGAIN instead successful result if max
>       young threshold exceed. It prevent the pages without clear
>       pte young bit will be deactivated wrongly.
>     - Add to treat ksm page logic

I like the concept and your changes, and really only
have a few small nitpicks :)

First, the VM uses a mix of "referenced", "accessed" and
"young".  We should probably avoid adding "active" to that
mix, and may even want to think about moving to just one
or two terms :)

> +#define MAX_YOUNG_BIT_CLEARED 64
> +/*
> + * if VM pressure is low and the page have too many active mappings, there isn't
> + * any reason to continue clear young bit of other ptes. Otherwise,
> + *  - Makes meaningless cpu wasting, many touched page sholdn't be reclaimed.
> + *  - Makes lots IPI for pte change and it might cause another sadly lock
> + *    contention.
> + */

If VM pressure is low and the page has lots of active users, we only
clear up to MAX_YOUNG_BIT_CLEARED accessed bits at a time.  Clearing
accessed bits takes CPU time, needs TLB invalidate IPIs and could
cause lock contention.  Since a heavily shared page is very likely
to be used again soon, the cost outweighs the benefit of making such
a heavily shared page a candidate for eviction.

> diff --git a/mm/rmap.c b/mm/rmap.c
> index cfda0a0..f4517f3 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -473,6 +473,21 @@ static int wipe_page_reference_anon(struct page *page,
>   		ret = wipe_page_reference_one(page, refctx, vma, address);
>   		if (ret != SWAP_SUCCESS)
>   			break;
> +		if (too_many_young_bit_found(refctx)) {
> +			LIST_HEAD(tmp_list);
> +
> +			/*
> +			 * The scanned ptes move to list tail. it help every ptes
> +			 * on this page will be tested by ptep_clear_young().
> +			 * Otherwise, this shortcut makes unfair thing.
> +			 */
> +			list_cut_position(&tmp_list,
> +					&vma->anon_vma_node,
> +					&anon_vma->head);
> +			list_splice_tail(&tmp_list,&vma->anon_vma_node);
> +			ret = SWAP_AGAIN;
> +			break;
> +		}

I do not understand the unfairness here, since all a page needs
to stay on the active list is >64 referenced PTEs.  It does not
matter which of the PTEs mapping the page were recently referenced.

However, rotating the anon vmas around may help spread out lock
pressure in the VM and help things that way, so the code looks
useful to me.

In short, you can give the next version of this patch my

Reviewed-by: Rik van Riel <riel@...hat.com>

All I have are comment nitpicks :)

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/