linux-kernel - Re: [v3 PATCH] mm: ksm: do not block on page lock when searching stable tree

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <82ba1395-baab-3b95-a3f7-47e219551881@nvidia.com>
Date:   Tue, 29 Jan 2019 23:14:19 -0800
From:   John Hubbard <jhubbard@...dia.com>
To:     Yang Shi <yang.shi@...ux.alibaba.com>, <ktkhai@...tuozzo.com>,
        <hughd@...gle.com>, <aarcange@...hat.com>,
        <akpm@...ux-foundation.org>
CC:     <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [v3 PATCH] mm: ksm: do not block on page lock when searching
 stable tree

On 1/29/19 12:29 PM, Yang Shi wrote:
> ksmd need search stable tree to look for the suitable KSM page, but the
> KSM page might be locked for a while due to i.e. KSM page rmap walk.
> Basically it is not a big deal since commit 2c653d0ee2ae
> ("ksm: introduce ksm_max_page_sharing per page deduplication limit"),
> since max_page_sharing limits the number of shared KSM pages.
> 
> But it still sounds not worth waiting for the lock, the page can be skip,
> then try to merge it in the next scan to avoid potential stall if its
> content is still intact.
> 
> Introduce trylock mode to get_ksm_page() to not block on page lock, like
> what try_to_merge_one_page() does.  And, define three possible
> operations (nolock, lock and trylock) as enum type to avoid stacking up
> bools and make the code more readable.
> 
> Return -EBUSY if trylock fails, since NULL means not find suitable KSM
> page, which is a valid case.
> 
> With the default max_page_sharing setting (256), there is almost no
> observed change comparing lock vs trylock.
> 
> However, with ksm02 of LTP, the reduced ksmd full scan time can be
> observed, which has set max_page_sharing to 786432.  With lock version,
> ksmd may tak 10s - 11s to run two full scans, with trylock version ksmd
> may take 8s - 11s to run two full scans.  And, the number of
> pages_sharing and pages_to_scan keep same.  Basically, this change has
> no harm >
> Cc: Hugh Dickins <hughd@...gle.com>
> Cc: Andrea Arcangeli <aarcange@...hat.com>
> Suggested-by: John Hubbard <jhubbard@...dia.com>
> Reviewed-by: Kirill Tkhai <ktkhai@...tuozzo.com>
> Signed-off-by: Yang Shi <yang.shi@...ux.alibaba.com>
> ---
> Hi folks,
> 
> This patch was with "mm: vmscan: skip KSM page in direct reclaim if priority
> is low" in the initial submission.  Then Hugh and Andrea pointed out commit
> 2c653d0ee2ae ("ksm: introduce ksm_max_page_sharing per page deduplication
> limit") is good enough for limiting the number of shared KSM page to prevent
> from softlock when walking ksm page rmap.  This commit does solve the problem.
> So, the series was dropped by Andrew from -mm tree.
> 
> However, I thought the second patch (this one) still sounds useful.  So, I did
> some test and resubmit it.  The first version was reviewed by Krill Tkhai, so
> I keep his Reviewed-by tag since there is no change to the patch except the
> commit log.
> 
> So, would you please reconsider this patch?
> 
> v3: Use enum to define get_ksm_page operations (nolock, lock and trylock) per
>      John Hubbard
> v2: Updated the commit log to reflect some test result and latest discussion
> 
>   mm/ksm.c | 46 ++++++++++++++++++++++++++++++++++++----------
>   1 file changed, 36 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 6c48ad1..5647bc1 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -667,6 +667,12 @@ static void remove_node_from_stable_tree(struct stable_node *stable_node)
>   	free_stable_node(stable_node);
>   }
>   
> +enum get_ksm_page_flags {
> +	GET_KSM_PAGE_NOLOCK,
> +	GET_KSM_PAGE_LOCK,
> +	GET_KSM_PAGE_TRYLOCK
> +};
> +
>   /*
>    * get_ksm_page: checks if the page indicated by the stable node
>    * is still its ksm page, despite having held no reference to it.
> @@ -686,7 +692,8 @@ static void remove_node_from_stable_tree(struct stable_node *stable_node)
>    * a page to put something that might look like our key in page->mapping.
>    * is on its way to being freed; but it is an anomaly to bear in mind.
>    */
> -static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
> +static struct page *get_ksm_page(struct stable_node *stable_node,
> +				 enum get_ksm_page_flags flags)
>   {
>   	struct page *page;
>   	void *expected_mapping;
> @@ -728,8 +735,15 @@ static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
>   		goto stale;
>   	}
>   
> -	if (lock_it) {
> +	if (flags == GET_KSM_PAGE_TRYLOCK) {
> +		if (!trylock_page(page)) {
> +			put_page(page);
> +			return ERR_PTR(-EBUSY);
> +		}
> +	} else if (flags == GET_KSM_PAGE_LOCK)
>   		lock_page(page);
> +
> +	if (flags != GET_KSM_PAGE_NOLOCK) {
>   		if (READ_ONCE(page->mapping) != expected_mapping) {
>   			unlock_page(page);
>   			put_page(page);
> @@ -763,7 +777,7 @@ static void remove_rmap_item_from_tree(struct rmap_item *rmap_item)
>   		struct page *page;
>   
>   		stable_node = rmap_item->head;
> -		page = get_ksm_page(stable_node, true);
> +		page = get_ksm_page(stable_node, GET_KSM_PAGE_LOCK);
>   		if (!page)
>   			goto out;
>   
> @@ -863,7 +877,7 @@ static int remove_stable_node(struct stable_node *stable_node)
>   	struct page *page;
>   	int err;
>   
> -	page = get_ksm_page(stable_node, true);
> +	page = get_ksm_page(stable_node, GET_KSM_PAGE_LOCK);
>   	if (!page) {
>   		/*
>   		 * get_ksm_page did remove_node_from_stable_tree itself.
> @@ -1385,7 +1399,7 @@ static struct page *stable_node_dup(struct stable_node **_stable_node_dup,
>   		 * stable_node parameter itself will be freed from
>   		 * under us if it returns NULL.
>   		 */
> -		_tree_page = get_ksm_page(dup, false);
> +		_tree_page = get_ksm_page(dup, GET_KSM_PAGE_NOLOCK);
>   		if (!_tree_page)
>   			continue;
>   		nr += 1;
> @@ -1508,7 +1522,7 @@ static struct page *__stable_node_chain(struct stable_node **_stable_node_dup,
>   	if (!is_stable_node_chain(stable_node)) {
>   		if (is_page_sharing_candidate(stable_node)) {
>   			*_stable_node_dup = stable_node;
> -			return get_ksm_page(stable_node, false);
> +			return get_ksm_page(stable_node, GET_KSM_PAGE_NOLOCK);
>   		}
>   		/*
>   		 * _stable_node_dup set to NULL means the stable_node
> @@ -1613,7 +1627,8 @@ static struct page *stable_tree_search(struct page *page)
>   			 * wrprotected at all times. Any will work
>   			 * fine to continue the walk.
>   			 */
> -			tree_page = get_ksm_page(stable_node_any, false);
> +			tree_page = get_ksm_page(stable_node_any,
> +						 GET_KSM_PAGE_NOLOCK);
>   		}
>   		VM_BUG_ON(!stable_node_dup ^ !!stable_node_any);
>   		if (!tree_page) {
> @@ -1673,7 +1688,12 @@ static struct page *stable_tree_search(struct page *page)
>   			 * It would be more elegant to return stable_node
>   			 * than kpage, but that involves more changes.
>   			 */
> -			tree_page = get_ksm_page(stable_node_dup, true);
> +			tree_page = get_ksm_page(stable_node_dup,
> +						 GET_KSM_PAGE_TRYLOCK);
> +
> +			if (PTR_ERR(tree_page) == -EBUSY)
> +				return ERR_PTR(-EBUSY);

or just:

	if (PTR_ERR(tree_page) == -EBUSY)
		return tree_page;

right?

> +
>   			if (unlikely(!tree_page))
>   				/*
>   				 * The tree may have been rebalanced,
> @@ -1842,7 +1862,8 @@ static struct stable_node *stable_tree_insert(struct page *kpage)
>   			 * wrprotected at all times. Any will work
>   			 * fine to continue the walk.
>   			 */
> -			tree_page = get_ksm_page(stable_node_any, false);
> +			tree_page = get_ksm_page(stable_node_any,
> +						 GET_KSM_PAGE_NOLOCK);
>   		}
>   		VM_BUG_ON(!stable_node_dup ^ !!stable_node_any);
>   		if (!tree_page) {
> @@ -2060,6 +2081,10 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
>   
>   	/* We first start with searching the page inside the stable tree */
>   	kpage = stable_tree_search(page);
> +
> +	if (PTR_ERR(kpage) == -EBUSY)
> +		return;
> +
>   	if (kpage == page && rmap_item->head == stable_node) {
>   		put_page(kpage);
>   		return;
> @@ -2242,7 +2267,8 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
>   
>   			list_for_each_entry_safe(stable_node, next,
>   						 &migrate_nodes, list) {
> -				page = get_ksm_page(stable_node, false);
> +				page = get_ksm_page(stable_node,
> +						    GET_KSM_PAGE_NOLOCK);
>   				if (page)
>   					put_page(page);
>   				cond_resched();
> 

Hi Yang,

The patch looks correct as far doing what it claims to do. I'll leave it
to others to decide if a trylock-based approach is really what you want,
for KSM scans. It seems reasonable from my very limited knowledge of
KSM: there shouldn't be any cases where you really *need* to wait for
a page lock, because the whole system is really sort of an optimization
anyway.


thanks,
-- 
John Hubbard
NVIDIA