Date:	Wed, 19 Aug 2009 00:57:54 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	kosaki.motohiro@...fujitsu.com, Rik van Riel <riel@...hat.com>,
	Jeff Dike <jdike@...toit.com>, Avi Kivity <avi@...hat.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	"Yu, Wilfred" <wilfred.yu@...el.com>,
	"Kleen, Andi" <andi.kleen@...el.com>,
	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Mel Gorman <mel@....ul.ie>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>
Subject: Re: [RFC] respect the referenced bit of KVM guest pages?

> > Yes it does. I said 'mostly' because there is a small hole: an
> > unevictable page may be scanned but still not moved to the
> > unevictable list. When a page is mapped in two places, the first
> > pte has the referenced bit set, and the _second_ VMA has the
> > VM_LOCKED bit set, page_referenced() will return 1 and
> > shrink_page_list() will move the page to the active list instead
> > of the unevictable list. Shall we fix this rare case?
> 
> How about this fix?

Good spotting.
Yes, this is a rare case, but I also don't think your patch
introduces any performance regression.
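
For the archives, here is a minimal userspace sketch of the scenario
described above (illustrative only: the file name is made up, and the
race itself is timing dependent, so this just sets up the two
mappings):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/tmp/testfile", O_RDWR | O_CREAT, 0600);
	char *a, *b;

	ftruncate(fd, 4096);
	a = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	b = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	a[0] = 1;		/* first pte: referenced bit gets set */
	mlock(b, 4096);		/* second VMA: VM_LOCKED gets set */

	/*
	 * If mlock() loses the race to grab the page, the later rmap
	 * walk sees the referenced pte in the first VMA before the
	 * VM_LOCKED flag in the second one, so page_referenced()
	 * returns 1 and shrink_page_list() reactivates the page.
	 */
	pause();
	return 0;
}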

However, I think your patch has one bug.

> 
> ---
> mm: stop circulation of referenced mlocked pages
> 
> Signed-off-by: Wu Fengguang <fengguang.wu@...el.com>
> ---
> 
> --- linux.orig/mm/rmap.c	2009-08-16 19:11:13.000000000 +0800
> +++ linux/mm/rmap.c	2009-08-16 19:22:46.000000000 +0800
> @@ -358,6 +358,7 @@ static int page_referenced_one(struct pa
>  	 */
>  	if (vma->vm_flags & VM_LOCKED) {
>  		*mapcount = 1;	/* break early from loop */
> +		*vm_flags |= VM_LOCKED;
>  		goto out_unmap;
>  	}
>  
> @@ -482,6 +483,8 @@ static int page_referenced_file(struct p
>  	}
>  
>  	spin_unlock(&mapping->i_mmap_lock);
> +	if (*vm_flags & VM_LOCKED)
> +		referenced = 0;
>  	return referenced;
>  }
>  

page_referenced_file()?
I think we should change page_referenced() itself.
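
For reference, here is a simplified sketch of what page_referenced()
looks like in this tree (mm/rmap.c; the file-locking dance and memcg
details are elided):

int page_referenced(struct page *page, int is_locked,
		    struct mem_cgroup *mem_cont, unsigned long *vm_flags)
{
	int referenced = 0;

	*vm_flags = 0;
	if (page_mapped(page) && page->mapping) {
		if (PageAnon(page))
			referenced += page_referenced_anon(page, mem_cont,
							   vm_flags);
		else
			referenced += page_referenced_file(page, mem_cont,
							   vm_flags);
	}
	if (page_test_and_clear_young(page))
		referenced++;

	return referenced;
}

Resetting `referenced' only in page_referenced_file() misses anon
pages entirely, and page_test_and_clear_young() can still bump the
count after page_referenced_file() returns.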


Instead, how about this?
==============================================

Subject: [PATCH] mm: stop circulation of referenced mlocked pages

Currently, the mlock() system call does not guarantee that a page gets
marked PG_mlocked, because a race can prevent mlock from grabbing the
page. In that case, vmscan moves the page to the unevictable LRU
instead.

However, Wu Fengguang recently pointed out that the current vmscan
logic is not efficient here: an mlocked page can circulate between the
active and inactive lists, because vmscan checks whether the page is
referenced _before_ culling mlocked pages.

In addition, vmscan should mark the page PG_mlocked when it culls an
mlocked page; otherwise the VM statistics show strange numbers.

This patch fixes both issues.

Reported-by: Wu Fengguang <fengguang.wu@...el.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
---
 mm/internal.h |    5 +++--
 mm/rmap.c     |    8 +++++++-
 mm/vmscan.c   |    2 +-
 3 files changed, 11 insertions(+), 4 deletions(-)

Index: b/mm/internal.h
===================================================================
--- a/mm/internal.h	2009-06-26 21:06:43.000000000 +0900
+++ b/mm/internal.h	2009-08-18 23:31:11.000000000 +0900
@@ -91,7 +91,8 @@ static inline void unevictable_migrate_p
  * to determine if it's being mapped into a LOCKED vma.
  * If so, mark page as mlocked.
  */
-static inline int is_mlocked_vma(struct vm_area_struct *vma, struct page *page)
+static inline int try_set_page_mlocked(struct vm_area_struct *vma,
+				       struct page *page)
 {
 	VM_BUG_ON(PageLRU(page));
 
@@ -144,7 +145,7 @@ static inline void mlock_migrate_page(st
 }
 
 #else /* CONFIG_HAVE_MLOCKED_PAGE_BIT */
-static inline int is_mlocked_vma(struct vm_area_struct *v, struct page *p)
+static inline int try_set_page_mlocked(struct vm_area_struct *v, struct page *p)
 {
 	return 0;
 }
Index: b/mm/rmap.c
===================================================================
--- a/mm/rmap.c	2009-08-18 19:48:14.000000000 +0900
+++ b/mm/rmap.c	2009-08-18 23:47:34.000000000 +0900
@@ -362,7 +362,9 @@ static int page_referenced_one(struct pa
 	 * unevictable list.
 	 */
 	if (vma->vm_flags & VM_LOCKED) {
-		*mapcount = 1;	/* break early from loop */
+		*mapcount = 1;		/* break early from loop */
+		*vm_flags |= VM_LOCKED;	/* don't move to the active list */
+		try_set_page_mlocked(vma, page);
 		goto out_unmap;
 	}
 
@@ -531,6 +533,9 @@ int page_referenced(struct page *page,
 	if (page_test_and_clear_young(page))
 		referenced++;
 
+	if (unlikely(*vm_flags & VM_LOCKED))
+		referenced = 0;
+
 	return referenced;
 }
 
@@ -784,6 +789,7 @@ static int try_to_unmap_one(struct page 
 	 */
 	if (!(flags & TTU_IGNORE_MLOCK)) {
 		if (vma->vm_flags & VM_LOCKED) {
+			try_set_page_mlocked(vma, page);
 			ret = SWAP_MLOCK;
 			goto out_unmap;
 		}
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c	2009-08-18 19:48:14.000000000 +0900
+++ b/mm/vmscan.c	2009-08-18 23:30:51.000000000 +0900
@@ -2666,7 +2666,7 @@ int page_evictable(struct page *page, st
 	if (mapping_unevictable(page_mapping(page)))
 		return 0;
 
-	if (PageMlocked(page) || (vma && is_mlocked_vma(vma, page)))
+	if (PageMlocked(page) || (vma && try_set_page_mlocked(vma, page)))
 		return 0;
 
 	return 1;
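
For reference, the ordering in shrink_page_list() that causes the
circulation is roughly this (simplified from mm/vmscan.c in this
tree; unrelated steps elided):

	referenced = page_referenced(page, 1, sc->mem_cgroup, &vm_flags);
	/*
	 * Without the page_referenced() change above, a referenced
	 * but mlocked page takes this branch back to the active list
	 * on every scan...
	 */
	if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
	    referenced && page_mapping_inuse(page))
		goto activate_locked;

	if (page_mapped(page) && mapping) {
		switch (try_to_unmap(page, TTU_UNMAP)) {
		case SWAP_MLOCK:
			/* ...and never reaches this cull. */
			goto cull_mlocked;
		}
	}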
