Date:	Tue, 3 Mar 2015 12:25:51 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	"Wang, Yalin" <Yalin.Wang@...ymobile.com>
Cc:	'Michal Hocko' <mhocko@...e.cz>,
	'Andrew Morton' <akpm@...ux-foundation.org>,
	"'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>,
	"'linux-mm@...ck.org'" <linux-mm@...ck.org>,
	'Rik van Riel' <riel@...hat.com>,
	'Johannes Weiner' <hannes@...xchg.org>,
	'Mel Gorman' <mgorman@...e.de>, 'Shaohua Li' <shli@...nel.org>,
	Hugh Dickins <hughd@...gle.com>,
	Cyrill Gorcunov <gorcunov@...il.com>
Subject: Re: [RFC V3] mm: change mm_advise_free to clear page dirty

Could you separate this patch from this patchset thread?
It's tackling a different problem.

Also, in the previous thread I asked why a shared page is a problem
now, but you sent a new patchset without answering. That wastes
reviewers' and maintainers' time and confuses them. Please don't
rush to send code. Resolve the reviewers' comments first.

On Tue, Mar 03, 2015 at 10:06:40AM +0800, Wang, Yalin wrote:
> This patch add ClearPageDirty() to clear AnonPage dirty flag,
> if not clear page dirty for this anon page, the page will never be
> treated as freeable. We also make sure the shared AnonPage is not
> freeable, we implement it by dirty all copyed AnonPage pte,
> so that make sure the Anonpage will not become freeable, unless
> all process which shared this page call madvise_free syscall.

Please spend more time making the description clear. I really doubt
anyone can understand this description without inspecting the code. :(
Of course, I can't write descriptions as clearly as a native speaker
either, but I'm sure I spend more time writing the description
than coding, at least. :)

> 
> Signed-off-by: Yalin Wang <yalin.wang@...ymobile.com>
> ---
>  mm/madvise.c | 16 +++++++++-------
>  mm/memory.c  | 12 ++++++++++--
>  2 files changed, 19 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 6d0fcb8..b61070d 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -297,23 +297,25 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  			continue;
>  
>  		page = vm_normal_page(vma, addr, ptent);
> -		if (!page)
> +		if (!page || !trylock_page(page))
>  			continue;
>  
>  		if (PageSwapCache(page)) {
> -			if (!trylock_page(page))
> -				continue;
> -
>  			if (!try_to_free_swap(page)) {
>  				unlock_page(page);
>  				continue;
>  			}
> -
> -			ClearPageDirty(page);
> -			unlock_page(page);
>  		}
>  
>  		/*
> +		 * we clear page dirty flag for AnonPage, no matter if this
> +		 * page is in swapcahce or not, AnonPage not in swapcache also set
> +		 * dirty flag sometimes, this happened when a AnonPage is removed
> +		 * from swapcahce by try_to_free_swap()
> +		 */
> +		ClearPageDirty(page);
> +		unlock_page(page);
> +		/*

Parent:

ptrP = malloc();
*ptrP = 'a';
fork(); -> child's pte is made dirty by your patch
..
memory pressure -> So, the page is swapped out.
..
..
Child: var = *ptrP; assert(var == 'a') -> So, swapin happens and child has pte_clean
Parent: var = *ptrP; assert(var == 'a') -> So, swapin happens and parent has pte_clean
..
..
Parent:
madvise_free -> remove PageDirty
So, both parent and child have pte_clean and !PageDirty, which
makes the page a target for the VM to discard.
..
VM discards the page under memory pressure.
..
Child: var = *ptrP; assert(var == 'a'); <---- oops.

Also, blindly calling ClearPageDirty causes duplicate swap-out.

>  		 * Some of architecture(ex, PPC) don't update TLB
>  		 * with set_pte_at and tlb_remove_tlb_entry so for
>  		 * the portability, remap the pte with old|clean
> diff --git a/mm/memory.c b/mm/memory.c
> index 8068893..3d949b3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -874,10 +874,18 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  	if (page) {
>  		get_page(page);
>  		page_dup_rmap(page);
> -		if (PageAnon(page))
> +		if (PageAnon(page)) {
> +			/*
> +			 * we dirty the copyed pte for anon page,
> +			 * this is useful for madvise_free_pte_range(),
> +			 * this can prevent shared anon page freed by madvise_free
> +			 * syscall
> +			 */
> +			pte = pte_mkdirty(pte);

This makes every MADV_FREE hint void across fork. IOW, if a process that
called MADV_FREE then calls fork, the VM cannot discard the pages unless the
child frees them or calls madvise_free itself. So if the parent calls
madvise_free before fork, we can't free those pages.
IOW, you are ignoring the example below.

parent:
ptr1 = malloc(len);
        -> allocator calls mmap(len);
memset(ptr1, 'a', len);
free(ptr1);
        -> allocator calls madvise_free(ptr1, len);
fork();
..
..
        -> VM discard hinted pages
child:

ptr2 = malloc(len)
        -> allocator reuses the chunk allocated from parent.
So the child will see zero pages from ptr2, but it hasn't written
anything there, so either garbage or a zero page is fine for it.

Also, you are adding new instructions to fork, which is a very frequent
syscall, so I'd like to find another way that avoids adding instructions to
such a hot path.

I will send a different patch. Please review it.

So, my suggestion is below. It always makes the pte dirty, so let's Cc
Cyrill to take care of softdirty, and Hugh, who is Mr. Swap.

From 30c6d5b35a3dc7e451041183ce5efd6a6c42bf88 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@...nel.org>
Date: Tue, 3 Mar 2015 10:06:59 +0900
Subject: [RFC] mm: make every pte dirty on do_swap_page

Basically, MADV_FREE relies on pte dirty to decide whether the
VM should discard a page or not. However, a swapped-in page doesn't
have pte_dirty. Instead, MADV_FREE checks PageDirty and PageSwapCache
for such a page, because a swapped-in page can either live in the
swapcache or have PageDirty set when it is removed from the swapcache,
so MADV_FREE checks those and doesn't discard.

The problem here is that any anonymous page can have PageDirty if
it has been removed from the swapcache, so the VM cannot treat those
pages as freeable even if we did madvise_free. Look at the example below.

ptr = malloc();
memset(ptr);
..
heavy memory pressure -> swap out all of the pages
..
memory pressure is gone
..
var = *ptr; -> swap in the page and remove it from swapcache, so
               pte_clean but SetPageDirty

madvise_free(ptr);
..
..
heavy memory pressure -> VM cannot discard the page because of PageDirty.

PageDirty for an anonymous page is meant to avoid duplicate
swap-out. In other words, if a page has been swapped in but still
lives in the swapcache (ie, !PageDirty), we can skip the swap-out
if the VM later selects the page as a victim, because the swap device
still holds the previously swapped-out contents of the page.

So, rather than relying on PG_dirty to make madvise_free work,
pte_dirty is more straightforward.
Inherently, a swapped-out page was pte_dirty, so this patch restores
the dirtiness when the swap-in fault happens, and madvise_free no
longer relies on PageDirty.

Signed-off-by: Minchan Kim <minchan@...nel.org>
---
 mm/madvise.c | 1 -
 mm/memory.c  | 9 +++++++--
 mm/rmap.c    | 2 +-
 mm/vmscan.c  | 3 +--
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6d0fcb8..d64200e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -309,7 +309,6 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				continue;
 			}
 
-			ClearPageDirty(page);
 			unlock_page(page);
 		}
 
diff --git a/mm/memory.c b/mm/memory.c
index 8ae52c9..2f45e77 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2460,9 +2460,14 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	inc_mm_counter_fast(mm, MM_ANONPAGES);
 	dec_mm_counter_fast(mm, MM_SWAPENTS);
-	pte = mk_pte(page, vma->vm_page_prot);
+
+	/*
+	 * Every swapped-out page was pte_dirty, so we make the pte dirty
+	 * again. MADV_FREE relies on it.
+	 */
+	pte = pte_mkdirty(mk_pte(page, vma->vm_page_prot));
 	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
-		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
+		pte = maybe_mkwrite(pte, vma);
 		flags &= ~FAULT_FLAG_WRITE;
 		ret |= VM_FAULT_WRITE;
 		exclusive = 1;
diff --git a/mm/rmap.c b/mm/rmap.c
index 47b3ba8..34c1d66 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1268,7 +1268,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 
 		if (flags & TTU_FREE) {
 			VM_BUG_ON_PAGE(PageSwapCache(page), page);
-			if (!dirty && !PageDirty(page)) {
+			if (!dirty) {
 				/* It's a freeable page by MADV_FREE */
 				dec_mm_counter(mm, MM_ANONPAGES);
 				goto discard;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 671e47e..7f520c9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -805,8 +805,7 @@ static enum page_references page_check_references(struct page *page,
 		return PAGEREF_KEEP;
 	}
 
-	if (PageAnon(page) && !pte_dirty && !PageSwapCache(page) &&
-			!PageDirty(page))
+	if (PageAnon(page) && !pte_dirty && !PageSwapCache(page))
 		*freeable = true;
 
 	/* Reclaim if clean, defer dirty pages to writeback */
-- 
1.9.3

-- 
Kind regards,
Minchan Kim