lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150330052250.GA3008@blaptop>
Date:	Mon, 30 Mar 2015 14:22:50 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Michal Hocko <mhocko@...e.cz>,
	Johannes Weiner <hannes@...xchg.org>,
	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	Shaohua Li <shli@...nel.org>, Yalin.Wang@...ymobile.com,
	Hugh Dickins <hughd@...gle.com>,
	Cyrill Gorcunov <gorcunov@...il.com>,
	Pavel Emelyanov <xemul@...allels.com>
Subject: Re: [PATCH 4/4] mm: make every pte dirty on do_swap_page

2nd description trial.

>From ccfc6c79634f6cec69d8fb23b0e863ebfa5b893c Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@...nel.org>
Date: Mon, 30 Mar 2015 13:43:08 +0900
Subject: [PATCH v2] mm: make every pte dirty on do_swap_page

Bascially, MADV_FREE relys on the dirty bit in page table entry
to decide whether VM allows to discard the page or not.
IOW, if page table entry includes marked dirty bit, VM shouldn't
discard the page.

However, if swap-in by read fault happens, page table entry
point out the page doesn't have marked dirty bit so MADV_FREE
might discard the page wrongly. For avoiding the problem,
MADV_FREE did more checks with PageDirty and PageSwapCache.
It worked out because swapped-in page lives on swap cache
and since it was evicted from the swap cache, the page has
PG_dirty flag. So both page flags checks effectvely prevent
wrong discarding by MADV_FREE.

A problem in above logic is that swapped-in page has PG_dirty
since they are removed from swap cache so VM cannot consider
those pages as freeable any more alghouth madvise_free is
called in future. Look at below example for detail.

ptr = malloc();
memset(ptr);
..
..
.. heavy memory pressure so all of pages are swapped out
..
..
var = *ptr; -> a page swapped-in and removed from swapcache.
               page table doesn't mark dirty bit and page
               descriptor includes PG_dirty
..
..
madvise_free(ptr);
..
..
..
.. heavy memory pressure again.
.. In this time, VM cannot discard the page because the page
.. has *PG_dirty*

Rather than relying on the PG_dirty of page descriptor
for preventing discarding a page, dirty bit in page table is more
straightforward and simple. So, this patch makes page table dirty
bit marked whenever swap-in happens. Inherenty, page table entry
point out swapped-out page had dirty bit so I think it's no prblem.

With this, it removes complicated logic and makes freeable page
checking by madvise_free simple. Of course, we could solve
above mentioned example.

Cc: Hugh Dickins <hughd@...gle.com>
Cc: Cyrill Gorcunov <gorcunov@...il.com>
Cc: Pavel Emelyanov <xemul@...allels.com>
Reported-by: Yalin Wang <yalin.wang@...ymobile.com>
Signed-off-by: Minchan Kim <minchan@...nel.org>
---

* From v1:
  * Rewrite description - Andrew

 mm/madvise.c |  1 -
 mm/memory.c  | 10 ++++++++--
 mm/rmap.c    |  2 +-
 mm/vmscan.c  |  3 +--
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 22e8f0c..a045798 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -325,7 +325,6 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				continue;
 			}
 
-			ClearPageDirty(page);
 			unlock_page(page);
 		}
 
diff --git a/mm/memory.c b/mm/memory.c
index 6743966..48ff537 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2521,9 +2521,15 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	inc_mm_counter_fast(mm, MM_ANONPAGES);
 	dec_mm_counter_fast(mm, MM_SWAPENTS);
-	pte = mk_pte(page, vma->vm_page_prot);
+
+	/*
+	 * The page is swapping in now was dirty before it was swapped out
+	 * so restore the state again(ie, pte_mkdirty) because MADV_FREE
+	 * relies on the dirty bit on page table.
+	 */
+	pte = pte_mkdirty(mk_pte(page, vma->vm_page_prot));
 	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
-		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
+		pte = maybe_mkwrite(pte, vma);
 		flags &= ~FAULT_FLAG_WRITE;
 		ret |= VM_FAULT_WRITE;
 		exclusive = 1;
diff --git a/mm/rmap.c b/mm/rmap.c
index dad23a4..281e806 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1275,7 +1275,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 
 		if (flags & TTU_FREE) {
 			VM_BUG_ON_PAGE(PageSwapCache(page), page);
-			if (!dirty && !PageDirty(page)) {
+			if (!dirty) {
 				/* It's a freeable page by MADV_FREE */
 				dec_mm_counter(mm, MM_ANONPAGES);
 				goto discard;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dc6cd51..fffebf0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -805,8 +805,7 @@ static enum page_references page_check_references(struct page *page,
 		return PAGEREF_KEEP;
 	}
 
-	if (PageAnon(page) && !pte_dirty && !PageSwapCache(page) &&
-			!PageDirty(page))
+	if (PageAnon(page) && !pte_dirty && !PageSwapCache(page))
 		*freeable = true;
 
 	/* Reclaim if clean, defer dirty pages to writeback */
-- 
1.9.3

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ