Message-Id: <1421999256-3881-1-git-send-email-ebru.akagunduz@gmail.com>
Date:	Fri, 23 Jan 2015 09:47:36 +0200
From:	Ebru Akagunduz <ebru.akagunduz@...il.com>
To:	linux-mm@...ck.org
Cc:	akpm@...ux-foundation.org, kirill@...temov.name, mhocko@...e.cz,
	mgorman@...e.de, rientjes@...gle.com, sasha.levin@...cle.com,
	hughd@...gle.com, hannes@...xchg.org, vbabka@...e.cz,
	linux-kernel@...r.kernel.org, riel@...hat.com, aarcange@...hat.com,
	Ebru Akagunduz <ebru.akagunduz@...il.com>
Subject: [PATCH] mm: incorporate read-only pages into transparent huge pages

This patch aims to improve THP collapse rates by allowing
THP collapse in the presence of read-only ptes, like those
left in place by do_swap_page after a read fault.

Currently, khugepaged can collapse 4kB pages into a THP
when a 2MB range contains up to khugepaged_max_ptes_none
pte_none ptes, but a single read-only pte prevents the
collapse. This patch applies the same limit to read-only
ptes, counted separately from the pte_none ones.
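The scan-side rule can be sketched as a small userspace model. The enum, the function name and the `patched` flag are hypothetical; `max_ptes_none` stands in for khugepaged_max_ptes_none, and the page-level checks (refcount, LRU, referenced bit) that khugepaged_scan_pmd() also performs are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified PTE states; these are not kernel types. */
enum pte_state { PTE_NONE, PTE_RO, PTE_RW };

/*
 * Model of the per-PTE checks in khugepaged_scan_pmd() for one
 * PMD-sized range. With patched == false, any read-only PTE aborts
 * the scan (old behaviour). With patched == true, read-only PTEs are
 * counted in their own counter and capped at the same limit that
 * applies to empty (pte_none) PTEs.
 */
static bool scan_allows_collapse(const enum pte_state *ptes, size_t n,
                                 int max_ptes_none, bool patched)
{
    int none = 0, ro = 0;

    assert(ptes != NULL);
    for (size_t i = 0; i < n; i++) {
        switch (ptes[i]) {
        case PTE_NONE:
            if (++none > max_ptes_none)
                return false;
            break;
        case PTE_RO:
            if (!patched || ++ro > max_ptes_none)
                return false;
            break;
        case PTE_RW:
            break;
        }
    }
    return true;
}
```

With 4kB pages a 2MB range holds 512 PTEs. With max_ptes_none at its usual default of 511 (HPAGE_PMD_NR - 1), a single read-only PTE blocks collapse under the old rule but not under the patched one; setting the limit to 0 makes read-only PTEs block collapse again.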

The patch was tested with a test program that allocates
800MB of memory, writes to it, and then sleeps. I forced
the system to swap out all but 190MB of the program by
touching other memory. Afterwards, the test program
performed a mix of reads and writes on its memory, which
swapped the memory back in.
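The test program itself is not included in the posting. Below is a minimal sketch of a program with the same shape, assuming Linux mmap()/madvise(); the function names, the one-byte-per-page access pattern and the even/odd read/write split are illustrative choices, not the author's code. Forcing the swap-out by touching other memory is done by a separate process and is not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate an anonymous region and ask for THP backing. */
static unsigned char *alloc_region(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
#ifdef MADV_HUGEPAGE
    /* Advisory only; has no effect if THP is disabled. */
    (void)madvise(p, len, MADV_HUGEPAGE);
#endif
    return p;
}

/* Write every byte so that every page of the region is populated. */
static void fill_region(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)i;
}

/*
 * Mixed phase: read one byte of every page, and additionally write
 * to every other page. Pages that are only read come back from swap
 * through read faults. Returns a checksum of the bytes read.
 */
static uint64_t mixed_access(unsigned char *buf, size_t len, size_t page)
{
    uint64_t sum = 0;

    assert(page > 0);
    for (size_t off = 0, n = 0; off < len; off += page, n++) {
        sum += buf[off];
        if (n % 2 == 0)
            buf[off] = (unsigned char)(buf[off] + 1);
    }
    return sum;
}

/* Whole sequence: allocate, fill, sleep, then mixed reads/writes. */
static int run_test_program(size_t len, unsigned int sleep_secs)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *buf;

    if (page <= 0)
        return -1;
    buf = alloc_region(len);
    if (buf == NULL)
        return -1;
    fill_region(buf, len);
    /* Window in which a separate memory hog pushes the region to swap. */
    sleep(sleep_secs);
    (void)mixed_access(buf, len, (size_t)page);
    munmap(buf, len);
    return 0;
}
```

For an experiment like the one described, run_test_program() would be called with an 800MB length and a sleep long enough for the other process to push most of the region out to swap.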

Without the patch, only the memory that was never swapped
out remained in THPs, which corresponds to 24% of the
program's memory. The percentage did not increase over
time.

With this patch, after 5 minutes of waiting, khugepaged had
collapsed 55% of the program's memory back into THPs.

Signed-off-by: Ebru Akagunduz <ebru.akagunduz@...il.com>
Reviewed-by: Rik van Riel <riel@...hat.com>
---
Detailed test results (Fraction is AnonHugePages as a
percentage of Anonymous in smaps, or of AnonPages in meminfo):

With the patch:
After swap-out:
cat /proc/pid/smaps:
Anonymous:      100352 kB
AnonHugePages:  98304 kB
Swap:           699652 kB
Fraction:       97.95%

cat /proc/meminfo:
AnonPages:      1763732 kB
AnonHugePages:  1716224 kB
Fraction:       97.30%

After swap-in:
After a few seconds:
cat /proc/pid/smaps:
Anonymous:      800004 kB
AnonHugePages:  235520 kB
Swap:           0 kB
Fraction:       29.43%

cat /proc/meminfo:
AnonPages:      2464336 kB
AnonHugePages:  1853440 kB
Fraction:       75.21%

After five minutes:
cat /proc/pid/smaps:
Anonymous:      800004 kB
AnonHugePages:  440320 kB
Swap:           0 kB
Fraction:       55.0%

cat /proc/meminfo:
AnonPages:      2464340 kB
AnonHugePages:  2058240 kB
Fraction:       83.52%

Without the patch:
After swap-out:
cat /proc/pid/smaps:
Anonymous:      190660 kB
AnonHugePages:  190464 kB
Swap:           609344 kB
Fraction:       99.89%

cat /proc/meminfo:
AnonPages:      1740456 kB
AnonHugePages:  1667072 kB
Fraction:       95.78%

After swap-in:
cat /proc/pid/smaps:
Anonymous:      800004 kB
AnonHugePages:  190464 kB
Swap:           0 kB
Fraction:       23.80%

cat /proc/meminfo:
AnonPages:      2350032 kB
AnonHugePages:  1667072 kB
Fraction:       70.93%

Without the patch, the fractions did not change even
after waiting 10 minutes.
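The Fraction values above match AnonHugePages divided by Anonymous (smaps) or AnonPages (meminfo), times 100; for example 440320 / 800004 × 100 ≈ 55.04. Below is a sketch that computes this from the text of /proc/&lt;pid&gt;/smaps or /proc/meminfo, summing each field over all entries (which the smaps totals above presumably are). The helper names are hypothetical, and reading the file into a buffer is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum all "<name>: <value> kB" lines in smaps- or meminfo-style text.
 * The field name must be followed directly by ':', so "Anon" does not
 * match "Anonymous" and "AnonPages" does not match "AnonHugePages".
 * Returns the total in kB.
 */
static unsigned long long sum_field_kb(const char *text, const char *name)
{
    size_t name_len = strlen(name);
    unsigned long long total = 0;
    const char *line = text;

    assert(text != NULL && name != NULL);
    while (*line != '\0') {
        if (strncmp(line, name, name_len) == 0 && line[name_len] == ':')
            total += strtoull(line + name_len + 1, NULL, 10);
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return total;
}

/* Percentage of the total_field value that is backed by huge pages. */
static double thp_fraction(const char *text, const char *total_field)
{
    unsigned long long total = sum_field_kb(text, total_field);
    unsigned long long huge = sum_field_kb(text, "AnonHugePages");

    return total ? 100.0 * (double)huge / (double)total : 0.0;
}
```

Pass "Anonymous" as total_field for smaps text and "AnonPages" for meminfo text.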

 mm/huge_memory.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 817a875..af750d9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2158,7 +2158,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			else
 				goto out;
 		}
-		if (!pte_present(pteval) || !pte_write(pteval))
+		if (!pte_present(pteval))
 			goto out;
 		page = vm_normal_page(vma, address, pteval);
 		if (unlikely(!page))
@@ -2169,7 +2169,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		VM_BUG_ON_PAGE(!PageSwapBacked(page), page);
 
 		/* cannot use mapcount: can't collapse if there's a gup pin */
-		if (page_count(page) != 1)
+		if (page_count(page) != 1 + !!PageSwapCache(page))
 			goto out;
 		/*
 		 * We can do it before isolate_lru_page because the
@@ -2179,6 +2179,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		 */
 		if (!trylock_page(page))
 			goto out;
+		if (!pte_write(pteval)) {
+			if (PageSwapCache(page) && !reuse_swap_page(page)) {
+				unlock_page(page);
+				goto out;
+			}
+			/*
+			 * Page is not in the swap cache, and page count is
+			 * one (see above). It can be collapsed into a THP.
+			 */
+		}
+
 		/*
 		 * Isolate the page to avoid collapsing an hugepage
 		 * currently in use by the VM.
@@ -2550,7 +2561,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
-	int ret = 0, referenced = 0, none = 0;
+	int ret = 0, referenced = 0, none = 0, ro = 0;
 	struct page *page;
 	unsigned long _address;
 	spinlock_t *ptl;
@@ -2573,8 +2584,12 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			else
 				goto out_unmap;
 		}
-		if (!pte_present(pteval) || !pte_write(pteval))
+		if (!pte_present(pteval))
 			goto out_unmap;
+		if (!pte_write(pteval)) {
+			if (++ro > khugepaged_max_ptes_none)
+				goto out_unmap;
+		}
 		page = vm_normal_page(vma, _address, pteval);
 		if (unlikely(!page))
 			goto out_unmap;
@@ -2592,7 +2607,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		if (!PageLRU(page) || PageLocked(page) || !PageAnon(page))
 			goto out_unmap;
 		/* cannot use mapcount: can't collapse if there's a gup pin */
-		if (page_count(page) != 1)
+		if (page_count(page) != 1 + !!PageSwapCache(page))
 			goto out_unmap;
 		if (pte_young(pteval) || PageReferenced(page) ||
 		    mmu_notifier_test_young(vma->vm_mm, address))
-- 
1.9.1
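The refcount reasoning in the __collapse_huge_page_isolate() hunk can be illustrated with a userspace model. A page that do_swap_page() mapped after a read fault is typically still in the swap cache, which holds its own reference, hence the expected count of 1 + !!PageSwapCache(page). struct fake_page and both functions are hypothetical; model_reuse_swap_page() is a simplified paraphrase of reuse_swap_page() as assumed here, omitting KSM, page locking and other details of the real mm/swapfile.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few struct page properties involved. */
struct fake_page {
    int count;      /* like page_count(): all references */
    int mapcount;   /* like page_mapcount(): PTEs mapping the page */
    int swapcount;  /* remaining references to the page's swap entry */
    bool swapcache; /* like PageSwapCache() */
    bool writeback; /* like PageWriteback() */
};

/*
 * Simplified model of reuse_swap_page(): reuse is allowed if at most
 * one user (mapping or swap entry) remains. When exactly one remains
 * and no writeback is in progress, the page leaves the swap cache,
 * which drops the swap cache's reference.
 */
static bool model_reuse_swap_page(struct fake_page *p)
{
    int users = p->mapcount;

    if (users <= 1 && p->swapcache) {
        users += p->swapcount;
        if (users == 1 && !p->writeback) {
            p->swapcache = false;
            p->count--; /* swap cache reference released */
        }
    }
    return users <= 1;
}

/*
 * The per-page checks the patch adds to __collapse_huge_page_isolate():
 * one reference from the mapping plus one from the swap cache, and for
 * a read-only PTE the swap-cache page must be reusable.
 */
static bool model_can_isolate(struct fake_page *p, bool pte_writable)
{
    if (p->count != 1 + !!p->swapcache)
        return false; /* extra pin, e.g. from get_user_pages() */
    if (!pte_writable && p->swapcache && !model_reuse_swap_page(p))
        return false;
    return true;
}
```

In this model a read-only page whose swap entry is still referenced elsewhere (swapcount > 0, for example by a forked child that has not faulted it back in) is rejected, while a page with no other users is dropped from the swap cache and accepted.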
