linux-kernel - [PATCH v2] mm, hugetlb: set PageLRU for in-use/active hugepages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150217093153.GA12875@hori1.linux.bs1.fc.nec.co.jp>
Date:	Tue, 17 Feb 2015 09:32:08 +0000
From:	Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
CC:	Hugh Dickins <hughd@...gle.com>, Michal Hocko <mhocko@...e.cz>,
	Mel Gorman <mgorman@...e.de>,
	Johannes Weiner <hannes@...xchg.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Naoya Horiguchi <nao.horiguchi@...il.com>
Subject: [PATCH v2] mm, hugetlb: set PageLRU for in-use/active hugepages

On Tue, Feb 17, 2015 at 03:22:45AM +0000, Horiguchi Naoya(堀口 直也) wrote:
> Currently we are not safe from concurrent calls of isolate_huge_page(),
> which can make the victim hugepage in invalid state and results in BUG_ON().
> 
> The root problem of this is that we don't have any information on struct page
> (so easily accessible) about the hugepage's activeness. Note that hugepages'
> activeness means just being linked to hstate->hugepage_activelist, which is
> not the same as normal pages' activeness represented by PageActive flag.
> 
> Normal pages are isolated by isolate_lru_page() which prechecks PageLRU before
> isolation, so let's do similarly for hugetlb. PageLRU is unused on hugetlb,
> so this change is mostly straightforward. One non-straightforward point is that
> __put_compound_page() calls __page_cache_release() to do some LRU works,
> but this is obviously for thps and assumes that hugetlb has always !PageLRU.
> This assumption is no more true, so this patch simply adds if (!PageHuge) to
> avoid calling __page_cache_release() for hugetlb.
> 
> Set/ClearPageLRU should be called within hugetlb_lock, but hugetlb_cow() and
> hugetlb_no_page() don't do this. This is justified because in these function
> SetPageLRU is called right after the hugepage is allocated and no other thread
> tries to isolate it.
> 
> Fixes: commit 31caf665e666 ("mm: migrate: make core migration code aware of hugepage")
> Signed-off-by: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
> Cc: <stable@...r.kernel.org>        [3.12+]

Sorry, my testing was not enough and I found a bug in soft offline code.
Here is the updated one.

Thanks,
Naoya Horiguchi
----
From e69950011360f624e08712de4d541c7d686d6296 Mon Sep 17 00:00:00 2001
From: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
Date: Mon, 16 Feb 2015 18:33:35 +0900
Subject: [PATCH v2] mm, hugetlb: set PageLRU for in-use/active hugepages

Currently we are not safe from concurrent calls of isolate_huge_page(),
which can make the victim hugepage in invalid state and results in BUG_ON().

The root problem of this is that we don't have any information on struct page
(so easily accessible) about the hugepage's activeness. Note that hugepages'
activeness means just being linked to hstate->hugepage_activelist, which is
not the same as normal pages' activeness represented by PageActive flag.

Normal pages are isolated by isolate_lru_page() which prechecks PageLRU before
isolation, so let's do similarly for hugetlb. PageLRU is unused on hugetlb now,
so the change is mostly just inserting Set/ClearPageLRU (no conflict with
current usage.) And the other changes are justified like below:
- __put_compound_page() calls __page_cache_release() to do some LRU works,
  but this is obviously for thps and assumes that hugetlb has always !PageLRU.
  This assumption is not true any more, so this patch simply adds if (!PageHuge)
  to avoid calling __page_cache_release() for hugetlb.
- soft_offline_huge_page() now just calls list_move(), but generally callers
  of page migration should use the common routine in isolation, so let's
  replace the list_move() with isolate_huge_page() rather than inserting
  ClearPageLRU.

Set/ClearPageLRU should be called within hugetlb_lock, but hugetlb_cow() and
hugetlb_no_page() don't do this. This is justified because in these function
SetPageLRU is called right after the hugepage is allocated and no other thread
tries to isolate it.

Fixes: commit 31caf665e666 ("mm: migrate: make core migration code aware of hugepage")
Signed-off-by: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
Cc: <stable@...r.kernel.org>        [3.12+]
---
ChangeLog v1->v2:
- call isolate_huge_page() in soft_offline_huge_page() instead of list_move()
---
 mm/hugetlb.c        | 17 ++++++++++++++---
 mm/memory-failure.c | 14 ++++++++++++--
 mm/swap.c           |  4 +++-
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a2bfd02e289f..e28489270d9a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -830,7 +830,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
 		page[i].flags &= ~(1 << PG_locked | 1 << PG_error |
 				1 << PG_referenced | 1 << PG_dirty |
 				1 << PG_active | 1 << PG_private |
-				1 << PG_writeback);
+				1 << PG_writeback | 1 << PG_lru);
 	}
 	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
 	set_compound_page_dtor(page, NULL);
@@ -875,6 +875,7 @@ void free_huge_page(struct page *page)
 	ClearPagePrivate(page);
 
 	spin_lock(&hugetlb_lock);
+	ClearPageLRU(page);
 	hugetlb_cgroup_uncharge_page(hstate_index(h),
 				     pages_per_huge_page(h), page);
 	if (restore_reserve)
@@ -2889,6 +2890,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 	copy_user_huge_page(new_page, old_page, address, vma,
 			    pages_per_huge_page(h));
 	__SetPageUptodate(new_page);
+	SetPageLRU(new_page);
 
 	mmun_start = address & huge_page_mask(h);
 	mmun_end = mmun_start + huge_page_size(h);
@@ -3001,6 +3003,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		}
 		clear_huge_page(page, address, pages_per_huge_page(h));
 		__SetPageUptodate(page);
+		SetPageLRU(page);
 
 		if (vma->vm_flags & VM_MAYSHARE) {
 			int err;
@@ -3794,6 +3797,7 @@ int dequeue_hwpoisoned_huge_page(struct page *hpage)
 		 * so let it point to itself with list_del_init().
 		 */
 		list_del_init(&hpage->lru);
+		ClearPageLRU(hpage);
 		set_page_refcounted(hpage);
 		h->free_huge_pages--;
 		h->free_huge_pages_node[nid]--;
@@ -3806,11 +3810,17 @@ int dequeue_hwpoisoned_huge_page(struct page *hpage)
 
 bool isolate_huge_page(struct page *page, struct list_head *list)
 {
+	bool ret = true;
+
 	VM_BUG_ON_PAGE(!PageHead(page), page);
-	if (!get_page_unless_zero(page))
-		return false;
 	spin_lock(&hugetlb_lock);
+	if (!PageLRU(page) || !get_page_unless_zero(page)) {
+		ret = false;
+		goto unlock;
+	}
+	ClearPageLRU(page);
 	list_move_tail(&page->lru, list);
+unlock:
 	spin_unlock(&hugetlb_lock);
 	return true;
 }
@@ -3819,6 +3829,7 @@ void putback_active_hugepage(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 	spin_lock(&hugetlb_lock);
+	SetPageLRU(page);
 	list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
 	spin_unlock(&hugetlb_lock);
 	put_page(page);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 20c29ddff17b..50ccfdbda214 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1547,8 +1547,18 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	}
 	unlock_page(hpage);
 
-	/* Keep page count to indicate a given hugepage is isolated. */
-	list_move(&hpage->lru, &pagelist);
+	ret = isolate_huge_page(hpage, &pagelist);
+	if (ret) {
+		/*
+		 * get_any_page() already takes another refcount, so drop one
+		 * here.
+		 */
+		put_page(hpage);
+	} else {
+		pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
+		return -EBUSY;
+	}
+
 	ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
 				MIGRATE_SYNC, MR_MEMORY_FAILURE);
 	if (ret) {
diff --git a/mm/swap.c b/mm/swap.c
index 8a12b33936b4..ea8fe72999a8 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -31,6 +31,7 @@
 #include <linux/memcontrol.h>
 #include <linux/gfp.h>
 #include <linux/uio.h>
+#include <linux/hugetlb.h>
 
 #include "internal.h"
 
@@ -75,7 +76,8 @@ static void __put_compound_page(struct page *page)
 {
 	compound_page_dtor *dtor;
 
-	__page_cache_release(page);
+	if (!PageHuge(page))
+		__page_cache_release(page);
 	dtor = get_compound_page_dtor(page);
 	(*dtor)(page);
 }
-- 
1.9.3