[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <151835937752.185602.5640977700089242532.stgit@buzz>
Date: Sun, 11 Feb 2018 17:29:37 +0300
From: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Cc: Michal Hocko <mhocko@...e.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Nicholas Piggin <npiggin@...il.com>
Subject: [PATCH v2] mm/huge_memory.c: reorder operations in
__split_huge_page_tail()
THP split makes non-atomic change of tail page flags. This is almost ok
because tail pages are locked and isolated but this breaks recent changes
in page locking: non-atomic operation could clear bit PG_waiters.
As a result concurrent sequence get_page_unless_zero() -> lock_page()
might block forever. Especially if this page was truncated later.
Fix is trivial: clone flags before unfreezing page reference counter.
This race exists since commit 62906027091f ("mm: add PageWaiters indicating
tasks are waiting for a page bit") while unsave unfreeze itself was added
in commit 8df651c7059e ("thp: cleanup split_huge_page()").
clear_compound_head() also must be called before unfreezing page reference
because successful get_page_unless_zero() must stabilize compound_head().
And replace page_ref_inc()/page_ref_add() with page_ref_unfreeze() which
is made especially for that and has semantic of smp_store_release().
Signed-off-by: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
---
mm/huge_memory.c | 34 +++++++++++++---------------------
1 file changed, 13 insertions(+), 21 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 87ab9b8f56b5..fa577aa7ecd8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2355,26 +2355,11 @@ static void __split_huge_page_tail(struct page *head, int tail,
struct page *page_tail = head + tail;
VM_BUG_ON_PAGE(atomic_read(&page_tail->_mapcount) != -1, page_tail);
- VM_BUG_ON_PAGE(page_ref_count(page_tail) != 0, page_tail);
/*
- * tail_page->_refcount is zero and not changing from under us. But
- * get_page_unless_zero() may be running from under us on the
- * tail_page. If we used atomic_set() below instead of atomic_inc() or
- * atomic_add(), we would then run atomic_set() concurrently with
- * get_page_unless_zero(), and atomic_set() is implemented in C not
- * using locked ops. spin_unlock on x86 sometime uses locked ops
- * because of PPro errata 66, 92, so unless somebody can guarantee
- * atomic_set() here would be safe on all archs (and not only on x86),
- * it's safer to use atomic_inc()/atomic_add().
+ * Clone page flags before unfreezing refcount.
+ * lock_page() after speculative get wants to set PG_waiters.
*/
- if (PageAnon(head) && !PageSwapCache(head)) {
- page_ref_inc(page_tail);
- } else {
- /* Additional pin to radix tree */
- page_ref_add(page_tail, 2);
- }
-
page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
page_tail->flags |= (head->flags &
((1L << PG_referenced) |
@@ -2387,14 +2372,21 @@ static void __split_huge_page_tail(struct page *head, int tail,
(1L << PG_unevictable) |
(1L << PG_dirty)));
- /*
- * After clearing PageTail the gup refcount can be released.
- * Page flags also must be visible before we make the page non-compound.
- */
+ /* Page flags must be visible before we make the page non-compound. */
smp_wmb();
+ /*
+ * Clear PageTail before unfreezing page refcount:
+ * speculative refcount must stabilize compound_head().
+ */
clear_compound_head(page_tail);
+ /*
+ * Finally unfreeze refcount. Additional pin to radix tree.
+ */
+ page_ref_unfreeze(page_tail, 1 + (!PageAnon(head) ||
+ PageSwapCache(head)));
+
if (page_is_young(head))
set_page_young(page_tail);
if (page_is_idle(head))
Powered by blists - more mailing lists