Message-ID: <20240307010250.3847179-1-jthoughton@google.com>
Date: Thu, 7 Mar 2024 01:02:50 +0000
From: James Houghton <jthoughton@...gle.com>
To: Peter Xu <peterx@...hat.com>, Axel Rasmussen <axelrasmussen@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: Muchun Song <songmuchun@...edance.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, James Houghton <jthoughton@...gle.com>
Subject: [PATCH v2] mm: Add an explicit smp_wmb() to UFFDIO_CONTINUE

Users of UFFDIO_CONTINUE may reasonably assume that a write memory
barrier is included as part of UFFDIO_CONTINUE. That is, a user may
believe that all writes it has done to a page that it is now
UFFDIO_CONTINUE'ing are guaranteed to be visible to anyone subsequently
reading the page through the newly mapped virtual memory region.

Today, such a user happens to be correct. mmget_not_zero(), for example,
is called as part of UFFDIO_CONTINUE (and comes before any PTE updates),
and it implicitly gives us a write barrier.

To be resilient against future changes, include an explicit smp_wmb().
While we're at it, optimize the smp_wmb() that is already incidentally
present for the HugeTLB case.

Merely making a syscall does not generally imply the memory ordering
constraints that we need (including on x86).

Signed-off-by: James Houghton <jthoughton@...gle.com>
---
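For readers who want to see the ordering requirement in isolation, here is
a minimal userspace sketch of the same publish pattern. It is only an
analogy and not kernel code: the names page, mapped, writer, reader and
run_publish_demo are hypothetical, the release fence stands in for
smp_wmb(), and the relaxed pointer store stands in for the set_pte_at()
write. The reader uses an acquire load here, which is an assumption of the
sketch rather than a description of how a page-table walk is ordered.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096

/* Hypothetical stand-ins: "page" is the page contents, "mapped" is the PTE. */
static unsigned char page[DEMO_PAGE_SIZE];
static _Atomic(unsigned char *) mapped;

/* Writer: fill the page, then publish it, as UFFDIO_CONTINUE publishes a PTE. */
static void *writer(void *arg)
{
	(void)arg;
	memset(page, 0xAB, sizeof(page));		/* stores to the page */
	atomic_thread_fence(memory_order_release);	/* role of smp_wmb() */
	atomic_store_explicit(&mapped, page,
			      memory_order_relaxed);	/* role of set_pte_at() */
	return NULL;
}

/* Reader: spin until the "mapping" appears, then read through it. */
static void *reader(void *arg)
{
	unsigned char *p;

	while ((p = atomic_load_explicit(&mapped, memory_order_acquire)) == NULL)
		;
	/* Because of the writer's fence, the filled contents must be visible. */
	*(int *)arg = (p[0] == 0xAB && p[DEMO_PAGE_SIZE - 1] == 0xAB);
	return NULL;
}

/* Runs one writer and one reader; returns 1 if the reader saw the filled page. */
int run_publish_demo(void)
{
	pthread_t w, r;
	int ok = 0;

	memset(page, 0, sizeof(page));
	atomic_store(&mapped, (unsigned char *)NULL);
	pthread_create(&r, NULL, reader, &ok);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return ok;
}
```

Build with -pthread and call run_publish_demo() from a test harness. If the
fence were dropped, nothing in the C11 model would order the memset() before
the pointer store, which is the gap the explicit smp_wmb() closes for
UFFDIO_CONTINUE.
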
 mm/hugetlb.c     | 17 +++++++++++++----
 mm/userfaultfd.c |  9 +++++++++
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bb17e5c22759..23ef240ba48a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6780,11 +6780,20 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 	}
 
 	/*
-	 * The memory barrier inside __folio_mark_uptodate makes sure that
-	 * preceding stores to the page contents become visible before
-	 * the set_pte_at() write.
+	 * If we just allocated a new page, we need a memory barrier to ensure
+	 * that preceding stores to the page become visible before the
+	 * set_pte_at() write. The memory barrier inside __folio_mark_uptodate
+	 * is what we need.
+	 *
+	 * In the case where we have not allocated a new page (is_continue),
+	 * the page must already be uptodate. UFFDIO_CONTINUE already includes
+	 * an earlier smp_wmb() to ensure that prior stores will be visible
+	 * before the set_pte_at() write.
 	 */
-	__folio_mark_uptodate(folio);
+	if (!is_continue)
+		__folio_mark_uptodate(folio);
+	else
+		WARN_ON_ONCE(!folio_test_uptodate(folio));
 
 	/* Add shared, newly allocated pages to the page cache. */
 	if (vm_shared && !is_continue) {
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 503ea77c81aa..712160cd41ec 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -845,6 +845,15 @@ ssize_t mfill_atomic_zeropage(struct userfaultfd_ctx *ctx,
 ssize_t mfill_atomic_continue(struct userfaultfd_ctx *ctx, unsigned long start,
 			      unsigned long len, uffd_flags_t flags)
 {
+
+	/*
+	 * A caller might reasonably assume that UFFDIO_CONTINUE contains an
+	 * smp_wmb() to ensure that any writes to the about-to-be-mapped page by
+	 * the thread doing the UFFDIO_CONTINUE are guaranteed to be visible to
+	 * subsequent loads from the page through the newly mapped address range.
+	 */
+	smp_wmb();
+
 	return mfill_atomic(ctx, start, 0, len,
 			    uffd_flags_set_mode(flags, MFILL_ATOMIC_CONTINUE));
 }

base-commit: f4239a5d7acc1b5ff9bac4d5471000b952279ef0
--
2.44.0.278.ge034bb2e1d-goog