Message-Id: <20240205191843.4009640-2-leitao@debian.org>
Date: Mon, 5 Feb 2024 11:18:41 -0800
From: Breno Leitao <leitao@...ian.org>
To: mike.kravetz@...cle.com,
linux-mm@...ck.org,
akpm@...ux-foundation.org,
muchun.song@...ux.dev
Cc: lstoakes@...il.com,
willy@...radead.org,
hannes@...xchg.org,
mhocko@...nel.org,
roman.gushchin@...ux.dev,
linux-kernel@...r.kernel.org,
Rik van Riel <riel@...riel.com>
Subject: [PATCH v2 1/2] mm/hugetlb: Restore the reservation if needed
Currently there is a bug where a huge page can be stolen, and when the
original owner later tries to fault it in, the fault fails with a
SIGBUS.
You can reproduce it with the following steps (a code sketch follows
the list):
1) Create a single huge page:
	echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
2) mmap() the page above with MAP_HUGETLB into (void *ptr1).
   * This will mark the page as reserved.
3) Touch the page, which causes a page fault and allocates the page.
   * This will move the page off the free list.
   * It will also unreserve the page, since there are no free pages
     left.
4) madvise(MADV_DONTNEED) the page.
   * This will free the page, but it is not marked as reserved again.
5) Allocate a second page with mmap(MAP_HUGETLB) into (void *ptr2).
   * This should fail, since no page is available anymore.
   * But, because the page freed above is not reserved, this mmap()
     succeeds.
6) Fault at ptr1, which causes a SIGBUS.
   * The kernel tries to allocate a huge page, but none is
     available.
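For illustration, here is a minimal sketch of such a reproducer in C.
It assumes a 2MB default huge page size and that step 1 above was
already done; the HPAGE_SIZE and FLAGS macros are local to this
sketch, and the linked selftest remains the authoritative version:

	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	#define HPAGE_SIZE	(2UL * 1024 * 1024)	/* assumes 2MB huge pages */
	#define FLAGS		(MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)

	int main(void)
	{
		void *ptr1, *ptr2;

		/* Step 2: map the single free huge page; this reserves it */
		ptr1 = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
			    FLAGS, -1, 0);
		if (ptr1 == MAP_FAILED) {
			perror("mmap ptr1");
			return 1;
		}

		/* Step 3: fault the page in, taking it off the free list */
		memset(ptr1, 0, HPAGE_SIZE);

		/* Step 4: free the page; buggy kernels do not restore the reserve */
		madvise(ptr1, HPAGE_SIZE, MADV_DONTNEED);

		/* Step 5: should fail with ENOMEM, but succeeds on buggy kernels */
		ptr2 = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
			    FLAGS, -1, 0);
		if (ptr2 != MAP_FAILED)
			memset(ptr2, 0, HPAGE_SIZE);	/* steals the only page */

		/* Step 6: on a buggy kernel this fault raises SIGBUS */
		memset(ptr1, 0, HPAGE_SIZE);

		printf("no SIGBUS, kernel looks fixed\n");
		return 0;
	}

On a fixed kernel, the mmap() in step 5 fails with ENOMEM and the
final write to ptr1 succeeds.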
A full reproducer is available in the selftests. See
https://lore.kernel.org/all/20240105155419.1939484-1-leitao@debian.org/
Fix this by restoring the reservation if needed.
These are the conditions for restoring the page:
 * The system is not using surplus pages. The goal is to reduce the
   surplus usage for this case.
 * The VMA has the HPAGE_RESV_OWNER flag set and is PRIVATE, which is
   safely checked using __vma_private_lock().
 * The page is anonymous.
Once this scenario is found, set the `hugetlb_restore_reserve` bit in
the folio. Then check whether the resv reservations need to be
adjusted; do that later, after the spinlock is dropped, since
vma_xxxx_reservation() might touch the file system lock.
Suggested-by: Rik van Riel <riel@...riel.com>
Signed-off-by: Breno Leitao <leitao@...ian.org>
---
mm/hugetlb.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ed1581b670d4..44f1e6366d04 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5585,6 +5585,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	struct page *page;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
+	bool adjust_reservation = false;
 	unsigned long last_addr_mask;
 	bool force_flush = false;
@@ -5677,7 +5678,31 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		hugetlb_count_sub(pages_per_huge_page(h), mm);
 		hugetlb_remove_rmap(page_folio(page));
 
+		/*
+		 * Restore the reservation for the anonymous page, otherwise
+		 * the backing page could be stolen by someone.
+		 * If we are freeing a surplus page, do not set the restore
+		 * reservation bit.
+		 */
+		if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
+		    folio_test_anon(page_folio(page))) {
+			folio_set_hugetlb_restore_reserve(page_folio(page));
+			/* Reservation will be adjusted after the spin lock */
+			adjust_reservation = true;
+		}
+
 		spin_unlock(ptl);
+
+		/*
+		 * Adjust the reservation for the region that will have the
+		 * reserve restored. Keep in mind that vma_needs_reservation()
+		 * changes resv->adds_in_progress if it succeeds. If this is
+		 * not done, do_exit() will not see it, and will keep the
+		 * reservation forever.
+		 */
+		if (adjust_reservation && vma_needs_reservation(h, vma, address))
+			vma_add_reservation(h, vma, address);
+
 		tlb_remove_page_size(tlb, page, huge_page_size(h));
 		/*
 		 * Bail out after unmapping reference page if supplied
--
2.34.1