Message-Id: <20210807032521.7591-4-peterx@redhat.com>
Date: Fri, 6 Aug 2021 23:25:20 -0400
From: Peter Xu <peterx@...hat.com>
To: linux-kernel@...r.kernel.org, linux-mm@...ck.org
Cc: Alistair Popple <apopple@...dia.com>,
Tiberiu Georgescu <tiberiu.georgescu@...anix.com>,
ivan.teterevkov@...anix.com,
Mike Rapoport <rppt@...ux.vnet.ibm.com>,
Hugh Dickins <hughd@...gle.com>, peterx@...hat.com,
Matthew Wilcox <willy@...radead.org>,
Andrea Arcangeli <aarcange@...hat.com>,
David Hildenbrand <david@...hat.com>,
"Kirill A . Shutemov" <kirill@...temov.name>,
Andrew Morton <akpm@...ux-foundation.org>,
Mike Kravetz <mike.kravetz@...cle.com>
Subject: [PATCH RFC 3/4] mm: Handle PTE_MARKER page faults
handle_pte_marker() is the function that parses and handles all the pte
marker faults.  For the PAGEOUT marker, handling is as simple as dropping
the pte and servicing the fault just as we would for a none pte.

The alternative would be to clear the pte back to a none pte and retry the
fault; however, that would be slower than handling the marker right away.
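
For reference, the helpers used below (is_pte_marker_entry() and
pte_marker_get()) are introduced by the earlier patches in this series.
A minimal sketch of their assumed shape (the SWP_PTE_MARKER swap type and
PTE_MARKER_MASK layout are illustrative here; only the names and call
signatures match what this patch relies on):

    static inline bool is_pte_marker_entry(swp_entry_t entry)
    {
            /* Assumed: markers are carried in a dedicated swap type */
            return swp_type(entry) == SWP_PTE_MARKER;
    }

    static inline unsigned long pte_marker_get(swp_entry_t entry)
    {
            /* Assumed: the marker bits live in the swap offset */
            return swp_offset(entry) & PTE_MARKER_MASK;
    }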
Signed-off-by: Peter Xu <peterx@...hat.com>
---
mm/memory.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/mm/memory.c b/mm/memory.c
index 7288f585544a..47f8ca064459 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -98,6 +98,8 @@ struct page *mem_map;
 EXPORT_SYMBOL(mem_map);
 #endif
 
+static vm_fault_t do_fault(struct vm_fault *vmf);
+
 /*
  * A number of key systems in x86 including ioremap() rely on the assumption
  * that high_memory defines the upper bound on direct map memory, then end
@@ -1394,6 +1396,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 
 				put_page(page);
 				continue;
+			} else if (is_pte_marker_entry(entry)) {
+				/* Drop PTE_MARKER_PAGEOUT when zapped */
+				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+				continue;
 			}
 
 			/* If details->check_mapping, we leave swap entries. */
@@ -3467,6 +3473,39 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	return 0;
 }
 
+/*
+ * This function parses and handles the pte marker faults.  It returns the
+ * vm_fault_t of handling the fault directly; any marker we cannot handle
+ * fails the fault with VM_FAULT_SIGBUS.
+ */
+static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
+{
+	swp_entry_t entry = pte_to_swp_entry(vmf->orig_pte);
+	unsigned long marker;
+
+	marker = pte_marker_get(entry);
+
+	/*
+	 * PTE markers should always be used with file-backed memories, and
+	 * the marker should never be empty.  If anything weird happens, the
+	 * best thing to do is to kill the process along with its mm.
+	 */
+	if (WARN_ON_ONCE(vma_is_anonymous(vmf->vma) || !marker))
+		return VM_FAULT_SIGBUS;
+
+#ifdef CONFIG_PTE_MARKER_PAGEOUT
+	if (marker == PTE_MARKER_PAGEOUT)
+		/*
+		 * This pte was previously zapped for pageout; the marker is
+		 * only a flag until the pte is accessed again.  Safe to drop
+		 * it now and handle the fault as if the pte were none.
+		 */
+		return do_fault(vmf);
+#endif
+
+	/* We see some marker that we can't handle */
+	return VM_FAULT_SIGBUS;
+}
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -3503,6 +3542,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
+		} else if (is_pte_marker_entry(entry)) {
+			ret = handle_pte_marker(vmf);
 		} else {
 			print_bad_pte(vma, vmf->address, vmf->orig_pte, NULL);
 			ret = VM_FAULT_SIGBUS;
--
2.32.0
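
To make the control flow above concrete, here is a hedged, user-space mock
of handle_pte_marker()'s decision logic.  handle_marker(), the fault_result
values, and the PTE_MARKER_PAGEOUT bit are stand-ins invented for
illustration; only the branching mirrors the patch:

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-in; not the kernel's definition. */
#define PTE_MARKER_PAGEOUT	(1UL << 0)

enum fault_result { FAULT_DONE, FAULT_SIGBUS };

static enum fault_result handle_marker(unsigned long marker, bool vma_is_anon)
{
	/* Markers are only expected on file-backed vmas, never empty. */
	if (vma_is_anon || !marker)
		return FAULT_SIGBUS;

	/* PAGEOUT: treat the pte as none and redo the file fault. */
	if (marker == PTE_MARKER_PAGEOUT)
		return FAULT_DONE;

	/* Any marker we don't understand fails the fault. */
	return FAULT_SIGBUS;
}

int main(void)
{
	printf("PAGEOUT on file-backed vma: %s\n",
	       handle_marker(PTE_MARKER_PAGEOUT, false) == FAULT_DONE ?
	       "handled like a none pte" : "SIGBUS");
	printf("empty marker:               %s\n",
	       handle_marker(0, false) == FAULT_DONE ?
	       "handled" : "SIGBUS");
	return 0;
}

Compiled with any C compiler, this prints that a PAGEOUT marker on a
file-backed vma is serviced like a none-pte fault, while an empty or
unknown marker yields SIGBUS, matching the WARN_ON_ONCE path above.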