Message-ID: <20250210193801.781278-9-david@redhat.com>
Date: Mon, 10 Feb 2025 20:37:50 +0100
From: David Hildenbrand <david@...hat.com>
To: linux-kernel@...r.kernel.org
Cc: linux-doc@...r.kernel.org,
dri-devel@...ts.freedesktop.org,
linux-mm@...ck.org,
nouveau@...ts.freedesktop.org,
linux-trace-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org,
damon@...ts.linux.dev,
David Hildenbrand <david@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Jérôme Glisse <jglisse@...hat.com>,
Jonathan Corbet <corbet@....net>,
Alex Shi <alexs@...nel.org>,
Yanteng Si <si.yanteng@...ux.dev>,
Karol Herbst <kherbst@...hat.com>,
Lyude Paul <lyude@...hat.com>,
Danilo Krummrich <dakr@...nel.org>,
David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>,
Masami Hiramatsu <mhiramat@...nel.org>,
Oleg Nesterov <oleg@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
SeongJae Park <sj@...nel.org>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Jann Horn <jannh@...gle.com>,
Pasha Tatashin <pasha.tatashin@...een.com>,
Peter Xu <peterx@...hat.com>,
Alistair Popple <apopple@...dia.com>,
Jason Gunthorpe <jgg@...dia.com>
Subject: [PATCH v2 08/17] kernel/events/uprobes: handle device-exclusive entries correctly in __replace_page()
Ever since commit b756a3b5e7ea ("mm: device exclusive memory access"),
page_vma_mapped_walk() can return with a device-exclusive entry.
__replace_page() is not prepared for that, so teach it about these
PFN swap PTEs. Note that device-private entries are so far not
applicable on that path, because GUP would never have returned such
folios (conversion to device-private happens by page migration, not
in-place conversion of the PTE).
There is a race between GUP and us locking the folio to look it up
using page_vma_mapped_walk(), so this is likely a fix (unless something
else prevents that race, but it does not look like it). pte_pfn() on
something that is not a present PTE could give us garbage, and we would
wrongly mess up the mapcount, because it was already adjusted by calling
folio_remove_rmap_pte() when making the entry device-exclusive.
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: David Hildenbrand <david@...hat.com>
---
kernel/events/uprobes.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2ca797cbe465f..cd6105b100325 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -173,6 +173,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
DEFINE_FOLIO_VMA_WALK(pvmw, old_folio, vma, addr, 0);
int err;
struct mmu_notifier_range range;
+ pte_t pte;
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, addr,
addr + PAGE_SIZE);
@@ -192,6 +193,16 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
if (!page_vma_mapped_walk(&pvmw))
goto unlock;
VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
+ pte = ptep_get(pvmw.pte);
+
+ /*
+ * Handle PFN swap PTEs, such as device-exclusive ones, that actually
+ * map pages: simply trigger GUP again to fix it up.
+ */
+ if (unlikely(!pte_present(pte))) {
+ page_vma_mapped_walk_done(&pvmw);
+ goto unlock;
+ }
if (new_page) {
folio_get(new_folio);
@@ -206,7 +217,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
inc_mm_counter(mm, MM_ANONPAGES);
}
- flush_cache_page(vma, addr, pte_pfn(ptep_get(pvmw.pte)));
+ flush_cache_page(vma, addr, pte_pfn(pte));
ptep_clear_flush(vma, addr, pvmw.pte);
if (new_page)
set_pte_at(mm, addr, pvmw.pte,
--
2.48.1