Message-ID: <20240826204353.2228736-8-peterx@redhat.com>
Date: Mon, 26 Aug 2024 16:43:41 -0400
From: Peter Xu <peterx@...hat.com>
To: linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Cc: Gavin Shan <gshan@...hat.com>,
Catalin Marinas <catalin.marinas@....com>,
x86@...nel.org,
Ingo Molnar <mingo@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Thomas Gleixner <tglx@...utronix.de>,
Alistair Popple <apopple@...dia.com>,
kvm@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
Sean Christopherson <seanjc@...gle.com>,
peterx@...hat.com,
Oscar Salvador <osalvador@...e.de>,
Jason Gunthorpe <jgg@...dia.com>,
Borislav Petkov <bp@...en8.de>,
Zi Yan <ziy@...dia.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
David Hildenbrand <david@...hat.com>,
Yan Zhao <yan.y.zhao@...el.com>,
Will Deacon <will@...nel.org>,
Kefeng Wang <wangkefeng.wang@...wei.com>,
Alex Williamson <alex.williamson@...hat.com>
Subject: [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries
Teach the fork code to properly copy pfnmaps for the pmd/pud levels. The
pud level is much easier; the one catch is that the write bit must be
preserved for writable, shared pud mappings such as PFNMAP ones,
otherwise a follow-up write in either the parent or the child process
will trigger a write fault.
Do the same for the pmd level.
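To make the write-bit semantics concrete, below is a minimal,
self-contained userspace model of the fork-time decision. The ENTRY_*
bits and fork_copy_entry() are made-up stand-ins for the kernel's
pud/pmd helpers; only is_cow_mapping() mirrors the kernel's actual
definition:

/* Userspace model only: VM_* and ENTRY_* values are invented here. */
#include <stdbool.h>
#include <stdio.h>

#define VM_SHARED	0x1UL
#define VM_MAYWRITE	0x2UL

#define ENTRY_WRITE	0x1UL
#define ENTRY_YOUNG	0x2UL

/* Mirrors the kernel's is_cow_mapping(): private, and may gain write. */
static bool is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* What fork does to a huge entry: wrprotect only on the CoW path. */
static unsigned long fork_copy_entry(unsigned long entry,
				     unsigned long vm_flags)
{
	if (is_cow_mapping(vm_flags) && (entry & ENTRY_WRITE))
		entry &= ~ENTRY_WRITE;	/* CoW: both sides fault on write */
	return entry & ~ENTRY_YOUNG;	/* the child's copy starts old */
}

int main(void)
{
	unsigned long entry = ENTRY_WRITE | ENTRY_YOUNG;

	/* Shared pfnmap: write bit survives fork, no spurious fault. */
	printf("shared:  %#lx\n",
	       fork_copy_entry(entry, VM_SHARED | VM_MAYWRITE));
	/* Private (CoW) mapping: write bit is dropped. */
	printf("private: %#lx\n", fork_copy_entry(entry, VM_MAYWRITE));
	return 0;
}

Run as-is, the shared case keeps ENTRY_WRITE so no follow-up write
fault occurs, while the private case drops it so both sides CoW on the
next write.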
Signed-off-by: Peter Xu <peterx@...hat.com>
---
mm/huge_memory.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e2c314f631f3..15418ffdd377 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1559,6 +1559,24 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
pgtable_t pgtable = NULL;
int ret = -ENOMEM;
+ pmd = pmdp_get_lockless(src_pmd);
+ if (unlikely(pmd_special(pmd))) {
+ dst_ptl = pmd_lock(dst_mm, dst_pmd);
+ src_ptl = pmd_lockptr(src_mm, src_pmd);
+ spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
+ /*
+ * No need to recheck the pmd: it can't change while the write
+ * mmap lock is held here.
+ *
+ * Meanwhile, make sure this is not a CoW VMA with a writable
+ * mapping; otherwise either an anon page wrongly had the
+ * special bit applied, or we made a PRIVATE mapping able to
+ * wrongly write to the backing MMIO.
+ */
+ VM_WARN_ON_ONCE(is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd));
+ goto set_pmd;
+ }
+
/* Skip if can be re-fill on fault */
if (!vma_is_anonymous(dst_vma))
return 0;
@@ -1640,7 +1658,9 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
pmdp_set_wrprotect(src_mm, addr, src_pmd);
if (!userfaultfd_wp(dst_vma))
pmd = pmd_clear_uffd_wp(pmd);
- pmd = pmd_mkold(pmd_wrprotect(pmd));
+ pmd = pmd_wrprotect(pmd);
+set_pmd:
+ pmd = pmd_mkold(pmd);
set_pmd_at(dst_mm, addr, dst_pmd, pmd);
ret = 0;
@@ -1686,8 +1706,11 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* TODO: once we support anonymous pages, use
* folio_try_dup_anon_rmap_*() and split if duplicating fails.
*/
- pudp_set_wrprotect(src_mm, addr, src_pud);
- pud = pud_mkold(pud_wrprotect(pud));
+ if (is_cow_mapping(vma->vm_flags) && pud_write(pud)) {
+ pudp_set_wrprotect(src_mm, addr, src_pud);
+ pud = pud_wrprotect(pud);
+ }
+ pud = pud_mkold(pud);
set_pud_at(dst_mm, addr, dst_pud, pud);
ret = 0;
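For orientation, here is roughly what the special (pfnmap) path through
copy_huge_pmd() reduces to once the first two hunks are applied. This
is an abridged sketch reconstructed from the hunks above, with the
anonymous-THP and uffd-wp handling elided, not the verbatim resulting
function:

	pmd = pmdp_get_lockless(src_pmd);
	if (unlikely(pmd_special(pmd))) {
		/* pfnmap: take both ptls, warn if CoW + writable, then... */
		goto set_pmd;			/* ...keep the write bit */
	}

	/* ... anonymous THP handling (non-anon VMAs return early) ... */

	pmdp_set_wrprotect(src_mm, addr, src_pmd);
	pmd = pmd_wrprotect(pmd);		/* CoW: drop the write bit */
set_pmd:
	pmd = pmd_mkold(pmd);			/* both paths: child starts old */
	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
	ret = 0;

The shared set_pmd label lets the pfnmap path and the CoW path converge
on the same mkold + set_pmd_at() tail, so only the write-protect step
differs between them.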
--
2.45.0