Message-ID: <aHrsdvjjliBBdVQm@lstrano-desk.jf.intel.com>
Date: Fri, 18 Jul 2025 17:53:10 -0700
From: Matthew Brost <matthew.brost@...el.com>
To: Balbir Singh <balbirs@...dia.com>
CC: <linux-mm@...ck.org>, <akpm@...ux-foundation.org>,
<linux-kernel@...r.kernel.org>, Karol Herbst <kherbst@...hat.com>, Lyude Paul
<lyude@...hat.com>, Danilo Krummrich <dakr@...nel.org>, David Airlie
<airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Jérôme Glisse <jglisse@...hat.com>, Shuah Khan
<shuah@...nel.org>, David Hildenbrand <david@...hat.com>, Barry Song
<baohua@...nel.org>, Baolin Wang <baolin.wang@...ux.alibaba.com>, "Ryan
Roberts" <ryan.roberts@....com>, Matthew Wilcox <willy@...radead.org>, "Peter
Xu" <peterx@...hat.com>, Zi Yan <ziy@...dia.com>, Kefeng Wang
<wangkefeng.wang@...wei.com>, Jane Chu <jane.chu@...cle.com>, Alistair Popple
<apopple@...dia.com>, Donet Tom <donettom@...ux.ibm.com>
Subject: Re: [v1 resend 00/12] THP support for zone device page migration
On Fri, Jul 18, 2025 at 01:57:13PM +1000, Balbir Singh wrote:
> On 7/18/25 09:40, Matthew Brost wrote:
> > On Fri, Jul 04, 2025 at 09:34:59AM +1000, Balbir Singh wrote:
> ...
> >>
> >> The nouveau dmem code has been enhanced to use the new THP migration
> >> capability.
> >>
> >> Feedback from the RFC [2]:
> >>
> >
> > Thanks for the patches, results look very promising. I wanted to give
> > some quick feedback:
> >
>
> Are you seeing improvements with the patchset?
>
> > - You appear to have missed updating hmm_range_fault, specifically
> > hmm_vma_handle_pmd, to check for device-private entries and populate the
> > HMM PFNs accordingly. My colleague François has a fix for this if you're
> > interested.
> >
>
> Sure, please feel free to post them.
>
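For reference, the kind of handling we have in mind looks roughly like the
below (untested sketch only, not François's actual patch). It assumes the
addr/end/hmm_pfns[]/pmd parameters of hmm_vma_handle_pmd() and, for brevity,
skips both the dev_private_owner check and the hmm_range_need_fault()
handling that the PTE path already does:

	/*
	 * Untested sketch: report the pfns backing a device-private huge
	 * pmd instead of treating it like a migration entry.
	 */
	if (!pmd_present(pmd)) {
		swp_entry_t entry = pmd_to_swp_entry(pmd);

		if (is_device_private_entry(entry)) {
			unsigned long cpu_flags = HMM_PFN_VALID |
				hmm_pfn_flags_order(PMD_SHIFT - PAGE_SHIFT);
			unsigned long pfn = swp_offset_pfn(entry) +
				((addr & ~PMD_MASK) >> PAGE_SHIFT);
			unsigned long i;

			if (is_writable_device_private_entry(entry))
				cpu_flags |= HMM_PFN_WRITE;

			for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++)
				hmm_pfns[i] = pfn | cpu_flags;
			return 0;
		}
	}
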
> > - I believe copy_huge_pmd also needs to be updated to avoid installing a
> > migration entry if the swap entry is device-private. I don't have an
> > exact fix yet due to my limited experience with core MM. The test case
> > that triggers this is fairly simple: fault in a 2MB device page on the
> > GPU, then fork a process that reads the page — the kernel crashes in
> > this scenario.
> >
>
> I'd be happy to look at any traces you have, or at any fixes you post.
>
Ok, after slowly reverse-engineering the core MM code I think I have
something that works: my test case now passes without any warnings or
kernel crashes.

I've included the change below. Feel free to include it in your next
revision, modify it as you see fit, or do whatever you like with it.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2b2563f35544..1cd6d9a10657 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1773,17 +1773,46 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		swp_entry_t entry = pmd_to_swp_entry(pmd);
 
 		VM_BUG_ON(!is_pmd_migration_entry(pmd) &&
-				!is_device_private_entry(entry));
-		if (!is_readable_migration_entry(entry)) {
-			entry = make_readable_migration_entry(
-							swp_offset(entry));
+			  !is_device_private_entry(entry));
+
+		if (!is_device_private_entry(entry) &&
+		    !is_readable_migration_entry(entry)) {
+			entry = make_readable_migration_entry(swp_offset(entry));
 			pmd = swp_entry_to_pmd(entry);
 			if (pmd_swp_soft_dirty(*src_pmd))
 				pmd = pmd_swp_mksoft_dirty(pmd);
 			if (pmd_swp_uffd_wp(*src_pmd))
 				pmd = pmd_swp_mkuffd_wp(pmd);
 			set_pmd_at(src_mm, addr, src_pmd, pmd);
+		} else if (is_device_private_entry(entry)) {
+			if (is_writable_device_private_entry(entry)) {
+				entry = make_readable_device_private_entry(swp_offset(entry));
+
+				pmd = swp_entry_to_pmd(entry);
+				if (pmd_swp_soft_dirty(*src_pmd))
+					pmd = pmd_swp_mksoft_dirty(pmd);
+				if (pmd_swp_uffd_wp(*src_pmd))
+					pmd = pmd_swp_mkuffd_wp(pmd);
+				set_pmd_at(src_mm, addr, src_pmd, pmd);
+			}
+
+			src_page = pfn_swap_entry_to_page(entry);
+			VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
+			src_folio = page_folio(src_page);
+
+			folio_get(src_folio);
+			if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page,
+								 dst_vma, src_vma))) {
+				/* Page maybe pinned: split and retry the fault on PTEs. */
+				folio_put(src_folio);
+				pte_free(dst_mm, pgtable);
+				spin_unlock(src_ptl);
+				spin_unlock(dst_ptl);
+				__split_huge_pmd(src_vma, src_pmd, addr, false);
+				return -EAGAIN;
+			}
 		}
+
 		add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 		mm_inc_nr_ptes(dst_mm);
 		pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
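
If I'm reading the existing code right, this mirrors what copy_nonpresent_pte()
already does for PTE-sized device-private entries: a writable device-private
entry is downgraded to read-only in the parent, the folio picks up a reference
and an anon rmap for the child, and if the folio may be pinned we split the
PMD and retry at PTE level.

The test case mentioned above is roughly this shape (illustrative only; error
checking is omitted and gpu_touch() is a stand-in for whatever driver-specific
mechanism migrates the range to device-private memory):

	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <sys/wait.h>
	#include <unistd.h>

	#define SZ_2M	(2UL << 20)

	/* hypothetical, driver-specific: migrate the range to device memory */
	extern void gpu_touch(void *addr, size_t len);

	int main(void)
	{
		char *map = mmap(NULL, 2 * SZ_2M, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		/* align to a 2M boundary so a PMD-sized THP can back it */
		char *buf = (char *)(((uintptr_t)map + SZ_2M - 1) & ~(SZ_2M - 1));

		madvise(buf, SZ_2M, MADV_HUGEPAGE);
		memset(buf, 0xaa, SZ_2M);	/* fault in as a THP */

		gpu_touch(buf, SZ_2M);		/* 2M device-private pmd maps buf */

		if (fork() == 0) {
			/* child read hits the pmd copied by copy_huge_pmd() */
			printf("child sees %#x\n", (unsigned char)buf[0]);
			_exit(0);
		}
		wait(NULL);
		return 0;
	}
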
Matt
> Thanks for the feedback
> Balbir Singh