Message-ID: <aHrsdvjjliBBdVQm@lstrano-desk.jf.intel.com>
Date: Fri, 18 Jul 2025 17:53:10 -0700
From: Matthew Brost <matthew.brost@...el.com>
To: Balbir Singh <balbirs@...dia.com>
CC: <linux-mm@...ck.org>, <akpm@...ux-foundation.org>,
<linux-kernel@...r.kernel.org>, Karol Herbst <kherbst@...hat.com>, Lyude Paul
<lyude@...hat.com>, Danilo Krummrich <dakr@...nel.org>, David Airlie
<airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Jérôme Glisse <jglisse@...hat.com>, Shuah Khan
<shuah@...nel.org>, David Hildenbrand <david@...hat.com>, Barry Song
<baohua@...nel.org>, Baolin Wang <baolin.wang@...ux.alibaba.com>, "Ryan
Roberts" <ryan.roberts@....com>, Matthew Wilcox <willy@...radead.org>, "Peter
Xu" <peterx@...hat.com>, Zi Yan <ziy@...dia.com>, Kefeng Wang
<wangkefeng.wang@...wei.com>, Jane Chu <jane.chu@...cle.com>, Alistair Popple
<apopple@...dia.com>, Donet Tom <donettom@...ux.ibm.com>
Subject: Re: [v1 resend 00/12] THP support for zone device page migration
On Fri, Jul 18, 2025 at 01:57:13PM +1000, Balbir Singh wrote:
> On 7/18/25 09:40, Matthew Brost wrote:
> > On Fri, Jul 04, 2025 at 09:34:59AM +1000, Balbir Singh wrote:
> ...
> >>
> >> The nouveau dmem code has been enhanced to use the new THP migration
> >> capability.
> >>
> >> Feedback from the RFC [2]:
> >>
> >
> > Thanks for the patches, results look very promising. I wanted to give
> > some quick feedback:
> >
>
> Are you seeing improvements with the patchset?
>
> > - You appear to have missed updating hmm_range_fault, specifically
> > hmm_vma_handle_pmd, to check for device-private entries and populate the
> > HMM PFNs accordingly. My colleague François has a fix for this if you're
> > interested.
> >
>
> Sure, please feel free to post them.
>
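For reference, the kind of handling we have in mind looks roughly like the
below (untested sketch only, not François's actual patch). It assumes the
addr/end/hmm_pfns[]/pmd parameters of hmm_vma_handle_pmd() and, for brevity,
skips both the dev_private_owner check and the hmm_range_need_fault()
handling that the PTE path already does:

	/*
	 * Untested sketch: report the pfns backing a device-private huge
	 * pmd instead of treating it like a migration entry.
	 */
	if (!pmd_present(pmd)) {
		swp_entry_t entry = pmd_to_swp_entry(pmd);

		if (is_device_private_entry(entry)) {
			unsigned long cpu_flags = HMM_PFN_VALID |
				hmm_pfn_flags_order(PMD_SHIFT - PAGE_SHIFT);
			unsigned long pfn = swp_offset_pfn(entry) +
				((addr & ~PMD_MASK) >> PAGE_SHIFT);
			unsigned long i;

			if (is_writable_device_private_entry(entry))
				cpu_flags |= HMM_PFN_WRITE;

			for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++)
				hmm_pfns[i] = pfn | cpu_flags;
			return 0;
		}
	}
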
> > - I believe copy_huge_pmd also needs to be updated to avoid installing a
> > migration entry if the swap entry is device-private. I don't have an
> > exact fix yet due to my limited experience with core MM. The test case
> > that triggers this is fairly simple: fault in a 2MB device page on the
> > GPU, then fork a process that reads the page — the kernel crashes in
> > this scenario.
> >
>
> I'd be happy to look at any traces you have, or at any fixes you post.
>
Ok, after slowly reverse-engineering the core MM code I think I have
something that works: my test case now passes without any warnings or
kernel crashes.

I've included the change below. Feel free to include it in your next
revision, modify it as you see fit, or do whatever you like with it.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2b2563f35544..1cd6d9a10657 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1773,17 +1773,46 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		swp_entry_t entry = pmd_to_swp_entry(pmd);
 
 		VM_BUG_ON(!is_pmd_migration_entry(pmd) &&
-				!is_device_private_entry(entry));
-		if (!is_readable_migration_entry(entry)) {
-			entry = make_readable_migration_entry(
-							swp_offset(entry));
+			  !is_device_private_entry(entry));
+
+		if (!is_device_private_entry(entry) &&
+		    !is_readable_migration_entry(entry)) {
+			entry = make_readable_migration_entry(swp_offset(entry));
 			pmd = swp_entry_to_pmd(entry);
 			if (pmd_swp_soft_dirty(*src_pmd))
 				pmd = pmd_swp_mksoft_dirty(pmd);
 			if (pmd_swp_uffd_wp(*src_pmd))
 				pmd = pmd_swp_mkuffd_wp(pmd);
 			set_pmd_at(src_mm, addr, src_pmd, pmd);
+		} else if (is_device_private_entry(entry)) {
+			if (is_writable_device_private_entry(entry)) {
+				entry = make_readable_device_private_entry(swp_offset(entry));
+
+				pmd = swp_entry_to_pmd(entry);
+				if (pmd_swp_soft_dirty(*src_pmd))
+					pmd = pmd_swp_mksoft_dirty(pmd);
+				if (pmd_swp_uffd_wp(*src_pmd))
+					pmd = pmd_swp_mkuffd_wp(pmd);
+				set_pmd_at(src_mm, addr, src_pmd, pmd);
+			}
+
+			src_page = pfn_swap_entry_to_page(entry);
+			VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
+			src_folio = page_folio(src_page);
+
+			folio_get(src_folio);
+			if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page,
+								 dst_vma, src_vma))) {
+				/* Page maybe pinned: split and retry the fault on PTEs. */
+				folio_put(src_folio);
+				pte_free(dst_mm, pgtable);
+				spin_unlock(src_ptl);
+				spin_unlock(dst_ptl);
+				__split_huge_pmd(src_vma, src_pmd, addr, false);
+				return -EAGAIN;
+			}
 		}
+
 		add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 		mm_inc_nr_ptes(dst_mm);
 		pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
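
If I'm reading the existing code right, this mirrors what copy_nonpresent_pte()
already does for PTE-sized device-private entries: a writable device-private
entry is downgraded to read-only in the parent, the folio picks up a reference
and an anon rmap for the child, and if the folio may be pinned we split the
PMD and retry at PTE level.

The test case mentioned above is roughly this shape (illustrative only; error
checking is omitted and gpu_touch() is a stand-in for whatever driver-specific
mechanism migrates the range to device-private memory):

	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <sys/wait.h>
	#include <unistd.h>

	#define SZ_2M	(2UL << 20)

	/* hypothetical, driver-specific: migrate the range to device memory */
	extern void gpu_touch(void *addr, size_t len);

	int main(void)
	{
		char *map = mmap(NULL, 2 * SZ_2M, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		/* align to a 2M boundary so a PMD-sized THP can back it */
		char *buf = (char *)(((uintptr_t)map + SZ_2M - 1) & ~(SZ_2M - 1));

		madvise(buf, SZ_2M, MADV_HUGEPAGE);
		memset(buf, 0xaa, SZ_2M);	/* fault in as a THP */

		gpu_touch(buf, SZ_2M);		/* 2M device-private pmd maps buf */

		if (fork() == 0) {
			/* child read hits the pmd copied by copy_huge_pmd() */
			printf("child sees %#x\n", (unsigned char)buf[0]);
			_exit(0);
		}
		wait(NULL);
		return 0;
	}
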
Matt
> Thanks for the feedback
> Balbir Singh