linux-kernel - RE: Issues with Pinning User Pages for SVA on IOMMUs Lacking IOPF

Message-ID: <54be598d93404e7185ecfbe49f7fe93c@huawei.com>
Date: Tue, 2 Sep 2025 13:04:51 +0000
From: Zhangyuhao <yuhao.zhang@...wei.com>
To: David Hildenbrand <david@...hat.com>, Andrew Morton
	<akpm@...ux-foundation.org>, Jason Gunthorpe <jgg@...pe.ca>, John Hubbard
	<jhubbard@...dia.com>, Peter Xu <peterx@...hat.com>, Joerg Roedel
	<joro@...tes.org>, Will Deacon <will@...nel.org>, Robin Murphy
	<robin.murphy@....com>
CC: "linux-mm@...ck.org" <linux-mm@...ck.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "iommu@...ts.linux.dev"
	<iommu@...ts.linux.dev>
Subject: RE: Issues with Pinning User Pages for SVA on IOMMUs Lacking IOPF

[Adding linux-kernel mailing list for visibility]

Best,
Yuhao
-----Original Message-----
From: David Hildenbrand <david@...hat.com> 
Sent: Monday, September 1, 2025 10:34 PM
To: Zhangyuhao <yuhao.zhang@...wei.com>; Andrew Morton <akpm@...ux-foundation.org>; Jason Gunthorpe <jgg@...pe.ca>; John Hubbard <jhubbard@...dia.com>; Peter Xu <peterx@...hat.com>; Joerg Roedel <joro@...tes.org>; Will Deacon <will@...nel.org>; Robin Murphy <robin.murphy@....com>
Subject: Re: Issues with Pinning User Pages for SVA on IOMMUs Lacking IOPF

On 01.09.25 15:43, Zhangyuhao wrote:
> Hello Linux kernel community,

Hi,

> 
> Current IOMMU SVA support relies on hardware IOPF (IO Page Fault). We have observed that certain IOMMU devices do not support IOPF.
> However, we are still exploring how to enable SVA in such scenarios.
> 
> To address this, we attempted to pin memory to prevent device accesses from triggering IO page faults.
> 
> Solution 1: User-space mlock + madvise(MADV_POPULATE_WRITE)
> 
> if (madvise(buf, size, MADV_POPULATE_WRITE) != 0) {
>      free(buf);
>      return 1;
> }
> if (mlock(buf, size) != 0) {
>      free(buf);
>      return 1;
> }
> Result: Page faults still occurred due to page migration.

Yes, NUMA-hinting might similarly affect this (even when the page is not migrated).
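
For reference, the quoted user-space attempt can be written as a self-contained sketch. The helper name alloc_populated_locked() is hypothetical, the buffer comes from an anonymous private mmap() rather than whatever allocator the original code used, and the fallback value for MADV_POPULATE_WRITE is only there for older libc headers. As reported above, this keeps the pages resident but does not stop migration or NUMA-hinting faults.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23	/* Linux 5.14+; fallback for older headers */
#endif

/*
 * Hypothetical helper: map `len` bytes, prefault them writable and lock
 * them in RAM. Returns NULL with errno set on failure.
 */
void *alloc_populated_locked(size_t len)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return NULL;

	/* Prefault writable pages first, then lock them. */
	if (madvise(buf, len, MADV_POPULATE_WRITE) != 0 ||
	    mlock(buf, len) != 0) {
		int saved = errno;

		munmap(buf, len);	/* undo the mapping on any failure */
		errno = saved;
		return NULL;
	}
	return buf;
}
```

A failure here may simply mean that the kernel predates MADV_POPULATE_WRITE or that RLIMIT_MEMLOCK is too small for the requested size.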

> 
> Solution 2: Kernel-space pin via IOCTL
> 
> ret = pin_user_pages_fast(cur_base, npages, FOLL_LONGTERM, page_list);
> 
> Result: Page faults occurred occasionally, traced to NUMA balancing marking pages as invalid.

Ah, there you talk about NUMA balancing.

> 
> To solve the problem, we used FOLL_LONGTERM | FOLL_HONOR_NUMA_FAULT to pin user pages.
> 

See prot_numa_skip(): we skip DMA-pinned folios in COW mappings only. So if you had a !COW mapping (e.g., MAP_SHARED shmem), that wouldn't work reliably, I think.

I think we could change that without causing too much harm.

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 113b489858341..17809c8604f25 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -137,8 +137,11 @@ static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
                 goto skip;
  
         /* Also skip shared copy-on-write pages */
-       if (is_cow_mapping(vma->vm_flags) &&
-           (folio_maybe_dma_pinned(folio) || folio_maybe_mapped_shared(folio)))
+       if (is_cow_mapping(vma->vm_flags) && folio_maybe_mapped_shared(folio))
+               goto skip;
+
+       /* Folios that are pinned and cannot be migrated either way. */
+       if (folio_maybe_dma_pinned(folio))
                 goto skip;
  
         /*
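
To make the behavioural difference concrete, here is a small user-space model of just the two conditions touched by the hunk above. It is only a sketch: struct folio_model and the skip_before()/skip_after() helpers are hypothetical stand-ins, the three booleans model the results of is_cow_mapping(), folio_maybe_dma_pinned() and folio_maybe_mapped_shared(), and the other checks in prot_numa_skip() are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the VMA/folio state the real checks look at. */
struct folio_model {
	bool cow_mapping;		/* models is_cow_mapping(vma->vm_flags) */
	bool maybe_dma_pinned;		/* models folio_maybe_dma_pinned(folio) */
	bool maybe_mapped_shared;	/* models folio_maybe_mapped_shared(folio) */
};

/* Condition before the change: pinned folios are skipped in COW mappings only. */
bool skip_before(const struct folio_model *f)
{
	assert(f != NULL);
	return f->cow_mapping &&
	       (f->maybe_dma_pinned || f->maybe_mapped_shared);
}

/* Condition after the change: pinned folios are skipped in any mapping. */
bool skip_after(const struct folio_model *f)
{
	assert(f != NULL);
	if (f->cow_mapping && f->maybe_mapped_shared)
		return true;
	return f->maybe_dma_pinned;
}
```

The only input where the two versions disagree is a pinned folio in a !COW mapping (the MAP_SHARED shmem case): skip_before() returns false, skip_after() returns true.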


> This approach has been tested and successfully prevents IO page faults so far.
> 
> We would like guidance from the community:
> 
> Can this approach reliably prevent all IO page faults?

See the case above regarding non-cow mappings.

We essentially need to make sure that we don't (temporarily) unmap for migration/reclaim/split/whatever if a folio may be pinned.

We back out in all cases (unexpected reference), but we'll have to sanity-check whether we reject maybe-pinned folios early enough to avoid temporarily unmapping them.

> 
> Is there a better or recommended method to pin user pages for SVA?

Most use cases use longterm pinnings to then configure the iommu manually. Then, it does not really matter what happens to your process page tables.

So your use case is rather new :)

But yes, a longterm pinning while resolving NUMA-hinting faults should in theory work.

We just have to make sure that everybody else plays nice early with dma-pinned folios.

--
Cheers

David / dhildenb