Message-ID: <54be598d93404e7185ecfbe49f7fe93c@huawei.com>
Date: Tue, 2 Sep 2025 13:04:51 +0000
From: Zhangyuhao <yuhao.zhang@...wei.com>
To: David Hildenbrand <david@...hat.com>, Andrew Morton
	<akpm@...ux-foundation.org>, Jason Gunthorpe <jgg@...pe.ca>, John Hubbard
	<jhubbard@...dia.com>, Peter Xu <peterx@...hat.com>, Joerg Roedel
	<joro@...tes.org>, Will Deacon <will@...nel.org>, Robin Murphy
	<robin.murphy@....com>
CC: "linux-mm@...ck.org" <linux-mm@...ck.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "iommu@...ts.linux.dev"
	<iommu@...ts.linux.dev>
Subject: RE: Issues with Pinning User Pages for SVA on IOMMUs Lacking IOPF

[Adding linux-kernel mailing list for visibility]

Best,
Yuhao
-----Original Message-----
From: David Hildenbrand <david@...hat.com> 
Sent: Monday, September 1, 2025 10:34 PM
To: Zhangyuhao <yuhao.zhang@...wei.com>; Andrew Morton <akpm@...ux-foundation.org>; Jason Gunthorpe <jgg@...pe.ca>; John Hubbard <jhubbard@...dia.com>; Peter Xu <peterx@...hat.com>; Joerg Roedel <joro@...tes.org>; Will Deacon <will@...nel.org>; Robin Murphy <robin.murphy@....com>
Subject: Re: Issues with Pinning User Pages for SVA on IOMMUs Lacking IOPF

On 01.09.25 15:43, Zhangyuhao wrote:
> Hello Linux kernel community,

Hi,

> 
> Current IOMMU SVA support relies on hardware IOPF (I/O Page Fault) support. We have observed that certain IOMMU devices do not support IOPF,
> but we are still exploring how to enable SVA in such scenarios.
> 
> To address this, we attempted to pin memory to prevent device accesses from triggering IO page faults.
> 
> Solution 1: User-space mlock + madvise(MADV_POPULATE_WRITE)
> 
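> /* buf must be page-aligned (e.g., from posix_memalign()), or madvise()
>  * fails with EINVAL. MADV_POPULATE_WRITE requires Linux 5.14 or later. */
> if (madvise(buf, size, MADV_POPULATE_WRITE) != 0) {
>      free(buf);
>      return 1;
> }
> /* Keep the pages resident (this does not prevent page migration). */
> if (mlock(buf, size) != 0) {
>      free(buf);
>      return 1;
> }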
> Result: Page faults still occurred due to page migration.

Yes, NUMA hinting might similarly affect this (even when the page is not migrated).
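A self-contained user-space sketch of Solution 1, assuming an anonymous private mapping obtained with mmap() (which, unlike malloc(), is guaranteed to be page-aligned, as madvise() requires). The helper names alloc_prefaulted_locked(), free_prefaulted_locked() and round_up_pow2() are hypothetical. As discussed above, this only keeps the pages resident; it does not stop migration or NUMA hinting from temporarily invalidating the PTEs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallback for older libc headers: value from the generic Linux UAPI
 * (asm-generic/mman-common.h); the kernel supports it since 5.14. */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/* Round len up to a multiple of align (align must be a power of two). */
static size_t round_up_pow2(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: map an anonymous buffer, prefault it writable and
 * mlock() it. Returns NULL on failure, with errno from the failing call.
 * The pages stay resident, but the kernel may still migrate them or install
 * NUMA-hinting PROT_NONE entries, so a device without IOPF can still fault.
 */
static void *alloc_prefaulted_locked(size_t len)
{
	size_t size = round_up_pow2(len, (size_t)sysconf(_SC_PAGESIZE));
	void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return NULL;

	/* mmap() returns page-aligned memory, as madvise() requires. */
	if (madvise(buf, size, MADV_POPULATE_WRITE) != 0 ||
	    mlock(buf, size) != 0) {
		int saved = errno;
		munmap(buf, size);
		errno = saved;
		return NULL;
	}
	return buf;
}

/* Release a buffer from alloc_prefaulted_locked(); munmap() drops the lock. */
static void free_prefaulted_locked(void *buf, size_t len)
{
	munmap(buf, round_up_pow2(len, (size_t)sysconf(_SC_PAGESIZE)));
}
```

mlock() can fail with ENOMEM or EPERM when RLIMIT_MEMLOCK is too small, and madvise() returns EINVAL on kernels that predate MADV_POPULATE_WRITE, so callers should be prepared for a NULL return.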

> 
> Solution 2: Kernel-space pin via IOCTL
> 
> ret = pin_user_pages_fast(cur_base, npages, FOLL_LONGTERM, page_list);
> 
> Result: Page faults occurred occasionally, traced to NUMA balancing marking the pages inaccessible (PROT_NONE) to trigger NUMA hinting faults.

Ah, there you talk about NUMA balancing.
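Solution 2 passes cur_base and npages to pin_user_pages_fast(). As a hedged illustration of how a driver might derive them from a user request, the page arithmetic is sketched below in plain, user-space C; struct pin_request and pin_range() are hypothetical, and in-kernel code would use the PAGE_SIZE/PAGE_MASK helpers instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ioctl argument: pin the user range [uaddr, uaddr + len). */
struct pin_request {
	uint64_t uaddr;
	uint64_t len;
};

/*
 * Compute the page-aligned base address and the number of pages touched by
 * the request. Returns 0 on success, -1 for an empty or overflowing range.
 * page_size must be a power of two.
 */
static int pin_range(const struct pin_request *req, uint64_t page_size,
		     uint64_t *cur_base, uint64_t *npages)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	uint64_t end = req->uaddr + req->len; /* exclusive end */
	if (req->len == 0 || end < req->uaddr ||
	    end > UINT64_MAX - (page_size - 1))
		return -1;

	uint64_t first = req->uaddr & ~(page_size - 1);
	uint64_t last = (end + page_size - 1) & ~(page_size - 1);
	*cur_base = first;
	*npages = (last - first) / page_size;
	return 0;
}
```

In the kernel, the result would then be passed to pin_user_pages_fast(cur_base, npages, ...) with the flags described in the thread. Note that pin_user_pages_fast() may pin fewer pages than requested (it returns the number actually pinned), so the caller has to handle short pins and release what it pinned with unpin_user_pages().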

> 
> To solve the problem, we used FOLL_LONGTERM | FOLL_HONOR_NUMA_FAULT to pin user pages.
> 

See prot_numa_skip(): we skip DMA-pinned folios in COW mappings only. So if you have a !COW mapping (e.g., MAP_SHARED shmem), that wouldn't work reliably, I think.

I think we could change that without causing too much harm.

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 113b489858341..17809c8604f25 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -137,8 +137,11 @@ static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
                 goto skip;
  
         /* Also skip shared copy-on-write pages */
-       if (is_cow_mapping(vma->vm_flags) &&
-           (folio_maybe_dma_pinned(folio) || folio_maybe_mapped_shared(folio)))
+       if (is_cow_mapping(vma->vm_flags) && folio_maybe_mapped_shared(folio))
+               goto skip;
+
+       /* Skip folios that may be pinned: they cannot be migrated either way. */
+       if (folio_maybe_dma_pinned(folio))
                 goto skip;
  
         /*
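To make the COW-only condition concrete, here is a simplified user-space model (not kernel code) of the two skip conditions the patch touches, before and after the change. is_cow_mapping() follows the kernel's definition (a private mapping that may become writable); the flag values mirror include/linux/mm.h but do not matter for the model. struct folio_state and skip_before()/skip_after() are hypothetical stand-ins for folio_maybe_dma_pinned() and folio_maybe_mapped_shared(), and the other early checks in prot_numa_skip() are left out.

```c
#include <stdbool.h>

#define VM_SHARED   0x00000008u
#define VM_MAYWRITE 0x00000020u

/* Same test as the kernel's is_cow_mapping(). */
static bool is_cow_mapping(unsigned int vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Hypothetical summary of the folio state the real code queries. */
struct folio_state {
	bool maybe_dma_pinned;    /* folio_maybe_dma_pinned() */
	bool maybe_mapped_shared; /* folio_maybe_mapped_shared() */
};

/* Before the patch: pinned folios are only skipped in COW mappings. */
static bool skip_before(unsigned int vm_flags, struct folio_state f)
{
	return is_cow_mapping(vm_flags) &&
	       (f.maybe_dma_pinned || f.maybe_mapped_shared);
}

/* After the patch: pinned folios are skipped in every mapping. */
static bool skip_after(unsigned int vm_flags, struct folio_state f)
{
	if (is_cow_mapping(vm_flags) && f.maybe_mapped_shared)
		return true;
	return f.maybe_dma_pinned;
}
```

For a pinned folio in a MAP_SHARED mapping (VM_SHARED set), skip_before() returns false, so NUMA balancing would still make the PTE PROT_NONE; skip_after() returns true.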


> This approach has been tested and successfully prevents IO page faults so far.
> 
> We would like guidance from the community:
> 
> Can this approach reliably prevent all IO page faults?

See the case above regarding non-COW mappings.

We essentially need to make sure that we don't (temporarily) unmap a folio for migration/reclaim/split/whatever if it may be pinned.

We back out in all of those cases anyway (because of the unexpected reference), but we'll have to sanity-check that we reject maybe-pinned folios early enough that we never temporarily unmap them.
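A simplified model of the "maybe pinned" heuristic and of the early rejection described above. In the kernel, pinning a small folio adds GUP_PIN_COUNTING_BIAS (1024) to its refcount, while large folios track pins in a separate pincount (roughly; the exact rule depends on folio order and architecture). This is why the check can only say "maybe". struct folio_model and decide_unmap() are hypothetical; the point is that rejecting before touching the page tables avoids the window in which the PTE is gone even though migration later backs out.

```c
#include <stdbool.h>

#define GUP_PIN_COUNTING_BIAS (1U << 10) /* same value as in the kernel */

/* Hypothetical, simplified folio state. */
struct folio_model {
	bool large;            /* large folios carry a separate pin count */
	unsigned int refcount; /* small folios: each pin adds the bias here */
	unsigned int pincount; /* large folios only */
};

/* Mirrors the idea of folio_maybe_dma_pinned(): false positives are
 * possible for small folios with very many ordinary references. */
static bool maybe_dma_pinned(const struct folio_model *f)
{
	if (f->large)
		return f->pincount > 0;
	return f->refcount >= GUP_PIN_COUNTING_BIAS;
}

enum unmap_decision { UNMAP_OK, SKIP_MAYBE_PINNED };

/* Hypothetical early check for migration/reclaim/split paths: refuse
 * before unmapping instead of unmapping and backing out later. */
static enum unmap_decision decide_unmap(const struct folio_model *f)
{
	return maybe_dma_pinned(f) ? SKIP_MAYBE_PINNED : UNMAP_OK;
}
```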

> 
> Is there a better or recommended method to pin user pages for SVA?

Most use cases take longterm pins and then configure the IOMMU manually. Then it does not really matter what happens to your process page tables.

So your use case is rather new :)
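For comparison, VFIO's type1 IOMMU backend is one example of the conventional approach: the kernel takes long-term pins on the user pages and programs the IOMMU itself. From user space this is a single VFIO_IOMMU_MAP_DMA ioctl on a container file descriptor. The sketch assumes the container is already open and configured (setup omitted), and make_dma_map()/map_for_device_dma() are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Build a request mapping [vaddr, vaddr + size) at the given IOVA,
 * readable and writable by the device. */
static struct vfio_iommu_type1_dma_map make_dma_map(void *vaddr, uint64_t iova,
						    uint64_t size)
{
	struct vfio_iommu_type1_dma_map map;

	memset(&map, 0, sizeof(map));
	map.argsz = sizeof(map);
	map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
	map.vaddr = (uint64_t)(uintptr_t)vaddr;
	map.iova = iova;
	map.size = size;
	return map;
}

/* container_fd: an open /dev/vfio/vfio container on which VFIO_SET_IOMMU
 * with VFIO_TYPE1_IOMMU has already succeeded (assumption). Returns 0 on
 * success, -1 with errno set on failure. */
static int map_for_device_dma(int container_fd, void *vaddr, uint64_t iova,
			      uint64_t size)
{
	struct vfio_iommu_type1_dma_map map = make_dma_map(vaddr, iova, size);

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}
```

vaddr, iova and size are expected to be aligned to the IOMMU page size. Unlike SVA, the device then uses IOVAs from this separate mapping rather than the process's virtual addresses, which is why later changes to the process page tables do not affect it.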

But yes, a longterm pin that resolves NUMA-hinting faults should, in theory, work.

We just have to make sure that everybody else plays nicely with DMA-pinned folios early on.

--
Cheers

David / dhildenb

