Message-Id: <20260123062309.23090-1-jniethe@nvidia.com>
Date: Fri, 23 Jan 2026 17:22:56 +1100
From: Jordan Niethe <jniethe@...dia.com>
To: linux-mm@...ck.org
Cc: balbirs@...dia.com,
	matthew.brost@...el.com,
	akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org,
	dri-devel@...ts.freedesktop.org,
	david@...hat.com,
	ziy@...dia.com,
	apopple@...dia.com,
	lorenzo.stoakes@...cle.com,
	lyude@...hat.com,
	dakr@...nel.org,
	airlied@...il.com,
	simona@...ll.ch,
	rcampbell@...dia.com,
	mpenttil@...hat.com,
	jgg@...dia.com,
	willy@...radead.org,
	linuxppc-dev@...ts.ozlabs.org,
	intel-xe@...ts.freedesktop.org,
	jgg@...pe.ca,
	Felix.Kuehling@....com,
	jniethe@...dia.com,
	jhubbard@...dia.com
Subject: [PATCH v3 00/13] Remove device private pages from physical address space

Introduction
------------

The existing design of device private memory imposes limitations which
render it non-functional for certain systems and configurations - this
series removes those limitations. These issues are:

  1) Limited available physical address space
  2) Conflicts with aarch64 mm implementation

Limited available address space
-------------------------------

Device private memory is implemented by first reserving a region of the
physical address space. This is a problem. The physical address space is
not a resource that is directly under the kernel's control. Suitable
physical address space is constrained by the underlying hardware and
firmware and may not always be available.

Device private memory assumes that it will be able to reserve a device
memory sized chunk of physical address space. However, there is nothing
guaranteeing that this will succeed, and there are a number of factors that
increase the likelihood of failure. We need to consider what else may
exist in the physical address space. It is observed that certain VM
configurations place very large PCI windows immediately after RAM. Large
enough that there is no physical address space available at all for
device private memory. This is more likely to occur on systems with a
43-bit physical address width, which have less physical address space.

The fundamental issue is that the physical address space is not a
resource the kernel can rely on being able to allocate from at will.
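
To make the arithmetic concrete, here is a rough userspace sketch. It is
not code from this series: the function names and the sizes used with it
are hypothetical, and it ignores alignment, holes and firmware
reservations. It only computes how much physical address space is left
once RAM and a PCI window have been placed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: bytes of physical address space left over after
 * RAM and one PCI window, for a given physical address width.
 */
static uint64_t free_phys_space(unsigned int phys_bits, uint64_t ram_bytes,
				uint64_t pci_window_bytes)
{
	assert(phys_bits > 0 && phys_bits < 64);

	uint64_t total = UINT64_C(1) << phys_bits;

	/* Written this way so the subtraction below cannot underflow. */
	if (ram_bytes >= total || pci_window_bytes >= total - ram_bytes)
		return 0;
	return total - ram_bytes - pci_window_bytes;
}

/* Would a device-memory-sized reservation fit in what is left? */
static bool can_reserve_device_private(unsigned int phys_bits,
				       uint64_t ram_bytes,
				       uint64_t pci_window_bytes,
				       uint64_t device_bytes)
{
	return free_phys_space(phys_bits, ram_bytes, pci_window_bytes) >=
	       device_bytes;
}
```

With a 43-bit width (8 TiB in total), 1 TiB of RAM and a 7 TiB PCI window
leave nothing for the device, whatever its memory size. The same layout
on a 46-bit system leaves 56 TiB.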

aarch64 issues
--------------

The current device private memory implementation has further issues on
aarch64. On aarch64, vmemmap is sized to cover the RAM only. Adding
device private pages to the linear map then means that for a device
private page, pfn_to_page() will read beyond the end of the vmemmap
region, leading to potential memory corruption. This means that device
private memory does not work reliably on aarch64 [0].
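
As an illustration only, the overrun can be modelled in userspace. With
the vmemmap memory model, pfn_to_page() is essentially "vmemmap + pfn"
with no bounds check. The sketch below uses hypothetical model_* names
and adds the check the real lookup does not have, to show which PFNs
fall outside an array sized for RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct model_page {
	unsigned long flags;
};

/* Hypothetical vmemmap: one entry per PFN, sized to cover RAM only. */
struct model_vmemmap {
	struct model_page *base;
	unsigned long nr_pfns;
};

static bool model_pfn_covered(const struct model_vmemmap *v,
			      unsigned long pfn)
{
	return pfn < v->nr_pfns;
}

/*
 * Checked variant of "base + pfn". The unchecked form would compute an
 * address past the end of the array for a PFN above RAM.
 */
static struct model_page *
model_pfn_to_page_checked(const struct model_vmemmap *v, unsigned long pfn)
{
	assert(v->base != NULL);
	return model_pfn_covered(v, pfn) ? v->base + pfn : NULL;
}
```

A device private PFN placed above the end of RAM is exactly the case
where model_pfn_covered() returns false.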

New implementation
------------------

This series changes device private memory so that it does not require
allocation of physical address space and these problems are avoided.
Instead of using the physical address space, we introduce a "device
private address space" and allocate from there.

A consequence of placing the device private pages outside of the
physical address space is that they no longer have a PFN. However, it is
still necessary to be able to look up a corresponding device private
page from a device private PTE entry, which means that we still require
some way to index into this device private address space. Instead of a
PFN, device private pages use an offset into this device private address
space to look up device private struct pages.
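
A minimal userspace sketch of this kind of lookup follows. It is an
assumption-laden model, not the interface added by the series: the
model_dpriv_* names and the single flat region are hypothetical. It only
shows pages being found by an offset into a separate address space
rather than by PFN.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device private struct page. */
struct model_dpriv_page {
	int id;
};

/* Hypothetical range of the device private address space. */
struct model_dpriv_region {
	unsigned long base_offset;	/* first offset of the range */
	unsigned long nr_pages;
	struct model_dpriv_page *pages;	/* backing array, one per offset */
};

/* Offset -> page. Returns NULL if the offset is outside the region. */
static struct model_dpriv_page *
model_dpriv_offset_to_page(const struct model_dpriv_region *r,
			   unsigned long offset)
{
	if (offset < r->base_offset ||
	    offset - r->base_offset >= r->nr_pages)
		return NULL;
	return &r->pages[offset - r->base_offset];
}

/* Page -> offset. The page must belong to the region. */
static unsigned long
model_dpriv_page_to_offset(const struct model_dpriv_region *r,
			   const struct model_dpriv_page *page)
{
	assert(page >= r->pages && page < r->pages + r->nr_pages);
	return r->base_offset + (unsigned long)(page - r->pages);
}
```

Nothing in this model touches a PFN or the physical address space. The
offset is only meaningful to whoever owns the region.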

The problem that then needs to be addressed is how to avoid confusing
these device private offsets with PFNs. It is the limited usage
of the device private pages themselves which makes this possible. A
device private page is only used for userspace mappings, so we do not
need to be concerned with them being used within the mm more broadly. This
means that the only way that the core kernel looks up these pages is via
the page table, where their PTE already indicates if they refer to a
device private page via their swap type, e.g. SWP_DEVICE_WRITE. We can
use this information to determine if the PTE contains a PFN which should
be looked up in the page map, or a device private offset which should be
looked up elsewhere.
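
The dispatch described above can be sketched in userspace as follows.
This is a simplified model with hypothetical MODEL_* and model_* names,
not the kernel's swap entry encoding. The only point is that the entry
type decides how the stored value is interpreted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of non-present entry types. */
enum model_swp_type {
	MODEL_SWP_MIGRATION_READ,	/* value is a PFN */
	MODEL_SWP_DEVICE_READ,		/* value is a device private offset */
	MODEL_SWP_DEVICE_WRITE,		/* value is a device private offset */
};

struct model_swp_entry {
	enum model_swp_type type;
	unsigned long val;
};

enum model_lookup {
	MODEL_LOOKUP_PFN,
	MODEL_LOOKUP_DEVICE_PRIVATE_OFFSET,
};

static bool model_is_device_private(struct model_swp_entry e)
{
	return e.type == MODEL_SWP_DEVICE_READ ||
	       e.type == MODEL_SWP_DEVICE_WRITE;
}

/* Decide which table the stored value should be looked up in. */
static enum model_lookup model_classify(struct model_swp_entry e)
{
	return model_is_device_private(e) ?
	       MODEL_LOOKUP_DEVICE_PRIVATE_OFFSET : MODEL_LOOKUP_PFN;
}

/* Only valid for entries that really carry a PFN. */
static unsigned long model_entry_to_pfn(struct model_swp_entry e)
{
	assert(!model_is_device_private(e));
	return e.val;
}
```

Two entries holding the same numeric value are told apart purely by
their type, so an offset and a PFN may collide numerically without being
confused.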

This applies when we are creating PTE entries for device private pages -
because they have their own type, they must already be handled
separately, so it is a small step to convert them to a device private
offset now too.

The first part of the series updates callers where device private
offsets might now be encountered to track this extra state.

The last patch contains the bulk of the work where we change how we
convert between device private pages and device private offsets and then
use a new interface for allocating device private pages without the need
for reserving physical address space.

By removing the device private pages from the physical address space,
this series also opens up the possibility of moving away from tracking
device private memory using struct pages in the future. This is
desirable as on systems with large amounts of memory these device
private struct pages use a significant amount of memory and take a
significant amount of time to initialize.

Changes in v3
-------------

Thanks all for feedback and suggestions on v2.

The most significant change is fixing a null pointer deref when
memremap_device_private_pagemap() was called with NUMA_NO_NODE.

Details:

  - mm/migrate_device: Add migrate PFN flag to track device private pages
    - Use adev->kfd.pgmap.type == MEMORY_DEVICE_PRIVATE
  - mm/page_vma_mapped: Add flag to page_vma_mapped_walk::flags to track
    device private pages
    - Track device private offset in pvmw::flags instead of pvmw::pfn
  - mm: Add a new swap type for migration entries of device private pages
    - Move softleaf changes to new patch
    - Update commit message to explain the change reduces the number of
      swap files.
    - Move creating the device private migration changes to a separate
      patch
    - Remove predicates - we'll rely on softleaf predicates entirely
  - mm: Add softleaf support for device private migration entries
    - Separated from previous patch
    - s/SOFTLEAF_MIGRATION_DEVICE_/SOFTLEAF_MIGRATION_DEVICE_PRIVATE_/
    - Update comment for softleaf_is_migration_read()
  - mm: Begin creating device private migration entries
    - Provided as an individual patch
  - mm: Remove device private pages from the physical address space
    - Use numa_mem_id() if memremap_device_private_pagemap() is called
      with NUMA_NO_NODE. This fixes a null pointer deref in
      lruvec_stat_mod_folio().
    - drm/xe: Remove call to devm_release_mem_region() in
      xe_pagemap_destroy_work()
    - s/VM_BUG/VM_WARN/


Testing:
- selftests/mm/hmm-tests on an amd64 VM

Revisions:
- RFC: https://lore.kernel.org/all/20251128044146.80050-1-jniethe@nvidia.com/
- v1: https://lore.kernel.org/all/20251231043154.42931-1-jniethe@nvidia.com/
- v2: https://lore.kernel.org/all/20260107091823.68974-1-jniethe@nvidia.com/

[0] https://lore.kernel.org/lkml/CAMj1kXGtFyugzi9MZW=4_oVTy==eAF6283fwvX9fdZhO98ZA3g@mail.gmail.com/

Jordan Niethe (13):
  mm/migrate_device: Introduce migrate_pfn_from_page() helper
  drm/amdkfd: Use migrate pfns internally
  mm/migrate_device: Make migrate_device_{pfns,range}() take mpfns
  mm/migrate_device: Add migrate PFN flag to track device private pages
  mm/page_vma_mapped: Add flag to page_vma_mapped_walk::flags to track
    device private pages
  mm: Add helpers to create migration entries from struct pages
  mm: Add a new swap type for migration entries of device private pages
  mm: Add softleaf support for device private migration entries
  mm: Begin creating device private migration entries
  mm: Add helpers to create device private entries from struct pages
  mm/util: Add flag to track device private pages in page snapshots
  mm/hmm: Add flag to track device private pages
  mm: Remove device private pages from the physical address space

 Documentation/mm/hmm.rst                 |  11 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c       |  43 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  45 +++---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |   2 +-
 drivers/gpu/drm/drm_pagemap.c            |  11 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c   |  45 ++----
 drivers/gpu/drm/xe/xe_svm.c              |  37 ++---
 fs/proc/page.c                           |   6 +-
 include/drm/drm_pagemap.h                |   8 +-
 include/linux/hmm.h                      |   7 +-
 include/linux/leafops.h                  | 120 ++++++++++++--
 include/linux/memremap.h                 |  64 +++++++-
 include/linux/migrate.h                  |  23 ++-
 include/linux/mm.h                       |   9 +-
 include/linux/rmap.h                     |  29 +++-
 include/linux/swap.h                     |   8 +-
 include/linux/swapops.h                  |  99 ++++++++++++
 lib/test_hmm.c                           |  87 ++++++----
 mm/debug.c                               |   9 +-
 mm/hmm.c                                 |   5 +-
 mm/huge_memory.c                         |  43 ++---
 mm/hugetlb.c                             |  15 +-
 mm/memory.c                              |   5 +-
 mm/memremap.c                            | 196 ++++++++++++++++++-----
 mm/migrate.c                             |   6 +-
 mm/migrate_device.c                      |  76 +++++----
 mm/mm_init.c                             |   8 +-
 mm/mprotect.c                            |  10 +-
 mm/page_vma_mapped.c                     |  26 ++-
 mm/rmap.c                                |  59 ++++---
 mm/util.c                                |   8 +-
 mm/vmscan.c                              |   2 +-
 32 files changed, 781 insertions(+), 341 deletions(-)


base-commit: 7a45b8f10286a29b005fdcf1e4eb0ecff8675c75
-- 
2.34.1

