Message-ID: <20260115182720.1691130-1-Liam.Howlett@oracle.com>
Date: Thu, 15 Jan 2026 13:27:10 -0500
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Suren Baghdasaryan <surenb@...gle.com>,
        Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
        Pedro Falcato <pfalcato@...e.de>, David Hildenbrand <david@...hat.com>,
        Vlastimil Babka <vbabka@...e.cz>, Michal Hocko <mhocko@...e.com>,
        Jann Horn <jannh@...gle.com>, shikemeng@...weicloud.com,
        kasong@...cent.com, nphamcs@...il.com, bhe@...hat.com,
        baohua@...nel.org, chrisl@...nel.org,
        Matthew Wilcox <willy@...radead.org>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>
Subject: [PATCH v2 00/10] Remove XA_ZERO from error recovery of dup_mmap()

dup_mmap() can fail while allocating or setting up a vma after the maple
tree of the oldmm has been copied.  Today, that failure point is marked
by inserting an XA_ZERO entry over it so that the exact location does
not need to be communicated through to exit_mmap().

However, a race exists in the teardown process because dup_mmap() drops
the mmap lock before exit_mmap() can remove the partially set up vma
tree.  This means that other tasks may get to the mm tree and find the
invalid vma pointer (since it's an XA_ZERO entry), even though the mm is
marked as MMF_OOM_SKIP and MMF_UNSTABLE.

To remove the race fully, the tree must be cleaned up before dropping
the lock.  This is accomplished by extracting the vma cleanup in
exit_mmap() and changing the required functions to pass through the vma
search limit.  Any other tree modifications would require extra cycles
which should be spent on freeing memory.
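
As a rough illustration of the difference between the two recovery
strategies, here is a userspace toy model.  It is only a sketch: the
names (toy_tree, TOY_MARKER, toy_fail_with_marker(),
toy_fail_with_cleanup()) are hypothetical, the array stands in for the
maple tree, and the integer flag stands in for the mmap lock.  None of
this is code from the series.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 8
/* Hypothetical sentinel; a stand-in for a zero entry, not the kernel's. */
#define TOY_MARKER ((void *)0x1)

struct toy_tree {
	void *slot[TOY_SLOTS];	/* stand-in for the copied vma tree */
	int locked;		/* stand-in for the mmap lock */
};

/*
 * Old-style recovery: mark the failure point, drop the lock, and leave
 * the real cleanup for later.  The marker stays visible to readers.
 */
static void toy_fail_with_marker(struct toy_tree *t, size_t fail_idx)
{
	t->slot[fail_idx] = TOY_MARKER;
	t->locked = 0;
}

/*
 * New-style recovery: while still locked, tear down the entries below
 * the limit (the ones that were fully set up) and empty the tree.
 * Returns how many entries were torn down.
 */
static size_t toy_fail_with_cleanup(struct toy_tree *t, size_t limit)
{
	size_t freed = 0;

	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (i < limit && t->slot[i])
			freed++;	/* would free the object here */
		t->slot[i] = NULL;	/* nothing left for readers to find */
	}
	t->locked = 0;
	return freed;
}

/* A reader running after the lock is dropped: counts unusable entries. */
static size_t toy_count_invalid(const struct toy_tree *t)
{
	size_t bad = 0;

	for (size_t i = 0; i < TOY_SLOTS; i++)
		if (t->slot[i] == TOY_MARKER)
			bad++;
	return bad;
}
```

In this model, a reader that arrives after toy_fail_with_marker() can
still trip over the marker, while a reader that arrives after
toy_fail_with_cleanup() only ever sees an empty tree, which it must
already be prepared to handle.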

This does run the risk of increasing the possibility of finding no vmas
(which is already possible!) in code that isn't careful.

The final four patches address the excessive argument lists being passed
between the functions.  Using the struct unmap_desc also allows some
special-case code to be removed in favour of differences in how the
struct is set up.
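
The general idea of replacing a long argument list with a descriptor
can be sketched as follows.  This is an assumption-laden illustration,
not the struct from the series: toy_unmap_desc, its fields, and the
helper functions are hypothetical, with only the vma/pg _start/_end
naming borrowed from the change log below.

```c
#include <assert.h>

/* Hypothetical descriptor; the real struct unmap_desc may differ. */
struct toy_unmap_desc {
	unsigned long vma_start, vma_end;	/* range of vmas to unmap */
	unsigned long pg_start, pg_end;		/* range of page tables to free */
};

struct toy_result {
	unsigned long unmapped;
	unsigned long pgtables_freed;
};

/* One code path for every caller; the callers differ only in setup. */
static struct toy_result toy_unmap_region(const struct toy_unmap_desc *d)
{
	struct toy_result r;

	r.unmapped = d->vma_end > d->vma_start ?
		     d->vma_end - d->vma_start : 0;
	r.pgtables_freed = d->pg_end > d->pg_start ?
			   d->pg_end - d->pg_start : 0;
	return r;
}

/* Setup for tearing down everything up to a limit (exit-like caller). */
static struct toy_unmap_desc toy_desc_whole(unsigned long limit)
{
	struct toy_unmap_desc d = {
		.vma_start = 0, .vma_end = limit,
		.pg_start = 0, .pg_end = limit,
	};
	return d;
}

/* Setup for a partial unmap with a wider page table range. */
static struct toy_unmap_desc toy_desc_partial(unsigned long start,
					      unsigned long end,
					      unsigned long floor,
					      unsigned long ceiling)
{
	struct toy_unmap_desc d = {
		.vma_start = start, .vma_end = end,
		.pg_start = floor, .pg_end = ceiling,
	};
	return d;
}
```

Here the two callers share toy_unmap_region() unchanged; what used to
be a special case in the callee becomes a different initializer.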

V1: https://lore.kernel.org/all/20250909190945.1030905-1-Liam.Howlett@oracle.com/
RFC: https://lore.kernel.org/linux-mm/20250815191031.3769540-1-Liam.Howlett@oracle.com/

Changes since v1:
- Standardized naming in all patches to <prefix>_start/_end, with the
  prefix being vma or pg - Thanks David (and Lorenzo)
- Change WARN_ON_ONCE to VM_WARN_ON_ONCE - Thanks David
- Drop static and let the compiler decide - Thanks Suren
- Fix header/c variable name mismatch - Thanks Lorenzo & Pedro
- Added to commit message about the vma search - Thanks Suren
- Added to commit message about variable names - Thanks David
- Fixed comment in free_pgtables() - Thanks Suren
- Make free_pgtables() comment a kernel doc - Thanks Lorenzo
- Fixed order of arguments to free_pgtables() - Thanks Suren
- Added change log comment about cleaning up the failed dup_mmap() even
  when all vmas are copied.
- Renamed UNMAP_REGION to UNMAP_STATE - Thanks Suren & Lorenzo
- Split patch 8 into two - Thanks Lorenzo

Liam R. Howlett (10):
  mm/mmap: Move exit_mmap() trace point
  mm/mmap: Abstract vma clean up from exit_mmap()
  mm/vma: Add limits to unmap_region() for vmas
  mm/memory: Add tree limit to free_pgtables()
  mm/vma: Add page table limit to unmap_region()
  mm: Change dup_mmap() recovery
  mm: Introduce unmap_desc struct to reduce function arguments
  mm/vma: Use unmap_desc in exit_mmap() and vms_clear_ptes()
  mm/vma: Use unmap_region() in vms_clear_ptes()
  mm: Use unmap_desc struct for freeing page tables.

 include/linux/mm.h               |  4 --
 mm/internal.h                    |  8 ++-
 mm/memory.c                      | 71 +++++++++++++----------
 mm/mmap.c                        | 97 ++++++++++++++++++++++----------
 mm/vma.c                         | 54 ++++++++++--------
 mm/vma.h                         | 47 +++++++++++++++-
 tools/testing/vma/vma_internal.h | 12 ++--
 7 files changed, 193 insertions(+), 100 deletions(-)

-- 
2.47.3
