[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <cover.1750433500.git.lorenzo.stoakes@oracle.com>
Date: Fri, 20 Jun 2025 16:33:00 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: David Hildenbrand <david@...hat.com>, Zi Yan <ziy@...dia.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Lance Yang <ioworker0@...il.com>, SeongJae Park <sj@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>
Subject: [PATCH v2 0/5] madvise cleanup
This is a series of patches that helps address a number of historic
problems in the madvise() implementation:
* Eliminate the visitor pattern and having the code which is implemented
for both the anon_vma_name implementation and ordinary madvise()
operations use the same madvise_vma_behavior() implementation.
* Thread state through the madvise_behavior state object - this object,
very usefully introduced by SJ, is already used to transmit state through
operations. This series extends this by having all madvise() operations
use this, including anon_vma_name.
* Thread range, VMA state through madvise_behavior - This helps avoid a lot
of the confusing code around range and VMA state and again keeps things
consistent and with a single 'source of truth'.
* Addressing the very strange behaviour around the passed around struct
vm_area_struct **prev pointer - all read-only users do absolutely nothing
with the prev pointer. The only function that uses it is
madvise_update_vma(), and in all cases prev is always reset to
VMA.
Fix this by no longer having aything but madvise_update_vma() reference
prev, and having madvise_walk_vmas() update prev in each
instance. Additionally make it clear that the meaningful change in vma
state is when madvise_update_vma() potentially merges a VMA, so
explicitly retrieve the VMA in this case.
* Update and clarify the madvise_walk_vmas() function - this is a source of
a great deal of confusion, so simplify, stop using prev = NULL to signify
that the mmap lock has been dropped (!) and make that explicit, and add
some comments to explain what's going on.
v2:
* Propagated tags (thanks everyone!)
* Don't separate out __MADV_SET_ANON_VMA_NAME and __MADV_SET_CLEAR_VMA_NAME,
just use __MADV_SET_ANON_VMA_NAME as per Zi.
* Eliminate is_anon_vma_name() as no longer necessary, addressing Zi's concern
around naming another way :)
* Put mm_struct abstraction of try_vma_read_lock() into 2/5 from 3/5 as per Zi.
* Added comment about get/put anon_vma_name in madvise_vma_behavior() as per
Vlastimil.
* Renamed have_anon_name to set_new_anon_name to make it clear why we make an
exception to this get/put behaviour in madvise_vma_behavior().
* Reworded 1/4 commit message to make it clearer what's being done as per
Vlastimil.
* Avoid comma-separated decls in struct madvise_behavior_range as per Zi and
Vlastimil.
* Put fix for silly development bug (range->start comparison to end not
range->end) in 3/5 rather than 4/5 so as to eliminate it altogether,
having fixed it during development but having not put the fix in the
correct place :) as per Vlastimil.
* Rename end to last_end in madvise_walk_vmas() and added a comment for
clarity as per Vlastimil.
* Update madvise_walk_vmas() comment to no longer refer to a visitor
function.
* Separated out prev, vma fields in struct madvise_behavior as per
Vlastimil.
* Added assert on not holding VMA lock whenever mmap lock is dropped and
abstracted to mark_mmap_lock_dropped() so we always assert when we do
this, based on discussion with Vlastimil.
* Removed duplicate comment about weird -ENOMEM unmapped error behaviour.
v1:
https://lore.kernel.org/all/cover.1750363557.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (5):
mm/madvise: remove the visitor pattern and thread anon_vma state
mm/madvise: thread mm_struct through madvise_behavior
mm/madvise: thread VMA range state through madvise_behavior
mm/madvise: thread all madvise state through madv_behavior
mm/madvise: eliminate very confusing manipulation of prev VMA
include/linux/huge_mm.h | 9 +-
mm/khugepaged.c | 9 +-
mm/madvise.c | 585 +++++++++++++++++++++-------------------
3 files changed, 313 insertions(+), 290 deletions(-)
--
2.49.0
Powered by blists - more mailing lists