[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <44fd6f72-a5c5-471e-8044-3bbcea6d8338@lucifer.local>
Date: Mon, 12 May 2025 12:24:10 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Harry Yoo <harry.yoo@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Suren Baghdasaryan <surenb@...gle.com>,
David Hildenbrand <david@...hat.com>,
Matthew Wilcox <willy@...radead.org>, Rik van Riel <riel@...riel.com>,
Wei Yang <richard.weiyang@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/3] mm/vma: fix incorrectly disallowed anonymous VMA
merges
On Mon, May 12, 2025 at 11:39:16AM +0900, Harry Yoo wrote:
> On Tue, Apr 08, 2025 at 10:29:31AM +0100, Lorenzo Stoakes wrote:
> > anon_vma_chain's were introduced by Rik von Riel in commit 5beb49305251 ("mm:
> > change anon_vma linking to fix multi-process server scalability issue").
> >
> > This patch was introduced in March 2010. As part of this change, careful
> > attention was made to the instance of mprotect() causing a VMA merge, with one
> > faulted (i.e. having anon_vma set) and another not:
> >
> > /*
> > * Easily overlooked: when mprotect shifts the boundary,
> > * make sure the expanding vma has anon_vma set if the
> > * shrinking vma had, to cover any anon pages imported.
> > */
> >
> > In the modern VMA code, this is handled in dup_anon_vma() (and ultimately
> > anon_vma_clone()).
> >
> > This case is one of the three configurations of adjacent VMA anon_vma state
> > that we might encounter on merge (where dst is the VMA which will be merged
> > into and src the one being merged into dst):
> >
> > 1. dst->anon_vma, src->anon_vma - These must be equal, no-op.
> > 2. dst->anon_vma, !src->anon_vma - We simply use dst->anon_vma, no-op.
> > 3. !dst->anon_vma, src->anon_vma - The case in question here.
> >
> > In case 3, the instance addressed here - we duplicate the AVC connections
> > from src and place into dst.
> >
> > However, in practice, we very often do NOT do this.
> >
> > This appears to be due to an inadvertent consequence of the change
> > introduced by commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs"),
> > introduced in May 2011.
> >
> > This implies that this merge case was functional only for a little over a
> > year, and has since been broken for ~15 years.
> >
> > Here, lock scalability concerns lead to us restricting anonymous merges
> > only to those VMAs with 1 entry in their vma->anon_vma_chain, that is, a
> > VMA that is not connected to any parent process's anon_vma.
> >
> > The mergeability test looks like this:
> >
> > static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
> > struct anon_vma *anon_vma2, struct vm_area_struct *vma)
> > {
> > if ((!anon_vma1 || !anon_vma2) && (!vma ||
> > !vma->anon_vma || list_is_singular(&vma->anon_vma_chain)))
> > return true;
> > return anon_vma1 == anon_vma2;
> > }
> >
> > However, we have a problem here - typically the vma passed here is the
> > destination VMA.
> >
> > For instance in vma_merge_existing_range() we invoke:
> >
> > can_vma_merge_left()
> > -> [ check that there is an immediately adjacent prior VMA ]
> > -> can_vma_merge_after()
> > -> is_mergeable_vma() for general attribute check
> > -> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev)
> >
> > So if we were considering a target unfaulted 'prev':
> >
> > unfaulted faulted
> > |-----------|-----------|
> > | prev | vma |
> > |-----------|-----------|
> >
> > This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev).
> >
> > The list_is_singular() check for vma->anon_vma_chain, an empty list on
> > fault, would cause this merge to _fail_ even though all else indicates a
> > merge.
> >
> > Equally a simple merge into a next VMA would hit the same problem:
> >
> > faulted unfaulted
> > |-----------|-----------|
> > | vma | next |
> > |-----------|-----------|
> >
> > can_vma_merge_right()
> > -> [ check that there is an immediately adjacent succeeding VMA ]
> > -> can_vma_merge_before()
> > -> is_mergeable_vma() for general attribute check
> > -> is_mergeable_anon_vma([ proposed anon_vma ], next->anon_vma, next)
> >
> > For a 3-way merge, we'd also hit the same problem if it was configured like
> > this for instance:
> >
> > unfaulted faulted unfaulted
> > |-----------|-----------|-----------|
> > | prev | vma | next |
> > |-----------|-----------|-----------|
> >
> > As we'd call can_vma_merge_left() for prev, and can_vma_merge_right() for
> > next, both of which would fail.
> >
> > vma_merge_new_range() (and relatedly, vma_expand()) are not impacted, as
> > the new VMA would never already be faulted (it is a proposed new range).
> >
> > Because we already handle each of the aforementioned merge cases, and can
> > absolutely therefore deal with an existing VMA merge with !dst->anon_vma,
> > src->anon_vma, there is absolutely no reason to disallow this kind of
> > merge.
> >
> > It seems that the intention of this patch is to ensure that, in the
> > instance of merging unfaulted VMAs with faulted ones, we never wish to do
> > so with those with multiple AVCs due to the fact that anon_vma lock's are
> > held across both parent and child anon_vma's (actually, the 'root' parent
> > anon_vma's lock is used).
> >
> > In fact, the original commit alludes to this - "find_mergeable_anon_vma()
> > already considers this case".
> >
> > In find_mergeable_anon_vma() however, we check the anon_vma which will be
> > merged from, if it is set, then we check
> > list_is_singular(vma->anon_vma_chain).
> >
> > So to match this logic, update is_mergeable_anon_vma() to perform this
> > scalability check on the VMA whose anon_vma we ultimately merge into.
> >
> > This matches existing behaviour with forked VMAs, only we no longer wrongly
> > disallow ALL empty target merges.
> >
> > So we both allow merge cases and ensure the scalability check is correctly
> > applied.
> >
> > We may wish to revisit these lock scalability concerns at a later date and
> > ensure they are still valid.
> >
> > Additionally, correct userland VMA tests which were mistakenly not
> > asserting these cases correctly previously to now correctly assert this,
> > and to ensure vmg->anon_vma state is always consistent to account for newly
> > introduced asserts.
> >
> > Fixes: 965f55dea0e3 ("mmap: avoid merging cloned VMAs")
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > Reviewed-by: Yeoreum Yun <yeoreum.yun@....com>
> > ---
>
> A little bit late to review, but better late than never :)
>
> This patch makes sense and looks good to me,
> Reviewed-by: Harry Yoo <harry.yoo@...cle.com>
Thanks! Appreciated :)
>
> --
> Cheers,
> Harry / Hyeonggon
Powered by blists - more mailing lists