lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <44fd6f72-a5c5-471e-8044-3bbcea6d8338@lucifer.local>
Date: Mon, 12 May 2025 12:24:10 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Harry Yoo <harry.yoo@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        David Hildenbrand <david@...hat.com>,
        Matthew Wilcox <willy@...radead.org>, Rik van Riel <riel@...riel.com>,
        Wei Yang <richard.weiyang@...il.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/3] mm/vma: fix incorrectly disallowed anonymous VMA
 merges

On Mon, May 12, 2025 at 11:39:16AM +0900, Harry Yoo wrote:
> On Tue, Apr 08, 2025 at 10:29:31AM +0100, Lorenzo Stoakes wrote:
> > anon_vma_chain's were introduced by Rik von Riel in commit 5beb49305251 ("mm:
> > change anon_vma linking to fix multi-process server scalability issue").
> >
> > This patch was introduced in March 2010. As part of this change, careful
> > attention was made to the instance of mprotect() causing a VMA merge, with one
> > faulted (i.e. having anon_vma set) and another not:
> >
> > 		/*
> > 		 * Easily overlooked: when mprotect shifts the boundary,
> > 		 * make sure the expanding vma has anon_vma set if the
> > 		 * shrinking vma had, to cover any anon pages imported.
> > 		 */
> >
> > In the modern VMA code, this is handled in dup_anon_vma() (and ultimately
> > anon_vma_clone()).
> >
> > This case is one of the three configurations of adjacent VMA anon_vma state
> > that we might encounter on merge (where dst is the VMA which will be merged
> > into and src the one being merged into dst):
> >
> > 1.  dst->anon_vma,  src->anon_vma - These must be equal, no-op.
> > 2.  dst->anon_vma, !src->anon_vma - We simply use dst->anon_vma, no-op.
> > 3. !dst->anon_vma,  src->anon_vma - The case in question here.
> >
> > In case 3, the instance addressed here - we duplicate the AVC connections
> > from src and place into dst.
> >
> > However, in practice, we very often do NOT do this.
> >
> > This appears to be due to an inadvertent consequence of the change
> > introduced by commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs"),
> > introduced in May 2011.
> >
> > This implies that this merge case was functional only for a little over a
> > year, and has since been broken for ~15 years.
> >
> > Here, lock scalability concerns lead to us restricting anonymous merges
> > only to those VMAs with 1 entry in their vma->anon_vma_chain, that is, a
> > VMA that is not connected to any parent process's anon_vma.
> >
> > The mergeability test looks like this:
> >
> > static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
> > 		 struct anon_vma *anon_vma2, struct vm_area_struct *vma)
> > {
> > 	if ((!anon_vma1 || !anon_vma2) && (!vma ||
> > 		!vma->anon_vma || list_is_singular(&vma->anon_vma_chain)))
> > 		return true;
> > 	return anon_vma1 == anon_vma2;
> > }
> >
> > However, we have a problem here - typically the vma passed here is the
> > destination VMA.
> >
> > For instance in vma_merge_existing_range() we invoke:
> >
> > can_vma_merge_left()
> > -> [ check that there is an immediately adjacent prior VMA ]
> > -> can_vma_merge_after()
> >   -> is_mergeable_vma() for general attribute check
> > -> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev)
> >
> > So if we were considering a target unfaulted 'prev':
> >
> > 	  unfaulted    faulted
> > 	|-----------|-----------|
> > 	|    prev   |    vma    |
> > 	|-----------|-----------|
> >
> > This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev).
> >
> > The list_is_singular() check for vma->anon_vma_chain, an empty list on
> > fault, would cause this merge to _fail_ even though all else indicates a
> > merge.
> >
> > Equally a simple merge into a next VMA would hit the same problem:
> >
> > 	   faulted    unfaulted
> > 	|-----------|-----------|
> > 	|    vma    |    next   |
> > 	|-----------|-----------|
> >
> > can_vma_merge_right()
> > -> [ check that there is an immediately adjacent succeeding VMA ]
> > -> can_vma_merge_before()
> >   -> is_mergeable_vma() for general attribute check
> > -> is_mergeable_anon_vma([ proposed anon_vma ], next->anon_vma, next)
> >
> > For a 3-way merge, we'd also hit the same problem if it was configured like
> > this for instance:
> >
> > 	  unfaulted    faulted    unfaulted
> > 	|-----------|-----------|-----------|
> > 	|    prev   |    vma    |    next   |
> > 	|-----------|-----------|-----------|
> >
> > As we'd call can_vma_merge_left() for prev, and can_vma_merge_right() for
> > next, both of which would fail.
> >
> > vma_merge_new_range() (and relatedly, vma_expand()) are not impacted, as
> > the new VMA would never already be faulted (it is a proposed new range).
> >
> > Because we already handle each of the aforementioned merge cases, and can
> > absolutely therefore deal with an existing VMA merge with !dst->anon_vma,
> > src->anon_vma, there is absolutely no reason to disallow this kind of
> > merge.
> >
> > It seems that the intention of this patch is to ensure that, in the
> > instance of merging unfaulted VMAs with faulted ones, we never wish to do
> > so with those with multiple AVCs due to the fact that anon_vma lock's are
> > held across both parent and child anon_vma's (actually, the 'root' parent
> > anon_vma's lock is used).
> >
> > In fact, the original commit alludes to this - "find_mergeable_anon_vma()
> > already considers this case".
> >
> > In find_mergeable_anon_vma() however, we check the anon_vma which will be
> > merged from, if it is set, then we check
> > list_is_singular(vma->anon_vma_chain).
> >
> > So to match this logic, update is_mergeable_anon_vma() to perform this
> > scalability check on the VMA whose anon_vma we ultimately merge into.
> >
> > This matches existing behaviour with forked VMAs, only we no longer wrongly
> > disallow ALL empty target merges.
> >
> > So we both allow merge cases and ensure the scalability check is correctly
> > applied.
> >
> > We may wish to revisit these lock scalability concerns at a later date and
> > ensure they are still valid.
> >
> > Additionally, correct userland VMA tests which were mistakenly not
> > asserting these cases correctly previously to now correctly assert this,
> > and to ensure vmg->anon_vma state is always consistent to account for newly
> > introduced asserts.
> >
> > Fixes: 965f55dea0e3 ("mmap: avoid merging cloned VMAs")
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > Reviewed-by: Yeoreum Yun <yeoreum.yun@....com>
> > ---
>
> A little bit late to review, but better late than never :)
>
> This patch makes sense and looks good to me,
> Reviewed-by: Harry Yoo <harry.yoo@...cle.com>

Thanks! Appreciated :)

>
> --
> Cheers,
> Harry / Hyeonggon

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ