Message-ID: <fb53f16c-9386-4b83-9696-1ab51f03fe54@lucifer.local>
Date: Thu, 1 May 2025 15:38:03 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Wei Yang <richard.weiyang@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        David Hildenbrand <david@...hat.com>, Pedro Falcato <pfalcato@...e.de>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2 01/10] mm/mremap: introduce more mergeable mremap
 via MREMAP_RELOCATE_ANON

On Thu, May 01, 2025 at 02:35:01PM +0000, Wei Yang wrote:
> On Thu, May 01, 2025 at 10:27:47AM +0100, Lorenzo Stoakes wrote:
> >On Thu, May 01, 2025 at 01:18:45AM +0000, Wei Yang wrote:
> >> On Wed, Apr 30, 2025 at 05:07:40PM +0100, Lorenzo Stoakes wrote:
> >> >On Wed, Apr 30, 2025 at 03:41:19PM +0000, Wei Yang wrote:
> >> >> On Wed, Apr 30, 2025 at 02:15:24PM +0100, Lorenzo Stoakes wrote:
> >> >> >On Wed, Apr 30, 2025 at 12:47:03AM +0000, Wei Yang wrote:
> >> >> >> On Tue, Apr 22, 2025 at 09:09:20AM +0100, Lorenzo Stoakes wrote:
> >> >> >> [...]
> >> >> >> >+bool vma_had_uncowed_children(struct vm_area_struct *vma)
> >> >> >> >+{
> >> >> >> >+	struct anon_vma *anon_vma = vma ? vma->anon_vma : NULL;
> >> >> >> >+	bool ret;
> >> >> >> >+
> >> >> >> >+	if (!anon_vma)
> >> >> >> >+		return false;
> >> >> >> >+
> >> >> >> >+	/*
> >> >> >> >+	 * If we're mmap locked then there's no way for this count to change, as
> >> >> >> >+	 * any such change would require this lock not be held.
> >> >> >> >+	 */
> >> >> >> >+	if (rwsem_is_locked(&vma->vm_mm->mmap_lock))
> >> >> >> >+		return anon_vma->num_children > 1;
> >> >> >>
> >> >> >> Hi, Lorenzo
> >> >> >>
> >> >> >> May I have a question here?
> >> >> >
> >> >> >Just ask the question.
> >> >> >
> >> >>
> >> >> Thanks.
> >> >>
> >> >> My question is the function is expected to return true, if we have forked a
> >> >> vma from this one, right?
> >> >>
> >> >> IMO there are cases when it has one forked child and anon_vma->num_children == 1,
> >> >> which means folios are not exclusively mapped. But the function would return
> >> >> false.
> >> >>
> >> >> Or maybe I misunderstand the logic here.
> >> >
> >> >I mean, it'd be helpful if you delineated which cases these were?
> >> >
> >>
> >> Sorry, I should be more specific.
> >>
> >> >Presumably you're thinking of something like:
> >> >
> >> >1. Process 1: VMA A is established. num_children == 1 (self-reference is counted).
> >> >2. Process 2: Process 1 forks, VMA B references A, a->num_children++
> >> >3. Process 3: Process 2 forks, VMA C is established (maybe you think b->num_children++?)
> >>
> >> Maybe this is the key point. Will explain below at ***.
> >>
> >> >4. Unmap vma B, oops, a->num_children == 1 but it still has C!
> >> >
> >> >But that won't happen, as VMA C will be referencing a->anon_vma, so in reality
> >> >a->anon_vma->num_children == 3, then after unmap == 2.
> >> >
> >>
> >> The case here is handled correctly; I am thinking of a slightly different one.
> >>
> >> Here is the case I am thinking about. If my understanding is wrong, please
> >> correct me.
> >>
> >> 	a                  VMA A
> >> 	+-----------+      +-----------+
> >> 	|           | ---> |         av| == a
> >> 	+-----------+      +-----------+
> >> 	             \
> >> 	              \
> >> 	              |\   VMA B
> >> 	              | \  +-----------+
> >> 	              |  > |         av| == b
> >> 	              |    +-----------+
> >> 	              \
> >> 	               \   VMA C
> >> 	                \  +-----------+
> >> 	                 > |         av| == c
> >> 	                   +-----------+
> >>
> >> 1. Process 1: VMA A is established, num_children == 1
> >> 2. Process 2: Process 1 forks, a->num_children++ and b->num_children == 0
> >> 3. Process 3: Process 2 forks, b->num_children++ => b->num_children == 1
> >>
> >> If vma_had_uncowed_children(VMA B), we would check b->num_children and
> >> return false since it is not greater than 1. But we do have a child process 3.
> >>
> >> ***
> >>
> >> Coming back to b->num_children: after re-reading your example, I guess this
> >> is the key point. In anon_vma_fork(), we do anon_vma->parent->num_children++.
> >> So when forking VMA C, we increase b->num_children instead of a->num_children.
> >>
> >> To verify this, I did a quick test in my test cases in
> >> test_fork_grand_child[1]. I see b->num_children is increased to 1 after C is
> >> forked. Will reply in that thread and hope that would be helpful to
> >> communicate the case.
> >>
> >> Well, if I am not correct, feel free to correct me :-)
> >
> >OK so you've expressed this in a very confusing way and the diagram is
> >wrong but I think I see the point.
> >
>
> Sorry for my poor wording; fortunately you got it :-)

No need to apologise haha, thanks for reporting this. This kind of thing is
useful, we always want reports of problems (in this case, ahead of time...).

>
> >Because of anon_vma reuse logic in anon_vma_clone() we might end up in the
> >situation where num_children (which strictly reports number of anon_vma
> >objects whose parent pointer points at that anon_vma) does not actually
> >correctly reflect the fact that there are multiple mappings of a folio.
> >
> >I think the correct approach is to also look at num_active_vmas which accounts
> >for this, but I think overall we should move these checks to being a 'best
> >guess' and remove the WARN_ON() around the multiply-mapped folio
> >logic. It's fine to just back out if we guesstimated wrong.
> >
>
> Would you mind cc'ing me if you spin another round? I would like to learn
> more from your work.

Of course dude, if I reference somebody in a change log I always cc as a matter
of principle :)

Cheers, Lorenzo

>
> >I'll also add a bunch of tests to assert specific fork scenarios.
> >
> >>
> >> [1]: http://lkml.kernel.org/r/20250429090639.784-3-richard.weiyang@gmail.com
> >>
> >> >References to the originally faulted-in anon_vma are propagated through the
> >> >forks.
> >> >
> >> >anon_vma logic is tricky, one of many reasons I want to (significantly) rework
> >> >it.
> >> >
> >> >Though sadly there is a lot of _essential_ complexity, I do think we can do
> >> >better.
> >> >
> >>
> >> --
> >> Wei Yang
> >> Help you, Help me
>
> --
> Wei Yang
> Help you, Help me
