Message-ID: <20250501143501.vljk4hriuc3c2yrv@master>
Date: Thu, 1 May 2025 14:35:01 +0000
From: Wei Yang <richard.weiyang@...il.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Wei Yang <richard.weiyang@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Matthew Wilcox <willy@...radead.org>,
David Hildenbrand <david@...hat.com>,
Pedro Falcato <pfalcato@...e.de>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2 01/10] mm/mremap: introduce more mergeable mremap
via MREMAP_RELOCATE_ANON

On Thu, May 01, 2025 at 10:27:47AM +0100, Lorenzo Stoakes wrote:
>On Thu, May 01, 2025 at 01:18:45AM +0000, Wei Yang wrote:
>> On Wed, Apr 30, 2025 at 05:07:40PM +0100, Lorenzo Stoakes wrote:
>> >On Wed, Apr 30, 2025 at 03:41:19PM +0000, Wei Yang wrote:
>> >> On Wed, Apr 30, 2025 at 02:15:24PM +0100, Lorenzo Stoakes wrote:
>> >> >On Wed, Apr 30, 2025 at 12:47:03AM +0000, Wei Yang wrote:
>> >> >> On Tue, Apr 22, 2025 at 09:09:20AM +0100, Lorenzo Stoakes wrote:
>> >> >> [...]
>> >> >> >+bool vma_had_uncowed_children(struct vm_area_struct *vma)
>> >> >> >+{
>> >> >> >+	struct anon_vma *anon_vma = vma ? vma->anon_vma : NULL;
>> >> >> >+	bool ret;
>> >> >> >+
>> >> >> >+	if (!anon_vma)
>> >> >> >+		return false;
>> >> >> >+
>> >> >> >+	/*
>> >> >> >+	 * If we're mmap locked then there's no way for this count to change, as
>> >> >> >+	 * any such change would require this lock not be held.
>> >> >> >+	 */
>> >> >> >+	if (rwsem_is_locked(&vma->vm_mm->mmap_lock))
>> >> >> >+		return anon_vma->num_children > 1;
>> >> >>
>> >> >> Hi, Lorenzo
>> >> >>
>> >> >> May I ask a question here?
>> >> >
>> >> >Just ask the question.
>> >> >
>> >>
>> >> Thanks.
>> >>
>> >> My question is: the function is expected to return true if a vma has been
>> >> forked from this one, right?
>> >>
>> >> IMO there are cases when it has one forked child and anon_vma->num_children == 1,
>> >> which means folios are not exclusively mapped. But the function would return
>> >> false.
>> >>
>> >> Or maybe I misunderstand the logic here.
>> >
>> >I mean, it'd be helpful if you delineated which cases these were?
>> >
>>
>> Sorry, I should be more specific.
>>
>> >Presumably you're thinking of something like:
>> >
>> >1. Process 1: VMA A is established. num_children == 1 (self-reference is counted).
>> >2. Process 2: Process 1 forks, VMA B references A, a->num_children++
>> >3. Process 3: Process 2 forks, VMA C is established (maybe you think b->num_children++?)
>>
>> Maybe this is the key point. Will explain below at ***.
>>
>> >4. Unmap vma B, oops, a->num_children == 1 but it still has C!
>> >
>> >But that won't happen, as VMA C will be referencing a->anon_vma, so in reality
>> >a->anon_vma->num_children == 3, then after unmap == 2.
>> >
>>
>> The case here is handled well; I am thinking of a slightly different one.
>>
>> Here is the case I am thinking about. If my understanding is wrong, please
>> correct me.
>>
>>        a                      VMA A
>>        +-----------+          +-----------+
>>        |           |   --->   |         av| == a
>>        +-----------+          +-----------+
>>                    \
>>                     \
>>                     |\          VMA B
>>                     | \         +-----------+
>>                     |  >        |         av| == b
>>                     |           +-----------+
>>                     \
>>                      \          VMA C
>>                       \         +-----------+
>>                        >        |         av| == c
>>                                 +-----------+
>>
>> 1. Process 1: VMA A is established, num_children == 1
>> 2. Process 2: Process 1 forks, a->num_children++ and b->num_children == 0
>> 3. Process 3: Process 2 forks, b->num_children++ => b->num_children == 1
>>
>> If we call vma_had_uncowed_children(VMA B), we would check b->num_children and
>> return false since it is not greater than 1. But we do have a child, process 3.
>>
>> ***
>>
>> Coming back to b->num_children: after re-reading your example, I guess this is
>> the key point. In anon_vma_fork(), we do anon_vma->parent->num_children++. So
>> when forking VMA C, we increase b->num_children instead of a->num_children.
>>
>> To verify this, I did a quick test using my test case
>> test_fork_grand_child[1]. I see b->num_children is increased to 1 after C is
>> forked. I will reply in that thread, and I hope that helps to communicate
>> the case.
>>
>> Well, if I am not correct, feel free to correct me :-)
>
>OK so you've expressed this in a very confusing way and the diagram is
>wrong but I think I see the point.
>
Sorry for expressing it poorly; fortunately you got the point :-)
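
To restate the case concretely, here is a minimal userspace sketch, not kernel
code. struct toy_anon_vma and the toy_* helpers are names made up for
illustration. The sketch assumes that a root anon_vma counts itself in
num_children and that a fork bumps only the direct parent, which is my reading
of anon_vma_fork(). It ignores the reuse logic in anon_vma_clone() and
num_active_vmas entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for struct anon_vma; only the fields discussed here. */
struct toy_anon_vma {
	struct toy_anon_vma *parent;	/* points to itself for a root */
	unsigned long num_children;	/* anon_vmas whose parent points here */
};

/* Assumed model of the first fault on VMA A: the root counts itself. */
static struct toy_anon_vma *toy_prepare(void)
{
	struct toy_anon_vma *av = calloc(1, sizeof(*av));

	if (!av)
		abort();
	av->parent = av;
	av->num_children = 1;
	return av;
}

/* Assumed model of fork without reuse: only the direct parent is bumped. */
static struct toy_anon_vma *toy_fork(struct toy_anon_vma *parent)
{
	struct toy_anon_vma *av = calloc(1, sizeof(*av));

	if (!av)
		abort();
	av->parent = parent;
	parent->num_children++;
	return av;
}

/* The check under discussion, reduced to the counter comparison. */
static bool toy_had_children(const struct toy_anon_vma *av)
{
	return av->num_children > 1;
}

/* Returns true if the check misses child C when asked about B. */
static bool toy_grandchild_case_is_missed(void)
{
	struct toy_anon_vma *a = toy_prepare();	/* process 1, VMA A */
	struct toy_anon_vma *b = toy_fork(a);	/* process 2, VMA B */
	struct toy_anon_vma *c = toy_fork(b);	/* process 3, VMA C */
	bool missed;

	assert(a->num_children == 2);	/* itself plus b */
	assert(b->num_children == 1);	/* c only, no self count */
	assert(c->num_children == 0);

	missed = !toy_had_children(b);	/* b has child c, yet 1 > 1 is false */

	free(c);
	free(b);
	free(a);
	return missed;
}
```

Calling toy_grandchild_case_is_missed() from a small main() is enough to
exercise it. Under the assumptions above, the "> 1" comparison only works for
the root anon_vma, because only the root carries the extra self count.
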
>Because of anon_vma reuse logic in anon_vma_clone() we might end up in the
>situation where num_children (which strictly reports number of anon_vma
>objects whose parent pointer points at that anon_vma) does not actually
>correctly reflect the fact that there are multiple mappings of a folio.
>
>I think the correct approach is to also look at num_active_vmas which accounts
>for this, but I think overall we should move these checks to being a 'best
>guess' and remove the WARN_ON() around the multiply-mapped folio
>logic. It's fine to just back out if we guesstimated wrong.
>
Would you mind cc'ing me if you spin another round? I would like to learn
more from your work.

>I'll also add a bunch of tests to assert specific fork scenarios.
>
>>
>> [1]: http://lkml.kernel.org/r/20250429090639.784-3-richard.weiyang@gmail.com
>>
>> >References to the originally faulted-in anon_vma are propagated through the
>> >forks.
>> >
>> >anon_vma logic is tricky, one of many reasons I want to (significantly) rework
>> >it.
>> >
>> >Though sadly there is a lot of _essential_ complexity, I do think we can do
>> >better.
>> >
>>
>> --
>> Wei Yang
>> Help you, Help me
--
Wei Yang
Help you, Help me