Message-ID: <0570a5ed-2583-4d96-9d92-95865ee65b97@lucifer.local>
Date: Fri, 9 Aug 2024 11:14:53 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 07/10] mm: avoid using vma_merge() for new VMAs

On Thu, Aug 08, 2024 at 03:06:14PM GMT, Liam R. Howlett wrote:
> * Liam R. Howlett <Liam.Howlett@...cle.com> [240808 14:34]:
> > * Lorenzo Stoakes <lorenzo.stoakes@...cle.com> [240808 14:02]:
> > > On Thu, Aug 08, 2024 at 06:45:43PM GMT, Vlastimil Babka wrote:
> > > > On 8/5/24 14:13, Lorenzo Stoakes wrote:
> > > > > In mmap_region() and do_brk_flags() we open code scenarios where we prefer
> > > > > to use vma_expand() rather than invoke a full vma_merge() operation.
> > > > >
> > > > > Abstract this logic and eliminate all of the open-coding, and also
> > > > > use the same logic in all cases where we add new VMAs so that,
> > > > > rather than ultimately invoking vma_merge(), we use vma_expand().
> > > > >
> > > > > We implement this by replacing vma_merge_new_vma() with this newly
> > > > > abstracted logic.
> > > > >
> > > > > Doing so removes duplication and simplifies VMA merging in all such cases,
> > > > > laying the ground for us to eliminate the merging of new VMAs in
> > > > > vma_merge() altogether.
> > > > >
> > > > > This makes it far easier to understand what is happening in these
> > > > > cases, avoiding confusion and bugs and allowing for future
> > > > > optimisation.
> > > > >
> > > > > As a result of this change we are also able to make vma_prepare(),
> > > > > init_vma_prep(), vma_complete(), can_vma_merge_before() and
> > > > > can_vma_merge_after() static and internal to vma.c.
> > > >
> > > > That's really great, but it would be even better if these code moves could
> > > > be a separate patch as it would make reviewing so much easier. But with git
> > > > diff's --color-moved to the rescue, let me try...
> > >
> > > Will separate out on respin.
> > >
> > > >
> > > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > > > > ---
> > > > >  mm/mmap.c                        |  79 ++---
> > > > >  mm/vma.c                         | 482 +++++++++++++++++++------------
> > > > >  mm/vma.h                         |  51 +---
> > > > >  tools/testing/vma/vma_internal.h |   6 +
> > > > >  4 files changed, 324 insertions(+), 294 deletions(-)
> >
> > ...
> > > > > + */
> > > > > +struct vm_area_struct *vma_merge_new_vma(struct vma_merge_struct *vmg)
> > > > > +{
> > > > > +	bool is_special = vmg->flags & VM_SPECIAL;
> > > > > +	struct vm_area_struct *prev = vmg->prev;
> > > > > +	struct vm_area_struct *next = vmg->next;
> > > > > +	unsigned long start = vmg->start;
> > > > > +	unsigned long end = vmg->end;
> > > > > +	pgoff_t pgoff = vmg->pgoff;
> > > > > +	pgoff_t pglen = PHYS_PFN(end - start);
> > > > > +
> > > > > +	VM_WARN_ON(vmg->vma);
> > > > > +
> > > > > +	if (!prev && !next) {
> > > > > +		/*
> > > > > +		 * Since the caller must have determined that the requested
> > > > > +		 * range is empty, vmg->vmi will be left pointing at the VMA
> > > > > +		 * immediately prior.
> > > > > +		 */
> > > >
> > > > OK that's perhaps not that obvious, as it seems copy_vma() is doing some
> > > > special dance to ensure this. Should we add it to the ASSUMPTIONS and assert
> > > > it, or is there a maple tree operation we can do to ensure it, ideally if
> > > > it's very cheap if the iterator is already set the way we want it to be?
> > > >
> > >
> > > To be fair this is something that was previously assumed, and I just added
> > > a comment.
> > >
> > > Will add to assumptions, and again I think any assert should be done in
> > > such a way that under non-CONFIG_DEBUG_VM nothing happens, maybe
> > > VM_WARN_ON()?
> > >
> > > Will try to come up with something.
>
> Something like:
>
> VM_BUG_ON(vma_iter_end(vmg->vmi) > start);
>
> > >
> > > > > +		next = vmg->next = vma_next(vmg->vmi);
>
> and:
>
> VM_BUG_ON(vma_iter_addr(vmg->vmi) < end);
>

Ack x2.

I thought VM_BUG_ON() was considered 'not done' these days though...
checkpatch.pl has become rather hit-or-miss as to what should be given
attention and what shouldn't.

> > > > > +		prev = vmg->prev = vma_prev(vmg->vmi);
> > > > > +
> > > > > +		/* Avoid maple tree re-walk. */
> > > > > +		if (is_special && prev)
> > > > > +			vma_iter_next_range(vmg->vmi);
> > > >
> > > > I wish I knew what this did, but it seems to be the same as what the
> > > > old code did, so hopefully that's fine.
> > >
> > > I think the point is that we are about to exit, so we'd be left
> > > pointing at prev. But since we're exiting in just a second, we want to
> > > be pointing at the next VMA, which will become the prev of the next
> > > merge attempt.
> > >
> > > Liam can maybe elucidate further.
> >
> > What you have to remember is that the vma iterator (vmg->vmi above),
> > contains (or, basically is) a maple state (usually written as mas).  We
> > keep state of the maple tree walker so that we don't have to keep
> > re-walking to find the same thing.  We move around the tree with this
> > maple state because going prev/next is faster from leaves (almost always
> > just the next thing in the nodes array of pointers).
> >
> > We use the maple state to write as well, so the maple state needs to
> > point to the correct location in the tree for a write.
> >
> > The maple tree is a range-based tree, so each entry exists for a span of
> > values.  A write happens at the lowest index and can overwrite
> > subsequent values.  This means that the maple state needs to point to
> > the range containing the lowest index for the write (if it's pointing to
> > a node - it could walk from the top).
> >
> > A side effect of writing to the lowest index is that we need to point to
> > the previous vma if we are going to 'expand' the vma.  The range is
> > essentially going to be from prev->start to "whatever we are expanding
> > over".
> >
> > In the old code, the vm_flags & VM_SPECIAL code meant there was no way
> > an expansion was going to happen, but we've moved the maple state to the
> > wrong location for a write of a new vma - so this vma_iter_next_range()
> > just moves it back.  Then we "goto cannot_expand".
> >
