lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <de29fb6a-625e-4950-98ca-490535e07843@lucifer.local>
Date: Thu, 17 Oct 2024 18:46:53 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Oliver Sang <oliver.sang@...el.com>
Subject: Re: [PATCH hotfix 6.12 1/2] mm/vma: add expand-only VMA merge mode
 and optimise do_brk_flags()

On Thu, Oct 17, 2024 at 01:41:15PM -0400, Liam R. Howlett wrote:
> * Lorenzo Stoakes <lorenzo.stoakes@...cle.com> [241017 10:31]:
> > We know in advance that do_brk_flags() wants only to perform a VMA
> > expansion (if the prior VMA is compatible), and that we assume no mergeable
> > VMA follows it.
> >
> > These are the semantics of this function prior to the recent rewrite of the
> > VMA merging logic, however we are now doing more work than necessary -
> > positioning the VMA iterator at the prior VMA and performing tasks that are
> > not required.
> >
> > Add a new field to the vmg struct to permit merge flags and add a new merge
> > flag VMG_FLAG_JUST_EXPAND which implies this behaviour, and have
> > do_brk_flags() use this.
>
> Funny, I was thinking we could do this for relocate_vma_down() as well,
> bu that's expanding in the wrong direction so we'd have to add
> VMG_FLAG_JUST_EXPAND_DOWN, or the like.  I'm not sure it's worth doing
> since the expand down happens a lot less often.

Yeah I think we have to carefully profile these and go case-by-case. My
initial series to fix this made things worse :P as usual in perf developer
instinct (in this case mine) is often miles off.

>
> >
> > This fixes a reported performance regression in a brk() benchmarking suite.
> >
> > Reported-by: kernel test robot <oliver.sang@...el.com>
> > Closes: https://lore.kernel.org/linux-mm/202409301043.629bea78-oliver.sang@intel.com
> > Fixes: cacded5e42b9 ("mm: avoid using vma_merge() for new VMAs")
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@...cle.com>
>
> > ---
> >  mm/mmap.c |  3 ++-
> >  mm/vma.c  | 23 +++++++++++++++--------
> >  mm/vma.h  | 14 ++++++++++++++
> >  3 files changed, 31 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index dd4b35a25aeb..4a13d9f138f6 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1744,7 +1744,8 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >  		VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
> >
> >  		vmg.prev = vma;
> > -		vma_iter_next_range(vmi);
> > +		/* vmi is positioned at prev, which this mode expects. */
> > +		vmg.merge_flags = VMG_FLAG_JUST_EXPAND;
> >
> >  		if (vma_merge_new_range(&vmg))
> >  			goto out;
> > diff --git a/mm/vma.c b/mm/vma.c
> > index 4737afcb064c..b21ffec33f8e 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -917,6 +917,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
> >  	pgoff_t pgoff = vmg->pgoff;
> >  	pgoff_t pglen = PHYS_PFN(end - start);
> >  	bool can_merge_left, can_merge_right;
> > +	bool just_expand = vmg->merge_flags & VMG_FLAG_JUST_EXPAND;
> >
> >  	mmap_assert_write_locked(vmg->mm);
>
> Could we detect the wrong location by ensuring that the vma iterator is
> positioned at prev?
>
> 	VM_WARN_ON(just_expand && prev && prev->vm_end != vma_iter_end(vmg->vmi);
>
> or, since vma_iter_addr is used above already..
>
> 	VM_WARN_ON(just_expand && prev && prev->start != vma_iter_addr(vmg->vmi);
>
>
> Does it make sense to just expand without a prev?  Should that be
> checked separately?
>
> Anyways, I think it's safer to keep these checks out of the regression
> fix, but maybe better to add them later?

Yeah I do think it'd be worth adding some specific ones. But yeah perhaps
let's add those later!

>
>
> >  	VM_WARN_ON(vmg->vma);
> > @@ -930,7 +931,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
> >  		return NULL;
> >
> >  	can_merge_left = can_vma_merge_left(vmg);
> > -	can_merge_right = can_vma_merge_right(vmg, can_merge_left);
> > +	can_merge_right = !just_expand && can_vma_merge_right(vmg, can_merge_left);
> >
> >  	/* If we can merge with the next VMA, adjust vmg accordingly. */
> >  	if (can_merge_right) {
> > @@ -953,7 +954,11 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
> >  		if (can_merge_right && !can_merge_remove_vma(next))
> >  			vmg->end = end;
> >
> > -		vma_prev(vmg->vmi); /* Equivalent to going to the previous range */
> > +		/* In expand-only case we are already positioned at prev. */
> > +		if (!just_expand) {
> > +			/* Equivalent to going to the previous range. */
> > +			vma_prev(vmg->vmi);
> > +		}
> >  	}
> >
> >  	/*
> > @@ -967,12 +972,14 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
> >  	}
> >
> >  	/* If expansion failed, reset state. Allows us to retry merge later. */
> > -	vmg->vma = NULL;
> > -	vmg->start = start;
> > -	vmg->end = end;
> > -	vmg->pgoff = pgoff;
> > -	if (vmg->vma == prev)
> > -		vma_iter_set(vmg->vmi, start);
> > +	if (!just_expand) {
> > +		vmg->vma = NULL;
> > +		vmg->start = start;
> > +		vmg->end = end;
> > +		vmg->pgoff = pgoff;
> > +		if (vmg->vma == prev)
> > +			vma_iter_set(vmg->vmi, start);
> > +	}
> >
> >  	return NULL;
> >  }
> > diff --git a/mm/vma.h b/mm/vma.h
> > index 819f994cf727..8c6ecc0dfbf6 100644
> > --- a/mm/vma.h
> > +++ b/mm/vma.h
> > @@ -59,6 +59,17 @@ enum vma_merge_state {
> >  	VMA_MERGE_SUCCESS,
> >  };
> >
> > +enum vma_merge_flags {
> > +	VMG_FLAG_DEFAULT = 0,
> > +	/*
> > +	 * If we can expand, simply do so. We know there is nothing to merge to
> > +	 * the right. Does not reset state upon failure to merge. The VMA
> > +	 * iterator is assumed to be positioned at the previous VMA, rather than
> > +	 * at the gap.
> > +	 */
> > +	VMG_FLAG_JUST_EXPAND = 1 << 0,
> > +};
> > +
> >  /* Represents a VMA merge operation. */
> >  struct vma_merge_struct {
> >  	struct mm_struct *mm;
> > @@ -75,6 +86,7 @@ struct vma_merge_struct {
> >  	struct mempolicy *policy;
> >  	struct vm_userfaultfd_ctx uffd_ctx;
> >  	struct anon_vma_name *anon_name;
> > +	enum vma_merge_flags merge_flags;
> >  	enum vma_merge_state state;
> >  };
> >
> > @@ -99,6 +111,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
> >  		.flags = flags_,					\
> >  		.pgoff = pgoff_,					\
> >  		.state = VMA_MERGE_START,				\
> > +		.merge_flags = VMG_FLAG_DEFAULT,			\
> >  	}
> >
> >  #define VMG_VMA_STATE(name, vmi_, prev_, vma_, start_, end_)	\
> > @@ -118,6 +131,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
> >  		.uffd_ctx = vma_->vm_userfaultfd_ctx,		\
> >  		.anon_name = anon_vma_name(vma_),		\
> >  		.state = VMA_MERGE_START,			\
> > +		.merge_flags = VMG_FLAG_DEFAULT,		\
> >  	}
> >
> >  #ifdef CONFIG_DEBUG_VM_MAPLE_TREE
> > --
> > 2.46.2

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ