lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230516164907.5bf2qe7nn5bo7iaa@revolver>
Date:   Tue, 16 May 2023 12:49:07 -0400
From:   "Liam R. Howlett" <Liam.Howlett@...cle.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     Lorenzo Stoakes <lstoakes@...il.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mike Rapoport <rppt@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>
Subject: Re: [PATCH v2] mm: userfaultfd: avoid passing an invalid range to
 vma_merge()

* Peter Xu <peterx@...hat.com> [230516 11:06]:

...

> > > This seems like you're relying on:-
> > >
> > >   ***
> > > CCCCCNNNNN -> CCNNNNNNNN
> 
> I had a closer look today, I still think this patch is not really the right
> one.  The split/merge order is something we use everywhere and I am not
> convinced it must change as drastic.  At least so far it still seems to me
> if we do with what current patch proposed we can have vma fragmentations.
> 
> I think I see what you meant, but here I think it's a legal case where we
> should have PPPP rather than CCCC (PPPPPNNNN --> PPNNNNNNNN).
> 
> To be explicit, for register I think it _should_ be the case 0 where we
> cannot merge (note: the current code is indeed wrong though, see below):
> 
>    ****
>   PPPPPPNNNNNN
>   cannot merge
> 
> While for the unregister case here it's case 4:
> 
>     ****
>   PPPPPPNNNNNN
>   might become
>   PPNNNNNNNNNN
>   case 4 below
> 
> Here the problem is not that we need to do split then merge (I think it'll
> have the problem of vma defragmentation here), the problem is we simply
> passed in the wrong "prev" vma pointer, IMHO.  I've patches attached
> showing what I meant.
> 
> I checked the original commit from Andrea and I found that it _was_ correct:
> 
> commit 86039bd3b4e6a1129318cbfed4e0a6e001656635
> Author: Andrea Arcangeli <aarcange@...hat.com>
> Date:   Fri Sep 4 15:46:31 2015 -0700
> 
>     userfaultfd: add new syscall to provide memory externalization
> 
> I had a feeling that it's broken during the recent rework on vma (or maybe
> even not that close), but I'm not yet sure and need to further check.

I believe this was 69dbe6daf1041e32e003f966d71f70f20c63af53, which is my
work on this area.  Any patches will need to be backported to before
the vma iterator for 6.1

I think keeping the asserts and ensuring we are calling vma_merge() with
arguments that make sense is safer.

> 
> > >
> > > By specifying parameters that are compatible with N even though you're only
> > > partially spanning C?
> > >
> > > This is crazy, and isn't how this should be used. vma_merge() is not
> > > supposed to do partial merges. If it works (presumably it does) this is not
> > > by design unless I've lost my mind and I (and others) have somehow not
> > > noticed this??
> > >
> > > I think you're right that now we'll end up with more fragmentation, but
> > > what you're suggesting is not how vma_merge() is supposed to work.
> > >
> > > As I said above, giving vma_merge() invalid parameters is very dangerous as
> > > you could end up merging over empty ranges in theory (and could otherwise
> > > have corruption).
> > >
> > > I guess we should probably be passing 0 to the last parameter in
> > > split_vma() here then to ensure we do a merge pass too. Will experiment
> > > with this.
> > >
> > > I'm confused as to how the remove from case 8 is not proceeding. I'll look
> > > into this some more...
> > >
> > > Happy to be corrected if I'm misconstruing this!
> > >
> > 
> > OK, so I wrote a small program to do perform exactly this case [0] and it seems
> > that the outcome is the same before and after this patch - vma_merge() is
> > clearly rejecting the case 8 merge (phew!) and in both instances you end up with
> > 3 VMAs.
> > 
> > So this patch doesn't change this behaviour and everything is as it was
> > before. Ideally we'd let it go for another pass, so maybe we should change the
> > split to add a new VMA _afterwards_. Will experiment with that, separately.
> > 
> > But looks like the patch is good as it is.
> > 
> > (if you notice something wrong with the repro, etc. do let me know!)
> > 
> > [0]: https://gist.github.com/lorenzo-stoakes/a11a10f5f479e7a977fc456331266e0e
> 
> I think your test case is fine, but... no, this is still not expected. The
> vma should just merge.
> 
> So I have another closer look on this specific issue, it seems we have a
> long standing bug on pgoff calculation here when passing that to
> vma_merge().  I've got another patch attached to show what I meant.
> 
> To summarize.. now I've got two patches attached:
> 
> Patch 1 is something I'd like to propose to replace this patch that fixes
> incorrect use of vma_merge() so it should also eliminate the assertion
> being triggered (I still think this is a regression but I need to check
> which I will do later; I'm not super familiar with maple tree work, maybe
> you or Liam can quickly spot).
> 
> Patch 2 fixes the long standing issue of vma not being able to merge in
> above case, and with patch 2 applied it should start merging all right.
> 
> Please have a look, thanks.
> 
> ---8<---
> From 6bc39028bba246394bb0bafdaeaab7b8dfd1694f Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@...hat.com>
> Date: Tue, 16 May 2023 09:03:22 -0400
> Subject: [PATCH 1/2] mm/uffd: Fix vma operation where start addr cuts part of
>  vma
> 
> It seems vma merging with uffd paths is broken with either
> register/unregister, where right now we can feed wrong parameters to
> vma_merge() and it's found by recent patch which moved asserts upwards in
> vma_merge():
> 
> https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
> 
> The problem is in the current code base we didn't fixup "prev" for the case
> where "start" address can be within the "prev" vma section.  In that case
> we should have "prev" points to the current vma rather than the previous
> one when feeding to vma_merge().
> 
> This will eliminate the report and make sure vma_merge() calls will become
> legal again.
> 
> Signed-off-by: Peter Xu <peterx@...hat.com>
> ---
>  fs/userfaultfd.c | 27 +++++++++++++++++++++------
>  1 file changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 0fd96d6e39ce..7eb88bc74d00 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1458,10 +1458,17 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>  	BUG_ON(!found);
>  
>  	vma_iter_set(&vmi, start);
> -	prev = vma_prev(&vmi);
> +	vma = vma_find(&vmi, end);
> +	if (!vma)
> +		goto out_unlock;
> +
> +	if (vma->vm_start < start)
> +		prev = vma;
> +	else
> +		prev = vma_prev(&vmi);
>  
>  	ret = 0;
> -	for_each_vma_range(vmi, vma, end) {
> +	do {

The iterator may be off by one, depending on if vma_prev() is called or
not.

Perhaps:
	prev = vma_prev(&vmi); /* could be wrong, or null */
	vma = vma_find(&vmi, end);
	if (!vma)
		goto out_unlock;

	if (vma->vm_start < start)
		prev = vma;

now we know we are at vma with the iterator..
	ret = 0
	do{
	...


>  		cond_resched();
>  
>  		BUG_ON(!vma_can_userfault(vma, vm_flags));
> @@ -1517,7 +1524,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>  	skip:
>  		prev = vma;
>  		start = vma->vm_end;
> -	}
> +	} for_each_vma_range(vmi, vma, end);
>  
>  out_unlock:
>  	mmap_write_unlock(mm);
> @@ -1624,9 +1631,17 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  	BUG_ON(!found);
>  
>  	vma_iter_set(&vmi, start);
> -	prev = vma_prev(&vmi);
> +	vma = vma_find(&vmi, end);
> +	if (!vma)
> +		goto out_unlock;
> +
> +	if (vma->vm_start < start)
> +		prev = vma;
> +	else
> +		prev = vma_prev(&vmi);
> +
>  	ret = 0;
> -	for_each_vma_range(vmi, vma, end) {
> +	do {
>  		cond_resched();
>  
>  		BUG_ON(!vma_can_userfault(vma, vma->vm_flags));
> @@ -1692,7 +1707,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  	skip:
>  		prev = vma;
>  		start = vma->vm_end;
> -	}
> +	} for_each_vma_range(vmi, vma, end);
>  
>  out_unlock:
>  	mmap_write_unlock(mm);
> -- 
> 2.39.1
> 
> From bf61f3c937e9e2ab96ab2bed0005cb7dc74db231 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@...hat.com>
> Date: Tue, 16 May 2023 09:39:38 -0400
> Subject: [PATCH 2/2] mm/uffd: Allow vma to merge as much as possible
> 
> We used to not pass in the pgoff correctly when register/unregister uffd
> regions, it caused incorrect behavior on vma merging.
> 
> For example, when we have:
> 
>   vma1(range 0-9, with uffd), vma2(range 10-19, no uffd)
> 
> Then someone unregisters uffd on range (5-9), it should become:
> 
>   vma1(range 0-4, with uffd), vma2(range 5-19, no uffd)
> 
> But with current code it's:
> 
>   vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd)
> 
> This patch allows such merge to happen correctly.
> 
> This behavior seems to have existed since the 1st day of uffd, keep it just
> as a performance optmization and not copy stable.
> 
> Cc: Andrea Arcangeli <aarcange@...hat.com>
> Cc: Mike Rapoport (IBM) <rppt@...nel.org>
> Signed-off-by: Peter Xu <peterx@...hat.com>
> ---
>  fs/userfaultfd.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 7eb88bc74d00..891048b4799f 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1332,6 +1332,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>  	bool basic_ioctls;
>  	unsigned long start, end, vma_end;
>  	struct vma_iterator vmi;
> +	pgoff_t pgoff;
>  
>  	user_uffdio_register = (struct uffdio_register __user *) arg;
>  
> @@ -1489,8 +1490,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>  		vma_end = min(end, vma->vm_end);
>  
>  		new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
> +		pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);

I don't think this is safe.  You are telling vma_merge() something that
is not true and will result in can_vma_merge_before() passing.  I mean,
sure it will become true after you split (unless you can't?), but I
don't know if you can just merge a VMA that doesn't pass
can_vma_merge_before(), even for a short period?

>  		prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
> -				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
> +				 vma->anon_vma, vma->vm_file, pgoff,
>  				 vma_policy(vma),
>  				 ((struct vm_userfaultfd_ctx){ ctx }),
>  				 anon_vma_name(vma));
> @@ -1570,6 +1572,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  	unsigned long start, end, vma_end;
>  	const void __user *buf = (void __user *)arg;
>  	struct vma_iterator vmi;
> +	pgoff_t pgoff;
>  
>  	ret = -EFAULT;
>  	if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
> @@ -1677,8 +1680,9 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  			uffd_wp_range(vma, start, vma_end - start, false);
>  
>  		new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
> +		pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>  		prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
> -				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
> +				 vma->anon_vma, vma->vm_file, pgoff,
>  				 vma_policy(vma),
>  				 NULL_VM_UFFD_CTX, anon_vma_name(vma));
>  		if (prev) {
> -- 
> 2.39.1
> ---8<---
> 
> -- 
> Peter Xu
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ