Message-ID: <20241218100601.GI12500@noisy.programming.kicks-ass.net>
Date: Wed, 18 Dec 2024 11:06:01 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Suren Baghdasaryan <surenb@...gle.com>
Cc: akpm@...ux-foundation.org, willy@...radead.org, liam.howlett@...cle.com,
	lorenzo.stoakes@...cle.com, mhocko@...e.com, vbabka@...e.cz,
	hannes@...xchg.org, mjguzik@...il.com, oliver.sang@...el.com,
	mgorman@...hsingularity.net, david@...hat.com, peterx@...hat.com,
	oleg@...hat.com, dave@...olabs.net, paulmck@...nel.org,
	brauner@...nel.org, dhowells@...hat.com, hdanton@...a.com,
	hughd@...gle.com, lokeshgidra@...gle.com, minchan@...gle.com,
	jannh@...gle.com, shakeel.butt@...ux.dev, souravpanda@...gle.com,
	pasha.tatashin@...een.com, klarasmodin@...il.com, corbet@....net,
	linux-doc@...r.kernel.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, kernel-team@...roid.com
Subject: Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a
 reference count

On Wed, Dec 18, 2024 at 10:41:04AM +0100, Peter Zijlstra wrote:
> On Tue, Dec 17, 2024 at 08:27:46AM -0800, Suren Baghdasaryan wrote:
> 
> > > So I just replied there, and no, I don't think it makes sense. Just put
> > > the kmem_cache_free() in vma_refcount_put(), to be done on 0.
> > 
> > That's very appealing indeed and makes things much simpler. The
> > problem I see with that is the case when we detach a vma from the tree
> > to isolate it, then do some cleanup and only then free it. That's done
> > in vms_gather_munmap_vmas() here:
> > https://elixir.bootlin.com/linux/v6.12.5/source/mm/vma.c#L1240 and we
> > even might reattach detached vmas back:
> > https://elixir.bootlin.com/linux/v6.12.5/source/mm/vma.c#L1312. IOW,
> > detached state is not final and we can't destroy the object that
> > reached this state. 
> 
> Urgh, so that's the munmap() path, but arguably when that fails, the
> map stays in place.
> 
> I think this means you're marking detached too soon; you should only
> mark detached once you reach the point of no return.
> 
> That said, once you've reached the point of no return; and are about to
> go remove the page-tables, you very much want to ensure a lack of
> concurrency.
> 
> So perhaps waiting for outstanding readers at this point isn't crazy.
> 
> Also, I'm having a very hard time reading this maple tree stuff :/
> Afaict vms_gather_munmap_vmas() only adds the VMAs to be removed to a
> second tree, it does not in fact unlink them from the mm yet.
> 
> AFAICT it's vma_iter_clear_gfp() that actually wipes the vmas from the
> mm -- and that being able to fail is mind-boggling and I suppose is what
> gives rise to much of this insanity :/
> 
> Anyway, I would expect remove_vma() to be the one that marks it detached
> (it's already unreachable through vma_lookup() at this point) and there
> you should wait for concurrent readers to bugger off.
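
The refcount idea discussed above can be modeled in userspace with C11 atomics. In this model the put that drops the last reference frees the object, and the object is marked detached only once it is unreachable and past the point of no return, waiting there for outstanding readers. This is a minimal sketch under stated assumptions, not the kernel implementation. `struct obj`, `obj_get()`, `obj_put()` and `obj_detach_and_wait()` are hypothetical names. The initial reference of 1 stands for "attached", and readers are assumed to already hold a stable pointer (the kernel would need something like RCU for that). Waiting is not required for memory safety here, because the last put frees. It only guarantees that no reader is still active when teardown continues.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct obj {
	atomic_int refcnt;	/* 1 = attached, +1 per active reader */
	int payload;
};

static atomic_int frees;	/* counts frees so the behaviour is observable */

static struct obj *obj_alloc_attached(int payload)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->refcnt, 1);	/* the reference held while attached */
	o->payload = payload;
	return o;
}

/* Reader side: assumes the caller already has a stable pointer. */
static void obj_get(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Whoever drops the last reference frees the object; no separate free path. */
static void obj_put(struct obj *o)
{
	if (atomic_fetch_sub_explicit(&o->refcnt, 1, memory_order_acq_rel) == 1) {
		free(o);
		atomic_fetch_add(&frees, 1);
	}
}

/*
 * Point of no return: the object is already unreachable, so no new readers
 * can appear. Wait for the outstanding ones, then drop the attached
 * reference, which frees the object.
 */
static void obj_detach_and_wait(struct obj *o)
{
	while (atomic_load_explicit(&o->refcnt, memory_order_acquire) > 1) {
		/* busy-wait; a real implementation would block instead */
	}
	obj_put(o);
}
```

Dropping the attached reference only in `obj_detach_and_wait()` corresponds to the suggestion that `remove_vma()`, rather than the gather step, should mark the VMA detached. A failed munmap then never has to bring a detached object back.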

Also, I think vma_start_write() in that gather loop is too early; you're
not actually going to change the VMA yet, with the obvious exception of
the split cases.

That too should probably come after you've passed all the fail/unwind
spots.
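
The ordering argued for above can be illustrated with a small two-phase model: do every step that can fail first, and only then write-lock and detach. It is a hypothetical sketch that ignores the split case, which does modify VMAs during gathering. `struct item`, `gather()`, `commit()` and `unmap_items()` are illustrative names, and the two flags only record that `vma_start_write()` and `vma_mark_detached()` would have been called. The fallible phase only builds a private list, so a failure leaves every item untouched and there is nothing to unwind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a VMA; the flags model the two calls having run. */
struct item {
	bool write_locked;	/* vma_start_write() */
	bool detached;		/* vma_mark_detached() */
};

/*
 * Phase 1 (fallible): copy the victims into a private list. No item is
 * modified here, so a failure needs no unwinding of item state.
 */
static int gather(struct item **items, size_t n, bool simulate_enomem,
		  struct item ***out)
{
	struct item **list;

	if (simulate_enomem)
		return -1;	/* stands in for -ENOMEM from the gather step */
	list = malloc(n * sizeof(*list));
	if (!list)
		return -1;
	for (size_t i = 0; i < n; i++)
		list[i] = items[i];
	*out = list;
	return 0;
}

/* Phase 2 (infallible): past the point of no return, lock and detach. */
static void commit(struct item **list, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		list[i]->write_locked = true;
		list[i]->detached = true;
	}
	free(list);
}

static int unmap_items(struct item **items, size_t n, bool simulate_enomem)
{
	struct item **list = NULL;

	if (gather(items, n, simulate_enomem, &list))
		return -1;	/* items untouched: nothing to reattach */
	/* Point of no return. */
	commit(list, n);
	return 0;
}
```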

Something like so perhaps? (yeah, I know, I wrecked a bunch)

diff --git a/mm/vma.c b/mm/vma.c
index 8e31b7e25aeb..45d43adcbb36 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1173,6 +1173,11 @@ static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms,
 	struct vm_area_struct *vma;
 	struct mm_struct *mm;
 
+	mas_set(mas_detach, 0);
+	mas_for_each(mas_detach, vma, ULONG_MAX) {
+		vma_start_write(vma);
+		vma_mark_detached(vma, true);
+	}
 	mm = current->mm;
 	mm->map_count -= vms->vma_count;
 	mm->locked_vm -= vms->locked_vm;
@@ -1219,9 +1224,6 @@ static void reattach_vmas(struct ma_state *mas_detach)
 	struct vm_area_struct *vma;
 
 	mas_set(mas_detach, 0);
-	mas_for_each(mas_detach, vma, ULONG_MAX)
-		vma_mark_detached(vma, false);
-
 	__mt_destroy(mas_detach->tree);
 }
 
@@ -1289,13 +1291,11 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
 			if (error)
 				goto end_split_failed;
 		}
-		vma_start_write(next);
 		mas_set(mas_detach, vms->vma_count++);
 		error = mas_store_gfp(mas_detach, next, GFP_KERNEL);
 		if (error)
 			goto munmap_gather_failed;
 
-		vma_mark_detached(next, true);
 		nrpages = vma_pages(next);
 
 		vms->nr_pages += nrpages;
@@ -1431,14 +1431,17 @@ int do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	struct vma_munmap_struct vms;
 	int error;
 
+	error = mas_preallocate(&vmi->mas, NULL, GFP_KERNEL);
+	if (error)
+		goto gather_failed;
+
 	init_vma_munmap(&vms, vmi, vma, start, end, uf, unlock);
 	error = vms_gather_munmap_vmas(&vms, &mas_detach);
 	if (error)
 		goto gather_failed;
 
 	error = vma_iter_clear_gfp(vmi, start, end, GFP_KERNEL);
-	if (error)
-		goto clear_tree_failed;
+	VM_WARN_ON(error);
 
 	/* Point of no return */
 	vms_complete_munmap_vmas(&vms, &mas_detach);

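The last hunk preallocates before the gather step, with the intent that the later clear can no longer fail. An error there then becomes a broken invariant (hence `VM_WARN_ON(error)`) rather than something to unwind. The reserve-then-consume pattern behind that can be sketched as follows. `struct reservation`, `reserve_node()`, `consume_node()` and `unmap_range()` are hypothetical names, and the single 64-byte node is an arbitrary stand-in for whatever maple tree nodes the store would actually need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: clearing the range needs one node, reserved up front. */
struct reservation {
	void *node;
};

static int reserve_node(struct reservation *r, bool simulate_enomem)
{
	r->node = simulate_enomem ? NULL : malloc(64);
	return r->node ? 0 : -1;
}

/* Cannot fail: it only consumes what reserve_node() set aside. */
static void *consume_node(struct reservation *r)
{
	void *node = r->node;

	assert(node != NULL);	/* a broken invariant, not a runtime error */
	r->node = NULL;
	return node;
}

static int unmap_range(bool simulate_enomem, int *state)
{
	struct reservation r;
	void *node;

	if (reserve_node(&r, simulate_enomem))
		return -1;	/* fail early: *state is untouched */
	/* ... fallible gather step, which would release r on error ... */

	/* Point of no return: no error path below. */
	node = consume_node(&r);
	*state = 1;		/* the range is now cleared */
	free(node);		/* a real tree would link the node in instead */
	return 0;
}
```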