Message-ID: <cf72fda8-5fc5-437b-a290-6a2c883e6adf@lucifer.local>
Date: Wed, 2 Oct 2024 19:28:06 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Bert Karwatzki <spasswolf@....de>
Cc: "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8 14/21] mm/mmap: Avoid zeroing vma tree in mmap_region()

On Wed, Oct 02, 2024 at 06:19:18PM GMT, Lorenzo Stoakes wrote:

[snip]

>
> Current status - I literally cannot repro this, even doing exactly what
> you're doing, so I wonder whether your exact GPU or the file system you're
> using is a factor here and something is implementing a custom .mmap
> callback or vm_ops->close() that is somehow interfacing with this, or
> whether this being a file-backed mapping is somehow a factor.
>
> Recreating the scenario as best I can with anon mappings, it seems that
> immediately before the issue triggers we are in the position shown on the
> left for the range containing the problematic node, and immediately after
> we are in the position on the right (plus an invalid entry/pivot for
> 0x68000000).
>
> The final action that triggers the problem is mapping [0x1b90000, 0x1bae000)
> PROT_NONE, MAP_NORESERVE, which merges with A and D while B and C are unmapped:
>
> 01740000-017c0000 ---p 00000000 00:00 0       01740000-017c0000 ---p 00000000 00:00 0
> 017c0000-01b40000 rw-p 00000000 00:00 0	      017c0000-01b40000 rw-p 00000000 00:00 0
> 01b40000-01b50000 ---p 00000000 00:00 0	      01b40000-01b50000 ---p 00000000 00:00 0
> 01b50000-01b56000 rw-p 00000000 00:00 0	      01b50000-01b56000 rw-p 00000000 00:00 0
> 01b56000-01b60000 ---p 00000000 00:00 0	      01b56000-01b60000 ---p 00000000 00:00 0
> 01b60000-01b70000 ---p 00000000 00:00 0	      01b60000-01b70000 ---p 00000000 00:00 0
> 01b70000-01b80000 ---p 00000000 00:00 0	      01b70000-01b80000 ---p 00000000 00:00 0
> 01b80000-01b86000 rw-p 00000000 00:00 0	      01b80000-01b86000 rw-p 00000000 00:00 0
> 01b86000-01b90000 ---p 00000000 00:00 0 * A   01b86000-68000000 ---p 00000000 00:00 0
> 01b90000-01b91000 rwxp 00000000 00:00 0 * B   < invalid 0x68000000 entry/pivot >
> 01b91000-01bae000 rw-p 00000000 00:00 0 * C
> 01bae000-68000000 ---p 00000000 00:00 0 * D
>
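(For reference, the sort of thing I've been doing to recreate that layout with
anon mappings is roughly the sketch below - the map_at() helper, the addresses
and the flags are illustrative only, not the exact repro:)

/*
 * Illustrative only: lay out anon mappings A-D as in the left-hand column
 * above, then do the final PROT_NONE mapping over B and C so it merges with
 * A and D. Addresses are hypothetical and assume MAP_FIXED placement at
 * them succeeds; the real case is file-backed, so this is only approximate.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

static void *map_at(unsigned long addr, unsigned long len, int prot)
{
	void *p = mmap((void *)addr, len, prot,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_NORESERVE,
		       -1, 0);

	if (p == MAP_FAILED)
		perror("mmap");
	return p;
}

int main(void)
{
	map_at(0x1b86000, 0x1b90000 - 0x1b86000, PROT_NONE);		/* A */
	map_at(0x1b90000, 0x1b91000 - 0x1b90000,
	       PROT_READ | PROT_WRITE | PROT_EXEC);			/* B */
	map_at(0x1b91000, 0x1bae000 - 0x1b91000,
	       PROT_READ | PROT_WRITE);					/* C */
	map_at(0x1bae000, 0x68000000 - 0x1bae000, PROT_NONE);		/* D */

	/* The final action: PROT_NONE over B and C, merging with A and D. */
	map_at(0x1b90000, 0x1bae000 - 0x1b90000, PROT_NONE);

	return 0;
}
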
> It seems, based on some of the VMA flags, that we _must_ be mapping files
> here, e.g. some have VM_EXEC and others are missing VM_MAYREAD, which
> indicates a read-only file mapping. Probably, given the low addresses, we
> are setting up a binary's set of mappings or something like that? That
> would align with the PROT_NONE mappings too.
>
> This really makes me think, combined with the fact that I really _cannot_
> repro this (on Intel GPU hardware and an ext4 file system), that there are
> some 'special' mappings going on here.
>
> The fact we're unmapping 2 VMAs and then removing a final one in a merge does
> suggest something is going wrong in the interaction between these two events.
>
> I wonder if the merge logic is possibly struggling with the (detached but
> present) VMAs still being there as we try to expand an existing VMA?
>
> Though my money's on a call_mmap() or .close() call doing something weird here.
>
> Investigation carries on...

Hey Bert - sorry to be a pain, but try as I might I cannot repro this.

I've attached a quite thorough, hacky printk patch below. It's going to
generate a ton of noise, so I really think this one has to be a link to an
off-list dmesg or we're going to break lei again - sorry Andrew.

If you could repro with this patch applied, plus the usual debug config
settings, and send the output back, I'd appreciate it!

This should hopefully eke out a little more information to help figure
things out.

Also, if you could share your .config, the output of ulimit -a, and
/proc/sys/vm/max_map_count, that'd be great too, thanks!

Again, much much appreciated.

Cheers, Lorenzo

----8<----
From d85fb5d2fd096e84681bdb6da8b5d37f0464ff84 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Date: Wed, 2 Oct 2024 09:19:28 +0100
Subject: [PATCH] hack: mm: see if we can get some more information

Add some dreadful printk() hacks so we can try to get some more information
on what's going on.
---
 mm/internal.h | 15 ++++++++++++
 mm/mmap.c     | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 mm/vma.c      | 34 ++++++++++++++++++++++++--
 3 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 93083bbeeefa..cd9414b4651d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1443,4 +1443,19 @@ static inline void accept_page(struct page *page)
 }
 #endif /* CONFIG_UNACCEPTED_MEMORY */

+static inline bool check_interesting(unsigned long start, unsigned long end)
+{
+	const unsigned long interesting_start = 0x1740000;
+	/* Include off-by-one on purpose. */
+	const unsigned long interesting_end = 0x68000000 + 1;
+
+	/*  interesting_start            interesting_end
+	 *          |--------------------------|
+	 *           ============================> end
+	 *        <=============================   start
+	 */
+	return end > interesting_start && /* after or overlaps... */
+		start < interesting_end;  /* ...overlaps. */
+}
+
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/mmap.c b/mm/mmap.c
index dd4b35a25aeb..8a1d5c0da86f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1341,6 +1341,18 @@ struct vm_area_struct *expand_stack(struct mm_struct *mm, unsigned long addr)
 	return vma;
 }

+static void ljs_dump(struct mm_struct *mm,
+		     unsigned long addr, unsigned long len,
+		     vm_flags_t vm_flags, bool is_unmap)
+{
+	if (!check_interesting(addr, addr + len))
+		return;
+
+	pr_err("LJS: %s mm=%p [0x%lx, 0x%lx) [vm_flags=%lu]\n",
+	       is_unmap ? "munmap" : "mmap", mm, addr, addr + len,
+		vm_flags);
+}
+
 /* do_munmap() - Wrapper function for non-maple tree aware do_munmap() calls.
  * @mm: The mm_struct
  * @start: The start address to munmap
@@ -1354,6 +1366,8 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 {
 	VMA_ITERATOR(vmi, mm, start);

+	ljs_dump(mm, start, len, 0, true);
+
 	return do_vmi_munmap(&vmi, mm, start, len, uf, false);
 }

@@ -1375,6 +1389,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	VMA_ITERATOR(vmi, mm, addr);
 	VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff);

+	ljs_dump(mm, addr, len, vm_flags, false);
+
 	vmg.file = file;
 	/* Find the first overlapping VMA */
 	vma = vma_find(&vmi, end);
@@ -1390,6 +1406,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr,

 		vmg.next = vms.next;
 		vmg.prev = vms.prev;
+
+		if (check_interesting(addr, addr + len))
+			pr_err("LJS: prev=[%lx, %lx), next=[%lx, %lx)\n",
+			       vmg.prev ? vmg.prev->vm_start : 0, vmg.prev ? vmg.prev->vm_end : 0,
+			       vmg.next ? vmg.next->vm_start : 0, vmg.next ? vmg.next->vm_end : 0);
+
 		vma = NULL;
 	} else {
 		vmg.next = vma_iter_next_rewind(&vmi, &vmg.prev);
@@ -1413,9 +1435,29 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		vmg.flags = vm_flags;
 	}

+	if (check_interesting(addr, addr + len)) {
+		char *special = vm_flags & VM_SPECIAL ? "special" : "";
+		char *has_file = file ? "file-backed" : "";
+
+		pr_err("LJS: Interesting [%lx, %lx) flags=%lu, special=[%s] file=[%s]\n",
+		       addr, addr+len, vm_flags, special, has_file);
+	}
+
 	vma = vma_merge_new_range(&vmg);
-	if (vma)
+	if (vma) {
+		if (check_interesting(addr, addr + len)) {
+			pr_err("LJS: Merged to [%lx, %lx), addr=%lx, end=%lx\n",
+			       vma->vm_start, vma->vm_end, vma_iter_addr(&vmi),
+			       vma_iter_end(&vmi));
+
+			mt_validate(&mm->mm_mt);
+		}
+
 		goto expanded;
+	} else if (check_interesting(addr, addr + len)) {
+		pr_err("LJS: Failed to merge [%lx, %lx), reset...\n",
+		       addr, addr + len);
+	}
 	/*
 	 * Determine the object being mapped and call the appropriate
 	 * specific mapper. the address has already been validated, but
@@ -1441,6 +1483,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		if (error)
 			goto unmap_and_free_vma;

+		if (check_interesting(addr, addr + len)) {
+			pr_err("LJS: call_mmap() on [%lx, %lx) old_flags=%lu new_flags=%lu new range=[%lx, %lx)\n",
+			       addr, addr + len, vm_flags, vma->vm_flags, vma->vm_start, vma->vm_end);
+		}
+
 		if (vma_is_shared_maywrite(vma)) {
 			error = mapping_map_writable(file->f_mapping);
 			if (error)
@@ -1467,6 +1514,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 			/* If this fails, state is reset ready for a reattempt. */
 			merge = vma_merge_new_range(&vmg);

+			if (check_interesting(addr, addr + len))
+				pr_err("LJS: flags changed for [%lx, %lx) from %lu to %lu %s\n",
+				       vma->vm_start, vma->vm_end, vm_flags, vma->vm_flags,
+				       merge ? "merged" : "");
+
 			if (merge) {
 				/*
 				 * ->mmap() can change vma->vm_file and fput
@@ -1510,10 +1562,18 @@ unsigned long mmap_region(struct file *file, unsigned long addr,

 	/* Lock the VMA since it is modified after insertion into VMA tree */
 	vma_start_write(vma);
+
+	if (check_interesting(addr, addr + len))
+		pr_err("LJS: mm=%p: iter store addr=%lx, end=%lx, vma=[%lx, %lx)\n",
+		       mm, vma_iter_addr(&vmi), vma_iter_end(&vmi), vma->vm_start, vma->vm_end);
+
 	vma_iter_store(&vmi, vma);
 	mm->map_count++;
 	vma_link_file(vma);

+	if (check_interesting(addr, addr + len))
+		mt_validate(&mm->mm_mt);
+
 	/*
 	 * vma_merge_new_range() calls khugepaged_enter_vma() too, the below
 	 * call covers the non-merge case.
@@ -1530,6 +1590,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	perf_event_mmap(vma);

 	/* Unmap any existing mapping in the area */
+
+	if (check_interesting(addr, addr + len))
+		mt_validate(&mm->mm_mt);
+
 	vms_complete_munmap_vmas(&vms, &mas_detach);

 	vm_stat_account(mm, vm_flags, pglen);
diff --git a/mm/vma.c b/mm/vma.c
index 4737afcb064c..33f80e82704b 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1108,8 +1108,13 @@ void vms_clean_up_area(struct vma_munmap_struct *vms,
 	vms_clear_ptes(vms, mas_detach, true);
 	mas_set(mas_detach, 0);
 	mas_for_each(mas_detach, vma, ULONG_MAX)
-		if (vma->vm_ops && vma->vm_ops->close)
+		if (vma->vm_ops && vma->vm_ops->close) {
+			if (check_interesting(vma->vm_start, vma->vm_end))
+				pr_err("LJS: mm=%p Closing [%lx, %lx)\n",
+				       vma->vm_mm, vma->vm_start, vma->vm_end);
+
 			vma->vm_ops->close(vma);
+		}
 	vms->closed_vm_ops = true;
 }

@@ -1179,6 +1184,10 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
 	struct vm_area_struct *next = NULL;
 	int error;

+	if (check_interesting(vms->vma->vm_start, vms->vma->vm_end))
+		pr_err("LJS2 vms->start=%lx, vms->vma->vm_start=%lx\n",
+		       vms->start, vms->vma->vm_start);
+
 	/*
 	 * If we need to split any vma, do it now to save pain later.
 	 * Does it split the first one?
@@ -1202,6 +1211,11 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
 			goto start_split_failed;
 		}

+		if (check_interesting(vms->vma->vm_start, vms->vma->vm_end))
+			pr_err("LJS: mm=%p vms=[%lx, %lx) split START of [%lx, %lx)\n",
+			       vms->vma->vm_mm, vms->start, vms->end,
+			       vms->vma->vm_start, vms->vma->vm_end);
+
 		error = __split_vma(vms->vmi, vms->vma, vms->start, 1);
 		if (error)
 			goto start_split_failed;
@@ -1217,12 +1231,23 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
 	for_each_vma_range(*(vms->vmi), next, vms->end) {
 		long nrpages;

+		if (check_interesting(vms->vma->vm_start, vms->vma->vm_end))
+			pr_err("LJS: mm=%p vms=[%lx, %lx) UNMAP [%lx, %lx)\n",
+			       vms->vma->vm_mm, vms->start, vms->end,
+			       next->vm_start, next->vm_end);
+
 		if (!can_modify_vma(next)) {
 			error = -EPERM;
 			goto modify_vma_failed;
 		}
 		/* Does it split the end? */
 		if (next->vm_end > vms->end) {
+
+			if (check_interesting(next->vm_start, next->vm_end))
+				pr_err("LJS: mm=%p vms=[%lx, %lx) split END of [%lx, %lx)\n",
+				       next->vm_mm, vms->start, vms->end,
+				       next->vm_start, next->vm_end);
+
 			error = __split_vma(vms->vmi, next, vms->end, 0);
 			if (error)
 				goto end_split_failed;
@@ -1295,9 +1320,14 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
 	}
 #endif

-	while (vma_iter_addr(vms->vmi) > vms->start)
+	while (vma_iter_addr(vms->vmi) > vms->start) {
 		vma_iter_prev_range(vms->vmi);

+		if (check_interesting(vms->vma->vm_start, vms->vma->vm_end))
+			pr_err("LJS3: addr=%lx, vms->start=%lx\n",
+			       vma_iter_addr(vms->vmi), vms->start);
+	}
+
 	vms->clear_ptes = true;
 	return 0;

--
2.46.2
