[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cf72fda8-5fc5-437b-a290-6a2c883e6adf@lucifer.local>
Date: Wed, 2 Oct 2024 19:28:06 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Bert Karwatzki <spasswolf@....de>
Cc: "Liam R . Howlett" <Liam.Howlett@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8 14/21] mm/mmap: Avoid zeroing vma tree in mmap_region()
On Wed, Oct 02, 2024 at 06:19:18PM GMT, Lorenzo Stoakes wrote:
[snip]
>
> Current status - I litearlly cannot repro this even doing exactly what you're
> doing, so I wonder if your exact GPU or a file system you're using or something
> is a factor here and there's something which implements a custom .mmap callback
> or vm_ops->close() that is somehow interfacing with this, or if this being a
> file thing is somehow a factor.
>
> Recreating the scenario as best I can with anon mappings, it seems immediately
> before it triggers we are in the position on the left in the range with the
> problematic node, and then immediately after we are in the right (plus an
> invalid entry/pivot for 0x68000000).
>
> The final action that triggers the problem is mapping [0x1b90000, 0x1bae000)
> PROT_NONE, MAP_RESERVE which merges with A and D, and we unmap B and C:
>
> 01740000-017c0000 ---p 00000000 00:00 0 01740000-017c0000 ---p 00000000 00:00 0
> 017c0000-01b40000 rw-p 00000000 00:00 0 017c0000-01b40000 rw-p 00000000 00:00 0
> 01b40000-01b50000 ---p 00000000 00:00 0 01b40000-01b50000 ---p 00000000 00:00 0
> 01b50000-01b56000 rw-p 00000000 00:00 0 01b50000-01b56000 rw-p 00000000 00:00 0
> 01b56000-01b60000 ---p 00000000 00:00 0 01b56000-01b60000 ---p 00000000 00:00 0
> 01b60000-01b70000 ---p 00000000 00:00 0 01b60000-01b70000 ---p 00000000 00:00 0
> 01b70000-01b80000 ---p 00000000 00:00 0 01b70000-01b80000 ---p 00000000 00:00 0
> 01b80000-01b86000 rw-p 00000000 00:00 0 01b80000-01b86000 rw-p 00000000 00:00 0
> 01b86000-01b90000 ---p 00000000 00:00 0 * A 01b86000-68000000 ---p 00000000 00:00 0
> 01b90000-01b91000 rwxp 00000000 00:00 0 * B < invalid 0x68000000 entry/pivot >
> 01b91000-01bae000 rw-p 00000000 00:00 0 * C
> 01bae000-68000000 ---p 00000000 00:00 0 * D
>
> It seems based on some of the VMA flags that we _must_ be mapping files here,
> e.g. some have VM_EXEC and others are mising VM_MAYREAD which indicates a
> read-only file mapping. Probably given low addresses we are setting up a binary
> set of mappings or such? Would align with PROT_NONE mappings also.
>
> This really makes me think, combined with the fact I really _cannot_ repro this
> (on intel GPU hardware and ext4 file system) that there are some 'special'
> mappings going on here.
>
> The fact we're unmapping 2 VMAs and then removing a final one in a merge does
> suggest something is going wrong in the interaction between these two events.
>
> I wonder if the merge logic is possibly struggling with the (detached but
> present) VMAs still being there as we try to expand an existing VMA?
>
> Though my money's on a call_mmap() or .close() call doing something weird here.
>
> Investigation carries on...
Hey Bert - sorry to be a pain, but try as I might I cannot repro this.
I've attached a quite thorough hacky printk patch here, it's going to
generate a ton of noise, so I really think this one has to be a link to an
off-list dmesg or we're going to break lei again, sorry Andrew.
If you could repro with this patch applied + the usual debug config
settings and send it back I'd appreciate it!
This should hopefully eek out a little more information to help figure
things out.
Also if you could share your .config, ulimit -a and
/proc/sys/vm/max_map_count that'd be great too, thanks!
Again, much much appreciated.
Cheers, Lorenzo
----8<----
>From d85fb5d2fd096e84681bdb6da8b5d37f0464ff84 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Date: Wed, 2 Oct 2024 09:19:28 +0100
Subject: [PATCH] hack: mm: see if we can get some more information
Add some dreadful printk() hacks so we can try to get some more information
on what's going on.
---
mm/internal.h | 15 ++++++++++++
mm/mmap.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++-
mm/vma.c | 34 ++++++++++++++++++++++++--
3 files changed, 112 insertions(+), 3 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 93083bbeeefa..cd9414b4651d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1443,4 +1443,19 @@ static inline void accept_page(struct page *page)
}
#endif /* CONFIG_UNACCEPTED_MEMORY */
+static inline bool check_interesting(unsigned long start, unsigned long end)
+{
+ const unsigned long interesting_start = 0x1740000;
+ /* Include off-by-one on purpose.*/
+ const unsigned long interesting_end = 0x68000000 + 1;
+
+ /* interesting_start interesting_end
+ * |--------------------------|
+ * ============================> end
+ * <============================= start
+ */
+ return end > interesting_start && /* after or overlaps... */
+ start < interesting_end; /* ...overlaps. */
+}
+
#endif /* __MM_INTERNAL_H */
diff --git a/mm/mmap.c b/mm/mmap.c
index dd4b35a25aeb..8a1d5c0da86f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1341,6 +1341,18 @@ struct vm_area_struct *expand_stack(struct mm_struct *mm, unsigned long addr)
return vma;
}
+static void ljs_dump(struct mm_struct *mm,
+ unsigned long addr, unsigned long len,
+ vm_flags_t vm_flags, bool is_unmap)
+{
+ if (!check_interesting(addr, addr + len))
+ return;
+
+ pr_err("LJS: %s mm=%p [0x%lx, 0x%lx) [vm_flags=%lu]\n",
+ is_unmap ? "munmap" : "mmap", mm, addr, addr + len,
+ vm_flags);
+}
+
/* do_munmap() - Wrapper function for non-maple tree aware do_munmap() calls.
* @mm: The mm_struct
* @start: The start address to munmap
@@ -1354,6 +1366,8 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
{
VMA_ITERATOR(vmi, mm, start);
+ ljs_dump(mm, start, len, 0, true);
+
return do_vmi_munmap(&vmi, mm, start, len, uf, false);
}
@@ -1375,6 +1389,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
VMA_ITERATOR(vmi, mm, addr);
VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff);
+ ljs_dump(mm, addr, len, vm_flags, false);
+
vmg.file = file;
/* Find the first overlapping VMA */
vma = vma_find(&vmi, end);
@@ -1390,6 +1406,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vmg.next = vms.next;
vmg.prev = vms.prev;
+
+ if (check_interesting(addr, addr + len))
+ pr_err("LJS: prev=[%lx, %lx), next=[%lx, %lx)\n",
+ vmg.prev ? vmg.prev->vm_start : 0, vmg.prev ? vmg.prev->vm_end : 0,
+ vmg.next ? vmg.next->vm_start : 0, vmg.next ? vmg.next->vm_end : 0);
+
vma = NULL;
} else {
vmg.next = vma_iter_next_rewind(&vmi, &vmg.prev);
@@ -1413,9 +1435,29 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vmg.flags = vm_flags;
}
+ if (check_interesting(addr, addr + len)) {
+ char *special = vm_flags & VM_SPECIAL ? "special" : "";
+ char *has_file = file ? "file-backed" : "";
+
+ pr_err("LJS: Interesting [%lx, %lx) flags=%lu, special=[%s] file=[%s]\n",
+ addr, addr+len, vm_flags, special, has_file);
+ }
+
vma = vma_merge_new_range(&vmg);
- if (vma)
+ if (vma) {
+ if (check_interesting(addr, addr + len)) {
+ pr_err("LJS: Merged to [%lx, %lx), addr=%lx, end=%lx\n",
+ vma->vm_start, vma->vm_end, vma_iter_addr(&vmi),
+ vma_iter_end(&vmi));
+
+ mt_validate(&mm->mm_mt);
+ }
+
goto expanded;
+ } else if (check_interesting(addr, addr + len)) {
+ pr_err("LJS: Failed to merge [%lx, %lx), reset...\n",
+ addr, addr + len);
+ }
/*
* Determine the object being mapped and call the appropriate
* specific mapper. the address has already been validated, but
@@ -1441,6 +1483,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
if (error)
goto unmap_and_free_vma;
+ if (check_interesting(addr, addr + len)) {
+ pr_err("LJS: call_mmap() on [%lx, %lx) old_flags=%lu new_flags=%lu new range=[%lx, %lx)\n",
+ addr, addr + end, vm_flags, vma->vm_flags, vma->vm_start, vma->vm_end);
+ }
+
if (vma_is_shared_maywrite(vma)) {
error = mapping_map_writable(file->f_mapping);
if (error)
@@ -1467,6 +1514,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* If this fails, state is reset ready for a reattempt. */
merge = vma_merge_new_range(&vmg);
+ if (check_interesting(addr, addr + len))
+ pr_err("LJS: flags changed for [%lx, %lx) from %lu to %lu %s",
+ vma->vm_start, vma->vm_end, vm_flags, vma->vm_flags,
+ merge ? "merged" : "");
+
if (merge) {
/*
* ->mmap() can change vma->vm_file and fput
@@ -1510,10 +1562,18 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Lock the VMA since it is modified after insertion into VMA tree */
vma_start_write(vma);
+
+ if (check_interesting(addr, addr + len))
+ pr_err("LJS: mm=%p: iter store addr=%lx, end=%lx, vma=[%lx, %lx)\n",
+ mm, vma_iter_addr(&vmi), vma_iter_end(&vmi), vma->vm_start, vma->vm_end);
+
vma_iter_store(&vmi, vma);
mm->map_count++;
vma_link_file(vma);
+ if (check_interesting(addr, addr + len))
+ mt_validate(&mm->mm_mt);
+
/*
* vma_merge_new_range() calls khugepaged_enter_vma() too, the below
* call covers the non-merge case.
@@ -1530,6 +1590,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
perf_event_mmap(vma);
/* Unmap any existing mapping in the area */
+
+ if (check_interesting(addr, addr + len))
+ mt_validate(&mm->mm_mt);
+
vms_complete_munmap_vmas(&vms, &mas_detach);
vm_stat_account(mm, vm_flags, pglen);
diff --git a/mm/vma.c b/mm/vma.c
index 4737afcb064c..33f80e82704b 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -1108,8 +1108,13 @@ void vms_clean_up_area(struct vma_munmap_struct *vms,
vms_clear_ptes(vms, mas_detach, true);
mas_set(mas_detach, 0);
mas_for_each(mas_detach, vma, ULONG_MAX)
- if (vma->vm_ops && vma->vm_ops->close)
+ if (vma->vm_ops && vma->vm_ops->close) {
+ if (check_interesting(vma->vm_start, vma->vm_end))
+ pr_err("LJS: mm=%p Closing [%lx, %lx)\n",
+ vma->vm_mm, vma->vm_start, vma->vm_end);
+
vma->vm_ops->close(vma);
+ }
vms->closed_vm_ops = true;
}
@@ -1179,6 +1184,10 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
struct vm_area_struct *next = NULL;
int error;
+ if (check_interesting(vms->vma->vm_start, vms->vma->vm_end))
+ pr_err("LJS2 vms->start=%lx, vms->vma->vm_start=%lx\n",
+ vms->start, vms->vma->vm_start);
+
/*
* If we need to split any vma, do it now to save pain later.
* Does it split the first one?
@@ -1202,6 +1211,11 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
goto start_split_failed;
}
+ if (check_interesting(vms->vma->vm_start, vms->vma->vm_end))
+ pr_err("LJS: mm=%p vms=[%lx, %lx) split START of [%lx, %lx)\n",
+ vms->vma->vm_mm, vms->start, vms->end,
+ vms->vma->vm_start, vms->vma->vm_end);
+
error = __split_vma(vms->vmi, vms->vma, vms->start, 1);
if (error)
goto start_split_failed;
@@ -1217,12 +1231,23 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
for_each_vma_range(*(vms->vmi), next, vms->end) {
long nrpages;
+ if (check_interesting(vms->vma->vm_start, vms->vma->vm_end))
+ pr_err("LJS: mm=%p vms=[%lx, %lx) UNMAP [%lx, %lx)\n",
+ vms->vma->vm_mm, vms->start, vms->end,
+ next->vm_start, next->vm_end);
+
if (!can_modify_vma(next)) {
error = -EPERM;
goto modify_vma_failed;
}
/* Does it split the end? */
if (next->vm_end > vms->end) {
+
+ if (check_interesting(next->vm_start, next->vm_end))
+ pr_err("LJS: mm=%p vms=[%lx, %lx) split END of [%lx, %lx)\n",
+ next->vm_mm, vms->start, vms->end,
+ next->vm_start, next->vm_end);
+
error = __split_vma(vms->vmi, next, vms->end, 0);
if (error)
goto end_split_failed;
@@ -1295,9 +1320,14 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
}
#endif
- while (vma_iter_addr(vms->vmi) > vms->start)
+ while (vma_iter_addr(vms->vmi) > vms->start) {
vma_iter_prev_range(vms->vmi);
+ if (check_interesting(vms->vma->vm_start, vms->vma->vm_end))
+ pr_err("LJS3: addr=%lx, vms->start=%lx\n",
+ vma_iter_addr(vms->vmi), vms->start);
+ }
+
vms->clear_ptes = true;
return 0;
--
2.46.2
Powered by blists - more mailing lists