lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3548ca67-ce58-3bc6-fef5-348b98d7678b@google.com>
Date:   Fri, 1 Sep 2023 00:29:07 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
cc:     Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Bagas Sanjaya <bagasdotme@...il.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        regressions@...ts.linux.dev
Subject: Re: 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b
 system stopped booting

On Fri, 1 Sep 2023, Mikhail Gavrilov wrote:

> Hi,
> next release cycle, and another regression.
> Yesterday after another kernel update in Fedora Rawhide system stopped booting.

Many thanks for reporting, Mike: I'm sorry that it never showed up
while in linux-next, leaving you to be the one to hit it again.

> Today thanks to git bisect, I found out that this is a commit:
> 
> ❯ git bisect bad
> a349d72fd9efc87c8fd1d16d3164752d84a7275b is the first bad commit
> commit a349d72fd9efc87c8fd1d16d3164752d84a7275b
> Author: Hugh Dickins <hughd@...gle.com>
> Date:   Tue Jul 11 21:30:40 2023 -0700
> 
>     mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s
...
>     Before putting them to use (several commits later), add rcu_read_lock() to
>     pte_offset_map(), and rcu_read_unlock() to pte_unmap().  Make this a
>     separate commit, since it risks exposing imbalances: prior commits have
>     fixed all the known imbalances, but we may find some have been missed.

I assume that it is such an imbalance - somewhere omitting to
pte_unmap() after a pte_offset_map(); but I cannot see where.

> It looks like the hang happens so early that when booting into a
> working kernel and running "journalctl -b -1" I see in the console the
> log of the previous kernel which was booted before the problematic
> kernel.
> Therefore, I apologize that I can't provide the kernel logs.
> I can provides only photos when backtrace appears on my monitor:
> Here we waiting: https://ibb.co/5xmm0BH
> And then I see backtrace: https://ibb.co/TLLGFNP
> 
> Unfortunately I can't revert commit
> a349d72fd9efc87c8fd1d16d3164752d84a7275b for testing more fresh builds
> because of conflicts.
> 
> My hardware: https://linux-hardware.org/?probe=dd5735f315
> I also attached kernel build config and full bisect log.

Thanks for all the info, which has helped in several ways.  The only
thing I can do is to offer you a debug (and then keep running) patch -
suitable for the config you showed there, not for anyone else's config.

I've never used stackdepot before, but I've tried this out in good and
bad cases, and expect it to work for you, shedding light on where is
going wrong - machine should boot up fine, and in dmesg you'll find one
stacktrace between "WARNING: pte_map..." and "End of pte_map..." lines.

To apply on top of a349d72fd9ef ("mm/pgtable: add rcu_read_lock() and
rcu_read_unlock()s"), the bad end point of your bisection; but if you
prefer, I can provide a version to go on top of whatever later Linus
commit suits you.

Patch not for general consumption, just for Mike's debugging:
please report back the stacktrace shown - thanks!

Hugh

---
 include/linux/pgtable.h |  5 +----
 mm/memory.c             |  1 +
 mm/mremap.c             |  1 +
 mm/pgtable-generic.c    | 40 ++++++++++++++++++++++++++++++++++++++--
 4 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5134edcec668..131392f1c33e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -106,10 +106,7 @@ static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
 {
 	return pte_offset_kernel(pmd, address);
 }
-static inline void pte_unmap(pte_t *pte)
-{
-	rcu_read_unlock();
-}
+void pte_unmap(pte_t *pte);
 #endif
 
 /* Find an entry in the second-level page table.. */
diff --git a/mm/memory.c b/mm/memory.c
index 44d11812a88f..b1ee8ab51978 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1033,6 +1033,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		ret = -ENOMEM;
 		goto out;
 	}
+	pte_unmap(NULL);	/* avoid warning when knowingly nested */
 	src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl);
 	if (!src_pte) {
 		pte_unmap_unlock(dst_pte, dst_ptl);
diff --git a/mm/mremap.c b/mm/mremap.c
index 11e06e4ab33b..56d981add487 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -175,6 +175,7 @@ static int move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		err = -EAGAIN;
 		goto out;
 	}
+	pte_unmap(NULL);	/* avoid warning when knowingly nested */
 	new_pte = pte_offset_map_nolock(mm, new_pmd, new_addr, &new_ptl);
 	if (!new_pte) {
 		pte_unmap_unlock(old_pte, old_ptl);
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 400e5a045848..87cbdc73beda 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -232,11 +232,47 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 #endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+#include <linux/stacktrace.h>
+#include <linux/stackdepot.h>
+#include <linux/timekeeping.h>
+
+static depot_stack_handle_t depot_stack;
+
+static void pte_map(void)
+{
+	static bool done = false;
+	unsigned long entries[16];
+	unsigned int nr_entries;
+
+	/* rcu_read_lock(); */
+	if (raw_smp_processor_id() != 0 || done)
+		return;
+	if (depot_stack) {
+		pr_warn("WARNING: pte_map was not pte_unmapped:\n");
+		stack_depot_print(depot_stack);
+		pr_warn("End of pte_map warning.\n");
+		done = true;
+		return;
+	}
+	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
+	depot_stack = stack_depot_save(entries, nr_entries, GFP_NOWAIT);
+	if (ktime_get_seconds() > 1800)	/* give up after half an hour */
+		done = true;
+}
+
+void pte_unmap(pte_t *pte)
+{
+	/* rcu_read_unlock(); */
+	if (raw_smp_processor_id() != 0)
+		return;
+	depot_stack = 0;
+}
+
 pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 {
 	pmd_t pmdval;
 
-	rcu_read_lock();
+	pte_map();
 	pmdval = pmdp_get_lockless(pmd);
 	if (pmdvalp)
 		*pmdvalp = pmdval;
@@ -250,7 +286,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 	}
 	return __pte_map(&pmdval, addr);
 nomap:
-	rcu_read_unlock();
+	pte_unmap(NULL);
 	return NULL;
 }
 
-- 
2.35.3

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ