Message-ID: <aJHmm9WXEJqVkyTD@hyeyoo>
Date: Tue, 5 Aug 2025 20:10:19 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: Naresh Kamboju <naresh.kamboju@...aro.org>
Cc: Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        linux-mm <linux-mm@...ck.org>,
        open list <linux-kernel@...r.kernel.org>, lkft-triage@...ts.linaro.org,
        Linux Regressions <regressions@...ts.linux.dev>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Anders Roxell <anders.roxell@...aro.org>,
        Arnd Bergmann <arnd@...db.de>,
        Dan Carpenter <dan.carpenter@...aro.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Paul E. McKenney" <paulmck@...nel.org>, Will Deacon <will@...nel.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Ben Copeland <benjamin.copeland@...aro.org>,
        kunit-dev@...glegroups.com
Subject: Re: next-20250729: PREEMPT_RT: rock Pi 4b Internal error Oops
 kmem_cache_alloc_bulk_noprof - kernel locking rtmutex.c at
 __rt_mutex_slowlock_locked

On Tue, Aug 05, 2025 at 03:37:38PM +0530, Naresh Kamboju wrote:
> On Mon, 4 Aug 2025 at 13:26, Harry Yoo <harry.yoo@...cle.com> wrote:
> >
> > On Sat, Aug 02, 2025 at 03:45:51PM +0530, Naresh Kamboju wrote:
> > > While validating Linux next on the Radxa Rock Pi 4B platform, we
> > > observed kernel crashes and deadlock warnings when running LTP
> > > syscall and controller tests under specific PREEMPT_RT configurations.
> > > These issues appear to be regressions introduced in next-20250729.
> > >
> > > * CONFIG_EXPERT=y
> > > * CONFIG_PREEMPT_RT=y
> > > * CONFIG_LAZY_PREEMPT=y
> > >
> > > Regression Analysis:
> > > - New regression? Yes
> > > - Reproducibility? Intermittent
> > >
> > > First seen on the next-20250729
> > > Good: next-20250728
> > > Bad: next-20250729 and next-20250801
> > >
> > > Test regression: next-20250729 rock Pi 4b Internal error Oops
> > > kmem_cache_alloc_bulk_noprof
> > > Test regression: next-20250729 rock Pi 4b WARNING kernel locking
> > > rtmutex.c at __rt_mutex_slowlock_locked
> > > Test regression: next-20250729 rock Pi 4b WARNING kernel rcu
> > > tree_plugin.h at rcu_note_context_switch
> > >
> > > Reported-by: Linux Kernel Functional Testing <lkft@...aro.org>
> >
> > Thanks for the report, Naresh!
> >
> > Based on the stack trace, I think there might be a use-after-free or
> > buffer overflow bug that could trigger this.
> >
> > Could you please try to reproduce it with KASAN enabled to confirm that
> > it is the case?
> 
> I have recompiled the kernel with KASAN enabled and rerun the KUNIT tests,
> along with the LTP syscall tests, in an effort to reproduce the previously
> reported issue.
> 
> While the LTP syscall tests did not reproduce the problem,

Thanks for checking it!
It is unfortunate that the error was not reproduced with KASAN :(

We can still try the slab_debug=FPU or slab_debug=FPUZ boot parameter.
If we're lucky, that may help narrow down who corrupted the freelist.
Could you please give it a try if it’s not too much trouble?
It won't require rebuilding the kernel, as SLUB_DEBUG is already enabled.

...and a few questions to help investigate it further:

- Is this triggered only with (Rock Pi 4B) AND (PREEMPT_RT=y)
  AND (LAZY_PREEMPT=y), and not on other boards or on the same board with
  a different preemption model?

- With the infrastructure you're using, would it be feasible to bisect
  this?

Unfortunately, if the freelist chain is already corrupted by the time we
allocate objects, it's hard to tell who corrupted it without further
information.

> I consistently
> observed a null pointer dereference during KUNIT testing, specifically in
> the kunit_fault test, as shown in the log below.
>
> I’ve seen this same crash across several kernel versions, and it is always
> reproducible when running KUNIT tests.
> 
> Could you please confirm if this behavior is expected from the
> kunit_fault test, or if it indicates an issue that requires further
> investigation?

I can confirm that this is expected behavior. The test case
deliberately dereferences a NULL pointer and checks that the task was
killed because of it. The test case was added recently (since v6.10).

Thanks for your assistance!

-- 
Cheers,
Harry / Hyeonggon

> ## Boot log
> [   69.507629]     KTAP version 1
> [   69.507638]     # Subtest: kunit_fault
> [   69.507651]     # module: kunit_test
> [   69.507677]     1..1
> [   69.508631] Unable to handle kernel paging request at virtual address dfff800000000000
> [   69.508661] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
> [   69.508676] Mem abort info:
> [   69.508684]   ESR = 0x0000000096000005
> [   69.508695]   EC = 0x25: DABT (current EL), IL = 32 bits
> [   69.508709]   SET = 0, FnV = 0
> [   69.508719]   EA = 0, S1PTW = 0
> [   69.508730]   FSC = 0x05: level 1 translation fault
> [   69.508742] Data abort info:
> [   69.508750]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> [   69.508761]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [   69.508774]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [   69.508787] [dfff800000000000] address between user and kernel address ranges
> [   69.508804] Internal error: Oops: 0000000096000005 [#1]  SMP
> [   69.508819] Modules linked in:
> [   69.508846] CPU: 3 UID: 0 PID: 683 Comm: kunit_try_catch Tainted: G    B            N  6.16.0-next-20250801 #1 PREEMPT_RT
> [   69.508873] Tainted: [B]=BAD_PAGE, [N]=TEST
> [   69.508881] Hardware name: Radxa ROCK Pi 4B (DT)
> [   69.508891] pstate: 10000005 (nzcV daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   69.508907] pc : kunit_test_null_dereference+0x70/0x170
> [   69.508940] lr : kunit_generic_run_threadfn_adapter+0x88/0x100
> [   69.508959] sp : ffff80008a867d30
> [   69.508967] x29: ffff80008a867d90 x28: 0000000000000000 x27: 0000000000000000
> [   69.508992] x26: 0000000000000000 x25: 1fffe00001777601 x24: 0000000000000004
> [   69.509017] x23: ffff00000bbbb00c x22: ffff800081203028 x21: ffff00000201aa08
> [   69.509042] x20: 1ffff0001150cfa6 x19: ffff800088077970 x18: ffff800089386ed0
> [   69.509067] x17: 0000000000000001 x16: ffff0000d1660de8 x15: 0000000000000000
> [   69.509091] x14: 1fffe0001a2cc0c0 x13: 0002000000000000 x12: ffff6000022f7620
> [   69.509116] x11: 1fffe000022f761f x10: ffff6000022f761f x9 : ffff8000811fa7b8
> [   69.509141] x8 : ffff80008a867c18 x7 : 0000000000000000 x6 : 0000000041b58ab3
> [   69.509165] x5 : ffff70001150cfa6 x4 : 00000000f1f1f1f1 x3 : 0000000000000003
> [   69.509189] x2 : dfff800000000000 x1 : ffff0000117ba800 x0 : ffff800088077970
> [   69.509214] Call trace:
> [   69.509223]  kunit_test_null_dereference+0x70/0x170 (P)
> [   69.509246]  kunit_generic_run_threadfn_adapter+0x88/0x100
> [   69.509267]  kthread+0x328/0x648
> [   69.509286]  ret_from_fork+0x10/0x20
> [   69.509316] Code: b90004a3 d5384101 52800063 aa0003f3 (39c00042)
> [   69.509330] ---[ end trace 0000000000000000 ]---
