linux-kernel - Re: PROBLEM: kernel crashes when running xfsdump since ~6.4

Message-ID: <20240620063742.7qugmebodtlogn5r@oppo.com>
Date: Thu, 20 Jun 2024 14:37:42 +0800
From: Hailong Liu <hailong.liu@...o.com>
To: Nick Bowler <nbowler@...conx.ca>
CC: <linux-kernel@...r.kernel.org>, Linux regressions mailing list
	<regressions@...ts.linux.dev>, <linux-mm@...ck.org>,
	<sparclinux@...r.kernel.org>, "Uladzislau Rezki (Sony)" <urezki@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: PROBLEM: kernel crashes when running xfsdump since ~6.4

On Thu, 20. Jun 02:19, Nick Bowler wrote:
> Hi,
>
> After upgrading my sparc to 6.9.5 I noticed that attempting to run
> xfsdump instantly (within a couple seconds) and reliably crashes the
> kernel.  The same problem is also observed on 6.10-rc4.
>
> This is a regression introduced around 6.4 timeframe.  6.3 appears
> to work fine and xfsdump goes about its business dumping stuff.
>
> Bisection implicates the following:
>
>   062eacf57ad91b5c272f89dc964fd6dd9715ea7d is the first bad commit
>   commit 062eacf57ad91b5c272f89dc964fd6dd9715ea7d
>   Author: Uladzislau Rezki (Sony) <urezki@...il.com>
>   Date:   Thu Mar 30 21:06:38 2023 +0200
>
>       mm: vmalloc: remove a global vmap_blocks xarray
>
> This reverts pretty easily on top of v6.10-rc4, as long as I first
> revert fa1c77c13ca5 ("mm: vmalloc: rename addr_to_vb_xarray() function")
> as this just causes conflicts.  Then there is one easily-corrected build
> failure (adjust the one remaining &vbq->vmap_blocks back to &vmap_blocks).
>
> If I do all of that then the kernel is not crashing anymore.
>
> A splat like this one is output on the console when the crash occurs (varies a bit):
>
>   spitfire_data_access_exception: SFSR[000000000080100d] SFAR[0000000000c51ba0], going.
>                 \|/ ____ \|/
>                 "@'/ .. \`@"
>                 /_| \__/ |_\
>                    \__U_/
>   xfsdump(2028): Dax [#1]
>   CPU: 0 PID: 2028 Comm: xfsdump Not tainted 6.9.5 #199
>   TSTATE: 0000000811001607 TPC: 0000000000974fc4 TNPC: 0000000000974fc8 Y: 00000000    Not tainted
>   TPC: <queued_spin_lock_slowpath+0x1d0/0x2cc>
>   g0: 0000000000aa9110 g1: 0000000000c51ba0 g2: 444b000000000000 g3: 0000000000c560c0
>   g4: fffff800a71a1f00 g5: fffff800bebb6000 g6: fffff800ac0ec000 g7: 0000000000040000
>   o0: 0000000000000002 o1: 00000000000007d8 o2: fffff800a4131420 o3: ffffffff0000ffff
>   o4: 00000000900a2001 o5: 0000000000c4f5a0 sp: fffff800ac0eeac1 ret_pc: 0000000000040000
>   RPC: <0x40000>
>   l0: fffff800a40098c0 l1: 0000000100800000 l2: 0000000000000000 l3: 0000000000000103
>   l4: fffff800a40081b0 l5: 0000000000aeec00 l6: fffff800a40080a0 l7: 0000000101000000
>   i0: 0000000000c4f5a0 i1: 00000000900a2001 i2: 0000000000000000 i3: fffff800bf807b80
>   i4: 0000000000000000 i5: fffff800bf807b80 i6: fffff800ac0eeb71 i7: 0000000000503438
>   I7: <vm_map_ram+0x210/0x724>
>   Call Trace:
>   [<0000000000503438>] vm_map_ram+0x210/0x724
>   [<00000000006661f8>] _xfs_buf_map_pages+0x58/0xa0
>   [<0000000000667058>] xfs_buf_get_map+0x668/0x7a4
>   [<00000000006673e0>] xfs_buf_read_map+0x20/0x160
>   [<0000000000667548>] xfs_buf_readahead_map+0x28/0x38
>   [<000000000067a4f8>] xfs_iwalk_ichunk_ra.isra.0+0xa8/0xc4
>   [<000000000067a8f0>] xfs_iwalk_ag+0x1c0/0x260
>   [<000000000067ab08>] xfs_iwalk+0xdc/0x130
>   [<0000000000679fc8>] xfs_bulkstat+0x10c/0x140
>   [<0000000000695528>] xfs_compat_ioc_fsbulkstat+0x1a4/0x1e8
>   [<000000000069572c>] xfs_file_compat_ioctl+0x8c/0x1f4
>   [<0000000000534ab0>] compat_sys_ioctl+0x9c/0xfc
>   [<0000000000406214>] linux_sparc_syscall32+0x34/0x60
>   Disabling lock debugging due to kernel taint
>   Caller[0000000000503438]: vm_map_ram+0x210/0x724
>   Caller[00000000006661f8]: _xfs_buf_map_pages+0x58/0xa0
>   Caller[0000000000667058]: xfs_buf_get_map+0x668/0x7a4
>   Caller[00000000006673e0]: xfs_buf_read_map+0x20/0x160
>   Caller[0000000000667548]: xfs_buf_readahead_map+0x28/0x38
>   Caller[000000000067a4f8]: xfs_iwalk_ichunk_ra.isra.0+0xa8/0xc4
>   Caller[000000000067a8f0]: xfs_iwalk_ag+0x1c0/0x260
>   Caller[000000000067ab08]: xfs_iwalk+0xdc/0x130
>   Caller[0000000000679fc8]: xfs_bulkstat+0x10c/0x140
>   Caller[0000000000695528]: xfs_compat_ioc_fsbulkstat+0x1a4/0x1e8
>   Caller[000000000069572c]: xfs_file_compat_ioctl+0x8c/0x1f4
>   Caller[0000000000534ab0]: compat_sys_ioctl+0x9c/0xfc
>   Caller[0000000000406214]: linux_sparc_syscall32+0x34/0x60
>   Caller[00000000f789ccdc]: 0xf789ccdc
>   Instruction DUMP:
>    8610e0c0
>    8400c002
>    c458a0f8
>   <f6704002>
>    c206e008
>    80a06000
>    12400012
>    01000000
>    81408000
>
> Let me know if you need any more info!
>
> Thanks,
>   Nick
>
I think you can try applying this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-hotfixes-unstable&id=00468d41c20cac748c2e4bfcf003283d554673f5

--
help you, help me,
Hailong.