Message-ID: <cb93bc0a-5412-46fd-8fe1-3e13b5b08cca@linux.ibm.com>
Date: Thu, 26 Jun 2025 11:52:07 +0530
From: Venkat Rao Bagalkote <venkat88@...ux.ibm.com>
To: Arnd Bergmann <arnd@...db.de>,
Marek Szyprowski
<m.szyprowski@...sung.com>,
Arnd Bergmann <arnd@...nel.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>
Cc: Jan Kara <jack@...e.cz>, Alexander Mikhalitsyn <alexander@...alicyn.com>,
Jann Horn <jannh@...gle.com>, Luca Boccassi <luca.boccassi@...il.com>,
Jeff Layton <jlayton@...nel.org>,
Roman Kisel <romank@...ux.microsoft.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] coredump: reduce stack usage in vfs_coredump()
On 25/06/25 6:59 pm, Arnd Bergmann wrote:
> On Wed, Jun 25, 2025, at 13:54, Marek Szyprowski wrote:
>> On 25.06.2025 13:41, Marek Szyprowski wrote:
>>> This change appears in today's linux-next (next-20250625) as commit
>>> fb82645d3f72 ("coredump: reduce stack usage in vfs_coredump()"). In my
>>> tests I found that it causes a kernel oops on some of my ARM 32bit
>>> Exynos based boards. This is really strange, because I don't see any
>>> obvious problem in this patch. Reverting $subject on top of linux-next
>>> hides/fixes the oops. I suspect some kind of use-after-free issue, but
>>> I cannot point anything related. Here is the kernel log from one of
>>> the affected boards (I've intentionally kept the register and stack
>>> dumps):
>> I've just checked once again and found the source of the issue.
>> vfs_coredump() calls coredump_cleanup(), which calls coredump_finish(),
>> which performs the following dereference:
>>
>> next = current->signal->core_state->dumper.next
>>
>> of the core_state assigned in zap_threads() called from coredump_wait().
>> It looks like core_state cannot be moved into coredump_wait() without
>> refactoring/cleaning this up first.
IBM CI has also reported a similar crash while running ./check
tests/generic/228 from xfstests. The issue is observed on both xfs and
ext4.
Traces:
[28956.438544] run fstests generic/228 at 2025-06-26 01:02:28
[28956.806452] coredump: 4746(sysctl): Unsafe core_pattern used with
fs.suid_dumpable=2: pipe handler or fully qualified core dump path
required. Set kernel.core_pattern before fs.suid_dumpable.
[28956.809279] BUG: Unable to handle kernel data access at
0x3437342e65727d2f
[28956.809287] Faulting instruction address: 0xc0000000010fe718
[28956.809292] Oops: Kernel access of bad area, sig: 11 [#1]
[28956.809297] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
[28956.809303] Modules linked in: loop nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nft_chain_nat nf_nat bonding nf_conntrack
nf_defrag_ipv6 tls nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink
pseries_rng vmx_crypto xfs sr_mod cdrom sd_mod sg ibmvscsi ibmveth
scsi_transport_srp fuse
[28956.809347] CPU: 25 UID: 0 PID: 4748 Comm: xfs_io Kdump: loaded Not
tainted 6.16.0-rc3-next-20250625 #1 VOLUNTARY
[28956.809355] Hardware name: IBM,8375-42A POWER9 (architected) 0x4e0202
0xf000005 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries
[28956.809360] NIP: c0000000010fe718 LR: c0000000001d0d20 CTR:
0000000000000000
[28956.809365] REGS: c00000009a80f720 TRAP: 0380 Not tainted
(6.16.0-rc3-next-20250625)
[28956.809370] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
CR: 88008844 XER: 20040000
[28956.809385] CFAR: c0000000001d0d1c IRQMASK: 1
[28956.809385] GPR00: c0000000001d0d20 c00000009a80f9c0 c000000001648100
3437342e65727d2f
[28956.809385] GPR04: 0000000000000003 0000000000000000 0000000000000000
fffffffffffe0000
[28956.809385] GPR08: c0000000c97baa00 0000000000000033 c0000000b9039000
0000000000008000
[28956.809385] GPR12: c0000000001dd158 c000000017f91300 0000000000000000
0000000000000000
[28956.809385] GPR16: 0000000000000000 0000000000000018 c0000000b9039000
c0000000b9039d60
[28956.809385] GPR20: c0000000b9039080 c0000000b9039d48 0000000000040100
0000000000000001
[28956.809385] GPR24: 0000000008430000 c00000009a80fd30 c0000000c97baa00
c000000002baf820
[28956.809385] GPR28: 3437342e65727d2f 0000000000000000 0000000000000003
0000000000000000
[28956.809444] NIP [c0000000010fe718] _raw_spin_lock_irqsave+0x34/0xb0
[28956.809452] LR [c0000000001d0d20] try_to_wake_up+0x6c/0x828
[28956.809459] Call Trace:
[28956.809462] [c00000009a80f9c0] [c00000009a80fa10] 0xc00000009a80fa10
(unreliable)
[28956.809469] [c00000009a80f9f0] [0000000000000000] 0x0
[28956.809474] [c00000009a80fa80] [c0000000006f1958]
vfs_coredump+0x254/0x5c8
[28956.809481] [c00000009a80fbf0] [c00000000018cf3c] get_signal+0x454/0xb64
[28956.809488] [c00000009a80fcf0] [c00000000002188c] do_signal+0x7c/0x324
[28956.809496] [c00000009a80fd90] [c000000000022a00]
do_notify_resume+0xb0/0x13c
[28956.809502] [c00000009a80fdc0] [c000000000032508]
interrupt_exit_user_prepare_main+0x1ac/0x264
[28956.809510] [c00000009a80fe20] [c000000000032710]
syscall_exit_prepare+0x150/0x178
[28956.809516] [c00000009a80fe50] [c00000000000d068]
system_call_vectored_common+0x168/0x2ec
[28956.809525] ---- interrupt: 3000 at 0x7fff82b24bf4
[28956.809529] NIP: 00007fff82b24bf4 LR: 00007fff82b24bf4 CTR:
0000000000000000
[28956.809534] REGS: c00000009a80fe80 TRAP: 3000 Not tainted
(6.16.0-rc3-next-20250625)
[28956.809538] MSR: 800000000280f033
<SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 48004802 XER: 00000000
[28956.809554] IRQMASK: 0
[28956.809554] GPR00: 0000000000000135 00007ffffe2ecf50 00007fff82c37200
ffffffffffffffe5
[28956.809554] GPR04: 0000000000000000 0000000000000000 0000000006500000
00007fff82e3e120
[28956.809554] GPR08: 00007fff82e369e8 0000000000000000 0000000000000000
0000000000000000
[28956.809554] GPR12: 0000000000000000 00007fff82e3e120 0000000000000000
0000000000000000
[28956.809554] GPR16: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[28956.809554] GPR20: 0000000000000000 0000000000000000 0000000000000000
0000000000000001
[28956.809554] GPR24: 0000010009812f10 0000000000000000 0000000000000001
0000000123099fe8
[28956.809554] GPR28: 0000000000000000 0000000000000003 0000000000000000
0000000006500000
[28956.809610] NIP [00007fff82b24bf4] 0x7fff82b24bf4
[28956.809614] LR [00007fff82b24bf4] 0x7fff82b24bf4
[28956.809618] ---- interrupt: 3000
[28956.809621] Code: 38429a1c 7c0802a6 60000000 fbe1fff8 f821ffd1
8bed0932 63e90001 992d0932 a12d0008 3ce0fffe 5529083c 61290001
<7d001829> 7d063879 40c20018 7d063838
[28956.809641] ---[ end trace 0000000000000000 ]---
[28956.812734] pstore: backend (nvram) writing error (-1)
If you happen to fix this, please add the below tag.
Reported-by: Venkat Rao Bagalkote <venkat88@...ux.ibm.com>
Regards,
Venkat.
> Thanks for the analysis, I agree that this can't work and my patch
> just needs to be dropped. The 'noinline_for_stack' change on
> its own is probably sufficient to avoid the warning, and I can
> respin a new version after more build testing.
>
> Arnd
>