Message-ID: <cb93bc0a-5412-46fd-8fe1-3e13b5b08cca@linux.ibm.com>
Date: Thu, 26 Jun 2025 11:52:07 +0530
From: Venkat Rao Bagalkote <venkat88@...ux.ibm.com>
To: Arnd Bergmann <arnd@...db.de>,
Marek Szyprowski
<m.szyprowski@...sung.com>,
Arnd Bergmann <arnd@...nel.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>
Cc: Jan Kara <jack@...e.cz>, Alexander Mikhalitsyn <alexander@...alicyn.com>,
Jann Horn <jannh@...gle.com>, Luca Boccassi <luca.boccassi@...il.com>,
Jeff Layton <jlayton@...nel.org>,
Roman Kisel <romank@...ux.microsoft.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] coredump: reduce stack usage in vfs_coredump()
On 25/06/25 6:59 pm, Arnd Bergmann wrote:
> On Wed, Jun 25, 2025, at 13:54, Marek Szyprowski wrote:
>> On 25.06.2025 13:41, Marek Szyprowski wrote:
>>> This change appears in today's linux-next (next-20250625) as commit
>>> fb82645d3f72 ("coredump: reduce stack usage in vfs_coredump()"). In my
>>> tests I found that it causes a kernel oops on some of my ARM 32bit
>>> Exynos based boards. This is really strange, because I don't see any
>>> obvious problem in this patch. Reverting $subject on top of linux-next
>>> hides/fixes the oops. I suspect some kind of use-after-free issue, but
>>> I cannot point anything related. Here is the kernel log from one of
>>> the affected boards (I've intentionally kept the register and stack
>>> dumps):
>> I've just checked once again and found the source of the issue.
>> vfs_coredump() calls coredump_cleanup(), which calls coredump_finish(),
>> which performs the following dereference:
>>
>> next = current->signal->core_state->dumper.next
>>
>> of the core_state assigned in zap_threads() called from coredump_wait().
>> It looks like core_state cannot be moved into coredump_wait() without
>> refactoring/cleaning this up first.
IBM CI has also reported a similar crash while running ./check
tests/generic/228 from xfstests. The issue is observed on both xfs and
ext4.
Traces:
[28956.438544] run fstests generic/228 at 2025-06-26 01:02:28
[28956.806452] coredump: 4746(sysctl): Unsafe core_pattern used with
fs.suid_dumpable=2: pipe handler or fully qualified core dump path
required. Set kernel.core_pattern before fs.suid_dumpable.
[28956.809279] BUG: Unable to handle kernel data access at
0x3437342e65727d2f
[28956.809287] Faulting instruction address: 0xc0000000010fe718
[28956.809292] Oops: Kernel access of bad area, sig: 11 [#1]
[28956.809297] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
[28956.809303] Modules linked in: loop nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nft_chain_nat nf_nat bonding nf_conntrack
nf_defrag_ipv6 tls nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink
pseries_rng vmx_crypto xfs sr_mod cdrom sd_mod sg ibmvscsi ibmveth
scsi_transport_srp fuse
[28956.809347] CPU: 25 UID: 0 PID: 4748 Comm: xfs_io Kdump: loaded Not
tainted 6.16.0-rc3-next-20250625 #1 VOLUNTARY
[28956.809355] Hardware name: IBM,8375-42A POWER9 (architected) 0x4e0202
0xf000005 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries
[28956.809360] NIP: c0000000010fe718 LR: c0000000001d0d20 CTR:
0000000000000000
[28956.809365] REGS: c00000009a80f720 TRAP: 0380 Not tainted
(6.16.0-rc3-next-20250625)
[28956.809370] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
CR: 88008844 XER: 20040000
[28956.809385] CFAR: c0000000001d0d1c IRQMASK: 1
[28956.809385] GPR00: c0000000001d0d20 c00000009a80f9c0 c000000001648100
3437342e65727d2f
[28956.809385] GPR04: 0000000000000003 0000000000000000 0000000000000000
fffffffffffe0000
[28956.809385] GPR08: c0000000c97baa00 0000000000000033 c0000000b9039000
0000000000008000
[28956.809385] GPR12: c0000000001dd158 c000000017f91300 0000000000000000
0000000000000000
[28956.809385] GPR16: 0000000000000000 0000000000000018 c0000000b9039000
c0000000b9039d60
[28956.809385] GPR20: c0000000b9039080 c0000000b9039d48 0000000000040100
0000000000000001
[28956.809385] GPR24: 0000000008430000 c00000009a80fd30 c0000000c97baa00
c000000002baf820
[28956.809385] GPR28: 3437342e65727d2f 0000000000000000 0000000000000003
0000000000000000
[28956.809444] NIP [c0000000010fe718] _raw_spin_lock_irqsave+0x34/0xb0
[28956.809452] LR [c0000000001d0d20] try_to_wake_up+0x6c/0x828
[28956.809459] Call Trace:
[28956.809462] [c00000009a80f9c0] [c00000009a80fa10] 0xc00000009a80fa10
(unreliable)
[28956.809469] [c00000009a80f9f0] [0000000000000000] 0x0
[28956.809474] [c00000009a80fa80] [c0000000006f1958]
vfs_coredump+0x254/0x5c8
[28956.809481] [c00000009a80fbf0] [c00000000018cf3c] get_signal+0x454/0xb64
[28956.809488] [c00000009a80fcf0] [c00000000002188c] do_signal+0x7c/0x324
[28956.809496] [c00000009a80fd90] [c000000000022a00]
do_notify_resume+0xb0/0x13c
[28956.809502] [c00000009a80fdc0] [c000000000032508]
interrupt_exit_user_prepare_main+0x1ac/0x264
[28956.809510] [c00000009a80fe20] [c000000000032710]
syscall_exit_prepare+0x150/0x178
[28956.809516] [c00000009a80fe50] [c00000000000d068]
system_call_vectored_common+0x168/0x2ec
[28956.809525] ---- interrupt: 3000 at 0x7fff82b24bf4
[28956.809529] NIP: 00007fff82b24bf4 LR: 00007fff82b24bf4 CTR:
0000000000000000
[28956.809534] REGS: c00000009a80fe80 TRAP: 3000 Not tainted
(6.16.0-rc3-next-20250625)
[28956.809538] MSR: 800000000280f033
<SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 48004802 XER: 00000000
[28956.809554] IRQMASK: 0
[28956.809554] GPR00: 0000000000000135 00007ffffe2ecf50 00007fff82c37200
ffffffffffffffe5
[28956.809554] GPR04: 0000000000000000 0000000000000000 0000000006500000
00007fff82e3e120
[28956.809554] GPR08: 00007fff82e369e8 0000000000000000 0000000000000000
0000000000000000
[28956.809554] GPR12: 0000000000000000 00007fff82e3e120 0000000000000000
0000000000000000
[28956.809554] GPR16: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[28956.809554] GPR20: 0000000000000000 0000000000000000 0000000000000000
0000000000000001
[28956.809554] GPR24: 0000010009812f10 0000000000000000 0000000000000001
0000000123099fe8
[28956.809554] GPR28: 0000000000000000 0000000000000003 0000000000000000
0000000006500000
[28956.809610] NIP [00007fff82b24bf4] 0x7fff82b24bf4
[28956.809614] LR [00007fff82b24bf4] 0x7fff82b24bf4
[28956.809618] ---- interrupt: 3000
[28956.809621] Code: 38429a1c 7c0802a6 60000000 fbe1fff8 f821ffd1
8bed0932 63e90001 992d0932 a12d0008 3ce0fffe 5529083c 61290001
<7d001829> 7d063879 40c20018 7d063838
[28956.809641] ---[ end trace 0000000000000000 ]---
[28956.812734] pstore: backend (nvram) writing error (-1)
If you happen to fix this, please add the below tag.
Reported-by: Venkat Rao Bagalkote <venkat88@...ux.ibm.com>
Regards,
Venkat.
> Thanks for the analysis, I agree that this can't work and my patch
> just needs to be dropped. The 'noinline_for_stack' change on
> its own is probably sufficient to avoid the warning, and I can
> respin a new version after more build testing.
>
> Arnd
>