Message-ID: <31838b50-e420-405a-af23-6c6ede281386@linux.ibm.com>
Date: Mon, 13 Oct 2025 14:53:50 +0530
From: Nilay Shroff <nilay@...ux.ibm.com>
To: namcao@...utronix.de
Cc: maddy@...ux.ibm.com, mpe@...erman.id.au, npiggin@...il.com,
christophe.leroy@...roup.eu, tglx@...utronix.de, maz@...nel.org,
gautam@...ux.ibm.com, linuxppc-dev@...ts.ozlabs.org,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
"linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>
Subject: [bug report][PPC]: rmmod nvme driver causes a kernel panic
Hi Nam,
On the latest upstream mainline kernel, I am encountering a kernel
crash when attempting to unload the NVMe driver module (rmmod nvme)
on a POWER9 system. The crash appears to be triggered by the recent
work on using MSI parent domains, discussed here:
https://lore.kernel.org/all/cover.1754903590.git.namcao@linutronix.de/
System details:
===============
Architecture: PowerPC (POWER9, IBM 9008-22L)
Kernel: 6.18.0-rc1 (mainline, unmodified)
Platform: pSeries / PHYP
Reproducibility: Always, when running rmmod nvme
Crash trace:
============
Kernel attempted to read user page (8) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000008
Faulting instruction address: 0xc000000000b30638
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp bonding tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet n
CPU: 14 UID: 0 PID: 1973 Comm: rmmod Not tainted 6.18.0-rc1 #63 VOLUNTARY
Hardware name: IBM,9008-22L POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries
NIP: c000000000b30638 LR: c000000000111d90 CTR: c000000000111d6c
REGS: c00000011f1076e0 TRAP: 0300 Not tainted (6.18.0-rc1)
MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 48008228 XER: 200400cf
CFAR: c00000000000d9cc DAR: 0000000000000008 DSISR: 40000000 IRQMASK: 0
GPR00: c000000000111d90 c00000011f107980 c000000001da8100 0000000000000000
GPR04: c0000000bcf535e8 0000000000000000 73efa01ced0dd290 00000000b0734e18
GPR08: 0000000ffb4c0000 c0000000bcf53540 0000000000000000 0000000048008222
GPR12: c000000000111d6c c000000017ff1c80 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000001 c0000000b70bd980 c0000000995890c8
GPR28: 0000000000000000 c0000000bcf53590 c00000000e79c800 c0000000995890c8
NIP [c000000000b30638] msi_desc_to_pci_dev+0x8/0x14
LR [c000000000111d90] pseries_msi_ops_teardown+0x24/0x38
Call Trace:
[c00000011f107980] [c0000000995890c8] 0xc0000000995890c8 (unreliable)
[c00000011f1079a0] [c000000000276118] msi_remove_device_irq_domain+0x9c/0x18c
[c00000011f1079e0] [c00000000027623c] msi_device_data_release+0x34/0xa8
[c00000011f107a10] [c000000000c657b8] release_nodes+0xac/0x1f0
[c00000011f107ab0] [c000000000c675e8] devres_release_all+0xc0/0x138
[c00000011f107b20] [c000000000c5bb8c] device_unbind_cleanup+0x2c/0xb0
[c00000011f107b50] [c000000000c5dfc8] device_release_driver_internal+0x2fc/0x34c
[c00000011f107ba0] [c000000000c5e0c4] driver_detach+0x74/0xe0
[c00000011f107bd0] [c000000000c5b3e0] bus_remove_driver+0x94/0x140
[c00000011f107c50] [c000000000c5f1c8] driver_unregister+0x48/0x88
[c00000011f107cc0] [c000000000b228ec] pci_unregister_driver+0x40/0x128
[c00000011f107d10] [c008000004b6834c] nvme_exit+0x20/0x7cd4 [nvme]
[c00000011f107d30] [c0000000002becb8] __do_sys_delete_module.constprop.0+0x1ac/0x3ec
[c00000011f107e10] [c000000000032324] system_call_exception+0x134/0x360
[c00000011f107e50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec
Analysis:
=========
From tracing the cleanup path, it appears that the crash happens because the MSI
descriptor is freed before the MSI teardown is invoked. Specifically, during NVMe
module unload (rmmod nvme), the call sequence is as follows:
cleanup_module
-> pci_unregister_driver
-> driver_unregister
-> bus_remove_driver
-> driver_detach
-> device_release_driver_internal
-> device_remove
-> pci_device_remove
-> nvme_remove
-> nvme_dev_disable
-> pci_free_irq_vectors
-> pci_disable_msix
-> pci_free_msi_irqs
-> pci_msi_teardown_msi_irqs ==> here we free msi_desc
Later, as the call stack continues unwinding through:
-> device_release_driver_internal
-> device_unbind_cleanup
-> devres_release_all
-> release_nodes
-> msi_device_data_release
-> msi_remove_device_irq_domain
-> pseries_msi_ops_teardown => here the freed msi_desc is dereferenced, leading to the crash
Possible Cause:
===============
This looks like a cleanup ordering issue introduced by the recent MSI parent
domain rework. The domain teardown path (pseries_msi_ops_teardown, invoked
from devres release) assumes the MSI descriptor is still valid when it runs,
but pci_msi_teardown_msi_irqs has already freed it earlier in the driver
removal sequence, so that assumption no longer holds.
Expected behavior:
==================
The rmmod nvme operation should cleanly unload the module without triggering a
crash or accessing freed MSI descriptors.
Additional notes:
=================
- The crash reproduces consistently on PowerPC (pseries, PHYP).
- It did not occur before the MSI parent domain series was merged.
- Likely to affect other MSI-capable PCI drivers.
Let me know if you need any further details. If you have a fix for this,
I'd be glad to help validate it on PPC.
Thanks,
--Nilay