Message-ID: <958ef380be4ea488698fab05245d631998c32a48.camel@linux.ibm.com>
Date: Mon, 03 Nov 2025 12:23:08 +0100
From: Niklas Schnelle <schnelle@...ux.ibm.com>
To: Huacai Chen <chenhuacai@...nel.org>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>, Jan Kiszka <jan.kiszka@...mens.com>,
 Bibo Mao <maobibo@...ngson.cn>, linux-s390 <linux-s390@...r.kernel.org>,
 loongarch@...ts.linux.dev, Farhan Ali <alifm@...ux.ibm.com>,
 Matthew Rosato <mjrosato@...ux.ibm.com>, Tianrui Zhao <zhaotianrui@...ngson.cn>,
 Gerald Schaefer <gerald.schaefer@...ux.ibm.com>, Heiko Carstens <hca@...ux.ibm.com>,
 Vasily Gorbik <gor@...ux.ibm.com>, Alexander Gordeev <agordeev@...ux.ibm.com>,
 Sven Schnelle <svens@...ux.ibm.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>,
 Gerd Bayer <gbayer@...ux.ibm.com>, linux-kernel@...r.kernel.org,
 linux-pci@...r.kernel.org
Subject: Re: [PATCH v5 1/2] PCI: Fix isolated PCI function probing with ARI and SR-IOV

On Mon, 2025-11-03 at 17:50 +0800, Huacai Chen wrote:
> Hi, Niklas,
>
> On Wed, Oct 29, 2025 at 5:42 PM Niklas Schnelle <schnelle@...ux.ibm.com> wrote:
> >
> > When the isolated PCI function probing mechanism is used in conjunction
> > with ARI or SR-IOV it may not find all available PCI functions. In the
> > case of ARI the problem is that next_ari_fn() always returns -ENODEV if
> > dev is NULL and thus if fn 0 is missing the scan stops.
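A minimal userspace model of the ARI walk described above, as a sketch rather than kernel code: struct fake_dev, model_next_ari_fn() and model_ari_scan() are hypothetical names. It assumes only what the text states, namely that next_ari_fn() gives up when dev is NULL, plus that it otherwise follows the ARI "next function" number and treats a value not greater than the current fn as the end of the list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev: only the ARI "next function" field. */
struct fake_dev {
	int ari_next_fn;
};

/* Hypothetical model of next_ari_fn(): with no device there is no ARI
 * capability to read, so the walk ends with -ENODEV. */
static int model_next_ari_fn(const struct fake_dev *dev, int fn)
{
	if (!dev)
		return -ENODEV;
	if (dev->ari_next_fn <= fn)
		return -ENODEV; /* end of the list (or a malformed one) */
	return dev->ari_next_fn;
}

/* Follow the ARI list starting at fn 0. devs[fn] is NULL when that function
 * is not visible, e.g. not passed through. Returns the functions found. */
static int model_ari_scan(struct fake_dev *devs[256])
{
	int fn = 0, found = 0;

	do {
		assert(fn >= 0 && fn < 256);
		if (devs[fn])
			found++;
		fn = model_next_ari_fn(devs[fn], fn);
	} while (fn >= 0);
	return found;
}
```

With fn 0, 1 and 2 chained 0 -> 1 -> 2 the model finds all three; once devs[0] is NULL it finds none, although fn 1 and fn 2 are still there.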
> >
> > For SR-IOV things are more complex. Here the problem is that the check
> > for multifunction may fail. One example where this can occur is if the
> > first passed-through function is a VF with devfn 8. In pci_scan_slot()
> > this is fn 0 (of slot 1), so multifunction doesn't get set. Since VFs
> > don't get multifunction set via PCI_HEADER_TYPE_MFD either, it remains
> > unset and probing stops even if there is a devfn 9.
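The non-ARI case can be modeled the same way. Again a hypothetical sketch (struct fake_fn, model_next_fn() and model_scan_slot() are made-up names), assuming per the text that the next function is only tried when the current one is absent or multifunction, that multifunction comes from the MFD header bit or from being found at fn > 0, and that isolated probing lets a missing fn 0 continue the scan.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-function state for one slot (devfn = slot * 8 + fn). */
struct fake_fn {
	bool present;
	bool hdr_mfd; /* PCI_HEADER_TYPE_MFD set in the header; never for a VF */
};

/* Hypothetical model of next_fn() without ARI: at most 8 functions, and a
 * present function that is not multifunction ends the scan. */
static int model_next_fn(bool have_dev, bool multifunction, int fn)
{
	if (fn >= 7)
		return -ENODEV;
	if (have_dev && !multifunction)
		return -ENODEV;
	return fn + 1;
}

/* Scan one slot; a missing fn 0 does not stop the scan (isolated probing). */
static int model_scan_slot(const struct fake_fn slot[8])
{
	int fn = 0, found = 0;

	do {
		assert(fn >= 0 && fn < 8);
		bool have = slot[fn].present;
		bool mf = false;

		if (have) {
			found++;
			/* fn > 0 implies multifunction; fn 0 relies on the header bit */
			mf = slot[fn].hdr_mfd || fn > 0;
		}
		fn = model_next_fn(have, mf, fn);
	} while (fn >= 0);
	return found;
}
```

For VFs at devfn 8 and 9 (fn 0 and fn 1 of slot 1, no MFD bit) the model stops after fn 0 and finds only one function; setting the MFD bit on fn 0, which a VF never has, would let it reach fn 1.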
> >
> > At the moment both of these issues are hidden on s390. The first one is
> > hidden because ARI is detected as disabled, since struct pci_bus's self
> > is NULL, even though firmware does enable and use ARI. The second issue
> > is hidden as a side effect of commit 25f39d3dcb48 ("s390/pci: Ignore RID
> > for isolated VFs"). This is because VFs are either put on their own
> > virtual bus if the parent PF is not passed through to the same instance,
> > or they are hotplugged once SR-IOV is enabled on the parent PF, in which
> > case pci_scan_single_device() is used.
> >
> > Still, the first issue in particular prevents correct detection of ARI,
> > and the second might be a problem for other users of isolated function
> > probing. Fix both issues by keeping things as simple as possible: if
> > isolated function probing is enabled, simply scan every possible devfn.
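A sketch of the simplification in the last paragraph, under the same caveats: model_isolated_scan() is a hypothetical name, not the function the patch touches. The only assumption is that, with isolated function probing, every function number is probed without consulting next_fn() or multifunction: 256 with ARI, where the whole 8-bit devfn is the function number, and 8 otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: probe every possible function number independently.
 * present[] must have 256 entries when ari is true, 8 otherwise. */
static int model_isolated_scan(const bool present[], bool ari)
{
	int nr_fns = ari ? 256 : 8;
	int found = 0;

	assert(present != NULL);
	for (int fn = 0; fn < nr_fns; fn++) {
		if (present[fn])
			found++;
	}
	return found;
}
```

Both cases the earlier walks miss (fn 0 absent with ARI, a VF at fn 0 without the MFD bit) are found this way, at the cost of probing every function number.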
>
> I'm very sorry, but with this patch applied on top of commit
> a02fd05661d7 ("PCI: Extend isolated function probing to LoongArch"),
> the system fails to boot.
>
> Boot log:
> [ 10.365340] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
> [ 10.372628] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
> [ 10.379564] megasas: 07.734.00.00-rc1
> [ 10.383222] mpt3sas version 54.100.00.00 loaded
> [ 10.388304] nvme nvme0: pci function 0000:08:00.0
> [ 10.395088] Freeing initrd memory: 45632K
> [ 10.469822] ------------[ cut here ]------------
> [ 10.474409] WARNING: CPU: 0 PID: 247 at drivers/ata/libahci.c:233 ahci_enable_ahci+0x64/0xb8
> [ 10.482804] Modules linked in:
> [ 10.485838] CPU: 0 UID: 0 PID: 247 Comm: kworker/0:11 Not tainted 6.18.0-rc3 #1 PREEMPT(full)
> [ 10.494397] Hardware name: To be filled by O.E.M.To be fill To be filled by O.E.M.To be fill/To be filled by O.E.M.To be fill, BIOS Loongson-UDK2018-V4.0.
> [ 10.508139] Workqueue: events work_for_cpu_fn
> [ 10.512468] pc 900000000103be2c ra 900000000103be28 tp 900000010ae44000 sp 900000010ae47be0
> [ 10.520769] a0 0000000000000000 a1 00000000000000b0 a2 0000000000000001 a3 9000000001810e0c
> [ 10.529069] a4 9000000002343e20 a5 0000000000000001 a6 0000000000000010 a7 0000000000000000
> [ 10.537373] t0 d10951fa66920f31 t1 d10951fa66920f31 t2 0000000000001280 t3 000000000674c000
> [ 10.545673] t4 0000000000000000 t5 0000000000000000 t6 9000000008002480 t7 00000000000000b4
> [ 10.553972] t8 90000001055eab90 u0 900000010ae47b68 s9 9000000002221a50 s0 0000000000000000
> [ 10.562272] s1 ffff800032435800 s2 0000000000000000 s3 ffffffff80000000 s4 9000000002221570
> [ 10.570571] s5 0000000000000005 s6 9000000101ccf0b8 s7 90000000023dd000 s8 900000010ae47d08
> [ 10.578869] ra: 900000000103be28 ahci_enable_ahci+0x60/0xb8
> [ 10.584665] ERA: 900000000103be2c ahci_enable_ahci+0x64/0xb8
> [ 10.590461] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
> [ 10.596609] PRMD: 00000004 (PPLV0 +PIE -PWE)
> [ 10.600937] EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
> [ 10.605698] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> [ 10.610458] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
> [ 10.615994] PRID: 0014d010 (Loongson-64bit, Loongson-3C6000/S)
> [ 10.621875] CPU: 0 UID: 0 PID: 247 Comm: kworker/0:11 Not tainted 6.18.0-rc3 #1 PREEMPT(full)
> [ 10.621877] Hardware name: To be filled by O.E.M.To be fill To be filled by O.E.M.To be fill/To be filled by O.E.M.To be fill, BIOS Loongson-UDK2018-V4.0.
> [ 10.621878] Workqueue: events work_for_cpu_fn
> [ 10.621881] Stack : 900000010ae47848 0000000000000000 90000000002436bc 900000010ae44000
> [ 10.621884] 900000010ae47820 900000010ae47828 0000000000000000 900000010ae47968
> [ 10.621887] 900000010ae47960 900000010ae47960 900000010ae47630 0000000000000001
> [ 10.621890] 0000000000000001 900000010ae47828 d10951fa66920f31 9000000100414300
> [ 10.621893] 80000000ffffe34d fffffffffffffffe 000000000000034f 000000000000002f
> [ 10.621896] 0000000000000063 0000000000000001 000000000674c000 9000000002221a50
> [ 10.621899] 0000000000000000 0000000000000000 90000000020b6500 90000000023dd000
> [ 10.621902] 00000000000000e9 0000000000000009 0000000000000002 90000000023dd000
> [ 10.621905] 900000010ae47d08 0000000000000000 90000000002436d4 0000000000000000
> [ 10.621908] 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
> [ 10.621910] ...
> [ 10.621912] Call Trace:
> [ 10.621913] [<90000000002436d4>] show_stack+0x5c/0x180
> [ 10.621918] [<900000000023f328>] dump_stack_lvl+0x6c/0x9c
> [ 10.621923] [<9000000000266eb8>] __warn+0x80/0x108
> [ 10.621927] [<90000000017d1910>] report_bug+0x158/0x2a8
> [ 10.621932] [<900000000180b610>] do_bp+0x2d0/0x340
> [ 10.621938] [<9000000000241da0>] handle_bp+0x120/0x1c0
> [ 10.621940] [<900000000103be2c>] ahci_enable_ahci+0x64/0xb8
> [ 10.621943] [<900000000103beb8>] ahci_save_initial_config+0x38/0x4d8
> [ 10.621946] [<90000000010391b4>] ahci_init_one+0x354/0x1088
> [ 10.621950] [<9000000000d16cdc>] local_pci_probe+0x44/0xb8
> [ 10.621953] [<9000000000286f78>] work_for_cpu_fn+0x18/0x30
> [ 10.621956] [<900000000028a840>] process_one_work+0x160/0x330
> [ 10.621961] [<900000000028b208>] worker_thread+0x330/0x460
> [ 10.621964] [<9000000000295fdc>] kthread+0x11c/0x138
> [ 10.621968] [<900000000180b740>] ret_from_kernel_thread+0x28/0xa8
> [ 10.621971] [<9000000000241484>] ret_from_kernel_thread_asm+0xc/0x88
> [ 10.621973]
>
>
This looks like a warning telling us that enabling AHCI failed or timed
out. Do you have panic_on_warn enabled, so that this directly causes a
boot failure? The only relation I can see to my patch is that maybe this
AHCI device wasn't probed before and somehow isn't working?

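For context, the WARN fires inside ahci_enable_ahci(). A simplified userspace model of what such a routine does, assuming it sets the AHCI Enable (AE) bit in the GHC register, reads it back, retries a few times and warns if the bit never sticks. The fake register, all names, and the retry count of 5 are assumptions for illustration, not taken from libahci.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_GHC_AE (UINT32_C(1) << 31) /* AHCI Enable bit in GHC */
#define MODEL_TRIES 5                    /* assumed retry count */

/* Hypothetical controller: ae_sticks says whether writes to AE take effect. */
struct fake_hba {
	uint32_t ghc;
	bool ae_sticks;
	int writes;
};

static uint32_t fake_read_ghc(const struct fake_hba *h)
{
	return h->ghc;
}

static void fake_write_ghc(struct fake_hba *h, uint32_t val)
{
	h->writes++;
	h->ghc = h->ae_sticks ? val : (val & ~MODEL_GHC_AE);
}

/* Hypothetical model; returns false where the real driver would WARN. */
static bool model_enable_ahci(struct fake_hba *h)
{
	uint32_t tmp;

	assert(h != NULL);
	tmp = fake_read_ghc(h);
	if (tmp & MODEL_GHC_AE)
		return true; /* already enabled */

	for (int i = 0; i < MODEL_TRIES; i++) {
		tmp |= MODEL_GHC_AE;
		fake_write_ghc(h, tmp);
		tmp = fake_read_ghc(h); /* read back: did AE latch? */
		if (tmp & MODEL_GHC_AE)
			return true;
		/* a real driver would wait between attempts; omitted here */
	}
	return false;
}
```

In the model, a controller whose AE bit never sticks takes all five writes and returns false, which is the situation the WARN reports.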
Thanks,
Niklas