Message-ID: <CAAhV-H7iMKmLnisD-874D2ZC919sDYeWy3tw=+eUqifK--6-Dg@mail.gmail.com>
Date: Wed, 5 Nov 2025 09:01:05 +0800
From: Huacai Chen <chenhuacai@...nel.org>
To: Niklas Schnelle <schnelle@...ux.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>, Jan Kiszka <jan.kiszka@...mens.com>, 
	Bibo Mao <maobibo@...ngson.cn>, linux-s390 <linux-s390@...r.kernel.org>, 
	loongarch@...ts.linux.dev, Farhan Ali <alifm@...ux.ibm.com>, 
	Matthew Rosato <mjrosato@...ux.ibm.com>, Tianrui Zhao <zhaotianrui@...ngson.cn>, 
	Gerald Schaefer <gerald.schaefer@...ux.ibm.com>, Heiko Carstens <hca@...ux.ibm.com>, 
	Vasily Gorbik <gor@...ux.ibm.com>, Alexander Gordeev <agordeev@...ux.ibm.com>, 
	Sven Schnelle <svens@...ux.ibm.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>, 
	Gerd Bayer <gbayer@...ux.ibm.com>, linux-kernel@...r.kernel.org, 
	linux-pci@...r.kernel.org
Subject: Re: [PATCH v5 1/2] PCI: Fix isolated PCI function probing with ARI
 and SR-IOV

On Mon, Nov 3, 2025 at 7:23 PM Niklas Schnelle <schnelle@...ux.ibm.com> wrote:
>
> On Mon, 2025-11-03 at 17:50 +0800, Huacai Chen wrote:
> > Hi, Niklas,
> >
> > On Wed, Oct 29, 2025 at 5:42 PM Niklas Schnelle <schnelle@...ux.ibm.com> wrote:
> > >
> > > When the isolated PCI function probing mechanism is used in conjunction
> > > with ARI or SR-IOV it may not find all available PCI functions. In the
> > > case of ARI the problem is that next_ari_fn() always returns -ENODEV if
> > > dev is NULL and thus if fn 0 is missing the scan stops.
> > >
> > > For SR-IOV things are more complex. Here the problem is that the check
> > > for multifunction may fail. One example where this can occur is if the
> > > first passed-through function is a VF with devfn 8. Now in
> > > pci_scan_slot() this means it is fn 0 and thus multifunction doesn't get
> > > set. Since VFs don't get multifunction set via PCI_HEADER_TYPE_MFD it
> > > remains unset and probing stops even if there is a devfn 9.
> > >
> > > Now at the moment both of these issues are hidden on s390. The first one
> > > because ARI is detected as disabled as struct pci_bus's self is NULL
> > > even though firmware does enable and use ARI. The second issue is hidden
> > > as a side effect of commit 25f39d3dcb48 ("s390/pci: Ignore RID for
> > > isolated VFs"). This is because VFs are either put on their own virtual
> > > bus if the parent PF is not passed-through to the same instance or VFs
> > > are hotplugged once SR-IOV is enabled on the parent PF and then
> > > pci_scan_single_device() is used.
> > >
> > > Still especially the first issue prevents correct detection of ARI and
> > > the second might be a problem for other users of isolated function
> > > probing. Fix both issues by keeping things as simple as possible. If
> > > isolated function probing is enabled simply scan every possible devfn.
> > I'm very sorry, but with this patch applied on top of commit
> > a02fd05661d7 ("PCI: Extend isolated function probing to LoongArch")
> > we fail to boot.
> >
> > Boot log:
> > [   10.365340] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16
> > 00:01:03 EST 2006)
> > [   10.372628] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
> > [   10.379564] megasas: 07.734.00.00-rc1
> > [   10.383222] mpt3sas version 54.100.00.00 loaded
> > [   10.388304] nvme nvme0: pci function 0000:08:00.0
> > [   10.395088] Freeing initrd memory: 45632K
> > [   10.469822] ------------[ cut here ]------------
> > [   10.474409] WARNING: CPU: 0 PID: 247 at drivers/ata/libahci.c:233
> > ahci_enable_ahci+0x64/0xb8
> > [   10.482804] Modules linked in:
> > [   10.485838] CPU: 0 UID: 0 PID: 247 Comm: kworker/0:11 Not tainted
> > 6.18.0-rc3 #1 PREEMPT(full)
> > [   10.494397] Hardware name: To be filled by O.E.M.To be fill To be
> > filled by O.E.M.To be fill/To be filled by O.E.M.To be fill, BIOS
> > Loongson-UDK2018-V4.0.
> > [   10.508139] Workqueue: events work_for_cpu_fn
> > [   10.512468] pc 900000000103be2c ra 900000000103be28 tp
> > 900000010ae44000 sp 900000010ae47be0
> > [   10.520769] a0 0000000000000000 a1 00000000000000b0 a2
> > 0000000000000001 a3 9000000001810e0c
> > [   10.529069] a4 9000000002343e20 a5 0000000000000001 a6
> > 0000000000000010 a7 0000000000000000
> > [   10.537373] t0 d10951fa66920f31 t1 d10951fa66920f31 t2
> > 0000000000001280 t3 000000000674c000
> > [   10.545673] t4 0000000000000000 t5 0000000000000000 t6
> > 9000000008002480 t7 00000000000000b4
> > [   10.553972] t8 90000001055eab90 u0 900000010ae47b68 s9
> > 9000000002221a50 s0 0000000000000000
> > [   10.562272] s1 ffff800032435800 s2 0000000000000000 s3
> > ffffffff80000000 s4 9000000002221570
> > [   10.570571] s5 0000000000000005 s6 9000000101ccf0b8 s7
> > 90000000023dd000 s8 900000010ae47d08
> > [   10.578869]    ra: 900000000103be28 ahci_enable_ahci+0x60/0xb8
> > [   10.584665]   ERA: 900000000103be2c ahci_enable_ahci+0x64/0xb8
> > [   10.590461]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
> > [   10.596609]  PRMD: 00000004 (PPLV0 +PIE -PWE)
> > [   10.600937]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
> > [   10.605698]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> > [   10.610458] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
> > [   10.615994]  PRID: 0014d010 (Loongson-64bit, Loongson-3C6000/S)
> > [   10.621875] CPU: 0 UID: 0 PID: 247 Comm: kworker/0:11 Not tainted
> > 6.18.0-rc3 #1 PREEMPT(full)
> > [   10.621877] Hardware name: To be filled by O.E.M.To be fill To be
> > filled by O.E.M.To be fill/To be filled by O.E.M.To be fill, BIOS
> > Loongson-UDK2018-V4.0.
> > [   10.621878] Workqueue: events work_for_cpu_fn
> > [   10.621881] Stack : 900000010ae47848 0000000000000000
> > 90000000002436bc 900000010ae44000
> > [   10.621884]         900000010ae47820 900000010ae47828
> > 0000000000000000 900000010ae47968
> > [   10.621887]         900000010ae47960 900000010ae47960
> > 900000010ae47630 0000000000000001
> > [   10.621890]         0000000000000001 900000010ae47828
> > d10951fa66920f31 9000000100414300
> > [   10.621893]         80000000ffffe34d fffffffffffffffe
> > 000000000000034f 000000000000002f
> > [   10.621896]         0000000000000063 0000000000000001
> > 000000000674c000 9000000002221a50
> > [   10.621899]         0000000000000000 0000000000000000
> > 90000000020b6500 90000000023dd000
> > [   10.621902]         00000000000000e9 0000000000000009
> > 0000000000000002 90000000023dd000
> > [   10.621905]         900000010ae47d08 0000000000000000
> > 90000000002436d4 0000000000000000
> > [   10.621908]         00000000000000b0 0000000000000004
> > 0000000000000000 0000000000071c1d
> > [   10.621910]         ...
> > [   10.621912] Call Trace:
> > [   10.621913] [<90000000002436d4>] show_stack+0x5c/0x180
> > [   10.621918] [<900000000023f328>] dump_stack_lvl+0x6c/0x9c
> > [   10.621923] [<9000000000266eb8>] __warn+0x80/0x108
> > [   10.621927] [<90000000017d1910>] report_bug+0x158/0x2a8
> > [   10.621932] [<900000000180b610>] do_bp+0x2d0/0x340
> > [   10.621938] [<9000000000241da0>] handle_bp+0x120/0x1c0
> > [   10.621940] [<900000000103be2c>] ahci_enable_ahci+0x64/0xb8
> > [   10.621943] [<900000000103beb8>] ahci_save_initial_config+0x38/0x4d8
> > [   10.621946] [<90000000010391b4>] ahci_init_one+0x354/0x1088
> > [   10.621950] [<9000000000d16cdc>] local_pci_probe+0x44/0xb8
> > [   10.621953] [<9000000000286f78>] work_for_cpu_fn+0x18/0x30
> > [   10.621956] [<900000000028a840>] process_one_work+0x160/0x330
> > [   10.621961] [<900000000028b208>] worker_thread+0x330/0x460
> > [   10.621964] [<9000000000295fdc>] kthread+0x11c/0x138
> > [   10.621968] [<900000000180b740>] ret_from_kernel_thread+0x28/0xa8
> > [   10.621971] [<9000000000241484>] ret_from_kernel_thread_asm+0xc/0x88
> > [   10.621973]
> >
> >
>
> This looks like a warning telling us that AHCI enable failed / timed
> out. Do you have panic_on_warn enabled, so that this warning directly
> causes a boot failure? The only relation I can see with my patch is that
> maybe this AHCI device wasn't probed before and somehow isn't working?
The rootfs is on the AHCI controller, so the AHCI failure causes the boot
failure. Without this patch there are no boot problems.

Huacai

>
> Thanks,
> Niklas
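
For readers following the thread, the SR-IOV case from the patch description (first passed-through function is a VF with devfn 8, and a devfn 9 also exists) can be illustrated with a small user-space model. This is only a sketch under assumptions: `struct slot_model`, `scan_with_mfd_check()` and `scan_every_fn()` are hypothetical names made up for illustration, and the code is not the kernel's pci_scan_slot(). It models one slot of 8 functions and compares a scan that stops when fn 0 is not marked multifunction with a scan that simply tries every function.

```c
#include <stdbool.h>

/*
 * Hypothetical model of a single PCI slot (8 functions).
 * present[fn]: the function is visible to this OS instance.
 * mfd[fn]:     the function's header type has the multifunction bit set
 *              (per the patch description, VFs do not get this).
 */
struct slot_model {
	bool present[8];
	bool mfd[8];
};

/*
 * Scan that only continues past a found function if it is considered
 * multifunction. Any function found at fn > 0 is treated as
 * multifunction; fn 0 relies on its header type bit.
 * Returns the number of functions found.
 */
static int scan_with_mfd_check(const struct slot_model *s, bool isolated)
{
	int found = 0;

	for (int fn = 0; fn < 8; fn++) {
		bool dev = s->present[fn];
		bool multifunction = false;

		if (dev) {
			found++;
			multifunction = s->mfd[fn] || fn > 0;
		} else if (fn == 0 && !isolated) {
			/* fn 0 is required without isolated probing */
			break;
		}

		/* only multifunction devices may have more functions */
		if (dev && !multifunction)
			break;
	}
	return found;
}

/* Scan that tries every function in the slot unconditionally. */
static int scan_every_fn(const struct slot_model *s)
{
	int found = 0;

	for (int fn = 0; fn < 8; fn++)
		if (s->present[fn])
			found++;
	return found;
}
```

With a slot holding two VFs (devfn 8 and 9, i.e. fn 0 and fn 1 of slot 1, neither with the multifunction bit), the first scan in this model finds only one function while the second finds both. The model says nothing about why scanning every devfn breaks AHCI on the LoongArch machine above.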
