Message-ID: <958ef380be4ea488698fab05245d631998c32a48.camel@linux.ibm.com>
Date: Mon, 03 Nov 2025 12:23:08 +0100
From: Niklas Schnelle <schnelle@...ux.ibm.com>
To: Huacai Chen <chenhuacai@...nel.org>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>, Jan Kiszka <jan.kiszka@...mens.com>,
        Bibo Mao <maobibo@...ngson.cn>,
        linux-s390 <linux-s390@...r.kernel.org>, loongarch@...ts.linux.dev,
        Farhan Ali <alifm@...ux.ibm.com>,
        Matthew Rosato <mjrosato@...ux.ibm.com>,
        Tianrui Zhao <zhaotianrui@...ngson.cn>,
        Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
        Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Sven Schnelle <svens@...ux.ibm.com>,
        Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Gerd Bayer <gbayer@...ux.ibm.com>, linux-kernel@...r.kernel.org,
        linux-pci@...r.kernel.org
Subject: Re: [PATCH v5 1/2] PCI: Fix isolated PCI function probing with ARI and SR-IOV

On Mon, 2025-11-03 at 17:50 +0800, Huacai Chen wrote:
> Hi, Niklas,
> 
> On Wed, Oct 29, 2025 at 5:42 PM Niklas Schnelle <schnelle@...ux.ibm.com> wrote:
> > 
> > When the isolated PCI function probing mechanism is used in conjunction
> > with ARI or SR-IOV, it may not find all available PCI functions. In the
> > case of ARI, the problem is that next_ari_fn() always returns -ENODEV if
> > dev is NULL, so if fn 0 is missing, the scan stops.
> > 
> > For SR-IOV, things are more complex. Here the problem is that the check
> > for multifunction may fail. One example where this can occur is if the
> > first passed-through function is a VF with devfn 8 (slot 1, function 0).
> > In pci_scan_slot() this makes it fn 0 of its slot, and thus multifunction
> > doesn't get set. Since VFs don't get multifunction set via
> > PCI_HEADER_TYPE_MFD, it remains unset and probing stops even if there is
> > a devfn 9.
> > 
> > At the moment, both of these issues are hidden on s390. The first one is
> > hidden because ARI is detected as disabled, as struct pci_bus's self is
> > NULL, even though firmware does enable and use ARI. The second issue is
> > hidden as a side effect of commit 25f39d3dcb48 ("s390/pci: Ignore RID for
> > isolated VFs"), because VFs are either put on their own virtual bus if
> > the parent PF is not passed through to the same instance, or are
> > hotplugged once SR-IOV is enabled on the parent PF, in which case
> > pci_scan_single_device() is used.
> > 
> > Still, the first issue in particular prevents correct detection of ARI,
> > and the second might be a problem for other users of isolated function
> > probing. Fix both issues by keeping things as simple as possible: if
> > isolated function probing is enabled, simply scan every possible devfn.
> I'm very sorry, but with this patch applied on top of commit a02fd05661d7
> ("PCI: Extend isolated function probing to LoongArch"), the system fails
> to boot.
> 
> Boot log:
> [   10.365340] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
> [   10.372628] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
> [   10.379564] megasas: 07.734.00.00-rc1
> [   10.383222] mpt3sas version 54.100.00.00 loaded
> [   10.388304] nvme nvme0: pci function 0000:08:00.0
> [   10.395088] Freeing initrd memory: 45632K
> [   10.469822] ------------[ cut here ]------------
> [   10.474409] WARNING: CPU: 0 PID: 247 at drivers/ata/libahci.c:233 ahci_enable_ahci+0x64/0xb8
> [   10.482804] Modules linked in:
> [   10.485838] CPU: 0 UID: 0 PID: 247 Comm: kworker/0:11 Not tainted 6.18.0-rc3 #1 PREEMPT(full)
> [   10.494397] Hardware name: To be filled by O.E.M.To be fill To be filled by O.E.M.To be fill/To be filled by O.E.M.To be fill, BIOS Loongson-UDK2018-V4.0.
> [   10.508139] Workqueue: events work_for_cpu_fn
> [   10.512468] pc 900000000103be2c ra 900000000103be28 tp 900000010ae44000 sp 900000010ae47be0
> [   10.520769] a0 0000000000000000 a1 00000000000000b0 a2 0000000000000001 a3 9000000001810e0c
> [   10.529069] a4 9000000002343e20 a5 0000000000000001 a6 0000000000000010 a7 0000000000000000
> [   10.537373] t0 d10951fa66920f31 t1 d10951fa66920f31 t2 0000000000001280 t3 000000000674c000
> [   10.545673] t4 0000000000000000 t5 0000000000000000 t6 9000000008002480 t7 00000000000000b4
> [   10.553972] t8 90000001055eab90 u0 900000010ae47b68 s9 9000000002221a50 s0 0000000000000000
> [   10.562272] s1 ffff800032435800 s2 0000000000000000 s3 ffffffff80000000 s4 9000000002221570
> [   10.570571] s5 0000000000000005 s6 9000000101ccf0b8 s7 90000000023dd000 s8 900000010ae47d08
> [   10.578869]    ra: 900000000103be28 ahci_enable_ahci+0x60/0xb8
> [   10.584665]   ERA: 900000000103be2c ahci_enable_ahci+0x64/0xb8
> [   10.590461]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
> [   10.596609]  PRMD: 00000004 (PPLV0 +PIE -PWE)
> [   10.600937]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
> [   10.605698]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> [   10.610458] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
> [   10.615994]  PRID: 0014d010 (Loongson-64bit, Loongson-3C6000/S)
> [   10.621875] CPU: 0 UID: 0 PID: 247 Comm: kworker/0:11 Not tainted 6.18.0-rc3 #1 PREEMPT(full)
> [   10.621877] Hardware name: To be filled by O.E.M.To be fill To be filled by O.E.M.To be fill/To be filled by O.E.M.To be fill, BIOS Loongson-UDK2018-V4.0.
> [   10.621878] Workqueue: events work_for_cpu_fn
> [   10.621881] Stack : 900000010ae47848 0000000000000000 90000000002436bc 900000010ae44000
> [   10.621884]         900000010ae47820 900000010ae47828 0000000000000000 900000010ae47968
> [   10.621887]         900000010ae47960 900000010ae47960 900000010ae47630 0000000000000001
> [   10.621890]         0000000000000001 900000010ae47828 d10951fa66920f31 9000000100414300
> [   10.621893]         80000000ffffe34d fffffffffffffffe 000000000000034f 000000000000002f
> [   10.621896]         0000000000000063 0000000000000001 000000000674c000 9000000002221a50
> [   10.621899]         0000000000000000 0000000000000000 90000000020b6500 90000000023dd000
> [   10.621902]         00000000000000e9 0000000000000009 0000000000000002 90000000023dd000
> [   10.621905]         900000010ae47d08 0000000000000000 90000000002436d4 0000000000000000
> [   10.621908]         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
> [   10.621910]         ...
> [   10.621912] Call Trace:
> [   10.621913] [<90000000002436d4>] show_stack+0x5c/0x180
> [   10.621918] [<900000000023f328>] dump_stack_lvl+0x6c/0x9c
> [   10.621923] [<9000000000266eb8>] __warn+0x80/0x108
> [   10.621927] [<90000000017d1910>] report_bug+0x158/0x2a8
> [   10.621932] [<900000000180b610>] do_bp+0x2d0/0x340
> [   10.621938] [<9000000000241da0>] handle_bp+0x120/0x1c0
> [   10.621940] [<900000000103be2c>] ahci_enable_ahci+0x64/0xb8
> [   10.621943] [<900000000103beb8>] ahci_save_initial_config+0x38/0x4d8
> [   10.621946] [<90000000010391b4>] ahci_init_one+0x354/0x1088
> [   10.621950] [<9000000000d16cdc>] local_pci_probe+0x44/0xb8
> [   10.621953] [<9000000000286f78>] work_for_cpu_fn+0x18/0x30
> [   10.621956] [<900000000028a840>] process_one_work+0x160/0x330
> [   10.621961] [<900000000028b208>] worker_thread+0x330/0x460
> [   10.621964] [<9000000000295fdc>] kthread+0x11c/0x138
> [   10.621968] [<900000000180b740>] ret_from_kernel_thread+0x28/0xa8
> [   10.621971] [<9000000000241484>] ret_from_kernel_thread_asm+0xc/0x88
> [   10.621973]
> 
> 
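The WARNING in the log is raised from ahci_enable_ahci() in drivers/ata/libahci.c. Roughly, that routine sets the AHCI Enable bit in the HBA's global host control register, reads the register back, retries a few times if the bit did not latch, and warns when it gives up. The following is a hypothetical user-space model of that pattern only: the simulated register, `MODEL_AHCI_EN`, the retry count of 5, and all function and struct names are assumptions of this sketch, not the libahci code itself.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: modeled on the AHCI Enable (AE) bit, bit 31 of GHC. */
#define MODEL_AHCI_EN (UINT32_C(1) << 31)
#define MODEL_ENABLE_TRIES 5 /* assumed retry count */

/* Hypothetical simulated HBA: only the global host control register. */
struct fake_hba {
	uint32_t ctl;      /* simulated global host control register */
	int ignore_writes; /* writes dropped before one latches; -1 = all */
};

static uint32_t fake_read(const struct fake_hba *h)
{
	return h->ctl;
}

static void fake_write(struct fake_hba *h, uint32_t val)
{
	if (h->ignore_writes < 0)
		return;             /* broken device: never latches */
	if (h->ignore_writes > 0) {
		h->ignore_writes--; /* this particular write is dropped */
		return;
	}
	h->ctl = val;
}

/* Returns 0 on success, -1 (standing in for -EIO) when giving up.
 * *writes reports how many enable writes were attempted. */
static int model_enable_ahci(struct fake_hba *h, int *writes)
{
	uint32_t tmp = fake_read(h);

	*writes = 0;
	if (tmp & MODEL_AHCI_EN)
		return 0;           /* already enabled, nothing to do */

	for (int i = 0; i < MODEL_ENABLE_TRIES; i++) {
		tmp |= MODEL_AHCI_EN;
		fake_write(h, tmp);
		(*writes)++;
		tmp = fake_read(h); /* read back: did the bit latch? */
		if (tmp & MODEL_AHCI_EN)
			return 0;
		/* a real driver would sleep briefly before retrying */
	}

	/* Stand-in for the kernel's WARN_ON(1). */
	fprintf(stderr, "WARNING: AHCI enable bit did not latch after %d writes\n",
		*writes);
	return -1;
}
```

In this model, a controller that ignores every write (`ignore_writes = -1`) exhausts all five attempts and takes the warning path, which matches the kind of failure the log points to.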

This looks like a warning telling us that enabling AHCI failed or timed
out. Do you have panic_on_warn enabled, so that this directly causes a boot
failure? The only connection I can see to my patch is that this AHCI
device perhaps wasn't probed before and somehow isn't working?
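To make that last point concrete: per the patch description, the old per-slot walk can stop early (no ARI successor without a device at fn 0, or a single-function fn 0 ending the slot), while the patched behavior probes every possible devfn, so functions the old walk never reached now get probed. The C sketch below is a hypothetical, heavily simplified user-space model of the two strategies, loosely following the pci_scan_slot()/next_fn()/next_ari_fn() logic the description refers to; it is not kernel code. `struct bus_model`, `scan_slot_old()`, `scan_bus_old()` and `scan_bus_isolated()` are invented names, and for the ARI case it assumes the ARI capability always points at the next responding function.

```c
#include <stdbool.h>

#define NR_DEVFN 256 /* 32 slots x 8 functions, or 256 ARI functions */

/* Hypothetical model of one PCI bus as seen by a guest. */
struct bus_model {
	bool ari;               /* ARI enabled on this bus */
	bool present[NR_DEVFN]; /* function responds to config reads */
	bool mfd[NR_DEVFN];     /* header type has the multi-function bit */
};

/* Simplified model of the pre-patch walk over one slot with isolated
 * function probing (a missing fn 0 does not stop it by itself).
 * Returns the number of functions found. */
static int scan_slot_old(const struct bus_model *bus, int devfn)
{
	int fn = 0, found = 0;
	bool multifunction = false;

	for (;;) {
		bool dev = bus->present[devfn + fn];

		if (dev) {
			found++;
			/* fn > 0 implies multifunction; fn 0 relies on MFD. */
			multifunction = fn > 0 || bus->mfd[devfn + fn];
		}

		if (bus->ari) {
			/* Old next_ari_fn(): no device means -ENODEV. */
			if (!dev)
				break;
			/* Assumption: the ARI capability points at the
			 * next responding function, if any. */
			int next = -1;
			for (int f = fn + 1; devfn + f < NR_DEVFN; f++) {
				if (bus->present[devfn + f]) {
					next = f;
					break;
				}
			}
			if (next < 0)
				break;
			fn = next;
		} else {
			if (fn >= 7)
				break;
			/* Only multifunction devices may have more functions. */
			if (dev && !multifunction)
				break;
			fn++;
		}
	}
	return found;
}

/* Pre-patch behavior: walk slot by slot (model: one ARI device at 0). */
static int scan_bus_old(const struct bus_model *bus)
{
	int found = 0;

	if (bus->ari)
		return scan_slot_old(bus, 0);
	for (int devfn = 0; devfn < NR_DEVFN; devfn += 8)
		found += scan_slot_old(bus, devfn);
	return found;
}

/* Patched behavior as described: probe every possible devfn. */
static int scan_bus_isolated(const struct bus_model *bus)
{
	int found = 0;

	for (int devfn = 0; devfn < NR_DEVFN; devfn++)
		if (bus->present[devfn])
			found++;
	return found;
}
```

For a bus where only devfns 8 and 9 respond and neither advertises MFD, `scan_bus_old()` finds 1 function while `scan_bus_isolated()` finds 2; with ARI enabled and only functions 1 and 2 present, the old walk finds none and the new scan finds both.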

Thanks,
Niklas
