Message-ID: <298aaf6b2815e59d1a94efffdd0e3b002c000cea.camel@linux.ibm.com>
Date: Fri, 28 Nov 2025 14:30:40 +0100
From: Niklas Schnelle <schnelle@...ux.ibm.com>
To: Huacai Chen <chenhuacai@...nel.org>,
    Tianrui Zhao <zhaotianrui@...ngson.cn>,
    Bibo Mao <maobibo@...ngson.cn>,
    Bjorn Helgaas <bhelgaas@...gle.com>
Cc: Jan Kiszka <jan.kiszka@...mens.com>,
    linux-s390 <linux-s390@...r.kernel.org>,
    loongarch@...ts.linux.dev,
    Farhan Ali <alifm@...ux.ibm.com>,
    Matthew Rosato <mjrosato@...ux.ibm.com>,
    Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
    Heiko Carstens <hca@...ux.ibm.com>,
    Vasily Gorbik <gor@...ux.ibm.com>,
    Alexander Gordeev <agordeev@...ux.ibm.com>,
    Sven Schnelle <svens@...ux.ibm.com>,
    Christian Borntraeger <borntraeger@...ux.ibm.com>,
    Gerd Bayer <gbayer@...ux.ibm.com>,
    linux-kernel@...r.kernel.org,
    linux-pci@...r.kernel.org
Subject: Re: [PATCH v5 1/2] PCI: Fix isolated PCI function probing with ARI and SR-IOV

On Mon, 2025-11-10 at 14:08 +0100, Niklas Schnelle wrote:
> On Fri, 2025-11-07 at 15:19 +0800, Huacai Chen wrote:
> > On Wed, Nov 5, 2025 at 5:46 PM Niklas Schnelle <schnelle@...ux.ibm.com> wrote:
> > >
> > > On Wed, 2025-11-05 at 09:01 +0800, Huacai Chen wrote:
> > > > On Mon, Nov 3, 2025 at 7:23 PM Niklas Schnelle <schnelle@...ux.ibm.com> wrote:
> > > > >
> > > > > On Mon, 2025-11-03 at 17:50 +0800, Huacai Chen wrote:
> > > > > > Hi, Niklas,
> > > > > >
> > > > > > On Wed, Oct 29, 2025 at 5:42 PM Niklas Schnelle <schnelle@...ux.ibm.com> wrote:
--- snip ---
> > > > > > >
> > > > > > > Still, especially the first issue prevents correct detection of ARI,
> > > > > > > and the second might be a problem for other users of isolated
> > > > > > > function probing. Fix both issues by keeping things as simple as
> > > > > > > possible: if isolated function probing is enabled, simply scan every
> > > > > > > possible devfn.
> > > > > > I'm very sorry, but after applying this patch on top of commit
> > > > > > a02fd05661d7 ("PCI: Extend isolated function probing to LoongArch"),
> > > > > > we fail to boot.
> > > > > >
> > > > > > Boot log:
> > > > > >
--- snip ---
> > > > >
> > > > >
> > > > > This looks like a warning telling us that the AHCI enable failed or
> > > > > timed out. Do you have panic_on_warn enabled, so that this directly
> > > > > causes a boot failure? The only relation to my patch I can see is that
> > > > > maybe this AHCI device wasn't probed before and somehow isn't working?
> > > > The rootfs is on the AHCI controller, so the AHCI failure causes the
> > > > boot failure. Without this patch there are no boot problems.
> > > >
> > > > Huacai
> > > >
> > >
> > > OK, I'm going to need more details to make sense of this. Can you tell
> > > me if ARI is enabled for that bus? Did you test with both patches or
> > > just this one? Could you provide lspci -vv output from a good boot, and
> > > can you tell which AHCI device the root filesystem is on? Also, could
> > > you clarify why you set hypervisor_isolated_pci_functions()? In
> > > particular, this seems like a bare metal boot, right? When running in
> > > KVM, do you pass through individual PCI functions with the guest seeing
> > > a devfn other than 0 alone, i.e. with devfn 0 missing? Or do you need
> > > this for bare metal for some reason? If you don't need it for bare
> > > metal, does the boot work with this patch if you return 0 from
> > > hypervisor_isolated_pci_functions()?
> > 1. ARI isn't enabled.
> > 2. I only tested the first patch.
> > 3. This is a bare metal boot.
> > 4. If hypervisor_isolated_pci_functions() returns 0, the boot is OK.
> > 5. For the PCI information, please see the attachment.
> >
> > Huacai
>
> Thanks for the input. As far as I can see, the lspci output from a good
> boot shows no holes in your devfn space, so this particular system
> doesn't seem to need isolated function probing at all. But even then,
> enabling it should only make the scan try all devfns and thus never
> skip one that would be found without isolated function probing.
>
> To sanity check this, I just booted my personal AMD Ryzen 3900X system
> with this series plus a two-liner to unconditionally enable isolated
> function probing on x86_64 as well, and it came up fine, including AMD
> graphics and my Intel NIC with SR-IOV enabled.
>
> So I'm really perplexed and keep coming back to the thought that a
> device on your system misbehaves when probing is attempted. Maybe, due
> to an issue similar to the one I saw with SR-IOV, it wasn't probed so
> far but really should be probed if isolated function probing is
> enabled. I also still don't understand your use case. If it is for VMs,
> then maybe you could limit it to those? Otherwise it feels like this is
> just a hack to probe an odd topology, and I wonder if you should rather
> set PCI_SCAN_ALL_PCIE_DEVS to find those devices?
>
> Thanks,
> Niklas

Hi LoongArch Maintainers, Hi Bjorn,

Sorry for the ping, but I'd really like to get this unstuck somehow, and
I haven't heard back on my previous questions. From my testing on s390,
this patch fixes a real logic error that prevents the scanning of some
devfns which I believe should be scanned if isolated functions are
possible. And in all my testing, including on x86 as stated in the
previous mail, the code does exactly what I think it is supposed to do.
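
To make the argument concrete, here is a minimal sketch in plain C. It
is not the kernel's pci_scan_slot(): it ignores ARI and the
Multi-Function bit, and scan_slot_sketch(), isolated_is_superset() and
the `present` bitmask are hypothetical names used only for
illustration. It models the claim that, with isolated function probing
enabled, every possible devfn is tried, so nothing the regular scan
finds can be skipped.

```c
#include <stdbool.h>

/* Functions per device when ARI is not enabled. */
#define FNS_PER_DEV 8

/*
 * Simplified, non-ARI slot scan (hypothetical, not kernel code).
 * `present` stands in for config-space reads: bit n set means function n
 * answers with a valid Vendor ID. Returns a bitmask of the functions the
 * scan would register.
 */
static unsigned scan_slot_sketch(unsigned present, bool isolated)
{
	unsigned found = 0;

	for (int fn = 0; fn < FNS_PER_DEV; fn++) {
		if (present & (1u << fn)) {
			found |= 1u << fn;
		} else if (fn == 0 && !isolated) {
			/* Without isolated probing, a missing function 0 ends the scan. */
			break;
		}
		/* With isolated probing, every devfn is tried regardless of gaps. */
	}
	return found;
}

/*
 * The property argued above: scanning every devfn finds a superset of
 * what the regular scan finds. Checked exhaustively for all 256 masks.
 */
static bool isolated_is_superset(void)
{
	for (unsigned present = 0; present < (1u << FNS_PER_DEV); present++) {
		unsigned regular = scan_slot_sketch(present, false);
		unsigned isolated = scan_slot_sketch(present, true);

		if (regular & ~isolated)
			return false;
	}
	return true;
}
```

With present = 0x06 (functions 1 and 2 but no function 0), the regular
scan finds nothing while the isolated scan finds both functions. When
function 0 is present, both scans agree.
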

So to me it really looks like something goes wrong with your use of
hypervisor_isolated_pci_functions() on your specific hardware. One idea
I had: maybe you need to somehow exclude known-empty slots in your
config space accessors?
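
To illustrate that idea, here is a minimal sketch in plain C, not
kernel code. The known-empty table, cfg_read_sketch(), hw_cfg_read32()
and the CFG_* codes are all hypothetical. They only loosely mirror the
usual convention that a config read of a non-existent function returns
all ones, so the Vendor ID reads as 0xffff and the scan skips that
devfn without touching the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes, loosely modelled on the kernel's PCIBIOS_* values. */
#define CFG_OK        0
#define CFG_NOT_FOUND 1

/* devfn layout: device number in bits 7:3, function number in bits 2:0. */
#define DEVFN(dev, fn) ((((dev) & 0x1f) << 3) | ((fn) & 0x07))

/* Hypothetical platform knowledge: devfns known to be empty. */
static const uint8_t known_empty[] = { DEVFN(0, 1), DEVFN(0, 2) };

static bool devfn_known_empty(unsigned devfn)
{
	for (unsigned i = 0; i < sizeof(known_empty) / sizeof(known_empty[0]); i++)
		if (known_empty[i] == devfn)
			return true;
	return false;
}

/* Stand-in for the real hardware access; returns a made-up ID (vendor 0x1234). */
static uint32_t hw_cfg_read32(unsigned devfn, int where)
{
	(void)devfn;
	(void)where;
	return 0x00011234u;
}

/*
 * Hypothetical config read accessor: for a known-empty devfn it never
 * touches the hardware and reports "no device", returning all ones so a
 * probe sees Vendor ID 0xffff and moves on.
 */
static int cfg_read_sketch(unsigned devfn, int where, uint32_t *val)
{
	if (devfn_known_empty(devfn)) {
		*val = 0xffffffffu;
		return CFG_NOT_FOUND;
	}
	*val = hw_cfg_read32(devfn, where);
	return CFG_OK;
}
```

A scan built on such an accessor would treat DEVFN(0, 1) and
DEVFN(0, 2) as empty even with isolated function probing enabled, while
still trying every other devfn.
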

And in general, I'd really like to better understand your use case for
isolated PCI functions. Speaking of which, I'm sorry for having been so
blunt in my last mail when I said it felt like a hack. I'm just worried
that we've run into incompatible interpretations or uses of this
feature that now prevent us from fixing actual bugs.

Thanks in advance,
Niklas Schnelle