Message-ID: <CAAhV-H7fgaZUuFSpE0VsMtptnrUTzh0TS=B9ZBUZ_=TH-XjKtg@mail.gmail.com>
Date: Mon, 1 Dec 2025 22:45:07 +0800
From: Huacai Chen <chenhuacai@...nel.org>
To: Niklas Schnelle <schnelle@...ux.ibm.com>
Cc: Tianrui Zhao <zhaotianrui@...ngson.cn>, Bibo Mao <maobibo@...ngson.cn>, 
	Bjorn Helgaas <bhelgaas@...gle.com>, Jan Kiszka <jan.kiszka@...mens.com>, 
	linux-s390 <linux-s390@...r.kernel.org>, loongarch@...ts.linux.dev, 
	Farhan Ali <alifm@...ux.ibm.com>, Matthew Rosato <mjrosato@...ux.ibm.com>, 
	Gerald Schaefer <gerald.schaefer@...ux.ibm.com>, Heiko Carstens <hca@...ux.ibm.com>, 
	Vasily Gorbik <gor@...ux.ibm.com>, Alexander Gordeev <agordeev@...ux.ibm.com>, 
	Sven Schnelle <svens@...ux.ibm.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>, 
	Gerd Bayer <gbayer@...ux.ibm.com>, linux-kernel@...r.kernel.org, 
	linux-pci@...r.kernel.org
Subject: Re: [PATCH v5 1/2] PCI: Fix isolated PCI function probing with ARI
 and SR-IOV

On Fri, Nov 28, 2025 at 9:30 PM Niklas Schnelle <schnelle@...ux.ibm.com> wrote:
>
> On Mon, 2025-11-10 at 14:08 +0100, Niklas Schnelle wrote:
> > On Fri, 2025-11-07 at 15:19 +0800, Huacai Chen wrote:
> > > On Wed, Nov 5, 2025 at 5:46 PM Niklas Schnelle <schnelle@...ux.ibm.com> wrote:
> > > >
> > > > On Wed, 2025-11-05 at 09:01 +0800, Huacai Chen wrote:
> > > > > On Mon, Nov 3, 2025 at 7:23 PM Niklas Schnelle <schnelle@...ux.ibm.com> wrote:
> > > > > >
> > > > > > On Mon, 2025-11-03 at 17:50 +0800, Huacai Chen wrote:
> > > > > > > Hi, Niklas,
> > > > > > >
> > > > > > > On Wed, Oct 29, 2025 at 5:42 PM Niklas Schnelle <schnelle@...ux.ibm.com> wrote:
> --- snip ---
> > > > > > > >
> > > > > > > > Still especially the first issue prevents correct detection of ARI and
> > > > > > > > the second might be a problem for other users of isolated function
> > > > > > > > probing. Fix both issues by keeping things as simple as possible. If
> > > > > > > > isolated function probing is enabled simply scan every possible devfn.
> > > > > > > I'm very sorry, but applying this patch on top of commit a02fd05661d7
> > > > > > > ("PCI: Extend isolated function probing to LoongArch") we fail to
> > > > > > > boot.
> > > > > > >
> > > > > > > Boot log:
> > > > > > >
> --- snip ---
> > > > > >
> > > > > >
> > > > > > This looks like a warning telling us that AHCI enable failed / timed
> > > > > > out. Do you have Panic on Warn on that this directly causes a boot
> > > > > > failure? The only relation I can see with my patch is that maybe this
> > > > > > AHCI device wasn't probed before and somehow isn't working?
> > > > > > The rootfs is on the AHCI controller, so the AHCI failure causes the
> > > > > > boot failure; without this patch there are no boot problems.
> > > > >
> > > > > Huacai
> > > > >
> > > >
> > > > Ok, I'm going to need more details to make sense of this. Can you tell
> > > > me if ARI is enabled for that bus? Did you test with both patches or
> > > > just this one? Could you provide lspci -vv from a good boot and can you
> > > > tell which AHCI device the root device is on? Also, could you clarify
> > > > why you set hypervisor_isolated_pci_functions()? In particular, this
> > > > seems like a bare metal boot, right? When running in KVM, do you pass
> > > > through individual PCI functions with the guest seeing a devfn other
> > > > than 0 alone, i.e. a missing devfn 0? Or do you need this for bare
> > > > metal for some reason? If you don't need it for bare metal, does the
> > > > boot work if you return 0 from hypervisor_isolated_pci_functions() with
> > > > this patch?
> > > 1. ARI isn't enabled.
> > > 2. I only tested the first patch.
> > > 3. This is a bare metal boot.
> > > 4. If hypervisor_isolated_pci_functions() returns 0, then boot is OK.
> > > 5. For PCI information, please see the attachment.
> > >
> > > Huacai
> >
> > Thanks for the input. As far as I can see the lspci from a good boot
> > shows no holes in your devfn space so this particular system doesn't
> > seem to need the isolated function probing at all. But even then using
> > it should only try out all devfns and thus never skip one that is found
> > without isolated function probing.
> >
> > To sanity check this, I just booted my personal AMD Ryzen 3900X system
> > with this series plus a two-liner to unconditionally enable isolated
> > function probing also on x86_64 and it came up fine including AMD
> > graphics and my Intel NIC with enabled SR-IOV.
> >
> > So I'm really perplexed and coming back to the thought that a device on
> > your system is misbehaving when probing is attempted and maybe due to a
> > similar issue as what I saw with SR-IOV it wasn't probed so far but
> > really should be probed if isolated function probing is enabled. I also
> > still don't understand your use-case. If it is for VMs then maybe you
> > could limit it to those? Otherwise it feels like this is just a hack to
> > probe an odd topology and I wonder if you should rather set
> > PCI_SCAN_ALL_PCIE_DEVS to find those?
> >
> > Thanks,
> > Niklas
>
> Hi LoongArch Maintainers, Hi Bjorn,
>
> Sorry for the ping but I'd really like to somehow get this unstuck and
> I haven't heard back on my previous questions. From my testing on s390
> this patch fixes a real logic error which prevents the scanning of some
> devfns which I believe should be scanned if isolated functions are
> possible. And in all my testing, including on x86 as stated in the
> previous mail, the code does exactly what I think it is supposed to do.
> So to me it really looks like something goes wrong with your use of
> hypervisor_isolated_pci_functions() on your specific hardware.
>
> One idea I had is if maybe you need to somehow exclude known empty
> slots in your config space accessors?
>
> And just in general I'd really like to better understand your use-case
> for the isolated PCI functions. And speaking of that, I'm sorry for
> having been so blunt in my last mail saying that it felt like a hack.
> I'm just worried that we've run into incompatible interpretations or
> uses of this feature that now prevent us from fixing actual bugs.
Sorry for the late reply. Let me describe the problem LoongArch has.

You said that "it feels like this is just a hack to probe an odd
topology". Yes, to some extent you are right.

1. One of our SoCs (LS2K3000) has a special device which has func1 but
no func0. To let the PCI core scan func1, the only option we have is to
make hypervisor_isolated_pci_functions() return true.
2. In the above case, PCI_SCAN_ALL_PCIE_DEVS does not help.
3. Although we changed hypervisor_isolated_pci_functions() to resolve
the above problem, the change also lets us pass isolated PCI functions
to a guest OS instance.

In summary, for real machines commit a02fd05661d73a850 is a hack to
probe an odd device; for virtual machines it allows passing isolated
PCI functions.
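
To illustrate the func1-without-func0 case, here is a toy model in C. It
is only a sketch and not the kernel's pci_scan_slot(): the names
toy_func_present() and toy_scan_slot() are hypothetical, config space is
reduced to a table of booleans, and the multifunction bit and ARI are
ignored. It assumes that a conventional scan gives up on a slot when
function 0 does not respond, while an isolated-function scan tries every
function in the slot.

```c
#include <assert.h>
#include <stdbool.h>

#define FUNCS_PER_SLOT 8

/* Hypothetical stand-in for a config space read: does function fn respond? */
static bool toy_func_present(const bool present[FUNCS_PER_SLOT], int fn)
{
	assert(fn >= 0 && fn < FUNCS_PER_SLOT);
	return present[fn];
}

/*
 * Toy slot scan, returning the number of functions found.
 *
 * isolated == false: a missing function 0 means the slot is treated as
 *                    empty and the other functions are never tried.
 * isolated == true:  every function in the slot is tried, so a lone
 *                    function 1 is still found.
 */
static int toy_scan_slot(const bool present[FUNCS_PER_SLOT], bool isolated)
{
	int found = 0;

	for (int fn = 0; fn < FUNCS_PER_SLOT; fn++) {
		if (toy_func_present(present, fn))
			found++;
		else if (fn == 0 && !isolated)
			break;	/* no function 0: give up on this slot */
	}

	return found;
}
```

For a slot in which only function 1 is present, toy_scan_slot() returns
0 with isolated set to false and 1 with isolated set to true. For a slot
without holes, both modes find the same functions.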



Huacai

>
> Thanks in advance,
> Niklas Schnelle
