Message-ID: <298aaf6b2815e59d1a94efffdd0e3b002c000cea.camel@linux.ibm.com>
Date: Fri, 28 Nov 2025 14:30:40 +0100
From: Niklas Schnelle <schnelle@...ux.ibm.com>
To: Huacai Chen <chenhuacai@...nel.org>,
        Tianrui Zhao <zhaotianrui@...ngson.cn>,
        Bibo Mao <maobibo@...ngson.cn>,
        Bjorn Helgaas <bhelgaas@...gle.com>
Cc: Jan Kiszka <jan.kiszka@...mens.com>,
        linux-s390 <linux-s390@...r.kernel.org>,
        loongarch@...ts.linux.dev,
        Farhan Ali <alifm@...ux.ibm.com>,
        Matthew Rosato <mjrosato@...ux.ibm.com>,
        Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Vasily Gorbik <gor@...ux.ibm.com>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Sven Schnelle <svens@...ux.ibm.com>,
        Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Gerd Bayer <gbayer@...ux.ibm.com>,
        linux-kernel@...r.kernel.org,
        linux-pci@...r.kernel.org
Subject: Re: [PATCH v5 1/2] PCI: Fix isolated PCI function probing with ARI and SR-IOV

On Mon, 2025-11-10 at 14:08 +0100, Niklas Schnelle wrote:
> On Fri, 2025-11-07 at 15:19 +0800, Huacai Chen wrote:
> > On Wed, Nov 5, 2025 at 5:46 PM Niklas Schnelle <schnelle@...ux.ibm.com> wrote:
> > > 
> > > On Wed, 2025-11-05 at 09:01 +0800, Huacai Chen wrote:
> > > > On Mon, Nov 3, 2025 at 7:23 PM Niklas Schnelle <schnelle@...ux.ibm.com> wrote:
> > > > > 
> > > > > On Mon, 2025-11-03 at 17:50 +0800, Huacai Chen wrote:
> > > > > > Hi, Niklas,
> > > > > > 
> > > > > > On Wed, Oct 29, 2025 at 5:42 PM Niklas Schnelle <schnelle@...ux.ibm.com> wrote:
--- snip ---
> > > > > > > 
> > > > > > > Still especially the first issue prevents correct detection of ARI and
> > > > > > > the second might be a problem for other users of isolated function
> > > > > > > probing. Fix both issues by keeping things as simple as possible. If
> > > > > > > isolated function probing is enabled simply scan every possible devfn.
> > > > > > I'm very sorry, but applying this patch on top of commit a02fd05661d7
> > > > > > ("PCI: Extend isolated function probing to LoongArch") we fail to
> > > > > > boot.
> > > > > > 
> > > > > > Boot log:
> > > > > > 
--- snip ---
> > > > > 
> > > > > 
> > > > > This looks like a warning telling us that AHCI enable failed / timed
> > > > > out. Do you have Panic on Warn on that this directly causes a boot
> > > > > failure? The only relation I can see with my patch is that maybe this
> > > > > AHCI device wasn't probed before and somehow isn't working?
> > > > The rootfs is on the AHCI controller, so AHCI failure causes the boot
> > > > failure, without this patch no boot problems.
> > > > 
> > > > Huacai
> > > > 
> > > 
> > > Ok, I'm going to need more details to make sense of this. Can you tell
> > > me if ARI is enabled for that bus? Did you test with both patches or
> > > just this one? Could you provide lspci -vv from a good boot and can you
> > > tell which AHCI device the root device is on? Also could you clarify
> > > why you set hypervisor_isolated_pci_functions() in particular this
> > > seems like a bare metal boot, right? When running in KVM do you pass-
> > > through individual PCI functions with the guest seeing a devfn other
> > > than 0 alone, i.e. a missing devfn 0? Or do you need this for bare
> > > metal for some reason? If you don't need it for bare metal, does the
> > > boot work if you return 0 from hypervisor_isolated_pci_functions() with
> > > this patch?
> > 1. ARI isn't enabled.
> > 2. Only test the first patch.
> > 3. This is a bare metal boot.
> > 4. If hypervisor_isolated_pci_functions() return 0 then boot is OK
> > 5. PCI information please see the attachment.
> > 
> > Huacai
> 
> Thanks for the input. As far as I can see the lspci from a good boot
> shows no holes in your devfn space so this particular system doesn't
> seem to need the isolated function probing at all. But even then using
> it should only try out all devfns and thus never skip one that is found
> without isolated function probing.
> 
> To sanity check this, I just booted my personal AMD Ryzen 3900X system
> with this series plus a two-liner to unconditionally enable isolated
> function probing also on x86_64 and it came up fine including AMD
> graphics and my Intel NIC with enabled SR-IOV. 
> 
> So I'm really perplexed and coming back to the thought that a device on
> your system is misbehaving when probing is attempted and maybe due to a
> similar issue as what I saw with SR-IOV it wasn't probed so far but
> really should be probed if isolated function probing is enabled. I also
> still don't understand your use-case. If it is for VMs then maybe you
> could limit it to those? Otherwise it feels like this is just a hack to
> probe an odd topology and I wonder if you should rather set
> PCI_SCAN_ALL_PCIE_DEVS to find those?
> 
> Thanks,
> Niklas

Hi LoongArch Maintainers, Hi Bjorn,

Sorry for the ping but I'd really like to somehow get this unstuck and
I haven't heard back on my previous questions. From my testing on s390
this patch fixes a real logic error which prevents the scanning of some
devfns which I believe should be scanned if isolated functions are
possible. And in all my testing, including on x86 as stated in the
previous mail, the code does exactly what I think it is supposed to do.
So to me it really looks like something goes wrong with your use of
hypervisor_isolated_pci_functions() on your specific hardware.
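
For illustration only, here is a minimal userspace model of the behaviour
I expect. It is not the kernel code, it ignores ARI and SR-IOV entirely,
and every name in it (model_bus, model_scan_bus, ...) is made up for this
sketch. It only shows that probing every devfn can never find fewer
functions than a scan that gives up on a slot when function 0 is missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_SLOTS 32
#define MODEL_NUM_FUNCS 8
#define MODEL_NUM_DEVFN (MODEL_NUM_SLOTS * MODEL_NUM_FUNCS)

/* Hypothetical model of one bus, not a kernel structure. */
struct model_bus {
	bool present[MODEL_NUM_DEVFN];        /* function answers config reads */
	bool multifunction[MODEL_NUM_SLOTS];  /* function 0 advertises more functions */
};

/*
 * Scan the modelled bus and record every function found.
 * isolated == false: skip a slot without function 0 and only look past
 *                    function 0 when the slot is marked multifunction.
 * isolated == true:  probe every possible devfn.
 * Returns the number of functions found.
 */
static unsigned int model_scan_bus(const struct model_bus *bus, bool isolated,
				   bool found[MODEL_NUM_DEVFN])
{
	unsigned int count = 0;

	assert(bus != NULL && found != NULL);

	for (unsigned int devfn = 0; devfn < MODEL_NUM_DEVFN; devfn++)
		found[devfn] = false;

	for (unsigned int slot = 0; slot < MODEL_NUM_SLOTS; slot++) {
		for (unsigned int fn = 0; fn < MODEL_NUM_FUNCS; fn++) {
			unsigned int devfn = slot * MODEL_NUM_FUNCS + fn;

			if (bus->present[devfn]) {
				found[devfn] = true;
				count++;
			} else if (fn == 0 && !isolated) {
				break; /* no function 0: give up on this slot */
			}

			if (fn == 0 && !isolated && !bus->multifunction[slot])
				break; /* single-function device */
		}
	}

	return count;
}
```

In this model a function at devfn 3 with no function 0 in its slot is
only found when isolated is true, while everything found with isolated
set to false is found in both modes.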

One idea I had: maybe you need to somehow exclude known empty
slots in your config space accessors?

And just in general I'd really like to better understand your use-case
for the isolated PCI functions. Speaking of that, I'm sorry for
having been so blunt in my last mail saying that it felt like a hack.
I'm just worried that we've run into incompatible interpretations or
uses of this feature that now prevent us from fixing actual bugs.

Thanks in advance,
Niklas Schnelle
