[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8f9c9950-1612-6e2d-388a-ce69cf3aae1a@linux.intel.com>
Date: Tue, 7 Oct 2025 18:43:03 +0300 (EEST)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Val Packett <val@...kett.cool>
cc: linux-pci@...r.kernel.org, Bjorn Helgaas <bhelgaas@...gle.com>,
Krzysztof Wilczyński <kw@...ux.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
LKML <linux-kernel@...r.kernel.org>,
Lucas De Marchi <lucas.demarchi@...el.com>
Subject: Re: [PATCH 1/2] PCI: Setup bridge resources earlier
On Mon, 6 Oct 2025, Val Packett wrote:
> On 10/6/25 7:46 AM, Ilpo Järvinen wrote:
> > On Mon, 6 Oct 2025, Val Packett wrote:
> > > On 9/24/25 10:42 AM, Ilpo Järvinen wrote:
> > > > Bridge windows are read twice from PCI Config Space, the first read is
> > > > made from pci_read_bridge_windows() which does not setup the device's
> > > > resources. It causes problems down the road as child resources of the
> > > > bridge cannot check whether they reside within the bridge window or
> > > > not.
> > > >
> > > > Setup the bridge windows already in pci_read_bridge_windows().
> > > >
> > > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
> > > Looks like this change has broken the WiFi (but not NVMe) on my Snapdragon
> > > X1E
> > > laptop (Latitude 7455):
> > Thanks for the report.
> >
> > > qcom-pcie 1c08000.pci: PCI host bridge to bus 0004:00
> > > pci_bus 0004:00: root bus resource [bus 00-ff]
> > > pci_bus 0004:00: root bus resource [io 0x100000-0x1fffff] (bus address
> > > [0x0000-0xfffff])
> > So this looks the first change visible in the fragment you've taken from
> > the dmesg...
> >
> > > pci_bus 0004:00: root bus resource [mem 0x7c300000-0x7dffffff]
> > > pci 0004:00:00.0: [17cb:0111] type 01 class 0x060400 PCIe Root Port
> > > pci 0004:00:00.0: BAR 0 [mem 0x00000000-0x00000fff]
> > > pci 0004:00:00.0: PCI bridge to [bus 01-ff]
> > ...What I don't understand in these logs is how can the code changed in
> > pci_read_bridge_windows() affect the lines before this line as it is being
> > printed from pci_read_bridge_windows(). Maybe there are more 'PCI bridge
> > to' lines above the quoted part of the dmesg?
>
> Sorry for the confusion, the 0x100000 shift was caused by unrelated changes
> (Qcom/DWC ECAM support) and I wasn't diligent enough with which exact log I
> picked as the working one.
Okay, I certainly couldn't figure out how that could have happened, no
wonder then. :-)
> Here's the actual difference. Good:
>
> ❯ dmesg | rg 0004: | rg brid
> [ 1.780172] qcom-pcie 1c08000.pci: PCI host bridge to bus 0004:00
> [ 1.781930] pci 0004:00:00.0: PCI bridge to [bus 01-ff]
> [ 1.781972] pci 0004:00:00.0: bridge window [io 0x100000-0x100fff]
> [ 1.781998] pci 0004:00:00.0: bridge window [mem 0x00000000-0x000fffff]
> [ 1.782043] pci 0004:00:00.0: bridge window [mem 0x00000000-0x000fffff
> 64bit pref]
> [ 1.800769] pci 0004:00:00.0: PCI bridge to [bus 01-ff]
> [ 1.976893] pcieport 0004:00:00.0: bridge window [mem
> 0x7c400000-0x7c5fffff]: assigned
>
> Bad:
>
> ❯ dmesg | rg 0004: | rg brid
> [ 1.380369] qcom-pcie 1c08000.pci: PCI host bridge to bus 0004:00
> [ 1.442881] pci 0004:00:00.0: PCI bridge to [bus 01-ff]
> [ 1.449496] pci 0004:00:00.0: bridge window [io 0x100000-0x100fff]
> [ 1.462988] pci 0004:00:00.0: bridge window [mem 0x00000000-0x000fffff]
> [ 1.476661] pci 0004:00:00.0: bridge window [mem 0x00000000-0x000fffff
> 64bit pref]
> [ 1.502299] pci 0004:00:00.0: bridge window [mem 0x7c300000-0x7c3fffff]:
> assigned
> [ 1.509028] pci 0004:00:00.0: bridge window [mem 0x7c400000-0x7c4fffff
> 64bit pref]: assigned
> [ 1.509057] pci 0004:00:00.0: bridge window [io 0x100000-0x100fff]:
> assigned
> [ 1.509085] pci 0004:00:00.0: PCI bridge to [bus 01-ff]
> [ 1.509099] pci 0004:00:00.0: bridge window [io 0x100000-0x100fff]
> [ 1.509124] pci 0004:00:00.0: bridge window [mem 0x7c300000-0x7c3fffff]
> [ 1.509133] pci 0004:00:00.0: bridge window [mem 0x7c400000-0x7c4fffff
> 64bit pref]
Interpreting these logs is usually hard even when given the whole log, it
becomes even harder when they're filtered so that many lines of interest
are not shown (I tried to correlate the working case to the previous
"wrong" "correct" log but I've no guarantee the rest would remain same).
> I've also added log lines to pci_read_bridge_bases where the other calls to
> the same pci_read_bridge_* functions are called, and turns out they did *not*
> happen.
>
> So it seems to me that the good reason you were wondering about for why the
> resources were not set up in pci_read_bridge_windows, is that they must not be
> set up unconditionally!
>
> I think it's that early check in pci_read_bridge_bases that avoids the setup
> here:
>
> if (pci_is_root_bus(child)) /* It's a host bus, nothing to read */
> return;
If there's a PCI device as is the case in pci_read_bridge_windows()
which inputs non-NULL pci_dev, the config space of that device can be read
normally (or should be readable normally, AFAIK). The case where bus->self
is NULL is different, we can't read from a non-existing PCI device, but
it doesn't apply to pci_read_bridge_windows().
I don't think reading the window is the real issue here but how the
resource fitting algorithm corners itself by reserving space for bridge
windows before it knows their sizes, so basically these lines from the
earlier email:
pci 0004:00:00.0: bridge window [mem 0x7c300000-0x7c3fffff]: assigned
pci 0004:00:00.0: bridge window [mem 0x7c400000-0x7c4fffff 64bit pref]: assigned
pci 0004:00:00.0: BAR 0 [mem 0x7c500000-0x7c500fff]: assigned
...which seem to occur before the child buses have been scanned so that
space reserved is either hotplug reservation or due to "old_size" lower
bounding. That non-prefetchable bridge window is too small to fit the
child resources.
Could you try passing pci=hpmemsize=0M to kernel command line if that
helps?
The other case is the "old_size" in calculate_memsize() which too can
cause the same effect preventing sizing bridge window truly to zero when
it's not needed (== disable it == not assign it at all at that point).
Forcing it to zero would perhaps be worth a test (or removing the max()
related to old_size)
I've no idea why the old_size should decide anything, I hate that black
magic but I've just not dared to remove it (it's hard to know why some
things were made in the past, there could have been some HW issue worked
around by such odd feature but it's so old code that there isn't any real
information about whys anymore to find).
pci=realloc on command line might help too, but I'm not sure. There seems
to be some extra space within the root bus resource so it might work.
I'm not sure what call chain is causing the assignment of those 3 bridge
windows. One easy way to find out where it comes from would be to put
WARN_ON(res->start == 0x7c400000); into pci_assign_resource() next to the
line which prints "...: assigned".
--
i.
Powered by blists - more mailing lists