[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <o5rhgujo2ftmmnqdwpmbr44nlm6ygrouamoxziomlhnsbkww34@cp2zgmc4luhz>
Date: Fri, 24 Oct 2025 18:00:28 -0500
From: Lucas De Marchi <lucas.demarchi@...el.com>
To: Bjorn Helgaas <helgaas@...nel.org>
CC: <intel-xe@...ts.freedesktop.org>, <linux-pci@...r.kernel.org>,
<dri-devel@...ts.freedesktop.org>, Ilpo Järvinen
<ilpo.jarvinen@...ux.intel.com>, Icenowy Zheng <uwu@...nowy.me>, Vivian Wang
<wangruikang@...as.ac.cn>, Thomas Hellström
<thomas.hellstrom@...ux.intel.com>, Rodrigo Vivi <rodrigo.vivi@...el.com>,
Bjorn Helgaas <bhelgaas@...gle.com>, Simon Richter
<Simon.Richter@...yros.de>, <linux-kernel@...r.kernel.org>,
<stable@...r.kernel.org>
Subject: Re: [PATCH 1/2] PCI: Release BAR0 of an integrated bridge to allow
GPU BAR resize
On Fri, Oct 24, 2025 at 05:44:01PM -0500, Bjorn Helgaas wrote:
>On Thu, Sep 18, 2025 at 01:58:56PM -0700, Lucas De Marchi wrote:
>> From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
>>
>> Resizing BAR to a larger size has to release upstream bridge windows in
>> order make the bridge windows larger as well (and to potential relocate
>> them into a larger free block within iomem space). Some GPUs have an
>> integrated PCI switch that has BAR0. The resource allocation assigns
>> space for that BAR0 as it does for any resource.
>>
>> An extra resource on a bridge will pin its upstream bridge window in
>> place which prevents BAR resize for anything beneath that bridge.
>>
>> Nothing in the pcieport driver provided by PCI core, which typically is
>> the driver bound to these bridges, requires that BAR0. Because of that,
>> releasing the extra BAR does not seem to have notable downsides but
>> comes with a clear upside.
>>
>> Therefore, release BAR0 of such switches using a quirk and clear its
>> flags to prevent any new invocation of the resource assignment
>> algorithm from assigning the resource again.
>>
>> Due to other siblings within the PCI hierarchy of all the devices
>> integrated into the GPU, some other devices may still have to be
>> manually removed before the resize is free of any bridge window pins.
>> Such siblings can be released through sysfs to unpin windows while
>> leaving access to GPU's sysfs entries required for initiating the
>> resize operation, whereas removing the topmost bridge this quirk
>> targets would result in removing the GPU device as well so no manual
>> workaround for this problem exists.
>>
>> Reported-by: Lucas De Marchi <lucas.demarchi@...el.com>
>> Link: https://lore.kernel.org/linux-pci/fl6tx5ztvttg7txmz2ps7oyd745wg3lwcp3h7esmvnyg26n44y@owo2ojiu2mov/
>> Link: https://lore.kernel.org/intel-xe/20250721173057.867829-1-uwu@icenowy.me/
>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
>> Cc: <stable@...r.kernel.org> # v6.12+
>> Signed-off-by: Lucas De Marchi <lucas.demarchi@...el.com>
>> ---
>>
>> Remarks from Ilpo: this feels quite hacky to me and I'm working towards a
>> better solution which is to consider Resizable BAR maximum size the
>> resource fitting algorithm. But then, I don't expect the better solution
>> to be something we want to push into stable due to extremely invasive
>> dependencies. So maybe consider this an interim/legacy solution to the
>> resizing problem and remove it once the algorithmic approach works (or
>> more precisely retain it only in the old kernel versions).
>> ---
>> drivers/pci/quirks.c | 23 +++++++++++++++++++++++
>> 1 file changed, 23 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index d97335a401930..9b1c08de3aa89 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -6338,3 +6338,26 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
>> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
>> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
>> #endif
>> +
>> +/*
>> + * PCI switches integrated into Intel Arc GPUs have BAR0 that prevents
>> + * resizing the BARs of the GPU device due to that bridge BAR0 pinning the
>> + * bridge window it's under in place. Nothing in pcieport requires that
>> + * BAR0.
>> + *
>> + * Release and disable BAR0 permanently by clearing its flags to prevent
>> + * anything from assigning it again.
>
>Does "disabling BAR0" actually work? This quirk keeps the PCI core
>from assigning resources to the BAR, but I don't think we have a way
>to actually disable an individual BAR, do we?
>
>I think the only control is PCI_COMMAND_MEMORY, and the bridge must
>have PCI_COMMAND_MEMORY enabled so memory accesses to downstream
>devices work.
>
>No matter what we do to the struct resource, the hardware BAR still
>contains some address, and the bridge will decode any accesses that
>match the address in the BAR.
there's no real use for that BAR, so I don't think it matters if the hw
will still decode accesses to it - in this case I don't think it's
really important to really disable it. As long as pci can manage the
resources and not block the move of endpoint's BARs, it should work.
These 2 patches definitely fixed the rebar for me in a system with
Battle Mage GPU. I don't have access to it right now, but I can dig more
info about it if it's needed.
>
>Maybe we could effectively disable the BAR by setting it to some
>impossible address, i.e., something outside both the upstream and
>downstream bridge windows so memory accesses could never be routed to
>it?
yeah, I guess that would be possible, but given the above, do you see
any advantage on that?
thanks
Lucas De Marchi
>
>> + */
>> +static void pci_release_bar0(struct pci_dev *pdev)
>> +{
>> + struct resource *res = pci_resource_n(pdev, 0);
>> +
>> + if (!res->parent)
>> + return;
>> +
>> + pci_release_resource(pdev, 0);
>> + res->flags = 0;
>> +}
>> +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0x4fa0, pci_release_bar0);
>> +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0x4fa1, pci_release_bar0);
>> +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0xe2ff, pci_release_bar0);
>>
>> --
>> 2.50.1
>>
Powered by blists - more mailing lists