[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZYWTyQpVXGFqCLZa@smile.fi.intel.com>
Date: Fri, 22 Dec 2023 15:48:57 +0200
From: Andy Shevchenko <andriy.shevchenko@...el.com>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Cc: linux-pci@...r.kernel.org, Bjorn Helgaas <bhelgaas@...gle.com>,
Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
Rob Herring <robh@...nel.org>,
Krzysztof Wilczyński <kw@...ux.com>,
Igor Mammedov <imammedo@...hat.com>, Lukas Wunner <lukas@...ner.de>,
Mika Westerberg <mika.westerberg@...ux.intel.com>,
"Rafael J . Wysocki" <rafael@...nel.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 7/7] PCI: Relax bridge window tail sizing rules
On Fri, Dec 22, 2023 at 02:29:01PM +0200, Ilpo Järvinen wrote:
> During remove & rescan cycle, PCI subsystem will recalculate and adjust
> the bridge window sizing that was initially done by "BIOS". The size
> calculation is based on the required alignment of the largest resource
> among the downstream resources as per pbus_size_mem() (unimportant or
> zero parameters marked with "..."):
>
> min_align = calculate_mem_align(aligns, max_order);
> size0 = calculate_memsize(size, ..., min_align);
>
> and then in calculate_memsize():
> size = ALIGN(max(size, ...) + ..., align);
>
> If the original bridge window sizing tried to conserve space, this will
> lead to massive increase of the required bridge window size when the
> downstream has a large disparity in BAR sizes. E.g., with 16MiB and
> 16GiB BARs this results in 32GiB bridge window size even if 16MiB BAR
> does not require gigabytes of space to fit.
>
> When doing remove & rescan for a bus that contains such a PCI device, a
> larger bridge window is suddenly required on rescan but when there is a
> bridge window upstream that is already assigned based on the original
> size, it cannot be enlarged to the new requirement. This causes the
> allocation of the bridge window to fail (0x600000000 > 0x400ffffff):
>
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
> pci 0000:01:00.0: PCI bridge to [bus 02-04]
> pci 0000:01:00.0: bridge window [mem 0x40400000-0x406fffff]
> pci 0000:01:00.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
> ...
> pci_bus 0000:03: busn_res: [bus 03] is released
> pci 0000:03:00.0: reg 0x10: [mem 0x6400000000-0x6400ffffff 64bit pref]
> pci 0000:03:00.0: reg 0x18: [mem 0x6000000000-0x63ffffffff 64bit pref]
> pci 0000:03:00.0: reg 0x30: [mem 0x40400000-0x405fffff pref]
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
> pci 0000:02:01.0: BAR 9: no space for [mem size 0x600000000 64bit pref]
> pci 0000:02:01.0: BAR 9: failed to assign [mem size 0x600000000 64bit pref]
> pci 0000:02:01.0: BAR 8: assigned [mem 0x40400000-0x405fffff]
> pci 0000:03:00.0: BAR 2: no space for [mem size 0x400000000 64bit pref]
> pci 0000:03:00.0: BAR 2: failed to assign [mem size 0x400000000 64bit pref]
> pci 0000:03:00.0: BAR 0: no space for [mem size 0x01000000 64bit pref]
> pci 0000:03:00.0: BAR 0: failed to assign [mem size 0x01000000 64bit pref]
> pci 0000:03:00.0: BAR 6: assigned [mem 0x40400000-0x405fffff pref]
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]
>
> This is a major surprise for users who are suddenly left with a PCIe
> device that was working fine with the original bridge window sizing.
>
> Even if the already assigned bridge window could be enlarged by
> reallocation in some cases (something the current code does not attempt
> to do), it is not possible in general case and the large amount of
> wasted space at the tail of the bridge window may lead to other
> resource exhaustion problems on Root Complex level (think of multiple
> PCIe cards with VFs and BAR size disparity in a single system).
>
> PCI specifications only expect natural alignment for BARs (PCI Express
> Base Specification, rev. 6.1 sect. 7.5.1.2.1) and minimum of 1MiB
> alignment for the bridge window (PCI Express Base Specification,
> rev 6.1 sect. 7.5.1.3). The current bridge window tail alignment rule
> was introduced in the commit 5d0a8965aea9 ("[PATCH] 2.5.14: New PCI
> allocation code (alpha, arm, parisc) [2/2]") that only states:
> "pbus_size_mem: core stuff; tested with randomly generated sets of
> resources". It does not explain the motivation for the extra tail space
> allocated that is not truly needed by the downstream resources. As
> such, it is far from clear if it ever has been required by any HW.
>
> To prevent PCIe cards with BAR size disparity from becoming unusable
> after remove & rescan cycle, attempt to do a truly minimal allocation
> for memory resources if needed. First check if the normally calculated
> bridge window will not fit into an already assigned upstream resource.
> In such case, try with relaxed bridge window tail sizing rules instead
> where no extra tail space is requested beyond what the downstream
> resources require. Only enforce the alignment requirement of the bridge
> window itself (normally 1MiB).
>
> With this patch, the resources are successfully allocated:
>
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
> pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 03] requires relaxed alignment rules
> pci 0000:02:01.0: BAR 9: assigned [mem 0x6000000000-0x6400ffffff 64bit pref]
> pci 0000:02:01.0: BAR 8: assigned [mem 0x40400000-0x405fffff]
> pci 0000:03:00.0: BAR 2: assigned [mem 0x6000000000-0x63ffffffff 64bit pref]
> pci 0000:03:00.0: BAR 0: assigned [mem 0x6400000000-0x6400ffffff 64bit pref]
> pci 0000:03:00.0: BAR 6: assigned [mem 0x40400000-0x405fffff pref]
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
>
> This patch draws inspiration from the initial investigations and work
> by Mika Westerberg.
...
> + struct resource_constraint constraint = { .max = (resource_size_t)~0ULL,
RESOURCE_SIZE_MAX from limits.h.
> + .align = align };
Also I prefer the style
struct resource_constraint constraint = {
.max = RESOURCE_SIZE_MAX,
.align = align,
};
...
> + if (!r || r == &ioport_resource || r == &iomem_resource)
> + continue;
> + if (!r->parent || (r->flags & mask) != type)
Thinking more about these checks, r->parent should be NULL for the root
resources, hence this check basically covers the second part of the above.
But like you said it's a material for a separate investigation.
> + continue;
...
> + pci_dbg(bus->self,
> + "Assigned bridge window %pR to %pR cannot fit 0x%llx required for %s bridging to %pR\n",
> + r, &bus->busn_res,
> + (unsigned long long)size,
Yeah, casting here is a compromise between good looking message and
kernel code.
> + pci_name(downstream->self),
> + &downstream->busn_res);
> + }
...
> + pbus_upstream_assigned_limit(bus, mask | IORESOURCE_PREFETCH, type,
> + size0, add_align)) {
One line?
...
> + size0 = calculate_memsize(size, min_size, 0, 0,
> + resource_size(b_res), win_align);
One line?
--
With Best Regards,
Andy Shevchenko
Powered by blists - more mailing lists