Message-ID: <w3gGcmhWNmeGetzLnhgkjfx0JTEyIOKN5sDu-uShZ_7JWthMgGP6plgDuhDbkYyaA7vtGbdl1WbMTZ5zM80OyJoqUa69krqDpuhqDangkLY=@r26.me>
Date: Mon, 09 Jun 2025 14:22:55 +0000
From: Rio Liu <rio@....me>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>, "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>, "amd-gfx@...ts.freedesktop.org" <amd-gfx@...ts.freedesktop.org>
Subject: Re: [REGRESSION] amdgpu fails to load external RX 580 since PCI: Allow relaxed bridge window tail sizing for optional resources
On Monday, June 9th, 2025 at 5:09 AM, Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com> wrote:
>
>
> On Mon, 9 Jun 2025, rio@....me wrote:
>
> > Hello,
> >
> > I have an external Radeon RX580 on my machine connected via Thunderbolt, and
> > since upgrading from 6.14.1 the setup has stopped working. dmesg shows a warning
> > from the resource sanity check, followed by a stack trace: https://pastebin.com/njR55rQW.
> > Relevant snippet:
> >
> > [ 12.134907] amdgpu 0000:06:00.0: BAR 2 [mem 0x6000000000-0x60001fffff 64bit pref]: releasing
> > [ 12.134910] [drm:amdgpu_device_resize_fb_bar [amdgpu]] ERROR Problem resizing BAR0 (-16).
> > [ 12.135456] amdgpu 0000:06:00.0: BAR 2 [mem 0x6000000000-0x60001fffff 64bit pref]: assigned
> > [ 12.135524] amdgpu 0000:06:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
> > [ 12.135527] amdgpu 0000:06:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
> > [ 12.135536] resource: resource sanity check: requesting [mem 0x0000000000000000-0xffffffffffffffff], which spans more than PCI Bus 0000:00 [mem 0x000a0000-0x000bffff window]
> > [ 12.135542] ------------[ cut here ]------------
> > [ 12.135543] WARNING: CPU: 6 PID: 599 at arch/x86/mm/pat/memtype.c:721 memtype_reserve_io+0xfc/0x110
> > [ 12.135551] Modules linked in: ccm amdgpu(+) snd_hda_codec_realtek ...
> > [ 12.135652] CPU: 6 UID: 0 PID: 599 Comm: (udev-worker) Tainted: G S 6.15.0-13743-g8630c59e9936 #16 PREEMPT(full) 3b462c924b3ffd8156fc3b77bcc8ddbf7257fa57
> > [ 12.135654] Tainted: [S]=CPU_OUT_OF_SPEC
> > [ 12.135655] Hardware name: COPELION INTERNATIONAL INC. ZX Series/ZX Series, BIOS 1.07.08TCOP3 03/27/2020
> > [ 12.135656] RIP: 0010:memtype_reserve_io+0xfc/0x110
> > [ 12.135659] Code: aa fb ff ff b8 f0 ff ff ff eb 88 8b 54 24 04 4c 89 ee 48 89 df e8 04 fe ff ff 85 c0 75 db 8b 54 24 04 41 89 16 e9 69 ff ff ff <0f> 0b e9 4b ff ff ff e8 b8 5c fc 00 0f 1f 84 00 00 00 00 00 90 90
> >
> > Bisecting the stable branch pointed me to the following commit:
> >
> > commit 22df32c984be9e9145978acf011642da042a2af3 (HEAD)
> > Author: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
> > Date: Mon Dec 16 19:56:11 2024 +0200
> >
> > PCI: Allow relaxed bridge window tail sizing for optional resources
> >
> > [ Upstream commit 67f9085596ee55dd27b540ca6088ba0717ee511c ]
> >
> > I've tested on stable (currently at 8630c59e99363c4b655788fd01134aef9bcd9264),
> > and the issue persists. Reverting the offending commit via `git revert -n 22df32c984be9e9145978acf011642da042a2af3` allowed amdgpu to load again.
> > Dmesg: https://pastebin.com/xd76rDsW.
> >
> > Additional information
> > - Distribution: Artix
> > - Arch: x86_64
> > - Kernel config: https://pastebin.com/DWSERJL5
> > - eGPU adapter: https://www.adt.link/product/R43SG-TB3.html
> > - Booting with pci=realloc,hpbussize=0x33,hpmmiosize=256M,hpmmioprefsize=1G
> >
> > I'm reporting here as these are the contacts from the commit message.
> > Please let me know if there's a more appropriate place for this, as well
> > as any more information I can provide.
>
>
> Hi Rio,
>
> Thanks for the report, and I'm sorry about causing this issue. Could you
> please test whether the patch below solves it?
>
> --
> From b94823a193032b5f87114cff9e8edc5c67e4ef40 Mon Sep 17 00:00:00 2001
> From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
> Date: Mon, 9 Jun 2025 12:05:20 +0300
> Subject: [PATCH 1/1] PCI: Relaxed alignment should never increase min_align
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> When using relaxed tail alignment for the bridge window,
> pbus_size_mem() also tries to minimize min_align, which in certain
> scenarios can end up increasing min_align above the value found by
> calculate_mem_align().
>
> Ensure min_align is not increased by the relaxed tail alignment.
>
> Eventually, it would be better to add calculate_relaxed_head_align(),
> similar to calculate_mem_align(), which would find out what alignment
> can be used for the head without introducing any gaps into the bridge
> window, giving flexibility on the head address too. But that looks like
> a relatively complex algorithm, so it requires much more testing than
> fixing the immediate problem causing the regression.
>
> Reported-by: Rio <rio@....me>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
> ---
> drivers/pci/setup-bus.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 07c3d021a47e..f90d49cd07da 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1169,6 +1169,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
> resource_size_t children_add_size = 0;
> resource_size_t children_add_align = 0;
> resource_size_t add_align = 0;
> + resource_size_t relaxed_align;
>
> if (!b_res)
> return -ENOSPC;
> @@ -1246,8 +1247,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
> if (bus->self && size0 &&
> !pbus_upstream_space_available(bus, mask | IORESOURCE_PREFETCH, type,
> size0, min_align)) {
> - min_align = 1ULL << (max_order + __ffs(SZ_1M));
> - min_align = max(min_align, win_align);
> + relaxed_align = 1ULL << (max_order + __ffs(SZ_1M));
> + relaxed_align = max(relaxed_align, win_align);
> + min_align = min(min_align, relaxed_align);
> size0 = calculate_memsize(size, min_size, 0, 0, resource_size(b_res), win_align);
> pci_info(bus->self, "bridge window %pR to %pR requires relaxed alignment rules\n",
> b_res, &bus->busn_res);
> @@ -1261,8 +1263,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
> if (bus->self && size1 &&
> !pbus_upstream_space_available(bus, mask | IORESOURCE_PREFETCH, type,
> size1, add_align)) {
> - min_align = 1ULL << (max_order + __ffs(SZ_1M));
> - min_align = max(min_align, win_align);
> + relaxed_align = 1ULL << (max_order + __ffs(SZ_1M));
> + relaxed_align = max(relaxed_align, win_align);
> + min_align = min(min_align, relaxed_align);
> size1 = calculate_memsize(size, min_size, add_size, children_add_size,
> resource_size(b_res), win_align);
> pci_info(bus->self,
>
> base-commit: 3719a04a80caf660f899a462cd8f3973bcfa676e
> --
> 2.39.5

Hello Ilpo,

I've tested the patch and it seems to fix the issue. Thank you!

Rio Liu