Message-ID: <0899e629-eaaf-1000-72b5-52ad977677a8@manjaro.org>
Date: Wed, 15 Oct 2025 01:33:35 +0200
From: "Dragan Simic" <dsimic@...jaro.org>
To: "Bjorn Helgaas" <helgaas@...nel.org>
Cc: "FUKAUMI Naoki" <naoki@...xa.com>, manivannan.sadhasivam@....qualcomm.com, "Bjorn Helgaas" <bhelgaas@...gle.com>, "Manivannan Sadhasivam" <mani@...nel.org>, "Lorenzo Pieralisi" <lpieralisi@...nel.org>, Krzysztof Wilczyński <kwilczynski@...nel.org>, "Rob Herring" <robh@...nel.org>, linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org, linux-arm-msm@...r.kernel.org, David E. Box <david.e.box@...ux.intel.com>, "Kai-Heng Feng" <kai.heng.feng@...onical.com>, Rafael J. Wysocki <rafael@...nel.org>, "Heiner Kallweit" <hkallweit1@...il.com>, "Chia-Lin Kao" <acelan.kao@...onical.com>, linux-rockchip@...ts.infradead.org, regressions@...ts.linux.dev
Subject: Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM
and Clock PM states set by BIOS for devicetree platforms

Hello all,

On Tuesday, October 14, 2025 20:49 CEST, Bjorn Helgaas <helgaas@...nel.org> wrote:
> On Wed, Oct 15, 2025 at 01:30:16AM +0900, FUKAUMI Naoki wrote:
> > I've noticed an issue on Radxa ROCK 5A/5B boards, which are based on the
> > Rockchip RK3588(S) SoC.
> >
> > When running Linux v6.18-rc1 or linux-next since 20250924, the kernel either
> > freezes or fails to probe M.2 Wi-Fi modules. This happens with several
> > different modules I've tested, including the Realtek RTL8852BE, MediaTek
> > MT7921E, and Intel AX210.
> >
> > I've found that reverting the following commit (i.e., the patch I'm replying
> > to) resolves the problem:
> > commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961
>
> Thanks for the report, and sorry for the regression.
>
> Since this affects several devices from different manufacturers and (I
> assume) different drivers, it seems likely that there's some issue
> with the Rockchip end, since ASPM probably works on these devices in
> other systems. So we should figure out if there's something wrong
> with the way we configure ASPM, which we could potentially fix, or if
> there's a hardware issue and we need some kind of quirk to prevent
> usage of ASPM on the affected platforms.
>
> Can you collect a complete dmesg log when booting with
>
> ignore_loglevel pci=earlydump dyndbg="file drivers/pci/* +p"
>
> and the output of "sudo lspci -vv"?
>
> When the kernel freezes, can you give us any information about where,
> e.g., a log or screenshot?
>
> Do you know if any platforms other than Radxa ROCK 5A/5B have this
> problem?

After thinking quite a bit about it, I think we should revert this
patch and replace it with another patch that allows per-SoC, or
maybe even per-board, opting into the forced enablement of PCIe
ASPM. Let me explain, please.

When a new feature is introduced, it's expected that it may fail
on some hardware or with some specific setups, so quirking off such
instances over time is perfectly fine. Such a feature didn't work
at all before it was implemented, so it's acceptable that it fails
in some instances after its introduction and gets quirked off as
more testing is performed.

However, when some widespread feature, such as PCIe, has already
been in production for quite a while, introducing high-risk changes
to it in a blanket fashion, while intending to have the incompatible
or not-yet-ready platforms quirked off over time, simply isn't the
way to go. Breaking stuff intentionally to find out what actually
doesn't work is rarely a good option.

Thus, I'd suggest that this patch be replaced with other patches,
which would introduce an additional ASPM opt-in switch to the PCI
binding, allowing SoCs or boards to have it enabled _after_ proper
testing is performed. The PCIe driver may emit a warning that ASPM
is to be enabled at some point in the future, to "bug" people about
the need to perform the testing, etc. With all that in place, we
could expect that in a year or two PCIe ASPM could eventually be
enabled everywhere. Getting everything tested is a massive endeavor,
but that's the only way not to break stuff.

Biting the bullet and hoping that it all goes well, I'd say, isn't
the right approach here.