[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20251202.184120.477191121209248305.rene@exactco.de>
Date: Tue, 02 Dec 2025 18:41:20 +0100 (CET)
From: René Rebe <rene@...ctco.de>
To: helgaas@...nel.org
Cc: linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
bhelgaas@...gle.com, glaubitz@...sik.fu-berlin.de,
riccardo.mottola@...ero.it, mani@...nel.org, briannorris@...omium.org,
rafael@...nel.org, lukas@...ner.de, mario.limonciello@....com
Subject: Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC
systems
Hi,
thank you for your review.
On Tue, 2 Dec 2025 11:28:37 -0600, Bjorn Helgaas <helgaas@...nel.org> wrote:
> [+cc Mani, Brian (a5fb3ff63287 authors), Rafael, Lukas, Mario]
>
> On Tue, Dec 02, 2025 at 05:40:07PM +0100, René Rebe wrote:
> > Commit a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all
> > non-x86") was bisected to break various non-x86 RISC Unix systems,
> > e.g. sparc64, see two example oopses below. Fix by only allowing D3Hot
> > on modern ARM64, PPC64 and RISCV ISAs besides new enough x86.
>
> I think we need some kind of analysis of what is happening to the PCI
> devices here. I don't know why the CPU architecture per se would be
> related to PCI power management.
That surely would be the best, but given few maintainers work on older
architectures it might take a while. This is also old hw from before
2015, like the x86 DMI test. Given the commit enabled it for all that
previously failing the dmi year check due:
static inline int dmi_get_bios_year(void) { return -ENXIO; }
Is it not sensible to first reinstate this for such $arch also to
stable trees while we further work on this?
> pci_bridge_d3_possible() is already a barely maintainable hodge podge
> of random things that work and don't work. Generally speaking most of
> those cases relate to firmware.
Fair, but this is a rather simple hotfix, for a simple year chec, for
a commit that just recently broke this systems. I would also expect
this high performance Unix systems might not have been designed or
test with dynamic PCI power management in mind, ...
René
> > Sun Blade 1000:
> > ERROR(0): Cheetah error trap taken afsr[0010080005000000] afar[000007f900800000] TL1(0)
> > ERROR(0): TPC[100a05a4] TNPC[100a05a8] O7[42acc8] TSTATE[4411001603]
> > ERROR(0):
> > TPC<MakeIocReady+0xc/0x278 [mptbase]>
> > ERROR(0): M_SYND(0), E_SYND(0), Privileged
> > ERROR(0): Highest priority error (0000080000000000) "Bus error response from system bus"
> > ERROR(0): D-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000]
> > ERROR(0): D-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[0000000000000000]
> > ERROR(0): I-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] u[0000000000000000] l[0000000000000000]
> > ERROR(0): I-cache INSN0[0000000000000000] INSN1[0000000000000000] INSN2[0000000000000000] INSN3[0000000000000000]
> > ERROR(0): I-cache INSN4[0000000000000000] INSN5[0000000000000000] INSN6[0000000000000000] INSN7[0000000000000000]
> > ERROR(0): E-cache idx[b08040] tag[000000001e008fa0]
> > ERROR(0): E-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[ffffffffffffffff]
> > Kernel panic - not syncing: Irrecoverable deferred error trap.
> > CPU: 0 UID: 0 PID: 46 Comm: (udev-worker) Not tainted 6.14.0-rc1-00001-ga5fb3ff63287 #18
> > Call Trace:
> > [<00000000004294b0>] panic+0xf0/0x370
> > [<0000000000435bc4>] cheetah_deferred_handler+0x2c8/0x2d8
> > [<0000000000405e88>] c_deferred+0x18/0x24
> > [<00000000100a05a4>] MakeIocReady+0xc/0x278 [mptbase]
>
> I assume both of these crashes are related to the
> CHIPREG_READ32(&ioc->chip->Doorbell) in mpt_GetIocState(), e.g., maybe
> that PCI read failed because an upstream bridge was not in D0 and
> therefore treated the read as an unsupported request.
>
> > [<00000000100a089c>] mpt_do_ioc_recovery+0x8c/0x1054 [mptbase]
> > [<000000001009f2d4>] mpt_attach+0x920/0xa68 [mptbase]
> > [<000000001012424c>] mptsas_probe+0x8/0x3e8 [mptsas]
> > [<0000000000788308>] local_pci_probe+0x24/0x70
> > [<0000000000788dac>] pci_device_probe+0x1c0/0x1d0
> > [<000000000082633c>] really_probe+0x13c/0x29c
> > [<0000000000826590>] __driver_probe_device+0xf4/0x104
> > [<0000000000826614>] driver_probe_device+0x24/0xa0
> > [<000000000082683c>] __driver_attach+0xe8/0x104
> > [<0000000000824da0>] bus_for_each_dev+0x58/0x84
> > [<0000000000825508>] bus_add_driver+0xdc/0x1f8
> > [<0000000000827110>] driver_register+0x70/0x120
> >
> > Niagara T1:
> > mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
> > NON-RESUMABLE ERROR: Reporting on cpu 31
> > NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]>
> > NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000
> > NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000]
> > NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c]
> > NON-RESUMABLE ERROR: type [precise nonresumable]
> > NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv >
> > NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
> > Kernel panic - not syncing: Non-resumable error.
> > CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE Debian 6.16.12-2+sparc64.1
> > Call Trace:
> > [<00000000004373c4>] dump_stack+0x8/0x18
> > [<0000000000429540>] panic+0xf4/0x398
> > [<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
> > [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
> > [<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
> > [<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
> > [<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
> > [<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
> > [<0000000000b3fab0>] local_pci_probe+0x30/0x80
> > [<0000000000b405d4>] pci_device_probe+0xb4/0x240
> > [<0000000000bfd348>] really_probe+0xc8/0x400
> > [<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
> > [<0000000000bfd8c8>] driver_probe_device+0x28/0x100
> > [<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
> > [<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
> > [<0000000000bfcafc>] driver_attach+0x1c/0x40
> > Press Stop-A (L1-A) from sun keyboard or send break
> > twice on console to return to the boot prom
> >
> > Fixes: a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all non-x86")
> > Signed-off-by: René Rebe <rene@...ctco.de>
> > ---
> > Tested on Sun Blade 1000, and shipping in all T2/Linux builds since 2025-08-01
> > ---
> > drivers/pci/pci.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index b14dd064006c..7619d2cfa66d 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3033,9 +3033,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> >
> > /*
> > * Out of caution, we only allow PCIe ports from 2015 or newer
> > - * into D3 on x86.
> > + * into D3 or other modern ISAs only.
> > */
> > - if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
> > + if (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_PPC64) || IS_ENABLED(CONFIG_RISCV) || dmi_get_bios_year() >= 2015)
> > return true;
> > break;
> > }
> > --
> > 2.52.0
> >
> > --
> > René Rebe, ExactCODE GmbH, Berlin, Germany
> > https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
--
René Rebe, ExactCODE GmbH, Berlin, Germany
https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
Powered by blists - more mailing lists