[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240725153446.GA841157@bhelgaas>
Date: Thu, 25 Jul 2024 10:34:46 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Heiner Kallweit <hkallweit1@...il.com>
Cc: George-Daniel Matei <danielgeorgem@...omium.org>,
Bjorn Helgaas <bhelgaas@...gle.com>, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, nic_swsd@...ltek.com,
netdev@...r.kernel.org, "Rafael J. Wysocki" <rjw@...ysocki.net>
Subject: Re: [PATCH] PCI: r8169: add suspend/resume aspm quirk
[+cc Rafael in case you have suspend debug help]
On Tue, Jul 16, 2024 at 09:25:40PM +0200, Heiner Kallweit wrote:
> On 16.07.2024 14:13, George-Daniel Matei wrote:
> > On Thu, Jul 11, 2024 at 7:45 AM Heiner Kallweit <hkallweit1@...il.com> wrote:
> >> On 10.07.2024 17:09, George-Daniel Matei wrote:
> >>>>> Added aspm suspend/resume hooks that run
> >>>>> before and after suspend and resume to change
> >>>>> the ASPM states of the PCI bus in order to allow
> >>>>> the system suspend while trying to prevent card hangs
> >>>>
> >>>> Why is this needed? Is there a r8169 defect we're working around?
> >>>> A BIOS defect? Is there a problem report you can reference here?
> >>>
> >>> We encountered this issue while upgrading from kernel v6.1 to v6.6.
> >>> The system would not suspend with 6.6. We tracked down the problem to
> >>> the NIC of the device, mainly that the following code was removed in
> >>> 6.6:
> >>>> else if (tp->mac_version >= RTL_GIGA_MAC_VER_46)
> >>>> rc = pci_disable_link_state(pdev, PCIE_LINK_STATE_L1_2);1
> >>
> >> With this (older) 6.1 version everything is ok?
> >> Would mean that L1.1 is active and the system suspends (STR?) properly
> >> also with L1.1 being active.
> >>
> > Yes, with 6.1 everything was ok. L1 was active and just the L1.1 substate
> > was enabled, L1.2 was disabled.
> >
> >> Under 6.6 per default L1 (incl. sub-states) is disabled.
> >> Then you manually enable L1 (incl. L1.1, but not L1.2?) via sysfs,
> >> and now the system hangs on suspend?
> >>
> > Yes, in 6.6 L1 (+substates) is disabled. Like Bjorn mentioned, I
> > think that is because of 90ca51e8c654 ("r8169:
> > fix ASPM-related issues on a number of systems with NIC version from
> > RTL8168h". With L1 disabled the system would not suspend so I enabled
> > back L1 along with just L1.1 substate through sysfs, just to test, and
> > saw that the system could
>
> It still sounds very weird that a system does not suspend to ram
> just because ASPM L1 is disabled for a single device.
> What if a PCI device is used which doesn't support ASPM?
>
> Which subsystem fails to suspend? Can you provide a log showing
> the suspend error?
Can we push on this a little bit? The fact that suspend fails is
super interesting to me. I'd like to know exactly how this fails and
whether it's in the kernel or in firmware. If we're violating some
assumption firmware is making, maybe there would be a more generic
fix.
How exactly do you suspend? Is there any debugging output you can
collect while doing that? Maybe [1] has hints. I see a bunch of
trace_suspend_resume() calls, and I think they're connected to [2],
which looks like it might generate console/dmesg output, but I don't
know how to enable that.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/power/basic-pm-debugging.rst?id=v6.10
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/trace/events/power.h?id=v6.10#n247
Powered by blists - more mailing lists