[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210820210309.GA3357515@bjorn-Precision-5520>
Date: Fri, 20 Aug 2021 16:03:09 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Heiner Kallweit <hkallweit1@...il.com>
Cc: Kai-Heng Feng <kai.heng.feng@...onical.com>, nic_swsd@...ltek.com,
bhelgaas@...gle.com, davem@...emloft.net, kuba@...nel.org,
netdev@...r.kernel.org, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v3 1/3] r8169: Implement dynamic ASPM mechanism
On Thu, Aug 19, 2021 at 05:45:22PM +0200, Heiner Kallweit wrote:
> On 19.08.2021 13:42, Bjorn Helgaas wrote:
> > On Thu, Aug 19, 2021 at 01:45:40PM +0800, Kai-Heng Feng wrote:
> >> r8169 NICs on some platforms have abysmal speed when ASPM is enabled.
> >> Same issue can be observed with older vendor drivers.
> >
> > On some platforms but not on others? Maybe the PCIe topology is a
> > factor? Do you have bug reports with data, e.g., "lspci -vv" output?
> >
> >> The issue is however solved by the latest vendor driver. There's a new
> >> mechanism, which disables r8169's internal ASPM when the NIC traffic has
> >> more than 10 packets, and vice versa.
> >
> > Presumably there's a time interval related to the 10 packets? For
> > example, do you want to disable ASPM if 10 packets are received (or
> > sent?) in a certain amount of time?
> >
> >> The possible reason for this is
> >> likely because the buffer on the chip is too small for its ASPM exit
> >> latency.
> >
> > Maybe this means the chip advertises incorrect exit latencies? If so,
> > maybe a quirk could override that?
> >
> >> Realtek confirmed that all their PCIe LAN NICs, r8106, r8168 and r8125
> >> use dynamic ASPM under Windows. So implement the same mechanism here to
> >> resolve the issue.
> >
> > What exactly is "dynamic ASPM"?
> >
> > I see Heiner's comment about this being intended only for a downstream
> > kernel. But why?
> >
> We've seen various more or less obvious symptoms caused by the broken
> ASPM support on Realtek network chips. Unfortunately Realtek releases
> neither datasheets nor errata information.
> Last time we attempted to re-enable ASPM numerous problem reports came
> in. These Realtek chips are used on basically every consumer mainboard.
> The proposed workaround has potential side effects: In case of a
> congestion in the chip it may take up to a second until ASPM gets
> disabled, what may affect performance, especially in case of alternating
> traffic patterns. Also we can't expect support from Realtek.
> Having said that my decision was that it's too risky to re-enable ASPM
> in mainline even with this workaround in place. Kai-Heng weights the
> power saving higher and wants to take the risk in his downstream kernel.
> If there are no problems downstream after few months, then this
> workaround may make it to mainline.
Since ASPM apparently works well on some platforms but not others, I'd
suspect some incorrect exit latencies.
Ideally we'd have some launchpad/bugzilla links, and a better
understanding of the problem, and maybe a quirk that makes this work
on all platforms without mucking up the driver with ASPM tweaks.
But I'm a little out of turn here because the only direct impact to
the PCI core is the pcie_aspm_supported() interface. It *looks* like
these patches don't actually touch the PCIe architected ASPM controls
in Link Control; all I see is mucking with Realtek-specific registers.
I think this is more work than it should be and likely to be not as
reliable as it should be. But I guess that's up to you guys.
Bjorn
Powered by blists - more mailing lists