Message-ID: <71aea1f6-749b-e379-70f4-653ac46e7f25@gmail.com>
Date:   Fri, 3 Sep 2021 22:00:47 +0200
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Kai-Heng Feng <kai.heng.feng@...onical.com>
Cc:     nic_swsd <nic_swsd@...ltek.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Anthony Wong <anthony.wong@...onical.com>,
        Linux Netdev List <netdev@...r.kernel.org>,
        Linux PCI <linux-pci@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Bjorn Helgaas <helgaas@...nel.org>
Subject: Re: [RFC] [PATCH net-next v4] [PATCH 2/2] r8169: Implement dynamic
 ASPM mechanism

On 03.09.2021 17:56, Kai-Heng Feng wrote:
> On Tue, Aug 31, 2021 at 2:09 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
>>
>> On Sat, Aug 28, 2021 at 01:14:52AM +0800, Kai-Heng Feng wrote:
>>> r8169 NICs on some platforms have abysmal speed when ASPM is enabled.
>>> Same issue can be observed with older vendor drivers.
>>>
>>> The issue is however solved by the latest vendor driver. There's a new
>>> mechanism, which disables r8169's internal ASPM when the NIC traffic has
>>> more than 10 packets, and vice versa. The possible reason for this is
>>> likely because the buffer on the chip is too small for its ASPM exit
>>> latency.
>>
>> This sounds like good speculation, but of course, it would be better
>> to have the supporting data.
>>
>> You say above that this problem affects r8169 on "some platforms."  I
>> infer that ASPM works fine on other platforms.  It would be extremely
>> interesting to have some data on both classes, e.g., "lspci -vv"
>> output for the entire system.
> 
> lspci data collected from working and non-working system can be found here:
> https://bugzilla.kernel.org/show_bug.cgi?id=214307
> 
>>
>> If r8169 ASPM works well on some systems, we *should* be able to make
>> it work well on *all* systems, because the device can't tell what
>> system it's in.  All the device can see are the latencies for entry
>> and exit for link states.
> 
> That's definitely better if we can make r8169 ASPM work for all platforms.
> 
>>
>> IIUC this patch makes the driver wake up every 1000ms.  If the NIC has
>> sent or received more than 10 packets in the last 1000ms, it disables
>> ASPM; otherwise it enables ASPM.
> 
> Yes, that's correct.
> 
>>
>> I asked these same questions earlier, but nothing changed, so I won't
>> raise them again if you don't think they're pertinent.  Some patch
>> splitting comments below.
> 
> Sorry about that. The lspci data is attached.
> 
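
The policy confirmed in the quoted exchange (check once per 1000 ms, disable ASPM if more than 10 packets were seen in that interval, otherwise enable it) might be sketched as below. This is only an illustration of the decision rule as described in this thread, not code from the patch; `aspm_should_enable` and `ASPM_PACKET_THRESHOLD` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold taken from the patch description in this thread. */
#define ASPM_PACKET_THRESHOLD 10

/*
 * Hypothetical helper: given the number of packets sent or received
 * during the last polling interval, return true if ASPM should be
 * enabled for the next interval. More than 10 packets means "busy",
 * so ASPM is switched off; otherwise it is switched on.
 */
static bool aspm_should_enable(uint64_t packets_in_interval)
{
	return packets_in_interval <= ASPM_PACKET_THRESHOLD;
}
```

A driver would call such a helper from its periodic work item and only touch the hardware when the result differs from the current state.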

Thanks for the additional details. I see that both systems have the L1
sub-states active. Do you also face the issue if L1 is enabled but
L1.1 and L1.2 are not? Setting the ASPM policy from powersupersave
to powersave should be sufficient to disable them.
I have a test system Asus PRIME H310I-PLUS, BIOS 2603 10/21/2019 with
the same RTL8168h chip version. With L1 active and sub-states inactive
everything is fine. With the sub-states activated I get a few missed RX
errors when running iperf3.

One difference between your good and bad logs is the following.
(My test system shows the same LTR value as your bad system.)

Bad:
	Capabilities: [170 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns

Good:
	Capabilities: [170 v1] Latency Tolerance Reporting
		Max snoop latency: 1048576ns
		Max no snoop latency: 1048576ns

I have to admit that I'm not familiar with LTR and don't know whether
this difference could contribute to the differing behavior.
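
For reference, the two values differ by a factor of three: 3145728 ns is 3 * 2^20 ns and 1048576 ns is 1 * 2^20 ns. A minimal sketch of how such a value decodes, assuming the standard PCIe LTR layout (10-bit value in bits 9:0, 3-bit scale in bits 12:10, each scale step a factor of 32), follows. `ltr_decode_ns` is a hypothetical helper and not a kernel API, and the raw register contents are not shown in the lspci output above, so the encodings used with it are assumptions.

```c
#include <stdint.h>

/*
 * Decode a 16-bit LTR Max (No-)Snoop Latency register into nanoseconds.
 * Bits 9:0 hold the latency value, bits 12:10 the scale. Each scale
 * step multiplies by 32 (2^5), so scale 4 means units of 2^20 ns.
 * Scale encodings 6 and 7 are not permitted; this sketch returns 0
 * for them (a choice made here, not mandated behavior).
 */
static uint64_t ltr_decode_ns(uint16_t reg)
{
	uint64_t value = reg & 0x3ff;
	unsigned int scale = (reg >> 10) & 0x7;

	if (scale > 5)
		return 0;
	return value << (5 * scale);
}
```

With value 3 and scale 4 this yields 3145728 ns (the "bad" system), with value 1 and scale 4 it yields 1048576 ns (the "good" system).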
