Message-ID: <fc80b42a-e488-e8a2-9669-d33a5150ac9b@gmail.com>
Date:   Wed, 11 Jan 2023 21:17:29 +0100
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Alexander H Duyck <alexander.duyck@...il.com>,
        Jakub Kicinski <kuba@...nel.org>,
        David Miller <davem@...emloft.net>,
        Realtek linux nic maintainers <nic_swsd@...ltek.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Paolo Abeni <pabeni@...hat.com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Stephen Hemminger <stephen@...workplumber.org>
Subject: Re: [PATCH net-next resubmit v2] r8169: disable ASPM in case of tx
 timeout

On 11.01.2023 17:16, Alexander H Duyck wrote:
> On Tue, 2023-01-10 at 23:03 +0100, Heiner Kallweit wrote:
>> There are still single reports of systems where ASPM incompatibilities
>> cause tx timeouts. It's not clear whom to blame, so let's disable
>> ASPM in case of a tx timeout.
>>
>> v2:
>> - add one-time warning for informing the user
>>
>> Signed-off-by: Heiner Kallweit <hkallweit1@...il.com>
> 
> From past experience I have seen ASPM issues cause the device to
> disappear from the bus after failing to come out of L1. If that occurs
> this won't be able to recover after the timeout without resetting the
> bus itself. As such it may be necessary to disable the link states
> prior to using the device rather than waiting until after the error.
> That can be addressed in a follow-on patch if this doesn't resolve the
> issue.
> 

Interesting, I haven't seen reports about disappearing devices yet.
The symptoms I've seen differ, depending on the combination of more or less
faulty NIC chipset version, BIOS bugs and PCIe mainboard chipset.
Typically users experienced missed rx packets, tx timeouts or NIC lockups.
Disabling ASPM, on the other hand, resulted in complaints from notebook
users about reduced runtime on battery.
Meanwhile we've found a good balance, and reports about ASPM issues
have become quite rare.
Only L1.2 still causes issues under load, even with newer chipset versions,
therefore L1.2 is disabled by default.
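The policy described in this thread can be modelled in a few lines of user-space C. In this model, L1.2 is off by default, and after a tx timeout all ASPM link states are dropped while the warning is printed only once. This is a hypothetical sketch, not the actual r8169 implementation. The names nic_model, nic_init and nic_tx_timeout, and the LINK_* bit values, are invented for illustration only.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical link-state bits for this model only; the values are not
 * taken from the kernel headers. */
enum {
    LINK_L0S  = 1u << 0,
    LINK_L1   = 1u << 1,
    LINK_L1_1 = 1u << 2,
    LINK_L1_2 = 1u << 3,
};

/* Hypothetical per-device state for the model. */
struct nic_model {
    unsigned int aspm_enabled; /* bitmask of link states still allowed */
    bool timeout_warned;       /* has the one-time warning been printed? */
};

/* Default policy from the thread: allow all states except L1.2. */
static void nic_init(struct nic_model *nic)
{
    nic->aspm_enabled = LINK_L0S | LINK_L1 | LINK_L1_1;
    nic->timeout_warned = false;
}

/* On a tx timeout, disable every ASPM state and warn only the first time. */
static void nic_tx_timeout(struct nic_model *nic)
{
    if (!nic->timeout_warned) {
        fprintf(stderr, "tx timeout, disabling ASPM\n");
        nic->timeout_warned = true;
    }
    nic->aspm_enabled = 0;
}
```

Calling nic_init() and then nic_tx_timeout() twice leaves aspm_enabled at 0 and prints the warning a single time. As Alexander points out above, a device that has already fallen off the bus would not be recovered by such a handler.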

> As for the code it looks fine to me.
> 
> Reviewed-by: Alexander Duyck <alexanderduyck@...com>
