Message-ID: <92181e0e-3ca0-b19c-71f3-607fbfdc40a3@gmail.com>
Date: Fri, 24 Feb 2023 21:21:32 +0100
From: Heiner Kallweit <hkallweit1@...il.com>
To: fk1xdcio@...k.com, netdev@...r.kernel.org
Subject: Re: 4-port ASMedia/RealTek RTL8125 2.5Gbps NIC freezes whole system
On 24.02.2023 15:37, fk1xdcio@...k.com wrote:
> I hope this is the correct place to ask this(?). I'm not sure if my large attachments will come through; this is my first attempt.
>
> I'm having problems getting this 4-port 2.5Gbps NIC to be stable. I have tried on multiple different physical systems both with Xeon server and i7 workstation chipsets and it behaves the same way on everything. Testing with latest Arch Linux and kernels 6.1, 6.2, and 5.15. I'm using the kernel default r8169 driver.
>
> The higher the load on the NIC the more likely the whole system freezes hard. Everything freezes including my serial console, SysRq doesn't work, even the motherboard hardware reset switch doesn't work(!). I have to cut power to the system to reset it.
>
> Disabling IOMMU is more stable but doesn't fix the issue. ASPM doesn't work correctly on this card either despite the ASMedia 1812 supposedly supporting it (lots of corrected PCIe errors). Enabling or disabling ASPM makes no difference.
>
> "SSU-TECH" (generic/counterfeit?) 4-port 2.5Gbps PCIe x4 card
> ASMedia ASM1812 PCIe switch (driver: pcieport)
> RTL8125BG x4 (driver: r8169)
>
> I have tested with a normal network configuration consisting of multiple machines and also with loopback cables plugging the card's ports into each other.
>
> I have attached the scripts I use with the loopback cables (crashsys.sh), lspci, and dmesg.
>
> System freezes almost immediately with:
> 3,1266,4284361895,-;pcieport 0000:04:02.0: Unable to change power state from D3hot to D0, device inaccessible
> SUBSYSTEM=pci
> DEVICE=+pci:0000:04:02.0
>
> If I set permanent D0 mode (power/control=on) then the error is different when the system freezes:
> r8169 0000:0d:00.0 enp13s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
>
> Is there anything I can do to get more debugging information? The system locks so hard that I haven't gotten much so far. It's unclear if the problem is happening in the pcieport driver, r8169, or somewhere else.
A network driver shouldn't be able to freeze the whole system. You can test whether Realtek's vendor driver r8125 makes a difference.
If the freeze also occurs with r8125, that would suggest the root cause is at a lower level than r8169.
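
When comparing the two drivers, it can help to confirm which one is actually bound to each port, and whether the power/control=on setting mentioned above is in effect, before starting a load test. Below is a minimal sketch, assuming the standard Linux sysfs layout under /sys/bus/pci/devices. The function names and the sysfs_root parameter are hypothetical helpers for illustration; the PCI addresses are the ones from the log lines quoted above.

```python
import os
from pathlib import Path
from typing import Optional


def bound_driver(pci_addr: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the name of the kernel driver bound to a PCI device, or None.

    Assumes the usual sysfs layout, where <device>/driver is a symlink
    pointing at the driver's directory (e.g. .../drivers/r8169).
    """
    link = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr / "driver"
    if not link.is_symlink():
        return None  # device missing, or no driver bound
    return os.path.basename(os.readlink(link))


def power_control(pci_addr: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the runtime-PM setting ("on" or "auto") of a PCI device, or None."""
    path = (Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr
            / "power" / "control")
    try:
        return path.read_text().strip()
    except OSError:
        return None  # device or attribute not present


if __name__ == "__main__":
    # Addresses taken from the quoted dmesg lines: one NIC port and the
    # ASM1812 downstream port that reported the D3hot -> D0 failure.
    for addr in ("0000:0d:00.0", "0000:04:02.0"):
        print(addr, "driver:", bound_driver(addr),
              "power/control:", power_control(addr))
```

Running this once with r8169 loaded and once with r8125 should show the driver name change for the NIC ports, while the switch ports remain bound to pcieport.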