Message-ID: <ce802481-87c3-1bb8-2ee4-fc3cd73d889a@gmail.com>
Date: Fri, 14 Jul 2023 09:42:04 +0200
From: Heiner Kallweit <hkallweit1@...il.com>
To: Kurt Kanzenbach <kurt@...utronix.de>,
Tobias Klausmann <tobias.klausmann@...enet.de>,
Linux regressions mailing list <regressions@...ts.linux.dev>
Cc: Realtek linux nic maintainers <nic_swsd@...ltek.com>,
netdev@...r.kernel.org
Subject: Re: r8169: transmit transmit queue timed out - v6.4 cycle
On 14.07.2023 09:16, Kurt Kanzenbach wrote:
> On Thu Jul 13 2023, Heiner Kallweit wrote:
>> On 13.07.2023 09:01, Kurt Kanzenbach wrote:
>>> Hello Heiner,
>>>
>>> On Mon Jul 10 2023, Heiner Kallweit wrote:
>>>> On 05.07.2023 00:25, Tobias Klausmann wrote:
>>>>> Hi, top posting as well, as I'm on vacation, too. The system does not
>>>>> allow disabling ASPM, it is a very constrained notebook BIOS, thus
>>>>> the suggestion is not feasible. All in all the suggestion seems not
>>>>> favorable to me, as it is unknown how many systems are broken the
>>>>> same way. Having a workaround advised as default seems pretty wrong
>>>>> to me.
>>>>>
>>>>
>>>> To get a better understanding of the affected system:
>>>> Could you please provide a full dmesg log and the lspci -vv output?
>>>
>>> I'm having the same problem as described by Tobias on a desktop
>>> machine. v6.3 works; v6.4 results in transmit queue timeouts
>>> occasionally. Reverting 2ab19de62d67 ("r8169: remove ASPM restrictions
>>> now that ASPM is disabled during NAPI poll") "solves" the issue.
>>>
>>> From dmesg:
>>>
>>> |~ % dmesg | grep -i ASPM
>>> |[ 0.152746] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
>>> |[ 0.905100] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
>>> |[ 0.906508] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
>>> |[ 1.156585] pci 10000:e1:00.0: can't override BIOS ASPM; OS doesn't have ASPM control
>>> |[ 1.300059] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
>>>
>>> In addition, with commit 2ab19de62d67 in the kernel, messages like
>>> this show up regularly:
>>>
>>> |[ 7487.214593] pcieport 0000:00:1c.2: AER: Corrected error received: 0000:03:00.0
>>>
>>> I'm happy to test any patches or provide more info if needed.
>>>
>> Thanks for the report. It's interesting that the issue seems to occur only on systems
>> where BIOS doesn't allow OS to control ASPM. Maybe this results in the PCI subsystem
>> not properly initializing something.
>> Kurt/Tobias: Could you please boot with the kernel command line parameter pcie_aspm=force
>> and see whether this changes anything?
>> This parameter lets Linux ignore the BIOS setting. You should see a message
>> "PCIe ASPM is forcibly enabled" in the dmesg log with this parameter.
>
> Seems like this does not help. There are still PCIe errors:
>
> |~ # dmesg | grep -i ASPM
> |[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.4.2-gentoo-kurtOS root=/dev/nvme0n1p3 ro kvm-intel.nested=1 vga=794 pcie_aspm=force
> |[ 0.044016] Kernel command line: BOOT_IMAGE=/vmlinuz-6.4.2-gentoo-kurtOS root=/dev/nvme0n1p3 ro kvm-intel.nested=1 vga=794 pcie_aspm=force
> |[ 0.044048] PCIe ASPM is forcibly enabled
> |[ 0.153011] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
> |[ 0.916341] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
> |[ 0.917719] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
> |~ # dmesg | grep -i r8169
> |[ 1.337417] r8169 0000:03:00.0 eth0: RTL8168h/8111h, 6c:3c:8c:2c:bd:de, XID 541, IRQ 164
> |[ 1.337422] r8169 0000:03:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
> |[ 2.833876] r8169 0000:03:00.0 enp3s0: renamed from eth0
> |[ 20.886564] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
> |[ 21.168373] r8169 0000:03:00.0 enp3s0: Link is Down
> |[ 24.006543] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control off
> |~ # dmesg | tail
> |[ 20.886564] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
> |[ 21.168373] r8169 0000:03:00.0 enp3s0: Link is Down
> |[ 24.006543] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control off
> |[ 24.006568] IPv6: ADDRCONF(NETDEV_CHANGE): enp3s0: link becomes ready
> |[ 24.567803] ACPI Warning: \_SB.PC00.PEG1.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20230331/nsarguments-61)
> |[ 41.563396] pcieport 0000:00:1c.2: AER: Corrected error received: 0000:03:00.0
> |[ 47.065441] pcieport 0000:00:1c.2: AER: Multiple Corrected error received: 0000:03:00.0
> |[ 54.264285] pcieport 0000:00:1c.2: AER: Corrected error received: 0000:03:00.0
> |[ 54.424210] pcieport 0000:00:1c.2: AER: Corrected error received: 0000:03:00.0
> |[ 55.443439] pcieport 0000:00:1c.2: AER: Corrected error received: 0000:03:00.0
>
But no tx timeout (yet)?
Now that ASPM is forced, could you please disable ASPM L1.2 by
writing 0 to the following sysfs attribute?
-> /sys/class/net/enp3s0/device/link/l1_2_aspm
That's what we did until 6.3 for RTL8168h on systems where the
OS can control ASPM.
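
For reference, a minimal sketch of how the L1.2 state could be switched off from a root shell. The function name disable_l1_2 and its optional second argument (an alternate sysfs root, only useful for dry runs) are assumptions made for illustration. The l1_2_aspm attribute is only present when the kernel exposes ASPM control for that link, so the sketch checks for it before writing.

```shell
#!/bin/sh
# Hypothetical helper: disable ASPM L1.2 on the PCIe link of a network device.
# Usage (as root): disable_l1_2 enp3s0
# The second argument is an assumption for dry runs; it defaults to /sys.
disable_l1_2() {
    iface=$1
    root=${2:-/sys}
    attr="$root/class/net/$iface/device/link/l1_2_aspm"

    # The attribute is missing if the kernel does not expose ASPM link
    # control for this device, and not writable without sufficient privileges.
    if [ ! -w "$attr" ]; then
        echo "cannot write $attr (missing attribute or insufficient privileges)" >&2
        return 1
    fi

    # Writing 0 disables the state, writing 1 enables it.
    printf '0\n' > "$attr" || return 1

    # Read the value back so the caller can confirm the new state.
    cat "$attr"
}
```

If the write is accepted, the function prints the value read back from the attribute. The setting does not persist across reboots, so it has to be reapplied before each test run.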
> Thanks,
> Kurt