Message-ID: <b0e2f6fb-2a1b-4452-bf49-739a30925fde@gmail.com>
Date: Mon, 25 Sep 2023 17:59:20 +0200
From: Heiner Kallweit <hkallweit1@...il.com>
To: Martin Kjær Jørgensen <me@...y.org>
Cc: Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
nic_swsd@...ltek.com
Subject: Re: r8169 link up but no traffic, and watchdog error
On 25.09.2023 17:41, Martin Kjær Jørgensen wrote:
>
> On Mon, Sep 25 2023, Heiner Kallweit <hkallweit1@...il.com> wrote:
>
>> On 25.09.2023 13:30, Martin Kjær Jørgensen wrote:
>>>
>>> On Mon, Sep 25 2023, Heiner Kallweit <hkallweit1@...il.com> wrote:
>>>
>>>
>>> There are no PCI extension cards.
>>>
>>
>> Your BIOS signature indicates that the system is a Thinkstation P350.
>> According to the Lenovo website it comes with one Intel-based network port.
>> However, you have 4 additional Realtek-based network ports on the mainboard?
>>
>
> Yes. 2 PCIE cards with two Realtek ethernet controllers each.
>
>>>> And does the problem occur with all of your NICs?
>>>
>>> No, only the Realtek ones.
>>>
>>>> The exact NIC type might provide a hint, best provide a full dmesg log.
>>> [ 1512.295490] RSP: 0018:ffffbc0240193e88 EFLAGS: 00000246
>>> [ 1512.295492] RAX: ffff998935680000 RBX: ffffdc023faa8e00 RCX: 000000000000001f
>>> [ 1512.295493] RDX: 0000000000000002 RSI: ffffffffb544f718 RDI: ffffffffb543bc32
>>> [ 1512.295494] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000018
>>> [ 1512.295495] R10: ffff9989356b1dc4 R11: 00000000000058a8 R12: ffffffffb5d981a0
>>> [ 1512.295496] R13: 000001601bd198ef R14: 0000000000000003 R15: 0000000000000000
>>> [ 1512.295497] ? cpuidle_enter_state+0xbd/0x440
>>> [ 1512.295499] cpuidle_enter+0x2d/0x40
>>> [ 1512.295501] do_idle+0x217/0x270
>>> [ 1512.295503] cpu_startup_entry+0x1d/0x20
>>> [ 1512.295505] start_secondary+0x11a/0x140
>>> [ 1512.295508] secondary_startup_64_no_verify+0x17e/0x18b
>>> [ 1512.295510] </TASK>
>>> [ 1512.295511] ---[ end trace 0000000000000000 ]---
>>> [ 1512.295526] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
>>> [ 1531.322039] r8169 0000:03:00.0 enp3s0: Link is Down
>>> [ 1534.138489] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>> [ 1538.177385] r8169 0000:03:00.0 enp3s0: Link is Down
>>> [ 1566.174660] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>> [ 1567.839082] r8169 0000:03:00.0 enp3s0: Link is Down
>>> [ 1570.621088] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>> [ 1576.294267] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
>>
>> Regarding the following: The issue occurs after a few seconds of link loss.
>> Was this an intentional link-down event?
>
> Yes, I intentionally unplug the cable at the other end for the link to go down.
>
>> And is the issue always related to link-up after a link-loss period?
>>
>
> Yes, it happens after the cable is plugged in again, so after a link-loss period.
>
Good to know. I have heard of this before: under unknown circumstances (Realtek doesn't
publish errata information) the NIC (unclear whether MAC or PHY) seems to hang after a
link loss in rare cases. The vendor driver does a full hw init on each link-up; maybe this
is to work around the issue we are discussing here.
>
>>> [ 1488.643231] r8169 0000:03:00.0 enp3s0: Link is Down
>>> [ 1506.576941] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>> [ 1512.295215] ------------[ cut here ]------------
>>> [ 1512.295219] NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out 5368 ms
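
The pattern in the log above (link down, link up, then a transmit queue timeout on the
same interface) can be searched for mechanically in a longer dmesg capture. The following
is only a minimal sketch in Python. It assumes the message formats match the lines quoted
in this thread; the function name find_timeouts_after_relink and the regular expressions
are illustrative and not part of r8169 or any kernel tooling.

```python
import re

# Patterns modelled on the dmesg lines quoted in this thread (assumption:
# other kernel versions may word these messages differently).
TS = r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
LINK_RE = re.compile(
    TS + r"r8169 \S+ (?P<ifname>\S+): Link is (?P<state>Up|Down)"
)
WDOG_RE = re.compile(
    TS + r"NETDEV WATCHDOG: (?P<ifname>\S+) \(r8169\): transmit queue \d+ timed out"
)


def find_timeouts_after_relink(lines):
    """Return (ifname, down_ts, up_ts, watchdog_ts) for every watchdog
    timeout that follows a Link Down -> Link Up sequence on that interface."""
    last_down = {}  # ifname -> timestamp of the most recent "Link is Down"
    relinked = {}   # ifname -> (down_ts, up_ts) of the most recent re-link
    hits = []
    for line in lines:
        m = LINK_RE.search(line)
        if m:
            ifname, ts = m["ifname"], float(m["ts"])
            if m["state"] == "Down":
                last_down[ifname] = ts
                relinked.pop(ifname, None)  # link is down again, reset
            elif ifname in last_down:
                relinked[ifname] = (last_down[ifname], ts)
            continue
        m = WDOG_RE.search(line)
        if m and m["ifname"] in relinked:
            down_ts, up_ts = relinked[m["ifname"]]
            hits.append((m["ifname"], down_ts, up_ts, float(m["ts"])))
    return hits
```

Fed the four log lines quoted directly above, this reports a single event on enp3s0, with
roughly 18 seconds between link-down and link-up. Quote prefixes such as ">>> " do not
matter because the patterns are searched for anywhere in the line.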