Message-ID: <87msx1bezt.fsf@mkjws.danelec-net.lan>
Date: Mon, 02 Oct 2023 10:28:28 +0200
From: Martin Kjær Jørgensen <me@...y.org>
To: Heiner Kallweit <hkallweit1@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org, nic_swsd@...ltek.com
Subject: Re: r8169 link up but no traffic, and watchdog error

On Mon, Sep 25 2023, Heiner Kallweit <hkallweit1@...il.com> wrote:

> On 25.09.2023 17:41, Martin Kjær Jørgensen wrote:
>>
>> On Mon, Sep 25 2023, Heiner Kallweit <hkallweit1@...il.com> wrote:
>>
>>> On 25.09.2023 13:30, Martin Kjær Jørgensen wrote:
>>>>
>>>> On Mon, Sep 25 2023, Heiner Kallweit <hkallweit1@...il.com> wrote:
>>>>
>>>>
>>>> There are no PCI extension cards.
>>>>
>>>
>>> Your BIOS signature indicates that the system is a Thinkstation P350.
>>> According to the Lenovo website it comes with one Intel-based network port.
>>> However you have additional 4 Realtek-based network ports on the mainboard?
>>>
>>
>> Yes. 2 PCIE cards with two Realtek ethernet controllers each.
>>
>>>>> And does the problem occur with all of your NICs?
>>>>
>>>> No, only the Realtek ones.
>>>>
>>>>> The exact NIC type might provide a hint, best provide a full dmesg log.
>>>> [ 1512.295490] RSP: 0018:ffffbc0240193e88 EFLAGS: 00000246
>>>> [ 1512.295492] RAX: ffff998935680000 RBX: ffffdc023faa8e00 RCX: 000000000000001f
>>>> [ 1512.295493] RDX: 0000000000000002 RSI: ffffffffb544f718 RDI: ffffffffb543bc32
>>>> [ 1512.295494] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000018
>>>> [ 1512.295495] R10: ffff9989356b1dc4 R11: 00000000000058a8 R12: ffffffffb5d981a0
>>>> [ 1512.295496] R13: 000001601bd198ef R14: 0000000000000003 R15: 0000000000000000
>>>> [ 1512.295497]  ? cpuidle_enter_state+0xbd/0x440
>>>> [ 1512.295499]  cpuidle_enter+0x2d/0x40
>>>> [ 1512.295501]  do_idle+0x217/0x270
>>>> [ 1512.295503]  cpu_startup_entry+0x1d/0x20
>>>> [ 1512.295505]  start_secondary+0x11a/0x140
>>>> [ 1512.295508]  secondary_startup_64_no_verify+0x17e/0x18b
>>>> [ 1512.295510]  </TASK>
>>>> [ 1512.295511] ---[ end trace 0000000000000000 ]---
>>>> [ 1512.295526] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
>>>> [ 1531.322039] r8169 0000:03:00.0 enp3s0: Link is Down
>>>> [ 1534.138489] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>>> [ 1538.177385] r8169 0000:03:00.0 enp3s0: Link is Down
>>>> [ 1566.174660] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>>> [ 1567.839082] r8169 0000:03:00.0 enp3s0: Link is Down
>>>> [ 1570.621088] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>>> [ 1576.294267] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
>>>
>>> Regarding the following: Issue occurs after few seconds of link-loss.
>>> Was this an intentional link-down event?
>>
>> Yes, I intentionally unplug the cable at the other end for the link to go down.
>>
>>> And is issue always related to link-up after a link-loss period?
>>>
>>
>> Yes, it happens after cable is plugged in again, so after a link-loss period.
>>
>>
>>>> [ 1488.643231] r8169 0000:03:00.0 enp3s0: Link is Down
>>>> [ 1506.576941] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>>> [ 1512.295215] ------------[ cut here ]------------
>>>> [ 1512.295219] NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out 5368 ms
>
> Could you please test whether the following helps?
>
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 6351a2dc1..a2fbfff5a 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -4596,7 +4596,9 @@ static void r8169_phylink_handler(struct net_device *ndev)
>  	if (netif_carrier_ok(ndev)) {
>  		rtl_link_chg_patch(tp);
>  		pm_request_resume(d);
> +		netif_wake_queue(tp->dev);
>  	} else {
> +		rtl_reset_work(tp);
>  		pm_runtime_idle(d);
>  	}

This patch seems to have a good influence. I have applied it to a vanilla
6.1.55 kernel, and been using it for a while. No kernel netdev watchdog errors,
and interface responds to traffic almost instantly when it gets up again after
link loss :-)
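
For anyone who wants to check their own logs for the same pattern (a watchdog timeout shortly after a link-up that follows a link loss), here is a minimal sketch in Python. It is not part of the patch or of the original thread: the function names `parse_events` and `watchdogs_after_link_up` are hypothetical, and the only assumption about the log format is what is visible in the dmesg lines quoted above.

```python
import re

# dmesg lines copied from the log quoted in this thread
LOG = """\
[ 1488.643231] r8169 0000:03:00.0 enp3s0: Link is Down
[ 1506.576941] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1512.295219] NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out 5368 ms
"""

# "[ <seconds>.<micros>] <message>" as printed by dmesg
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_events(text, iface):
    """Return (timestamp, kind) tuples for link and watchdog events on iface."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if iface not in msg:
            continue
        if "Link is Down" in msg:
            events.append((ts, "down"))
        elif "Link is Up" in msg:
            events.append((ts, "up"))
        elif "NETDEV WATCHDOG" in msg:
            events.append((ts, "watchdog"))
    return events


def watchdogs_after_link_up(events):
    """Return seconds from the most recent link-up to each watchdog event."""
    delays = []
    last_up = None
    for ts, kind in events:
        if kind == "up":
            last_up = ts
        elif kind == "down":
            last_up = None
        elif kind == "watchdog" and last_up is not None:
            delays.append(round(ts - last_up, 6))
    return delays


events = parse_events(LOG, "enp3s0")
delays = watchdogs_after_link_up(events)
print(delays)
```

To use it on a real system, replace `LOG` with the saved output of `dmesg` and pass the interface name of interest. Going by the test report above, a log from a kernel with the patch applied should yield an empty list.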