Message-ID: <22546b48-27dd-4ac3-924d-aa2ed0386bb0@gmail.com>
Date: Sat, 23 Mar 2024 12:18:26 +0100
From: Heiner Kallweit <hkallweit1@...il.com>
To: Martin Kjær Jørgensen <me@...y.org>
Cc: Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
 nic_swsd@...ltek.com
Subject: Re: r8169 link up but no traffic, and watchdog error

On 22.03.2024 13:26, Heiner Kallweit wrote:
> On 22.03.2024 12:28, Martin Kjær Jørgensen wrote:
>>
>> On Mon, Sep 25 2023, Heiner Kallweit <hkallweit1@...il.com> wrote:
>>
>>> On 25.09.2023 17:41, Martin Kjær Jørgensen wrote:
>>>>
>>>> On Mon, Sep 25 2023, Heiner Kallweit <hkallweit1@...il.com> wrote:
>>>>
>>>>> On 25.09.2023 13:30, Martin Kjær Jørgensen wrote:
>>>>>>
>>>>>> On Mon, Sep 25 2023, Heiner Kallweit <hkallweit1@...il.com> wrote:
>>>>>>
>>>>>>
>>>>>> There are no PCI extension cards.
>>>>>>
>>>>>
>>>>> Your BIOS signature indicates that the system is a Thinkstation P350.
>>>>> According to the Lenovo website it comes with one Intel-based network port.
>>>>> However, you have 4 additional Realtek-based network ports on the mainboard?
>>>>>
>>>>
>>>> Yes. 2 PCIE cards with two Realtek ethernet controllers each.
>>>>
>>>>>>> And does the problem occur with all of your NICs?
>>>>>>
>>>>>> No, only the Realtek ones.
>>>>>>
>>>>>>> The exact NIC type might provide a hint, best provide a full dmesg log.
>>>>>> [ 1512.295490] RSP: 0018:ffffbc0240193e88 EFLAGS: 00000246
>>>>>> [ 1512.295492] RAX: ffff998935680000 RBX: ffffdc023faa8e00 RCX: 000000000000001f
>>>>>> [ 1512.295493] RDX: 0000000000000002 RSI: ffffffffb544f718 RDI: ffffffffb543bc32
>>>>>> [ 1512.295494] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000018
>>>>>> [ 1512.295495] R10: ffff9989356b1dc4 R11: 00000000000058a8 R12: ffffffffb5d981a0
>>>>>> [ 1512.295496] R13: 000001601bd198ef R14: 0000000000000003 R15: 0000000000000000
>>>>>> [ 1512.295497]  ? cpuidle_enter_state+0xbd/0x440
>>>>>> [ 1512.295499]  cpuidle_enter+0x2d/0x40
>>>>>> [ 1512.295501]  do_idle+0x217/0x270
>>>>>> [ 1512.295503]  cpu_startup_entry+0x1d/0x20
>>>>>> [ 1512.295505]  start_secondary+0x11a/0x140
>>>>>> [ 1512.295508]  secondary_startup_64_no_verify+0x17e/0x18b
>>>>>> [ 1512.295510]  </TASK>
>>>>>> [ 1512.295511] ---[ end trace 0000000000000000 ]---
>>>>>> [ 1512.295526] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
>>>>>> [ 1531.322039] r8169 0000:03:00.0 enp3s0: Link is Down
>>>>>> [ 1534.138489] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>>>>> [ 1538.177385] r8169 0000:03:00.0 enp3s0: Link is Down
>>>>>> [ 1566.174660] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>>>>> [ 1567.839082] r8169 0000:03:00.0 enp3s0: Link is Down
>>>>>> [ 1570.621088] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>>>>> [ 1576.294267] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
>>>>>
>>>>> Regarding the following: Issue occurs after few seconds of link-loss.
>>>>> Was this an intentional link-down event?
>>>>
>>>> Yes, I intentionally unplug the cable at the other end for the link to go down.
>>>>
>>>>> And is the issue always related to link-up after a link-loss period?
>>>>>
>>>>
>>>> Yes, it happens after the cable is plugged in again, so after a link-loss period.
>>>>
>>> Good to know. I have heard of this before: under unknown circumstances (Realtek
>>> doesn't publish errata information) the NIC (unclear whether MAC or PHY) seems to
>>> hang after link-loss in rare cases. The vendor driver does a full hw init on each
>>> link-up; maybe this is to work around the issue we are talking about here.
>>>
>>>>
>>>>>> [ 1488.643231] r8169 0000:03:00.0 enp3s0: Link is Down
>>>>>> [ 1506.576941] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>>>>> [ 1512.295215] ------------[ cut here ]------------
>>>>>> [ 1512.295219] NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out 5368 ms
>>
>> I am seeing the behavior again with the latest 6.6.21 kernel. Like last time, it
>> helps to manually shut down and bring up the interface with 'ip link set enp4s0
>> down/up'.
>>
> Is the latest 6.1 kernel ok?
> I'm not aware of any related change or other similar reports. Please bisect.
> 
In addition, it would be good to know whether it's the same with a current kernel: 6.8.1
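
When comparing kernel versions, the manual recovery described in the quoted
report (taking the interface down and bringing it up again) could be scripted so
that each test run is reset the same way. The following is only a minimal sketch,
not something from the driver or the report: the function names are hypothetical,
it assumes the iproute2 form `ip link set dev <ifname> down|up`, and it needs
root privileges to actually change the link state.

```python
import subprocess


def link_cycle_commands(ifname):
    """Return the two iproute2 commands that take an interface down, then up."""
    return [
        ["ip", "link", "set", "dev", ifname, "down"],
        ["ip", "link", "set", "dev", ifname, "up"],
    ]


def cycle_link(ifname, run=subprocess.run):
    """Run the down/up commands in order.

    `run` is injectable so the helper can be exercised without touching
    a real interface; by default it calls subprocess.run with check=True,
    which raises CalledProcessError if `ip` fails.
    """
    for cmd in link_cycle_commands(ifname):
        run(cmd, check=True)
```

For example, `cycle_link("enp4s0")` would issue both commands for the interface
named in the log below.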

>>
>> [243277.859725] r8169 0000:04:00.0 enp4s0: Link is Up - 1Gbps/Full - flow control off
>> [243283.400061] ------------[ cut here ]------------
>> [243283.400063] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out 5537 ms
>> [243283.400070] WARNING: CPU: 3 PID: 2909804 at net/sched/sch_generic.c:525 dev_watchdog+0x225/0x230
>> [243283.400073] Modules linked in: tls nfnetlink_queue nfnetlink_log bluetooth ecdh_generic ecc xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink iptable_nat xt_addrtype iptable_filter ip_tables x_tables bpfilter br_netfilter overlay authenc echainiv geniv crypto_null esp4 xfrm_interface xfrm6_tunnel tunnel4 tunnel6 cmac xfrm_user xfrm_algo nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 dns_resolver fscache netfs snd_seq_dummy snd_hrtimer snd_seq af_packet bridge stp llc cfg80211 rfkill nft_fib_ipv6 nft_nat nft_fib_ipv4 nft_fib nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables libcrc32c nls_ascii nls_cp437 vfat fat intel_rapl_msr coretemp intel_rapl_common x86_pkg_temp_thermal intel_powerclamp ofpart cmdlinepart snd_usb_audio spi_nor iTCO_wdt intel_pmc_bxt kvm_intel mei_wdt snd_usbmidi_lib iTCO_vendor_support mei_pxp mei_hdcp ee1004 mtd watchdog snd_hwdep r8169 snd_ump kvm snd_rawmidi realtek uvcvideo snd_seq_device snd_pcm mdio_devres
>> [243283.400109]  videobuf2_vmalloc irqbypass of_mdio uvc videobuf2_memops fixed_phy snd_timer videobuf2_v4l2 think_lmi intel_cstate rtsx_usb_ms fwnode_mdio intel_uncore memstick e1000e rtc_cmos videodev mei_me ftdi_sio firmware_attributes_class joydev snd videobuf2_common i2c_i801 ptp spi_intel_pci libphy intel_wmi_thunderbolt wmi_bmof mei mc tiny_power_button pps_core spi_intel i2c_smbus usbserial soundcore mousedev thermal fan input_leds int3400_thermal acpi_thermal_rel intel_pmc_core acpi_tad acpi_pad evdev button mac_hid sch_fq_codel msr loop fuse efi_pstore nfnetlink efivarfs dmi_sysfs dm_crypt trusted asn1_encoder tee ext4 crc32c_generic crc16 mbcache jbd2 hid_logitech_hidpp hid_logitech_dj hid_jabra hid_generic rtsx_usb_sdmmc mmc_core led_class usbhid hid rtsx_usb crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel i915 sha512_ssse3 sha256_ssse3 sha1_ssse3 xhci_pci ahci xhci_pci_renesas nvme libahci xhci_hcd nvme_core libata nvme_common i2c_algo_bit drm_buddy t10_pi ttm usbcore
>> [243283.400142]  crc64_rocksoft_generic aesni_intel drm_display_helper scsi_mod crc64_rocksoft crc_t10dif crct10dif_generic crct10dif_pclmul cec hwmon crypto_simd crc64 rc_core cryptd crct10dif_common usb_common scsi_common 8250 8250_base video serial_mctrl_gpio serial_base wmi backlight dm_mod dax
>> [243283.400150] CPU: 3 PID: 2909804 Comm: python3.11 Not tainted 6.6.21-gentoo-desktop-r1 #1
>> [243283.400152] Hardware name: LENOVO 30E30051UK/1052, BIOS S0AKT3EA 09/22/2023
>> [243283.400153] RIP: 0010:dev_watchdog+0x225/0x230
>> [243283.400154] Code: ff ff ff 48 89 ef c6 05 08 6c ee 00 01 e8 03 94 fa ff 45 89 f8 44 89 f1 48 89 ee 48 89 c2 48 c7 c7 98 c4 37 9a e8 8b 5c 79 ff <0f> 0b e9 27 ff ff ff 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90
>> [243283.400155] RSP: 0000:ffffc9000f3e3df8 EFLAGS: 00010292
>> [243283.400156] RAX: 0000000000000043 RBX: ffff88810b6f841c RCX: 0000000000000027
>> [243283.400157] RDX: ffff8890356e04c8 RSI: 0000000000000001 RDI: ffff8890356e04c0
>> [243283.400158] RBP: ffff88810b6f8000 R08: 0000000000000000 R09: ffffffff9a646ce0
>> [243283.400158] R10: ffffc9000f3e3cb0 R11: ffffffff9a726d28 R12: ffff88810b6f84c8
>> [243283.400159] R13: ffff88810b6e6800 R14: 0000000000000000 R15: 00000000000015a1
>> [243283.400159] FS:  00007fb111f89740(0000) GS:ffff8890356c0000(0000) knlGS:0000000000000000
>> [243283.400160] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [243283.400161] CR2: 00007fb1041fb030 CR3: 000000044b576002 CR4: 0000000000770ee0
>> [243283.400162] PKRU: 55555554
>> [243283.400162] Call Trace:
>> [243283.400163]  <TASK>
>> [243283.400164]  ? dev_watchdog+0x225/0x230
>> [243283.400165]  ? __warn+0x7c/0x130
>> [243283.400168]  ? dev_watchdog+0x225/0x230
>> [243283.400169]  ? report_bug+0x171/0x1a0
>> [243283.400172]  ? handle_bug+0x3a/0x70
>> [243283.400174]  ? exc_invalid_op+0x17/0x70
>> [243283.400175]  ? asm_exc_invalid_op+0x1a/0x20
>> [243283.400178]  ? dev_watchdog+0x225/0x230
>> [243283.400179]  ? dev_watchdog+0x225/0x230
>> [243283.400180]  ? __pfx_dev_watchdog+0x10/0x10
>> [243283.400181]  ? __pfx_dev_watchdog+0x10/0x10
>> [243283.400182]  call_timer_fn+0x1f/0x130
>> [243283.400184]  __run_timers.part.0+0x1bc/0x250
>> [243283.400186]  ? ktime_get+0x34/0xa0
>> [243283.400187]  run_timer_softirq+0x25/0x50
>> [243283.400188]  __do_softirq+0xbd/0x296
>> [243283.400190]  irq_exit_rcu+0x65/0x80
>> [243283.400191]  sysvec_apic_timer_interrupt+0x3e/0x90
>> [243283.400192]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
>> [243283.400194] RIP: 0033:0x7fb111bd3cd5
>> [243283.400195] Code: c8 48 8b 56 08 48 83 c6 08 48 85 d2 75 a7 48 85 c0 74 05 48 39 c3 75 09 49 8b 47 10 48 89 44 24 10 48 85 ed 0f 85 8e 00 00 00 <49> 8b 40 28 49 8b 4f 18 48 39 48 18 75 5d 49 8b 85 58 01 00 00 49
>> [243283.400196] RSP: 002b:00007ffd3916a9b0 EFLAGS: 00000246
>> [243283.400196] RAX: 00007fb111bd73c0 RBX: 000056339f603f78 RCX: 00007fb111f66aa8
>> [243283.400197] RDX: 0000000000000000 RSI: 00007fb111f04360 RDI: 00007fb111f66aa8
>> [243283.400197] RBP: 0000000000000000 R08: 00007fb10410c270 R09: 00007fb111f66aa0
>> [243283.400198] R10: 8d3a98eb5e44a685 R11: 1ffffffffffffffe R12: 00007ffd3916a9d4
>> [243283.400198] R13: 000056339f603eb0 R14: 00000000000000c8 R15: 00007fb111e0dcc0
>> [243283.400199]  </TASK>
>> [243283.400200] ---[ end trace 0000000000000000 ]---
>> [243283.400216] r8169 0000:04:00.0: can't disable ASPM; OS doesn't have ASPM control
>> [243295.067251] r8169 0000:04:00.0 enp4s0: Link is Down
>> [243297.620960] RTL8226 2.5Gbps PHY r8169-0-400:00: attached PHY driver (mii_bus:phy_addr=r8169-0-400:00, irq=MAC)
>> [243297.752106] r8169 0000:04:00.0 enp4s0: Link is Down
>> [243300.656125] r8169 0000:04:00.0 enp4s0: Link is Up - 1Gbps/Full - flow control off
> 

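When collecting logs from several kernel versions, the watchdog entries can be
picked out mechanically. The following is a rough sketch under stated assumptions:
the regular expression is modelled only on the "NETDEV WATCHDOG" lines quoted in
this thread, the function name is hypothetical, and the trailing "<n> ms" part is
treated as optional because not every kernel prints it.

```python
import re

# Pattern modelled on the lines quoted in this thread, e.g.
# "[243283.400063] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out 5537 ms"
WATCHDOG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NETDEV WATCHDOG: (?P<ifname>\S+) "
    r"\((?P<driver>[^)]+)\): transmit queue (?P<queue>\d+) timed out"
    r"(?: (?P<ms>\d+) ms)?"
)


def find_watchdog_events(log_text):
    """Return one dict per NETDEV WATCHDOG line found in a dmesg-style log.

    Quote prefixes such as ">> " are tolerated because the pattern is
    searched for anywhere in the line rather than anchored at the start.
    """
    events = []
    for line in log_text.splitlines():
        m = WATCHDOG_RE.search(line)
        if m is None:
            continue
        events.append({
            "timestamp": float(m.group("ts")),
            "ifname": m.group("ifname"),
            "driver": m.group("driver"),
            "queue": int(m.group("queue")),
            # None when the kernel did not print a timeout value
            "timeout_ms": int(m.group("ms")) if m.group("ms") else None,
        })
    return events
```

Feeding it the output of `dmesg` (or a saved log) returns the interface, driver,
queue and timeout for each event, which makes it easier to compare runs.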
