Date: Sun, 26 Nov 2023 11:35:22 +0100
From: Heiner Kallweit <hkallweit1@...il.com>
To: Gregor Mlakar <turok256@...il.com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: Linux kernel 6.6.2: Dragon RTL8125BG network card stopped working

On 26.11.2023 02:46, Gregor Mlakar wrote:
> Hello,
> 
> The network card (Dragon RTL8125BG) on my motherboard (B650E Steel Legend WiFi) has stopped working on Arch Linux with Linux kernel 6.6.2 (both the normal and zen kernels). If I revert to kernel 6.6.1, it works fine. When I try to reboot, the PC gets stuck at a line saying "watchdog did not stop!".
> 
> Motherboard:
> https://www.asrock.com/mb/AMD/B650E%20Steel%20Legend%20WiFi/index.asp#Specification
> 
> dmesg (the last part with call trace keeps repeating every 122s):
> 
>     [    7.612105] r8169 0000:09:00.0 eth0: RTL8125B, xx:xx:xx:xx:xx:xx, XID 641, IRQ 116
>     [    7.612109] r8169 0000:09:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
>     [    7.659150] r8169 0000:09:00.0 enp9s0: renamed from eth0
>     [    7.708638] cryptd: max_cpu_qlen set to 1000
>     [    7.726830] Bluetooth: Core ver 2.22
>     [    7.726844] NET: Registered PF_BLUETOOTH protocol family
>     [    7.726846] Bluetooth: HCI device and connection manager initialized
>     [    7.726848] Bluetooth: HCI socket layer initialized
>     [    7.726850] Bluetooth: L2CAP socket layer initialized
>     [    7.726853] Bluetooth: SCO socket layer initialized
>     [    7.726939] mc: Linux media interface: v0.10
>     [    7.730916] AVX2 version of gcm_enc/dec engaged.
>     [    7.730959] AES CTR mode by8 optimization enabled
>     [    7.741154] usbcore: registered new interface driver btusb
>     [    7.752863] Bluetooth: hci0: HW/SW Version: 0x008a008a, Build Time: xxxxxxxxxxxxxx
>     [    7.829804] kvm_amd: TSC scaling supported
>     [    7.829806] kvm_amd: Nested Virtualization enabled
>     [    7.829807] kvm_amd: Nested Paging enabled
>     [    7.829813] kvm_amd: Virtual VMLOAD VMSAVE supported
>     [    7.829813] kvm_amd: Virtual GIF supported
>     [    7.829814] kvm_amd: Virtual NMI enabled
>     [    7.829814] kvm_amd: LBR virtualization supported
>     [    7.837383] MCE: In-kernel MCE decoding enabled.
>     [    7.925523] intel_rapl_common: Found RAPL domain package
>     [    7.925525] intel_rapl_common: Found RAPL domain core
>     [    8.164594] usbcore: registered new interface driver snd-usb-audio
>     [    8.274455] cfg80211: Loading compiled-in X.509 certificates for regulatory database
>     [    8.274596] Loaded X.509 cert 'sforshee: xxxxxxxxxxxxxxxxxx'
>     [    8.274694] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
>     [    8.274697] cfg80211: failed to load regulatory.db
>     [    8.310577] RTL8226B_RTL8221B 2.5Gbps PHY r8169-0-900:00: attached PHY driver (mii_bus:phy_addr=r8169-0-900:00, irq=MAC)
>     [   29.331343] Bluetooth: hci0: Device setup in 21084167 usecs
>     [   29.331347] Bluetooth: hci0: HCI Enhanced Setup Synchronous Connection command is advertised, but not supported.
>     [   29.604845] Bluetooth: hci0: AOSP extensions version v1.00
>     [   29.604847] Bluetooth: hci0: AOSP quality report is supported
>     [  198.084608] firefox[969]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
>     [  245.487028] INFO: task kworker/u66:4:261 blocked for more than 122 seconds.
>     [  245.487033]       Not tainted 6.6.2-arch1-1 #1
>     [  245.487034] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>     [  245.487035] task:kworker/u66:4   state:D stack:0     pid:261   ppid:2      flags:0x00004000
>     [  245.487039] Workqueue: events_power_efficient phy_state_machine [libphy]
>     [  245.487051] Call Trace:
>     [  245.487052]  <TASK>
>     [  245.487054]  __schedule+0x3e8/0x1410
>     [  245.487058]  ? sysvec_apic_timer_interrupt+0xe/0x90
>     [  245.487063]  schedule+0x5e/0xd0
>     [  245.487065]  schedule_preempt_disabled+0x15/0x30
>     [  245.487067]  __mutex_lock.constprop.0+0x39a/0x6a0
>     [  245.487071]  phy_start_aneg+0x1d/0x40 [libphy 93248cd1d88abf54f1b4cc64a990177f549a7710]
>     [  245.487081]  rtl_reset_work+0x1bd/0x3b0 [r8169 08653ab60f23923c3943d53f140b2b697e265b93]
>     [  245.487087]  r8169_phylink_handler+0x5b/0x240 [r8169 08653ab60f23923c3943d53f140b2b697e265b93]
>     [  245.487091]  phy_link_change+0x2e/0x60 [libphy 93248cd1d88abf54f1b4cc64a990177f549a7710]
>     [  245.487101]  phy_check_link_status+0xad/0xe0 [libphy 93248cd1d88abf54f1b4cc64a990177f549a7710]
>     [  245.487110]  phy_state_machine+0x80/0x2c0 [libphy 93248cd1d88abf54f1b4cc64a990177f549a7710]
>     [  245.487119]  process_one_work+0x171/0x340
>     [  245.487123]  worker_thread+0x27b/0x3a0
>     [  245.487125]  ? __pfx_worker_thread+0x10/0x10
>     [  245.487126]  kthread+0xe5/0x120
>     [  245.487129]  ? __pfx_kthread+0x10/0x10
>     [  245.487131]  ret_from_fork+0x31/0x50
>     [  245.487134]  ? __pfx_kthread+0x10/0x10
>     [  245.487135]  ret_from_fork_asm+0x1b/0x30
>     [  245.487141]  </TASK>
> 
> 
> lspci:
> 
>     09:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
>     Subsystem: ASRock Incorporation RTL8125 2.5GbE Controller
>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>     Latency: 0, Cache Line Size: 64 bytes
>     Interrupt: pin A routed to IRQ 40
>     IOMMU group: 1
>     Region 0: I/O ports at e000 [size=256]
>     Region 2: Memory at fca00000 (64-bit, non-prefetchable) [size=64K]
>     Region 4: Memory at fca10000 (64-bit, non-prefetchable) [size=16K]
>     Capabilities: [40] Power Management version 3
>     Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
>     Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>     Address: 0000000000000000  Data: 0000
>     Masking: 00000000  Pending: 00000000
>     Capabilities: [70] Express (v2) Endpoint, MSI 01
>     DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>     ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 26W
>     DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
>     RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>     MaxPayload 256 bytes, MaxReadReq 4096 bytes
>     DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>     LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
>     ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>     LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>     ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>     LnkSta: Speed 5GT/s, Width x1
>     TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>     DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
>     10BitTagComp- 10BitTagReq- OBFF Via message/WAKE#, ExtFmt- EETLPPrefix-
>     EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
>     FRS- TPHComp+ ExtTPHComp-
>     AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>     DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
>     AtomicOpsCtl: ReqEn-
>     LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
>     LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
>     Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>     Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
>     LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
>     EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
>     Retimer- 2Retimers- CrosslinkRes: unsupported
>     Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
>     Vector table: BAR=4 offset=00000000
>     PBA: BAR=4 offset=00000800
>     Capabilities: [d0] Vital Product Data
>     pcilib: sysfs_read_vpd: read failed: No such device
>     Not readable
>     Capabilities: [100 v2] Advanced Error Reporting
>     UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>     UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>     UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>     CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
>     CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>     AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
>     MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>     HeaderLog: 00000000 00000000 00000000 00000000
>     Capabilities: [148 v1] Virtual Channel
>     Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>     Arb: Fixed- WRR32- WRR64- WRR128-
>     Ctrl: ArbSelect=Fixed
>     Status: InProgress-
>     VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>     Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>     Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>     Status: NegoPending- InProgress-
>     Capabilities: [168 v1] Device Serial Number xx-xx-xx-xx-xx-xx-xx-xx
>     Capabilities: [178 v1] Transaction Processing Hints
>     No steering table available
>     Capabilities: [204 v1] Latency Tolerance Reporting
>     Max snoop latency: 0ns
>     Max no snoop latency: 0ns
>     Capabilities: [20c v1] L1 PM Substates
>     L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>      PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
>     L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>       T_CommonMode=0us LTR1.2_Threshold=306176ns
>     L1SubCtl2: T_PwrOn=150us
>     Capabilities: [21c v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
>     Kernel driver in use: r8169
>     Kernel modules: r8169
> 
> 
> Best regards,
> Gregor Mlakar


Thanks for the report. A very similar, or even the same, issue has already been reported.
Are you using a jumbo MTU?
Could you please test whether the following patch fixes the issue for you?

---
 drivers/net/ethernet/realtek/r8169_main.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 0aed99a20..e32cc3279 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -575,6 +575,7 @@ struct rtl8169_tc_offsets {
 enum rtl_flag {
 	RTL_FLAG_TASK_ENABLED = 0,
 	RTL_FLAG_TASK_RESET_PENDING,
+	RTL_FLAG_TASK_RESET_NO_QUEUE_WAKE,
 	RTL_FLAG_TASK_TX_TIMEOUT,
 	RTL_FLAG_MAX
 };
@@ -4494,6 +4495,8 @@ static void rtl_task(struct work_struct *work)
 reset:
 		rtl_reset_work(tp);
 		netif_wake_queue(tp->dev);
+	} else if (test_and_clear_bit(RTL_FLAG_TASK_RESET_NO_QUEUE_WAKE, tp->wk.flags)) {
+		rtl_reset_work(tp);
 	}
 out_unlock:
 	rtnl_unlock();
@@ -4527,7 +4530,7 @@ static void r8169_phylink_handler(struct net_device *ndev)
 	} else {
 		/* In few cases rx is broken after link-down otherwise */
 		if (rtl_is_8125(tp))
-			rtl_reset_work(tp);
+			rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_NO_QUEUE_WAKE);
 		pm_runtime_idle(d);
 	}
 
@@ -4603,7 +4606,7 @@ static int rtl8169_close(struct net_device *dev)
 	rtl8169_down(tp);
 	rtl8169_rx_clear(tp);
 
-	cancel_work_sync(&tp->wk.work);
+	cancel_work(&tp->wk.work);
 
 	free_irq(tp->irq, tp);
 
-- 
2.43.0
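
Note on the approach: the call trace in the report shows phy_state_machine
calling down through r8169_phylink_handler and rtl_reset_work into
phy_start_aneg, which then blocks in __mutex_lock. One plausible reading,
not confirmed anywhere in this message, is that the reset tries to take a
lock its caller already holds. The patch avoids that by having the link
handler only schedule the reset and letting rtl_task run it later. The
following is a minimal single-threaded userspace sketch of that
"record a flag now, do the work later" pattern. All names (toy_dev,
toy_lock, toy_reset, RESET_PENDING and so on) are hypothetical and are not
the driver's code; the boolean lock merely stands in for a non-recursive
mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bit; the name is hypothetical, not the driver's. */
enum { RESET_PENDING = 1u << 0 };

struct toy_dev {
    bool     phy_locked;  /* stands in for a non-recursive PHY lock */
    unsigned flags;       /* pending-work bits */
    int      resets;      /* number of completed resets */
};

/* Try to take the toy lock; fails if it is already held. */
static bool toy_lock(struct toy_dev *d)
{
    if (d->phy_locked)
        return false;     /* a real mutex would block here instead */
    d->phy_locked = true;
    return true;
}

static void toy_unlock(struct toy_dev *d)
{
    assert(d->phy_locked);
    d->phy_locked = false;
}

/* The reset needs the lock for part of its work. */
static bool toy_reset(struct toy_dev *d)
{
    if (!toy_lock(d))
        return false;
    d->resets++;
    toy_unlock(d);
    return true;
}

/* Link-change callback; the caller holds the lock while invoking it. */
static bool toy_link_down(struct toy_dev *d, bool defer)
{
    if (!defer)
        return toy_reset(d);    /* direct call: cannot get the lock */
    d->flags |= RESET_PENDING;  /* deferred: only record the request */
    return true;
}

/* Deferred task; runs later, when the lock is no longer held. */
static void toy_task(struct toy_dev *d)
{
    if (d->flags & RESET_PENDING) {  /* plain test-and-clear, single-threaded */
        d->flags &= ~(unsigned)RESET_PENDING;
        toy_reset(d);
    }
}
```

With the lock held, toy_link_down(&d, false) fails to reset, while
toy_link_down(&d, true) followed by toy_unlock(&d) and toy_task(&d)
performs exactly one reset. A real driver would use atomic bit operations
and a workqueue rather than plain fields and a direct call.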



