Message-ID: <3023fe74-29c7-4a41-b805-c6b00fb0b3cc@intel.com>
Date: Tue, 1 Jul 2025 14:46:18 +0300
From: "Lifshits, Vitaly" <vitaly.lifshits@...el.com>
To: En-Wei WU <en-wei.wu@...onical.com>, Tony Nguyen
	<anthony.l.nguyen@...el.com>, Przemek Kitszel <przemyslaw.kitszel@...el.com>,
	Jesse Brandeburg <jesse.brandeburg@...el.com>, <netdev@...r.kernel.org>,
	<intel-wired-lan@...ts.osuosl.org>
CC: <regressions@...ts.linux.dev>, <stable@...r.kernel.org>,
	<sashal@...nel.org>
Subject: Re: [REGRESSION] Packet loss after hot-plugging ethernet cable on HP
 Zbook (Arrow Lake)

On 7/1/2025 8:31 AM, En-Wei WU wrote:
> Hi,
> 
> I'm seeing a regression on an HP ZBook using the e1000e driver
> (chipset PCI ID: [8086:57a0]) -- the system can't get an IP address
> after hot-plugging an Ethernet cable. In this case, the Ethernet cable
> was unplugged at boot. The network interface eno1 was present but
> stuck in the DHCP process. Using tcpdump, only TX packets were visible
> and never got any RX -- indicating a possible packet loss or
> link-layer issue.
> 
> This is on the vanilla Linux 6.16-rc4 (commit
> 62f224733431dbd564c4fe800d4b67a0cf92ed10).
> 
> Bisect says it's this commit:
> 
> commit efaaf344bc2917cbfa5997633bc18a05d3aed27f
> Author: Vitaly Lifshits <vitaly.lifshits@...el.com>
> Date:   Thu Mar 13 16:05:56 2025 +0200
> 
>      e1000e: change k1 configuration on MTP and later platforms
> 
>      Starting from Meteor Lake, the Kumeran interface between the integrated
>      MAC and the I219 PHY works at a different frequency. This causes sporadic
>      MDI errors when accessing the PHY, and in rare circumstances could lead
>      to packet corruption.
> 
>      To overcome this, introduce minor changes to the Kumeran idle
>      state (K1) parameters during device initialization. Hardware reset
>      reverts this configuration, therefore it needs to be applied in a few
>      places.
> 
>      Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
>      Signed-off-by: Vitaly Lifshits <vitaly.lifshits@...el.com>
>      Tested-by: Avigail Dahan <avigailx.dahan@...el.com>
>      Signed-off-by: Tony Nguyen <anthony.l.nguyen@...el.com>
> 
>   drivers/net/ethernet/intel/e1000e/defines.h |  3 +++
>   drivers/net/ethernet/intel/e1000e/ich8lan.c | 80
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
>   drivers/net/ethernet/intel/e1000e/ich8lan.h |  4 ++++
>   3 files changed, 82 insertions(+), 5 deletions(-)
> 
> Reverting this patch resolves the issue.
> 
> Based on the symptoms and the bisect result, this issue might be
> similar to https://lore.kernel.org/intel-wired-lan/20250626153544.1853d106@onyx.my.domain/
> 
> 
> Affected machine is:
> HP ZBook X G1i 16 inch Mobile Workstation PC, BIOS 01.02.03 05/27/2025
> (see end of message for dmesg from boot)
> 
> CPU model name:
> Intel(R) Core(TM) Ultra 7 265H (Arrow Lake)
> 
> ethtool output:
> driver: e1000e
> version: 6.16.0-061600rc4-generic
> firmware-version: 0.1-4
> expansion-rom-version:
> bus-info: 0000:00:1f.6
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> 
> lspci output:
> 0:1f.6 Ethernet controller [0200]: Intel Corporation Device [8086:57a0]
>          DeviceName: Onboard Ethernet
>          Subsystem: Hewlett-Packard Company Device [103c:8e1d]
>          Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx+
>          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>          Latency: 0
>          Interrupt: pin D routed to IRQ 162
>          IOMMU group: 17
>          Region 0: Memory at 92280000 (32-bit, non-prefetchable) [size=128K]
>          Capabilities: [c8] Power Management version 3
>                  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>          Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>                  Address: 00000000fee00798  Data: 0000
>          Kernel driver in use: e1000e
>          Kernel modules: e1000e
> 
> The relevant dmesg:
> <<<cable disconnected>>>
> 
> [    0.927394] e1000e: Intel(R) PRO/1000 Network Driver
> [    0.927398] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> [    0.927933] e1000e 0000:00:1f.6: enabling device (0000 -> 0002)
> [    0.928249] e1000e 0000:00:1f.6: Interrupt Throttling Rate
> (ints/sec) set to dynamic conservative mode
> [    1.155716] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized):
> registered PHC clock
> [    1.220694] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width
> x1) 24:fb:e3:bf:28:c6
> [    1.220721] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
> [    1.220903] e1000e 0000:00:1f.6 eth0: MAC: 16, PHY: 12, PBA No: FFFFFF-0FF
> [    1.222632] e1000e 0000:00:1f.6 eno1: renamed from eth0
> 
> <<<cable connected>>>
> 
> [  153.932626] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Half
> Duplex, Flow Control: None
> [  153.934527] e1000e 0000:00:1f.6 eno1: NIC Link is Down
> [  157.622238] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full
> Duplex, Flow Control: None
> 
> No error message seen after hot-plugging the Ethernet cable.
> 

Thank you for the report.

We did not encounter this issue during our patch testing. However, we 
will attempt to reproduce it in our lab.

One detail that caught my attention is that flow control is disabled in 
both scenarios. Could you please check whether the issue persists when 
flow control is enabled? This might require connecting to a link partner 
that supports flow control.
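
When comparing runs, the negotiated flow control state can be read straight from the link-up messages in dmesg. The following is only a rough sketch, not part of the driver or of any Intel tool: the function name `parse_link_events` and the regular expression are assumptions based on the message format quoted above, and lines that the mail client wrapped must be joined back together first. The current pause settings can also be read with `ethtool -a eno1`.

```python
import re

# Pattern assumed from the e1000e "NIC Link is Up" lines quoted in this thread.
LINK_UP = re.compile(
    r"(?P<iface>\S+): NIC Link is Up (?P<speed>\d+) Mbps "
    r"(?P<duplex>Full|Half) Duplex, Flow Control: (?P<fc>[^\n]+)"
)


def parse_link_events(log_text):
    """Return one dict per link-up message found in log_text (hypothetical helper)."""
    events = []
    for m in LINK_UP.finditer(log_text):
        events.append({
            "iface": m.group("iface"),
            "speed_mbps": int(m.group("speed")),
            "duplex": m.group("duplex"),
            "flow_control": m.group("fc").strip(),
        })
    return events


# The two link-up lines from the report, with the mail wrapping undone.
sample = (
    "[  153.932626] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps "
    "Half Duplex, Flow Control: None\n"
    "[  153.934527] e1000e 0000:00:1f.6 eno1: NIC Link is Down\n"
    "[  157.622238] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps "
    "Full Duplex, Flow Control: None\n"
)

for event in parse_link_events(sample):
    print(event)
```

If the link partner advertises pause frames, the "Flow Control" field in a new link-up line should report something other than "None", which makes it easy to confirm which case a given test run covered.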

