Message-ID: <20250701154423.1917c3de@onyx.my.domain>
Date: Tue, 1 Jul 2025 15:44:23 +0300
From: Timo Teras <timo.teras@....fi>
To: "Lifshits, Vitaly" <vitaly.lifshits@...el.com>
Cc: En-Wei WU <en-wei.wu@...onical.com>, Tony Nguyen
 <anthony.l.nguyen@...el.com>, Przemek Kitszel
 <przemyslaw.kitszel@...el.com>, Jesse Brandeburg
 <jesse.brandeburg@...el.com>, <netdev@...r.kernel.org>,
 <intel-wired-lan@...ts.osuosl.org>, <regressions@...ts.linux.dev>,
 <stable@...r.kernel.org>, <sashal@...nel.org>
Subject: Re: [Intel-wired-lan] [REGRESSION] Packet loss after hot-plugging
 ethernet cable on HP Zbook (Arrow Lake)

On Tue, 1 Jul 2025 14:46:18 +0300
"Lifshits, Vitaly" <vitaly.lifshits@...el.com> wrote:

> On 7/1/2025 8:31 AM, En-Wei WU wrote:
> > Hi,
> > 
> > I'm seeing a regression on an HP ZBook using the e1000e driver
> > (chipset PCI ID: [8086:57a0]) -- the system can't get an IP address
> > after hot-plugging an Ethernet cable. In this case, the Ethernet
> > cable was unplugged at boot. The network interface eno1 was present
> > but stuck in the DHCP process. Using tcpdump, only TX packets were
> > visible and never got any RX -- indicating a possible packet loss or
> > link-layer issue.
> > 
> > This is on the vanilla Linux 6.16-rc4 (commit
> > 62f224733431dbd564c4fe800d4b67a0cf92ed10).
> > 
> > Bisect says it's this commit:
> > 
> > commit efaaf344bc2917cbfa5997633bc18a05d3aed27f
> > Author: Vitaly Lifshits <vitaly.lifshits@...el.com>
> > Date:   Thu Mar 13 16:05:56 2025 +0200
> > 
> >      e1000e: change k1 configuration on MTP and later platforms
> > 
> >      Starting from Meteor Lake, the Kumeran interface between the
> >      integrated MAC and the I219 PHY works at a different frequency.
> >      This causes sporadic MDI errors when accessing the PHY, and in rare
> >      circumstances could lead to packet corruption.
> > 
> >      To overcome this, introduce minor changes to the Kumeran idle
> >      state (K1) parameters during device initialization. Hardware
> >      reset reverts this configuration, therefore it needs to be applied
> >      in a few places.
> > 
> >      Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
> >      Signed-off-by: Vitaly Lifshits <vitaly.lifshits@...el.com>
> >      Tested-by: Avigail Dahan <avigailx.dahan@...el.com>
> >      Signed-off-by: Tony Nguyen <anthony.l.nguyen@...el.com>
> > 
> >   drivers/net/ethernet/intel/e1000e/defines.h |  3 +++
> >   drivers/net/ethernet/intel/e1000e/ich8lan.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> >   drivers/net/ethernet/intel/e1000e/ich8lan.h |  4 ++++
> >   3 files changed, 82 insertions(+), 5 deletions(-)
> > 
> > Reverting this patch resolves the issue.
> > 
> > Based on the symptoms and the bisect result, this issue might be
> > similar to
> > https://lore.kernel.org/intel-wired-lan/20250626153544.1853d106@onyx.my.domain/
> > 
> > 
> > Affected machine is:
> > HP ZBook X G1i 16 inch Mobile Workstation PC, BIOS 01.02.03
> > 05/27/2025 (see end of message for dmesg from boot)
> > 
> > CPU model name:
> > Intel(R) Core(TM) Ultra 7 265H (Arrow Lake)
> > 
> > ethtool output:
> > driver: e1000e
> > version: 6.16.0-061600rc4-generic
> > firmware-version: 0.1-4
> > expansion-rom-version:
> > bus-info: 0000:00:1f.6
> > supports-statistics: yes
> > supports-test: yes
> > supports-eeprom-access: yes
> > supports-register-dump: yes
> > supports-priv-flags: yes
> > 
> > lspci output:
> > 0:1f.6 Ethernet controller [0200]: Intel Corporation Device [8086:57a0]
> >          DeviceName: Onboard Ethernet
> >          Subsystem: Hewlett-Packard Company Device [103c:8e1d]
> >          Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> >          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >          Latency: 0
> >          Interrupt: pin D routed to IRQ 162
> >          IOMMU group: 17
> >          Region 0: Memory at 92280000 (32-bit, non-prefetchable) [size=128K]
> >          Capabilities: [c8] Power Management version 3
> >                  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> >                  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> >          Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> >                  Address: 00000000fee00798  Data: 0000
> >          Kernel driver in use: e1000e
> >          Kernel modules: e1000e
> > 
> > The relevant dmesg:
> > <<<cable disconnected>>>
> > 
> > [    0.927394] e1000e: Intel(R) PRO/1000 Network Driver
> > [    0.927398] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> > [    0.927933] e1000e 0000:00:1f.6: enabling device (0000 -> 0002)
> > [    0.928249] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
> > [    1.155716] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
> > [    1.220694] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 24:fb:e3:bf:28:c6
> > [    1.220721] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
> > [    1.220903] e1000e 0000:00:1f.6 eth0: MAC: 16, PHY: 12, PBA No: FFFFFF-0FF
> > [    1.222632] e1000e 0000:00:1f.6 eno1: renamed from eth0
> > 
> > <<<cable connected>>>
> > 
> > [  153.932626] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
> > [  153.934527] e1000e 0000:00:1f.6 eno1: NIC Link is Down
> > [  157.622238] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> > 
> > No error message seen after hot-plugging the Ethernet cable.
> >   
> 
> Thank you for the report.
> 
> We did not encounter this issue during our patch testing. However, we 
> will attempt to reproduce it in our lab.
> 
> One detail that caught my attention is that flow control is disabled
> in both scenarios. Could you please check whether the issue persists
> when flow control is enabled? This might require connecting to a link
> partner that supports flow control.

I wrote the other, similar report for a Dell Pro that was referenced
earlier. Additional testing on the Dell provided the following insight:

- A quick unplug/replug works fine. The cable has to stay disconnected
  for 10-15 seconds for the issue to trigger.

- Sometimes the first spurious link up is 1000 Mbps/half and sometimes
  10 Mbps/half.

- Running ethtool -r to renegotiate the link makes things work again
  once the interface is in the broken state.

And yes, my issue seems to be exactly the same.
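
For anyone trying to confirm the same pattern on their own machine, here
is a minimal sketch that scans dmesg text for the signature seen in both
reports: a half-duplex "NIC Link is Up" followed almost immediately by
"NIC Link is Down". The function names and the max_gap threshold are
illustrative assumptions, not part of the driver or of any existing tool;
the sample lines are taken from the quoted report above.

```python
import re

# Matches e1000e link-state lines in the form shown in the quoted dmesg.
LINK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+e1000e\s+\S+\s+(?P<ifname>\S+):\s+"
    r"NIC Link is\s+(?P<state>Up|Down)"
    r"(?:\s+(?P<speed>\d+)\s+Mbps\s+(?P<duplex>Half|Full)\s+Duplex)?"
)


def parse_link_events(log_text):
    """Return (timestamp, ifname, state, speed, duplex) tuples in log order."""
    events = []
    for m in LINK_RE.finditer(log_text):
        speed = int(m.group("speed")) if m.group("speed") else None
        events.append(
            (float(m.group("ts")), m.group("ifname"), m.group("state"),
             speed, m.group("duplex"))
        )
    return events


def find_spurious_link_ups(events, max_gap=1.0):
    """Return half-duplex link-ups followed by a link-down on the same
    interface within max_gap seconds (max_gap is an assumed threshold)."""
    hits = []
    for prev, cur in zip(events, events[1:]):
        same_if = prev[1] == cur[1]
        if (same_if and prev[2] == "Up" and prev[4] == "Half"
                and cur[2] == "Down" and cur[0] - prev[0] <= max_gap):
            hits.append(prev)
    return hits


# Link messages copied from the report quoted above (unwrapped).
SAMPLE = """\
[  153.932626] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
[  153.934527] e1000e 0000:00:1f.6 eno1: NIC Link is Down
[  157.622238] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
"""

if __name__ == "__main__":
    for ts, ifname, _state, speed, duplex in find_spurious_link_ups(
            parse_link_events(SAMPLE)):
        print(f"{ifname}: spurious link up at {ts:.6f}s ({speed} Mbps/{duplex})")
```

The check keys only on duplex and timing, so it should also catch the
10 Mbps/half variant. If it fires and RX stays dead, ethtool -r on the
affected interface is the workaround described above.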

Thanks,
Timo
