Message-ID: <CAMqyJG1bt_p9trGrtk-_xqfEF954TrSKoV6QokuadJK8ga80xA@mail.gmail.com>
Date: Fri, 4 Jul 2025 15:15:02 +0800
From: En-Wei WU <en-wei.wu@...onical.com>
To: Timo Teras <timo.teras@....fi>
Cc: "Lifshits, Vitaly" <vitaly.lifshits@...el.com>, Tony Nguyen <anthony.l.nguyen@...el.com>, 
	Przemek Kitszel <przemyslaw.kitszel@...el.com>, 
	Jesse Brandeburg <jesse.brandeburg@...el.com>, netdev@...r.kernel.org, 
	intel-wired-lan@...ts.osuosl.org, regressions@...ts.linux.dev, 
	stable@...r.kernel.org, sashal@...nel.org
Subject: Re: [Intel-wired-lan] [REGRESSION] Packet loss after hot-plugging
 ethernet cable on HP Zbook (Arrow Lake)

Thank you all for your quick response. Sorry for the delay.

I ran two independent tests:

1. The same experiment Timo described: when the packet-loss problem
occurs (after hot-plugging the Ethernet cable), running the following
command fixes the issue:
$ ethtool -r eno1 # trigger a re-negotiation

2. As Vitaly suggested: with flow control enabled, we no longer observe
any packet loss. The link then comes up as:
e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
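
To compare runs without reading dmesg by eye, the link-up messages can be
pulled out with a small script. This is only a sketch: it assumes the e1000e
message format is exactly as quoted in this thread, and the names
parse_link_events and LINK_UP are made up for illustration. It expects raw
dmesg text, not mail-quoted text with "> " prefixes.

```python
import re

# Pattern for e1000e link-up messages, assuming the wording seen in this
# thread ("NIC Link is Up <speed> Mbps <Full|Half> Duplex, Flow Control: <x>").
LINK_UP = re.compile(
    r"NIC Link is Up (?P<speed>\d+) Mbps (?P<duplex>Full|Half) Duplex, "
    r"Flow Control: (?P<fc>\S+)"
)


def parse_link_events(log_text):
    """Return one dict per 'NIC Link is Up' message found in log_text."""
    # Log excerpts pasted into mail are often wrapped mid-message, so
    # collapse all whitespace (including newlines) before matching.
    flat = " ".join(log_text.split())
    return [
        {
            "speed": int(m.group("speed")),
            "duplex": m.group("duplex"),
            "flow_control": m.group("fc"),
        }
        for m in LINK_UP.finditer(flat)
    ]


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 this_script.py
    for event in parse_link_events(sys.stdin.read()):
        print(event)
```

Fed the hot-plug log from the original report, this should list the
spurious half-duplex link-up first and the full-duplex one second, which
makes it easy to see whether a given run negotiated flow control.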


From the power management perspective, I can confirm that the Ethernet
controller stays in D0 at all times. But I'm not sure whether that is
also the case for the PHY, as I'm not familiar with how to check the
power state of a PHY.
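
For the controller side, one way to log the state repeatedly while
reproducing is to read the PCI sysfs attributes. The sketch below assumes
the kernel exposes power_state and power/runtime_status under
/sys/bus/pci/devices/<BDF>/; the helper name read_pci_power_info is
hypothetical. It only covers the PCI function and says nothing about the
PHY.

```python
from pathlib import Path


def read_pci_power_info(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Read the PCI power state and runtime PM status for one PCI function.

    bdf is the full address, e.g. "0000:00:1f.6". Attributes that are
    missing or unreadable are reported as None rather than raising.
    """
    dev = Path(sysfs_root) / bdf
    attrs = {
        "power_state": "power_state",              # e.g. "D0", "D3hot"
        "runtime_status": "power/runtime_status",  # e.g. "active", "suspended"
    }
    info = {}
    for key, rel in attrs.items():
        try:
            info[key] = (dev / rel).read_text().strip()
        except OSError:
            info[key] = None
    return info


if __name__ == "__main__":
    # BDF taken from the ethtool/lspci output in the original report.
    print(read_pci_power_info("0000:00:1f.6"))
```

Calling this in a loop around the cable hot-plug would show whether the
controller ever leaves D0 during the failing window.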

Thanks,
En-Wei.

On Tue, 1 Jul 2025 at 20:44, Timo Teras <timo.teras@....fi> wrote:
>
> On Tue, 1 Jul 2025 14:46:18 +0300
> "Lifshits, Vitaly" <vitaly.lifshits@...el.com> wrote:
>
> > On 7/1/2025 8:31 AM, En-Wei WU wrote:
> > > Hi,
> > >
> > > I'm seeing a regression on an HP ZBook using the e1000e driver
> > > (chipset PCI ID: [8086:57a0]) -- the system can't get an IP address
> > > after hot-plugging an Ethernet cable. In this case, the Ethernet
> > > cable was unplugged at boot. The network interface eno1 was present
> > > but stuck in the DHCP process. Using tcpdump, only TX packets were
> > > visible and never got any RX -- indicating a possible packet loss or
> > > link-layer issue.
> > >
> > > This is on the vanilla Linux 6.16-rc4 (commit
> > > 62f224733431dbd564c4fe800d4b67a0cf92ed10).
> > >
> > > Bisect says it's this commit:
> > >
> > > commit efaaf344bc2917cbfa5997633bc18a05d3aed27f
> > > Author: Vitaly Lifshits <vitaly.lifshits@...el.com>
> > > Date:   Thu Mar 13 16:05:56 2025 +0200
> > >
> > >      e1000e: change k1 configuration on MTP and later platforms
> > >
> > > >      Starting from Meteor Lake, the Kumeran interface between the
> > > >      integrated MAC and the I219 PHY works at a different frequency.
> > > >      This causes sporadic MDI errors when accessing the PHY, and in
> > > >      rare circumstances could lead to packet corruption.
> > > >
> > > >      To overcome this, introduce minor changes to the Kumeran idle
> > > >      state (K1) parameters during device initialization. Hardware
> > > >      reset reverts this configuration, therefore it needs to be
> > > >      applied in a few places.
> > >
> > >      Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
> > >      Signed-off-by: Vitaly Lifshits <vitaly.lifshits@...el.com>
> > >      Tested-by: Avigail Dahan <avigailx.dahan@...el.com>
> > >      Signed-off-by: Tony Nguyen <anthony.l.nguyen@...el.com>
> > >
> > >   drivers/net/ethernet/intel/e1000e/defines.h |  3 +++
> > >   drivers/net/ethernet/intel/e1000e/ich8lan.c | 80
> > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> > >   drivers/net/ethernet/intel/e1000e/ich8lan.h |  4 ++++
> > >   3 files changed, 82 insertions(+), 5 deletions(-)
> > >
> > > Reverting this patch resolves the issue.
> > >
> > > Based on the symptoms and the bisect result, this issue might be
> > > similar to
> > > https://lore.kernel.org/intel-wired-lan/20250626153544.1853d106@onyx.my.domain/
> > >
> > >
> > > Affected machine is:
> > > HP ZBook X G1i 16 inch Mobile Workstation PC, BIOS 01.02.03
> > > 05/27/2025 (see end of message for dmesg from boot)
> > >
> > > CPU model name:
> > > Intel(R) Core(TM) Ultra 7 265H (Arrow Lake)
> > >
> > > ethtool output:
> > > driver: e1000e
> > > version: 6.16.0-061600rc4-generic
> > > firmware-version: 0.1-4
> > > expansion-rom-version:
> > > bus-info: 0000:00:1f.6
> > > supports-statistics: yes
> > > supports-test: yes
> > > supports-eeprom-access: yes
> > > supports-register-dump: yes
> > > supports-priv-flags: yes
> > >
> > > lspci output:
> > > > 0:1f.6 Ethernet controller [0200]: Intel Corporation Device [8086:57a0]
> > > >          DeviceName: Onboard Ethernet
> > > >          Subsystem: Hewlett-Packard Company Device [103c:8e1d]
> > > >          Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> > > >          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > >          Latency: 0
> > > >          Interrupt: pin D routed to IRQ 162
> > > >          IOMMU group: 17
> > > >          Region 0: Memory at 92280000 (32-bit, non-prefetchable) [size=128K]
> > > >          Capabilities: [c8] Power Management version 3
> > > >                  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > >                  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> > > >          Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> > > >                  Address: 00000000fee00798  Data: 0000
> > > >          Kernel driver in use: e1000e
> > > >          Kernel modules: e1000e
> > >
> > > The relevant dmesg:
> > > <<<cable disconnected>>>
> > >
> > > [    0.927394] e1000e: Intel(R) PRO/1000 Network Driver
> > > [    0.927398] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> > > [    0.927933] e1000e 0000:00:1f.6: enabling device (0000 -> 0002)
> > > [    0.928249] e1000e 0000:00:1f.6: Interrupt Throttling Rate
> > > (ints/sec) set to dynamic conservative mode
> > > [    1.155716] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized):
> > > registered PHC clock
> > > [    1.220694] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width
> > > x1) 24:fb:e3:bf:28:c6
> > > [    1.220721] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network
> > > Connection [    1.220903] e1000e 0000:00:1f.6 eth0: MAC: 16, PHY:
> > > 12, PBA No: FFFFFF-0FF [    1.222632] e1000e 0000:00:1f.6 eno1:
> > > renamed from eth0
> > >
> > > <<<cable connected>>>
> > >
> > > [  153.932626] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps
> > > Half Duplex, Flow Control: None
> > > [  153.934527] e1000e 0000:00:1f.6 eno1: NIC Link is Down
> > > [  157.622238] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps
> > > Full Duplex, Flow Control: None
> > >
> > > No error message seen after hot-plugging the Ethernet cable.
> > >
> >
> > Thank you for the report.
> >
> > We did not encounter this issue during our patch testing. However, we
> > will attempt to reproduce it in our lab.
> >
> > One detail that caught my attention is that flow control is disabled
> > in both scenarios. Could you please check whether the issue persists
> > when flow control is enabled? This might require connecting to a link
> > partner that supports flow control.
>
> I wrote the other similar report from Dell Pro referenced earlier.
> Additional testing on the Dell provided the following insight:
>
> - A fast cable out/in will work. The cable should be disconnected
>   for 10-15 seconds for the issue to trigger.
>
> - Sometimes the first spurious link up is 1000 Mbps/half and sometimes
>   10 Mbps/half.
>
> - Using ethtool -r to renegotiate the link will make things work in
>   the defunct state.
>
> And yes, my issue seems to be exactly the same.
>
> Thanks,
> Timo
