Message-ID: <20250701154423.1917c3de@onyx.my.domain>
Date: Tue, 1 Jul 2025 15:44:23 +0300
From: Timo Teras <timo.teras@....fi>
To: "Lifshits, Vitaly" <vitaly.lifshits@...el.com>
Cc: En-Wei WU <en-wei.wu@...onical.com>, Tony Nguyen
<anthony.l.nguyen@...el.com>, Przemek Kitszel
<przemyslaw.kitszel@...el.com>, Jesse Brandeburg
<jesse.brandeburg@...el.com>, <netdev@...r.kernel.org>,
<intel-wired-lan@...ts.osuosl.org>, <regressions@...ts.linux.dev>,
<stable@...r.kernel.org>, <sashal@...nel.org>
Subject: Re: [Intel-wired-lan] [REGRESSION] Packet loss after hot-plugging
ethernet cable on HP Zbook (Arrow Lake)
On Tue, 1 Jul 2025 14:46:18 +0300
"Lifshits, Vitaly" <vitaly.lifshits@...el.com> wrote:
> On 7/1/2025 8:31 AM, En-Wei WU wrote:
> > Hi,
> >
> > I'm seeing a regression on an HP ZBook using the e1000e driver
> > (chipset PCI ID: [8086:57a0]) -- the system can't get an IP address
> > after hot-plugging an Ethernet cable. In this case, the Ethernet
> > cable was unplugged at boot. The network interface eno1 was present
> > but stuck in the DHCP process. Using tcpdump, only TX packets were
> > visible and never got any RX -- indicating a possible packet loss or
> > link-layer issue.
> >
> > This is on the vanilla Linux 6.16-rc4 (commit
> > 62f224733431dbd564c4fe800d4b67a0cf92ed10).
> >
> > Bisect says it's this commit:
> >
> > commit efaaf344bc2917cbfa5997633bc18a05d3aed27f
> > Author: Vitaly Lifshits <vitaly.lifshits@...el.com>
> > Date: Thu Mar 13 16:05:56 2025 +0200
> >
> > e1000e: change k1 configuration on MTP and later platforms
> >
> > Starting from Meteor Lake, the Kumeran interface between the
> > integrated MAC and the I219 PHY works at a different frequency.
> > This causes sporadic MDI errors when accessing the PHY, and in rare
> > circumstances could lead to packet corruption.
> >
> > To overcome this, introduce minor changes to the Kumeran idle
> > state (K1) parameters during device initialization. Hardware
> > reset reverts this configuration, therefore it needs to be applied
> > in a few places.
> >
> > Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
> > Signed-off-by: Vitaly Lifshits <vitaly.lifshits@...el.com>
> > Tested-by: Avigail Dahan <avigailx.dahan@...el.com>
> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@...el.com>
> >
> > drivers/net/ethernet/intel/e1000e/defines.h | 3 +++
> > drivers/net/ethernet/intel/e1000e/ich8lan.c | 80
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> > drivers/net/ethernet/intel/e1000e/ich8lan.h | 4 ++++
> > 3 files changed, 82 insertions(+), 5 deletions(-)
> >
> > Reverting this patch resolves the issue.
> >
> > Based on the symptoms and the bisect result, this issue might be
> > similar to
> > https://lore.kernel.org/intel-wired-lan/20250626153544.1853d106@onyx.my.domain/
> >
> >
> > Affected machine is:
> > HP ZBook X G1i 16 inch Mobile Workstation PC, BIOS 01.02.03
> > 05/27/2025 (see end of message for dmesg from boot)
> >
> > CPU model name:
> > Intel(R) Core(TM) Ultra 7 265H (Arrow Lake)
> >
> > ethtool output:
> > driver: e1000e
> > version: 6.16.0-061600rc4-generic
> > firmware-version: 0.1-4
> > expansion-rom-version:
> > bus-info: 0000:00:1f.6
> > supports-statistics: yes
> > supports-test: yes
> > supports-eeprom-access: yes
> > supports-register-dump: yes
> > supports-priv-flags: yes
> >
> > lspci output:
> > 0:1f.6 Ethernet controller [0200]: Intel Corporation Device [8086:57a0]
> > DeviceName: Onboard Ethernet
> > Subsystem: Hewlett-Packard Company Device [103c:8e1d]
> > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > ParErr- Stepping- SERR- FastB2B- DisINTx+
> > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast
> > >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > Latency: 0
> > Interrupt: pin D routed to IRQ 162
> > IOMMU group: 17
> > Region 0: Memory at 92280000 (32-bit, non-prefetchable) [size=128K]
> > Capabilities: [c8] Power Management version 3
> > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> > Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> > Address: 00000000fee00798 Data: 0000
> > Kernel driver in use: e1000e
> > Kernel modules: e1000e
> >
> > The relevant dmesg:
> > <<<cable disconnected>>>
> >
> > [ 0.927394] e1000e: Intel(R) PRO/1000 Network Driver
> > [ 0.927398] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> > [ 0.927933] e1000e 0000:00:1f.6: enabling device (0000 -> 0002)
> > [ 0.928249] e1000e 0000:00:1f.6: Interrupt Throttling Rate
> > (ints/sec) set to dynamic conservative mode
> > [ 1.155716] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized):
> > registered PHC clock
> > [ 1.220694] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width
> > x1) 24:fb:e3:bf:28:c6
> > [ 1.220721] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network
> > Connection
> > [ 1.220903] e1000e 0000:00:1f.6 eth0: MAC: 16, PHY: 12, PBA No: FFFFFF-0FF
> > [ 1.222632] e1000e 0000:00:1f.6 eno1: renamed from eth0
> >
> > <<<cable connected>>>
> >
> > [ 153.932626] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps
> > Half Duplex, Flow Control: None
> > [ 153.934527] e1000e 0000:00:1f.6 eno1: NIC Link is Down
> > [ 157.622238] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps
> > Full Duplex, Flow Control: None
> >
> > No error message seen after hot-plugging the Ethernet cable.
> >
>
> Thank you for the report.
>
> We did not encounter this issue during our patch testing. However, we
> will attempt to reproduce it in our lab.
>
> One detail that caught my attention is that flow control is disabled
> in both scenarios. Could you please check whether the issue persists
> when flow control is enabled? This might require connecting to a link
> partner that supports flow control.
I wrote the other, similar report (from a Dell Pro) referenced earlier.
Additional testing on the Dell provided the following insights:
- Quickly unplugging and replugging the cable does not trigger the
issue. The cable has to stay disconnected for 10-15 seconds for the
issue to trigger.
- The first, spurious link-up is sometimes 1000 Mbps/half and sometimes
10 Mbps/half.
- Once the link is in the defunct state, renegotiating it with
ethtool -r makes things work again.
And yes, my issue seems to be exactly the same.
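For anyone who wants to check their own logs for the same pattern, here is a minimal sketch that scans dmesg text for a half-duplex "NIC Link is Up" followed almost immediately by "NIC Link is Down", as in the excerpt quoted above. The function names and the 1-second window are my own assumptions for illustration, not anything taken from the driver, and the regex is only fitted to the message format shown in this thread (one message per line, not re-wrapped by a mail client).

```python
import re

# Fitted to the e1000e link messages quoted in this thread; other kernel
# versions or drivers may format these lines differently (assumption).
LINK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+e1000e\s+\S+\s+(?P<ifname>\S+):\s+NIC Link is "
    r"(?P<state>Up|Down)"
    r"(?:\s+(?P<speed>\d+)\s+Mbps\s+(?P<duplex>Half|Full)\s+Duplex)?"
)


def parse_link_events(log_text):
    """Return the link up/down events found in log_text, in log order."""
    events = []
    for m in LINK_RE.finditer(log_text):
        events.append({
            "ts": float(m.group("ts")),
            "ifname": m.group("ifname"),
            "state": m.group("state"),
            # speed and duplex are only present on "Up" messages
            "speed": int(m.group("speed")) if m.group("speed") else None,
            "duplex": m.group("duplex"),
        })
    return events


def find_spurious_link_ups(events, window=1.0):
    """Return half-duplex link-ups that are followed by a link-down on the
    same interface within `window` seconds (hypothetical threshold)."""
    spurious = []
    for cur, nxt in zip(events, events[1:]):
        if (cur["state"] == "Up" and cur["duplex"] == "Half"
                and nxt["state"] == "Down"
                and nxt["ifname"] == cur["ifname"]
                and nxt["ts"] - cur["ts"] <= window):
            spurious.append(cur)
    return spurious


if __name__ == "__main__":
    import sys
    # Example: dmesg | python3 this_script.py
    for ev in find_spurious_link_ups(parse_link_events(sys.stdin.read())):
        print(f"{ev['ifname']}: spurious link up at {ev['ts']:.6f}s "
              f"({ev['speed']} Mbps/{ev['duplex']})")
```

A hit only shows that the spurious link-up occurred. Whether RX is actually dead afterwards still has to be confirmed separately (e.g. with tcpdump, as in the original report).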
Thanks,
Timo