Message-ID: <919f6fb5-aef1-5377-4789-082a97574ec6@kupper.org>
Date:   Mon, 7 Feb 2022 19:04:23 +0100
From:   Thomas Kupper <thomas@...per.org>
To:     Shyam Sundar S K <Shyam-sundar.S-k@....com>,
        Tom Lendacky <thomas.lendacky@....com>
Cc:     netdev@...r.kernel.org
Subject: Re: AMD XGBE "phy irq request failed" kernel v5.17-rc2 on V1500B
 based board


On 07.02.22 at 16:19, Shyam Sundar S K wrote:
>
> On 2/7/2022 8:02 PM, Tom Lendacky wrote:
>> On 2/5/22 12:14, Thomas Kupper wrote:
>>> On 05.02.22 at 16:51, Tom Lendacky wrote:
>>>> On 2/5/22 04:06, Thomas Kupper wrote:
>>>> Reload the module and specify the dyndbg option to get some
>>>> additional debug output.
>>>>
>>>> I'm adding Shyam to the thread, too, as I'm not familiar with the
>>>> configuration for this chip.
>>>>
>>> Right after boot:
>>>
>>> [    5.352977] amd-xgbe 0000:06:00.1 eth0: net device enabled
>>> [    5.354198] amd-xgbe 0000:06:00.2 eth1: net device enabled
>>> ...
>>> [    5.382185] amd-xgbe 0000:06:00.1 enp6s0f1: renamed from eth0
>>> [    5.426931] amd-xgbe 0000:06:00.2 enp6s0f2: renamed from eth1
>>> ...
>>> [    9.701637] amd-xgbe 0000:06:00.2 enp6s0f2: phy powered off
>>> [    9.701679] amd-xgbe 0000:06:00.2 enp6s0f2: CL73 AN disabled
>>> [    9.701715] amd-xgbe 0000:06:00.2 enp6s0f2: CL37 AN disabled
>>> [    9.738191] amd-xgbe 0000:06:00.2 enp6s0f2: starting PHY
>>> [    9.738219] amd-xgbe 0000:06:00.2 enp6s0f2: starting I2C
>>> ...
>>> [   10.742622] amd-xgbe 0000:06:00.2 enp6s0f2: firmware mailbox command did not complete
>>> [   10.742710] amd-xgbe 0000:06:00.2 enp6s0f2: firmware mailbox reset performed
>>> [   10.750813] amd-xgbe 0000:06:00.2 enp6s0f2: 10GbE SFI mode set
>>> [   10.768366] amd-xgbe 0000:06:00.2 enp6s0f2: 10GbE SFI mode set
>>> [   10.768371] amd-xgbe 0000:06:00.2 enp6s0f2: fixed PHY configuration
>>>
>>> Then after 'ifconfig enp6s0f2 up':
>>>
>>> [  189.184928] amd-xgbe 0000:06:00.2 enp6s0f2: phy powered off
>>> [  189.191828] amd-xgbe 0000:06:00.2 enp6s0f2: 10GbE SFI mode set
>>> [  189.191863] amd-xgbe 0000:06:00.2 enp6s0f2: CL73 AN disabled
>>> [  189.191894] amd-xgbe 0000:06:00.2 enp6s0f2: CL37 AN disabled
>>> [  189.196338] amd-xgbe 0000:06:00.2 enp6s0f2: starting PHY
>>> [  189.198792] amd-xgbe 0000:06:00.2 enp6s0f2: 10GbE SFI mode set
>>> [  189.212036] genirq: Flags mismatch irq 69. 00000000 (enp6s0f2-pcs) vs. 00000000 (enp6s0f2-pcs)
>>> [  189.221700] amd-xgbe 0000:06:00.2 enp6s0f2: phy irq request failed
>>> [  189.231051] amd-xgbe 0000:06:00.2 enp6s0f2: phy powered off
>>> [  189.231054] amd-xgbe 0000:06:00.2 enp6s0f2: stopping I2C
>>>
>> Please ensure that the ethtool msglvl is on for drv and probe. I was
>> expecting to see some additional debug messages that I don't see here.
>>
>> Also, if you can provide the lspci output for the device (using -nn and
>> -vv) that might be helpful as well.
>>
>> Shyam will be the best one to understand what is going on here.
> We have seen similar problems reported on some other platforms. A fix
> has been sent for validation.
>
> The root cause is that removing the xgbe driver causes an interrupt
> storm on the MP2 device (Sensor Fusion Hub).
>
> I will submit the fix upstream once validation is done; you can then
> give it a try and see if it helps.
>
> Thanks,
> Shyam
>
>> Thanks,
>> Tom

Sorry, forgot the 'lspci -nn -vv' output. Here it goes:

$ ethtool -i enp6s0f2
driver: amd-xgbe
version: 5.17.0-rc2-tk
firmware-version: 17.118.33
expansion-rom-version:
bus-info: 0000:06:00.2
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

$ lspci -nn -vv -s 0:6:0.2
06:00.2 Ethernet controller [0200]: Advanced Micro Devices, Inc. [AMD] Device [1022:1458]
         Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:1458]
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
         Latency: 0, Cache Line Size: 64 bytes
         Interrupt: pin C routed to IRQ 69
         Region 0: Memory at d0020000 (32-bit, non-prefetchable) [size=128K]
         Region 1: Memory at d0000000 (32-bit, non-prefetchable) [size=128K]
         Region 2: Memory at d0080000 (64-bit, non-prefetchable) [size=8K]
         Capabilities: [48] Vendor Specific Information: Len=08 <?>
         Capabilities: [50] Power Management version 3
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
         Capabilities: [64] Express (v2) Endpoint, MSI 00
                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                         ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                 DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                         RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                         MaxPayload 128 bytes, MaxReadReq 512 bytes
                 DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                 LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                 LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                 LnkSta: Speed 8GT/s (ok), Width x16 (ok)
                         TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                 DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
                          10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                          EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                          FRS- TPHComp- ExtTPHComp-
                          AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
                          AtomicOpsCtl: ReqEn-
                 LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
                          EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                          Retimer- 2Retimers- CrosslinkRes: unsupported
         Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
                 Address: 0000000000000000  Data: 0000
         Capabilities: [c0] MSI-X: Enable+ Count=7 Masked-
                 Vector table: BAR=2 offset=00000000
                 PBA: BAR=2 offset=00001000
         Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
         Capabilities: [150 v2] Advanced Error Reporting
                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                 AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                         MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                 HeaderLog: 00000000 00000000 00000000 00000000
         Capabilities: [2a0 v1] Access Control Services
                 ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
                 ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
         Kernel driver in use: amd-xgbe
         Kernel modules: amd_xgbe
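
A side note on the "genirq: Flags mismatch" line in the quoted log, in case it helps when reading it: as far as I understand the kernel's message format, the first flags/name pair is the new request and the second is the handler already registered on that IRQ. The small sketch below only decodes that one line. The helper name decode_mismatch is made up for illustration, and IRQF_SHARED = 0x80 is taken from include/linux/interrupt.h. Both sides show flags 00000000 and the same name, so neither request is shared; one possible reading (an assumption, not a confirmed diagnosis) is that an earlier request for the same interrupt was never freed.

```python
import re

# Assumed from include/linux/interrupt.h: flag that allows IRQ sharing.
IRQF_SHARED = 0x00000080

# Assumed message layout: new flags (new name) vs. old flags (old name).
PATTERN = re.compile(
    r"genirq: Flags mismatch irq (\d+)\. "
    r"([0-9a-f]{8}) \((.+?)\) vs\. ([0-9a-f]{8}) \((.+?)\)"
)


def decode_mismatch(line):
    """Hypothetical helper: split one genirq mismatch line into its fields."""
    m = PATTERN.search(line)
    if m is None:
        return None
    irq, new_flags, new_name, old_flags, old_name = m.groups()
    new_flags = int(new_flags, 16)
    old_flags = int(old_flags, 16)
    return {
        "irq": int(irq),
        "new": (new_name, new_flags),
        "old": (old_name, old_flags),
        # Sharing needs IRQF_SHARED on both the old and the new request.
        "both_shared": bool(new_flags & old_flags & IRQF_SHARED),
        "same_name": new_name == old_name,
    }


# The line from the log above, with the mail client's wrap undone.
line = ("[  189.212036] genirq: Flags mismatch irq 69. 00000000 (enp6s0f2-pcs) "
        "vs. 00000000 (enp6s0f2-pcs)")
info = decode_mismatch(line)
print(info)
```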

/Thomas
